AI Funnel — describe your edge AI flow in TOML.

AI Funnel is the MOS4 declarative edge-AI engine. The customer ships a TOML graph plus a model and dataset; Cloud Connect retrains, optimises, validates against reference hardware, then delivers the signed bundle over-the-air (OTA). The on-device runtime dispatches the processing graph: camera → GPU (Graphics Processing Unit) → NPU (Neural Processing Unit) with zero CPU pixel copies.

AI · intelligence layer
  • Up to ~100 TOPS (tera operations per second) on AI-class silicon
  • 5 camera inputs: MIPI-CSI, GMSL2, UVC, RTSP, WebRTC
  • 0 CPU pixel reads across the entire hot path, camera to NPU
Figure: three-stage AI funnel pipeline. Stage 1, customer provides: TOML graph, ONNX/TFLite model, dataset. Stage 2, Munic cloud: retrain, quantise, validate. Stage 3, on-device runtime: ECU with NPU and GPU sharing memory.

Deployment lifecycle

From TOML to first labelled frame.

Three phases. Build runs in Cloud Connect: triage retrain, NPU-only optimization, hardware-in-the-loop validation. Provision runs on device when the bundle lands: the processing graph is dispatched, models loaded, GPU and NPU configured. Run executes per frame: triage detects, the AI funnel routes, the customer container terminates the flow.

1 — Build · Cloud Connect

Customer ships, Cloud Connect validates.

One TOML graph, one model, one labelled dataset. Cloud Connect — the managed cloud service that bridges the fleet to MOS4 micro services — retrains the unified triage detector across customer funnels. It enforces NPU-only operations, optimizes for low-precision inference on the target NPU, and gates on reference hardware. Output is a signed deployment bundle. The model registry is Munic-hosted or customer-hosted.
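
One way to picture the NPU-only rule is as a check over the operators a model uses. The sketch below is illustrative only: the operator names, the `NPU_SUPPORTED_OPS` set, and `find_unsupported_ops` are assumptions made for this example, not the Cloud Connect implementation or any vendor's real operator list.

```python
# Hypothetical allow-list; a real NPU delegate publishes its own operator set.
NPU_SUPPORTED_OPS = frozenset(
    {"CONV_2D", "DEPTHWISE_CONV_2D", "ADD", "RELU", "MAX_POOL_2D"}
)


def find_unsupported_ops(model_ops, supported=NPU_SUPPORTED_OPS):
    """Return, sorted, the operators that could not run on the NPU."""
    return sorted(set(model_ops) - set(supported))


# Operator list of a hypothetical candidate model.
candidate = ["CONV_2D", "RELU", "CUSTOM_NMS", "ADD"]
unsupported = find_unsupported_ops(candidate)
if unsupported:
    print("rejected, not NPU-only:", unsupported)
else:
    print("all operators map to the NPU")
```

A check of this shape fails fast in the build phase, before any hardware time is spent on validation.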

AI Funnel is part of the no-code platform. Cloud Connect detail: learn more · get started.

Build phase of the AI funnel deployment lifecycle, inside Cloud Connect. The customer container — funnel.toml, models, COCO dataset, and container code — is pushed to a registry that is either Munic-hosted or customer-hosted. From the registry, the artefacts feed the triage trainer, which merges multiple customer funnels into one unified triage detector. The result feeds the NPU optimizer, which enforces NPU-only operations and low-precision inference. The optimized model feeds the hardware-in-the-loop validator, which runs the candidate against reference target hardware and gates on accuracy and latency. A bundle that passes is signed and emitted as the deployment bundle.

flowchart LR
  CB["Customer container<br/>funnel.toml + models<br/>+ COCO dataset + container code"]
  REG[("Registry<br/>Munic-hosted or<br/>customer-hosted")]
  TT["Triage trainer<br/>multiple funnels → one triage"]
  OPT["NPU optimizer<br/>NPU-only ops · low-precision inference"]
  HIL["HW-in-the-loop validator<br/>reference target hardware<br/>accuracy + latency gate"]
  BUNDLE["Signed deployment bundle"]
  CB --> REG --> TT --> OPT --> HIL --> BUNDLE
  class TT,OPT,HIL ai-node
Build phase. AI-class steps are amber: the triage trainer (multiple funnels merged into one detector), the NPU optimizer (NPU-only ops, low-precision inference), and the hardware-in-the-loop validator that gates on accuracy and latency.

A bundle that fails the hardware-in-the-loop accuracy or latency gate is rejected back to the customer; only signed, gated bundles reach the OTA channel.
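
The gate described above can be sketched as a pair of threshold checks. This is a minimal illustration under stated assumptions: `GateThresholds`, `HilResult`, `gate`, and the numeric limits are hypothetical names and values, and the real validator may measure accuracy and latency differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateThresholds:
    min_accuracy: float     # hypothetical metric measured on the labelled dataset
    max_latency_ms: float   # hypothetical per-frame latency budget


@dataclass(frozen=True)
class HilResult:
    accuracy: float
    latency_ms: float


def gate(result: HilResult, limits: GateThresholds) -> list:
    """Return the gate failures; an empty list means the bundle may be signed."""
    failures = []
    if result.accuracy < limits.min_accuracy:
        failures.append(f"accuracy {result.accuracy:.3f} < {limits.min_accuracy:.3f}")
    if result.latency_ms > limits.max_latency_ms:
        failures.append(f"latency {result.latency_ms:.1f} ms > {limits.max_latency_ms:.1f} ms")
    return failures


limits = GateThresholds(min_accuracy=0.80, max_latency_ms=30.0)
print(gate(HilResult(accuracy=0.84, latency_ms=22.5), limits))  # passes both checks
print(gate(HilResult(accuracy=0.84, latency_ms=41.0), limits))  # fails the latency check
```

Returning the list of failures, rather than a bare boolean, gives the customer a concrete reason when a bundle is rejected.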

2 — Provision · Device

DAG dispatched. Pipeline armed.

When the bundle lands on the device, the processing graph compiled from funnel.toml is dispatched: camera and sensor routing configured, models loaded into the NPU, the GPU shader configured for tensor format, letterboxing, and region-of-interest (ROI) extraction. Model updates ride the same OTA channel as code, with fleet rollback, staged rollout, and version pinning.

Provision phase of the AI funnel deployment lifecycle, on the device. The signed bundle is pulled through the same OTA channel as code updates. The DAG compiled from funnel.toml is dispatched, and three configurations fan out in parallel: camera and sensor routing, the AI runtime loading the models into the NPU, and the GPU ROI shader configuring tensor format, letterboxing, and ROI extraction. All three converge at a ready signal — the device is armed for the first frame.

flowchart LR
  PULL["Bundle pulled<br/>same OTA channel as code"]
  DAG["DAG dispatched<br/>compiled from funnel.toml"]
  CFG_CAM["Camera / sensor<br/>routing configured"]
  CFG_NPU["AI runtime<br/>models loaded into NPU"]
  CFG_GPU["GPU ROI shader<br/>tensor format · letterbox · ROI"]
  READY["Ready · first frame"]
  PULL --> DAG
  DAG --> CFG_CAM --> READY
  DAG --> CFG_NPU --> READY
  DAG --> CFG_GPU --> READY
  class CFG_NPU ai-node
Provision phase. The bundle pull and DAG dispatch are system steps; mos-ai-runtime loading the model into the NPU is the AI-class step (amber).
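
The fan-out and join in the diagram can be sketched with a thread pool. The three step functions below are hypothetical stand-ins, not MOS4 APIs; the point is only that readiness is reported after all three configurations have completed.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for the three configuration steps.
def configure_camera_routing():
    return "camera"


def load_models_into_npu():
    return "npu"


def configure_gpu_roi_shader():
    return "gpu"


def provision():
    """Run the three steps concurrently and return only when all have succeeded."""
    steps = [configure_camera_routing, load_models_into_npu, configure_gpu_roi_shader]
    with ThreadPoolExecutor(max_workers=len(steps)) as pool:
        futures = [pool.submit(step) for step in steps]
        # result() re-raises any exception from a step, so one failure blocks readiness.
        completed = [future.result() for future in futures]
    return sorted(completed)


print("ready:", provision())
```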

3 — Run · Per frame

Triage routes; the DAG terminates.

Each frame goes camera → GPU triage tensor → NPU triage. The AI funnel inspects the labels and routes per detection: the GPU re-crops the region of interest (ROI) and the NPU runs the downstream model. The result either loops back to the router for the next stage or exits to the customer container for tracking, mapping, a cloud message, or a CAN (Controller Area Network) message, depending on the terminal node of the processing graph.

Run phase of the AI funnel deployment lifecycle, per frame on the device. The camera or sensor produces a shared GPU frame. The GPU produces a triage tensor. The NPU triage model emits labels and bounding boxes to the AI funnel DAG router. The router dispatches per detection: the GPU re-crops the region of interest with letterboxing and normalization, the NPU runs the downstream model and returns labels and confidence. The router either loops back to chain another model in the DAG or exits to the customer container, where the workload terminates as tracking, mapping, a cloud message, or a CAN message — the terminal node depends on the DAG.

flowchart LR
  SRC["Camera / sensor<br/>shared GPU frame"]
  GPU1["GPU<br/>triage tensor"]
  NPU1["NPU triage<br/>labels + bboxes"]
  AIF["AI funnel<br/>DAG router"]
  GPU2["GPU<br/>per-detection ROI<br/>letterbox · normalize"]
  NPU2["NPU model<br/>labels + confidence"]
  SINK["Customer container<br/>tracking · map · cloud message · CAN"]
  SRC --> GPU1 --> NPU1 --> AIF
  AIF --> GPU2 --> NPU2 --> AIF
  AIF --> SINK
  class NPU1,NPU2 ai-node
Run phase. NPU inference steps are amber. The router-to-model-back-to-router edges show the DAG loopback; the final edge to the customer container is the DAG terminal.
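
The per-detection routing loop can be sketched as a table lookup that repeats until a terminal label is reached. Everything named here is hypothetical: the labels, the `ROUTES` table, and `run_model` (which stands in for the GPU re-crop plus NPU inference) are assumptions for illustration, not the funnel.toml schema or the runtime API.

```python
# Hypothetical routing table: label -> next model, or None for the terminal node.
ROUTES = {
    "vehicle": "plate_detector",
    "plate": "plate_reader",
    "plate_text": None,   # exits to the customer container
    "person": None,
}


def run_model(model, roi):
    """Stand-in for GPU ROI re-crop plus NPU inference; returns (label, confidence)."""
    outputs = {"plate_detector": "plate", "plate_reader": "plate_text"}
    return outputs[model], 0.9


def route_detection(label, bbox, max_hops=4):
    """Follow the routing table until a terminal (or unknown) label is reached."""
    path = [label]
    for _ in range(max_hops):           # bound the loopback so a bad graph cannot spin
        model = ROUTES.get(label)       # unknown labels are treated as terminal
        if model is None:
            break
        label, _confidence = run_model(model, bbox)
        path.append(label)
    return path


def process_frame(triage_detections):
    return [route_detection(label, bbox) for label, bbox in triage_detections]


frame = [("vehicle", (10, 20, 200, 120)), ("person", (300, 40, 60, 160))]
print(process_frame(frame))
# [['vehicle', 'plate', 'plate_text'], ['person']]
```

The `max_hops` bound is a design choice of this sketch: it keeps a misconfigured table from looping forever on one detection.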

For the camera-side input matrix (MIPI-CSI, GMSL2, UVC, RTSP, WebRTC) see Vision capabilities.

Observability

14 built-in metrics across the pipeline.

Every pipeline stage emits Prometheus metrics: calls, errors, duration histograms, and a watchdog heartbeat. The 14 metrics span the GPU region-of-interest (ROI) shader and the AI runtime, and add no per-service instrumentation cost. See the full operations catalog →
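
To make the calls / errors / duration pattern concrete, here is a standard-library sketch of per-stage instrumentation. It is not the MOS4 metric set: the class name, stage names, and bucket bounds are assumptions, and a real exporter would use a Prometheus client library and expose cumulative histogram buckets.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

BUCKETS_MS = (1, 5, 10, 50)  # hypothetical bucket upper bounds in milliseconds


class StageMetrics:
    """Counts calls and errors and records durations per pipeline stage."""

    def __init__(self):
        self.calls = defaultdict(int)
        self.errors = defaultdict(int)
        # One counter per bucket plus a final overflow slot (non-cumulative here).
        self.buckets = defaultdict(lambda: [0] * (len(BUCKETS_MS) + 1))

    @contextmanager
    def observe(self, stage):
        start = time.perf_counter()
        self.calls[stage] += 1
        try:
            yield
        except Exception:
            self.errors[stage] += 1
            raise
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            index = next(
                (i for i, bound in enumerate(BUCKETS_MS) if elapsed_ms <= bound),
                len(BUCKETS_MS),
            )
            self.buckets[stage][index] += 1


metrics = StageMetrics()
with metrics.observe("roi_shader"):
    pass                                  # a successful stage call
try:
    with metrics.observe("ai_runtime"):
        raise RuntimeError("inference failed")
except RuntimeError:
    pass
print(dict(metrics.calls), dict(metrics.errors))
```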

Test doubles

Develop without hardware.

Provider-published test doubles for CI and off-target development

| Fake component | Test scope | CI-runnable |
|---|---|---|
| AI runtime (test double) | Model inference stub, no NPU needed | Yes, no NPU hardware |
| GPU ROI shader (test double) | Region-of-interest extraction stub, no GPU needed | Yes, no GPU required |
| Frame source / sink (test double) | Frame bus publish/subscribe | Yes, pure host |
| File-based camera source | Camera input replay from file | Yes, file-based |

Build and validate the complete pipeline on any laptop. See the SDK and developer workflow →
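
As an illustration of how a test double stands in for hardware, the sketch below pairs a fake inference backend with a file-style frame replay. The names `InferenceBackend`, `FakeAiRuntime`, `replay_frames`, and `run_pipeline` are hypothetical and do not reflect the published test-double library's interface.

```python
from typing import Iterator, List, Protocol, Tuple

Detection = Tuple[str, float]  # (label, confidence)


class InferenceBackend(Protocol):
    def infer(self, frame: bytes) -> List[Detection]: ...


class FakeAiRuntime:
    """Test double: returns canned detections, no NPU involved."""

    def __init__(self, canned: List[Detection]):
        self.canned = canned
        self.frames_seen = 0

    def infer(self, frame: bytes) -> List[Detection]:
        self.frames_seen += 1
        return list(self.canned)


def replay_frames(blob: bytes, frame_size: int) -> Iterator[bytes]:
    """File-style replay: slice a recorded byte stream into fixed-size frames."""
    for offset in range(0, len(blob) - frame_size + 1, frame_size):
        yield blob[offset:offset + frame_size]


def run_pipeline(frames, backend: InferenceBackend):
    """Pipeline code depends only on the protocol, so real and fake backends swap freely."""
    return [backend.infer(frame) for frame in frames]


recording = bytes(12)  # stands in for the contents of a recorded camera file
fake = FakeAiRuntime([("person", 0.9)])
results = run_pipeline(replay_frames(recording, frame_size=4), fake)
print(fake.frames_seen, results[0])
```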

FAQ

Frequently asked questions

  • Do pixel bytes ever cross the CPU in the hot path?

    No, by design. Camera, GPU, and NPU exchange the same shared-memory frame via handle passing — no buffer copies at any step.

  • Which model formats does the AI runtime accept?

    TFLite (TensorFlow Lite) is accepted directly. ONNX (Open Neural Network Exchange) is auto-converted in the pipeline, so teams can supply either format.

  • Can I run multiple models concurrently?

    Yes. The AI runtime loads multiple models simultaneously. Same-model re-entry is prevented by construction. Serialisation mode (per-model or global single-lane) is configurable without recompilation.

  • Can I develop without NPU hardware?

    Yes. Provider-published test doubles (AI runtime, GPU ROI shader, frame source/sink, file replay) let you develop and integration-test the full pipeline without NPU hardware, a camera sensor, or a build-system dependency.
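
The two serialisation modes mentioned in the concurrency answer can be approximated with locks. This is a sketch under assumptions, not the runtime's mechanism: `ModelLanes` and the mode strings are hypothetical, and the real runtime is described as preventing same-model re-entry by construction rather than by locking.

```python
import threading


class ModelLanes:
    """Hypothetical sketch of per-model versus global single-lane serialisation."""

    def __init__(self, model_names, mode="per-model"):
        if mode not in ("per-model", "global"):
            raise ValueError(f"unknown mode: {mode}")
        shared = threading.Lock()
        # per-model: one lock each, so different models may run concurrently.
        # global: every model shares one lock, so only one inference runs at a time.
        self._locks = {
            name: (threading.Lock() if mode == "per-model" else shared)
            for name in model_names
        }

    def run(self, name, infer, *args):
        # Holding the model's lock makes a second call into the same model wait.
        with self._locks[name]:
            return infer(*args)


# The mode is a plain runtime value, so it could come from configuration.
lanes = ModelLanes(["triage", "plate_reader"], mode="global")
print(lanes.run("triage", len, b"\x00" * 8))
```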

Architecture FAQ

Implementation details

  • How does the zero-copy hand-off work technically?

    Camera frames cross the process boundary via handle passing. The GPU ROI shader imports them and runs crop, resize, and normalize compute shaders entirely on the GPU. The output tensor handle is handed to the AI runtime, which drives the silicon-vendor NPU delegate.

  • Is GPU-to-NPU shared memory available on the MIPI-CSI AI-class variant?

    No. GPU-to-NPU shared memory is available only on the serialiser-bridge AI-class variant. The MIPI-CSI variant uses GPU ROI extraction and hands the tensor to the NPU as a standard buffer. Both variants share the same Vulkan ROI pipeline.
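
The resize and normalize steps rest on simple arithmetic, shown here on the CPU purely for illustration. The function names, the 640×640 target, and the normalization constants are assumptions; on the device this work is described as running in GPU compute shaders.

```python
def letterbox(src_w, src_h, dst_w, dst_h):
    """Return (scaled_w, scaled_h, pad_x, pad_y) for an aspect-preserving fit."""
    scale = min(dst_w / src_w, dst_h / src_h)   # the tighter axis decides the scale
    scaled_w = round(src_w * scale)
    scaled_h = round(src_h * scale)
    pad_x = (dst_w - scaled_w) // 2             # left padding; the remainder goes right
    pad_y = (dst_h - scaled_h) // 2             # top padding; the remainder goes below
    return scaled_w, scaled_h, pad_x, pad_y


def normalize(pixel, mean=127.5, scale=1 / 127.5):
    """Map an 8-bit value to roughly [-1, 1]; the constants are model-specific assumptions."""
    return (pixel - mean) * scale


print(letterbox(1920, 1080, 640, 640))
# (640, 360, 0, 140)
```

A 1920×1080 region scaled into a 640×640 tensor keeps its 16:9 shape at 640×360 and is padded with 140 rows above and below.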

Text-AI platform

AI Language — the text-intelligence companion.

AI Funnel is the visual-intelligence build engine. The platform companion for text is AI Language — an on-device large language model (LLM), a grounded retrieval pipeline (RAG), four-layer prompt-injection defence, and multilingual voice, all offline by default.

Both platforms run on the same device, sharing the MOS4 EventBus and OTA channel. AI Funnel runs the visual pipeline; AI Language runs the language pipeline. Together they form the MOS4 AI Software Suite runtime layer.

See AI Language platform →

Explore further

Related capabilities

AI-class hardware

Form factors, silicon tiers, and connectivity options for AI Funnel deployment.

See hardware →

No-code platform

AI Funnel is one of four declarative engines. Combine with multi stacks, signal processing, and event processing — all configured in TOML or YAML, no code required.

See the no-code platform →

SDK and developer tools

Hot-swap micro services, CLI tooling, and a test-double library for off-target development.

See the SDK →

Bring the model and dataset.

Bring the inference task; engineering will walk through the capture-to-NPU path on a target device.

Building on MOS4?

One reply from engineering, ~24h. No deck, no NDA.

Talk to engineering