Platform · AI Funnel

Describe your AI flow in TOML.

One TOML graph, one model, one OTA. Cloud Connect retrains, optimizes, and validates against reference hardware; the device dispatches the DAG and runs camera → GPU → NPU with zero pixel-copy.
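
A minimal sketch of what such a graph could look like. The schema below is illustrative only, invented for this example — it is not the shipped funnel.toml format:

```toml
# Hypothetical funnel.toml sketch — illustrative schema, not the real format.
[source]
input = "mipi-csi0"            # camera entry: produces a shared GPU frame

[triage]
model = "triage.tflite"        # unified triage detector, runs on every frame

[[route]]
label = "vehicle"              # triage label this route matches
model = "plate_reader.tflite"  # downstream NPU model for the cropped ROI

[[route]]
label = "person"
sink = "customer-container"    # DAG terminal node: exit to customer code
```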

AI · intelligence layer
~100 TOPS · AI-class silicon, QCS family, binary-compatible with QCS6490
5 camera inputs · MIPI-CSI, GMSL2, UVC, RTSP, WebRTC
0 CPU pixel reads · entire hot path, camera to NPU
Four-stage funnel narrowing left to right: ingest, preprocess, infer, publish. Amber accent on the inference diamond.

Deployment lifecycle

From TOML to first labelled frame.

Three phases. Build runs in Cloud Connect: triage retrain, NPU-only optimization, hardware-in-the-loop validation. Provision runs on device when the bundle lands: DAG dispatched, models loaded, GPU and NPU configured. Run executes per frame: triage detects, ai-funnel routes, the customer container terminates the flow.

1 — Build · Cloud Connect

Customer ships, Cloud Connect validates.

One TOML graph plus the model and a COCO dataset. Cloud Connect retrains the unified triage detector across all customer funnels, enforces NPU-only operations, quantizes against the target NPU, and gates the result on reference hardware. Output is a signed deployment bundle. The registry is Munic-hosted or customer-hosted.

Build phase of the AI funnel deployment lifecycle, inside Cloud Connect. The customer container — funnel.toml, models, COCO dataset, and container code — is pushed to a registry that is either Munic-hosted or customer-hosted. From the registry, the artefacts feed the triage trainer, which merges multiple customer funnels into one unified triage detector. The result feeds the NPU optimizer, which enforces NPU-only operations and INT8 quantization. The optimized model feeds the hardware-in-the-loop validator, which runs the candidate against reference target hardware and gates on accuracy and latency. A bundle that passes is signed and emitted as the deployment bundle.

flowchart LR
  CB["Customer container<br/>funnel.toml + models<br/>+ COCO dataset + container code"]
  REG[("Registry<br/>Munic-hosted or<br/>customer-hosted")]
  TT["Triage trainer<br/>multiple funnels → one triage"]
  OPT["NPU optimizer<br/>NPU-only ops · INT8 quant"]
  HIL["HW-in-the-loop validator<br/>reference target hardware<br/>accuracy + latency gate"]
  BUNDLE["Signed deployment bundle"]
  CB --> REG --> TT --> OPT --> HIL --> BUNDLE
  class TT,OPT,HIL ai-node
Build phase. AI-class steps are amber: the triage trainer (multiple funnels merged into one detector), the NPU optimizer (NPU-only ops, INT8 quantization), and the hardware-in-the-loop validator that gates on accuracy and latency.

A bundle that fails the hardware-in-the-loop accuracy or latency gate is rejected back to the customer; only signed, gated bundles reach the OTA channel.

2 — Provision · Device

DAG dispatched. Pipeline armed.

When the bundle lands on the device, the DAG compiled from funnel.toml is dispatched: camera and sensor routing configured, models loaded into the NPU, the GPU shader configured for tensor format, letterboxing, and ROI extraction. Model updates ride the same OTA channel as code, with fleet rollback, staged rollout, and version pinning.

Provision phase of the AI funnel deployment lifecycle, on the device. The signed bundle is pulled through the same OTA channel as code updates. The DAG compiled from funnel.toml is dispatched, and three configurations fan out in parallel: camera and sensor routing, the AI runtime loading the models into the NPU, and the GPU ROI shader configuring tensor format, letterboxing, and ROI extraction. All three converge at a ready signal — the device is armed for the first frame.

flowchart LR
  PULL["Bundle pulled<br/>same OTA channel as code"]
  DAG["DAG dispatched<br/>compiled from funnel.toml"]
  CFG_CAM["Camera / sensor<br/>routing configured"]
  CFG_NPU["AI runtime<br/>models loaded into NPU"]
  CFG_GPU["GPU ROI shader<br/>tensor format · letterbox · ROI"]
  READY["Ready · first frame"]
  PULL --> DAG
  DAG --> CFG_CAM --> READY
  DAG --> CFG_NPU --> READY
  DAG --> CFG_GPU --> READY
  class CFG_NPU ai-node
Provision phase. The bundle pull and DAG dispatch are system steps; mos-ai-runtime loading the model into the NPU is the AI-class step (amber).

3 — Run · Per frame

Triage routes; the DAG terminates.

Each frame goes camera → GPU triage tensor → NPU triage. ai-funnel inspects the labels and routes per detection: the GPU re-crops the ROI, the NPU runs the downstream model, the result either loops back to the router for the next stage of the DAG or exits to the customer container — tracking, map, cloud message, CAN — depending on the DAG terminal node.

Run phase of the AI funnel deployment lifecycle, per frame on the device. The camera or sensor produces a shared GPU frame. The GPU produces a triage tensor. The NPU triage model emits labels and bounding boxes to the AI funnel DAG router. The router dispatches per detection: the GPU re-crops the region of interest with letterboxing and normalization, the NPU runs the downstream model and returns labels and confidence. The router either loops back to chain another model in the DAG or exits to the customer container, where the workload terminates as tracking, mapping, a cloud message, or a CAN message — the terminal node depends on the DAG.

flowchart LR
  SRC["Camera / sensor<br/>shared GPU frame"]
  GPU1["GPU<br/>triage tensor"]
  NPU1["NPU triage<br/>labels + bboxes"]
  AIF["AI funnel<br/>DAG router"]
  GPU2["GPU<br/>per-detection ROI<br/>letterbox · normalize"]
  NPU2["NPU model<br/>labels + confidence"]
  SINK["Customer container<br/>tracking · map · cloud message · CAN"]
  SRC --> GPU1 --> NPU1 --> AIF
  AIF --> GPU2 --> NPU2 --> AIF
  AIF --> SINK
  class NPU1,NPU2 ai-node
Run phase. NPU inference steps are amber. The router-to-model-back-to-router edges show the DAG loopback; the final edge to the customer container is the DAG terminal.
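
The per-frame loop can be sketched in a few lines of Python. The names here (`Detection`, `run_model`, the dict-shaped DAG) are illustrative assumptions, not the real ai-funnel API:

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str
    bbox: tuple  # (x, y, w, h) in frame coordinates

def route_frame(triage_detections, dag, run_model, sink):
    """Dispatch each detection through the DAG until a terminal node."""
    work = list(triage_detections)
    while work:
        det = work.pop()
        downstream = dag.get(det.label)
        if downstream is None:
            sink(det)  # DAG terminal: exit to the customer container
        else:
            # GPU re-crops the ROI, NPU runs the downstream model (stubbed)
            work.append(run_model(downstream, det))
```

A label with no downstream entry exits straight to the sink; any other detection loops back through the router — the loopback edge in the diagram.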

Shared-memory pipeline

Camera → GPU → NPU. No copies.

The frame handle travels between camera, GPU, and NPU. The pixel data stays in place. No CPU pixel reads in the hot path.

Shared-memory pipeline sequence. The camera publishes a frame in shared GPU memory and passes the handle across process boundaries. The frame transport relays the handle to the GPU ROI shader, which imports the frame and runs crop, resize, and normalize compute shaders entirely on the GPU. The output tensor handle is handed to the AI runtime, which drives the silicon-vendor NPU delegate; on Qualcomm, GPU-to-NPU shared memory (rpcmem/ION) is reused so the NPU does not re-import. The AI runtime returns selection metadata — bounding boxes and model IDs, tens of bytes, no pixels — back to the GPU ROI shader.

sequenceDiagram
  participant Cam as Camera
  participant Frm as Frame transport
  participant Roi as GPU ROI shader
  participant Ai as AI runtime
  Cam->>Frm: shared GPU frame
  Frm-->>Roi: frame handle
  Note over Roi: GPU import · crop · resize · normalize
  Roi-->>Ai: tensor handle (shared GPU-NPU memory on Qualcomm)
  Note over Ai: NPU inference via vendor delegate
  Ai-->>Roi: SelectRois(bbox + model_id) — tens of bytes, no pixels
No CPU pixel reads in the hot path. The frame handle moves; the pixel data stays in place.

01 — Capture

Camera capture

Five inputs behind one service API. Produces a shared GPU frame regardless of backend. Same entry contract across MIPI-CSI, GMSL2, USB UVC, RTSP, and WebRTC.

02 — GPU crop and resize

GPU ROI shader

Crop, resize, and normalize run entirely on the GPU. A CPU pixel read here is a design bug — a rule, not a target. Portable across iMX8M Plus and Qualcomm; Qualcomm additionally uses GPU-to-NPU shared memory.
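
The letterbox arithmetic the shader applies can be written down directly — a minimal sketch with illustrative names, assuming a uniform scale-to-fit with symmetric padding:

```python
def letterbox_params(src_w, src_h, dst_w, dst_h):
    """Uniform scale plus symmetric padding — the arithmetic a letterbox
    stage applies when fitting a camera frame to a model input."""
    scale = min(dst_w / src_w, dst_h / src_h)    # preserve aspect ratio
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_x = (dst_w - new_w) // 2                 # horizontal padding
    pad_y = (dst_h - new_h) // 2                 # vertical padding
    return scale, new_w, new_h, pad_x, pad_y
```

For a 1920×1080 frame into a 640×640 model input, the scale is 1/3, the resized frame is 640×360, and 140 rows of padding land above and below.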

03 — Inference

AI runtime

Receives the tensor handle. Drives the silicon-vendor NPU delegate on each target. ONNX auto-conversion is available so teams using ONNX export paths can supply either format.

Frame transport

One frame, many consumers — each at its own rate.

One publisher, many subscribers, independent cadences. The same bus carries video frames and inference tensors with identical zero-copy semantics. A slow consumer never stalls the producer.

60 Hz

Video frames

Camera publishes at full frame rate.

30 Hz

GPU crop

The GPU ROI shader consumes at the inference rate without blocking the camera.

10 Hz

Pose tracking · other consumers

Any Python, C++, Rust, Go, or Lua container can pull frames from the same bus through the MQTT bridge — no SDK adoption.

One-to-many fanout. The camera publishes a single shared GPU frame stream. The same bus carries it to three subscribers running on independent cadences — a 60 Hz video consumer (dashcam encoder), a 30 Hz GPU crop consumer (the ROI shader), and a 10 Hz pose tracker. The ROI shader feeds its tensor output to the AI runtime for NPU inference. A slow consumer never stalls the producer.

flowchart TD
  P[Camera<br/>shared GPU frame publisher]
  P --> S60[60 Hz video<br/>dashcam encoder]
  P --> S30[30 Hz GPU crop<br/>ROI shader]
  P --> S10[10 Hz pose tracker]
  S30 --> NPU[AI runtime<br/>NPU inference]
  class NPU ai-node
  class S30 ai-node
One publisher, N subscribers, each on its own cadence. A slow consumer cannot stall the producer.
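
The drop-oldest semantics can be sketched with bounded queues. `FrameBus` and its methods are illustrative, not the real frame-transport API:

```python
from collections import deque

# Minimal sketch of non-blocking fanout: each subscriber owns a bounded
# queue, so a slow consumer drops stale frame handles instead of stalling
# the publisher. Only handles move; pixel data stays in shared memory.
class FrameBus:
    def __init__(self):
        self.subs = {}

    def subscribe(self, name, depth=2):
        self.subs[name] = deque(maxlen=depth)  # full deque evicts the oldest
        return self.subs[name]

    def publish(self, frame_handle):
        for q in self.subs.values():
            q.append(frame_handle)             # never blocks the producer
```

A depth of 2 means a 10 Hz consumer of a 60 Hz stream always sees the freshest frames rather than a growing backlog.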

AI-class audiences

From drones to vehicles.

Quad-rotor drone seen from below — four down-facing cameras each showing a tiled ground patch. Amber on the central tile.

AI-class silicon targets any platform where a camera feeds an NPU — aerial drones, ADAS cameras, DMS units, and autonomous vehicles. The same zero-copy pipeline applies across all.

Camera inputs

Five inputs. One shared-frame contract.

Camera inputs and silicon backends
| Input | Silicon target | Backend | Entry contract |
|---|---|---|---|
| MIPI-CSI | iMX8M Plus | libcamera / V4L2 | shared GPU frame |
| GMSL2 | Qualcomm QCS only | V4L2 subdev + QMMF | shared GPU frame |
| USB UVC | All targets | V4L2 | shared GPU frame |
| RTSP / ONVIF | All targets | GStreamer | shared GPU frame |
| WebRTC | All targets | GStreamer | shared GPU frame |

GMSL2 is a Qualcomm-only path today. iMX8M Plus uses MIPI-CSI direct.

GDPR anonymisation

Active from the first frame.

Face and plate blur runs as a shader stage before any frame leaves the pipeline. The policy is set at boot time and is active from the first frame on the device.

What is anonymised

  • Faces — bounding-box region blur before frame hand-off
  • Licence plates — same shader stage, same shared-memory path

Deployment constraint

The policy is set at boot time. A runtime hot-toggle without reboot is not available today. Enable once per device configuration; the frame plane enforces it continuously.

Observability

14 built-in metrics across the pipeline.

GPU ROI shader

  • calls_total
  • errors_total
  • rois_extracted_total
  • frame_fetch_failures_total
  • frame_fetch_timeouts_total
  • duration_seconds (histogram)

AI runtime

  • inference_requests_total
  • inference_errors_total
  • model_load_duration_seconds
  • inference_duration_seconds (histogram)
  • active_models (gauge)
  • watchdog_heartbeat_budget_seconds
  • plus error dedup

All at zero per-component author cost — emitted by the framework automatically.
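
The zero-author-cost claim amounts to the framework wrapping every component entry point in instrumentation roughly like this — a hedged Python sketch, not the actual implementation:

```python
import time
from functools import wraps

# Framework-side instrumentation sketch: the wrapper emits the calls,
# errors, and duration metrics so component authors write none of it.
METRICS = {"calls_total": 0, "errors_total": 0, "duration_seconds": []}

def instrumented(fn):
    @wraps(fn)
    def wrapper(*args, **kwargs):
        METRICS["calls_total"] += 1
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        except Exception:
            METRICS["errors_total"] += 1
            raise
        finally:
            METRICS["duration_seconds"].append(time.perf_counter() - start)
    return wrapper
```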

Test doubles

Develop without hardware.

Provider-published fakes for CI and off-target development
| Fake component | Test scope | CI-runnable |
|---|---|---|
| AI runtime fake | Model inference stub | Yes — no NPU hardware |
| GPU ROI shader fake | GPU ROI extraction stub | Yes — no GPU required |
| Mock frame source / sink | Frame bus pub/sub | Yes — pure host |
| Sim file source | Camera input replay | Yes — file-based |
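
In a CI test, the AI-runtime fake stands in for NPU inference with canned detections. `FakeAiRuntime` below is an illustrative stand-in, not the published fake's API:

```python
# AI-runtime fake sketch: canned detections instead of NPU inference, so
# funnel logic runs on a plain CI host with no accelerator attached.
class FakeAiRuntime:
    def __init__(self, canned):
        self.canned = canned      # tensor id -> canned detection list
        self.loaded = []

    def load_model(self, name):
        self.loaded.append(name)  # no NPU: loading is bookkeeping only

    def infer(self, tensor_id):
        return self.canned.get(tensor_id, [])
```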

FAQ

Frequently asked questions

  • Do pixel bytes ever cross the CPU in the hot path?

    No, by design. Camera, GPU, and NPU exchange the same shared-memory frame via handle passing — no buffer copies at any step.

  • Which model formats does the AI runtime accept?

    TFLite directly. ONNX is auto-converted in the pipeline so teams using ONNX export paths can supply either format.

  • Can I run multiple models concurrently?

    Yes. The AI runtime loads multiple models simultaneously. Same-model re-entry is prevented by construction. Serialisation mode (per-model or global single-lane) is configurable without recompilation.

  • How does GDPR live anonymisation work?

    Face and plate blur runs as a shader stage before any frame leaves the pipeline. The policy is set at boot time — active from the first frame. Live hot-toggle without reboot is not available today.

  • Can I develop without NPU hardware?

    Yes. Provider-published fakes (AI runtime, GPU ROI shader, frame source/sink, file replay) let you develop and integration-test the full pipeline without NPU hardware, a camera sensor, or Bazel.

Architecture FAQ

Implementation details

  • How does the zero-copy hand-off work technically?

    NV12 dmabufs from the camera component cross the process boundary via SCM_RIGHTS fd passing. The GPU ROI shader imports them into Vulkan via VK_EXT_external_memory_dma_buf and runs WGSL crop, resize, and normalize compute shaders entirely on the GPU. The output tensor dmabuf handle is handed to the AI runtime, which drives the TFLite C API with the silicon-vendor delegate.

  • Is GPU-to-NPU shared memory available on iMX8M Plus?

    No. GPU-to-NPU shared memory via rpcmem/ION (QnnMem_register) is Qualcomm-specific. iMX8M Plus uses GPU ROI extraction without rpcmem/ION. Both targets share the same Vulkan ROI pipeline.
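
The fd-passing hand-off can be demonstrated with Python's `socket.send_fds`/`recv_fds` (3.9+, Unix only). A pipe stands in for the NV12 dmabuf in this sketch:

```python
import os
import socket

# Handle-passing sketch: the producer shares a buffer by sending its file
# descriptor over a Unix socket (SCM_RIGHTS underneath), so the consumer
# reads the same kernel buffer — no pixel bytes cross the message itself.
producer, consumer = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

r_fd, w_fd = os.pipe()
os.write(w_fd, b"NV12 frame bytes")            # producer fills the buffer
socket.send_fds(producer, [b"frame"], [r_fd])  # pass the handle, not pixels

msg, fds, _, _ = socket.recv_fds(consumer, 1024, 1)
payload = os.read(fds[0], 1024)                # consumer reads the same buffer
```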

Bring your model and dataset.

Show us the inference task; engineering will walk through the capture-to-NPU path on a target device.