A modular, containerized edge computer vision system for industrial quality assurance. Ingests multi‑camera feeds, runs on‑device inference, writes results to plant systems (Ignition, MQTT, OPC UA), and streams events and metrics to storage for monitoring, retraining, and audit.
Goal: repeatable inspection that scales from one line to many, with traceability, safe rollouts, and operator‑friendly outputs.
What you’re seeing:
- Bounding boxes actively sweep, dart, and then settle as the detector interrogates candidate regions on the part. The main red box converges near the area of interest while smaller probes appear briefly, scan across the frame, and disappear as hypotheses are ruled out.
- The top controls show live telemetry relevant to operators:
  - p95 latency (ms): a rolling end‑to‑end measure of capture → preprocess → inference → publish; stays within the target envelope during the sequence.
  - Pre / Infer (ms): average preprocessing and model inference times, indicating where compute is spent.
  - FPS and Drops: actual capture rate and any dropped frames; stable during steady state.
  - Results / MQTT / OPC UA: cumulative publishes to plant systems; the “msg/s” dial on the telemetry card reflects the current outbound publish rate.
  - Threshold: the decision threshold you can adjust to tune sensitivity vs precision in real time.
The telemetry card to the right visualizes recent performance and signal mix:
- Latency trace: a compact sparkline of recent p95 values for quick trend recognition.
- Class mix bars: distribution of detection classes observed in the last window.
- Messages per second: current publish rate to plant systems.
- Why
- Outcomes and SLOs
- Architecture
- Components
- Data Contracts
- Repository Layout
- Quickstart
- Configuration
- Deployment
- Observability
- Provenance and Compliance
- Testing
- Performance Tuning
- Security
- Roadmap
- Related Work
- License
Factories need inspection that is reliable, traceable, and easy to roll out. EdgeSight QA packages capture, preprocessing, inference, and results publication into small services that can be deployed on a Jetson or GPU industrial PC, tied into Ignition or other SCADA stacks, and monitored centrally.
- Operators see defect overlays or alarms in ≤ 200 ms end‑to‑end at p95
- 99.9% system uptime per line (target)
- False alarm rate within agreed budget, tracked over time
- Every decision is traceable to model version, data snapshot, and configuration
- Updates are health‑gated and automatically rolled back on error budget breach
- Capture: time-synced frames with IDs and drop counters
- Preprocess: deterministic pipeline (resize/normalize/mask)
- Inference: ONNX/TensorRT; versioned weights; warm start
- Results: publish detections to MQTT/OPC UA; emit Prom metrics
- GitOps: KServe model serving + Argo-driven canaries; OPA policies gate unsafe changes
[ PoE Cameras ] → [ Capture ] → [ Preprocess ] → [ Inference ] → [ Results Adapter ]
                                                                        ├→ [ Events, Metrics, Frames ] → [ Storage / Cloud ]
                                                                        └→ [ MQTT / OPC UA ] → [ Ignition / Alarms ] → [ Operator UI ]
- Edge: fast decisions, overlays, alarms, store‑and‑forward
- Cloud or on‑prem object store: events, sample frames, metrics, retraining
Works online or with intermittent connectivity. All outputs are idempotent.
- Enterprise: reduce scrap, improve first‑pass yield, safety and compliance
- Information: events, detections, lineage metadata, retention policy
- Computational: four services, stable interfaces, explicit contracts
- Engineering: pods, probes, queues, network zones, air‑gap sync
- Technology: Jetson or GPU IPC, GigE/PoE cameras, ONNX Runtime or TensorRT, MQTT or OPC UA, Kubernetes or OpenShift
- Capture
  - GigE, RTSP, or file source
  - Time sync, frame IDs, dropped‑frame counters
- Preprocess
  - Resize, crop, normalize, color convert, masks
  - YAML‑driven pipeline, deterministic (see the sketch after this list)
- Inference
  - ONNX Runtime or TensorRT on Jetson or GPU IPC
  - Model registry integration, warm start, batching if needed
- Results Adapter
  - Publishes detections to Ignition via MQTT or OPC UA
  - Renders overlays for Operator UI
  - Persists events and frame snippets, emits Prometheus metrics
- Operator UI
  - Live feed, bounding boxes, adjustable threshold, recent detections
  - Role‑based controls for acknowledge, tag, comment
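As a rough illustration of the deterministic, YAML‑driven preprocess stage, the sketch below interprets a config like `config/preprocess.yaml` (shown later in the Quickstart) with OpenCV and NumPy. The op handlers and their signatures are assumptions for illustration, not the shipped code.

```python
# Hypothetical YAML-driven preprocess pipeline (illustrative, not the shipped code).
import cv2
import numpy as np
import yaml

def op_resize(img, width, height):
    return cv2.resize(img, (width, height), interpolation=cv2.INTER_LINEAR)

def op_to_rgb(img):
    return cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

def op_normalize(img, mean, std):
    img = img.astype(np.float32) / 255.0
    return (img - np.array(mean, np.float32)) / np.array(std, np.float32)

def op_mask(img, polygon):
    # Zero out everything outside the region of interest.
    mask = np.zeros(img.shape[:2], dtype=np.uint8)
    cv2.fillPoly(mask, [np.array(polygon, dtype=np.int32)], 1)
    return img * mask[..., None]

OPS = {"resize": op_resize, "to_rgb": op_to_rgb,
       "normalize": op_normalize, "mask": op_mask}

def run_pipeline(img, config_path="config/preprocess.yaml"):
    """Apply each op in config order; same config in, same tensor out (deterministic)."""
    with open(config_path) as f:
        pipeline = yaml.safe_load(f)["pipeline"]
    for step in pipeline:
        step = dict(step)              # copy so pop() does not mutate the loaded config
        op = OPS[step.pop("op")]
        img = op(img, **step)
    return img
```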
{
  "event_id": "a2b6ae8e-7b7a-4f6f-9a2a-1bc9d6d7f1ee",
  "ts_utc": "2025-09-06T12:34:56.789Z",
  "line_id": "line-3",
  "camera_id": "cam-A",
  "model": {"name": "defect-v7", "version": "7.2.1", "threshold": 0.62, "hash": "sha256:..."},
  "detections": [
    {"label": "scratch", "confidence": 0.83, "bbox": [412, 156, 64, 48]}
  ],
  "latency_ms": 142,
  "frame_ref": "frames/2025-09-06/line-3/cam-A/123456789.jpg",
  "provenance": {
    "container_digests": {"inference": "sha256:..."},
    "config_version": "preproc-1.3.0",
    "data_snapshot": "s3://bucket/datasets/steel-plates/v5/manifest.json"
  }
}

MQTT topics:

factory/{site}/{line}/qa/events/{event_id}
factory/{site}/{line}/qa/alerts
factory/{site}/{line}/qa/metrics/*
OPC UA nodes:

ns=2;s=Factory.Lines.Line3.QA.LastEvent
ns=2;s=Factory.Lines.Line3.QA.AlertActive
ns=2;s=Factory.Lines.Line3.QA.Threshold
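To make the contract concrete, here is a minimal sketch that validates an event against the shape above with the jsonschema library. The required-field list is inferred from the example payload, not from an official schema.

```python
# Hypothetical validator for the event contract above (fields inferred from the example).
from jsonschema import validate  # pip install jsonschema

EVENT_SCHEMA = {
    "type": "object",
    "required": ["event_id", "ts_utc", "line_id", "camera_id",
                 "model", "detections", "latency_ms", "provenance"],
    "properties": {
        "model": {
            "type": "object",
            "required": ["name", "version", "threshold", "hash"],
        },
        "detections": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["label", "confidence", "bbox"],
                "properties": {
                    "confidence": {"type": "number", "minimum": 0, "maximum": 1},
                    "bbox": {"type": "array", "minItems": 4, "maxItems": 4},
                },
            },
        },
        "latency_ms": {"type": "number", "minimum": 0},
    },
}

def check_event(event: dict) -> None:
    """Raise jsonschema.ValidationError if the event violates the contract."""
    validate(instance=event, schema=EVENT_SCHEMA)
```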
services/
  capture/
  preprocess/
  inference/
  results_adapter/
  governance_exporter/
  ui/operator/
deploy/
  compose/docker-compose.yml
  k8s/base/*.yaml
  helm/edgesight-qa/*
  openshift/*.yaml
  terraform/*
assets/
scripts/
docs/
- Docker and Docker Compose
- Optional: NVIDIA Container Toolkit if using a GPU
- Git, Python 3.10+, Node 18+ for local builds
git clone https://github.com/your-org/edgesight-qa.git
cd edgesight-qa
./scripts/fetch_demo_model.sh assets   # Windows: run via WSL or manually download to assets/

config/cameras.yaml:
site: sample-plant
line: line-1
cameras:
  - id: cam-A
    source: file:../data/sample.mp4
    fps: 20
    resolution: [1280, 720]
time_sync: pool.ntp.org

config/preprocess.yaml:
pipeline:
  - op: resize
    width: 640
    height: 360
  - op: normalize
    mean: [0.485, 0.456, 0.406]
    std: [0.229, 0.224, 0.225]
  - op: to_rgb
  - op: mask
    polygon: [[100,100], [540,100], [540,320], [100,320]]

config/model.yaml:
name: defect-v7
version: 7.2.1
format: onnx
path: models/defect-v7-7.2.1.onnx
threshold: 0.62
execution_provider: tensorrt
batch_size: 1

config/outputs.yaml:
operator_ui:
  enabled: true
  host: 0.0.0.0
  port: 5173
mqtt:
  enabled: true
  broker: mqtt://localhost:1883
  topic_base: factory/sample-plant/line-1/qa
opcua:
  enabled: false
  endpoint: opc.tcp://localhost:4840
storage:
  events_dir: data/events
  frames_dir: data/frames
cloud:
  enabled: false
  provider: aws
  bucket: edgesight-events

Start the stack and demo:
docker compose -f deploy/compose/docker-compose.yml up --build -d
curl -X POST http://localhost:9001/start
open http://localhost:5173   # on Windows, browse manually

Open the Operator UI at http://localhost:5173. Events stream via SSE from the results adapter.
- All services read a single YAML config file each
- Environment variables can override any YAML key with DOT.SEPARATED.KEY=value (see the sketch below)
- Hot reload is supported for non‑structural changes, for example threshold updates
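A minimal sketch of how dotted environment keys could be merged over a loaded YAML config. The override convention is from this README; the loader itself, and the assumption that YAML keys are lowercase, are illustrative.

```python
# Hypothetical loader: env vars with dotted keys override nested YAML values.
import os
import yaml

def load_config(path: str, env=os.environ) -> dict:
    """Load YAML, then apply DOT.SEPARATED.KEY=value overrides from the environment."""
    with open(path) as f:
        cfg = yaml.safe_load(f)
    for key, raw in env.items():
        if "." not in key:
            continue  # plain env vars are not treated as overrides
        parts = key.lower().split(".")
        node = cfg
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = yaml.safe_load(raw)  # parses "0.62" -> float, "true" -> bool
    return cfg

# Example: MODEL.THRESHOLD=0.7 would override model.threshold from config/model.yaml.
```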
MQTT env vars:
MQTT_BROKER=mosquitto
MQTT_PORT=1883
MQTT_USERNAME= (optional)
MQTT_PASSWORD= (optional)
MQTT_QOS=1 # 0/1/2
MQTT_RETAIN=false # retain flag for alerts
MQTT_TLS_ENABLED=false # enable TLS
MQTT_TLS_INSECURE=false # allow insecure TLS (testing only)
CONF_THRESHOLD=0.5 # publish when any detection >= threshold
LINE_ID=line-1
Topics:
edgesight/line/{LINE_ID}/defect
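As one way those variables could be wired up, the sketch below publishes an event with paho‑mqtt; treat it as a hedged illustration of the adapter's publish path, not its actual code.

```python
# Hypothetical publisher wiring the env vars above to paho-mqtt
# (pip install "paho-mqtt<2"; paho-mqtt 2.x additionally requires a CallbackAPIVersion).
import json
import os
import paho.mqtt.client as mqtt

LINE_ID = os.environ.get("LINE_ID", "line-1")
TOPIC = f"edgesight/line/{LINE_ID}/defect"

client = mqtt.Client()
if os.environ.get("MQTT_USERNAME"):
    client.username_pw_set(os.environ["MQTT_USERNAME"], os.environ.get("MQTT_PASSWORD"))
if os.environ.get("MQTT_TLS_ENABLED", "false").lower() == "true":
    client.tls_set()  # default system CA bundle; pin/tighten for production
client.connect(os.environ.get("MQTT_BROKER", "mosquitto"),
               int(os.environ.get("MQTT_PORT", "1883")))
client.loop_start()

def publish_defect(event: dict) -> None:
    """Publish only when some detection clears CONF_THRESHOLD, per the contract above."""
    threshold = float(os.environ.get("CONF_THRESHOLD", "0.5"))
    if any(d["confidence"] >= threshold for d in event.get("detections", [])):
        client.publish(TOPIC, json.dumps(event),
                       qos=int(os.environ.get("MQTT_QOS", "1")),
                       retain=os.environ.get("MQTT_RETAIN", "false").lower() == "true")
```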
OPC UA env vars:
OPCUA_ENABLED=0
OPCUA_ENDPOINT=opc.tcp://localhost:4840
OPCUA_DEFECT_NODE=ns=2;s=Factory.Lines.{line}.QA.LastEvent
When enabled, the adapter writes a JSON string payload to the configured node. Production deployments should use an address space agreed with controls and a trust store for TLS.
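One possible shape for that write, using the community asyncua library; the node ID template follows the env vars above, and everything else is an illustrative assumption.

```python
# Hypothetical OPC UA write of the last event as a JSON string (pip install asyncua).
import asyncio
import json
import os
from asyncua import Client

async def write_defect_tag(event: dict) -> None:
    """Write the event payload to the configured defect node, if OPC UA is enabled."""
    if os.environ.get("OPCUA_ENABLED", "0") != "1":
        return
    endpoint = os.environ.get("OPCUA_ENDPOINT", "opc.tcp://localhost:4840")
    node_tpl = os.environ.get("OPCUA_DEFECT_NODE",
                              "ns=2;s=Factory.Lines.{line}.QA.LastEvent")
    async with Client(url=endpoint) as client:
        node = client.get_node(node_tpl.format(line=event["line_id"]))
        await node.write_value(json.dumps(event))

# asyncio.run(write_defect_tag(event))
```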
The pipeline propagates an X-Correlation-ID header across services, echoed in SSE events and structured logs, to stitch metrics/logs together. OpenTelemetry can be enabled via envs to emit spans for capture → preprocess → inference → adapter.
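For instance, a service could echo and forward the header like this. A sketch assuming HTTP hops between services and a Flask-based service; the header name is the one above, the adapter URL is hypothetical.

```python
# Hypothetical middleware: reuse the inbound X-Correlation-ID or mint one, and
# forward it on the outbound call so logs, metrics, and traces can be stitched together.
import uuid
import requests
from flask import Flask, g, request

app = Flask(__name__)

@app.before_request
def bind_correlation_id():
    g.correlation_id = request.headers.get("X-Correlation-ID", str(uuid.uuid4()))

@app.post("/infer")
def infer():
    # ...run inference, then hand the result to the results adapter...
    requests.post("http://results-adapter:8080/events",  # hypothetical internal endpoint
                  json={"detections": []},
                  headers={"X-Correlation-ID": g.correlation_id})
    return {"ok": True, "correlation_id": g.correlation_id}
```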
Minimal collector setup (OTLP gRPC at 4317):
receivers:
  otlp:
    protocols:
      grpc:
exporters:
  logging: {}
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [logging]
Env vars (services): OTEL_ENABLED=1, OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317, OTEL_EXPORTER_OTLP_INSECURE=1, OTEL_SERVICE_NAME=<service>.
To see exemplars in Grafana, pair Prometheus with a traces backend (Grafana Tempo) and enable the OTEL→Tempo pipeline. Grafana can link metrics panels (e.g., e2e latency histograms) to spans when correlation IDs or trace IDs are present.
With docker-compose, a Tempo instance is included. Open Grafana (http://localhost:3000), then:
- Use the “Explore” tab, Tempo datasource, and query {service.name="results-adapter"} to browse recent traces.
- From the “EdgeSight QA” dashboard, click a point on the “E2E Latency p95/p99” graph; exemplars should pop and link to a trace.
- In the operator UI, each event includes a “View trace” link (when available) that opens the specific trace in Grafana Explore (Tempo datasource).
Apply base manifests:
kubectl apply -f deploy/k8s/base

Or install via Helm:
helm install edgesight-qa deploy/helm/edgesight-qa -n edgesight --create-namespace \
  --set image.repository=ghcr.io/your-org --set image.tag=latest

On OpenShift:
- Use DeploymentConfig if required by policy
- Annotate pods for GPU scheduling where applicable
- Route Operator UI through an OpenShift Route
Use the helper script to bundle images and charts:
./scripts/package_offline.sh

Load images on the target and configure a registry mirror or ImageContentSourcePolicy (OpenShift). See deploy/helm/edgesight-qa/values.yaml for environment overrides.
Each service exposes Prometheus metrics at /metrics.
Key metrics:
edgesight_latency_ms{stage="capture→adapter"}
edgesight_fps{camera_id="cam-A"}
edgesight_false_alarm_rate
edgesight_model_confidence_bucket
edgesight_dropped_frames_total
edgesight_queue_backlog
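As an illustration of how a service might expose some of these series with prometheus_client (the metric names are from this README; labels and bucket boundaries in the sketch are assumptions):

```python
# Hypothetical exposition of the metrics above (pip install prometheus-client).
from prometheus_client import Counter, Gauge, Histogram, start_http_server

LATENCY_MS = Histogram("edgesight_latency_ms", "End-to-end latency per stage",
                       ["stage"], buckets=(25, 50, 100, 150, 200, 300, 500))
FPS = Gauge("edgesight_fps", "Observed capture rate", ["camera_id"])
DROPPED = Counter("edgesight_dropped_frames_total", "Frames dropped", ["camera_id"])

start_http_server(9090)  # serves /metrics for Prometheus to scrape

def record_frame(camera_id: str, e2e_ms: float, fps: float, dropped: int) -> None:
    """Record one frame's worth of telemetry."""
    LATENCY_MS.labels(stage="capture→adapter").observe(e2e_ms)
    FPS.labels(camera_id=camera_id).set(fps)
    DROPPED.labels(camera_id=camera_id).inc(dropped)
```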
Grafana dashboards are provided under ops/grafana-dashboards.
Logs are structured JSON and include event_id, trace_id, and model.version.
- Versioned artifacts: model files, container digests, configuration hashes
- SBOM generation for each image
- Signed releases and optional cosign verification
- Lineage logs: requirement ID, story ID, code commit, container digest, model hash
- Markdown report generation: docs/provenance-report-template.md populated from logs
Report example front matter:
report_id: prov-2025-09-06-001
period: 2025-09-01..2025-09-06
lines: [line-1]
models:
  - name: defect-v7
    version: 7.2.1
    hash: sha256:...
slis:
  latency_p95_ms: 175
  uptime: 99.95
  false_alarm_rate: 0.8%

- Unit tests for preprocessing ops, schema validation, topic mapping
- Integration tests use a synthetic camera that replays video
- Golden frames validate boxes and scores within tolerance
- Soak tests run at target FPS for hours, verify p95 latency and no memory leaks
- Failover tests kill inference container and verify auto recovery and idempotent plant writes
Run tests:

pytest -q

- MQTT topic: edgesight/line/{line_id}/defect
- Example payload includes frame_id, detections[], ts, model_hash, config_digest.
- OPC UA: stub write_defect_tag(line_id, payload) provided; replace with plant tag writes.
- Camera URL: set CAMERA_URL=file:/app/assets/sample.mp4 for demo; RTSP URLs also supported.
- Time sync: ensure device NTP/PTP; pipeline uses monotonic timestamps.
- GPU: if using NVIDIA GPUs, install NVIDIA Container Toolkit; CPU fallback is supported.
- Permissions: bind‑mount assets/ for model/video files; ensure readable by containers.
- Windows: if scripts/fetch_demo_model.sh cannot run, download a small ONNX model and a short MP4 clip into assets/ manually.
- Quantize to INT8, prune layers, prefer TensorRT on Jetson and NVIDIA GPUs (see the sketch after this list)
- Move preprocessing to GPU when possible
- Pin CPU affinities for capture and adapter
- Use zero‑copy buffers between preprocess and inference when available
- Batch small images if accuracy allows
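To make the TensorRT preference concrete, here is a hedged sketch of creating an ONNX Runtime session that prefers the TensorRT execution provider and falls back to CUDA, then CPU. The provider options and model path are illustrative.

```python
# Hypothetical session setup: prefer TensorRT, fall back to CUDA, then CPU.
import numpy as np
import onnxruntime as ort

providers = [
    ("TensorrtExecutionProvider", {"trt_fp16_enable": True}),  # INT8 needs calibration data
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
]
session = ort.InferenceSession("models/defect-v7-7.2.1.onnx", providers=providers)

def infer(frame: np.ndarray) -> list[np.ndarray]:
    """Run one preprocessed frame (CHW float32) through the model as a batch of one."""
    input_name = session.get_inputs()[0].name
    return session.run(None, {input_name: frame[None].astype(np.float32)})
```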
- Secrets injected via Kubernetes Secrets or OpenShift equivalents
- Least privilege on MQTT and OPC UA
- RBAC for Operator UI actions, audit all state changes
- Network policies isolate services; only adapter speaks to plant systems
- Multi‑line orchestration and fleet management
- Adapter plugins for Rockwell and Siemens stacks
- Built‑in drift detection and auto revalidation jobs
- Automated model registry integration and signed model promotion
- PerceptionLab: a camera pipeline lab with visualizations and thresholds, useful for tuning inspection settings.
MIT. See LICENSE file.
- Inductive Automation Ignition for SCADA integration patterns
- ONNX Runtime and TensorRT for efficient inference on edge devices
- Systems Integrators needing an edge CV stack that slots into Ignition, supports air‑gapped or near‑air‑gapped deployments, and scales from 1 line → 25+ lines with repeatable practices.
- Regulated manufacturers (e.g., pharma, food/bev like Fonterra) that require traceability, reproducibility, and data hygiene with clear acceptance criteria (e.g., 99.99% accuracy target for advisory outputs where required).
- Plant‑ready: MQTT/OPC UA adapters for Ignition; offline‑tolerant queues; operator overlays and alarms; time‑sync and device health baked in.
- Regulatory‑aware: Provenance ledger (model, data, config digests), e‑record/e‑sig hooks, retention policies, audit reports.
- Platform‑native: Kubernetes/OpenShift manifests; optional Terraform modules; artifact bundles for air‑gap installs.
- Scale discipline: Canary rollouts, health‑gated promotion, automated rollback; fleet metrics (latency, FPS, false‑alarm rate).
- Pharma: blister‑pack missing pill, vial cap/seal defect, label OCR mismatch, fill‑level variance.
- Food & beverage: foreign‑object detection, cap/crown mis‑seat, under/over‑fill, label skew/date‑code OCR.
Tip: Include 2–3 domain presets in your configs/ and a folder of example clips/images for instant demos.
Enterprise view: Business goal → Reduce defects/downtime; compliance → traceable decisions; stakeholders → quality, operations, validation.
Information view: Artifacts tracked → dataset.hash, model.hash, config.digest, run.id, operator.id, timestamps; retention & PHI/PII handling policy.
Computational view: Services → capture, preprocess, inference, results‑adapter, governance‑exporter; interfaces → gRPC/REST for internal; MQTT/OPC UA/REST for external.
Engineering view: Deploy → Kubernetes/OpenShift; packaging → OCI images; observability → Prometheus exporters + Grafana; storage → local TSDB + cloud sinks when available.
Technology view: Jetson Orin NX or x86+GPU IPC; PoE cameras + controlled lighting; NTP; secure boot (where supported); registry mirror for air‑gap.
Kubernetes/OpenShift + Terraform Manifests
deploy/
  k8s/
    namespace.yaml
    capture-deploy.yaml
    preprocess-deploy.yaml
    inference-deploy.yaml
    results-adapter-deploy.yaml
    services/*.yaml
    hpa/*.yaml
  openshift/
    routes/*.yaml
    imagestreams/*.yaml
Helm values (excerpt)
imagePullPolicy: IfNotPresent
resources:
  requests: { cpu: "500m", memory: "1Gi" }
  limits: { cpu: "2", memory: "4Gi", nvidia.com/gpu: 1 }
probes:
  liveness: /healthz
  readiness: /readyz
Terraform skeleton (deploy/terraform/)
- providers.tf (OpenShift/K8s + optional AWS/Azure)
- main.tf creates namespaces, secrets, configmaps, service accounts, RBAC
- outputs.tf exposes service endpoints for Ignition or gateway
Air‑gap install
- ./bin/package_offline.sh to export Helm charts + images as tarballs
- Local registry mirror + ImageContentSourcePolicy (OpenShift)
Create docs/provenance-report.md describing:
- Run manifest: run.json (model hash, dataset hash, config digest, code commit, device ID, time window)
- Append‑only decision log: JSONL with signed entries (Ed25519), monotonic timestamps (see the sketch after this list)
- Report generator: governance-exporter converts logs → human‑readable PDF/Markdown; includes confusion matrix, p95 latency, FPS, error budget adherence
- Policy examples: retention (e.g., 30/90/365), operator access (RBAC), audit export (CSV/PDF, signature block)
- Standards mapping (informative): FDA 21 CFR Part 11 (e‑records/e‑signatures), GxP/GMP concepts, HACCP/SQF controls (non‑normative guidance)
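A minimal sketch of the signed, append‑only JSONL idea using the cryptography package; the field names and file layout are assumptions, not the exporter's actual format.

```python
# Hypothetical append-only decision log: each JSONL entry carries an Ed25519
# signature over its canonical JSON body (pip install cryptography).
import json
import time
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

key = Ed25519PrivateKey.generate()  # in production, load from a protected keystore

def append_decision(path: str, event: dict) -> None:
    """Sign the canonical entry body and append it as one JSONL line."""
    entry = {"ts_monotonic_ns": time.monotonic_ns(), "event": event}
    body = json.dumps(entry, sort_keys=True, separators=(",", ":")).encode()
    entry["signature"] = key.sign(body).hex()
    with open(path, "a") as f:  # append-only by convention; pair with immutable storage
        f.write(json.dumps(entry, sort_keys=True) + "\n")

def verify_line(line: str, public_key) -> bool:
    """Recompute the canonical body and check the Ed25519 signature."""
    entry = json.loads(line)
    sig = bytes.fromhex(entry.pop("signature"))
    body = json.dumps(entry, sort_keys=True, separators=(",", ":")).encode()
    try:
        public_key.verify(sig, body)
        return True
    except Exception:
        return False
```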
- Performance gates: p95 latency ≤ 200 ms, FPS ≥ target; health probes green for N hours across canary pool
- Statistical accuracy gate (advisory outputs): demonstrate ≥ 99.99% accuracy on N trials with binomial CI ≥ target; attach dataset card and sampling method (see the sketch after this list)
- Stress & failover: camera unplug/replug; network drop; disk‑full; thermal throttling; verify degraded but safe behavior
- Traceability: each test emits signed run.json + decision log for audit
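One way to check the accuracy gate: compute a Wilson score lower bound on the observed accuracy and require it to clear the target. The z value and helper names below are illustrative assumptions.

```python
# Hypothetical accuracy gate: pass only if the Wilson lower confidence bound
# on observed accuracy clears the 99.99% target.
import math

def wilson_lower_bound(successes: int, n: int, z: float = 2.576) -> float:
    """Wilson score interval lower bound for a binomial proportion."""
    p = successes / n
    denom = 1 + z * z / n
    center = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (center - margin) / denom

def accuracy_gate(successes: int, n: int, target: float = 0.9999) -> bool:
    return wilson_lower_bound(successes, n) >= target

# Example: 99,996 correct out of 100,000 trials is 99.996% observed,
# but the lower confidence bound decides whether the gate passes.
print(accuracy_gate(99996, 100000))
```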
Include sample tests/ with pytest for services and a soak.sh script.
- Operator UI: live overlay, threshold slider, last‑N events, “send to re‑inspect”.
- Results adapter: translates model output → Ignition tags/events; supports MQTT, OPC UA, and REST webhooks.
- Language model copilot (optional): natural‑language queries over logs (“show last 24h of cap‑mis‑seat at line 3”), retrieval‑augmented from decision logs; never alters edge decisions.
- Compute: NVIDIA Jetson Orin NX or x86 IPC + RTX A2000/4000 (fan‑out via PoE switch)
- Cameras: Industrial PoE, fixed lens matched to working distance, polarizing filters as needed
- Lighting: ring/bar, diffusers; strobe option
- Networking: PoE switch, VLAN plan; NTP source
- Enclosure: IP‑rated; thermal plan; mounting hardware
Add docs/bom.md with part classes and lead‑time notes.