Real-time computer vision in the browser. Object detection, instance segmentation, and monocular depth estimation — running entirely client-side via ONNX Runtime Web on WebGPU.
No server, no API calls. Your camera feed never leaves your device.
- Object Detection — RF-DETR with neon bounding boxes and confidence scores
- Instance Segmentation — RF-DETR Seg with per-instance colored masks
- Depth Estimation — Depth Anything V2 with Turbo colormap visualization
- Live Webcam — Real-time inference on camera feed
- Image Upload — Single-shot inference on static images
- Hot Model Switching — Switch models seamlessly without restarting the camera
- Embed Mode — Clean iframe integration via ?embed=true
All models are Apache 2.0 licensed.
All models are FP16 quantized for fast download and efficient WebGPU inference.
| Model | Task | Resolution | Size |
|---|---|---|---|
| RF-DETR Nano | Detection | 384x384 | 52 MB |
| RF-DETR Seg Nano | Segmentation | 312x312 | 59 MB |
| Depth Anything V2 ViT-S | Depth | 518x518 | 48 MB |
- Chromium-based browser with WebGPU support (Chrome 113+, Edge 113+)
- WebGPU must be enabled (it is by default on recent versions)
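A page can check these requirements at startup before any model is downloaded. The following is a minimal sketch, not the project's actual code: `hasWebGPU`, `GpuLike`, and `NavigatorLike` are hypothetical names, and the small structural types stand in for the real WebGPU typings so the snippet compiles on its own. It relies only on the standard `navigator.gpu.requestAdapter()` call, which resolves to `null` when no suitable adapter exists.

```typescript
// Minimal structural types (hypothetical) so the sketch compiles without @webgpu/types.
interface GpuLike {
  requestAdapter(): Promise<object | null>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

/** Resolves to true only if a WebGPU adapter can actually be obtained. */
async function hasWebGPU(nav: NavigatorLike): Promise<boolean> {
  if (!nav.gpu) return false; // the browser does not expose the WebGPU API
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter !== null; // null means no suitable GPU or driver
  } catch {
    return false; // treat any failure as "not supported"
  }
}
```

In the browser this could be called as `hasWebGPU(navigator as NavigatorLike)` to decide whether to show an unsupported-browser message instead of starting the camera.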
# Install dependencies
npm install
# Download ONNX models into public/models/
# (see "Models Setup" below)
# Start dev server
npm run dev
The app serves at http://localhost:5173/framesight/.
ONNX model files are gitignored on main due to size, but included in the gh-pages deployment branch. For local development, download the FP16 models and place them in public/models/:
public/models/
rfdetr-nano-fp16.onnx (52 MB)
rfdetr-seg-nano-fp16.onnx (59 MB)
depth-anything-v2-vits-fp16.onnx (48 MB)
The RF-DETR models are exported from Roboflow RF-DETR, and Depth Anything V2 ViT-S is exported from Depth-Anything-V2. All three are quantized to FP16 via onnxconverter-common.
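One way to keep the file names and input resolutions listed above in a single place is a small typed registry. This is an illustrative sketch only: `MODELS`, `ModelKey`, `ModelSpec`, and `modelUrl` are hypothetical names, and the default base path assumes Vite is configured with `/framesight/` as its base, as the dev server URL suggests.

```typescript
type Task = "detection" | "segmentation" | "depth";
type ModelKey = "rfdetr-nano" | "rfdetr-seg-nano" | "depth-anything-v2-vits";

interface ModelSpec {
  file: string;      // file name inside public/models/
  task: Task;
  inputSize: number; // square input resolution in pixels
}

// Values copied from the model table and the public/models/ listing.
const MODELS: Record<ModelKey, ModelSpec> = {
  "rfdetr-nano": { file: "rfdetr-nano-fp16.onnx", task: "detection", inputSize: 384 },
  "rfdetr-seg-nano": { file: "rfdetr-seg-nano-fp16.onnx", task: "segmentation", inputSize: 312 },
  "depth-anything-v2-vits": { file: "depth-anything-v2-vits-fp16.onnx", task: "depth", inputSize: 518 },
};

/** Builds the URL a model is served from; the base path is an assumption. */
function modelUrl(key: ModelKey, base: string = "/framesight/"): string {
  return `${base}models/${MODELS[key].file}`;
}
```

A URL built this way could then be handed to ONNX Runtime Web's `InferenceSession.create` with `executionProviders: ["webgpu"]`.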
| Command | Description |
|---|---|
| npm run dev | Start dev server |
| npm run build | Production build to dist/ |
| npm run preview | Preview production build |
| npm run lint | Run ESLint |
| npm run deploy | Build + deploy to GitHub Pages |
All ML inference runs in a Web Worker to keep the UI thread free.
Main Thread                          Web Worker
───────────                          ──────────
Camera/Image
  → createImageBitmap()
  → postMessage(bitmap)  ─────────→  bitmap decode (OffscreenCanvas)
                                     → preprocess (OpenCV.js)
                                     → session.run() (ONNX Runtime WebGPU)
                                     → postprocess
render overlay (Canvas)  ←─────────  postMessage(results + timing)
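The worker side of this pipeline can be sketched as a single handler that runs the three stages and records how long each one takes. The message shapes (`FrameRequest`, `FrameResponse`, `Timing`) and the `Stages` interface below are assumptions made for illustration, not the project's real protocol. The stages are injected so the sketch stays independent of OpenCV.js and ONNX Runtime.

```typescript
// Hypothetical message shapes; the real protocol may differ.
interface FrameRequest<B> {
  type: "frame";
  bitmap: B;
}
interface Timing {
  preprocessMs: number;
  inferenceMs: number;
  postprocessMs: number;
}
interface FrameResponse<R> {
  type: "result";
  results: R;
  timing: Timing;
}

// The three worker stages from the diagram, injected as plain functions.
interface Stages<B, T, O, R> {
  preprocess(bitmap: B): T;
  run(input: T): Promise<O>;
  postprocess(output: O): R;
}

/** Runs one frame through the pipeline and measures each stage. */
async function handleFrame<B, T, O, R>(
  req: FrameRequest<B>,
  stages: Stages<B, T, O, R>,
  now: () => number = Date.now,
): Promise<FrameResponse<R>> {
  const t0 = now();
  const input = stages.preprocess(req.bitmap);
  const t1 = now();
  const output = await stages.run(input);
  const t2 = now();
  const results = stages.postprocess(output);
  const t3 = now();
  return {
    type: "result",
    results,
    timing: { preprocessMs: t1 - t0, inferenceMs: t2 - t1, postprocessMs: t3 - t2 },
  };
}
```

Inside a worker, the handler could be wired up as `self.onmessage = async (e) => self.postMessage(await handleFrame(e.data, stages))`. On the main thread, listing the bitmap in the transfer list of `postMessage` hands the `ImageBitmap` over without copying it.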
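The frame loop uses a requestAnimationFrame cycle with an isProcessing gate: the UI keeps rendering at the display refresh rate (typically 60 fps) while inference is dispatched one frame at a time. Frames that arrive while inference is in flight are skipped rather than queued, so memory use stays bounded.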
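The gate itself can be sketched as a small wrapper around the asynchronous inference call. `createFrameGate` is a hypothetical helper written for illustration. Only the `isProcessing` flag comes from the description above, and the real loop may be structured differently.

```typescript
// Hypothetical helper: wraps an async inference call behind a busy flag.
function createFrameGate<F, R>(
  infer: (frame: F) => Promise<R>,
  onResult: (result: R) => void,
  onError: (err: unknown) => void = () => {},
): (frame: F) => boolean {
  let isProcessing = false;
  // Returns true if the frame was dispatched, false if it was dropped.
  return (frame: F): boolean => {
    if (isProcessing) return false; // inference in flight: drop this frame
    isProcessing = true;
    infer(frame)
      .then(onResult)
      .catch(onError)
      .finally(() => {
        isProcessing = false; // reopen the gate for the next tick
      });
    return true;
  };
}
```

A requestAnimationFrame callback could call the returned function on every tick and redraw the overlay regardless of the return value, so rendering never waits on the worker.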
Model switches are serialized in the worker with a generation counter to prevent WebGPU buffer races.
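One possible shape for such a scheme is shown below, as a sketch under assumptions rather than the project's implementation. `createModelSwitcher`, `load`, and `release` are hypothetical names. Switch requests are chained on a single promise so only one load runs at a time, and each request remembers the generation it was issued under, so a request that has been superseded does nothing.

```typescript
// Hypothetical sketch: serialized model switching guarded by a generation counter.
function createModelSwitcher<S>(
  load: (modelId: string) => Promise<S>,
  release: (session: S) => void,
) {
  let generation = 0;
  let current: S | null = null;
  let queue: Promise<void> = Promise.resolve();

  function switchTo(modelId: string): Promise<void> {
    const myGen = ++generation; // newest request always has the highest number
    const run = queue.then(async () => {
      if (myGen !== generation) return; // superseded before it started
      const next = await load(modelId);
      if (myGen !== generation) {
        release(next); // superseded while loading: discard the stale session
        return;
      }
      if (current !== null) release(current); // free the previous model's resources
      current = next;
    });
    queue = run.catch(() => undefined); // keep the chain alive after a failed load
    return run;
  }

  return { switchTo, getCurrent: () => current };
}
```

With this structure, three rapid calls such as `switchTo("a")`, `switchTo("b")`, `switchTo("c")` load only the last model, because the first two find a newer generation by the time their turn comes.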
- React 19 with React Compiler
- Vite 7 (rolldown)
- Tailwind CSS 4
- Framer Motion
- ONNX Runtime Web (WebGPU backend)
- OpenCV.js for image preprocessing
Append ?embed=true to hide the header, footer, and background glow — designed for iframe embedding:
<iframe
src="https://user41pp.github.io/framesight/?embed=true"
allow="camera; microphone"
style="width: 100%; height: 85vh; border: none;"
></iframe>
MIT