
SSusantAchary/video-point-tracker

Video Point Tracker

Local-first video point tracking with a React frontend, an Express/BullMQ backend, ffmpeg-based video processing, and OpenAI-compatible multimodal model inference through Ollama, LM Studio, or llama.cpp.

What it does

  • Upload a local video file.
  • Choose a local provider and vision-capable model.
  • Set a tracking target such as ball, hand, or player.
  • Sample frames at a configurable FPS and send them to a local multimodal endpoint.
  • Receive normalized 2D points per frame.
  • Preview the track over the original video in the browser.
  • Download a rendered tracked MP4 and the raw JSON result.
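As a rough illustration of the sampling and coordinate steps above, the sketch below computes frame timestamps for a given FPS and maps a normalized point to pixel coordinates. The `FramePoint` shape and both function names are hypothetical and are not taken from the project's code; the real JSON result may use different field names.

```typescript
// Hypothetical shape for one tracked frame; the project's real JSON schema may differ.
interface FramePoint {
  timeSecs: number; // timestamp of the sampled frame
  x: number; // normalized horizontal position, expected in 0..1
  y: number; // normalized vertical position, expected in 0..1
}

// Timestamps (in seconds) at which frames are sampled for a given clip length and FPS.
function sampleTimestamps(durationSecs: number, fps: number): number[] {
  if (durationSecs <= 0 || fps <= 0) return [];
  const count = Math.floor(durationSecs * fps);
  return Array.from({ length: count }, (_, i) => i / fps);
}

// Map a normalized point to pixel coordinates, clamping out-of-range model output.
function toPixels(
  p: FramePoint,
  width: number,
  height: number,
): { px: number; py: number } {
  const clamp = (v: number): number => Math.min(1, Math.max(0, v));
  return {
    px: Math.round(clamp(p.x) * (width - 1)),
    py: Math.round(clamp(p.y) * (height - 1)),
  };
}
```

Clamping is a defensive choice here, since a multimodal model can return values slightly outside the 0..1 range.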

UI Preview

Person Tracking

Person tracking UI

Robot Arm Pick Tracking

Robot arm tracking UI

Stack

  • Frontend: React 18, TypeScript, Vite, Zustand, Axios
  • Backend: Node 20, Express 4, BullMQ, Redis, OpenAI SDK, fluent-ffmpeg, Zod, Winston
  • Infra: Docker, Docker Compose, nginx, Redis

Repository layout

.
├── backend
├── frontend
├── nginx
├── docker-compose.yml
├── docker-compose.dev.yml
└── .env.example

Quickstart

Ollama (recommended)

ollama pull llava

cp .env.example .env
docker compose up --build

Open http://localhost:3000.

LM Studio

  1. Open LM Studio.
  2. Load a vision-capable model and keep it loaded.
  3. Start the local OpenAI-compatible server on port 1234.

Then:

cp .env.example .env

Edit .env and set:

LLM_PROVIDER=lmstudio

Then run:

docker compose up --build

Notes:

  • The backend uses LM Studio JSON-schema output when the loaded model supports it, which improves coordinate parsing reliability.
  • If a loaded LM Studio vision model has a custom ID that does not match the usual vision-name heuristics, the app falls back to listing all loaded LM Studio models instead of hiding them.
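The model-list fallback described in the second note could look roughly like the sketch below. The pattern list and the `selectVisionModels` name are assumptions made for illustration; the app's actual heuristics are not documented in this README.

```typescript
// Hypothetical name patterns; the app's actual heuristics may differ.
const VISION_HINTS: RegExp[] = [/llava/i, /vision/i, /-vl\b/i, /moondream/i];

// Return the models whose IDs look vision-capable. If none match,
// return every loaded model so a custom-named vision model is not hidden.
function selectVisionModels(loadedIds: string[]): string[] {
  const matches = loadedIds.filter((id) =>
    VISION_HINTS.some((re) => re.test(id)),
  );
  return matches.length > 0 ? matches : loadedIds;
}
```

With this approach a misnamed model still appears in the picker, at the cost of occasionally listing text-only models.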

llama.cpp

Start the multimodal server first:

./llava-server \
  -m llava-v1.6-mistral-7b.gguf \
  --mmproj mmproj-model-f16.gguf \
  --port 8080 \
  --host 0.0.0.0

Then:

cp .env.example .env

Edit .env and set:

LLM_PROVIDER=llamacpp

Run:

docker compose up --build

Development mode

Start the Redis-backed backend and Vite frontend with hot reload:

cp .env.example .env
docker compose -f docker-compose.yml -f docker-compose.dev.yml up --build

  • Frontend dev server: http://localhost:5173
  • Backend API: http://localhost:4000
  • Full proxied app: http://localhost:3000

API surface

  • POST /api/track
  • GET /api/track/progress/:jobId
  • GET /api/track/result/:jobId
  • GET /api/track/download/:jobId/:filename
  • GET /api/models?provider=ollama
  • GET /api/health
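For client code, the routes above can be gathered into a small typed helper. This is only a sketch: the `trackApi` object is a hypothetical name, and the request and response bodies are deliberately left out because this README does not specify them.

```typescript
// Path builders for the documented routes. Path segments and query
// values are URL-encoded so job IDs and filenames cannot break the path.
const trackApi = {
  submit: (): string => "/api/track", // POST
  progress: (jobId: string): string =>
    `/api/track/progress/${encodeURIComponent(jobId)}`,
  result: (jobId: string): string =>
    `/api/track/result/${encodeURIComponent(jobId)}`,
  download: (jobId: string, filename: string): string =>
    `/api/track/download/${encodeURIComponent(jobId)}/${encodeURIComponent(filename)}`,
  models: (provider: string): string =>
    `/api/models?provider=${encodeURIComponent(provider)}`,
  health: (): string => "/api/health",
};
```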

Environment variables

See .env.example for the full list. The key values are:

  • LLM_PROVIDER
  • OLLAMA_BASE_URL
  • LMSTUDIO_BASE_URL
  • LLAMACPP_BASE_URL
  • MAX_UPLOAD_MB
  • MAX_VIDEO_SECS
  • QUEUE_CONCURRENCY
  • FRAME_TMP_DIR
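A minimal sketch of how these variables might be read and validated is shown below. Every default value is an assumption chosen for illustration (the ports follow the common defaults for each provider), and `loadConfig` is a hypothetical function; check .env.example for the project's real defaults.

```typescript
type Provider = "ollama" | "lmstudio" | "llamacpp";

interface AppConfig {
  provider: Provider;
  baseUrl: string;
  maxUploadMb: number;
  maxVideoSecs: number;
  queueConcurrency: number;
  frameTmpDir: string;
}

// Assumed defaults for illustration only; the project's values may differ.
const DEFAULT_BASE_URLS: Record<Provider, string> = {
  ollama: "http://localhost:11434/v1",
  lmstudio: "http://localhost:1234/v1",
  llamacpp: "http://localhost:8080/v1",
};

// Parse a positive integer, falling back when the variable is unset.
function positiveInt(raw: string | undefined, fallback: number, name: string): number {
  if (raw === undefined || raw === "") return fallback;
  const n = Number(raw);
  if (!Number.isInteger(n) || n <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return n;
}

function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const provider = (env.LLM_PROVIDER ?? "ollama") as Provider;
  if (!(provider in DEFAULT_BASE_URLS)) {
    throw new Error(`Unsupported LLM_PROVIDER "${provider}"`);
  }
  const overrides: Record<Provider, string | undefined> = {
    ollama: env.OLLAMA_BASE_URL,
    lmstudio: env.LMSTUDIO_BASE_URL,
    llamacpp: env.LLAMACPP_BASE_URL,
  };
  return {
    provider,
    baseUrl: overrides[provider] ?? DEFAULT_BASE_URLS[provider],
    maxUploadMb: positiveInt(env.MAX_UPLOAD_MB, 200, "MAX_UPLOAD_MB"),
    maxVideoSecs: positiveInt(env.MAX_VIDEO_SECS, 60, "MAX_VIDEO_SECS"),
    queueConcurrency: positiveInt(env.QUEUE_CONCURRENCY, 1, "QUEUE_CONCURRENCY"),
    frameTmpDir: env.FRAME_TMP_DIR ?? "/tmp/frames",
  };
}
```

In a Node process this would be called as `loadConfig(process.env)`. Failing fast on an invalid value surfaces configuration mistakes at startup rather than in the middle of a job.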

Notes

  • The in-browser result player overlays points on the original uploaded clip for immediate inspection.
  • The backend also renders a downloadable tracked MP4 using ffmpeg drawbox filters.
  • In Docker, the backend prefers system ffmpeg and ffprobe binaries; FFMPEG_PATH and FFPROBE_PATH can override detection if needed.
  • When the SSE client disconnects, the backend aborts the active tracking job and cleans up runtime artifacts.
  • Completed job artifacts are scheduled for deletion 30 minutes after completion.
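To make the drawbox rendering step concrete, the sketch below builds an ffmpeg filter string that draws a small filled box at each tracked point for the time window it covers. The `TrackedPoint` shape, the `drawboxFilter` name, and the box size and color are assumptions; the backend's actual filter construction may differ.

```typescript
// Hypothetical input: one normalized point and the time window it applies to.
interface TrackedPoint {
  startSecs: number;
  endSecs: number;
  x: number; // normalized 0..1
  y: number; // normalized 0..1
}

// Build a comma-separated chain of drawbox filters, one per point.
// Each box is centered on the point, kept inside the frame, and shown
// only during its time window via ffmpeg's timeline `enable` option.
// Returns an empty string when there are no points.
function drawboxFilter(
  points: TrackedPoint[],
  width: number,
  height: number,
  boxSize = 12,
  color = "red",
): string {
  const clamp = (v: number, max: number): number =>
    Math.min(Math.max(0, v), Math.max(0, max));
  return points
    .map((p) => {
      const left = clamp(Math.round(p.x * width - boxSize / 2), width - boxSize);
      const top = clamp(Math.round(p.y * height - boxSize / 2), height - boxSize);
      return (
        `drawbox=x=${left}:y=${top}:w=${boxSize}:h=${boxSize}` +
        `:color=${color}:t=fill` +
        `:enable='between(t,${p.startSecs},${p.endSecs})'`
      );
    })
    .join(",");
}
```

The resulting string would be passed to ffmpeg as a video filter, for example through the `-vf` option.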

Validation

The frontend production build and the backend TypeScript build both run as part of the implementation workflow. End-to-end verification of the Compose stack additionally requires a locally available multimodal provider and a real sample video.

About

Local-first multimodal video point tracking tool for people monitoring, object tracking, and precise object localization.
