This project scaffolds a Python app that calls NVIDIA NIM OCR and LLM microservices, with Docker, docker-compose, Kubernetes manifests, Prometheus, and Grafana.
```text
project-root/
├── Dockerfile
├── docker-compose.yaml
├── k8s/
│   ├── nim-helm-values.yaml
│   └── deploy-app.yaml
├── mock_nim/
│   ├── Dockerfile
│   └── server.py
├── app/
│   ├── main.py
│   ├── nim_client.py
│   ├── requirements.txt
│   └── config.yaml
├── monitoring/
│   ├── prometheus.yaml
│   └── grafana-dashboard.json
├── scripts/
│   ├── build.sh
│   └── run_compose.sh
├── tests/
│   └── test_nim_client.py
└── README.md
```
- Build and start:

  ```bash
  ./scripts/build.sh
  ./scripts/run_compose.sh
  ```

  Note: `./scripts/run_compose.sh` runs `docker compose up --build`, so it builds the images automatically; running `build.sh` first is optional.
- The app runs and calls a local mock NIM service, `mock-nim`, at `http://mock-nim:8000` (exposed on the host as `http://localhost:8000`).
- The app's web server is available at `http://localhost:9000` (see the smoke test below):
  - `/`: basic status
  - `/run`: executes the OCR→LLM pipeline
  - `/metrics`: Prometheus metrics
  - `/healthz`: health check
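As a quick smoke test, you could hit the endpoints above from Python once the stack is up. This is a minimal sketch using `requests`; the HTTP method for `/run` and the response shapes are assumptions, so adjust them to whatever `app/main.py` actually implements:

```python
import requests

BASE = "http://localhost:9000"

# Liveness first; /healthz should answer once the app container is up.
requests.get(f"{BASE}/healthz", timeout=5).raise_for_status()

# Trigger the OCR -> LLM pipeline. This sketch assumes /run accepts POST;
# switch to requests.get if the app exposes it as a GET endpoint.
resp = requests.post(f"{BASE}/run", timeout=120)
resp.raise_for_status()
print(resp.text)
```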
- Monitoring:
  - Prometheus: `http://localhost:9090`
  - Grafana: `http://localhost:3000` (add a Prometheus data source, then import the dashboard JSON from `monitoring/grafana-dashboard.json`)
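The `/metrics` endpoint listed above is what Prometheus scrapes. As a sketch of how the app could record pipeline metrics with `prometheus_client` (the metric names here are illustrative, not taken from the repo):

```python
from prometheus_client import Counter, Histogram, generate_latest

# Illustrative metric names; the ones registered in app/main.py may differ.
PIPELINE_RUNS = Counter("pipeline_runs_total", "OCR->LLM pipeline executions")
PIPELINE_LATENCY = Histogram("pipeline_latency_seconds", "End-to-end pipeline latency")

def instrumented(pipeline):
    """Wrap a pipeline callable so each run is counted and timed."""
    PIPELINE_RUNS.inc()
    with PIPELINE_LATENCY.time():
        return pipeline()

# The /metrics handler can then return generate_latest() as text/plain.
```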
Note: The compose file includes a local `mock-nim` service for development. Replace it (and any placeholder images) with your actual NVIDIA NIM proxy/services when deploying.
- Apply `k8s/deploy-app.yaml` (deploys into the `nim` namespace).
- Use `k8s/nim-helm-values.yaml` with your Helm charts for the NIM services (proxy, OCR, LLM) and monitoring.
`app/config.yaml` controls `nim_host`, the model name, timeouts, and thresholds. You can override `NIM_HOST` and `CONFIG_PATH` via environment variables.
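A sketch of that override pattern (the exact keys in `config.yaml` beyond `nim_host` are assumptions):

```python
import os
import yaml  # PyYAML

def load_config():
    # CONFIG_PATH, when set, points at an alternative config file.
    path = os.environ.get("CONFIG_PATH", "app/config.yaml")
    with open(path) as f:
        cfg = yaml.safe_load(f)
    # NIM_HOST, when set, wins over the nim_host value from the file.
    cfg["nim_host"] = os.environ.get("NIM_HOST", cfg.get("nim_host"))
    return cfg
```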
- Run the tests:

  ```bash
  python -m venv .venv && source .venv/bin/activate
  pip install -r app/requirements.txt
  pytest -q
  ```
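A test along these lines could extend `tests/test_nim_client.py`. This is a sketch only: it assumes the client exposes an `ocr(data_url)` helper built on `requests`, which may not match the real interface:

```python
from unittest import mock

from app import nim_client  # assumed import path

def test_ocr_posts_to_infer_endpoint():
    fake = mock.Mock()
    fake.raise_for_status.return_value = None
    fake.json.return_value = {"text": "hello"}
    # Patch the HTTP layer so no real NIM service is needed.
    with mock.patch("app.nim_client.requests.post", return_value=fake) as post:
        result = nim_client.ocr("data:image/png;base64,AAAA")
    post.assert_called_once()
    assert "hello" in str(result)
```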
- Pipeline: loads the config, encodes `sample_scan.png` as a data URL, sends it to the OCR service (`POST /v1/infer`), then sends the extracted text to the LLM (`POST /v1/chat/completions`) and prints the result.
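In outline, that pipeline could look like the following sketch. The OCR payload/response keys and the model name are placeholders, and the `/v1/chat/completions` body follows the OpenAI-compatible shape NIM LLM endpoints expose:

```python
import base64
import requests

def run_pipeline(nim_host="http://mock-nim:8000", image_path="sample_scan.png"):
    # Encode the scan as a data URL, as described above.
    with open(image_path, "rb") as f:
        data_url = "data:image/png;base64," + base64.b64encode(f.read()).decode()

    # 1) OCR. The request/response fields here are assumptions; match
    #    them to your OCR NIM's schema.
    ocr = requests.post(f"{nim_host}/v1/infer", json={"input": [data_url]}, timeout=60)
    ocr.raise_for_status()
    text = ocr.json().get("text", "")

    # 2) LLM chat completion. "my-llm" is a placeholder; the real model
    #    name comes from app/config.yaml.
    llm = requests.post(
        f"{nim_host}/v1/chat/completions",
        json={"model": "my-llm", "messages": [{"role": "user", "content": text}]},
        timeout=120,
    )
    llm.raise_for_status()
    print(llm.json()["choices"][0]["message"]["content"])
```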
- Replace placeholder NIM images and proxy with your actual NIM services.
- Consider adding auth (API keys) and TLS per your deployment.