Production deployment of PP-StructureV3 on Triton Inference Server, extended for XYNE/Juspay workloads. This project starts from the PaddleX High-Stability Serving (HPS) SDK for PP-StructureV3[^1][^2] and layers operational tooling, performance tuning, and optional multimodal captioning that we run in production at xynehq/xyne and across other Juspay services.
- `paddlex-server` container packages the PP-StructureV3 layout pipeline as a Triton Python backend (model `layout-parsing`) and ships the pipeline configs from `server/`.
- `blip-server` (optional) hosts a separate Triton instance serving BLIP image captioning (`blip-caption`) that `model.py` can call asynchronously for richer markdown output.
- `status_server.py` exposes `/instance_status` and `/paddlex_instance_status` on port `8081`, aggregating per-instance JSON heartbeats written by each Triton backend.
- All services inherit the PaddleX HPS runtime (HTTP :8000, gRPC :8001, Metrics :8002) from `paddle/hps:paddlex3.1-gpu`, with additional Python dependencies for PyMuPDF, BLIP, and Triton clients installed in the `Dockerfile`.
```
+-------------------+     gRPC/HTTP      +-----------+
|  paddlex-server   | <----------------> |  clients  |
|  layout-parsing   |                    +-----+-----+
|  status_server    |                          |
+---------+---------+                          |
          | async captions                     |
          v                                    |
+-------------------+  gRPC :8004 / HTTP :8003 |
|    blip-server    | <------------------------+
|    blip-caption   |
+-------------------+
```
- Instance-aware scheduling: `server/model_repo/layout-parsing/1/model.py` integrates `InstanceStatusTracker` with `/tmp/triton_instance_status_*.json` heartbeats, exposes configured/active counts, and honours overrides from `TRITON_INSTANCE_COUNT` and `config_*.pbtxt`.[^3]
- Async image captioning pipeline: `CaptionCoordinator` in `layout_captioning.py` streams markdown images to the BLIP Triton model, merges captions back into `parsing_res_list`, and is gated by `caption_config.yaml` plus the global `IMAGE_CAPTIONING_ENABLED` env flag.[^4]
- Memory hygiene for long PDFs: page rendering streams via bounded queues, `gc.collect()` is invoked per page, `paddle.device.cuda.empty_cache()` is called after each request, and a `PADDLE_TRIM_CACHE_EVERY_N` knob controls periodic GPU cache trimming.[^3]
- Resilient media uploads: `app_common.postprocess_images` is wrapped with `_POSTPROCESS_UPLOAD_TIMEOUT_SECONDS`, giving fast failures and actionable logs when downstream storage stalls.
- Consolidated configuration: model/backend matrices live in `configFiles/modelSupport.json` and `configFiles/modeSupportGpu.json`, while pipeline variants for the Paddle, Paddle+HPI, and ONNX Runtime runtimes sit alongside the shipping `server/pipeline_config.yaml`.
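The memory-hygiene cadence above can be sketched roughly as follows. This is an illustrative helper, not the backend code: the real logic lives in `model.py` and calls `paddle.device.cuda.empty_cache()` directly, whereas here the trimmer is injected so the cadence is testable without a GPU.

```python
import gc
import os

# PADDLE_TRIM_CACHE_EVERY_N controls how often the GPU cache is trimmed;
# 0 disables periodic trimming.
TRIM_EVERY_N = int(os.environ.get("PADDLE_TRIM_CACHE_EVERY_N", "0"))

def process_pages(pages, render_page, trim_cache, trim_every_n=TRIM_EVERY_N):
    """Render pages one at a time, collecting garbage after each page and
    trimming the GPU cache every `trim_every_n` pages."""
    results = []
    for i, page in enumerate(pages, start=1):
        results.append(render_page(page))
        gc.collect()  # release per-page Python objects promptly
        if trim_every_n and i % trim_every_n == 0:
            trim_cache()  # e.g. paddle.device.cuda.empty_cache()
    return results
```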
- `server/model_repo/layout-parsing/1/` – Triton Python backend using the PaddleX pipeline, caption orchestration, and active-instance tracking.
- `server/model_repo/blip-caption/1/` – Triton Python backend that lazy-loads and caches `Salesforce/blip-image-captioning-large` with JSON IO.
- `server/status_server.py` – thread-safe HTTP server that aggregates `/tmp/*_instance_status*.json`.
- `configFiles/` – curated runtime configs (`pipeline_config*.yaml`) and backend capability matrices (`modelSupport.json`, `modeSupportGpu.json`).
- `Dockerfile`, `docker-compose*.yml` – container build/test orchestration for local dev and production.
- `client/` – lightweight gRPC helper (`client.py`) and requirements for smoke tests against :8001.
Choose the right pipeline manifest and set `PADDLEX_HPS_PIPELINE_CONFIG_PATH` before launching `server.sh`:
| File | Purpose |
|---|---|
| `server/pipeline_config.yaml` | Default shipping config with Paddle backend and tuned thresholds for production. |
| `configFiles/pipeline_config Paddle.yaml` | Legacy pure-Paddle baseline (HP Inference disabled) for compatibility testing. |
| `configFiles/pipeline_config paddleHpiTrue.yaml` | Paddle backend with `use_hpip: True` to enable High Performance Inference acceleration on supported modules. |
| `configFiles/pipeline_config onnxRuntime.yaml` | Hybrid config pushing layout detection and OCR classifiers through ONNX Runtime for lower latency on NVIDIA GPUs. |
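The HPI variant differs from the baseline chiefly in the high-performance-inference switch. A minimal, illustrative fragment (surrounding keys follow PaddleX pipeline-config conventions and are not copied verbatim from the shipped files):

```yaml
# pipeline_config paddleHpiTrue.yaml (illustrative fragment)
pipeline_name: PP-StructureV3
use_hpip: True   # route supported modules through High Performance Inference
```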
`server/server.sh` copies the chosen manifest into the runtime image, resolves `config_{gpu,cpu}.pbtxt` symlinks, and boots Triton with `--model-repository` set to `/paddlex/var/paddlex_model_repo`.
- Layout parsing uses `config_gpu_paddlex.pbtxt` (`instance_group.count: 6`) for GPU deployments and `server/model_repo/layout-parsing/config_cpu.pbtxt` for CPU fallbacks.
- BLIP captioning relies on `config_gpu_blip.pbtxt` / `server/model_repo/blip-caption/config_gpu.pbtxt` (`instance_group.count: 4`, dynamic batching up to 16) and shares the same image.
- Update `TRITON_INSTANCE_COUNT` to ensure the status tracker reflects manual scaling, or edit the `count:` fields and reload the model.
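An `instance_group` stanza matching the GPU counts above looks roughly like this in standard Triton `config.pbtxt` syntax (fragment only; the other `config_gpu_paddlex.pbtxt` fields are omitted):

```
instance_group [
  {
    count: 6
    kind: KIND_GPU
  }
]
```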
We expose PaddleX model-to-backend compatibility as JSON so downstream services can reason about runtime choices; see `configFiles/modelSupport.json` and `configFiles/modeSupportGpu.json` for the exhaustive matrix.
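A downstream service could consume the matrix along these lines. The flat `{model: [backends]}` shape used here is an assumption for illustration; check the JSON files for the actual schema:

```python
import json
from pathlib import Path

def supported_backends(matrix_path, model_name):
    """Return the runtime backends listed for `model_name` in a
    compatibility matrix such as configFiles/modelSupport.json.

    Assumes an illustrative {"<model>": ["paddle", "onnxruntime", ...]}
    layout; adapt the lookup to the real schema as needed.
    """
    matrix = json.loads(Path(matrix_path).read_text())
    return matrix.get(model_name, [])
```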
Prerequisites: Docker Engine with the NVIDIA Container Toolkit, a CUDA-capable GPU (≥12 GB recommended), and access to the PaddleX 3.1 base image registry.
```shell
# Build the Triton-serving image
docker build -t paddlex/app:latest .

# Start layout parsing only (captioning disabled, lighter footprint)
IMAGE_CAPTIONING_ENABLED=false docker compose up paddlex-server

# Start both layout parsing and blip-caption (captioning enabled)
COMPOSE_PROFILES=image-captioning IMAGE_CAPTIONING_ENABLED=true \
  docker compose up

# Tail logs
docker logs -f paddlex-server
docker logs -f blip-server   # only if profile enabled
```

- `docker-compose.yml` mounts the repo into `/app`, reuses `config_gpu_paddlex.pbtxt`, and exposes ports 8000/8001/8002/8081.
- Captioning defaults on; flip `IMAGE_CAPTIONING_ENABLED` or remove the `image-captioning` profile to opt out.
```shell
# Build once, then deploy with production compose
docker compose -f docker-compose.prod.yml build

# Keep captioning enabled
COMPOSE_PROFILES=image-captioning IMAGE_CAPTIONING_ENABLED=true \
  docker compose -f docker-compose.prod.yml up -d

# Or disable captioning globally while keeping layout parsing online
IMAGE_CAPTIONING_ENABLED=false \
  docker compose -f docker-compose.prod.yml up -d paddlex-server
```

Production compose avoids bind-mounting the workspace, instead mounting curated model caches (`paddlex/official_models`) and optional BLIP cache directories. Ensure the external Docker network `xyne` exists (`docker network create xyne`) before bringing up the stack.
For on-host Triton, execute `server/server.sh` after setting:

```shell
export PADDLEX_HPS_PIPELINE_CONFIG_PATH=/path/to/pipeline_config.yaml
export IMAGE_CAPTIONING_ENABLED=true
./server/server.sh
```

The script mirrors the repo to `/paddlex/var/paddlex_model_repo`, boots the status server, and launches Triton with explicit model loads.
- Health metrics live at `/paddlex_instance_status` (layout-parsing active/idle counts) and `/instance_status` (layout + BLIP configured instances) on port `8081`.
- Runtime tuning knobs:
  - `TRITON_POSTPROCESS_TIMEOUT_SECONDS` – fail fast when image uploads hang.
  - `PADDLE_TRIM_CACHE_EVERY_N` – force `paddle.device.cuda.empty_cache()` periodically for long-running jobs.
  - `IMAGE_CAPTIONING_ENABLED` – global kill-switch for caption orchestration.
  - `TRITON_INSTANCE_COUNT` – advertises pre-scaled pool sizes to the status tracker.
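For alerting, the heartbeat endpoint can be polled and condensed. In this sketch the `configured`/`active` key names are assumptions for illustration; the authoritative payload shape comes from `status_server.py`:

```python
import json
from urllib.request import urlopen

def summarize_status(payload):
    """Condense an /instance_status-style payload into (configured, active).

    Key names are illustrative; adjust to the JSON actually emitted by
    status_server.py.
    """
    data = json.loads(payload) if isinstance(payload, str) else payload
    return int(data.get("configured", 0)), int(data.get("active", 0))

def poll_status(url="http://localhost:8081/instance_status", timeout=5):
    """Fetch and summarize the aggregated instance status."""
    with urlopen(url, timeout=timeout) as resp:
        return summarize_status(resp.read().decode("utf-8"))
```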
The `client/` folder includes a minimal smoke-test harness:

```shell
pip install -r client/requirements.txt
python client/client.py --file path/to/document.pdf --file-type 0 --url localhost:8001
```

Generated markdown and images are written to `./markdown_<page>/`, mirroring how downstream services consume the layout-parsing output.
We evaluated representative PDFs (Juspay pitch deck, LIC forms) across backends and GPUs. Times are wall-clock per document with a single instance unless noted.
| Mode | GPU / Machine | Document | Pages | Size | Latency | Peak GPU Mem | Quality | Notes |
|---|---|---|---|---|---|---|---|---|
| Paddle basic (UAT) | RTX 4090 (local) | juspay.pdf | 41 | 22 MB | 28 s | 6 GB + spike | Excellent | Baseline profile currently in UAT. |
| Paddle basic (UAT) | L4 (UAT) | juspay.pdf | 41 | 22 MB | 112 s | 6 GB + spike | Excellent | Longer wall-time on L4 due to lower clocks. |
| Paddle basic (UAT) | RTX 4090 (local) | LIC.pdf | 93 | 12 MB | 65 s | 6 GB + spike | Excellent | — |
| Paddle basic (UAT) | L4 (UAT) | LIC.pdf | 93 | 12 MB | 150–180 s | 6 GB + spike | Excellent | — |
| TensorRT fp16 | RTX 4090 (local) | juspay.pdf | 41 | 22 MB | 13 s | 9 GB + 4 GB | Poor | Conversion skipped layers → unacceptable accuracy. |
| TensorRT fp32 | RTX 4090 (local) | juspay.pdf | 41 | 22 MB | 18 s | 12 GB + 4 GB | Poor | Same degradation as fp16. |
| ONNX Runtime | RTX 4090 (local) | juspay.pdf | 41 | 22 MB | 19 s | 12 GB + 6 GB | Excellent | Competitive alternative. |
| Paddle + TensorRT subgraph fp32 | RTX 4090 (local) | juspay.pdf | 41 | 22 MB | 17 s | 6 GB + 2.5 GB | Excellent | Selected for production rollout. |
| Paddle + TensorRT subgraph fp32 | A100 | juspay.pdf | 41 | 22 MB | 53 s | 6 GB + 2.5 GB | Excellent | Multi-instance friendly. |
| Paddle + TensorRT subgraph fp32 | RTX 4090 (local) | LIC.pdf | 93 | 12 MB | 35 s | 6 GB + 2.5 GB | Excellent | 40 % faster than UAT baseline. |
| Paddle + TensorRT subgraph fp32 | A100 | LIC.pdf | 93 | 12 MB | 92 s | 6 GB + 2.5 GB | Excellent | Up to 7 instances per A100 feasible. |
Notes:
- Latency grows ~30 % when scaling out instances; the status tracker helps right-size pools.
- Formula recognition and seal detection remain disabled to prioritise throughput (adds 0.5–1 s per page when enabled).
- Horizontal scaling is VRAM-bound; A100 deployments comfortably host seven concurrent instances (vs three on UAT).
- PaddleOCR, PP-StructureV3, and PaddleX assets are licensed under Apache License 2.0. Review the upstream license text at PaddleOCR.
- BLIP weights (`Salesforce/blip-image-captioning-large`) inherit their respective Hugging Face licensing terms.
- Cite the upstream research when publishing results built on this stack.[^4][^5]
Footnotes

[^1]: PaddleX HPS PP-StructureV3 SDK download – https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/deploy/paddlex_hps/public/sdks/v3.3/paddlex_hps_PP-StructureV3_sdk.tar.gz
[^2]: PaddleX High-Stability Serving documentation – https://paddlepaddle.github.io/PaddleX/latest/en/pipeline_deploy/serving.html#13-invoke-the-service
[^3]: `server/model_repo/layout-parsing/1/layout_captioning.py`
[^4]: Cheng Cui et al. PaddleOCR 3.0 Technical Report. arXiv:2507.05595, 2025.
[^5]: Cheng Cui et al. PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model. arXiv:2510.14528, 2025.