"One tool, four ways to deploy your model: Library, CLI, Template, or Full Stack"
4 ways to use InferX - Choose what fits your needs:
- 📦 Library - Import and use directly in your Python code
- ⚡ CLI - Run models directly from command line
- 🏗️ Template Generator - Generate ready-to-use project templates
- 🐳 Full Stack - Generate API servers and Docker containers
Unlike heavy frameworks, InferX gives you clean, minimal-dependency code that you own completely. No framework lock-in, no heavy dependencies.
from inferx import InferenceEngine
# Use directly in your Python applications
engine = InferenceEngine("model.onnx", device="gpu")
result = engine.predict("image.jpg")
# Batch processing
results = engine.predict_batch(["img1.jpg", "img2.jpg"])

# Run inference directly from command line
inferx run model.onnx image.jpg --device gpu
# Batch processing with output
inferx run model.xml images/ --output results.json --runtime openvino
# Device optimization
inferx run model.xml image.jpg --device myriad --runtime openvino

# Generate YOLO ONNX project
uv run inferx template --model-type yolo --name my-detector
cd my-detector && uv sync
# Generate YOLO OpenVINO project
uv run inferx template --model-type yolo_openvino --name my-openvino-detector
cd my-openvino-detector && uv sync --extra openvino
# Generate with API server
uv run inferx template --model-type yolo --name my-api-detector --with-api
cd my-api-detector && uv sync --extra api
# Copy your model file
uv run inferx template --model-type yolo --name my-detector --model-path /path/to/model.onnx
# Project structure:
# ├── pyproject.toml              # UV-compatible dependencies
# ├── src/
# │   ├── inferencer.py           # YOLO inference implementation
# │   ├── server.py               # FastAPI server (if --with-api)
# │   └── [base.py, utils.py, exceptions.py]  # Supporting files
# ├── models/yolo_model.onnx      # Your model file
# └── config.yaml                 # Configuration

# Generate with API server included
uv run inferx template --model-type yolo --name my-api-detector --with-api
cd my-api-detector
# Install dependencies
uv sync --extra api
# Start API server
uv run --extra api python -m src.server
# Server runs at: http://0.0.0.0:8080
# Test API endpoints
curl -X GET "http://localhost:8080/" # Health check
curl -X GET "http://localhost:8080/info" # Model info
curl -X POST "http://localhost:8080/predict" -F "file=@image.jpg"  # Inference
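The same endpoints can be exercised from Python. A minimal sketch using the requests library (an extra dependency, not part of the generated project; the upload field name is assumed to match the curl example above):

```python
# Hypothetical Python client for the generated API (requires `pip install requests`)
import requests

BASE_URL = "http://localhost:8080"

print(requests.get(f"{BASE_URL}/").json())      # Health check
print(requests.get(f"{BASE_URL}/info").json())  # Model info

# Send an image for inference (form field name assumed to be "file")
with open("image.jpg", "rb") as f:
    response = requests.post(f"{BASE_URL}/predict", files={"file": f})
print(response.json())
```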

| Framework | Dependencies | Container Size | Approach |
|---|---|---|---|
| InferX | ONNX Runtime only (~50MB) | ~75MB | Code generation |
| BentoML | Full framework stack | ~900MB | Framework-based |
| TorchServe | PyTorch + dependencies | ~1.2GB | Framework-based |
| TF Serving | TensorFlow | ~800MB | Framework-based |
When you run inferx template yolo --name my-detector:
my-detector/                      # Your standalone project
├── pyproject.toml                # UV project with minimal deps
├── src/
│   ├── __init__.py
│   ├── inferencer.py             # YOLO inference implementation (inherits from InferX YOLOInferencer)
│   └── base.py                   # Base inferencer class
├── models/
│   └── yolo_model.onnx           # Place your YOLO model here (or .xml/.bin for OpenVINO)
├── config.yaml                   # Inference configuration
├── README.md                     # Usage instructions
└── .gitignore                    # Standard Python gitignore
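For orientation, here is a minimal sketch of how such a generated project might be driven from its root directory. The class name is assumed from the inferencer.py comment above; the generated README.md documents the exact entry point:

```python
# Hypothetical usage from inside my-detector/ after `uv sync`
from src.inferencer import YOLOInferencer  # assumed class name

inferencer = YOLOInferencer("models/yolo_model.onnx")
detections = inferencer.predict("test_image.jpg")  # any local test image
print(detections)
```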
When you run inferx template yolo_openvino --name my-detector:
my-detector/                      # Your standalone project
├── pyproject.toml                # UV project with minimal deps
├── src/
│   ├── __init__.py
│   ├── inferencer.py             # YOLO OpenVINO inference implementation (inherits from InferX YOLOOpenVINOInferencer)
│   └── base.py                   # Base inferencer class
├── models/
│   ├── yolo_model.xml            # Place your YOLO OpenVINO model .xml file here
│   └── yolo_model.bin            # Place your YOLO OpenVINO model .bin file here
├── config.yaml                   # Inference configuration
├── README.md                     # Usage instructions
└── .gitignore                    # Standard Python gitignore
After inferx api:
my-detector/
├── src/
│   ├── inferencer.py             # Existing
│   ├── base.py                   # Existing
│   └── server.py                 # Generated FastAPI app
└── requirements-api.txt          # +FastAPI only
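The generated server.py is essentially a thin FastAPI wrapper around the existing inferencer. A simplified sketch of what it might look like (endpoint paths follow the curl examples above; the generated file is the source of truth):

```python
# Simplified sketch of a generated src/server.py (illustrative, not the exact code)
import shutil
import tempfile

from fastapi import FastAPI, UploadFile

from .inferencer import YOLOInferencer  # assumed class name

app = FastAPI()
inferencer = YOLOInferencer("models/yolo_model.onnx")

@app.get("/")
def health():
    return {"status": "ok"}

@app.get("/info")
def info():
    return {"model_type": "yolo", "model_path": "models/yolo_model.onnx"}

@app.post("/predict")
async def predict(file: UploadFile):
    # Persist the upload to a temp file, then reuse the file-based predict()
    with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as tmp:
        shutil.copyfileobj(file.file, tmp)
        tmp_path = tmp.name
    results = inferencer.predict(tmp_path)
    # Real code would convert numpy outputs to plain lists before returning
    return {"results": results}

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8080)
```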
After inferx docker:
my-detector/
├── Dockerfile                    # Multi-stage optimized
├── docker-compose.yml            # Ready to deploy
└── .dockerignore                 # Build optimization
[project]
name = "my-detector"
version = "0.1.0"
dependencies = [
"onnxruntime>=1.16.0", # ~50MB
"numpy>=1.24.0", # Array operations
"opencv-python-headless>=4.8.0", # Image processing
]
[project.optional-dependencies]
api = ["fastapi>=0.104.0", "uvicorn>=0.24.0"] # Only when using API
gpu = ["onnxruntime-gpu>=1.16.0"] # Only for GPU inference
openvino = ["openvino>=2023.3.0"] # Intel optimization- Production safety: Fewer dependencies = fewer security vulnerabilities
- Faster deployment: Smaller containers, faster startup
- Cost efficiency: Less compute resources needed
- Maintenance: Easier to update and maintain
# Install from PyPI (when available)
pip install inferx
# Or install from source
git clone https://github.com/yourusername/inferx.git
cd inferx
pip install -e .

from inferx import InferenceEngine
# Use directly in your Python applications
engine = InferenceEngine("model.onnx", device="gpu")
result = engine.predict("image.jpg")
print(result)

# Run inference directly from command line
inferx run model.onnx image.jpg --device gpu
# Batch processing
inferx run model.xml images/ --output results.json --runtime openvino

# Create YOLO detection project
inferx template yolo --name my-detector
cd my-detector
# Project structure:
# ├── src/inference.py            # YOLO inference code
# ├── model.onnx                  # Place your model here
# └── pyproject.toml              # Minimal dependencies
# Test inference
uv run python -m src.inference test_image.jpg

# Start with template
inferx template yolo --name my-detector
cd my-detector
# Add API server
inferx api
# Add Docker deployment
inferx docker
# Start server
uv run python -m src.server
# Or deploy with Docker
docker build -t my-detector:v1 .
docker run -p 8080:8080 my-detector:v1

# 1. YOLO ONNX (Basic)
uv run inferx template --model-type yolo --name my-yolo-project
# 2. YOLO ONNX (with FastAPI)
uv run inferx template --model-type yolo --name my-yolo-api --with-api
# 3. YOLO OpenVINO (Basic)
uv run inferx template --model-type yolo_openvino --name my-openvino-project
# 4. YOLO OpenVINO (with FastAPI)
uv run inferx template --model-type yolo_openvino --name my-openvino-api --with-api
# 🚧 Coming Soon:
# - Anomaly detection templates
# - Image classification templates
# - Custom ONNX model templates

- ✅ Basic inference engines (ONNX + OpenVINO)
- ✅ Configuration system
- ✅ CLI structure
- ✅ Testing framework
- ✅ Project examples
- ✅ Library usage pattern
- ✅ CLI usage pattern
- ✅ Template generation (inferx template) - NEW!
- ✅ API generation (FastAPI servers) - NEW!
- ✅ 4 Template Combinations (YOLO, YOLO+API, OpenVINO, OpenVINO+API) - NEW!
- 🚧 Docker generation (inferx docker) - Future feature
- 🚧 Project templates (Anomaly, Classification)
- 🚧 Model zoo integration
See TODO.md for detailed development tasks and progress.
Generated projects include a config.yaml:
# Model settings
model:
  path: "model.onnx"
  type: "yolo"

# Inference settings
inference:
  device: "auto"              # auto, cpu, gpu
  batch_size: 1
  confidence_threshold: 0.25

# Input preprocessing
preprocessing:
  input_size: [640, 640]
  normalize: true
  format: "RGB"
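A generated inferencer would typically read this file at startup. A minimal sketch, assuming PyYAML is available in the project:

```python
# Sketch: loading config.yaml from the generated project root (assumes PyYAML)
import yaml

with open("config.yaml") as f:
    config = yaml.safe_load(f)

model_path = config["model"]["path"]                # "model.onnx"
device = config["inference"]["device"]              # "auto", "cpu", or "gpu"
input_size = config["preprocessing"]["input_size"]  # [640, 640]
```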

# 1. Library - Import and use in your code
from inferx import InferenceEngine
engine = InferenceEngine("model.onnx")
result = engine.predict("image.jpg")
# 2. CLI - Run from command line
# inferx run model.onnx image.jpg
# 3. Template - Generate project structure
# inferx template yolo --name my-detector
# 4. Full Stack - Generate API + Docker
# inferx template yolo --name my-detector
# cd my-detector
# inferx api
# inferx docker

# BentoML - Framework dependency
import bentoml
@bentoml.service(
    resources={"cpu": "2"},
    traffic={"timeout": 20},
)
class MyService:
    # Heavy framework, complex setup
    ...

# Generated inference.py - No framework dependency
import cv2
import numpy as np
import onnxruntime as ort

class YOLOInferencer:
    def __init__(self, model_path: str):
        self.session = ort.InferenceSession(model_path)
        self.input_name = self.session.get_inputs()[0].name

    def predict(self, image_path: str):
        # Your clean, minimal code (illustrative preprocessing sketch)
        image = cv2.imread(image_path)
        blob = cv2.resize(image, (640, 640)).astype(np.float32) / 255.0
        blob = np.transpose(blob, (2, 0, 1))[np.newaxis, ...]  # HWC -> NCHW
        results = self.session.run(None, {self.input_name: blob})
        return results

- ✅ You own the code - No framework lock-in
- ✅ Minimal dependencies - Only what you need
- ✅ Easy to modify - Standard Python code
- ✅ Production ready - UV project structure
- ✅ Fast deployment - Small containers
- ✅ 4 usage patterns - Library, CLI, Template, or Full Stack
InferX core inference engines (Library and CLI) are production-ready. Template generation features are in active development.
- Test current inference engines with your ONNX/OpenVINO models
- Use the Library and CLI patterns in your projects and report issues
- Suggest template improvements for different model types
- Contribute code for template generation features
git clone https://github.com/yourusername/inferx.git
cd inferx
pip install -e .[dev]
# Run tests
python test_runner.py
# See development tasks
cat TODO.md

This project is licensed under the MIT License - see the LICENSE file for details.
InferX - Minimal dependency ML inference templates. 🚀
Give us your model. Get a template, an API, or a Docker container.