"One tool, four ways to deploy your model: Library, CLI, Template, or Full Stack"
4 ways to use InferX - Choose what fits your needs:
- 📦 Library - Import and use directly in your Python code
- ⚡ CLI - Run models directly from command line
- 🏗️ Template Generator - Generate ready-to-use project templates
- 🐳 Full Stack - Generate API servers and Docker containers
Unlike heavy frameworks, InferX gives you clean, minimal-dependency code that you own completely. No framework lock-in, no heavy dependencies.
from inferx import InferenceEngine
# Use directly in your Python applications
engine = InferenceEngine("model.onnx", device="gpu")
result = engine.predict("image.jpg")
# Batch processing
results = engine.predict_batch(["img1.jpg", "img2.jpg"])

# Run inference directly from command line
inferx run model.onnx image.jpg --device gpu
# Batch processing with output
inferx run model.xml images/ --output results.json --runtime openvino
# Device optimization
inferx run model.xml image.jpg --device myriad --runtime openvino

# Generate YOLO ONNX project
uv run inferx template --model-type yolo --name my-detector
cd my-detector && uv sync
# Generate YOLO OpenVINO project
uv run inferx template --model-type yolo_openvino --name my-openvino-detector
cd my-openvino-detector && uv sync --extra openvino
# Generate with API server
uv run inferx template --model-type yolo --name my-api-detector --with-api
cd my-api-detector && uv sync --extra api
# Copy your model file
uv run inferx template --model-type yolo --name my-detector --model-path /path/to/model.onnx
# Project structure:
# ├── pyproject.toml              # UV-compatible dependencies
# ├── src/
# │   ├── inferencer.py           # YOLO inference implementation
# │   ├── server.py               # FastAPI server (if --with-api)
# │   └── [base.py, utils.py, exceptions.py]  # Supporting files
# ├── models/yolo_model.onnx      # Your model file
# └── config.yaml                 # Configuration

# Generate with API server included
uv run inferx template --model-type yolo --name my-api-detector --with-api
cd my-api-detector
# Install dependencies
uv sync --extra api
# Start API server
uv run --extra api python -m src.server
# Server runs at: http://0.0.0.0:8080
# Test API endpoints
curl -X GET "http://localhost:8080/" # Health check
curl -X GET "http://localhost:8080/info" # Model info
curl -X POST "http://localhost:8080/predict" -F "file=@image.jpg"  # Inference
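The same endpoints can be exercised from Python. A minimal sketch using the requests library (an extra dependency, not part of the generated project; the upload field name is assumed to match the curl example above):

```python
# Hypothetical Python client for the generated API (requires `pip install requests`)
import requests

BASE_URL = "http://localhost:8080"

print(requests.get(f"{BASE_URL}/").json())      # Health check
print(requests.get(f"{BASE_URL}/info").json())  # Model info

# Send an image for inference (form field name assumed to be "file")
with open("image.jpg", "rb") as f:
    response = requests.post(f"{BASE_URL}/predict", files={"file": f})
print(response.json())
```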

| Framework | Dependencies | Container Size | Approach |
|---|---|---|---|
| InferX | ONNX Runtime only (~50MB) | ~75MB | Code generation |
| BentoML | Full framework stack | ~900MB | Framework-based |
| TorchServe | PyTorch + dependencies | ~1.2GB | Framework-based |
| TF Serving | TensorFlow | ~800MB | Framework-based |
When you run inferx template yolo --name my-detector:
my-detector/                      # Your standalone project
├── pyproject.toml                # UV project with minimal deps
├── src/
│   ├── __init__.py
│   ├── inferencer.py             # YOLO inference implementation (inherits from InferX YOLOInferencer)
│   └── base.py                   # Base inferencer class
├── models/
│   └── yolo_model.onnx           # Place your YOLO model here (or .xml/.bin for OpenVINO)
├── config.yaml                   # Inference configuration
├── README.md                     # Usage instructions
└── .gitignore                    # Standard Python gitignore
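For orientation, here is a minimal sketch of how such a generated project might be driven from its root directory. The class name is assumed from the inferencer.py comment above; the generated README.md documents the exact entry point:

```python
# Hypothetical usage from inside my-detector/ after `uv sync`
from src.inferencer import YOLOInferencer  # assumed class name

inferencer = YOLOInferencer("models/yolo_model.onnx")
detections = inferencer.predict("test_image.jpg")  # any local test image
print(detections)
```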
When you run inferx template yolo_openvino --name my-detector:
my-detector/                      # Your standalone project
├── pyproject.toml                # UV project with minimal deps
├── src/
│   ├── __init__.py
│   ├── inferencer.py             # YOLO OpenVINO inference implementation (inherits from InferX YOLOOpenVINOInferencer)
│   └── base.py                   # Base inferencer class
├── models/
│   ├── yolo_model.xml            # Place your YOLO OpenVINO model .xml file here
│   └── yolo_model.bin            # Place your YOLO OpenVINO model .bin file here
├── config.yaml                   # Inference configuration
├── README.md                     # Usage instructions
└── .gitignore                    # Standard Python gitignore
After inferx api:
my-detector/
├── src/
│   ├── inferencer.py             # Existing
│   ├── base.py                   # Existing
│   └── server.py                 # Generated FastAPI app
└── requirements-api.txt          # +FastAPI only
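The generated server.py is essentially a thin FastAPI wrapper around the existing inferencer. A simplified sketch of what it might look like (endpoint paths follow the curl examples above; the generated file is the source of truth):

```python
# Simplified sketch of a generated src/server.py (illustrative, not the exact code)
import shutil
import tempfile

from fastapi import FastAPI, UploadFile

from .inferencer import YOLOInferencer  # assumed class name

app = FastAPI()
inferencer = YOLOInferencer("models/yolo_model.onnx")

@app.get("/")
def health():
    return {"status": "ok"}

@app.get("/info")
def info():
    return {"model_type": "yolo", "model_path": "models/yolo_model.onnx"}

@app.post("/predict")
async def predict(file: UploadFile):
    # Persist the upload to a temp file, then reuse the file-based predict()
    with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as tmp:
        shutil.copyfileobj(file.file, tmp)
        tmp_path = tmp.name
    results = inferencer.predict(tmp_path)
    # Real code would convert numpy outputs to plain lists before returning
    return {"results": results}

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8080)
```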
After inferx docker:
my-detector/
├── Dockerfile                    # Multi-stage optimized
├── docker-compose.yml            # Ready to deploy
└── .dockerignore                 # Build optimization
[project]
name = "my-detector"
version = "0.1.0"
dependencies = [
"onnxruntime>=1.16.0", # ~50MB
"numpy>=1.24.0", # Array operations
"opencv-python-headless>=4.8.0", # Image processing
]
[project.optional-dependencies]
api = ["fastapi>=0.104.0", "uvicorn>=0.24.0"] # Only when using API
gpu = ["onnxruntime-gpu>=1.16.0"] # Only for GPU inference
openvino = ["openvino>=2023.3.0"] # Intel optimization- Production safety: Fewer dependencies = fewer security vulnerabilities
- Faster deployment: Smaller containers, faster startup
- Cost efficiency: Less compute resources needed
- Maintenance: Easier to update and maintain
# Install from PyPI (when available)
pip install inferx
# Or install from source
git clone https://github.com/yourusername/inferx.git
cd inferx
pip install -e .

from inferx import InferenceEngine
# Use directly in your Python applications
engine = InferenceEngine("model.onnx", device="gpu")
result = engine.predict("image.jpg")
print(result)

# Run inference directly from command line
inferx run model.onnx image.jpg --device gpu
# Batch processing
inferx run model.xml images/ --output results.json --runtime openvino

# Create YOLO detection project
inferx template yolo --name my-detector
cd my-detector
# Project structure:
# ├── src/inference.py            # YOLO inference code
# ├── model.onnx                  # Place your model here
# └── pyproject.toml              # Minimal dependencies
# Test inference
uv run python -m src.inference test_image.jpg

# Start with template
inferx template yolo --name my-detector
cd my-detector
# Add API server
inferx api
# Add Docker deployment
inferx docker
# Start server
uv run python -m src.server
# Or deploy with Docker
docker build -t my-detector:v1 .
docker run -p 8080:8080 my-detector:v1

# 1. YOLO ONNX (Basic)
uv run inferx template --model-type yolo --name my-yolo-project
# 2. YOLO ONNX (with FastAPI)
uv run inferx template --model-type yolo --name my-yolo-api --with-api
# 3. YOLO OpenVINO (Basic)
uv run inferx template --model-type yolo_openvino --name my-openvino-project
# 4. YOLO OpenVINO (with FastAPI)
uv run inferx template --model-type yolo_openvino --name my-openvino-api --with-api
# 🚧 Coming Soon:
# - Anomaly detection templates
# - Image classification templates
# - Custom ONNX model templates

- ✅ Basic inference engines (ONNX + OpenVINO)
- ✅ Configuration system
- ✅ CLI structure
- ✅ Testing framework
- ✅ Project examples
- ✅ Library usage pattern
- ✅ CLI usage pattern
- ✅ Template generation (inferx template) - NEW!
- ✅ API generation (FastAPI servers) - NEW!
- ✅ 4 Template Combinations (YOLO, YOLO+API, OpenVINO, OpenVINO+API) - NEW!
- 🚧 Docker generation (inferx docker) - Future feature
- 🚧 Project templates (Anomaly, Classification)
- 🚧 Model zoo integration
See TODO.md for detailed development tasks and progress.
Generated projects include a config.yaml:
# Model settings
model:
  path: "model.onnx"
  type: "yolo"

# Inference settings
inference:
  device: "auto"              # auto, cpu, gpu
  batch_size: 1
  confidence_threshold: 0.25

# Input preprocessing
preprocessing:
  input_size: [640, 640]
  normalize: true
  format: "RGB"
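A generated inferencer would typically read this file at startup. A minimal sketch, assuming PyYAML is available in the project:

```python
# Sketch: loading config.yaml from the generated project root (assumes PyYAML)
import yaml

with open("config.yaml") as f:
    config = yaml.safe_load(f)

model_path = config["model"]["path"]                # "model.onnx"
device = config["inference"]["device"]              # "auto", "cpu", or "gpu"
input_size = config["preprocessing"]["input_size"]  # [640, 640]
```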

# 1. Library - Import and use in your code
from inferx import InferenceEngine
engine = InferenceEngine("model.onnx")
result = engine.predict("image.jpg")
# 2. CLI - Run from command line
# inferx run model.onnx image.jpg
# 3. Template - Generate project structure
# inferx template yolo --name my-detector
# 4. Full Stack - Generate API + Docker
# inferx template yolo --name my-detector
# cd my-detector
# inferx api
# inferx docker

# BentoML - Framework dependency
import bentoml
@bentoml.service(
    resources={"cpu": "2"},
    traffic={"timeout": 20},
)
class MyService:
    # Heavy framework, complex setup
    ...

# Generated inference.py - No framework dependency
import cv2
import numpy as np
import onnxruntime as ort

class YOLOInferencer:
    def __init__(self, model_path: str):
        self.session = ort.InferenceSession(model_path)
        self.input_name = self.session.get_inputs()[0].name

    def predict(self, image_path: str):
        # Your clean, minimal code (illustrative preprocessing sketch)
        image = cv2.imread(image_path)
        blob = cv2.resize(image, (640, 640)).astype(np.float32) / 255.0
        blob = np.transpose(blob, (2, 0, 1))[np.newaxis, ...]  # HWC -> NCHW
        results = self.session.run(None, {self.input_name: blob})
        return results

- ✅ You own the code - No framework lock-in
- ✅ Minimal dependencies - Only what you need
- ✅ Easy to modify - Standard Python code
- ✅ Production ready - UV project structure
- ✅ Fast deployment - Small containers
- ✅ 4 usage patterns - Library, CLI, Template, or Full Stack
InferX core inference engines (Library and CLI) are production-ready. Template generation features are in active development.
- Test current inference engines with your ONNX/OpenVINO models
- Use the Library and CLI patterns in your projects and report issues
- Suggest template improvements for different model types
- Contribute code for template generation features
git clone https://github.com/yourusername/inferx.git
cd inferx
pip install -e .[dev]
# Run tests
python test_runner.py
# See development tasks
cat TODO.md

This project is licensed under the MIT License - see the LICENSE file for details.
InferX - Minimal dependency ML inference templates. 🚀
Give us your model. Get a template, an API, or a Docker container.