
InferX - 4-in-1 ML Inference Toolkit (In Development)

"One tool, four ways to deploy your model: Library, CLI, Template, or Full Stack"

License: MIT | Python 3.11+ | ONNX | OpenVINO

🎯 Philosophy

4 ways to use InferX - Choose what fits your needs:

  1. 📦 Library - Import and use directly in your Python code
  2. ⚡ CLI - Run models directly from the command line
  3. 🏗️ Template Generator - Generate ready-to-use project templates
  4. 🚒 Full Stack - Generate API servers and Docker containers

Unlike heavy frameworks, InferX gives you clean, minimal-dependency code that you own completely. No framework lock-in, no heavy dependencies.

🎯 4 Usage Patterns

📦 1. Library Usage (Import in your code)

from inferx import InferenceEngine

# Use directly in your Python applications
engine = InferenceEngine("model.onnx", device="gpu")
result = engine.predict("image.jpg")

# Batch processing
results = engine.predict_batch(["img1.jpg", "img2.jpg"])
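
A slightly fuller sketch of the library pattern, assuming predict_batch returns one result per input path (the exact result schema is defined by the engine, so this just prints it):

from pathlib import Path

from inferx import InferenceEngine

# Collect images from a folder and run them through one engine instance
engine = InferenceEngine("model.onnx", device="gpu")
image_paths = [str(p) for p in Path("images").glob("*.jpg")]

for path, result in zip(image_paths, engine.predict_batch(image_paths)):
    print(path, result)  # inspect whatever structure the engine returns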

⚡ 2. CLI Usage (Command line)

# Run inference directly from command line
inferx run model.onnx image.jpg --device gpu

# Batch processing with output
inferx run model.xml images/ --output results.json --runtime openvino

# Device optimization
inferx run model.xml image.jpg --device myriad --runtime openvino
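
If you write results to a file with --output, you can post-process them in plain Python. A minimal sketch, assuming results.json is a standard JSON document (the exact layout depends on the model and runtime):

import json

# Load the file produced by `inferx run ... --output results.json`
with open("results.json") as f:
    results = json.load(f)

print(type(results))   # list or dict, depending on the run
print(results)         # inspect the raw structure before building on it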

🏗️ 3. Template Generation (Project scaffolding) ✅ WORKING

# Generate YOLO ONNX project
uv run inferx template --model-type yolo --name my-detector
cd my-detector && uv sync

# Generate YOLO OpenVINO project  
uv run inferx template --model-type yolo_openvino --name my-openvino-detector
cd my-openvino-detector && uv sync --extra openvino

# Generate with API server
uv run inferx template --model-type yolo --name my-api-detector --with-api
cd my-api-detector && uv sync --extra api

# Copy your model file
uv run inferx template --model-type yolo --name my-detector --model-path /path/to/model.onnx

# Project structure:
# ├── pyproject.toml         # UV-compatible dependencies
# ├── src/
# │   ├── inferencer.py      # YOLO inference implementation
# │   ├── server.py          # FastAPI server (if --with-api)
# │   └── [base.py, utils.py, exceptions.py]  # Supporting files
# ├── models/yolo_model.onnx # Your model file
# └── config.yaml            # Configuration
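
Inside a generated project, the inferencer can be used directly from Python. A hypothetical usage sketch, assuming the generated src/inferencer.py keeps the YOLOInferencer class name it inherits from:

# Run from the generated project root (e.g. my-detector/)
from src.inferencer import YOLOInferencer

detector = YOLOInferencer("models/yolo_model.onnx")
detections = detector.predict("test_image.jpg")
print(detections)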

🚒 4. API Server Generation ✅ WORKING

# Generate with API server included
uv run inferx template --model-type yolo --name my-api-detector --with-api
cd my-api-detector

# Install dependencies
uv sync --extra api

# Start API server
uv run --extra api python -m src.server
# Server runs at: http://0.0.0.0:8080

# Test API endpoints
curl -X GET "http://localhost:8080/"                           # Health check
curl -X GET "http://localhost:8080/info"                       # Model info
curl -X POST "http://localhost:8080/predict" -F "file=@image.jpg"  # Inference
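
For orientation, here is a minimal sketch of a FastAPI app with the same three endpoints; the route bodies and the "file" form field are assumptions based on the curl examples above, not InferX's exact generated server.py:

from fastapi import FastAPI, File, UploadFile

app = FastAPI()

@app.get("/")
def health():
    # Health check endpoint
    return {"status": "ok"}

@app.get("/info")
def info():
    # Basic model metadata
    return {"model": "yolo_model.onnx", "runtime": "onnxruntime"}

@app.post("/predict")
async def predict(file: UploadFile = File(...)):
    # Read the uploaded image and run inference on it
    image_bytes = await file.read()
    # ... pass image_bytes to the inferencer and return its detections ...
    return {"detections": []}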

🆚 vs Heavy Frameworks

| Framework  | Dependencies               | Container Size | Approach        |
|------------|----------------------------|----------------|-----------------|
| InferX     | ONNX Runtime only (~50MB)  | ~75MB          | Code generation |
| BentoML    | Full framework stack       | ~900MB         | Framework-based |
| TorchServe | PyTorch + dependencies     | ~1.2GB         | Framework-based |
| TF Serving | TensorFlow                 | ~800MB         | Framework-based |

πŸ—οΈ Generated Project Structure

When you run inferx template yolo --name my-detector:

my-detector/                    # Your standalone project
├── pyproject.toml             # UV project with minimal deps
├── src/
│   ├── __init__.py
│   ├── inferencer.py          # YOLO inference implementation (inherits from InferX YOLOInferencer)
│   └── base.py                # Base inferencer class
├── models/
│   └── yolo_model.onnx        # Place your YOLO model here (or .xml/.bin for OpenVINO)
├── config.yaml                # Inference configuration
├── README.md                  # Usage instructions
└── .gitignore                 # Standard Python gitignore

When you run inferx template yolo_openvino --name my-detector:

my-detector/                    # Your standalone project
├── pyproject.toml             # UV project with minimal deps
├── src/
│   ├── __init__.py
│   ├── inferencer.py          # YOLO OpenVINO inference implementation (inherits from InferX YOLOOpenVINOInferencer)
│   └── base.py                # Base inferencer class
├── models/
│   ├── yolo_model.xml         # Place your YOLO OpenVINO model .xml file here
│   └── yolo_model.bin         # Place your YOLO OpenVINO model .bin file here
├── config.yaml                # Inference configuration
├── README.md                  # Usage instructions
└── .gitignore                 # Standard Python gitignore
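
The OpenVINO variant builds on the same idea of plain, framework-free code. As a rough sketch (plain OpenVINO API, not InferX's generated code), loading an .xml/.bin pair and running a dummy input looks like this:

import numpy as np
from openvino.runtime import Core

core = Core()
model = core.read_model("models/yolo_model.xml")         # the .bin file is picked up automatically
compiled = core.compile_model(model, device_name="CPU")

# Dummy 640x640 RGB input in NCHW layout, matching the input_size in config.yaml
dummy = np.zeros((1, 3, 640, 640), dtype=np.float32)
request = compiled.create_infer_request()
request.infer({0: dummy})
output = request.get_output_tensor(0).data               # numpy view of the first output
print(output.shape)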

After inferx api:

my-detector/
├── src/
│   ├── inferencer.py          # Existing
│   ├── base.py                # Existing
│   └── server.py              # Generated FastAPI app
└── requirements-api.txt       # +FastAPI only

After inferx docker:

my-detector/
├── Dockerfile                 # Multi-stage optimized
├── docker-compose.yml         # Ready to deploy
└── .dockerignore              # Build optimization

📦 Generated Dependencies

Template Project (pyproject.toml)

[project]
name = "my-detector"
version = "0.1.0"
dependencies = [
    "onnxruntime>=1.16.0",           # ~50MB
    "numpy>=1.24.0",                 # Array operations
    "opencv-python-headless>=4.8.0", # Image processing
]

[project.optional-dependencies]
api = ["fastapi>=0.104.0", "uvicorn>=0.24.0"]  # Only when using API
gpu = ["onnxruntime-gpu>=1.16.0"]               # Only for GPU inference
openvino = ["openvino>=2023.3.0"]               # Intel optimization

Why Minimal Dependencies?

  • Production safety: Fewer dependencies = fewer security vulnerabilities
  • Faster deployment: Smaller containers, faster startup
  • Cost efficiency: Less compute resources needed
  • Maintenance: Easier to update and maintain

🚀 Quick Start

📥 Installation

# Install from PyPI (when available)
pip install inferx

# Or install from source
git clone https://github.com/yourusername/inferx.git
cd inferx
pip install -e .

🎯 Four Usage Patterns

1. Library Usage (Import in your code)

from inferx import InferenceEngine

# Use directly in your Python applications
engine = InferenceEngine("model.onnx", device="gpu")
result = engine.predict("image.jpg")
print(result)

2. CLI Usage (Command line)

# Run inference directly from command line
inferx run model.onnx image.jpg --device gpu

# Batch processing
inferx run model.xml images/ --output results.json --runtime openvino

3. Template Generation

# Create YOLO detection project
inferx template yolo --name my-detector
cd my-detector

# Project structure:
# ├── src/inferencer.py       # YOLO inference code
# ├── models/yolo_model.onnx  # Place your model here
# └── pyproject.toml          # Minimal dependencies

# Test inference
uv run python -m src.inferencer test_image.jpg

4. Full Stack Deployment

# Start with template
inferx template yolo --name my-detector
cd my-detector

# Add API server
inferx api

# Add Docker deployment
inferx docker

# Start server
uv run python -m src.server

# Or deploy with Docker
docker build -t my-detector:v1 .
docker run -p 8080:8080 my-detector:v1
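
Once the container is up, you can hit the same endpoints from Python. A hedged sketch using requests, assuming the /predict route accepts a multipart "file" field as in the curl examples above:

import requests

# Health check and model info
print(requests.get("http://localhost:8080/").json())
print(requests.get("http://localhost:8080/info").json())

# Send an image for inference
with open("test_image.jpg", "rb") as f:
    response = requests.post("http://localhost:8080/predict", files={"file": f})
print(response.json())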

🎨 Available Templates ✅ 4 Working Combinations

# 1. YOLO ONNX (Basic)
uv run inferx template --model-type yolo --name my-yolo-project

# 2. YOLO ONNX (with FastAPI)  
uv run inferx template --model-type yolo --name my-yolo-api --with-api

# 3. YOLO OpenVINO (Basic)
uv run inferx template --model-type yolo_openvino --name my-openvino-project

# 4. YOLO OpenVINO (with FastAPI)
uv run inferx template --model-type yolo_openvino --name my-openvino-api --with-api

# 🚧 Coming Soon:
# - Anomaly detection templates
# - Image classification templates  
# - Custom ONNX model templates

🚧 Development Status

✅ Currently Available

  • ✅ Basic inference engines (ONNX + OpenVINO)
  • ✅ Configuration system
  • ✅ CLI structure
  • ✅ Testing framework
  • ✅ Project examples
  • ✅ Library usage pattern
  • ✅ CLI usage pattern
  • ✅ Template generation (inferx template) - NEW!
  • ✅ API generation (FastAPI servers) - NEW!
  • ✅ 4 template combinations (YOLO, YOLO+API, OpenVINO, OpenVINO+API) - NEW!

🚧 In Development

  • 🚧 Docker generation (inferx docker) - Future feature
  • 🚧 Project templates (Anomaly, Classification)
  • 🚧 Model zoo integration

📋 TODO

See TODO.md for detailed development tasks and progress.

⚙️ Configuration (Used by All 4 Patterns)

Generated projects include a config.yaml:

# Model settings
model:
  path: "model.onnx"
  type: "yolo"
  
# Inference settings  
inference:
  device: "auto"        # auto, cpu, gpu
  batch_size: 1
  confidence_threshold: 0.25
  
# Input preprocessing
preprocessing:
  input_size: [640, 640]
  normalize: true
  format: "RGB"

🎯 Why InferX?

4 Flexible Usage Patterns

# 1. Library - Import and use in your code
from inferx import InferenceEngine
engine = InferenceEngine("model.onnx")
result = engine.predict("image.jpg")

# 2. CLI - Run from command line
# inferx run model.onnx image.jpg

# 3. Template - Generate project structure
# inferx template yolo --name my-detector

# 4. Full Stack - Generate API + Docker
# inferx template yolo --name my-detector
# cd my-detector
# inferx api
# inferx docker

Problem with Heavy Frameworks

# BentoML - Framework dependency
import bentoml

@bentoml.service(
    resources={"cpu": "2"},
    traffic={"timeout": 20},
)
class MyService:
    ...  # Heavy framework, complex setup

InferX Solution - Clean Code

# Generated inferencer.py - No framework dependency
import cv2
import numpy as np
import onnxruntime as ort

class YOLOInferencer:
    def __init__(self, model_path: str):
        self.session = ort.InferenceSession(model_path)
        self.input_name = self.session.get_inputs()[0].name

    def predict(self, image_path: str):
        # Your clean, minimal code: load the image, preprocess, run the ONNX session
        image = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2RGB)
        blob = cv2.resize(image, (640, 640)).astype(np.float32) / 255.0
        blob = np.transpose(blob, (2, 0, 1))[np.newaxis, ...]
        return self.session.run(None, {self.input_name: blob})
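
Used like any other Python class (hypothetical model and image paths):

inferencer = YOLOInferencer("models/yolo_model.onnx")
print(inferencer.predict("test_image.jpg"))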

Benefits

  • ✅ You own the code - No framework lock-in
  • ✅ Minimal dependencies - Only what you need
  • ✅ Easy to modify - Standard Python code
  • ✅ Production ready - UV project structure
  • ✅ Fast deployment - Small containers
  • ✅ 4 usage patterns - Library, CLI, Template, or Full Stack

🤝 Contributing

✅ Current Status

InferX core inference engines (Library and CLI) are production-ready. Template and API generation currently work for YOLO models (ONNX and OpenVINO); Docker generation and additional templates are in active development.

📋 How to Help

  1. Test current inference engines with your ONNX/OpenVINO models
  2. Use the Library and CLI patterns in your projects and report issues
  3. Suggest template improvements for different model types
  4. Contribute code for template generation features

🔧 Development Setup

git clone https://github.com/yourusername/inferx.git
cd inferx
pip install -e .[dev]

# Run tests
python test_runner.py

# See development tasks
cat TODO.md

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


InferX - Minimal-dependency ML inference templates. 🚀

Give us your model. Get a template, an API server, or a Docker container.
