
Vizhon

Vizhon logo

Lightweight C++ library for object detection, oriented boxes (OBB), classification, instance segmentation, and pose estimation with ONNX Runtime — plus zero-dependency image I/O.

Bring your exported ONNX model (YOLO, RT-DETR, …), load it in a few lines, and get back friendly structs instead of raw tensors.


Features

  • Tiny, modern C++ API (header in src/vizhon/vizhon.h).
  • Backed by ONNX Runtime — choose CPU, CUDA (NVIDIA), or CoreML (Apple Silicon).
  • Simple image loader using stb_image (bundled).
  • Clear value types: Detection, BoundingBox, Classes, Segment, Pose.
  • Stream operators for pretty-printing results (std::cout << detection;).
  • Sane tensor conventions: float32, NCHW, normalized coordinates in [0,1].

Project layout

.
├── 3rd/                    # Third-party headers (stb)
├── cmake/                  # Finder/Config CMake modules
├── demo/                   # Minimal example app
├── docs/                   # Sphinx + Doxygen
├── scripts/                # Utilities (export helpers, etc.)
├── src/vizhon/             # Library sources/headers
├── CMakeLists.txt
└── README.md

Quickstart

# 1) Install ONNX Runtime (CPU or GPU build)
#    Download prebuilt or build from source.

# 2) Configure & build Vizhon (replace path below with your ORT install)
cmake -S . -B build \
  -DONNXRUNTIME_ROOT="/path/to/onnxruntime" \
  -DCMAKE_BUILD_TYPE=Release
cmake --build build -j

# 3) Run the demo (expects a model and an image)
./build/demo/vizhon_demo --model path/to/model.onnx --image path/to/image.jpg

The CMake finder script cmake/FindONNXRUNTIME.cmake looks for ONNX Runtime in common locations and in ONNXRUNTIME_ROOT or ONNXRUNTIME_DIR.

Building from source

Prerequisites

  • C++17 (or newer) compiler (Clang, GCC, MSVC)

  • CMake ≥ 3.16

  • ONNX Runtime (CPU or GPU build)

    • Set one of:

      • -DONNXRUNTIME_ROOT=/path/to/onnxruntime
      • -DONNXRUNTIME_DIR=/path/to/onnxruntime
    • Or install system-wide so the finder can locate it

Configure & build

cmake -S . -B build -DCMAKE_BUILD_TYPE=Release \
      -DONNXRUNTIME_ROOT="/path/to/onnxruntime"
cmake --build build -j

This builds the library and the demo binary.

Using the library

Include the single public header:

#include <vizhon/vizhon.h>
using namespace Vizhon;

Common types

  • Tensor: typed n-D array wrapper around shared data (float32 or int64).
  • Vector<T>: alias of std::vector<T>.
  • Size: alias of int64_t.
  • String: alias of std::string.

Tensor shapes use NCHW (NxCxHxW). Many operators accept either NxCxHxW or CxHxW; the latter is treated as batch size = 1.

You may optionally pass the original image dimensions as a Tensor of shape Nx2 (width, height per image) so post-processing can map normalized coordinates back to your original resolution.

Image loading

Tensor readImage(const std::string& path, int width=-1, int height=-1);
  • Returns a float32 tensor (CxHxW or NxCxHxW) suitable for inference.
  • If width/height are set, the image is resized using the bundled stb_image_resize2.
Tensor img = readImage("assets/dog.jpg");              // keep size
Tensor img640 = readImage("assets/dog.jpg", 640, 640); // resize to 640x640

Detections (2D boxes)

Detector det(Model::YOLO8, "yolov8.onnx", {Device::CUDA, 0});

Tensor img = readImage("dog.jpg", 640, 640);
// Optional: original dimensions (N x 2): width, height
int64_t dims_data[2] = {1920, 1080};
Tensor dims(dims_data, {1,2});

auto results = det(img, dims, /*threshold=*/0.6);
for (const auto& ds : results) {
  for (const auto& d : ds) {
    std::cout << d << "\n"; // prints [x, y, w, h, class, name, conf]
  }
}

Detection fields are normalized to [0,1]: x, y give the top-left corner and width, height give the box size.
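
If you kept the original frame size around, mapping a detection back to pixels is a single multiply per field. A minimal sketch, assuming the field names d.x, d.y, d.width, d.height implied by the printed order above:

// Map normalized detections onto the original 1920x1080 frame.
// Field names (x, y, width, height) are assumed from the printed order above.
const float orig_w = 1920.0f, orig_h = 1080.0f;
for (const auto& ds : results) {
  for (const auto& d : ds) {
    float px = d.x * orig_w;       // top-left x in pixels
    float py = d.y * orig_h;       // top-left y in pixels
    float pw = d.width * orig_w;   // box width in pixels
    float ph = d.height * orig_h;  // box height in pixels
    std::cout << px << ", " << py << ", " << pw << ", " << ph << "\n";
  }
}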

Oriented boxes (OBB)

OBBFinder obb(Model::YOLO8, "yolo_obb.onnx");

auto boxes = obb(img, dims, 0.5);
for (const auto& per_image : boxes) {
  for (const auto& b : per_image) {
    std::cout << b << "\n"; // (xcenter, ycenter, w, h, angle, class, name, conf)
  }
}

Angles are in radians.
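
To draw or intersect an oriented box you typically need its four corner points. A short sketch of the rotation math, assuming fields named xcenter, ycenter, width, height, and angle as printed above; for correct geometry on non-square images, scale to pixel units before rotating:

#include <array>
#include <cmath>
#include <utility>

// Corners of an oriented box given center, size, and angle (radians).
// Field names are assumptions based on the printed order above.
std::array<std::pair<float, float>, 4> obbCorners(float xc, float yc,
                                                  float w, float h, float angle) {
  const float c = std::cos(angle), s = std::sin(angle);
  const float dx[4] = {-w / 2, w / 2, w / 2, -w / 2};
  const float dy[4] = {-h / 2, -h / 2, h / 2, h / 2};
  std::array<std::pair<float, float>, 4> corners;
  for (int i = 0; i < 4; ++i)
    corners[i] = {xc + dx[i] * c - dy[i] * s,   // rotate around the center,
                  yc + dx[i] * s + dy[i] * c};  // then translate back
  return corners;
}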

Classification

Classifier cls(Model::YOLO8, "cls.onnx");
auto out = cls(img);

for (const auto& c : out) {
  std::cout << c << "\n";      // prints class, name, and a summary
  // full distribution available in c.probs
}
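
If you need more than the argmax, the full distribution can be sorted for a top-k view. A small sketch, assuming c.probs behaves like a Vector<float> of per-class scores:

#include <algorithm>
#include <numeric>
#include <vector>

// Print the top-5 classes by probability for each image.
// Assumes c.probs is indexable and has a size() method (e.g. Vector<float>).
for (const auto& c : out) {
  std::vector<size_t> idx(c.probs.size());
  std::iota(idx.begin(), idx.end(), 0);                 // 0, 1, 2, ...
  size_t k = std::min<size_t>(5, idx.size());
  std::partial_sort(idx.begin(), idx.begin() + k, idx.end(),
                    [&](size_t a, size_t b) { return c.probs[a] > c.probs[b]; });
  for (size_t i = 0; i < k; ++i)
    std::cout << "class " << idx[i] << ": " << c.probs[idx[i]] << "\n";
}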

Instance segmentation

Segmenter seg(Model::YOLO8, "seg.onnx");
auto segs = seg(img, dims, 0.5);

for (const auto& per_image : segs) {
  for (const auto& s : per_image) {
    std::cout << s << "\n";              // prints bbox + mask shape
    float v = s.mask(10, 20);            // access probability at (row=10, col=20)
  }
}

Segment::Mask stores rows, cols, and a flattened data buffer.
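
A common follow-up is thresholding the soft mask and measuring its coverage. A minimal sketch, assuming rows and cols are public members and operator()(row, col) works as shown above:

// Binarize the mask at 0.5 and report the fraction of foreground pixels.
int foreground = 0;
for (int r = 0; r < s.mask.rows; ++r)
  for (int c = 0; c < s.mask.cols; ++c)
    if (s.mask(r, c) > 0.5f) ++foreground;
float coverage = float(foreground) / float(s.mask.rows * s.mask.cols);
std::cout << "mask covers " << coverage * 100.0f << "% of its region\n";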

Pose estimation

PoseEstimator pose(Model::YOLO8, "pose.onnx");
auto poses = pose(img);

for (const auto& per_image : poses) {
  for (const auto& p : per_image) {
    std::cout << p << "\n"; // bbox + keypoints
    for (const auto& pt : p.points) {
      // pt.x, pt.y are normalized to [0,1], pt.conf is per-keypoint confidence
    }
  }
}
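
Keypoints follow the same normalized convention as boxes, so converting them to pixels and dropping weak points is a couple of lines. A sketch using the pt.x / pt.y / pt.conf fields noted above (the resolution and the 0.3 cutoff are just example values):

const float orig_w = 1920.0f, orig_h = 1080.0f;  // original frame size
for (const auto& pt : p.points) {
  if (pt.conf < 0.3f) continue;                  // skip low-confidence keypoints
  float px = pt.x * orig_w;
  float py = pt.y * orig_h;
  std::cout << "(" << px << ", " << py << ") conf=" << pt.conf << "\n";
}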

Getting the expected input size

Most exported models have a fixed square input (e.g., 640). For dynamic models, -1 is returned.

Size s1 = det.imageSize();   // e.g., 640 or -1
Size s2 = seg.imageSize();
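
A typical pattern is to query the model once and feed readImage accordingly, falling back to a default side length for dynamic models. A sketch (the 640 fallback is just an example value):

Size s = det.imageSize();
int side = (s > 0) ? static_cast<int>(s) : 640;  // -1 means dynamic input; pick a default
Tensor img = readImage("image.jpg", side, side);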

Models & devices

Supported model enums

enum class Model {
  YOLO5, YOLO8, YOLO9, YOLO10, YOLO11, YOLO12, YOLO_NAS, RT_DETR
};

These enums select the appropriate post-processing path for a given head format. Bring your own .onnx file produced by your training/export pipeline.

Device selection

struct Device {
  enum Type { CPU, CUDA, CoreML } type;
  Size index; // used for CUDA (GPU ordinal)
};

Examples:

  • {Device::CPU} — default
  • {Device::CUDA, 0} — first NVIDIA GPU
  • {Device::CoreML} — Apple Silicon via CoreML EP (requires ORT CoreML build)
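
If you want to degrade gracefully on machines without the requested execution provider, you can wrap construction in a try/catch. A sketch that assumes construction throws a std::exception when the provider is unavailable (behavior may vary with your ONNX Runtime build):

#include <iostream>
#include <memory>

std::unique_ptr<Detector> det;
try {
  det = std::make_unique<Detector>(Model::YOLO8, "yolov8.onnx", Device{Device::CUDA, 0});
} catch (const std::exception& e) {
  std::cerr << "CUDA unavailable (" << e.what() << "), falling back to CPU\n";
  det = std::make_unique<Detector>(Model::YOLO8, "yolov8.onnx", Device{Device::CPU});
}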

CMake integration

As a subdirectory

# CMakeLists.txt (your project)
add_subdirectory(vizhon)        # path to this repo
target_link_libraries(your_app PRIVATE vizhon)
target_include_directories(your_app PRIVATE ${CMAKE_CURRENT_SOURCE_DIR}/vizhon/src)

Pass your ONNX Runtime location:

cmake -DONNXRUNTIME_ROOT=/opt/onnxruntime ..

Via find_package

This repo ships:

  • cmake/FindONNXRUNTIME.cmake
  • cmake/vizhonConfig.cmake

Typical usage:

list(APPEND CMAKE_MODULE_PATH "${CMAKE_CURRENT_SOURCE_DIR}/vizhon/cmake")
find_package(ONNXRUNTIME REQUIRED)

find_package(vizhon CONFIG REQUIRED) # if installed/exported
target_link_libraries(your_app PRIVATE vizhon ONNXRUNTIME::ONNXRUNTIME)

If find_package(ONNXRUNTIME) can’t locate your install, set ONNXRUNTIME_ROOT or ONNXRUNTIME_DIR.

Exporting YOLO checkpoints to ONNX (scripts/exporter.py)

This repo includes a tiny helper to download official Ultralytics checkpoints and export them to ONNX in a way that works out-of-the-box with Vizhon’s post-processing.

Install (one-time)

# (optional) create & activate a venv
python -m venv .venv && source .venv/bin/activate  # Windows: .venv\Scripts\activate

pip install --upgrade ultralytics

What it does

  • Downloads a model .pt from Ultralytics GitHub releases (configurable tag; default v8.3.0).

  • Exports to ONNX with:

    • dynamic=True (dynamic input shapes)
    • simplify=True (graph simplified)
    • nms=True for all tasks except classification
  • Saves both the .pt and .onnx into your chosen --output directory, reusing cached files on subsequent runs.

  • Rejects invalid model/size/task combinations early with clear errors (see below).

Supported choices

  • --model: yolo5 | yolo8 | yolo9 | yolo10 | yolo11 | yolo12 | yolo-nas (Note: RT-DETR is supported by Vizhon’s C++ API but not by this Python exporter yet.)
  • --size: nano | small | medium | big | large | xlarge
  • --task: detect | segment | pose | classify | bbox (default: detect)
  • --version: Ultralytics release tag (default: v8.3.0)
  • --output: directory to store artifacts

Constraints enforced by the tool

  • Only YOLOv10 provides the big size.
  • YOLOv5, YOLOv10, YOLOv12, and YOLO-NAS support the detection task only.
  • YOLOv9 supports detection and segmentation only; segmentation requires the large or xlarge size.
  • YOLO-NAS supports the small, medium, and large sizes only.

If you pass an invalid combo, the tool exits with a friendly error message.

CLI examples

# 1) YOLOv8n detection → ONNX (dynamic, simplified, with NMS)
python scripts/exporter.py \
  --model yolo8 --size nano --task detect \
  --output exported/

# 2) YOLOv9x segmentation (valid sizes: large/xlarge)
python scripts/exporter.py \
  --model yolo9 --size xlarge --task segment \
  --output exported/

# 3) YOLOv10 big detection (v10 uniquely supports 'big')
python scripts/exporter.py \
  --model yolo10 --size big \
  --output exported/

# 4) YOLO-NAS large detection
python scripts/exporter.py \
  --model yolo-nas --size large \
  --output exported/

# 5) Use a specific Ultralytics release tag (see GH releases)
python scripts/exporter.py \
  --model yolo11 --size small --version v8.3.0 \
  --output exported/

After a successful run you’ll see something like:

exported/
├── yolo11s.pt
└── yolo11s.onnx   ← use this with Vizhon::Detector / Segmenter / ...

Programmatic usage

from scripts.exporter import Exporter, Model, Size, Task, ExporterError

exp = Exporter(version="v8.3.0", outdir="exported")
try:
    onnx_path = exp.export(model=Model.YOLO8, size=Size.NANO, task=Task.DETECTION)
    print("ONNX written to:", onnx_path)
except ExporterError as e:
    print("Export failed:", e)

Using the exported ONNX with Vizhon

#include <vizhon/vizhon.h>
using namespace Vizhon;

Detector det(Model::YOLO8, "exported/yolov8n.onnx", {Device::CPU});
auto img = readImage("image.jpg", det.imageSize(), det.imageSize());
auto out = det(img, /*threshold=*/0.6);

Tip: If your task is classification, the exporter disables NMS (nms=False) automatically; all other tasks get ONNX with NMS baked in.

Demo

A minimal OpenCV-based demo lets you run detection, OBB, segmentation, pose, or classification on an image or a video/stream, draw results, and optionally save the visualized output.

Build & run

cmake -S . -B build -DONNXRUNTIME_ROOT=/path/to/onnxruntime
cmake --build build -j
./build/demo/vizhon_demo [options]

Usage

Required:
  -t, --type      {yolo5|yolo8|yolo9|yolo10|yolo11|yolo12|yolo-nas|rt-detr}
  -m, --model     PATH/TO/MODEL.onnx
  -k, --task      {detect|bbox|segment|pose|classify}

Input (pick one):
  -i, --image     PATH/TO/IMAGE
  -v, --video     VIDEO_SOURCE   (file path, RTSP/HTTP URL, or camera index like "0")

Optional:
  -d, --device    {cpu|cuda|apple}   (default: cpu)
  -o, --output    PATH/TO/SAVE       (image or video file)

Behavior

  • Internally uses an input image_size = 640.

  • Image mode: runs once, prints detections to stdout, renders and saves (if --output is given).

  • Video/stream mode: loops reading frames, runs inference each frame, renders live, saves if --output is given, and exits on any key press (cv::waitKey(1)).

  • Devices map to Vizhon’s Device:

    • cpu → Device::CPU (default)
    • cuda → Device::CUDA
    • apple → Device::CoreML
  • Tasks map to Vizhon runners:

    • detect → Vizhon::Detector
    • bbox → Vizhon::OBBFinder (oriented boxes)
    • segment → Vizhon::Segmenter
    • pose → Vizhon::PoseEstimator
    • classify → Vizhon::Classifier
  • Model type strings map to the enum: yolo5|yolo8|yolo9|yolo10|yolo11|yolo12|yolo-nas|rt-detr.

Examples

# 1) YOLOv8 detection on a single image (CPU)
./vizhon_demo \
  --type yolo8 \
  --model exported/yolov8n.onnx \
  --task detect \
  --image assets/dog.jpg \
  --output runs/dog_vis.jpg

# 2) YOLOv9 OBB on a video file (CUDA) and save annotated mp4
./vizhon_demo \
  -t yolo9 -m exported/yolov9c-obb.onnx -k bbox \
  -v assets/drive.mp4 -d cuda -o runs/drive_obb.mp4

# 3) YOLO-NAS segmentation on webcam 0 (Apple CoreML)
./vizhon_demo \
  -t yolo-nas -m exported/yolo_nas_l.onnx -k segment \
  -v 0 -d apple

# 4) RT-DETR detection on RTSP stream
./vizhon_demo \
  -t rt-detr -m exported/rtdetr.onnx -k detect \
  -v rtsp://user:pass@host:554/stream1

If neither --image nor --video is provided, or if required flags are missing/invalid, the demo throws a clear runtime error and prints the message before exiting.

Docs

Sphinx + Doxygen live in docs/.

# Python env with Sphinx
pip install -r docs/requirements.txt
cd docs
make html
# open _build/html/index.html

FAQ

Q: What tensor layout does Vizhon expect? A: Float32 NCHW. Provide NxCxHxW or CxHxW (treated as batch size 1).

Q: Are coordinates absolute pixels or normalized? A: All boxes and keypoints are normalized to [0,1]. If you pass original dimensions (Nx2 with width,height), you can easily convert to pixels.

Q: How do I select GPU/CPU? A: Pass a Device to the constructor: {Device::CPU}, {Device::CUDA, 0}, or {Device::CoreML}. Make sure your ONNX Runtime build includes the corresponding execution provider.

Q: Which YOLO versions are supported? A: The enum includes YOLOv5 through YOLOv12, YOLO-NAS, and RT-DETR. Post-processing is chosen by this enum; ensure your model head matches the selected type.

Contributing

Issues and PRs are welcome! If you’re adding support for a new head format, include:

  • Minimal ONNX snippet or spec of the output tensors
  • A tiny test (synthetic outputs → parsed structures)
  • Docs update in docs/ and a short demo

License

This project is released under the terms of the LICENSE file in the repository.

Acknowledgements

Happy building & shipping! 🚀
