Lightweight C++ library for object detection, oriented boxes (OBB), classification, instance segmentation, and pose estimation with ONNX Runtime — plus zero-dependency image I/O.
Bring your exported ONNX model (YOLO, RT-DETR, …), load it in a few lines, and get back friendly structs instead of raw tensors.
Vizhon highlights:

- Tiny, modern C++ API (header in `src/vizhon/vizhon.h`).
- Backed by ONNX Runtime — choose CPU, CUDA (NVIDIA), or CoreML (Apple Silicon).
- Simple image loader using `stb_image` (bundled).
- Clear value types: `Detection`, `BoundingBox`, `Classes`, `Segment`, `Pose`.
- Stream operators for pretty-printing results (`std::cout << detection;`).
- Sane tensor conventions: float32, NCHW, normalized coordinates in [0,1].
.
├── 3rd/ # Third-party headers (stb)
├── cmake/ # Finder/Config CMake modules
├── demo/ # Minimal example app
├── docs/ # Sphinx + Doxygen
├── scripts/ # Utilities (export helpers, etc.)
├── src/vizhon/ # Library sources/headers
├── CMakeLists.txt
└── README.md
# 1) Install ONNX Runtime (CPU or GPU build)
# Download prebuilt or build from source.
# 2) Configure & build Vizhon (replace path below with your ORT install)
cmake -S . -B build \
-DONNXRUNTIME_ROOT="/path/to/onnxruntime" \
-DCMAKE_BUILD_TYPE=Release
cmake --build build -j
# 3) Run the demo (expects a model and an image)
./build/demo/vizhon_demo --model path/to/model.onnx --image path/to/image.jpg

The CMake finder script `cmake/FindONNXRUNTIME.cmake` looks for ONNX Runtime in common locations and in `ONNXRUNTIME_ROOT` or `ONNXRUNTIME_DIR`.
- C++17 (or newer) compiler (Clang, GCC, MSVC)
- CMake ≥ 3.16
- ONNX Runtime (CPU or GPU build)
  - Set one of `-DONNXRUNTIME_ROOT=/path/to/onnxruntime` or `-DONNXRUNTIME_DIR=/path/to/onnxruntime`
  - Or install system-wide so the finder can locate it

cmake -S . -B build -DCMAKE_BUILD_TYPE=Release \
-DONNXRUNTIME_ROOT="/path/to/onnxruntime"
cmake --build build -j

This builds the library and the demo binary.
Include the single public header:
#include <vizhon/vizhon.h>
using namespace Vizhon;

Core types:

- `Tensor`: typed n-D array wrapper around shared data (`float32` or `int64`).
- `Vector<T>`: alias of `std::vector<T>`.
- `Size`: alias of `int64_t`.
- `String`: alias of `std::string`.
Tensor shapes use NCHW (NxCxHxW). Many operators accept either NxCxHxW or CxHxW; the latter is treated as batch size = 1.
You may optionally pass the original image dimensions as a Tensor of shape Nx2, holding (width, height) per image, so post-processing can map normalized coordinates back to your original sizes.
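For example, a batch of two images would use a 2x2 dims tensor with one (width, height) pair per image — a small sketch using the same `Tensor` constructor as the detection example below:

int64_t dims_data[4] = {1920, 1080,   // image 0: width, height
                        1280,  720};  // image 1: width, height
Tensor dims(dims_data, {2, 2});       // shape N x 2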
Tensor readImage(const std::string& path, int width=-1, int height=-1);

- Returns a float32 tensor (CxHxW or NxCxHxW) suitable for inference.
- If `width`/`height` are set, the image is resized with `stb_image_resize2`.

Tensor img    = readImage("assets/dog.jpg");           // keep size
Tensor img640 = readImage("assets/dog.jpg", 640, 640); // resize to 640x640

Detector det(Model::YOLO8, "yolov8.onnx", {Device::CUDA, 0});
Tensor img = readImage("dog.jpg", 640, 640);
// Optional: original dimensions (N x 2): width, height
int64_t dims_data[2] = {1920, 1080};
Tensor dims(dims_data, {1,2});
auto results = det(img, dims, /*threshold=*/0.6);
for (const auto& ds : results) {
for (const auto& d : ds) {
std::cout << d << "\n"; // prints [x, y, w, h, class, name, conf]
}
}

Detection fields are normalized to [0,1]: `x`,`y` are the top-left corner, `width`,`height` are the box size.
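To map a detection back to pixel coordinates, multiply by the original image size. A minimal sketch, assuming the field names `x`, `y`, `width`, `height` described above:

const int img_w = 1920, img_h = 1080;                    // original image size
for (const auto& ds : results) {
    for (const auto& d : ds) {
        const int px = static_cast<int>(d.x * img_w);      // top-left x in pixels
        const int py = static_cast<int>(d.y * img_h);      // top-left y in pixels
        const int pw = static_cast<int>(d.width * img_w);  // box width in pixels
        const int ph = static_cast<int>(d.height * img_h); // box height in pixels
        std::cout << px << "," << py << " " << pw << "x" << ph << "\n";
    }
}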
OBBFinder obb(Model::YOLO8, "yolo_obb.onnx");
auto boxes = obb(img, dims, 0.5);
for (const auto& per_image : boxes) {
for (const auto& b : per_image) {
std::cout << b << "\n"; // (xcenter, ycenter, w, h, angle, class, name, conf)
}
}

Angles are in radians.
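If you need the four corner points of an oriented box (e.g., for drawing), they follow from the center, size, and angle printed above. A sketch on plain floats, since the exact member names of the OBB result are not shown here:

#include <array>
#include <cmath>
#include <utility>

// Rotate the four half-extent offsets by `angle` (radians) and translate by
// the box center. Inputs are the normalized (xcenter, ycenter, w, h, angle)
// values described above.
std::array<std::pair<float, float>, 4>
obbCorners(float cx, float cy, float w, float h, float angle) {
    const float c = std::cos(angle), s = std::sin(angle);
    const float dx[4] = {-w / 2,  w / 2, w / 2, -w / 2};
    const float dy[4] = {-h / 2, -h / 2, h / 2,  h / 2};
    std::array<std::pair<float, float>, 4> corners;
    for (int i = 0; i < 4; ++i)
        corners[i] = {cx + dx[i] * c - dy[i] * s, cy + dx[i] * s + dy[i] * c};
    return corners;
}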
Classifier cls(Model::YOLO8, "cls.onnx");
auto out = cls(img);
for (const auto& c : out) {
std::cout << c << "\n"; // prints class, name, and a summary
// full distribution available in c.probs
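// Sketch (assumes c.probs behaves like a std::vector of per-class scores):
// pick the arg-max class yourself instead of the printed summary.
std::size_t best = 0;
for (std::size_t i = 1; i < c.probs.size(); ++i)
    if (c.probs[i] > c.probs[best]) best = i;
std::cout << "argmax class index: " << best << "\n";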
}

Segmenter seg(Model::YOLO8, "seg.onnx");
auto segs = seg(img, dims, 0.5);
for (const auto& per_image : segs) {
for (const auto& s : per_image) {
std::cout << s << "\n"; // prints bbox + mask shape
float v = s.mask(10, 20); // access probability at (row=10, col=20)
}
}

Segment::Mask stores rows, cols, and a flattened data buffer.
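For example, to turn a probability mask into a binary mask you can walk it with the `mask(row, col)` accessor shown above; a sketch assuming the `rows`/`cols` members implied by the description (names may differ in your version):

#include <cstdint>
#include <vector>

// Threshold each probability into 0/1, row-major output.
std::vector<uint8_t> binarize(const Segment& s, float thr = 0.5f) {
    std::vector<uint8_t> out(s.mask.rows * s.mask.cols);
    for (Size r = 0; r < s.mask.rows; ++r)
        for (Size c = 0; c < s.mask.cols; ++c)
            out[r * s.mask.cols + c] = (s.mask(r, c) > thr) ? 1 : 0;
    return out;
}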
PoseEstimator pose(Model::YOLO8, "pose.onnx");
auto poses = pose(img);
for (const auto& per_image : poses) {
for (const auto& p : per_image) {
std::cout << p << "\n"; // bbox + keypoints
for (const auto& pt : p.points) {
// pt.x, pt.y are normalized to [0,1], pt.conf is per-keypoint confidence
}
}
}

Most exported models have a fixed square input (e.g., 640). For models with dynamic input shapes, `imageSize()` returns -1.
Size s1 = det.imageSize(); // e.g., 640 or -1
Size s2 = seg.imageSize();

enum class Model {
    YOLO5, YOLO8, YOLO9, YOLO10, YOLO11, YOLO12, YOLO_NAS, RT_DETR
};

These enums select the appropriate post-processing path for a given head format. Bring your own `.onnx` file produced by your training/export pipeline.
struct Device {
enum Type { CPU, CUDA, CoreML } type;
Size index; // used for CUDA (GPU ordinal)
};

Examples:

- `{Device::CPU}` — default
- `{Device::CUDA, 0}` — first NVIDIA GPU
- `{Device::CoreML}` — Apple Silicon via the CoreML EP (requires an ORT CoreML build)
# CMakeLists.txt (your project)
add_subdirectory(vizhon) # path to this repo
target_link_libraries(your_app PRIVATE vizhon)
target_include_directories(your_app PRIVATE ${CMAKE_CURRENT_SOURCE_DIR}/vizhon/src)

Pass your ONNX Runtime location:

cmake -DONNXRUNTIME_ROOT=/opt/onnxruntime ..

This repo ships:

- `cmake/FindONNXRUNTIME.cmake`
- `cmake/vizhonConfig.cmake`
Typical usage:
list(APPEND CMAKE_MODULE_PATH "${CMAKE_CURRENT_SOURCE_DIR}/vizhon/cmake")
find_package(ONNXRUNTIME REQUIRED)
find_package(vizhon CONFIG REQUIRED) # if installed/exported
target_link_libraries(your_app PRIVATE vizhon ONNXRUNTIME::ONNXRUNTIME)

If `find_package(ONNXRUNTIME)` can’t locate your install, set `ONNXRUNTIME_ROOT` or `ONNXRUNTIME_DIR`.
This repo includes a tiny helper to download official Ultralytics checkpoints and export them to ONNX in a way that works out-of-the-box with Vizhon’s post-processing.
# (optional) create & activate a venv
python -m venv .venv && source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install --upgrade ultralytics

The exporter:

- Downloads a model `.pt` from Ultralytics GitHub releases (configurable tag; default `v8.3.0`).
- Exports to ONNX with:
  - `dynamic=True` (dynamic input shapes)
  - `simplify=True` (graph simplified)
  - `nms=True` for all tasks except classification
- Saves both the `.pt` and `.onnx` into your chosen `--output` directory, reusing cached files on subsequent runs.
- Rejects invalid combinations early with clear errors (see below).

Options:

- `--model`: `yolo5 | yolo8 | yolo9 | yolo10 | yolo11 | yolo12 | yolo-nas` (Note: RT-DETR is supported by Vizhon’s C++ API but not by this Python exporter yet.)
- `--size`: `nano | small | medium | big | large | xlarge`
- `--task`: `detect | segment | pose | classify | bbox` (default: `detect`)
- `--version`: Ultralytics release tag (default: `v8.3.0`)
- `--output`: directory to store artifacts
- Only YOLOv10 provides the `big` size.
- YOLOv5, YOLOv10, YOLOv12, YOLO-NAS → detection tasks only.
- YOLOv9 → detection and segmentation only; segmentation requires large or xlarge.
- YOLO-NAS supports small, medium, large sizes only.
If you pass an invalid combo, the tool exits with a friendly error message.
# 1) YOLOv8n detection → ONNX (dynamic, simplified, with NMS)
python scripts/exporter.py \
--model yolo8 --size nano --task detect \
--output exported/
# 2) YOLOv9x segmentation (valid sizes: large/xlarge)
python scripts/exporter.py \
--model yolo9 --size xlarge --task segment \
--output exported/
# 3) YOLOv10 big detection (v10 uniquely supports 'big')
python scripts/exporter.py \
--model yolo10 --size big \
--output exported/
# 4) YOLO-NAS large detection
python scripts/exporter.py \
--model yolo-nas --size large \
--output exported/
# 5) Use a specific Ultralytics release tag (see GH releases)
python scripts/exporter.py \
--model yolo11 --size small --version v8.3.0 \
--output exported/

After a successful run you’ll see something like:
exported/
├── yolo11s.pt
└── yolo11s.onnx ← use this with Vizhon::Detector / Segmenter / ...
from scripts.exporter import Exporter, Model, Size, Task, ExporterError
exp = Exporter(version="v8.3.0", outdir="exported")
try:
    onnx_path = exp.export(model=Model.YOLO8, size=Size.NANO, task=Task.DETECTION)
    print("ONNX written to:", onnx_path)
except ExporterError as e:
    print("Export failed:", e)

#include <vizhon/vizhon.h>
using namespace Vizhon;
Detector det(Model::YOLO8, "exported/yolov8n.onnx", {Device::CPU});
auto img = readImage("image.jpg", det.imageSize(), det.imageSize());
auto out = det(img, /*threshold=*/0.6);

Tip: If your model is a classifier, the exporter disables NMS (`nms=False`) automatically. All other tasks get ONNX with NMS baked in.
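If your export has dynamic input shapes, `imageSize()` returns -1 (see above), so pick a side length yourself; a small sketch that falls back to the common 640:

Size s = det.imageSize();
int side = (s > 0) ? static_cast<int>(s) : 640;  // fall back to 640 for dynamic models
auto img = readImage("image.jpg", side, side);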
A minimal OpenCV-based demo lets you run detection, OBB, segmentation, pose, or classification on an image or a video/stream, draw results, and optionally save the visualized output.
cmake -S . -B build -DONNXRUNTIME_ROOT=/path/to/onnxruntime
cmake --build build -j
./build/demo/vizhon_demo [options]

Required:
-t, --type {yolo5|yolo8|yolo9|yolo10|yolo11|yolo12|yolo-nas|rt-detr}
-m, --model PATH/TO/MODEL.onnx
-k, --task {detect|bbox|segment|pose|classify}
Input (pick one):
-i, --image PATH/TO/IMAGE
-v, --video VIDEO_SOURCE (file path, RTSP/HTTP URL, or camera index like "0")
Optional:
-d, --device {cpu|cuda|apple} (default: cpu)
-o, --output PATH/TO/SAVE (image or video file)
- Internally uses an input `image_size = 640`.
- Image mode: runs once, prints detections to stdout, renders, and saves (if `--output` is given).
- Video/stream mode: loops reading frames, runs inference each frame, renders live, saves if `--output` is given, and exits on any key press (`cv::waitKey(1)`).
- Devices map to Vizhon’s `Device`:
  - `cpu` → `Device::CPU` (default)
  - `cuda` → `Device::CUDA`
  - `apple` → `Device::CoreML`
- Tasks map to Vizhon runners:
  - `detect` → `Vizhon::Detector`
  - `bbox` → `Vizhon::OBBFinder` (oriented boxes)
  - `segment` → `Vizhon::Segmenter`
  - `pose` → `Vizhon::PoseEstimator`
  - `classify` → `Vizhon::Classifier`
- Model type strings map to the enum: `yolo5 | yolo8 | yolo9 | yolo10 | yolo11 | yolo12 | yolo-nas | rt-detr`.
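One way such a string-to-enum mapping could look in your own code (illustrative only, not the demo’s actual implementation):

#include <map>
#include <stdexcept>
#include <string>

Vizhon::Model modelFromString(const std::string& s) {
    static const std::map<std::string, Vizhon::Model> table = {
        {"yolo5",  Vizhon::Model::YOLO5},  {"yolo8",  Vizhon::Model::YOLO8},
        {"yolo9",  Vizhon::Model::YOLO9},  {"yolo10", Vizhon::Model::YOLO10},
        {"yolo11", Vizhon::Model::YOLO11}, {"yolo12", Vizhon::Model::YOLO12},
        {"yolo-nas", Vizhon::Model::YOLO_NAS}, {"rt-detr", Vizhon::Model::RT_DETR},
    };
    const auto it = table.find(s);
    if (it == table.end()) throw std::runtime_error("unknown --type: " + s);
    return it->second;
}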
# 1) YOLOv8 detection on a single image (CPU)
./vizhon_demo \
--type yolo8 \
--model exported/yolov8n.onnx \
--task detect \
--image assets/dog.jpg \
--output runs/dog_vis.jpg
# 2) YOLOv9 OBB on a video file (CUDA) and save annotated mp4
./vizhon_demo \
-t yolo9 -m exported/yolov9c-obb.onnx -k bbox \
-v assets/drive.mp4 -d cuda -o runs/drive_obb.mp4
# 3) YOLO-NAS segmentation on webcam 0 (Apple CoreML)
./vizhon_demo \
-t yolo-nas -m exported/yolo_nas_l.onnx -k segment \
-v 0 -d apple
# 4) RT-DETR detection on RTSP stream
./vizhon_demo \
-t rt-detr -m exported/rtdetr.onnx -k detect \
-v rtsp://user:pass@host:554/stream1

If neither `--image` nor `--video` is provided, or if required flags are missing/invalid, the demo throws a clear runtime error and prints the message before exiting.
Sphinx + Doxygen live in docs/.
# Python env with Sphinx
pip install -r docs/requirements.txt
cd docs
make html
# open _build/html/index.html

Q: What tensor layout does Vizhon expect?
A: Float32 NCHW. Provide NxCxHxW or CxHxW (treated as batch size 1).
Q: Are coordinates absolute pixels or normalized?
A: All boxes and keypoints are normalized to [0,1]. If you pass original dimensions (Nx2 with width,height), you can easily convert to pixels.
Q: How do I select GPU/CPU?
A: Pass a Device to the constructor: {Device::CPU}, {Device::CUDA, 0}, or {Device::CoreML}. Make sure your ONNX Runtime build includes the corresponding execution provider.
Q: Which YOLO versions are supported?
A: The enum includes YOLOv5 through YOLOv12, YOLO-NAS, and RT-DETR. Post-processing is chosen by this enum; ensure your model head matches the selected type.
Issues and PRs are welcome! If you’re adding support for a new head format, include:
- Minimal ONNX snippet or spec of the output tensors
- A tiny test (synthetic outputs → parsed structures)
- Docs update in `docs/` and a short demo
This project is released under the terms of the LICENSE in the repository.
Happy building & shipping! 🚀