LycheeAI x REVEL Hackathon Project — End-to-End Automation for Industrial Object Detection
In robotic perception systems, acquiring real-world annotated datasets for tasks such as Pick & Place introduces significant cost, safety risk, and logistical complexity. As outlined in NVIDIA’s Synthetic Data Best Practices:
“Synthetic data enables perception models to be trained safely, scalably, and under controlled conditions — a critical advantage in robotics.”
This pipeline implements that principle with strict adherence to reproducibility, modularity, and compatibility with industrial toolchains:
- Controlled variation: Objects remain static; only camera pose and lighting parameters are randomized — ensuring scenario integrity.
- Industry-standard output formats: KITTI → YOLO conversion ensures compatibility with Ultralytics training frameworks and NVIDIA TAO Toolkit.
- GPU-accelerated inference: CUDA + TensorRT support enables low-latency deployment on edge devices.
- ROS 2 integration: Real-time detector node publishes standardized `vision_msgs/Detection2DArray` messages for seamless integration into robotic control stacks.
Each stage depends strictly on successful completion of its predecessor. In case of failure, re-execution of the same command is required — substitutions or workarounds are not recommended. The system is optimized for hackathon constraints (time, compute resources) without compromising reproducibility.
- `Semantics.class=<label>` tagging enables automatic bounding box annotation via NVIDIA Replicator (see the Replicator Semantic Annotation Guide); a minimal tagging sketch follows this list.
- Camera jitter is constrained to position and look-at vectors only; physical object positions remain fixed to preserve scene semantics.
- Lighting randomization (dome + sphere) introduces naturalistic shadow and reflection variation, improving model robustness to illumination changes (Isaac Sim Lighting Best Practices).
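The class tag can also be applied programmatically instead of through the Isaac Sim UI. Below is a minimal sketch using Replicator's `rep.modify.semantics` from the Script Editor; the prim path `/World/allen_key` is a placeholder, point it at your own asset's root Xform.

```python
import omni.replicator.core as rep

# Attach a Semantics.class label to an asset's root Xform so the
# KittiWriter can emit bounding boxes for it.
# The path below is a placeholder; use your own prim path.
asset = rep.get.prims(path_pattern="/World/allen_key")
with asset:
    rep.modify.semantics([("class", "allen_key")])
```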
```python
# === REVEL → KITTI dataset (train/val), safe jitter (objects fixed) ===
import os
from pathlib import Path

import omni.replicator.core as rep

USER = os.getenv("USER")
OUT_ROOT = Path(f"/home/{USER}/datasets/revel_kitti")   # Output directory
RES = (1280, 720)                                       # Image resolution (HD standard)
N_TOTAL = 6000                                          # Total frame count
SPLIT = 0.9                                             # Train ratio (90%)
N_TRAIN = int(N_TOTAL * SPLIT)
N_VAL = N_TOTAL - N_TRAIN

# 1) Targets: all objects carrying a Semantics.class tag
targets = rep.get.prims(semantics=[('class', '*')])
assert targets, "ERROR: No object found with 'Semantics.class' tag. Add it to the root Xform!"

# 2) Camera + render product: FOV and gaze point approximating human eye level
cam = rep.create.camera(position=(0, 1.6, 2.2), look_at=(0, 0.6, 0), focal_length=24.0)
rp = rep.create.render_product(cam, RES)

# 3) JITTER ONLY LIGHT AND CAMERA -- DO NOT MOVE OBJECTS!
def jitter():
    # Dome light: ambient illumination with randomized temperature and intensity
    rep.randomizer.light(light_type='dome',
                         intensity=rep.distribution.uniform(800, 3000),
                         temperature=rep.distribution.uniform(4500, 8500))
    # Sphere lights: directional spot effect with randomized position and count
    rep.create.light(light_type='sphere',
                     intensity=rep.distribution.uniform(1000, 4000),
                     position=rep.distribution.uniform((-2, 2, -2), (2, 4, 2)),
                     temperature=rep.distribution.uniform(3000, 9000),
                     count=rep.distribution.choice([1, 2, 3]))
    # Slightly perturb the camera while keeping it aimed at the objects
    with cam:
        rep.modify.pose(
            position=rep.distribution.uniform((-1.5, 1.2, -1.5), (1.5, 2.2, 1.5)),  # +/- 1.5 m lateral
            look_at=rep.distribution.choice(targets)    # Look at a random tagged object
        )

rep.randomizer.register(jitter)                   # Register the randomizer

# Invoke the registered randomizer on every generated frame
with rep.trigger.on_frame():
    rep.randomizer.jitter()

writer = rep.WriterRegistry.get("KittiWriter")    # Write in standard KITTI format

def run_split(out_dir, frames):
    writer.initialize(output_dir=str(out_dir))    # Set output directory
    writer.attach([rp])                           # Attach render product
    print(f"[RUN] {out_dir} frames={frames}")
    rep.orchestrator.run(num_frames=frames)       # Generate frames

# Generate train and val sets separately
run_split(OUT_ROOT / "train", N_TRAIN)
run_split(OUT_ROOT / "val", N_VAL)
print(f"[DONE] KITTI dataset ready → {OUT_ROOT}")
```

Quick check of the generated dataset:

```bash
ls ~/datasets/revel_kitti/train/rgb_*.png | head -5
ls ~/datasets/revel_kitti/train/label_*.txt | head -5
echo "Train: $(find ~/datasets/revel_kitti/train -name "*.png" | wc -l) images"
echo "Val: $(find ~/datasets/revel_kitti/val -name "*.png" | wc -l) images"
```

Expected Output:
- `train/` → 5400 images + corresponding label files
- `val/` → 600 images + corresponding label files

If label files are empty, verify that all target objects in Isaac Sim have correctly assigned `Semantics.class` tags.
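As a quick check for the note above, this small sketch (assuming the dataset paths used in this guide) counts empty KITTI label files per split:

```python
from pathlib import Path

# Count empty KITTI label files; a high count usually points to missing
# Semantics.class tags on the corresponding assets.
root = Path.home() / "datasets/revel_kitti"
for split in ("train", "val"):
    labels = list((root / split).rglob("label_*.txt"))
    empty = [p for p in labels if not p.read_text().strip()]
    print(f"{split}: {len(labels)} label files, {len(empty)} empty")
```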
KITTI format is designed for 3D object detection and uses absolute pixel coordinates and class names. YOLO format requires normalized 2D bounding boxes in the structure `class_id center_x center_y width height`, directly consumable by Ultralytics YOLO trainers and compatible with the NVIDIA TAO Toolkit.
```python
# kitti2yolo.py: convert the KITTI-format Replicator output to YOLO format
import os
import shutil

import cv2

KITTI_ROOT = os.path.expanduser("~/datasets/revel_kitti")
YOLO_ROOT = os.path.expanduser("~/datasets/revel_yolo")
CLASSES_TXT = os.path.expanduser("~/classes.txt")   # One class name per line
IMG_EXT = (".png", ".jpg", ".jpeg")

# Load class names and map them to IDs
classes = [l.strip() for l in open(CLASSES_TXT) if l.strip()]
name2id = {n: i for i, n in enumerate(classes)}     # "bottle" → 0, "can" → 1, ...

def find_label_for(stem, search_dir):
    """Find the KITTI label file for an image stem (usually in the same directory)."""
    for root, _, files in os.walk(search_dir):
        if stem + ".txt" in files:
            return os.path.join(root, stem + ".txt")
    return None

def convert_split(split):
    split_dir = os.path.join(KITTI_ROOT, split)
    out_img = os.path.join(YOLO_ROOT, "images", split)
    out_lbl = os.path.join(YOLO_ROOT, "labels", split)
    os.makedirs(out_img, exist_ok=True)
    os.makedirs(out_lbl, exist_ok=True)
    for root, _, files in os.walk(split_dir):
        for fn in files:
            if not fn.lower().endswith(IMG_EXT):
                continue
            img_path = os.path.join(root, fn)
            stem = os.path.splitext(fn)[0]
            img = cv2.imread(img_path)
            if img is None:
                continue
            h, w = img.shape[:2]                     # Image dimensions, required for normalization
            lab_path = find_label_for(stem, split_dir)
            yolo_lines = []
            if lab_path and os.path.exists(lab_path):
                with open(lab_path) as f:
                    for line in f:
                        parts = line.strip().split()
                        if len(parts) < 8:
                            continue
                        cls_name = parts[0]          # Class name (e.g., "bottle")
                        try:
                            # KITTI bbox format: xmin, ymin, xmax, ymax (in pixels)
                            xmin, ymin, xmax, ymax = map(float, parts[4:8])
                        except ValueError:
                            # Fallback: extract the first four numeric values
                            nums = [float(p) for p in parts if p.replace('.', '', 1).isdigit()]
                            if len(nums) >= 4:
                                xmin, ymin, xmax, ymax = nums[:4]
                            else:
                                continue
                        # Normalize: center x,y and width/height ratios
                        cx = (xmin + xmax) / 2.0 / w
                        cy = (ymin + ymax) / 2.0 / h
                        bw = (xmax - xmin) / w
                        bh = (ymax - ymin) / h
                        cid = name2id.get(cls_name)
                        # Keep only valid classes and valid bbox sizes
                        if cid is not None and 0 < bw <= 1 and 0 < bh <= 1:
                            yolo_lines.append(f"{cid} {cx:.6f} {cy:.6f} {bw:.6f} {bh:.6f}")
            # Copy the image and write the label to the target directories
            dst_img = os.path.join(out_img, fn)
            shutil.copy2(img_path, dst_img)
            with open(os.path.join(out_lbl, stem + ".txt"), "w") as f:
                f.write("\n".join(yolo_lines))

# Convert both train and val splits
for sp in ("train", "val"):
    convert_split(sp)
print(f"✅ YOLO dataset ready → {YOLO_ROOT}")
```

Run the converter:

```bash
python3 kitti2yolo.py
```

Output Structure:
```text
~/datasets/revel_yolo/images/train
~/datasets/revel_yolo/labels/train
~/datasets/revel_yolo/images/val
~/datasets/revel_yolo/labels/val
```
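Optionally, a short sketch (assuming the layout above) confirms that every image has a matching label file and counts empty labels after conversion:

```python
from pathlib import Path

# Verify image/label pairing in the converted YOLO dataset.
root = Path.home() / "datasets/revel_yolo"
for split in ("train", "val"):
    imgs = {p.stem for p in (root / "images" / split).iterdir()
            if p.suffix.lower() in (".png", ".jpg", ".jpeg")}
    lbl_files = list((root / "labels" / split).glob("*.txt"))
    lbls = {p.stem for p in lbl_files}
    empty = sum(1 for p in lbl_files if not p.read_text().strip())
    print(f"{split}: {len(imgs)} images, {len(lbls)} labels, "
          f"{len(imgs - lbls)} unlabeled images, {empty} empty labels")
```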
Ultralytics requires dataset metadata, including paths, splits, and class mappings, to be specified in a `dataset.yaml` file. Generating it dynamically from `classes.txt` eliminates manual entry errors.
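The generator below reads `~/classes.txt`, one class name per line, in the same ID order used by the converter. With the class names from the example output further down, the file would look like this:

```text
allen_key.usd
dewalt_battery_small.usd
dewalt_drill.usd
makita_trimmer.usd
stanley_cup.usd
```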
```bash
DATA_ROOT=~/datasets/revel_yolo
YAML_PATH=~/revel_dataset.yaml
{
  echo "path: $DATA_ROOT"       # Root data directory
  echo "train: images/train"    # Train image path (relative to path)
  echo "val: images/val"        # Val image path
  echo "names:"                 # Class ID → name mapping
  nl -v 0 -ba ~/classes.txt | awk '{printf("  %d: %s\n",$1,$2)}'   # IDs start from 0
} > "$YAML_PATH"
echo "✅ YAML file created:"
cat "$YAML_PATH"
```

Example Output:
```yaml
path: /home/user/datasets/revel_yolo
train: images/train
val: images/val
names:
  0: allen_key.usd
  1: dewalt_battery_small.usd
  2: dewalt_drill.usd
  3: makita_trimmer.usd
  4: stanley_cup.usd
```
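An optional sketch to confirm that the generated file parses and matches `classes.txt` (PyYAML is pulled in as an Ultralytics dependency):

```python
from pathlib import Path

import yaml  # PyYAML, installed with Ultralytics

cfg = yaml.safe_load((Path.home() / "revel_dataset.yaml").read_text())
names = [l.strip() for l in (Path.home() / "classes.txt").read_text().splitlines() if l.strip()]
assert list(cfg["names"].values()) == names, "dataset.yaml does not match classes.txt"
print(f"OK: {len(names)} classes → {cfg['names']}")
```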
- YOLOv10-L: Selected for optimal accuracy-speed tradeoff (Ultralytics YOLOv10 Documentation).
- `imgsz=960`: Downsampled input size suitable for 1280x720 source imagery.
- `mosaic=1.0`: Enables multi-image mosaic augmentation, improving generalization.
- `cos_lr=True`: Cosine-annealing learning-rate schedule, mitigating overfitting.
- `patience=20`: Early stopping if validation mAP does not improve for 20 epochs.
Environment setup (ROS 2 Jazzy + Ultralytics):

```bash
source /opt/ros/jazzy/setup.bash
sudo apt install -y ros-jazzy-vision-msgs ros-jazzy-cv-bridge ros-jazzy-image-transport python3-colcon-common-extensions
python3 -m pip install --upgrade ultralytics "torch>=2.4" torchvision opencv-python --break-system-packages
python3 -m pip install "numpy==1.26.4" --break-system-packages   # For Ultralytics compatibility
```

Training (mosaic and HSV augmentation, cosine LR schedule, early stopping):

```bash
yolo detect train \
  model=yolov10l.pt \
  data=~/revel_dataset.yaml \
  epochs=100 \
  imgsz=960 \
  batch=16 \
  device=0 \
  project=revel_hackathon \
  name=y10l_revel \
  mosaic=1.0 \
  hsv_h=0.015 hsv_s=0.7 hsv_v=0.4 \
  cos_lr=True \
  patience=20
```

Validation:

```bash
yolo detect val \
  model=revel_hackathon/y10l_revel/weights/best.pt \
  data=~/revel_dataset.yaml \
  imgsz=960
```

TensorRT export:

```bash
yolo export \
  model=revel_hackathon/y10l_revel/weights/best.pt \
  format=engine \
  device=0
```

Output: `best.engine`, deployable for low-latency inference on NVIDIA Jetson or RTX platforms.
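If you prefer driving this from Python instead of the CLI, a rough equivalent using the Ultralytics Python API (same paths and settings as above) looks like this:

```python
import os

from ultralytics import YOLO

# Rough Python-API equivalent of the CLI train/val/export sequence above.
data = os.path.expanduser("~/revel_dataset.yaml")
model = YOLO("yolov10l.pt")
model.train(data=data, epochs=100, imgsz=960, batch=16, device=0,
            project="revel_hackathon", name="y10l_revel",
            mosaic=1.0, cos_lr=True, patience=20)

best = YOLO("revel_hackathon/y10l_revel/weights/best.pt")
best.val(data=data, imgsz=960)            # Validation mAP on the val split
best.export(format="engine", device=0)    # TensorRT engine (best.engine)
```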
The trained model is deployed as a ROS 2 node to enable real-time perception within robotic systems (a minimal node sketch follows this list):
- Subscribes to the `/rgb` topic (`sensor_msgs/Image`).
- Executes YOLOv10 inference on the GPU.
- Publishes annotated visualizations to `/yolo/annotated` (`sensor_msgs/Image`).
- Publishes structured detections to `/yolo/detections` (`vision_msgs/Detection2DArray`), compatible with NVIDIA Isaac ROS.
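The detector package itself is not reproduced in full here; the sketch below shows the essential structure of such a node, assuming the `yolo_detector` package and parameter names used in the launch command, and using the Ultralytics Python API for inference.

```python
#!/usr/bin/env python3
"""Minimal YOLO detector node sketch (assumed package: yolo_detector)."""
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
from vision_msgs.msg import Detection2D, Detection2DArray, ObjectHypothesisWithPose
from cv_bridge import CvBridge
from ultralytics import YOLO


class YoloDetector(Node):
    def __init__(self):
        super().__init__("yolo_detector")
        self.declare_parameter("model", "best.pt")
        self.declare_parameter("image_topic", "/rgb")
        self.declare_parameter("device", "cuda:0")
        self.model = YOLO(self.get_parameter("model").value)
        self.device = self.get_parameter("device").value
        self.bridge = CvBridge()
        self.create_subscription(
            Image, self.get_parameter("image_topic").value, self.on_image, 10)
        self.pub_img = self.create_publisher(Image, "/yolo/annotated", 10)
        self.pub_det = self.create_publisher(Detection2DArray, "/yolo/detections", 10)

    def on_image(self, msg: Image):
        frame = self.bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")
        result = self.model(frame, device=self.device, verbose=False)[0]

        out = Detection2DArray()
        out.header = msg.header
        for box in result.boxes:
            cx, cy, w, h = box.xywh[0].tolist()          # center x/y, width, height (pixels)
            det = Detection2D()
            det.header = msg.header
            det.bbox.center.position.x = cx
            det.bbox.center.position.y = cy
            det.bbox.size_x = w
            det.bbox.size_y = h
            hyp = ObjectHypothesisWithPose()
            hyp.hypothesis.class_id = str(int(box.cls))  # class index
            hyp.hypothesis.score = float(box.conf)       # confidence
            det.results.append(hyp)
            out.detections.append(det)

        self.pub_det.publish(out)
        annotated = result.plot()                        # BGR image with drawn boxes
        self.pub_img.publish(self.bridge.cv2_to_imgmsg(annotated, encoding="bgr8"))


def main():
    rclpy.init()
    rclpy.spin(YoloDetector())
    rclpy.shutdown()


if __name__ == "__main__":
    main()
```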
Launch the detector node:

```bash
source ~/ros2_ws/install/setup.bash
export ROS_DOMAIN_ID=0
export RMW_IMPLEMENTATION=rmw_fastrtps_cpp
ros2 launch yolo_detector yolo.launch.py \
  image_topic:=/rgb \
  device:=cuda:0 \
  model:=/home/$USER/revel_hackathon/y10l_revel/weights/best.pt
```

Verify the output:
- Launch RViz: `rviz2`
- Add Display → Image → Topic: `/yolo/annotated`
- Monitor detection messages: `ros2 topic echo /yolo/detections --once`
Message Specification: `vision_msgs/Detection2DArray`

Each `Detection2D` contains:
- `bbox.center.position.x`, `bbox.center.position.y` (box center, pixels)
- `bbox.size_x`, `bbox.size_y` (box size, pixels)
- `results[].hypothesis.class_id` (class index)
- `results[].hypothesis.score` (confidence)
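Downstream consumers read these fields directly from the message; a minimal subscriber sketch (topic and field names as above):

```python
import rclpy
from rclpy.node import Node
from vision_msgs.msg import Detection2DArray


class DetectionLogger(Node):
    """Print class id, score, and box size for each incoming detection."""

    def __init__(self):
        super().__init__("detection_logger")
        self.create_subscription(Detection2DArray, "/yolo/detections", self.on_detections, 10)

    def on_detections(self, msg: Detection2DArray):
        for det in msg.detections:
            if not det.results:
                continue
            hyp = det.results[0].hypothesis
            self.get_logger().info(
                f"class={hyp.class_id} score={hyp.score:.2f} "
                f"size=({det.bbox.size_x:.0f}x{det.bbox.size_y:.0f})")


def main():
    rclpy.init()
    rclpy.spin(DetectionLogger())
    rclpy.shutdown()


if __name__ == "__main__":
    main()
```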
| Stage | Technology | Output | Reference |
|---|---|---|---|
| 1 | Isaac Sim 5.0 + Replicator | 6000-frame KITTI dataset | Synthetic Data Best Practices |
| 2 | Python script | YOLO-format dataset | Ultralytics Documentation |
| 3 | Bash + YAML | `dataset.yaml` | Ultralytics Config Guide |
| 4 | Ultralytics YOLOv10-L | `best.pt` + `best.engine` | YOLOv10 Docs |
| 5 | ROS 2 Jazzy | Real-time detector node | Isaac ROS Vision Msgs |
- Error Handling: Re-execute the exact failing command; the pipeline is designed to be idempotent.
- Data Quality Issues: Increase the variance ranges for lighting and camera jitter in the Isaac Sim script.
- Overfitting: Reduce the epoch count or increase augmentation strength (`hsv_h`/`hsv_s`/`hsv_v`, `degrees`, `translate`, etc.).
- ROS 2 Node Failures: Verify installation of `cv_bridge` and `vision_msgs`, and confirm the CUDA device mapping.
This pipeline integrates components from NVIDIA Omniverse, Ultralytics YOLO, and ROS 2 ecosystems. All configurations and scripts comply with their respective official documentation. Commercial use requires adherence to applicable licensing terms of each component.
Prepared by: Şamma ERDOĞAN
LycheeAI x REVEL Hackathon — September 2025