REVEL KITTI Dataset Generation and YOLOv10 Training Pipeline

NVIDIA Omniverse & ROS 2 Integrated Synthetic Data Training System

LycheeAI x REVEL Hackathon Project — End-to-End Automation for Industrial Object Detection


1. Introduction: Rationale for Synthetic Data Pipeline

In robotic perception systems, acquiring real-world annotated datasets for tasks such as Pick & Place introduces significant cost, safety risk, and logistical complexity. As outlined in NVIDIA’s Synthetic Data Best Practices:

“Synthetic data enables perception models to be trained safely, scalably, and under controlled conditions — a critical advantage in robotics.”

This pipeline implements that principle with strict adherence to reproducibility, modularity, and compatibility with industrial toolchains:

  • Controlled variation: Objects remain static; only camera pose and lighting parameters are randomized — ensuring scenario integrity.
  • Industry-standard output formats: KITTI → YOLO conversion ensures compatibility with Ultralytics training frameworks and NVIDIA TAO Toolkit.
  • GPU-accelerated inference: CUDA + TensorRT support enables low-latency deployment on edge devices.
  • ROS 2 integration: Real-time detector node publishes standardized vision_msgs/Detection2DArray messages for seamless integration into robotic control stacks.

2. Pipeline Architecture Overview (5 Sequential Stages)

The pipeline consists of five sequential stages:

  1. Isaac Sim + Replicator: KITTI dataset generation (train/val split)
  2. Python script: KITTI to YOLO format conversion
  3. Bash + YAML: dataset.yaml configuration
  4. Ultralytics: YOLOv10-L training and optional TensorRT export
  5. ROS 2: real-time detector node

Each stage depends strictly on the successful completion of its predecessor. If a stage fails, re-execute the same command; substitutions and workarounds are not recommended. The system is optimized for hackathon constraints (time and compute) without compromising reproducibility.


3. Stage 1: Isaac Sim — KITTI Dataset Generation (Train/Val Split)

Configuration Rationale

  • Semantics.class=<label> tagging enables automatic bounding box annotation via NVIDIA Replicator — see Replicator Semantic Annotation Guide and the tagging snippet after this list.
  • Camera jitter is constrained to position and look-at vectors only — physical object positions remain fixed to preserve scene semantics.
  • Lighting randomization (dome + sphere) introduces naturalistic shadow and reflection variations — improving model robustness to illumination changes (Isaac Sim Lighting Best Practices).
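
If an asset is missing its tag, it can be attached programmatically from the Script Editor. The snippet below is a minimal sketch; the prim path /World/Bottle and the label "bottle" are placeholders:

import omni.replicator.core as rep

# Attach a Semantics.class tag so the KittiWriter can annotate this object.
# "/World/Bottle" and "bottle" are illustrative placeholders.
with rep.get.prims(path_pattern="/World/Bottle"):
    rep.modify.semantics([("class", "bottle")])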

Script (Isaac Sim Script Editor)

# === REVEL → KITTI dataset (train/val), safe jitter (objects fixed) ===
import omni.replicator.core as rep, os
from pathlib import Path

USER     = os.getenv("USER")
OUT_ROOT = Path(f"/home/{USER}/datasets/revel_kitti")  # Output directory
RES      = (1280, 720)                                 # Image resolution — HD standard
N_TOTAL  = 6000                                        # Total frame count
SPLIT    = 0.9                                         # Train ratio (90%)
N_TRAIN  = int(N_TOTAL * SPLIT)
N_VAL    = N_TOTAL - N_TRAIN

# 1) Target: All objects with Semantics.class tag
targets = rep.get.prims(semantics=[('class','*')])
assert targets, "ERROR: No object found with 'Semantics.class' tag. Add it to Root Xform!"

# 2) Camera + Render Product — FOV and gaze point approximating human eye level
cam = rep.create.camera(position=(0,1.6,2.2), look_at=(0,0.6,0), focal_length=24.0)
rp  = rep.create.render_product(cam, RES)

# 3) JITTER ONLY LIGHT AND CAMERA — DO NOT MOVE OBJECTS!
def jitter():
    # Dome light: ambient illumination — randomized temperature and intensity
    rep.randomizer.light(light_type='dome',
                         intensity=rep.distribution.uniform(800, 3000),
                         temperature=rep.distribution.uniform(4500, 8500))
    # Sphere light: directional spot effect — randomized position and count
    rep.create.light(light_type='sphere',
                     intensity=rep.distribution.uniform(1000, 4000),
                     position =rep.distribution.uniform((-2,2,-2),(2,4,2)),
                     temperature=rep.distribution.uniform(3000, 9000),
                     count    =rep.distribution.choice([1,2,3]))
    # Slightly perturb camera — focusing on objects
    with cam:
        rep.modify.pose(
            position=rep.distribution.uniform((-1.5,1.2,-1.5),(1.5,2.2,1.5)),  # +/- 1.5m lateral
            look_at =rep.distribution.choice(targets)                          # Look at random object
        )

rep.randomizer.register(jitter)  # Register randomizer

# Trigger the registered randomizer on every generated frame
with rep.trigger.on_frame():
    rep.randomizer.jitter()

writer = rep.WriterRegistry.get("KittiWriter")  # Write in standard KITTI format

def run_split(out_dir, frames):
    writer.initialize(output_dir=str(out_dir))   # Set output directory
    writer.attach([rp])                          # Attach render product
    print(f"[RUN] {out_dir}  frames={frames}")
    rep.orchestrator.run(num_frames=frames)      # Generate frames

# Generate train and val sets separately
run_split(OUT_ROOT / "train", N_TRAIN)
run_split(OUT_ROOT / "val",   N_VAL)
print(f"[DONE] KITTI dataset ready → {OUT_ROOT}")

Verification (Terminal)

ls ~/datasets/revel_kitti/train/rgb_*.png | head -5
ls ~/datasets/revel_kitti/train/label_*.txt | head -5
echo "Train: $(find ~/datasets/revel_kitti/train -name "*.png" | wc -l) images"
echo "Val: $(find ~/datasets/revel_kitti/val -name "*.png" | wc -l) images"

Expected Output:

  • train/ → 5400 images + corresponding label files
  • val/ → 600 images + corresponding label files

If label files are empty, verify that all target objects in Isaac Sim have correctly assigned Semantics.class tags.
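
The check below is a minimal Python sketch of this verification (adjust the path if the writer nests its output differently):

from pathlib import Path

# Empty label files usually indicate a missing Semantics.class tag.
root = Path.home() / "datasets/revel_kitti/train"
labels = list(root.rglob("*.txt"))
empty = [p for p in labels if p.stat().st_size == 0]
print(f"{len(empty)} empty label files out of {len(labels)}")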


4. Stage 2: KITTI to YOLO Format Conversion

Conversion Rationale

KITTI format is designed for 3D object detection and uses absolute pixel coordinates and class names. YOLO format requires normalized 2D bounding boxes in the structure:
(class_id center_x center_y width height) — directly consumable by Ultralytics YOLO trainers and compatible with NVIDIA TAO Toolkit.
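
As a worked example of the normalization performed by the script below, take a hypothetical box (xmin=400, ymin=200, xmax=600, ymax=360) in a 1280x720 frame:

w, h = 1280, 720
xmin, ymin, xmax, ymax = 400.0, 200.0, 600.0, 360.0
cx = (xmin + xmax) / 2.0 / w   # 0.390625
cy = (ymin + ymax) / 2.0 / h   # ~0.388889
bw = (xmax - xmin) / w         # 0.156250
bh = (ymax - ymin) / h         # ~0.222222
print(f"{cx:.6f} {cy:.6f} {bw:.6f} {bh:.6f}")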

Script (kitti2yolo.py)

import os, cv2, shutil

KITTI_ROOT = os.path.expanduser("~/datasets/revel_kitti")
YOLO_ROOT  = os.path.expanduser("~/datasets/revel_yolo")
CLASSES_TXT= os.path.expanduser("~/classes.txt")  # One class name per line
IMG_EXT    = (".png", ".jpg", ".jpeg")

# Load class names and map to IDs
with open(CLASSES_TXT) as f:
    classes = [l.strip() for l in f if l.strip()]
name2id = {n: i for i, n in enumerate(classes)}  # "bottle" → 0, "can" → 1, ...

def find_label_for(stem, search_dir):
    """Find KITTI label file — usually in same directory"""
    for root, _, files in os.walk(search_dir):
        if stem + ".txt" in files:
            return os.path.join(root, stem + ".txt")
    return None

def convert_split(split):
    split_dir = os.path.join(KITTI_ROOT, split)
    out_img = os.path.join(YOLO_ROOT, "images", split)
    out_lbl = os.path.join(YOLO_ROOT, "labels", split)
    os.makedirs(out_img, exist_ok=True)
    os.makedirs(out_lbl, exist_ok=True)

    for root, _, files in os.walk(split_dir):
        for fn in files:
            if not fn.lower().endswith(IMG_EXT):
                continue
            img_path = os.path.join(root, fn)
            stem = os.path.splitext(fn)[0]
            img = cv2.imread(img_path)
            if img is None:
                continue
            h, w = img.shape[:2]  # Image dimensions — required for normalization

            lab_path = find_label_for(stem, split_dir)
            yolo_lines = []
            if lab_path and os.path.exists(lab_path):
                with open(lab_path) as f:
                    for line in f:
                        parts = line.strip().split()
                        if len(parts) < 8:
                            continue
                        cls_name = parts[0]  # Class name (e.g., "bottle")
                        try:
                            # KITTI bbox format: xmin, ymin, xmax, ymax (in pixels)
                            xmin, ymin, xmax, ymax = map(float, parts[4:8])
                        except ValueError:
                            # Fallback: extract numeric values
                            nums = [float(p) for p in parts if p.replace('.','',1).isdigit()]
                            if len(nums) >= 4:
                                xmin, ymin, xmax, ymax = nums[:4]
                            else:
                                continue
                        # Normalize: center x,y and width/height ratios
                        cx = (xmin + xmax) / 2.0 / w
                        cy = (ymin + ymax) / 2.0 / h
                        bw = (xmax - xmin) / w
                        bh = (ymax - ymin) / h
                        cid = name2id.get(cls_name)
                        # Add if valid class and valid bbox size
                        if cid is not None and 0 < bw <= 1 and 0 < bh <= 1:
                            yolo_lines.append(f"{cid} {cx:.6f} {cy:.6f} {bw:.6f} {bh:.6f}")

            # Copy image and write label to target directory
            dst_img = os.path.join(out_img, fn)
            shutil.copy2(img_path, dst_img)
            with open(os.path.join(out_lbl, stem + ".txt"), "w") as f:
                f.write("\n".join(yolo_lines))

# Convert both train and val splits
for sp in ("train","val"):
    convert_split(sp)

print(f"✅ YOLO dataset ready → {YOLO_ROOT}")

Execution

python3 kitti2yolo.py

Output Structure:

  • ~/datasets/revel_yolo/images/train
  • ~/datasets/revel_yolo/labels/train
  • ~/datasets/revel_yolo/images/val
  • ~/datasets/revel_yolo/labels/val

5. Stage 3: Ultralytics YAML Configuration File

Purpose

Ultralytics requires dataset metadata — including paths, splits, and class mappings — to be specified in a dataset.yaml file. Dynamic generation from classes.txt eliminates manual entry errors.

Bash Command

DATA_ROOT=~/datasets/revel_yolo
YAML_PATH=~/revel_dataset.yaml
{
  echo "path: $DATA_ROOT"          # Root data directory
  echo "train: images/train"       # Train image path (relative to path)
  echo "val: images/val"           # Val image path
  echo "names:"                    # Class names → ID mapping
  nl -v 0 -ba ~/classes.txt | awk '{printf("  %d: %s\n",$1,$2)}'  # nl -v 0 already numbers from 0
} > "$YAML_PATH"

echo "✅ YAML file created:"
cat "$YAML_PATH"

Example Output:

path: /home/user/datasets/revel_yolo
train: images/train
val: images/val
names:
  0: allen_key.usd
  1: dewalt_battery_small.usd
  2: dewalt_drill.usd
  3: makita_trimmer.usd
  4: stanley_cup.usd
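
A minimal sanity check that the file parses and the class IDs are contiguous from 0 (a sketch assuming PyYAML is installed):

import yaml
from pathlib import Path

cfg = yaml.safe_load((Path.home() / "revel_dataset.yaml").read_text())
ids = sorted(cfg["names"])           # names is an {id: class_name} mapping
assert ids == list(range(len(ids))), "class IDs must be contiguous from 0"
print(f"{len(ids)} classes: {cfg['names']}")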

6. Stage 4: Ultralytics YOLOv10-L Model Training

Model Selection & Hyperparameter Justification

  • YOLOv10-L: Selected for optimal accuracy-speed tradeoff — Ultralytics YOLOv10 Documentation.
  • imgsz=960: Downsampled input size suitable for 1280x720 source imagery.
  • mosaic=1: Enables multi-image augmentation — improves generalization.
  • hsv_h/hsv_s/hsv_v: Hue-Saturation-Value augmentation (Ultralytics exposes three separate arguments, not a single hsv flag).
  • cos_lr=True: Cosine annealing learning rate scheduler — mitigates overfitting.
  • patience=20: Early stopping if validation mAP does not improve over 20 epochs.

Dependency Installation (One-Time Setup)

source /opt/ros/jazzy/setup.bash
sudo apt install -y ros-jazzy-vision-msgs ros-jazzy-cv-bridge ros-jazzy-image-transport python3-colcon-common-extensions
python3 -m pip install --upgrade ultralytics "torch>=2.4" torchvision opencv-python --break-system-packages
python3 -m pip install "numpy==1.26.4" --break-system-packages  # For Ultralytics compatibility

Training Command

yolo detect train \
  model=yolov10l.pt \
  data=~/revel_dataset.yaml \
  epochs=100 \
  imgsz=960 \
  batch=16 \
  device=0 \
  project=revel_hackathon \
  name=y10l_revel \
  mosaic=1 \
  hsv_h=0.015 hsv_s=0.7 hsv_v=0.4 \
  cos_lr=True \
  patience=20

Validation

yolo detect val \
  model=revel_hackathon/y10l_revel/weights/best.pt \
  data=~/revel_dataset.yaml \
  imgsz=960

Optional: TensorRT Export for Edge Deployment

yolo export \
  model=revel_hackathon/y10l_revel/weights/best.pt \
  format=engine \
  device=0

Output: best.engine — deployable for low-latency inference on NVIDIA Jetson or RTX platforms.
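
A minimal usage sketch (the image path sample.jpg is a placeholder); Ultralytics selects the TensorRT backend from the .engine extension:

from ultralytics import YOLO

# Load the exported TensorRT engine directly.
model = YOLO("revel_hackathon/y10l_revel/weights/best.engine")
results = model.predict("sample.jpg", imgsz=960, device=0)  # sample.jpg is a placeholder
print(results[0].boxes.xyxy)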


7. Stage 5: ROS 2 Node Integration for Real-Time Inference

Node Functionality

The trained model is deployed as a ROS 2 node to enable real-time perception within robotic systems (a minimal sketch of such a node follows the list below):

  • Subscribes to /rgb topic (sensor_msgs/Image).
  • Executes YOLOv10 inference on GPU.
  • Publishes annotated visualization to /yolo/annotated (sensor_msgs/Image).
  • Publishes structured detections to /yolo/detections (vision_msgs/Detection2DArray) — compatible with NVIDIA Isaac ROS.
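
The repository's yolo_detector package implements this behavior; the following is a minimal sketch of such a node built on the standard rclpy, cv_bridge, vision_msgs, and Ultralytics APIs (class name and parameter defaults are illustrative, not the repository's exact code):

import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
from vision_msgs.msg import Detection2DArray, Detection2D, ObjectHypothesisWithPose
from cv_bridge import CvBridge
from ultralytics import YOLO

class YoloDetector(Node):
    def __init__(self):
        super().__init__('yolo_detector')
        self.declare_parameter('model', 'best.pt')
        self.declare_parameter('device', 'cuda:0')
        self.declare_parameter('image_topic', '/rgb')
        self.model = YOLO(self.get_parameter('model').value)
        self.device = self.get_parameter('device').value
        self.bridge = CvBridge()
        self.create_subscription(Image, self.get_parameter('image_topic').value,
                                 self.on_image, 10)
        self.pub_img = self.create_publisher(Image, '/yolo/annotated', 10)
        self.pub_det = self.create_publisher(Detection2DArray, '/yolo/detections', 10)

    def on_image(self, msg):
        frame = self.bridge.imgmsg_to_cv2(msg, desired_encoding='bgr8')
        result = self.model.predict(frame, device=self.device, verbose=False)[0]
        out = Detection2DArray()
        out.header = msg.header
        for box in result.boxes:
            x1, y1, x2, y2 = box.xyxy[0].tolist()
            det = Detection2D()
            det.bbox.center.position.x = (x1 + x2) / 2.0
            det.bbox.center.position.y = (y1 + y2) / 2.0
            det.bbox.size_x = x2 - x1
            det.bbox.size_y = y2 - y1
            hyp = ObjectHypothesisWithPose()
            hyp.hypothesis.class_id = str(int(box.cls))
            hyp.hypothesis.score = float(box.conf)
            det.results.append(hyp)
            out.detections.append(det)
        self.pub_det.publish(out)
        annotated = result.plot()  # BGR frame with boxes and labels drawn
        self.pub_img.publish(self.bridge.cv2_to_imgmsg(annotated, encoding='bgr8'))

def main():
    rclpy.init()
    rclpy.spin(YoloDetector())
    rclpy.shutdown()

if __name__ == '__main__':
    main()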

Launch Command

source ~/ros2_ws/install/setup.bash
export ROS_DOMAIN_ID=0
export RMW_IMPLEMENTATION=rmw_fastrtps_cpp

ros2 launch yolo_detector yolo.launch.py \
  image_topic:=/rgb \
  device:=cuda:0 \
  model:=/home/$USER/revel_hackathon/y10l_revel/weights/best.pt

Visualization via RViz

  1. Launch RViz:
    rviz2
  2. Add Display → Image → Topic: /yolo/annotated
  3. Monitor detection messages:
    ros2 topic echo --once /yolo/detections

Message Specification: vision_msgs/Detection2DArray
Each Detection2D contains (see the consumer sketch after this list):

  • bbox.center.position.x, bbox.center.position.y
  • bbox.size_x, bbox.size_y
  • results[].hypothesis.class_id (class index)
  • results[].hypothesis.score (confidence)
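
A hedged consumer sketch showing how these fields are read under ROS 2 Jazzy's vision_msgs (the callback name is illustrative):

from vision_msgs.msg import Detection2DArray

def on_detections(msg: Detection2DArray):
    # Log the first detection in the array, if any.
    if not msg.detections:
        return
    det = msg.detections[0]
    cls_id = det.results[0].hypothesis.class_id
    score = det.results[0].hypothesis.score
    cx = det.bbox.center.position.x
    cy = det.bbox.center.position.y
    print(f"class={cls_id} score={score:.2f} center=({cx:.1f},{cy:.1f}) "
          f"size=({det.bbox.size_x:.1f}x{det.bbox.size_y:.1f})")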

8. Summary: End-to-End Pipeline Specifications

Stage   Technology                   Output                     Reference
1       Isaac Sim 5.0 + Replicator   6000-frame KITTI dataset   Synthetic Data Best Practices
2       Python script                YOLO-format dataset        Ultralytics Documentation
3       Bash + YAML                  dataset.yaml               Ultralytics Config Guide
4       Ultralytics YOLOv10-L        best.pt + best.engine      YOLOv10 Docs
5       ROS 2 Jazzy                  Real-time detector node    Isaac ROS Vision Msgs

9. Operational Notes and Troubleshooting

  • Error Handling: Re-execute the exact failing command. The pipeline is designed to be idempotent.
  • Data Quality Issues: Increase variance ranges for lighting and camera jitter in Isaac Sim script.
  • Overfitting: Reduce epoch count or increase augmentation strength (hsv_h/hsv_s/hsv_v, degrees, translate, etc.).
  • ROS 2 Node Failures: Verify installation of cv_bridge, vision_msgs, and correct CUDA device mapping.

10. Licensing and Attribution

This pipeline integrates components from NVIDIA Omniverse, Ultralytics YOLO, and ROS 2 ecosystems. All configurations and scripts comply with their respective official documentation. Commercial use requires adherence to applicable licensing terms of each component.

Prepared by: Şamma ERDOĞAN
LycheeAI x REVEL Hackathon — September 2025

