[BMVC 2025] Code for the paper REACT: Real-time Efficiency and Accuracy Compromise for Tradeoffs in Scene Graph Generation
| Previous work (PE-NET model) | Our REACT model for Real-Time SGG |
|---|---|
| video_github_baseline.mp4 | video_github_REACT.mp4 |
Our paper REACT: Real-time Efficiency and Accuracy Compromise for Tradeoffs in Scene Graph Generation has been accepted at BMVC 2025! We dive into the current bottlenecks of SGG models under real-time constraints and propose a simple yet very efficient implementation using YOLOV8/9/10/11/12. Weights are available here. Here is a snapshot of the main results:
This implementation is a new benchmark for the task of Scene Graph Generation, based on a fork of the SGG Benchmark by Kaihua Tang. Kaihua's implementation is a good starting point; however, it is quite outdated and missing many recent developments for the task. My goal with this new codebase is to provide an up-to-date and easy-to-run implementation of common approaches in the field of Scene Graph Generation. This codebase also focuses on real-time and real-world usage of Scene Graph Generation, with dedicated dataset tools and a large choice of object detection backbones. This codebase is still a work in progress; do not expect everything to work properly on the first run. If you find any bugs, please feel free to post an issue or contribute with a PR.
- 15.08.2025: I have created a new tool to annotate your own SGG dataset with visual relationships, please check it out: SGG-Annotate. More info in ANNOTATIONS.md.
- 31.07.2025: REACT has been accepted at the BMVC 2025 conference!
- 26.05.2025: I have added some explanation for two new metrics: InformativeRecall@K and Recall@K Relative. InformativeRecall@K is defined in Mining Informativeness in Scene Graphs and can help measure the relevance and robustness of models for real-world applications. Please check the METRICS.md file for more information.
- 26.05.2025: The codebase now also supports YOLOV12, see configs/VG150/react_yolov12m.yaml.
- 04.12.2024: Official release of the REACT model weights for VG150, please see MODEL_ZOO.md
- 03.12.2024: Official release of the REACT model
- 28.05.2024: Official release of our Real-Time Scene Graph Generation implementation.
- 23.05.2024: Added support for Hyperparameters Tuning with the RayTune library, please check it out: Hyperparameters Tuning
- 23.05.2024: Added support for the YOLOV10 backbone and SQUAT relation head!
- 23.05.2024: Added support for the YOLO-World backbone for Open-Vocabulary object detection!
- 10.05.2024: Added support for the PSG Dataset
- 03.04.2024: Added support for the IETrans method for data augmentation on the Visual Genome dataset, please check it out! IETrans.
- 03.04.2024: Update the demo, now working with any models, check DEMO.md.
- 01.04.2024: Added support for Wandb for better visualization during training, tutorial coming soon.
- Overview
- Install the Requirements
- Prepare the Dataset
- Simple Webcam Demo
- Supported Models
- Metrics and Results for our Toolkit
- Training on Scene Graph Generation
- Hyperparameters Tuning
- Evaluation on Scene Graph Generation
Check INSTALL.md for installation instructions.
Check DATASET.md for instructions regarding dataset preprocessing, including how to create your own dataset with SGG-Annotate.
You can download a pre-trained model or train your own model and run my off-the-shelf demo!
You can use the SGDET_on_custom_images.ipynb notebook to visualize detections on images.
I also made a demo code to try SGDET with your webcam in the demo folder, feel free to have a look!
Scene Graph Generation approaches can be categorized into one-stage and two-stage approaches:
- Two-stage approaches are the original formulation of SGG. They decouple the training process into (1) training an object detection backbone and (2) using bounding box proposals and image features from the backbone to train a relation prediction model.
- One-stage approaches learn both the object and relation features in the same learning stage. This codebase focuses only on the first category, two-stage approaches.
We provide different object detection backbones that can be plugged into any relation prediction head, depending on the use case.
🚀 NEW! No need to train a backbone anymore, we support Yolo-World for fast and easy open-vocabulary inference. Please check it out!
- YOLO12: New YOLO architecture for SOTA real-time object detection.
- YOLO11: New YOLO version from Ultralytics for SOTA real-time object detection.
- YOLOV10: New end-to-end YOLO architecture for SOTA real-time object detection.
- YOLOV8-World: SOTA in real-time open-vocabulary object detection!
- YOLOV9: SOTA in real-time object detection.
- YOLOV8: New YOLO version from Ultralytics for SOTA real-time object detection.
- LEGACY Faster-RCNN: This is the original backbone used in most SGG approaches. It is based on a ResNeXt-101 feature extractor and an RPN for regression and classification. See the original paper for reference. Performance is 38.52/26.35/28.14 mAP on the VG train/val/test sets respectively. You can find the original pretrained model by Kaihua here.
We have compiled the main approaches for relation modeling in this codebase:
- REACT (2025): REACT: Real-time Efficiency and Accuracy Compromise for Tradeoffs in Scene Graph Generation
- SQUAT (2023): Devil's on the Edges: Selective Quad Attention for Scene Graph Generation, thanks to the official implementation by the authors
- PE-NET (2023): Prototype-based Embedding Network for Scene Graph Generation, thanks to the official implementation by the authors
- SHA-GCL (2022): Stacked Hybrid-Attention and Group Collaborative Learning for Unbiased Scene Graph Generation in Pytorch, thanks to the official implementation by the authors
- GPS-NET (2020): GPS-Net: Graph Property Sensing Network for Scene Graph Generation, thanks to the official implementation by the authors
- Transformer (2020): Unbiased Scene Graph Generation from Biased Training, thanks to the implementation by Kaihua
- VCTree (2018): Learning to Compose Dynamic Tree Structures for Visual Contexts, thanks to the implementation by Kaihua
- Neural-Motifs (2018): Neural Motifs: Scene Graph Parsing with Global Context, thanks to the implementation by Kaihua
- IMP (2017): Scene Graph Generation by Iterative Message Passing, thanks to the implementation by Kaihua
On top of relation heads, several debiasing methods have been proposed through the years with the aim of increasing the accuracy of baseline models in the prediction of tail classes.
- Hierarchical (2024): Hierarchical Relationships: A New Perspective to Enhance Scene Graph Generation, thanks to the implementation by the authors
- Causal (2020): Unbiased Scene Graph Generation from Biased Training, thanks to the implementation by the authors
Due to severe biases in datasets, the task of Scene Graph Generation has also been tackled through data-centric approaches.
- IETrans (2022): Fine-Grained Scene Graph Generation with Data Transfer, custom implementation based on the one by Zijian Zhou
We provide some of the pre-trained weights for evaluation or usage in downstream tasks, please see MODEL_ZOO.md.
Explanations of the metrics in our toolkit and the reported results are given in METRICS.md.
If you want to use YoloV8/9/10/11/12 or Yolo-World as a backbone instead of Faster-RCNN, you need to first train a model using the official ultralytics implementation. To help you with that, I have created a dedicated notebook to generate annotations in YOLO format from a .h5 file (SGG format).
Once you have a model, you can modify this config file and change the path PRETRAINED_DETECTOR_CKPT to your model weights. Please note that you will also need to change the variables SIZE and OUT_CHANNELS accordingly if you use another variant of YOLO (nano, small, or large, for instance).
For training an SGG model with YOLO as a backbone, you need to modify the META_ARCHITECTURE variable in the same config file to GeneralizedYOLO. You can then follow the standard procedure for PREDCLS, SGCLS or SGDET training below.
detector_pretrain_net.py will NOT WORK with a YOLO backbone.
The following command can be used to train your own Faster R-CNN model:
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --master_port 10001 --nproc_per_node=4 tools/detector_pretrain_net.py --config-file "configs/e2e_relation_detector_X_101_32_8_FPN_1x.yaml" SOLVER.IMS_PER_BATCH 8 TEST.IMS_PER_BATCH 4 DTYPE "float16" SOLVER.MAX_EPOCH 20 MODEL.RELATION_ON False OUTPUT_DIR ./checkpoints/pretrained_faster_rcnn SOLVER.PRE_VAL False
where CUDA_VISIBLE_DEVICES and --nproc_per_node are the IDs and the number of GPUs you use, and --config-file selects the config, in which you can change other parameters. SOLVER.IMS_PER_BATCH and TEST.IMS_PER_BATCH are the training and testing batch sizes respectively, DTYPE "float16" enables Automatic Mixed Precision, OUTPUT_DIR is the output directory used to save checkpoints and the log (e.g. /home/username/checkpoints/pretrained_faster_rcnn), and SOLVER.PRE_VAL controls whether validation is run before training.
There are three standard protocols: (1) Predicate Classification (PredCls): taking ground truth bounding boxes and labels as inputs; (2) Scene Graph Classification (SGCls): using ground truth bounding boxes without labels; (3) Scene Graph Detection (SGDet): detecting SGs from scratch. We use the argument --task to select the protocol.
For Predicate Classification (PredCls), we need to set:
--task predcls
For Scene Graph Classification (SGCls):
--task sgcls
For Scene Graph Detection (SGDet):
--task sgdet
We abstract various SGG models to be different relation-head predictors in the file roi_heads/relation_head/roi_relation_predictors.py. To select our predefined models, you can use MODEL.ROI_RELATION_HEAD.PREDICTOR.
For REACT Model:
MODEL.ROI_RELATION_HEAD.PREDICTOR REACTPredictor
For PE-NET Model:
MODEL.ROI_RELATION_HEAD.PREDICTOR PrototypeEmbeddingNetwork
For Neural-MOTIFS Model:
MODEL.ROI_RELATION_HEAD.PREDICTOR MotifPredictor
For Iterative-Message-Passing (IMP) Model (note that SOLVER.BASE_LR should be changed to 0.001 in SGCls, or the model won't converge):
MODEL.ROI_RELATION_HEAD.PREDICTOR IMPPredictor
For VCTree Model:
MODEL.ROI_RELATION_HEAD.PREDICTOR VCTreePredictor
For Transformer Model (note that the Transformer Model needs SOLVER.BASE_LR changed to 0.001, SOLVER.SCHEDULE.TYPE to WarmupMultiStepLR, SOLVER.MAX_ITER to 16000, SOLVER.IMS_PER_BATCH to 16, and SOLVER.STEPS to (10000, 16000)), which is provided by Jiaxin Shi:
MODEL.ROI_RELATION_HEAD.PREDICTOR TransformerPredictor
For Unbiased-Causal-TDE Model:
MODEL.ROI_RELATION_HEAD.PREDICTOR CausalAnalysisPredictor
The default settings are under configs/e2e_relation_X_101_32_8_FPN_1x.yaml and sgg_benchmark/config/defaults.py. The priority is: command line > yaml > defaults.py.
If you want to customize your own model, you can refer to sgg_benchmark/modeling/roi_heads/relation_head/model_XXXXX.py and sgg_benchmark/modeling/roi_heads/relation_head/utils_XXXXX.py. You also need to add the corresponding nn.Module in sgg_benchmark/modeling/roi_heads/relation_head/roi_relation_predictors.py. Sometimes you may also need to change the inputs & outputs of the module through sgg_benchmark/modeling/roi_heads/relation_head/relation_head.py. A rough skeleton is sketched below.
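For illustration only, here is a minimal sketch of such a module, assuming a simple pairwise classification head; the class name, arguments and dimensions are placeholders, and the exact constructor and forward signatures should be copied from an existing predictor (e.g. REACTPredictor) in roi_relation_predictors.py:

```python
import torch
import torch.nn as nn

# Hypothetical skeleton of a custom relation predictor; the real constructor and
# forward() signatures must mirror the existing predictors in
# sgg_benchmark/modeling/roi_heads/relation_head/roi_relation_predictors.py.
class MyRelationPredictor(nn.Module):
    def __init__(self, in_channels: int, hidden_dim: int, num_rel_classes: int):
        super().__init__()
        # Fuse the subject and object representations, then classify the predicate.
        self.fusion = nn.Sequential(
            nn.Linear(2 * in_channels, hidden_dim),
            nn.ReLU(inplace=True),
        )
        self.rel_classifier = nn.Linear(hidden_dim, num_rel_classes)

    def forward(self, head_rep: torch.Tensor, tail_rep: torch.Tensor) -> torch.Tensor:
        # head_rep / tail_rep: (num_pairs, in_channels) features of each subject/object pair
        pair_rep = torch.cat((head_rep, tail_rep), dim=-1)
        return self.rel_classifier(self.fusion(pair_rep))
```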
The Causal TDE on Unbiased Scene Graph Generation from Biased Training
As to the Unbiased-Causal-TDE, there are some additional parameters you need to know. MODEL.ROI_RELATION_HEAD.CAUSAL.EFFECT_TYPE is used to select the causal effect analysis type during inference (test), where "none" is the original likelihood, "TDE" is the total direct effect, "NIE" is the natural indirect effect, and "TE" is the total effect. MODEL.ROI_RELATION_HEAD.CAUSAL.FUSION_TYPE has two choices, "sum" or "gate". Since Unbiased Causal TDE Analysis is model-agnostic, we support Neural-MOTIFS, VCTree and VTransE. MODEL.ROI_RELATION_HEAD.CAUSAL.CONTEXT_LAYER is used to select one of these models for Unbiased Causal Analysis, with three choices: motifs, vctree, vtranse.
Note that during training, we always set MODEL.ROI_RELATION_HEAD.CAUSAL.EFFECT_TYPE to be 'none', because causal effect analysis is only applicable to the inference/test phase.
NEW: I replaced training by iteration (steps) with training by epochs (iterations over the whole dataset). Controlling the training loop by iteration is still possible, but epochs make it easier in my opinion; you can try it with the argument SOLVER.MAX_EPOCH (see below).
By default, only the last checkpoint will be saved, which is not very efficient. You can choose to save only the best checkpoint instead with the argument --save-best.
Training Example 1 : (PreCls, Motif Model)
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --master_port 10025 --nproc_per_node=2 tools/relation_train_net.py --task predcls --save-best --config-file "configs/e2e_relation_X_101_32_8_FPN_1x.yaml" MODEL.ROI_RELATION_HEAD.PREDICTOR MotifPredictor SOLVER.IMS_PER_BATCH 12 TEST.IMS_PER_BATCH 2 DTYPE "float16" SOLVER.MAX_EPOCH 20 MODEL.PRETRAINED_DETECTOR_CKPT ./checkpoints/pretrained_faster_rcnn/model_final.pth OUTPUT_DIR ./checkpoints/motif-precls-exmp
where MODEL.PRETRAINED_DETECTOR_CKPT is the pretrained Faster R-CNN model you want to load and OUTPUT_DIR is the output directory used to save checkpoints and the log. Since we use WarmupReduceLROnPlateau as the learning rate scheduler for SGG, SOLVER.STEPS is not required anymore.
Training Example 2 : (SGCls, Causal, TDE, SUM Fusion, MOTIFS Model)
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --master_port 10026 --nproc_per_node=2 tools/relation_train_net.py --task sgcls --save-best --config-file "configs/e2e_relation_X_101_32_8_FPN_1x.yaml" MODEL.ROI_RELATION_HEAD.PREDICTOR CausalAnalysisPredictor MODEL.ROI_RELATION_HEAD.CAUSAL.EFFECT_TYPE none MODEL.ROI_RELATION_HEAD.CAUSAL.FUSION_TYPE sum MODEL.ROI_RELATION_HEAD.CAUSAL.CONTEXT_LAYER motifs SOLVER.IMS_PER_BATCH 12 TEST.IMS_PER_BATCH 2 DTYPE "float16" SOLVER.MAX_EPOCH 20 MODEL.PRETRAINED_DETECTOR_CKPT ./checkpoints/pretrained_faster_rcnn/model_final.pth OUTPUT_DIR ./checkpoints/causal-motifs-sgcls-exmp
Required library:
pip install ray[data,train,tune] optuna tensorboard
We provide a training loop for hyperparameter tuning in hyper_param_tuning.py. This script uses the RayTune library for efficient hyperparameter search. You can define a search_space object with different values related to the optimizer (AdamW and SGD supported for now) or directly customize the model structure with model parameters (for instance, Linear layer dimensions or MLP dimensions). The ASHAScheduler scheduler is used for early stopping of bad trials. The default value to optimize is the overall loss, but this can be customized to specific loss values or standard metrics such as mean_recall.
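As a rough illustration only (the actual keys and ranges live in tools/hyper_param_tuning.py and may differ), a Ray Tune search space and ASHA scheduler could look like this:

```python
from ray import tune
from ray.tune.schedulers import ASHAScheduler

# Hypothetical search space: optimizer choice plus a few model-structure parameters.
search_space = {
    "optimizer": tune.choice(["AdamW", "SGD"]),   # the two optimizers supported by the script
    "lr": tune.loguniform(1e-5, 1e-2),            # learning rate sampled on a log scale
    "weight_decay": tune.loguniform(1e-6, 1e-3),
    "hidden_dim": tune.choice([256, 512, 1024]),  # e.g. a Linear/MLP dimension of the model
}

# ASHA stops unpromising trials early; the metric minimized here is the overall loss,
# but it could be switched to a standard metric such as "mean_recall".
scheduler = ASHAScheduler(metric="loss", mode="min", grace_period=1, reduction_factor=2)
```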
To launch the script, do as follows:
CUDA_VISIBLE_DEVICES=0 python tools/hyper_param_tuning.py --save-best --task sgdet --config-file "./configs/IndoorVG/e2e_relation_yolov10.yaml" MODEL.ROI_RELATION_HEAD.PREDICTOR PrototypeEmbeddingNetwork DTYPE "float16" SOLVER.PRE_VAL True GLOVE_DIR /home/maelic/glove OUTPUT_DIR ./checkpoints/IndoorVG4/SGDET/penet-yolov10m SOLVER.IMS_PER_BATCH 8
The config and OUTPUT_DIR paths need to be absolute to allow faster loading. Most terminal output is disabled by default during tuning, via the cfg.VERBOSE variable.
To watch the results with tensorboardX:
tensorboard --logdir=./ray_results/train_relation_net_2024-06-23_15-28-01
Test Example 1 : (PreCls, Motif Model)
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --master_port 10027 --nproc_per_node=1 tools/relation_test_net.py --config-file "configs/e2e_relation_X_101_32_8_FPN_1x.yaml" MODEL.ROI_RELATION_HEAD.USE_GT_BOX True MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL True MODEL.ROI_RELATION_HEAD.PREDICTOR MotifPredictor TEST.IMS_PER_BATCH 1 DTYPE "float16" GLOVE_DIR /home/kaihua/glove MODEL.PRETRAINED_DETECTOR_CKPT /home/kaihua/checkpoints/motif-precls-exmp OUTPUT_DIR /home/kaihua/checkpoints/motif-precls-exmp
Test Example 2 : (SGCls, Causal, TDE, SUM Fusion, MOTIFS Model)
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --master_port 10028 --nproc_per_node=1 tools/relation_test_net.py --config-file "configs/e2e_relation_X_101_32_8_FPN_1x.yaml" MODEL.ROI_RELATION_HEAD.USE_GT_BOX True MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL False MODEL.ROI_RELATION_HEAD.PREDICTOR CausalAnalysisPredictor MODEL.ROI_RELATION_HEAD.CAUSAL.EFFECT_TYPE TDE MODEL.ROI_RELATION_HEAD.CAUSAL.FUSION_TYPE sum MODEL.ROI_RELATION_HEAD.CAUSAL.CONTEXT_LAYER motifs TEST.IMS_PER_BATCH 1 DTYPE "float16" GLOVE_DIR /home/kaihua/glove MODEL.PRETRAINED_DETECTOR_CKPT /home/kaihua/checkpoints/causal-motifs-sgcls-exmp OUTPUT_DIR /home/kaihua/checkpoints/causal-motifs-sgcls-exmp
- For some models (not all), turning MODEL.ROI_RELATION_HEAD.POOLING_ALL_LEVELS on or off will affect the performance of predicate prediction, e.g., turning it off will improve VCTree PredCls but not the corresponding SGCls and SGGen. For the reported results of VCTree, we simply turn it on for all three protocols, like for the other models.
- For some models (not all), a crazy fusion proposed by Learning to Count Objects will significantly improve the results; it looks like f(x1, x2) = ReLU(x1 + x2) - (x1 - x2)**2 (see the sketch after this list). It can be used to combine the subject and object features in roi_heads/relation_head/roi_relation_predictors.py. For now, most of our models just concatenate them as torch.cat((head_rep, tail_rep), dim=-1).
- Not to mention the hidden dimensions in the models, e.g., MODEL.ROI_RELATION_HEAD.CONTEXT_HIDDEN_DIM. Due to limited time, we didn't fully explore all the settings in this project, so I won't be surprised if you improve our results by simply changing one of our hyper-parameters.
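As a small sketch of the fusion mentioned in the second point above (function and variable names are illustrative):

```python
import torch
import torch.nn.functional as F

def count_fusion(x1: torch.Tensor, x2: torch.Tensor) -> torch.Tensor:
    # f(x1, x2) = ReLU(x1 + x2) - (x1 - x2)**2, as proposed in "Learning to Count Objects"
    return F.relu(x1 + x2) - (x1 - x2) ** 2

# Most predictors in this codebase currently just concatenate the pair features instead:
# pair_rep = torch.cat((head_rep, tail_rep), dim=-1)
```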
- Q: Fail to load the given checkpoints. A: The model to be loaded is determined by the last_checkpoint file in the OUTPUT_DIR path. If you fail to load the given pretrained checkpoints, it is probably because the last_checkpoint file still contains the path from my workstation rather than your own path.
- Q: AssertionError on "assert len(fns) == 108073". A: If you are working on the VG dataset, it is probably caused by a wrong DATASETS (data path) in sgg_benchmark/config/paths_catlog.py. If you are working on your own custom dataset, just comment out the assertion.
- Q: AssertionError on "l_batch == 1" in model_motifs.py. A: The original MOTIFS code only supports evaluation on 1 GPU. Since my reimplemented motifs is based on their code, I keep this assertion to make sure it won't cause any unexpected errors.
If you find this project helpful for your research, please kindly consider citing our project or papers in your publications.
@misc{neau2024reactrealtimeefficiencyaccuracy,
title={REACT: Real-time Efficiency and Accuracy Compromise for Tradeoffs in Scene Graph Generation},
author={Maëlic Neau and Paulo E. Santos and Anne-Gwenn Bosser and Cédric Buche},
year={2024},
eprint={2405.16116},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2405.16116},
}