RDFAnalyzerCore

A powerful, config-driven framework for building physics analyses with ROOT RDataFrame.

Overview

RDFAnalyzerCore provides a core, analysis-agnostic framework for constructing and running Analyzer pipelines using ROOT RDataFrame. The framework features:

Config-Driven Architecture: Separate configuration from code for reproducibility
Plugin System: Extensible design with BDT, ONNX, correction, and histogram managers
Lazy Evaluation: Leverages RDataFrame's efficient event processing
Systematic Support: Built-in handling of systematic variations
Analysis Modularity: Analyses live in separate repositories, automatically discovered at build time
Python Bindings: Use the framework from Python with numba and numpy integration
Statistical Analysis: Optional CMS Combine integration for limit setting and fits

Quick Start

NDHistogramManager

Books and fills N-dimensional histograms with support for systematics, regions, and categories. Supports both manual histogram booking and config-driven histogram definitions.

Config-Driven Histograms: Define histograms in a configuration file for dynamic runtime booking. See docs/CONFIG_HISTOGRAMS.md for detailed documentation.

Quick example:

// Enable histogram manager
auto histManager = std::make_unique<NDHistogramManager>(analyzer.getConfigurationProvider());
analyzer.addPlugin("histogramManager", std::move(histManager));

// Define variables and apply filters
analyzer.Define("jet_pt", computePt, {"jet_px", "jet_py"});
analyzer.Filter("quality", isGood, {"jet_quality"});

// Book histograms from config file (after all defines/filters)
analyzer.bookConfigHistograms();

// Save results
analyzer.save();

Config file format (histograms.txt):

name=pt_hist variable=jet_pt weight=event_weight bins=50 lowerBound=0.0 upperBound=500.0
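The line format above can be checked before a job is launched. The following is a minimal sketch, assuming each histogram is one line of whitespace-separated key=value tokens as in the example; the helper names are hypothetical and this is not the framework's own C++ parser.

```python
def parse_histogram_line(line):
    """Split one 'key=value key=value ...' line into a dict of strings."""
    entry = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if not sep or not key:
            raise ValueError(f"malformed token: {token!r}")
        entry[key] = value
    return entry


def check_binning(entry):
    """Convert the binning fields and reject empty or inverted ranges."""
    bins = int(entry["bins"])
    low = float(entry["lowerBound"])
    high = float(entry["upperBound"])
    if bins <= 0 or low >= high:
        raise ValueError(f"bad binning for {entry.get('name', '?')}")
    return bins, low, high


# The example line from histograms.txt above
line = ("name=pt_hist variable=jet_pt weight=event_weight "
        "bins=50 lowerBound=0.0 upperBound=500.0")
entry = parse_histogram_line(line)
print(entry["name"], check_binning(entry))
```

A check like this catches typos such as a missing upperBound early, before the event loop starts.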

Installing

Clone the repository:

git clone git@github.com:brkronheim/RDFAnalyzerCore.git
cd RDFAnalyzerCore
source env.sh  # On a CVMFS-backed host
source build.sh

# Run example
cd build/analyses/ExampleAnalysis
./analysis cfg.yaml

New to the framework? Check out the Getting Started Guide.

Documentation

Core C++ Backbone

Architecture - Core manager wiring, plugin lifecycle, and execution flow
API Reference - Canonical Analyzer and interface APIs
Plugin Development - Implementing C++ plugins via IPluggableManager
Doxygen Guide - C++ documentation standards for headers and interfaces

For Users

Getting Started - Installation and first steps
Configuration Reference - Complete config file documentation
Analysis Guide - Building analyses step-by-step
Python Bindings - Using the framework from Python
API Reference - Detailed API documentation

Statistical Analysis

Datacard Generator - Creating CMS Combine datacards
Systematics Example - Creating histograms with systematic variations
Combine Integration - Complete workflow from analysis to statistical inference

For Developers

Architecture - Internal design and C++ wiring structure
Plugin Development - Creating custom plugins
ONNX Implementation - ONNX manager details
ONNX Multi-Output - Multi-output model support

Documentation Paths by Audience

If you are reading docs for a specific role, start here:

Developers (framework contributors): docs/ARCHITECTURE.md, docs/API_REFERENCE.md, docs/PLUGIN_DEVELOPMENT.md, docs/DOXYGEN_GUIDE.md
Analyzers (analysis authors): docs/GETTING_STARTED.md, docs/ANALYSIS_GUIDE.md, docs/CONFIG_REFERENCE.md, docs/CONFIG_HISTOGRAMS.md, docs/NUISANCE_GROUPS.md
Agents/automation tooling: docs/INDEX.md, docs/ERRORS_AND_TRACING.md, docs/CONFIGURATION_VALIDATION.md, docs/OUTPUT_SCHEMA.md, docs/VALIDATION_REPORTS.md

The docs are intentionally layered: GETTING_STARTED and ANALYSIS_GUIDE show workflow, while API_REFERENCE and headers in core/interface/ are the source of truth for signatures and behavior.

Requirements

ROOT 6.30/02 or later (progress bar support requires 6.30+)
CMake 3.19.0 or later
C++17 compatible compiler
Git

For Python bindings (optional):

Python 3.8+
pybind11, numpy, numba (install with pip install pybind11 numpy numba)

For LAW / Luigi workflows:

Create and activate the repository-local virtualenv, then install the production requirements: python3 -m venv .venv && source .venv/bin/activate && pip install -r requirements-production.txt
After that, source law/env.sh will reuse .venv automatically when it exists.
xrdfs must be available on PATH for XRootD file discovery workflows.

Self-hosted CI runner Dockerfile

A ready-to-build runner image including ROOT, Python, numpy and numba is provided at docker/gh-runner.Dockerfile.
See docs/CI_DOCKERFILE.md for build/run instructions and details.

Repository Structure

RDFAnalyzerCore/
├── core/               # Framework code
│   ├── interface/     # Public headers and interfaces
│   ├── src/          # Core implementations
│   ├── plugins/      # Plugin managers (BDT, ONNX, etc.)
│   ├── python/       # Python tooling (datacards, production manager, LAW tasks)
│   └── tests/        # Core test targets
├── analyses/          # Analysis repositories (git submodules/clones)
├── examples/          # Python binding examples
├── docs/             # Documentation
├── cmake/            # CMake modules
└── build/            # Build artifacts (generated)

Features

Plugin System

The framework includes several built-in plugins for common analysis tasks:

BDTManager

Manages Boosted Decision Trees using the FastForest library.

Load BDT models from text files
Apply models with sigmoid activation
Conditional execution for efficiency
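The two ideas in the list above, sigmoid activation and conditional execution, can be sketched in a few lines. This is an illustration only, not FastForest's or BDTManager's code; the function names and the default value of -1.0 for skipped events are hypothetical.

```python
import math


def sigmoid(raw_score):
    """Map a raw BDT score to the (0, 1) interval (numerically stable form)."""
    if raw_score >= 0:
        return 1.0 / (1.0 + math.exp(-raw_score))
    e = math.exp(raw_score)
    return e / (1.0 + e)


def evaluate_if(condition, raw_score_fn, default=-1.0):
    """Run the (possibly expensive) model only when condition is true.

    raw_score_fn stands in for the model evaluation; default is a
    hypothetical placeholder value for events that skip the model.
    """
    return sigmoid(raw_score_fn()) if condition else default


print(sigmoid(0.0))
print(evaluate_if(False, lambda: 3.0))
```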

OnnxManager

Manages ONNX machine learning models from any ML framework.

Automatic ONNX Runtime setup (no manual installation)
Support for multi-output models (e.g., ParticleTransformer)
Thread-safe inference with ROOT ImplicitMT
See: ONNX Implementation Guide

SofieManager

Manages SOFIE (System for Optimized Fast Inference code Emit) models from ROOT TMVA.

Build-time compilation from ONNX for maximum performance
Inference runs as compiled C++ code, with no model loading step at runtime (unlike ONNX Runtime)
Manual registration required (rebuild for model updates)
See: SOFIE Implementation Guide

CorrectionManager

Applies scale factors and corrections using correctionlib.

JSON-based correction definitions
Automatic application of configured corrections
Support for multi-dimensional lookups
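To make "multi-dimensional lookup" concrete, here is a minimal sketch of a scale factor binned in two variables. It does not use the correctionlib API; the bin edges and scale-factor values are invented for illustration.

```python
from bisect import bisect_right

# Hypothetical binning and values, for illustration only
PT_EDGES = [20.0, 40.0, 80.0, 200.0]
ETA_EDGES = [0.0, 1.2, 2.4]
SCALE_FACTORS = [
    [0.98, 0.95],  # 20 <= pt < 40
    [0.99, 0.97],  # 40 <= pt < 80
    [1.00, 0.98],  # 80 <= pt < 200
]


def bin_index(edges, x):
    """Index of the half-open bin [edges[i], edges[i+1]) that contains x."""
    i = bisect_right(edges, x) - 1
    if i < 0 or i >= len(edges) - 1:
        raise ValueError(f"{x} is outside [{edges[0]}, {edges[-1]})")
    return i


def lookup(pt, abs_eta):
    """Scale factor for one object, selected by its (pt, |eta|) bin."""
    return SCALE_FACTORS[bin_index(PT_EDGES, pt)][bin_index(ETA_EDGES, abs_eta)]


print(lookup(50.0, 1.5))
```

Real correction files also define what happens outside the binned range; this sketch simply raises an error there.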

TriggerManager

Handles trigger logic and trigger menu configuration.

Configurable trigger groups with OR logic
Trigger veto support
Sample-specific trigger configurations
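The OR-plus-veto logic described above can be sketched as follows. This is an assumption about the intended semantics (accept if any trigger in the group fired and no veto trigger fired), not TriggerManager's implementation; the trigger names are placeholders.

```python
def passes_trigger(fired, accept_group, veto_group=()):
    """Return True if any accept trigger fired and no veto trigger fired.

    fired maps trigger names to booleans for one event; triggers that are
    missing from the mapping are treated as not fired.
    """
    accepted = any(fired.get(name, False) for name in accept_group)
    vetoed = any(fired.get(name, False) for name in veto_group)
    return accepted and not vetoed


# Placeholder trigger names
event = {"HLT_MuA": False, "HLT_MuB": True, "HLT_EleA": True}
print(passes_trigger(event, ["HLT_MuA", "HLT_MuB"]))
print(passes_trigger(event, ["HLT_MuA", "HLT_MuB"], veto_group=["HLT_EleA"]))
```

A veto of this kind is one common way to avoid double counting events between overlapping datasets.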

NDHistogramManager

Books and fills N-dimensional histograms.

Support for systematics, regions, and categories
Automatic systematic axis generation
Vector-based filling with scalar expansion
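"Scalar expansion" here presumably means that a scalar column, such as an event weight, is repeated to match the length of a vector column before filling. A minimal sketch under that assumption, with hypothetical helper names:

```python
def expand(value, length):
    """Repeat a scalar to the given length; pass a matching vector through."""
    if isinstance(value, (list, tuple)):
        if len(value) != length:
            raise ValueError(f"expected {length} entries, got {len(value)}")
        return list(value)
    return [value] * length


def fill_entries(values, weight):
    """Pair every entry of a vector column with its (expanded) weight."""
    return list(zip(values, expand(weight, len(values))))


# Two jets in one event, one event-level weight
print(fill_entries([30.0, 45.0], 0.5))
```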

Configuration-Driven Design

All framework behavior is controlled through text configuration files:

Main configuration: I/O, performance, plugin configs
Plugin configs: Model definitions, corrections, triggers
Output configs: Branch selection, histogram definitions
Analysis-local registries can also live in YAML when they are primarily data, such as the VHqq run-era payload and trigger map in analyses/VHbbcc/VHqqRDF/cfg/year_settings.yaml

Example:

# Main config
fileList=data.root
saveFile=output.root
threads=-1
bdtConfig=cfg/bdts.txt
onnxConfig=cfg/onnx_models.txt

See Configuration Reference for complete documentation.
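Because the main config points at further config files, a small pre-flight check can catch missing files before a job starts. The sketch below assumes that keys ending in "Config" hold file paths, which matches the keys in the example but is not a documented rule; the function names are hypothetical.

```python
from pathlib import Path

EXAMPLE = """\
# Main config
fileList=data.root
saveFile=output.root
threads=-1
bdtConfig=cfg/bdts.txt
onnxConfig=cfg/onnx_models.txt
"""


def read_main_config(text):
    """Parse 'key=value' lines, skipping blank lines and '#' comments."""
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got: {raw!r}")
        config[key.strip()] = value.strip()
    return config


def missing_files(config, base_dir="."):
    """Keys ending in 'Config' (assumed to be paths) whose file is absent."""
    return sorted(key for key, value in config.items()
                  if key.endswith("Config")
                  and not (Path(base_dir) / value).is_file())


config = read_main_config(EXAMPLE)
print(missing_files(config))
```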

Installation and Building

On a CVMFS-backed HEP host

# Clone repository
git clone git@github.com:brkronheim/RDFAnalyzerCore.git
cd RDFAnalyzerCore

# Setup environment
source env.sh

# Build
source build.sh

Standalone ROOT Installation

Ensure ROOT and CMake are available:

# Setup ROOT
source <root-install>/bin/thisroot.sh

# Build
cmake -S . -B build
cmake --build build -j$(nproc)

Build Options

The framework supports optional features that can be enabled at build time:

# Build with default options (tests enabled, Combine disabled)
cmake -S . -B build

# Disable tests (faster build for production)
cmake -S . -B build -DBUILD_TESTS=OFF

# Enable CMS Combine for statistical analysis
cmake -S . -B build -DBUILD_COMBINE=ON

# Enable both Combine and CombineHarvester
cmake -S . -B build \
    -DBUILD_COMBINE=ON \
    -DBUILD_COMBINE_HARVESTER=ON

# Complete build with all options
cmake -S . -B build \
    -DBUILD_TESTS=ON \
    -DBUILD_COMBINE=ON \
    -DBUILD_COMBINE_HARVESTER=ON

cmake --build build -j$(nproc)

Available Options:

BUILD_TESTS (default: ON) - Build analysis tests
BUILD_COMBINE (default: OFF) - Build CMS Combine package
BUILD_COMBINE_HARVESTER (default: OFF) - Build CombineHarvester (requires BUILD_COMBINE=ON)

Note: Building Combine and CombineHarvester takes several minutes and requires an internet connection.
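For scripted builds, the option dependency above (BUILD_COMBINE_HARVESTER requires BUILD_COMBINE=ON) can be enforced before CMake is invoked. A minimal sketch with a hypothetical helper; only the three documented options are used.

```python
def cmake_configure_args(tests=True, combine=False, combine_harvester=False):
    """Build the 'cmake -S . -B build' argument list from the three options."""
    if combine_harvester and not combine:
        raise ValueError("BUILD_COMBINE_HARVESTER requires BUILD_COMBINE=ON")

    def flag(name, enabled):
        return f"-D{name}={'ON' if enabled else 'OFF'}"

    return ["cmake", "-S", ".", "-B", "build",
            flag("BUILD_TESTS", tests),
            flag("BUILD_COMBINE", combine),
            flag("BUILD_COMBINE_HARVESTER", combine_harvester)]


print(" ".join(cmake_configure_args(combine=True, combine_harvester=True)))
```

The returned list can be passed to subprocess.run as-is.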

See Combine Integration Guide for complete statistical analysis workflows.

Testing

Create and activate a local Python virtual environment before running Python-focused tests. Use the same interpreter for both CTest and direct pytest runs when possible.

python -m venv .venv
source .venv/bin/activate
pip install -r requirements-ci.txt pytest

Then run the full C++/Python suite through CTest:

cd build
ctest --output-on-failure

Or run targeted suites directly from the canonical Python test directories:

source .venv/bin/activate
PYTHONPATH=core/python:core/python/law python -m pytest core/tests/python/ core/tests/law/ -q

The C++ suite and Python binding smoke test now live under core/tests/cpp/.

If you rebuild or reconfigure after creating the virtual environment, prefer:

cmake -S . -B build -DPython3_EXECUTABLE=.venv/bin/python

For a fast sanity check after edits, run cd build && ctest --output-on-failure and then source .venv/bin/activate && PYTHONPATH=core/python:core/python/law python -m pytest core/tests/python/ core/tests/law/ -q.

Adding Your Analysis

Analyses are developed in separate repositories and automatically discovered during build:

# Clone your analysis into analyses/
cd analyses
git clone <your-analysis-repo> MyAnalysis
cd ..

# Rebuild - your analysis is automatically found
source build.sh

# Run
cd build/analyses/MyAnalysis
./myanalysis config.txt

Requirements for analysis repositories:

Must contain a CMakeLists.txt at the root
Should link against RDFCore library
Configuration files typically in cfg/ subdirectory

See Analysis Guide for step-by-step instructions.

Framework Architecture

Core Concepts

The framework is built around several key components that work together:

Analyzer: Central orchestrator providing a simplified API
ConfigurationManager: Loads and provides access to configuration
DataManager: Wraps ROOT::RDataFrame with systematic support
SystematicManager: Tracks and propagates systematic variations
Plugins: Extensible managers for specific tasks (ML, corrections, histograms)
Analysis Services: Internal service hooks (for example provenance and counters)
OutputSinks: Abstract destinations for skims and metadata

The core is wired in C++ through interfaces and dependency-aware plugin ordering. Analyzer owns core managers and services, injects a shared ManagerContext, then executes plugin lifecycle hooks in a deterministic order.
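"Dependency-aware" and "deterministic" ordering can be illustrated with a topological sort whose ties are broken alphabetically. This is a sketch of the general technique (Kahn's algorithm), not the framework's C++ implementation; the dependency edges in the example are invented.

```python
def plugin_order(dependencies):
    """Order plugins so that each one comes after everything it depends on.

    dependencies maps a plugin name to the set of names it depends on.
    Ties are broken alphabetically, so the same input always produces the
    same order.
    """
    remaining = {name: set(deps) for name, deps in dependencies.items()}
    for deps in list(remaining.values()):
        for dep in deps:
            remaining.setdefault(dep, set())

    order = []
    while remaining:
        ready = sorted(name for name, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError("dependency cycle among: "
                             + ", ".join(sorted(remaining)))
        name = ready[0]
        order.append(name)
        del remaining[name]
        for deps in remaining.values():
            deps.discard(name)
    return order


# Hypothetical dependencies, for illustration only
example = {
    "histogramManager": {"correction", "onnx"},
    "onnx": {"correction"},
    "trigger": set(),
}
print(plugin_order(example))
```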

Data Flow

Configuration Files
        ↓
ConfigurationManager → Plugins Load Configs
        ↓
DataManager builds TChain & RDataFrame
        ↓
User Code: Define Variables, Apply Filters
        ↓
Plugins: Apply Models, Corrections, Book Histograms
        ↓
RDataFrame Event Loop (Lazy Evaluation)
        ↓
Output Sinks: Write Skims & Metadata

Key Design Principles:

Interface-based: Components depend on interfaces, not implementations
Plugin architecture: Extensible without modifying core
Config-driven: Behavior controlled by text files
Lazy evaluation: Efficient processing via RDataFrame

See Architecture Documentation for detailed internals.
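The lazy-evaluation principle can be shown with a toy pipeline: define and filter calls only record steps, and all of them run together in a single pass over the events. This is a conceptual sketch in plain Python, not RDataFrame or the Analyzer API; the class and its methods are hypothetical.

```python
class LazyPipeline:
    """Record defines and filters, then apply them in one event loop."""

    def __init__(self, events):
        self._events = events
        self._steps = []
        self.passes = 0  # how many times the event loop has run

    def define(self, name, func):
        self._steps.append(("define", name, func))  # nothing is computed yet
        return self

    def filter(self, name, predicate):
        self._steps.append(("filter", name, predicate))
        return self

    def run(self):
        self.passes += 1
        selected = []
        for event in self._events:
            row = dict(event)
            keep = True
            for kind, name, func in self._steps:
                if kind == "define":
                    row[name] = func(row)
                elif not func(row):
                    keep = False
                    break  # later steps are skipped for rejected events
            if keep:
                selected.append(row)
        return selected


pipeline = LazyPipeline([{"pt": 30000.0}, {"pt": 12000.0}])
pipeline.define("pt_gev", lambda row: row["pt"] / 1000.0)
pipeline.filter("high_pt", lambda row: row["pt_gev"] > 25.0)
print(pipeline.passes)  # still 0: booking steps does not touch the events
print(pipeline.run())
```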

Usage Example

Here's a minimal analysis using the framework:

#include <analyzer.h>
#include <ROOT/RVec.hxx>

#include <iostream>

using ROOT::VecOps::RVec;

int main(int argc, char **argv) {
    if (argc < 2) {
        std::cerr << "usage: " << argv[0] << " <config file>\n";
        return 1;
    }

    // Create analyzer from config file
    Analyzer analyzer(argv[1]);

    // Per-jet selection mask: RVec comparisons are element-wise and yield RVec<int>
    analyzer.Define("good_jets",
        [](const RVec<float>& pt, const RVec<float>& eta) {
            return pt > 25.0 && abs(eta) < 2.5;
        },
        {"jet_pt", "jet_eta"}
    );

    // Count the jets that pass the mask
    analyzer.Define("n_good_jets",
        [](const RVec<int>& good) { return Sum(good); },
        {"good_jets"}
    );

    // Apply selection
    analyzer.Filter("jet_selection",
        [](int n_jets) { return n_jets >= 4; },
        {"n_good_jets"}
    );

    // Apply ML model (from config)
    auto onnxMgr = analyzer.getPlugin<OnnxManager>("onnx");
    if (onnxMgr) {
        onnxMgr->applyAllModels();
    }

    // Run the event loop and write outputs
    analyzer.run();

    return 0;
}

Configuration (config.txt):

fileList=data1.root,data2.root
saveFile=output.root
threads=-1
onnxConfig=cfg/onnx_models.txt
saveConfig=cfg/output_branches.txt

See Analysis Guide for complete examples.

Python Bindings

The framework can also be used from Python with high performance:

import ctypes

import numba
import rdfanalyzer

# Create analyzer from config file
analyzer = rdfanalyzer.Analyzer("config.txt")

# Define variables using C++ expressions (ROOT JIT)
analyzer.Define("pt_gev", "pt / 1000.0", ["pt"])
analyzer.Define("delta_r",
                "sqrt(delta_eta*delta_eta + delta_phi*delta_phi)",
                ["delta_eta", "delta_phi"])

# Or use numba-compiled functions. A different column name is used here
# so that pt_gev is not defined twice.
@numba.cfunc("float64(float64)")
def convert_to_gev(pt):
    return pt / 1000.0

func_ptr = ctypes.cast(convert_to_gev.address, ctypes.c_void_p).value
analyzer.DefineFromPointer("pt_gev_numba", func_ptr, "double(double)", ["pt"])

# Apply filters and save
analyzer.Filter("high_pt", "pt_gev > 25.0", ["pt_gev"])
analyzer.save()

Key Features:

String-based expressions (ROOT JIT compilation)
Numba function pointers for custom logic
Numpy array integration
Full systematic variation support

See Python Bindings Guide for complete documentation and examples.

Advanced Features

Machine Learning Integration

The framework supports multiple ML backends:

ONNX: Runtime evaluation of models from any framework (PyTorch, TensorFlow, scikit-learn)
BDT: FastForest-based boosted decision trees
SOFIE: Build-time compiled models for maximum performance

Plotting from Meta Output

PlottingUtility can build compiled-ROOT stack plots directly from the meta output file. It supports:

per-process normalization through optional counter histograms (for example counter_weightSum_<sample>)
linear and log-y stack plots
optional data/MC ratio panels
ratio/error/pull summary computation
PCA-based mean/up/down envelope construction from variation histograms
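As an illustration of the ratio and pull summaries listed above, the sketch below computes both per bin. The pull definition used here, (data - MC) / sqrt(data + MC uncertainty squared), is one common convention and an assumption; PlottingUtility may define it differently.

```python
import math


def ratio_and_pull(data, mc, mc_err):
    """Per-bin data/MC ratio and pull; None where the value is undefined."""
    ratios, pulls = [], []
    for n_data, n_mc, err in zip(data, mc, mc_err):
        ratios.append(n_data / n_mc if n_mc > 0 else None)
        # Poisson variance of the data count plus the MC uncertainty squared
        sigma = math.sqrt(n_data + err * err)
        pulls.append((n_data - n_mc) / sigma if sigma > 0 else None)
    return ratios, pulls


# Invented bin contents, for illustration only
ratios, pulls = ratio_and_pull([100, 0], [90.0, 0.0], [3.0, 0.0])
print(ratios, pulls)
```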

All ML managers (see Machine Learning Integration above) support:

Conditional execution (skip expensive inference when not needed)
Multi-output models
Thread-safe inference with ROOT ImplicitMT

Execution entry points:

save() always writes the configured skim output and finalizes plugins/services.
run() conditionally writes skim output when enableSkim=1|true|True, saves ND histograms (if histogramManager is registered), and finalizes plugins/services.
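The difference between the two entry points can be summarized as a small decision function. This sketch only restates the two rules above; treating an unset enableSkim as disabled is an assumption, and the function names are hypothetical.

```python
TRUTHY = {"1", "true", "True"}  # the accepted values listed above


def skim_enabled(config):
    """True when enableSkim is set to one of the accepted values."""
    return config.get("enableSkim", "") in TRUTHY  # unset counts as off (assumed)


def outputs_written(entry_point, config, has_histogram_manager):
    """List what save() or run() writes before finalizing plugins/services."""
    if entry_point == "save":
        return ["skim"]
    if entry_point == "run":
        written = []
        if skim_enabled(config):
            written.append("skim")
        if has_histogram_manager:
            written.append("histograms")
        return written
    raise ValueError(f"unknown entry point: {entry_point}")


print(outputs_written("run", {"enableSkim": "true"}, has_histogram_manager=True))
print(outputs_written("run", {}, has_histogram_manager=True))
```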

Systematic Uncertainties

Built-in support for systematic variations:

sysMgr->registerSystematic("jes_up");
sysMgr->registerSystematic("jes_down");

analyzer.Define("corrected_pt",
    // Explicit float return type so that every branch returns the same type
    [](float pt, const std::string& sys) -> float {
        if (sys == "jes_up") return pt * 1.02f;
        if (sys == "jes_down") return pt * 0.98f;
        return pt;
    },
    {"jet_pt"},
    sysMgr
);

Histograms automatically include systematic axes.
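A "systematic axis" can be pictured as one histogram slice per variation. The sketch below reuses the 2% jes_up/jes_down scaling from the example above and fills one set of bin counts per systematic; it is a plain-Python illustration, not the NDHistogramManager implementation.

```python
from bisect import bisect_right

SCALES = {"nominal": 1.0, "jes_up": 1.02, "jes_down": 0.98}


def corrected_pt(pt, systematic="nominal"):
    """Scale pt by the factor belonging to the requested variation."""
    return pt * SCALES[systematic]


def histogram(values, edges):
    """Count values per half-open bin; out-of-range values are dropped."""
    counts = [0] * (len(edges) - 1)
    for value in values:
        i = bisect_right(edges, value) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts


def histogram_per_systematic(pts, edges):
    """One slice of bin counts per systematic: the 'systematic axis'."""
    return {name: histogram([corrected_pt(pt, name) for pt in pts], edges)
            for name in SCALES}


# Jets close to the 30 GeV bin edge migrate between bins under variation
print(histogram_per_systematic([29.5, 30.2, 50.0], [0.0, 30.0, 60.0]))
```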

CMS Combine Datacard Generation

The framework includes a Python script for generating CMS Combine datacards from analysis outputs:

# Install dependencies (uproot-based, no PyROOT required)
pip install uproot awkward numpy pyyaml

# Generate datacards
python core/python/create_datacards.py config.yaml

Features:

Pure Python: Uses uproot (no PyROOT dependency)
YAML-based configuration for datacards
Multiple control region support
Sample combination (binned/stitched samples)
Observable rebinning (uniform and variable)
Systematic uncertainties (rate and shape)
Automatic systematic variation reading from input files
Full Combine and CombineHarvester integration
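The two rebinning modes in the list above can be sketched on plain lists of bin counts. This illustrates the idea only and is not the code in create_datacards.py; the function names are hypothetical, and the new edges must coincide exactly with existing edges.

```python
def rebin_uniform(counts, factor):
    """Merge every `factor` adjacent bins into one."""
    if factor <= 0 or len(counts) % factor:
        raise ValueError("factor must be positive and divide the bin count")
    return [sum(counts[i:i + factor]) for i in range(0, len(counts), factor)]


def rebin_variable(counts, edges, new_edges):
    """Merge bins so that the result has the (coarser) new_edges."""
    if len(edges) != len(counts) + 1:
        raise ValueError("need exactly one more edge than bins")
    # edges.index raises ValueError if a new edge is not an existing edge
    idx = [edges.index(edge) for edge in new_edges]
    if len(idx) < 2 or idx != sorted(set(idx)):
        raise ValueError("new_edges must be increasing, with at least two edges")
    return [sum(counts[a:b]) for a, b in zip(idx, idx[1:])]


counts = [1, 2, 3, 4]
edges = [0, 10, 20, 30, 40]
print(rebin_uniform(counts, 2))
print(rebin_variable(counts, edges, [0, 10, 40]))
```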

See:

Datacard Generator Guide for complete documentation
Systematics Example for creating histograms with systematic variations
Combine Integration for complete statistical analysis workflow

Production Manager

Unified production management system for batch analyses.

Legacy compatibility submission scripts have been removed. New production workflows should use LAW discovery tasks to generate job configs and core/python/production_monitor.py / core/python/production_manager.py for submission, monitoring, validation, and resubmission.

Recommended workflow:

# Discover files via Rucio and build job configs
law run GetRucioFileList --submit-config analyses/myAnalysis/cfg/submit_config.txt --name myRun
law run SkimTask --submit-config analyses/myAnalysis/cfg/submit_config.txt --dataset-manifest analyses/myAnalysis/cfg/datasets.yaml --name mySkimRun --file-source rucio --file-source-name myRun --exe build/analyses/MyAnalysis/myanalysis

# Monitor and validate
python core/python/production_monitor.py monitor --name mySkimRun
python core/python/production_monitor.py validate --name mySkimRun
python core/python/production_monitor.py resubmit --name mySkimRun

Features:

Unified job lifecycle management (generation, submission, monitoring, validation)
State persistence (resilient to connection failures)
Real-time progress monitoring
Automatic output validation
Failure recovery and resubmission
HTCondor and Dask backend support
Works in AFS/EOS storage areas

See: Production Manager Guide for complete documentation.

HTCondor Submission

Legacy batch submission scripts have been removed. Batch submission now uses LAW discovery tasks and the production manager toolchain.

Features:

Rucio-based dataset discovery via LAW
Open Data discovery via LAW tasks
Automatic input/output staging
XRootD support
Shared executable staging
Configuration validation

See: Batch Submission Guide for complete documentation.

Custom ROOT Dictionaries

Support for custom C++ objects:

cmake -S . -B build \
  -DRDF_CUSTOM_DICT_HEADERS="MyEvent.h;MyObject.h" \
  -DRDF_CUSTOM_DICT_LINKDEF="MyLinkDef.h" \
  -DRDF_CUSTOM_DICT_INCLUDE_DIRS="/path/to/headers"

Dictionaries are automatically built and linked.

Contributing

Contributions are welcome! To contribute:

Fork the repository
Create a feature branch
Make your changes
Add tests for new functionality
Submit a pull request

For new plugins, see Plugin Development Guide.

Support

Documentation: Check the docs/ directory
Issues: Open an issue on GitHub
Examples: See analyses/ExampleAnalysis/

License

This project is licensed under the terms specified in the repository.

Acknowledgments

Built on ROOT RDataFrame
Uses ONNX Runtime for ML inference
Corrections via correctionlib
BDT support via FastForest

Full Documentation: https://brkronheim.github.io/RDFAnalyzerCore/
