BordAX is a research-focused framework for Programmatic Reinforcement Learning (PRL) that combines the speed of JAX with support for structured, interpretable policies.
- High Performance: Fully JIT-compiled training pipelines leveraging JAX's XLA compilation
- Modular Architecture: Clean separation between agents, algorithms, environments, and training logic
- Multiple Policy Types: Support for MLPs, boolean functions (HyperBool), and decision trees (DTSemNet)
- Flexible Algorithms: Built-in PPO (on-policy) and DQN (off-policy) with easy extensibility
- Extensible: Simple APIs for adding new agents, algorithms, and environments
```bash
# Clone the repository
git clone https://github.com/SynthesisLab/bordax.git
cd bordax

# Create and activate a virtual environment (recommended)
python -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt
```

Verify your installation:
```bash
python -c "from bordax.trainer import Trainer; print('✓ BordAX installed successfully')"
```

To train a PPO agent, run:

```bash
python train_ppo.py
```

This will:
- Train an agent with an MLP policy using PPO on CartPole-v1
- Save results to runs/ppo_YYYYMMDD_HHMMSS/
- Generate training plots (rewards, policy loss, value loss, entropy)
Expected results:
- Solves CartPole-v1 (reward = 500) in ~100k steps
- Training time: ~6 seconds on CPU
To train a DQN agent, run:

```bash
python train_dqn.py
```

Expected results:
- Solves CartPole-v1 (reward = 500) in ~50k steps
- Training time: ~30 seconds on CPU
A complete training run can also be set up programmatically:

```python
import jax

from bordax.trainer import Trainer, TrainerConfig
from bordax.algorithms.utils import make_algo
from bordax.environments.utils import make_env
from bordax.agents.utils import make_agent

# Setup environment
env = make_env("gymnax/CartPole-v1", {}, num_envs=1)
eval_env = make_env("gymnax/CartPole-v1", {}, num_envs=1)
# Create agent
agent = make_agent("mlp/mlp", env, {
    "policy_layers": [64, 64],
    "value_layers": [64, 64],
})
# Configure algorithm
algorithm = make_algo("ppo", {
    "lr": 3e-4,
    "rollout_length": 2048,
    "gamma": 0.99,
    "_lambda": 0.95,
    "clip_schedule": lambda _: 0.2,
    "vf_schedule": lambda _: 0.5,
    "ent_schedule": lambda _: 0.01,
    "num_minibatches": 16,
    "num_sgd_steps": 10,
})
# Setup trainer
config = TrainerConfig(
    num_checkpoints=100,
    epochs_per_checkpoint=1,
    evaluation_episodes=32,
    debug=True,
    save_model=True,
)
trainer = Trainer(env, eval_env, agent, algorithm, config)
# Initialize and train
key = jax.random.PRNGKey(0)
init_key, train_key = jax.random.split(key)
trainer.init(init_key)
metrics, eval_data, model_params = trainer.run(train_key)
```
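
The returned metrics can then be inspected or plotted directly. The snippet below is a minimal sketch that assumes `metrics` behaves like a dictionary of per-checkpoint arrays containing a reward-like entry; the actual keys are defined by the algorithm, so inspect the object (or the plots generated under runs/) to confirm.

```python
# Post-training sketch: plot the learning curve from the returned metrics.
# NOTE: the "reward" key below is an assumption, not a documented field --
# print(metrics) first to see the actual structure.
import numpy as np
import matplotlib.pyplot as plt

rewards = np.asarray(metrics["reward"])  # hypothetical key
plt.plot(rewards)
plt.xlabel("checkpoint")
plt.ylabel("mean reward")
plt.title("PPO on CartPole-v1")
plt.show()
```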

BordAX uses a modular pipeline architecture:

```
Trainer
└─> Algorithm (Collector + BatchBuilder + Updater)
    ├─> Collector: Generates environment transitions
    ├─> BatchBuilder: Constructs training batches
    └─> Updater: Computes gradients and updates parameters
```

| Component | Purpose | Examples |
|---|---|---|
| Agent | Defines policy and value networks | MLPPolicyValue, DQNAgent |
| Algorithm | Bundles training pipeline components | ppo_algo(), dqn_algo() |
| Collector | Generates transitions via environment interaction | OnPolicyCollector, EpsGreedyCollector |
| BatchBuilder | Transforms data into training batches | MiniBatch, UniformReplayBatch |
| Updater | Updates parameters using loss functions | SGDUpdate, DQNUpdater |
| Trainer | Orchestrates full training loop | Trainer |
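
To make the division of labour concrete, the following self-contained sketch mimics the shape of this pipeline with plain Python stand-ins. These toy classes are illustrative only and are not BordAX's actual Collector, BatchBuilder, or Updater implementations.

```python
# Conceptual illustration of the Trainer pipeline (NOT bordax's real classes).
# Each epoch: the collector gathers transitions, the batch builder slices them
# into training batches, and the updater applies one update step per batch.
import random

class ToyCollector:
    def collect(self, num_steps):
        # Stand-in for environment interaction: returns fake transitions.
        return [random.random() for _ in range(num_steps)]

class ToyBatchBuilder:
    def __init__(self, batch_size):
        self.batch_size = batch_size

    def build(self, transitions):
        for i in range(0, len(transitions), self.batch_size):
            yield transitions[i:i + self.batch_size]

class ToyUpdater:
    def update(self, params, batch):
        # Stand-in for a gradient step: nudge params toward the batch mean.
        return params + 0.1 * (sum(batch) / len(batch) - params)

params = 0.0
collector, builder, updater = ToyCollector(), ToyBatchBuilder(8), ToyUpdater()
for epoch in range(3):
    transitions = collector.collect(32)
    for batch in builder.build(transitions):
        params = updater.update(params, batch)
print("final params:", params)
```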

Built-in algorithms:

- PPO (on-policy)
- DQN (off-policy)
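
Both are created through the same make_algo factory shown in the example above; switching algorithms is just a different identifier string. The sketch below shows a DQN setup, but the configuration keys and the agent specification are assumptions rather than documented defaults, so check train_dqn.py for the values actually used.

```python
# Sketch of a DQN setup -- the "dqn" config keys and the agent spec below are
# assumptions, not documented defaults (see train_dqn.py for the real setup).
import jax

from bordax.trainer import Trainer, TrainerConfig
from bordax.algorithms.utils import make_algo
from bordax.environments.utils import make_env
from bordax.agents.utils import make_agent

env = make_env("gymnax/CartPole-v1", {}, num_envs=1)
eval_env = make_env("gymnax/CartPole-v1", {}, num_envs=1)

agent = make_agent("mlp/mlp", env, {          # assumed agent spec for DQN
    "policy_layers": [64, 64],
    "value_layers": [64, 64],
})
algorithm = make_algo("dqn", {
    "lr": 1e-3,     # assumed key
    "gamma": 0.99,  # assumed key
})

config = TrainerConfig(num_checkpoints=100, epochs_per_checkpoint=1,
                       evaluation_episodes=32, debug=True, save_model=True)
trainer = Trainer(env, eval_env, agent, algorithm, config)

key = jax.random.PRNGKey(0)
init_key, train_key = jax.random.split(key)
trainer.init(init_key)
metrics, eval_data, model_params = trainer.run(train_key)
```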

Repository layout:

```
bordax/
├── bordax/               # Main package
│   ├── agents/           # Agent definitions
│   │   ├── base.py       # Base classes and implementations
│   │   ├── components.py # Neural network modules
│   │   └── utils.py      # Agent factory
│   ├── algorithms/       # RL algorithms
│   │   ├── base.py       # Algorithm implementations
│   │   ├── losses.py     # Algorithm-specific losses
│   │   └── utils.py      # Algorithm factory
│   ├── environments/     # Environment adapters (Gymnax, Gymnasium)
│   ├── batchbuilders.py  # Batch construction
│   ├── buffer.py         # Replay buffer
│   ├── collectors.py     # Data collection strategies
│   ├── trainer.py        # Training pipeline orchestration
│   ├── types.py          # Type definitions
│   └── updaters.py       # Model parameter updates
├── train_ppo.py
├── train_dqn.py
├── requirements.txt
└── README.md
```

Supported agent types:

MLP Policy-Value (mlp/mlp):

```python
agent = make_agent("mlp/mlp", env, {
    "policy_layers": [128, 128, 64],
    "value_layers": [128, 128, 64],
})
```

HyperBool (Boolean function-based):

```python
agent = make_agent("boolean/mlp", env, {
    "n": 4,  # Number of boolean variables
    "value_layers": [128, 64, 32],
})
```

DTSemNet (Decision trees):

```python
agent = make_agent("dt/mlp", env, {
    "tree_depth": 4,
    "value_layers": [64, 64],
})
```

BordAX is released under the MIT License.
BordAX builds on excellent work from the JAX ecosystem: