# TinyRL

Real-time reinforcement learning on the ESP32-S3 microcontroller.

TinyRL is a lightweight, header-only C++ deep learning framework designed for real-time reinforcement learning on microcontrollers and embedded systems.
| Component | Description |
|---|---|
| Autograd Core | Tensors, reverse-mode automatic differentiation, neural network layers, and optimizers |
| Stream-X | Streaming deep RL algorithms: StreamAC (Actor-Critic), StreamQ (Q-learning with discrete/Atari variants), and StreamSARSA |
```
TinyRL/
├── src/                 # Core C++ headers (autograd, layers, optimizer)
├── examples/
│   ├── cpp/             # C++ examples (MLP, CNN, operations)
│   ├── python/          # Python examples (MNIST, CIFAR-10, spiral)
│   ├── stream_x/        # Streaming deep RL (StreamAC, StreamQ, StreamSARSA)
│   └── stream_x_esp32/  # ESP32 embedded RL example
├── bindings/            # pybind11 Python bindings
├── tests/               # Comprehensive test suite
├── docs/                # MkDocs documentation
└── website/             # Landing page
```
| Feature | Description |
|---|---|
| 🔧 Header-only | Zero dependencies—just include and compile |
| 🚀 Autograd | Reverse-mode automatic differentiation with dynamic graphs |
| 🧠 Neural Networks | Linear, Conv2D, LayerNorm, ReLU, Tanh, Softmax, and more |
| 📊 N-D Tensors | 2D and 4D tensor operations with broadcasting |
| 🐍 Python Bindings | Full pybind11 interface for rapid prototyping |
| 🎯 Optimizers | SGD, RMSProp, and ObGD (Stream-X) |
| ⚡ SIMD Acceleration | Vectorized operations for SSE/NEON |
| 🛡️ Error Handling | Comprehensive validation with descriptive messages |
| 🧪 Well-tested | Extensive test suite with gradient checking |
| 🪶 Embedded Ready | Runs on ESP32-S3 with minimal memory footprint |
- Quick Start
- Installation
- Basic Usage
- Core Components
- Examples
- Building from Source
- Numeric Precision
- Streaming RL
- Embedded / ESP32-S3
- Documentation
- Contributing
- License
- C++17-compatible compiler (GCC 13+, Clang 16+, or recent MSVC)
- CMake 3.14+
- Python 3.8+ (for Python bindings)
- pybind11 (for Python bindings)
```shell
git clone https://github.com/mohmdelsayed/TinyRL.git
cd TinyRL
```

Include the headers in your project:
```cpp
#include "autograd.h"
#include "layers.h"
#include "optimizer.h"
```

```shell
# Build and install TinyRL
mkdir build && cd build
cmake .. -DAUTOGRAD_BUILD_EXAMPLES=ON -DAUTOGRAD_BUILD_TESTS=ON
cmake --build . --target install -j
```
```cmake
# In your project's CMakeLists.txt
find_package(AutoGrad REQUIRED)
target_link_libraries(your_target PRIVATE AutoGrad::autograd)
```

```shell
# Build from source (no PyPI release yet)
./install.sh --with-bindings

# Or manual build
cd bindings && mkdir build && cd build
cmake .. && cmake --build . -j
```

## Basic Usage

C++:

```cpp
#include "autograd.h"
#include "layers.h"
#include "module.h"
#include "optimizer.h"

int main() {
    ag::manual_seed(42);

    nn::Sequential model;
    model.add(nn::Linear(10, 5));

    ag::SGD optimizer(0.01f);
    optimizer.add_parameters(model.layers());

    // Random input (batch 32 x features 10)
    // and dummy target (batch 32 x 5 to match output)
    int batch_size = 32;
    ag::Tensor x(ag::Matrix::Random(batch_size, 10), true, "x");
    ag::Tensor y(ag::Matrix::Random(batch_size, 5), false, "y");

    ag::Tensor out = model.forward(x);
    ag::Tensor loss = ag::sum(ag::pow(out - y, 2.0f)) / batch_size;

    optimizer.zero_grad();
    loss.backward();
    optimizer.step();
    return 0;
}
```

Python:

```python
import autograd
import numpy as np

# Create tensors
x = autograd.Tensor(np.random.rand(32, 10), requires_grad=True)
y = autograd.Tensor(np.random.rand(32, 1), requires_grad=False)

# Build model
model = autograd.Sequential()
model.add(autograd.Linear(10, 1))
opt = autograd.SGD(0.01)
opt.add_parameters(model.layers())

# Forward/backward
out = model.forward(x)
loss = autograd.sum(autograd.pow(out - y, 2.0))
opt.zero_grad()
loss.backward()
opt.step()
```

## Core Components

- N-dimensional tensor support with bounds checking
- Basic arithmetic operations with broadcasting
- Matrix operations (matmul, transpose)
- Element-wise operations with SIMD optimization
- Convolution operations
- Reverse-mode automatic differentiation
- Dynamic computational graph construction
- Gradient computation and memory management
- Graph visualization tools
- Enhanced error handling and validation
- Linear (Dense) layers
- Convolutional layers (Conv2D)
- Activation functions (ReLU, LeakyReLU, Tanh, Softmax, Softplus)
- Normalization layers (LayerNorm) with efficient fused operations
- Sequential model support
- SGD (Stochastic Gradient Descent)
- RMSProp (Root Mean Square Propagation with adaptive learning rates)
- ObGD (Overshooting-bounded Gradient Descent) for streaming deep RL
- Random number generation (`rng.h`)
- Computational graph visualization (`draw_graph.h`)
- Weight initialization (`initialize.h`) with LeCun and sparse methods
- Lightweight logging macro (`log.h`) via `AG_LOG` (printf on embedded)
- Build scripts and comprehensive test suites
- Python bindings
## Examples

```shell
# C++ examples
cd examples/cpp
mkdir build && cd build
cmake .. && make
./minimal_operations
./minimal_network_mlp
./minimal_network_cnn
```

```shell
# Python examples
cd bindings && mkdir build && cd build
cmake .. && make
cd ../../examples/python
python minimal_operations.py
python full_training_mnist.py
```

- Basic Operations: Tensor creation, arithmetic, and gradients
- MLP Training: Multi-layer perceptron with MNIST
- CNN Training: Convolutional networks with CIFAR-10
- Reinforcement Learning: Stream-X examples (StreamAC algorithm in continuous/discrete variants, StreamQ algorithm, StreamSARSA algorithm for discrete actions)
## Building from Source

```shell
./install.sh
```

Or configure manually with CMake:

```shell
mkdir build && cd build
cmake .. \
  -DCMAKE_BUILD_TYPE=Release \
  -DAUTOGRAD_BUILD_EXAMPLES=ON \
  -DAUTOGRAD_BUILD_TESTS=ON \
  -DAUTOGRAD_BUILD_BINDINGS=OFF
make -j$(nproc)
```

| Option | Description | Default |
|---|---|---|
| `AUTOGRAD_BUILD_EXAMPLES` | Build example programs | `ON` |
| `AUTOGRAD_BUILD_TESTS` | Build test suite | `ON` |
| `AUTOGRAD_BUILD_BINDINGS` | Build Python bindings | `OFF` |
| `AUTOGRAD_BUILD_STREAM_X` | Build Stream-X RL module | `OFF` |
| `AUTOGRAD_ENABLE_SIMD` | Enable SIMD optimizations | `ON` |
| `AUTOGRAD_EMBEDDED` | Enable embedded-friendly mode (defines `AG_EMBEDDED`) | `OFF` |
| `AUTOGRAD_FAST_MATH` | Enable `-ffast-math` in Release builds | `ON` |
| `AUTOGRAD_USE_DOUBLE` | Use double precision (`AG_USE_DOUBLE`) | `OFF` |
## Numeric Precision

TinyRL uses float32 (the `Float` type) by default for all computations, which balances performance and memory efficiency for embedded systems and real-time applications.

To use float64 (double precision), define `AG_USE_DOUBLE` before including TinyRL headers:

```cpp
#define AG_USE_DOUBLE
#include "autograd.h"
```

Or via CMake:

```shell
cmake .. -DAUTOGRAD_USE_DOUBLE=ON
```

Or via the install script:

```shell
./install.sh --use-double
```

## Streaming RL

The Stream-X module lives in `examples/stream_x` and includes StreamAC (continuous and discrete Actor-Critic variants), StreamQ (discrete Q-learning), and StreamSARSA (on-policy discrete TD learning). Build it with:

```shell
./install.sh --with-stream-x
```

Or directly via CMake: `-DAUTOGRAD_BUILD_STREAM_X=ON`.
- StreamAC algorithm — Continuous actions

```cpp
#include "stream_x/stream_ac_continuous.h"

int obs_dim = 11;
int act_dim = 1;
int hidden = 128;

ContinuousStreamAC ac_cont(obs_dim, 1.0f, 0.99f, 0.8f, 2.0f, 2.0f);

nn::Sequential actor_backbone;
actor_backbone.add(nn::Linear(obs_dim, hidden));
actor_backbone.add(nn::ReLU());

nn::Sequential mu_head;
mu_head.add(nn::Linear(hidden, act_dim));

nn::Sequential std_head;
std_head.add(nn::Linear(hidden, act_dim));
std_head.add(nn::Softplus());

nn::Sequential critic;
critic.add(nn::Linear(obs_dim, hidden));
critic.add(nn::ReLU());
critic.add(nn::Linear(hidden, 1));

ac_cont.set_model(actor_backbone, mu_head, std_head, critic);

// One environment step
ag::Matrix norm_s = ac_cont.normalize_observation(state);
ag::Matrix norm_sn = ac_cont.normalize_observation(next_state);
ag::Tensor s(norm_s, false);
ag::Tensor sn(norm_sn, false);
ag::Tensor a_cont = ac_cont.sample_action(s);
ag::Float scaled_r = ac_cont.scale_reward(reward, done);
ag::Tensor r(ag::Matrix::Constant(1, 1, scaled_r), false);
ac_cont.update(s, a_cont, r, sn, done);
```

- StreamAC algorithm — Discrete actions
```cpp
#include "stream_x/stream_ac_discrete.h"

int obs_dim = 11;
int n_actions = 2;
int hidden = 128;

DiscreteStreamAC ac_disc(obs_dim, n_actions, 1.0f, 0.99f, 0.8f, 2.0f, 2.0f);

nn::Sequential actor;
actor.add(nn::Linear(obs_dim, hidden));
actor.add(nn::ReLU());
actor.add(nn::Linear(hidden, n_actions));
actor.add(nn::Softmax());

nn::Sequential critic;
critic.add(nn::Linear(obs_dim, hidden));
critic.add(nn::ReLU());
critic.add(nn::Linear(hidden, 1));

ac_disc.set_model(actor, critic);

// One environment step
ag::Matrix norm_s = ac_disc.normalize_observation(state);
ag::Matrix norm_sn = ac_disc.normalize_observation(next_state);
ag::Tensor s(norm_s, false);
ag::Tensor sn(norm_sn, false);
ag::Tensor a_idx = ac_disc.sample_action(s);  // action index (scalar tensor)
ag::Float scaled_r = ac_disc.scale_reward(reward, done);
ag::Tensor r(ag::Matrix::Constant(1, 1, scaled_r), false);
ac_disc.update(s, a_idx, r, sn, done);
```

- StreamQ algorithm — Discrete Q-learning
```cpp
#include "stream_x/stream_q.h"

// Define a simple Q network (1xA output)
nn::Sequential qnet;
qnet.add(nn::Linear(11, 64));
qnet.add(nn::ReLU());
qnet.add(nn::Linear(64, 2));

StreamQ q_agent(/*n_obs=*/11, /*n_actions=*/2, /*lr=*/1.0f, /*gamma=*/0.99f,
                /*lambda=*/0.8f, /*kappa=*/2.0f);
q_agent.set_model(qnet);

// One environment step
ag::Matrix norm_s = q_agent.normalize_observation(state);
ag::Matrix norm_sn = q_agent.normalize_observation(next_state);
ag::Tensor s(norm_s, false);
ag::Tensor sn(norm_sn, false);
auto [act, is_nongreedy] = q_agent.sample_action(s);
ag::Float scaled_r = q_agent.scale_reward(reward, done);
q_agent.update(s, act, scaled_r, sn, done, is_nongreedy);
```

## Embedded / ESP32-S3

An on-device reinforcement learning example is provided in `examples/stream_x_esp32/` using PlatformIO/Arduino.
- Header-only integration (no dynamic library needed)
- Compile with `-DAG_EMBEDDED -DAG_ENABLE_SIMD=OFF`
- Replaces iostream logging with `printf` via `AG_LOG`
- Manually unrolled math loops for Xtensa
- Enable via CMake: `-DAUTOGRAD_EMBEDDED=ON` (adds the `AG_EMBEDDED` define automatically)
Host simulation of the embedded example:

```shell
c++ -std=c++17 -DAG_EMBEDDED \
  -I src -I examples/stream_x/src \
  examples/stream_x_esp32/src/stream_ac_continuous.cpp -o esp32_sim
./esp32_sim
```

Flash to a board with PlatformIO:

```shell
cd examples/stream_x_esp32
platformio run --target upload
platformio device monitor
```

For detailed instructions, see `examples/stream_x_esp32/README.md`.
Run the comprehensive test suite:

```shell
# Quick test (standalone)
cd tests
sh compile_tests.sh && sh run_tests.sh

# Full project test with the new build system
./install.sh --build-type Debug

# Run specific test categories
cd build
./test_error_handling   # Error handling validation
./test_operations       # Basic operations
./test_mlps             # Neural network tests
./test_cnns             # Convolutional network tests
```

The suite covers:

- Error Handling: Comprehensive validation of all error conditions
- Functionality: Complete testing of all features and optimizers
- Edge Cases: Boundary conditions and error-state testing
## Documentation

Visit the project documentation: https://mohmdelsayed.github.io/TinyRL/
The documentation includes:
- 📋 API Reference: Complete class and method documentation
- 📚 Tutorials: Step-by-step guides
- 💡 Examples: Practical code examples
- 🛠️ Installation Guide: Setup instructions
- 🧠 Architecture Overview: System design and components
## Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

```shell
git clone https://github.com/mohmdelsayed/TinyRL.git
cd TinyRL
./install.sh --build-type Debug --with-bindings
# Make your changes
cd build && ctest   # Run tests
```

If you use TinyRL in your research, please cite:
```bibtex
@software{TinyRL2026,
  author    = {Elsayed, Mohamed},
  title     = {{TinyRL: Real-Time Deep RL That Fits in Small Devices}},
  year      = {2026},
  publisher = {GitHub},
  url       = {https://github.com/mohmdelsayed/TinyRL},
}
```

## License

This project is licensed under the MIT License; see the LICENSE file for details.
- TinyRL's autograd design is inspired by TinyGrad/PyTorch's automatic differentiation systems
- Stream-X algorithms are based on Streaming Deep Reinforcement Learning
- We thank Khurram Javed, Adrian Orenstein, and Kris De Asis for testing an early version of the library and for providing helpful feedback.
