A pure Rust implementation of TensorFlow, providing a full-featured machine learning framework with Rust's safety and performance.
Beta Release Notice (0.1.0-beta.1 · 2025-02-02)
First beta release with production-ready core functionality! All 2357 tests passing, zero security vulnerabilities, and comprehensive documentation. The core API is stabilizing for 1.0.
⚠️ Note: This release temporarily excludes Python bindings (FFI) and tensorboard integration. See CHANGELOG.md for details and timeline.
TenfloweRS is a native Rust machine learning framework inspired by TensorFlow, designed to bring the power of deep learning to the Rust ecosystem. It leverages Rust's memory safety, zero-cost abstractions, and excellent performance while maintaining compatibility with the broader ML ecosystem through ONNX support.
TenfloweRS adapts TensorFlow's proven architecture to Rust's strengths:
- Memory Safety First: All operations are memory-safe by design, eliminating segfaults and data races
- Zero-Cost Abstractions: High-level APIs compile down to efficient machine code
- Explicit over Implicit: Clear ownership and error handling following Rust conventions
- Modular Architecture: Organized as a workspace of focused, reusable crates
- Cross-Platform: Native support for Windows, macOS, and Linux with unified GPU abstraction
| TensorFlow Concept | TenfloweRS Implementation |
|---|---|
| tf.Tensor | `Tensor<T>` with static typing |
| tf.Operation | `Op` trait with registered kernels |
| tf.Graph | `Graph` struct with ownership semantics |
| tf.Session | `Session` trait for graph execution |
| tf.GradientTape | `GradientTape` for automatic differentiation |
| tf.keras.Layer | `Layer` trait with builder pattern |
| tf.data.Dataset | Iterator-based `Dataset` trait |
| tf.device | `Device` enum with placement control |
- 🚀 Dual Execution Modes: Both eager execution (PyTorch-style) and static computation graphs (TensorFlow-style)
- 🦀 Pure Rust Implementation: No C/C++ dependencies in the core, ensuring memory safety
- 🎮 GPU Support: Cross-platform GPU acceleration via WGPU (Metal, Vulkan, DirectX)
- 🔧 Rust Scientific Stack: Built on NumRS2 and SciRS2 for numerical computing
- 🐍 Python Bindings: ⚠️ Temporarily excluded in beta.1 (requires Python environment setup)
- 📦 ONNX Support: Import and export models for cross-framework compatibility
- ⚡ Performance: SIMD vectorization, optional BLAS integration, and parallel execution
- ✅ Production Ready: 2357 tests passing, 0 security vulnerabilities, comprehensive docs
Current Version: 0.1.0-beta.1 (Released 2025-02-02)
First beta release with production-ready core functionality! The core API is stabilizing for 1.0 release.
- Tests: 2357/2357 passing (100% pass rate)
- Security: 0 vulnerabilities (all known issues resolved)
- Code Quality: Zero clippy warnings, full formatting compliance
- Documentation: Complete crate-level docs and READMEs for all published crates
- Build: All 5 core crates successfully package and verify
- ✅ tenflowers-core (6.5 MiB) - Core tensor operations and GPU support
- ✅ tenflowers-autograd (2.8 MiB) - Automatic differentiation engine
- ✅ tenflowers-dataset (2.1 MiB) - Data loading and preprocessing
- ✅ tenflowers-neural (3.0 MiB) - Neural network layers and training
- ✅ tenflowers (182 KiB) - Unified API and prelude
- ⚠️ tenflowers-ffi: Python bindings (requires a Python dev environment)
  - Will be re-enabled in a future release with proper CI/CD
  - Use the Rust API directly for now
- ⚠️ tensorboard integration: logging feature (security fix)
  - Removed due to a protobuf vulnerability (RUSTSEC-2024-0437)
  - Will be re-added once the dependency is updated
  - Use alternative logging in the meantime
See CHANGELOG.md for complete details and migration guide.
- ✅ Core tensor operations fully tested and validated
- ✅ Automatic differentiation engine with comprehensive gradient support
- ✅ Neural network layers (Dense, Conv2D, BatchNorm, Dropout, etc.)
- ✅ Training utilities (optimizers, loss functions, training loops)
- ✅ Data loading pipeline with multi-format support
- ✅ GPU acceleration via WGPU (cross-platform)
- ✅ SciRS2/NumRS2 ecosystem integration complete
- ✅ Security hardening (zero vulnerabilities)
- ✅ Comprehensive documentation
- Python bindings not available (see Temporarily Excluded above)
- Tensorboard logging not available (see Temporarily Excluded above)
- Graph mode optimization passes still in development
- Multi-GPU orchestration experimental
- ONNX import/export in development
- Re-enable Python bindings with proper CI/CD
- Re-enable tensorboard integration (awaiting dependency fix)
- Complete graph optimization passes
- Expand GPU kernel coverage
- Performance benchmarking suite
- ONNX import/export finalization
- API stability guarantee for 1.0
- All 2357 tests passing
- Zero security vulnerabilities
- Zero clippy warnings
- All crates properly documented
- Package verification successful
- Version consistency across workspace
- CHANGELOG.md updated
- Migration guide provided
- ✅ Core tensor operations (creation, manipulation, arithmetic)
- ✅ Automatic differentiation with gradient tape
- ✅ Neural network layers and model composition
- ✅ Training loop with optimizers (SGD, Adam, AdamW)
- ✅ Data loading from multiple formats (CSV, images, HDF5, Parquet)
- ✅ GPU acceleration (WGPU backend)
- ✅ Integration with SciRS2 ecosystem
- ✅ Comprehensive error handling (no unwrap() usage)
- 🚧 Python bindings (code complete, CI/CD in progress)
- 🚧 Tensorboard integration (awaiting dependency security fix)
- 🚧 Graph optimization passes
- 🚧 Shape inference system
- 🚧 Graph construction and optimization
- 🚧 Tape-based automatic differentiation
- 🚧 GPU compute kernels
Add TenfloweRS to your Cargo.toml:

```toml
[dependencies]
tenflowers-core = "0.1.0-beta.1"
tenflowers-neural = "0.1.0-beta.1"
```

For GPU support:

```toml
[dependencies]
tenflowers-core = { version = "0.1.0-beta.1", features = ["gpu"] }
```

```rust
use tenflowers_core::{Tensor, Device, Context};
use tenflowers_autograd::GradientTape; // automatic differentiation engine

// Create a context for eager execution
let ctx = Context::new()?;

// Create tensors
let a = Tensor::<f32>::ones(&[2, 3]);
let b = Tensor::<f32>::from_vec(vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0], &[2, 3])?;

// Operations execute immediately in eager mode
let c = a.add(&b)?;
let d = c.matmul(&b.transpose()?)?;

// Move to GPU
let gpu_tensor = a.to(Device::Gpu(0))?;

// Automatic differentiation
let tape = GradientTape::new();
let x = Tensor::variable(vec![1.0, 2.0, 3.0], &[3]);
let y = tape.watch(x.clone());
let z = y.pow(2.0)?;
let grads = tape.gradient(&z, &[&x])?;
```

```rust
use tenflowers_core::{Graph, Session, Placeholder};

// Build a computation graph
let graph = Graph::new();
let a = graph.placeholder::<f32>("input_a", &[None, 784])?;
let w = graph.variable("weights", &[784, 10])?;
let b = graph.variable("bias", &[10])?;
let y = a.matmul(&w)?.add(&b)?;

// Create a session and run
let session = Session::new(&graph)?;
session.run(
    &[("input_a", input_tensor)],
    &["output"],
    &mut outputs,
)?;
```

```rust
use tenflowers_neural::{Sequential, Dense, Conv2D, Model};
use tenflowers_core::Tensor;

// Define a CNN for image classification
let mut model = Sequential::new(vec![
    Box::new(Conv2D::new(32, (3, 3)).with_activation("relu")),
    Box::new(Conv2D::new(64, (3, 3)).with_activation("relu")),
    Box::new(layers::GlobalAveragePooling2D::new()),
    Box::new(Dense::new(128, true).with_activation("relu")),
    Box::new(layers::Dropout::new(0.5)),
    Box::new(Dense::new(10, true).with_activation("softmax")),
]);

// Compile the model
model.compile(
    optimizer::Adam::new(0.001),
    loss::SparseCategoricalCrossentropy::new(),
    vec![metrics::Accuracy::new()],
)?;

// Train the model (arguments shown positionally; Rust has no
// named arguments)
model.fit(
    &train_dataset,
    10,                  // epochs
    32,                  // batch size
    Some(&val_dataset),  // validation data
)?;
```

```rust
use tenflowers_dataset::{Dataset, DataLoader};

// Create a dataset from tensors
let dataset = Dataset::from_tensor_slices((images, labels))?
    .shuffle(1000)
    .batch(32)
    .prefetch(2);

// Iterate through batches
for (batch_images, batch_labels) in dataset.iter() {
    // Training step
}
```

TenfloweRS follows a modular architecture inspired by TensorFlow:
```text
tenflowers/
├── tenflowers-core/       # Core tensor operations and device management
│   ├── tensor/            # Tensor implementation with device support
│   ├── ops/               # Operation registry and implementations
│   ├── kernels/           # CPU and GPU kernel implementations
│   ├── graph/             # Computation graph representation
│   └── device/            # Device abstraction and management
├── tenflowers-autograd/   # Automatic differentiation engine
│   ├── tape/              # GradientTape for eager mode
│   ├── graph_grad/        # Graph-based backpropagation
│   └── ops/               # Gradient definitions for operations
├── tenflowers-neural/     # Neural network layers and models
│   ├── layers/            # Layer implementations
│   ├── models/            # Model abstraction and builders
│   ├── optimizers/        # Training optimizers
│   └── losses/            # Loss functions
├── tenflowers-dataset/    # Data loading and preprocessing
│   ├── sources/           # Data source implementations
│   ├── transforms/        # Data transformation ops
│   └── iterators/         # Efficient iteration strategies
└── tenflowers-ffi/        # Python bindings
    ├── tensor_py/         # Python tensor wrapper
    ├── ops_py/            # Operation bindings
    └── keras_compat/      # Keras-compatible API
```
- Reference-counted tensors with device placement
- Lazy allocation and memory pooling
- Zero-copy views and slicing
- Automatic broadcasting
- Extensible operation registry
- Multi-dispatch for device/dtype specialization
- Shape inference at graph construction time
- Automatic gradient registration
- Eager Mode: Operations execute immediately
- Graph Mode: Build once, run multiple times with optimization
- XLA Integration: (Future) JIT compilation for performance
- Unified API for CPU, GPU, and custom devices
- Automatic device placement with hints
- Cross-device memory transfers
- Multi-GPU support with collective operations
```bash
# Clone the repository
git clone https://github.com/cool-japan/tenflowers
cd tenflowers

# Build all crates
cargo build --workspace

# Run tests (requires cargo-nextest)
cargo nextest run --workspace

# Build with GPU support
cargo build --workspace --features gpu

# Build with BLAS acceleration
cargo build --workspace --features blas-openblas

# Check for warnings (must pass - no-warnings policy)
cargo check --workspace
cargo clippy --workspace -- -D warnings
```

Check out the examples directory for comprehensive examples:
- mnist_eager.rs: MNIST classification with eager execution
- mnist_graph.rs: MNIST using static graphs (coming soon)
- gan_example.rs: Generative Adversarial Network (coming soon)
- transformer.rs: Transformer model implementation (coming soon)
TenfloweRS is designed for high performance:
- CPU: SIMD vectorization, optional BLAS integration, Rayon parallelization
- GPU: WGPU compute shaders, memory pooling, kernel fusion
- Memory: Zero-copy operations, buffer reuse, lazy allocation
Coming soon! Target performance goals:
- CPU: Match or exceed NumPy on comparable workloads
- GPU: Reach 90% of TensorFlow's performance
- Memory: Stay within 10% of TensorFlow's usage
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
Key areas where we need help:
- Implementing core operations (see TODO.md)
- GPU kernel development
- Shape inference functions
- Documentation and examples
- Testing and benchmarking
- Python API design
- Check TODO.md for tasks
- Open an issue to discuss your contribution
- Follow the no-warnings policy
- Write tests including gradient checks
- Submit a PR with clear description
See TODO.md for the detailed development roadmap.
- v0.1.0: Core tensor ops and basic autograd
- v0.2.0: GPU support and essential layers
- v0.3.0: Graph mode and optimizations
- v0.4.0: Python bindings and Keras compatibility
- v0.5.0: ONNX import/export
- v1.0.0: Production-ready with stable API
| Feature | TensorFlow | TenfloweRS |
|---|---|---|
| Language | C++ with Python API | Pure Rust with Python bindings |
| Memory Safety | Manual management | Guaranteed by Rust |
| Execution | Eager + Graph | Eager + Graph |
| GPU Support | CUDA, ROCm | WGPU (cross-platform) |
| Autodiff | Tape + Graph | Tape + Graph |
| Deployment | TFLite, TF.js | Native, WASM (planned) |
| Ecosystem | Mature, extensive | Growing, Rust-focused |
This project is licensed under the Apache License, Version 2.0 (LICENSE).
TenfloweRS builds upon the excellent Rust scientific computing ecosystem:
- NumRS2 for n-dimensional arrays
- SciRS2 for scientific algorithms
- ndarray for array operations
- WGPU for GPU compute
Special thanks to the TensorFlow team for the inspiration and architectural patterns.
- GitHub Issues: Bug reports and feature requests
- Discussions: Community forum
- Discord: Coming soon!
Note: TenfloweRS is not affiliated with Google's TensorFlow. It's an independent project bringing ML capabilities to Rust.