stochastic-rs

A high-performance Rust library for stochastic process simulation, quantitative finance, statistics, copulas, distributions, and neural-network volatility surrogates. Generic over f32 / f64, with SIMD acceleration on CPU and CUDA / Metal / Accelerate / cubecl backends where they pay off, and first-class Python bindings via PyO3.

Documentation

📖 stochastic.rust-dd.com — full docs site (Fumadocs + Next.js, deployed on Vercel).

Local preview from source under website/:

cd website
bun install
bun run dev          # http://localhost:3000

Highlights:

120+ stochastic processes — diffusion, jump, fractional / rough, short-rate, HJM, LMM, fBM, Hawkes, Lévy. Generic-precision ProcessExt<T> impl, SIMD on CPU, optional CUDA / Metal for FGN / fBM.
Pricing & calibration — closed-form (BSM, Bachelier, Black76, Bjerksund-Stensland, …), Fourier (Heston / Bates / Merton-jump / Kou / VG / CGMY / HKDE / double-Heston), Monte Carlo (basket, rainbow, cliquet, autocallable, spread), finite difference, Bermudan LSM, Heston SLV. Heston / SABR / SVJ / Lévy / rough Bergomi / double-Heston / Hull-White swaption-grid calibrators.
Statistics & risk — Hurst (Fukasawa), MLE for 1-D diffusions with 6 transition densities, ADF / KPSS / Phillips-Perron, realised variance with BNHLS bandwidth, HMM, changepoint, particle filter, UKF. VaR / CVaR / drawdown, Sharpe / Sortino / IR / Calmar.
Fixed income & credit — yield-curve bootstrapping, Nelson-Siegel / Svensson, multi-curve, IRS / inflation swaps, Vasicek / CIR / Hull-White / G2++ short-rate engines, Merton structural model, reduced-form survival curves, CDS pricing, JLT migration matrices.
Microstructure — Almgren-Chriss, Kyle (1985), Bouchaud propagator, full price-time priority order book.
Distributions & copulas — 19 SIMD distributions with closed-form pdf / cdf / cf / moments. Clayton / Frank / Gumbel / Independence bivariate; Gaussian / vine multivariate.
Python bindings — 210 entries (198 PyO3 classes + 12 functions) spanning every sub-crate except AI surrogates. Numpy-in / numpy-out.
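
The risk metrics listed under Statistics & risk have short textbook definitions. The sketch below illustrates historical VaR / CVaR and maximum drawdown using only `std`. It assumes the returns arrive as a non-empty `f64` slice, that losses are negated returns, and that equity values are positive. The function names `historical_var`, `historical_cvar` and `max_drawdown` are hypothetical and are not the crate's API. Quantile conventions also vary: this sketch takes the order statistic at index ⌊α·n⌋, which is only one of several choices.

```rust
/// Historical Value-at-Risk at confidence `alpha` (e.g. 0.95), reported as a
/// positive loss. Convention (one of several): order statistic at floor(alpha * n).
fn historical_var(returns: &[f64], alpha: f64) -> f64 {
    assert!(!returns.is_empty(), "need at least one return");
    let mut losses: Vec<f64> = returns.iter().map(|r| -r).collect();
    losses.sort_by(|a, b| a.total_cmp(b));
    let n = losses.len();
    let idx = ((alpha * n as f64).floor() as usize).min(n - 1);
    losses[idx]
}

/// Historical CVaR (expected shortfall): mean of the losses at or beyond VaR.
fn historical_cvar(returns: &[f64], alpha: f64) -> f64 {
    let var = historical_var(returns, alpha);
    let tail: Vec<f64> = returns.iter().map(|r| -r).filter(|&l| l >= var).collect();
    tail.iter().sum::<f64>() / tail.len() as f64
}

/// Maximum drawdown of a positive equity curve, as a fraction of the running peak.
fn max_drawdown(equity: &[f64]) -> f64 {
    let mut peak = f64::NEG_INFINITY;
    let mut mdd = 0.0_f64;
    for &v in equity {
        peak = peak.max(v);
        mdd = mdd.max((peak - v) / peak);
    }
    mdd
}

fn main() {
    let returns = [0.01, -0.02, 0.03, -0.05, 0.00, 0.02, -0.01, 0.04, -0.03, 0.01];
    println!("VaR(80%)  = {:.4}", historical_var(&returns, 0.8));
    println!("CVaR(80%) = {:.4}", historical_cvar(&returns, 0.8));
    let equity = [100.0, 120.0, 90.0, 110.0, 130.0, 104.0];
    println!("max drawdown = {:.2}", max_drawdown(&equity));
}
```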

Installation

Rust

[dependencies]
stochastic-rs = "2.0.0"

use stochastic_rs::prelude::*;
use stochastic_rs::stochastic::diffusion::gbm::Gbm;
use stochastic_rs::quant::pricing::heston::HestonPricer;

For per-sub-crate (lean) builds, OpenBLAS / CUDA / Metal / cubecl / Accelerate feature flags, native CPU optimisation, and SIMD details, see the installation guide on the docs site.

Python

pip install stochastic-rs

Source build (requires the Rust toolchain):

pip install maturin
maturin develop --release --manifest-path stochastic-rs-py/Cargo.toml

Linux (x86_64 / aarch64) and macOS (arm64 / x86_64) wheels ship with the openblas feature on. The Windows wheel omits the 15 BLAS-backed classes; everything else (the remaining 183 classes plus 12 functions, 195 entries in total) works identically. See the Python bindings page for the parity table and the source-build path with vcpkg.

Quickstart

use stochastic_rs::prelude::*;
use stochastic_rs::stochastic::diffusion::ou::Ou;
use stochastic_rs::quant::pricing::heston::HestonPricer;
use stochastic_rs::quant::types::OptionType;

fn main() {
    // Mean-reverting OU path
    let p = Ou::<f64>::new(2.0, 0.0, 1.0, 1_000, Some(0.0), Some(1.0));
    let path = p.sample();

    // Heston European call with first- and second-order Greeks
    let pricer = HestonPricer::<f64>::new(
        100.0, 100.0, 1.0, 0.03, 0.0,
        0.04, 2.0, 0.04, 0.3, -0.5,
    );
    let price = pricer.price(OptionType::Call);
    let greeks = pricer.greeks(OptionType::Call);
    println!("call={:.4}, delta={:.4}, vega={:.4}", price, greeks.delta, greeks.vega);
}

import stochastic_rs as srs

# Mean-reverting OU path
p = srs.Ou(theta=2.0, mu=0.0, sigma=1.0, n=1000, x0=0.0, t=1.0)
path = p.sample()                       # numpy.ndarray, shape (1000,)

# Heston European call
pricer = srs.HestonPricer(
    s0=100, k=100, tau=1.0, r=0.03, q=0.0,
    v0=0.04, kappa=2.0, theta=0.04, sigma=0.3, rho=-0.5,
)
print("call =", pricer.price("call"))
g = pricer.greeks("call")
print(f"delta={g.delta:.4f}, vega={g.vega:.4f}")
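
Both quickstarts build the same OU process. Going by the Python keyword names, the positional arguments of `Ou::new` appear to be θ (mean-reversion speed), μ (long-run mean), σ, the number of points n, the start value x0 and the horizon t. The sketch below shows roughly what a generic-precision sampler does. It is an Euler–Maruyama discretisation of dX = θ(μ − X)dt + σ dW, generic over f32 / f64 through a small hypothetical `Real` trait. It is not the crate's `Ou` or `ProcessExt<T>`. Following the `(1000,)` shape noted above, it returns n points including x0; the crate's exact grid convention is not stated here. The noise source is passed in as a closure so the example stays dependency-free.

```rust
use std::ops::{Add, Mul, Sub};

/// Minimal float abstraction (hypothetical; the crate uses its own trait bounds).
trait Real: Copy + Add<Output = Self> + Sub<Output = Self> + Mul<Output = Self> {
    fn from_f64(x: f64) -> Self;
    fn sqrt(self) -> Self;
}

impl Real for f32 {
    fn from_f64(x: f64) -> Self { x as f32 }
    fn sqrt(self) -> Self { f32::sqrt(self) }
}

impl Real for f64 {
    fn from_f64(x: f64) -> Self { x }
    fn sqrt(self) -> Self { f64::sqrt(self) }
}

/// Euler–Maruyama path of dX = theta (mu - X) dt + sigma dW on [0, t],
/// returning `n` points including `x0`. `normal` should yield i.i.d. N(0, 1) draws.
fn ou_euler<T: Real>(
    theta: T, mu: T, sigma: T, n: usize, x0: T, t: T,
    mut normal: impl FnMut() -> T,
) -> Vec<T> {
    assert!(n >= 2, "need at least two grid points");
    let dt = t * T::from_f64(1.0 / (n - 1) as f64);
    let sqrt_dt = dt.sqrt();
    let mut path = Vec::with_capacity(n);
    let mut x = x0;
    path.push(x);
    for _ in 1..n {
        // Drift pulls x toward mu; the diffusion term scales a N(0,1) draw by sqrt(dt).
        x = x + theta * (mu - x) * dt + sigma * sqrt_dt * normal();
        path.push(x);
    }
    path
}

fn main() {
    // Zero noise keeps this dependency-free; feed standard normal draws
    // (e.g. from rand_distr::StandardNormal) to get an actual random path.
    let path = ou_euler(2.0_f64, 0.0, 1.0, 1_000, 1.0, 1.0, || 0.0);
    println!("{} points, last = {:.4}", path.len(), path.last().unwrap());
}
```

With σ·dW switched off the recursion collapses to x_{k+1} = (1 − θ·dt)·x_k, so the last point should sit close to e^{−θt}, which makes it a cheap sanity check for the drift term.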

More end-to-end recipes (Heston calibration, fBM Hurst estimation, vol-surface from quotes, Python interop) live in the tutorials section.

Benchmarks

FGN — CPU vs CUDA native (`f32`, H = 0.7)

cargo bench --features cuda-native --bench fgn_cuda_native

Single path:

n	CPU `sample`	CUDA `sample_cuda_native(1)`	Speedup
1,024	8.1 µs	46 µs	0.18×
4,096	35 µs	84 µs	0.42×
16,384	147 µs	110 µs	1.3×
65,536	850 µs	227 µs	3.7×

Batch:

n, m	CPU `sample_par`	CUDA `sample_cuda_native`	Speedup
4,096, 32	147 µs	117 µs	1.3×
4,096, 512	1.78 ms	2.37 ms	0.75×
65,536, 128	12.6 ms	10.5 ms	1.2×
65,536, 1 k	102 ms	93 ms	1.1×

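For single paths, CUDA only pulls ahead at n = 16,384 and above; below that, GPU launch and transfer overhead outweighs the compute and the CPU is faster. In batch mode the margins are narrow: CUDA leads by 1.1–1.3× in three of the four configurations, while CPU rayon (`sample_par`) wins at n = 4,096, m = 512.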
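
For reference, fractional Gaussian noise with Hurst exponent H (the benchmark uses H = 0.7) is the stationary Gaussian sequence with unit-variance autocovariance γ(k) = ½(|k+1|^{2H} − 2|k|^{2H} + |k−1|^{2H}). Any exact sampler, CPU or GPU, has to reproduce this covariance, whichever algorithm it uses. Circulant embedding (Davies–Harte) is a common O(n log n) choice, but this README does not say which method stochastic-rs implements. The sketch below only evaluates γ, and the function name `fgn_autocov` is hypothetical.

```rust
/// Autocovariance of unit-variance fractional Gaussian noise at integer lag `k`.
fn fgn_autocov(k: u64, hurst: f64) -> f64 {
    let two_h = 2.0 * hurst;
    let k = k as f64;
    0.5 * ((k + 1.0).powf(two_h) - 2.0 * k.powf(two_h) + (k - 1.0).abs().powf(two_h))
}

fn main() {
    // H > 0.5: positive, slowly decaying correlations (long memory).
    for k in 0..5 {
        println!("gamma({}) = {:.4}", k, fgn_autocov(k, 0.7));
    }
}
```

At H = 0.5 every lag k ≥ 1 gives γ(k) = 0, i.e. FGN reduces to white noise, which is a convenient check on any implementation.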

Distribution sampling — `Normal` vs upstream `rand_distr`

Single-thread fill_slice, median of 7 runs (cargo bench --bench dist_multicore). Comparison columns:

rand_distr + SimdRng — rand_distr::Normal consuming our SimdRng (same uniform stream, only the Normal algorithm differs).
rand_distr + rand::rng() — the out-of-box upstream pipeline.

n	`SimdNormal` (µs)	`rand_distr + SimdRng` (µs)	speedup vs `rand_distr + SimdRng`	`rand_distr + rand::rng()` (µs)	speedup vs `rand::rng()`
4	0.008	0.013	1.73×	0.032	4.22×
8	0.014	0.026	1.78×	0.065	4.52×
16	0.029	0.051	1.79×	0.128	4.47×
64	0.109	0.208	1.90×	0.508	4.64×
256	0.432	0.840	1.94×	2.029	4.70×
4 096	6.975	13.176	1.89×	32.382	4.64×
65 536	113.458	212.406	1.87×	520.219	4.59×
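
The table reports the median of 7 runs. A median is less sensitive than a mean to one slow run caused by a context switch or a frequency change. The following is a minimal `std`-only harness in that spirit. It is a hypothetical illustration of the methodology, not the code behind `cargo bench --bench dist_multicore`, and `fill_slice_stub` is a placeholder for whatever fill routine is being timed.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Median of a non-empty set of durations (upper median for even counts).
fn median(mut xs: Vec<Duration>) -> Duration {
    assert!(!xs.is_empty(), "need at least one sample");
    xs.sort();
    xs[xs.len() / 2]
}

/// Time `f` `runs` times and return the median wall-clock duration.
fn median_time(runs: usize, mut f: impl FnMut()) -> Duration {
    let samples: Vec<Duration> = (0..runs)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed()
        })
        .collect();
    median(samples)
}

/// Hypothetical stand-in for a distribution's fill_slice: writes a cheap sequence.
fn fill_slice_stub(out: &mut [f64]) {
    for (i, v) in out.iter_mut().enumerate() {
        *v = i as f64 * 0.5;
    }
}

fn main() {
    let mut buf = vec![0.0_f64; 4096];
    // black_box keeps the optimiser from discarding the work being timed.
    let t = median_time(7, || fill_slice_stub(black_box(&mut buf)));
    println!("median of 7 runs: {:?}", t);
}
```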

Single-sample speedup vs prior release

Criterion dist.sample(rng) loop, vs the wide 1.3.0 baseline (cargo bench --bench distributions -- --baseline before):

distribution	f32 / large	f64 / large	f64 / small
`Uniform/simd`	−57% (≈ 2.3×)	−77% (≈ 4.4×)	−58% (≈ 2.4×)
`Normal/simd`	−51% (≈ 2.0×)	−75% (≈ 4.0×)	−63% (≈ 2.7×)
`Exp/simd N=64`	−3% (n.s.)	−73% (≈ 3.7×)	—
`LogNormal/simd`	−71% (≈ 3.4×)	−70% (≈ 3.4×)	−66% (≈ 2.9×)
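
Each cell reports the same measurement twice. A −p% change in time per sample corresponds to a speedup factor of 1/(1 − p/100). Two entries from the table work through as follows:

$$
\text{speedup} = \frac{t_{\text{before}}}{t_{\text{after}}} = \frac{1}{1 - p/100},
\qquad p = 75 \Rightarrow \frac{1}{0.25} = 4.0\times,
\qquad p = 57 \Rightarrow \frac{1}{0.43} \approx 2.3\times
$$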

The gains come from three changes:

SIMD u64→f64 / u32→f32 magic-number conversion in SimdRng, exposed through direct-write fill_uniform_f64 / fill_uniform_f32 APIs that skip the [f64; 8] return-by-value round-trip.
Fused Exp(λ) scaling inside fill_exp_scaled.
An 8-at-a-time main loop in fill_ziggurat, so copy_from_slice inlines to stp (AArch64 store-pair) stores instead of a memcpy call.
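
A magic-number conversion of this kind is a well-known bit trick. Random bits are placed under the fixed exponent of 1.0, the bit pattern is reinterpreted as a float in [1, 2), and 1.0 is subtracted. The sketch below is a scalar version of one common form. SimdRng applies its conversion across SIMD lanes, and whether it uses exactly this bit layout is not stated here. The function names are hypothetical.

```rust
/// Map 64 random bits to a uniform f64 in [0, 1): the top 52 bits become the
/// mantissa of a number in [1, 2), then 1.0 is subtracted.
fn u64_to_unit_f64(x: u64) -> f64 {
    const ONE_BITS: u64 = 0x3FF0_0000_0000_0000; // bit pattern of 1.0_f64
    f64::from_bits((x >> 12) | ONE_BITS) - 1.0
}

/// Same idea for f32: the top 23 bits of a u32 under the exponent of 1.0_f32.
fn u32_to_unit_f32(x: u32) -> f32 {
    const ONE_BITS: u32 = 0x3F80_0000; // bit pattern of 1.0_f32
    f32::from_bits((x >> 9) | ONE_BITS) - 1.0
}

fn main() {
    for x in [0u64, 1 << 63, u64::MAX] {
        println!("{:#018x} -> {}", x, u64_to_unit_f64(x));
    }
    println!("u32::MAX -> {}", u32_to_unit_f32(u32::MAX));
}
```

There is no integer-to-float conversion instruction and no division, only a shift, an OR and a subtraction, which all vectorise easily. The price is a resolution of 2^-52 (2^-23 for f32), and the result never reaches 1.0.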

Opt-in: dual-stream RNG (`dual-stream-rng` feature)

[dependencies]
stochastic-rs = { version = "2.1", features = ["dual-stream-rng"] }

Unlocks SimdRngDual (two parallel xoshiro engines) and SimdNormalDual (Ziggurat unrolled 2× over the dual streams). Measured against the single-stream SimdNormal::fill_slice on Apple Silicon (cargo bench --bench dual_stream_compare --features dual-stream-rng):

n	single (`SimdNormal`)	dual (`SimdNormalDual`)	Δ
64	111.6 ns	105.5 ns	−5.5%
256	444.8 ns	418.3 ns	−6.0%
4 096	7.43 µs	6.60 µs	−11.2%
65 536	113.9 µs	106.6 µs	−6.4%
1 048 576	1.83 ms	1.70 ms	−6.8%

The win comes from hiding the 16 scalar kn / wn table-lookup latencies behind the second engine's xoshiro state update on a modern out-of-order core. Uniform fills are not bottlenecked on the engine, so they see no speedup. Trade-off: SimdRngDual::from_seed does not reproduce SimdRng::from_seed's bit-exact sequence (statistical properties are identical and KS-validated).
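
The engine-interleaving idea can be shown in scalar form. Two independent xoshiro engines are advanced in the same loop iteration, so the core can overlap one engine's state update with work on the other. This sketch assumes xoshiro256++ seeded via SplitMix64 (the README only says "xoshiro"), and `Xoshiro256pp`, `DualStream` and the seed-mixing constant are hypothetical rather than the crate's types. The sketch also shows why the dual stream cannot match the single-stream sequence. Even slots come from engine A, but odd slots come from a differently seeded engine B rather than from A's next state. The ziggurat lookups themselves are not modelled.

```rust
/// SplitMix64 step, used here only to expand a seed into engine state.
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Scalar xoshiro256++ (Blackman & Vigna).
struct Xoshiro256pp {
    s: [u64; 4],
}

impl Xoshiro256pp {
    fn from_seed(seed: u64) -> Self {
        let mut sm = seed;
        Self { s: [splitmix64(&mut sm), splitmix64(&mut sm), splitmix64(&mut sm), splitmix64(&mut sm)] }
    }

    fn next_u64(&mut self) -> u64 {
        let s = &mut self.s;
        let result = s[0].wrapping_add(s[3]).rotate_left(23).wrapping_add(s[0]);
        let t = s[1] << 17;
        s[2] ^= s[0];
        s[3] ^= s[1];
        s[1] ^= s[2];
        s[0] ^= s[3];
        s[2] ^= t;
        s[3] = s[3].rotate_left(45);
        result
    }
}

/// Two independent engines interleaved: even slots from `a`, odd slots from `b`.
struct DualStream {
    a: Xoshiro256pp,
    b: Xoshiro256pp,
}

impl DualStream {
    fn from_seed(seed: u64) -> Self {
        // Engine B gets a different seed so the two streams are not copies.
        Self {
            a: Xoshiro256pp::from_seed(seed),
            b: Xoshiro256pp::from_seed(seed ^ 0xD1B5_4A32_D192_ED03),
        }
    }

    fn fill(&mut self, out: &mut [u64]) {
        let mut pairs = out.chunks_exact_mut(2);
        for pair in &mut pairs {
            // The two state updates are independent, so they can execute in parallel.
            pair[0] = self.a.next_u64();
            pair[1] = self.b.next_u64();
        }
        // Odd-length tail: take one more value from engine A.
        if let [last] = pairs.into_remainder() {
            *last = self.a.next_u64();
        }
    }
}

fn main() {
    let mut buf = [0u64; 5];
    DualStream::from_seed(42).fill(&mut buf);
    println!("{:?}", buf);
}
```

How much overlap a scalar loop like this actually gets depends on the compiler and the core. The figures in the table above are for the crate's SIMD implementation.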

Contributing

Contributions are welcome — bug reports, feature suggestions, or PRs. Open an issue or start a discussion on GitHub. Per-feature recipes (add-diffusion-process, adding-distribution, calibration-pattern, docs-writing, …) live under .claude/skills/.

License

MIT — see LICENSE.
