# umap-rs

A fast, parallel Rust implementation of the core UMAP algorithm, with a clean, modern codebase focused on performance and correctness.

Scales to 264 million samples in 2 hours on a 126-core machine (see Performance below).

## What this is
- Core UMAP dimensionality reduction algorithm
- Fully parallelized via Rayon with memory-efficient sparse matrix construction
- Extensible metric system (Euclidean + custom metrics)
- Checkpointing and fault-tolerant training
- Dense arrays only (no sparse matrix support)
- Fit only (transform for new points not yet implemented)
See DIVERGENCES.md for a detailed comparison to Python umap-learn.
## Usage

```rust
use ndarray::Array2;
use umap_rs::{GraphParams, Umap, UmapConfig};

// Configure UMAP parameters
let config = UmapConfig {
    n_components: 2,
    graph: GraphParams {
        n_neighbors: 15,
        ..Default::default()
    },
    ..Default::default()
};

// Create UMAP instance
let umap = Umap::new(config);

// Your input data
let data: Array2<f32> = /* shape (n_samples, n_features) */;

// Precompute KNN (use your favorite ANN library: pynndescent, hnswlib, etc.)
let knn_indices: Array2<u32> = /* shape (n_samples, n_neighbors) */;
let knn_dists: Array2<f32> = /* shape (n_samples, n_neighbors) */;

// Provide initialization (see the Initialization section below)
// Common choices: random, PCA, or your own custom embedding
let init: Array2<f32> = /* shape (n_samples, n_components) */;

// Fit UMAP to data
let model = umap.fit(
    data.view(),
    knn_indices.view(),
    knn_dists.view(),
    init.view(),
);

// Get the embedding
let embedding = model.embedding(); // Returns ArrayView2<f32>

// Or take ownership of the embedding
let embedding = model.into_embedding(); // Returns Array2<f32>
```
## Checkpointing

UMAP training has two phases: learning the manifold (building the graph) and optimizing the embedding (running gradient descent). The first is deterministic and expensive; the second is iterative and can be interrupted.

For long training runs, you can checkpoint the optimization and resume if interrupted (the snippets below use `?`, so they assume an enclosing function that returns a `Result`):
```rust
use umap_rs::{EuclideanMetric, Metric, Optimizer};

// Phase 1: Learn the manifold structure from your data.
// This builds the fuzzy topological graph. It's slow but deterministic -
// the same inputs always give the same manifold.
let manifold = umap.learn_manifold(data.view(), knn_indices.view(), knn_dists.view());

// Phase 2: Optimize the embedding via gradient descent.
// Create an optimizer that will run 500 epochs of SGD.
let metric = EuclideanMetric;
let mut opt = Optimizer::new(manifold, init, 500, &config, metric.metric_type());

// Train in chunks of 10 epochs at a time
while opt.remaining_epochs() > 0 {
    opt.step_epochs(10, &metric); // Run 10 more epochs

    // Periodically save a checkpoint (embedding + all optimization state)
    if opt.current_epoch() % 50 == 0 {
        std::fs::write(
            format!("checkpoint_{}.bin", opt.current_epoch()),
            bincode::serialize(&opt)?,
        )?;
    }
}

// Training done - convert to the final lightweight model
let fitted = opt.into_fitted(config);
```
If your process is interrupted, load the checkpoint and continue:
```rust
use umap_rs::{EuclideanMetric, Optimizer};

// Deserialize the optimizer state from disk
let mut opt: Optimizer = bincode::deserialize(&std::fs::read("checkpoint_250.bin")?)?;

// Continue from epoch 250 to 500
let metric = EuclideanMetric;
while opt.remaining_epochs() > 0 {
    opt.step_epochs(10, &metric);
}
let fitted = opt.into_fitted(config);
```
The checkpoint contains everything: current embedding, epoch counters, and the manifold. When training completes, convert to a final model:
```rust
// Training done - drop the heavy optimization state
let fitted = opt.into_fitted(config);

// Access the embedding
let embedding = fitted.embedding(); // Zero-copy view

// Or take ownership
let embedding = fitted.into_embedding();

// Save the final model (much smaller than checkpoints)
std::fs::write("model.bin", bincode::serialize(&fitted)?)?;
```
The serialized `FittedUmap` contains just the manifold and embedding, not the optimization state, making it lightweight for long-term storage.
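Loading the saved model back is symmetric. A minimal sketch, assuming `FittedUmap` implements `Deserialize` (its serialization above implies `Serialize`) and is importable from the crate root:

```rust
use umap_rs::FittedUmap; // import path assumed; check the crate docs

// Read the serialized model and reconstruct it
let bytes = std::fs::read("model.bin")?;
let fitted: FittedUmap = bincode::deserialize(&bytes)?;

// The embedding is immediately available; no re-optimization needed
let embedding = fitted.embedding();
```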
## Initialization

You must provide your own initialization. This library is designed to be minimal and focused on the core UMAP optimization; initialization is left to the caller.

### Recommended Approaches

**Random initialization** (simplest):
```rust
use ndarray::Array2;
use rand::Rng;

fn random_init(n_samples: usize, n_components: usize) -> Array2<f32> {
    let mut rng = rand::thread_rng();
    Array2::from_shape_fn((n_samples, n_components), |_| rng.gen_range(-10.0..10.0))
}
```
**PCA initialization** (recommended for better convergence):

```rust
// Use any PCA library (e.g., linfa-reduction, ndarray-stats, etc.)
// Project data onto the first n_components principal components,
// then scale to roughly the [-10, 10] range.
```
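For example, a minimal sketch of the rescaling step, assuming you already have a PCA projection `pca` of shape `(n_samples, n_components)` from whichever library you chose (the helper name `scale_init` is illustrative, not part of this crate):

```rust
use ndarray::Array2;

// Rescale a PCA projection so its largest absolute coordinate is 10.0,
// matching the [-10, 10] range used by random initialization above.
fn scale_init(mut pca: Array2<f32>) -> Array2<f32> {
    let max_abs = pca.iter().fold(0.0f32, |m, &v| m.max(v.abs()));
    if max_abs > 0.0 {
        pca.mapv_inplace(|v| v / max_abs * 10.0);
    }
    pca
}
```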
**Custom initialization:**
- Spectral embedding (use sparse eigensolvers like arpack-ng for large datasets)
- t-SNE initialization
- Pre-trained neural network embeddings
- Domain-specific embeddings
## Configuration

UMAP parameters are grouped logically:

### Basic

```rust
use umap_rs::UmapConfig;

let config = UmapConfig {
    n_components: 2, // Output dimensions
    ..Default::default()
};
```
### Manifold parameters

```rust
use umap_rs::config::ManifoldParams;

let manifold = ManifoldParams {
    min_dist: 0.1, // Minimum distance between points in the embedding
    spread: 1.0,   // Scale of the embedding
    a: None,       // Auto-computed from min_dist/spread
    b: None,       // Auto-computed from min_dist/spread
};
```
### Graph construction

```rust
use umap_rs::config::GraphParams;

let graph = GraphParams {
    n_neighbors: 15,              // Number of nearest neighbors
    local_connectivity: 1.0,      // Local neighborhood connectivity
    set_op_mix_ratio: 1.0,        // Fuzzy union (1.0) vs. intersection (0.0)
    disconnection_distance: None, // Auto-computed from metric
    symmetrize: true,             // Symmetrize graph (set false to save memory)
};
```

The `symmetrize` option controls whether the fuzzy graph is symmetrized via fuzzy set union. For very large datasets, setting `symmetrize: false` roughly halves memory usage with minimal impact on 2D visualization quality.
### Optimization

```rust
use umap_rs::config::OptimizationParams;

let optimization = OptimizationParams {
    n_epochs: None,          // Auto-determined from dataset size
    learning_rate: 1.0,      // SGD learning rate
    negative_sample_rate: 5, // Negative samples per positive
    repulsion_strength: 1.0, // Weight for negative samples
};
```
### Complete example

```rust
use umap_rs::UmapConfig;
use umap_rs::config::{GraphParams, ManifoldParams, OptimizationParams};

let config = UmapConfig {
    n_components: 3,
    manifold: ManifoldParams {
        min_dist: 0.05,
        ..Default::default()
    },
    graph: GraphParams {
        n_neighbors: 30,
        ..Default::default()
    },
    optimization: OptimizationParams {
        n_epochs: Some(500),
        ..Default::default()
    },
};
```
## Custom distance metrics

Implement the `Metric` trait:

```rust
use ndarray::{Array1, ArrayView1};
use umap_rs::Metric;

#[derive(Debug)]
struct MyMetric;

impl Metric for MyMetric {
    fn distance(&self, a: ArrayView1<f32>, b: ArrayView1<f32>) -> (f32, Array1<f32>) {
        // Return (distance, gradient), where gradient = ∂distance/∂a
        todo!()
    }

    // Optional: provide a fast squared distance for the optimizer
    fn squared_distance(&self, a: ArrayView1<f32>, b: ArrayView1<f32>) -> Option<f32> {
        None // Return Some(dist_sq) if available
    }
}
```
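As a concrete illustration, here is a hedged sketch of a Manhattan (L1) metric under the trait signatures shown above. Since d(a, b) = Σᵢ |aᵢ - bᵢ|, the gradient with respect to aᵢ is sign(aᵢ - bᵢ); `ManhattanMetric` is an example name, not a type shipped by the crate:

```rust
use ndarray::{Array1, ArrayView1};
use umap_rs::Metric;

// Hypothetical example metric, not part of this crate
#[derive(Debug)]
struct ManhattanMetric;

impl Metric for ManhattanMetric {
    fn distance(&self, a: ArrayView1<f32>, b: ArrayView1<f32>) -> (f32, Array1<f32>) {
        let mut dist = 0.0;
        let mut grad = Array1::zeros(a.len());
        for i in 0..a.len() {
            let d = a[i] - b[i];
            dist += d.abs();
            grad[i] = d.signum(); // ∂|a_i - b_i| / ∂a_i
        }
        (dist, grad)
    }

    // L1 has no cheap squared form, so decline the fast path
    fn squared_distance(&self, _a: ArrayView1<f32>, _b: ArrayView1<f32>) -> Option<f32> {
        None
    }
}
```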
```rust
use umap_rs::EuclideanMetric;

// Use the custom metric
let umap = Umap::with_metrics(
    config,
    Box::new(MyMetric),        // Input space metric
    Box::new(EuclideanMetric), // Output space metric
);
```

See `src/distances.rs` for the Euclidean implementation as an example.
## Build

```sh
cargo build --release
```
## Performance

This implementation is designed for large-scale datasets (100M+ samples) on high-core-count machines.

### Real-world benchmark
264 million samples embedded to 2D in 2 hours on a 126-core AMD EPYC 9J45 with 1.4 TB RAM:
- Precomputed KNN (n_neighbors=100)
- Precomputed PCA initialization
- Symmetrization disabled
- ~1 TB peak memory
### Parallelization
Every phase is fully parallelized via Rayon:
- Graph construction: Parallel smooth KNN distance, parallel CSR matrix construction
- Set operations: Parallel CSC structure building, parallel symmetrization
- Optimizer initialization: Parallel edge filtering, parallel epoch scheduling
- SGD optimization: Lock-free Hogwild! algorithm for parallel gradient descent
### Memory Efficiency
Optimized for minimal memory footprint at scale:
- Direct CSR construction: Builds sparse matrices in-place without intermediate COO/triplet format. Avoids O(nnz) temporary allocations and O(nnz log nnz) global sorting.
- u32 indices: Uses 4-byte indices instead of 8-byte, halving index memory for datasets up to ~4B samples.
- CSC structure-only: Transpose operations store only structure (indptr + indices), looking up values in original CSR via O(log k) binary search.
- Sequential array allocation: Large arrays are allocated one at a time to avoid memory spikes.
- No cloning: Avoids sequential `.clone()` on large arrays; uses parallel copies when needed.
### Scaling Guidelines

Memory scales with `n_samples × n_neighbors`:
| n_samples | n_neighbors | Approx. Memory |
|---|---|---|
| 10M | 30 | ~10 GB |
| 100M | 30 | ~100 GB |
| 250M | 30 | ~250 GB |
| 250M | 256 | ~2 TB |
To reduce memory:

- Use smaller `n_neighbors` (15-50 is typical for visualization)
- Disable symmetrization: `config.graph.symmetrize = false`
- Slice KNN arrays to use fewer neighbors than computed (see the sketch below)
### Configuration for Large Datasets

```rust
let config = UmapConfig {
    graph: GraphParams {
        n_neighbors: 30,   // Lower = less memory
        symmetrize: false, // Skip symmetrization to save memory
        ..Default::default()
    },
    ..Default::default()
};
```
### Timing Logs

The library emits structured logs via the `tracing` crate. Enable a subscriber to see timing for each phase:

```rust
tracing_subscriber::fmt::init(); // or your preferred subscriber
```
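If you enable `tracing_subscriber`'s `env-filter` feature, you can also scope the output to this crate; a sketch, not the only option:

```rust
use tracing_subscriber::EnvFilter;

// Show only this crate's INFO-level timing logs
tracing_subscriber::fmt()
    .with_env_filter(EnvFilter::new("umap_rs=info"))
    .init();
```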
Example output:

```text
INFO umap_rs::umap::fuzzy_simplicial_set: smooth_knn_dist complete duration_ms=52033
INFO umap_rs::umap::fuzzy_simplicial_set: csr row_counts complete duration_ms=48495
INFO umap_rs::umap::fuzzy_simplicial_set: csr indptr complete duration_ms=560 nnz=62586367074
INFO umap_rs::optimizer: optimizer edge filtering complete duration_ms=725 total_edges=23276942679
```
## Advanced: Accessing the Graph

The fuzzy simplicial set graph is exposed as `SparseMat` (a `CsMatI<f32, u32, usize>`):

```rust
use umap_rs::SparseMat;

let manifold = umap.learn_manifold(data.view(), knn_indices.view(), knn_dists.view());
let graph: &SparseMat = manifold.graph();
// The graph uses u32 column indices (memory efficient)
// and usize row pointers (handles large nnz).
```
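Since `SparseMat` is an sprs matrix, the usual sprs accessors apply. A small sketch of inspecting it, assuming the standard sprs API:

```rust
// Basic statistics
println!("vertices: {}", graph.rows());
println!("edges (stored nonzeros): {}", graph.nnz());

// Inspect one sample's fuzzy neighborhood (row 0 of the CSR matrix)
if let Some(row) = graph.outer_view(0) {
    for (col, &weight) in row.iter() {
        println!("edge 0 -> {}: membership {}", col, weight);
    }
}
```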
## Documentation
- UMAP.md - How UMAP works (algorithm explanation)
- DIVERGENCES.md - Differences from Python umap-learn
- AGENTS.md - Developer notes
Run `cargo doc --open` to browse the API documentation.

## Design principles
- Minimal - Core algorithm only, no feature creep
- Fast - Parallel by default, zero-copy where possible
- Explicit - Caller provides KNN, initialization, etc.
- Rust-native - Idiomatic patterns, not Python translations
## Limitations

- Maximum ~4 billion samples: uses `u32` indices internally for memory efficiency
- No input validation (assumes clean data)
- Transform not yet implemented
- Dense arrays only
- Panics on invalid input (no `Result`-based errors)
- Requires external KNN computation and initialization
## KNN Sentinel Values

If your KNN search couldn't find k neighbors for some points (e.g., isolated points), use `u32::MAX` as the sentinel index and any distance value (commonly `f32::INFINITY`). These entries are automatically skipped during graph construction:

```rust
// Point 5 only has 2 real neighbors; the rest are sentinels
knn_indices[[5, 0]] = 10;       // real neighbor
knn_indices[[5, 1]] = 23;       // real neighbor
knn_indices[[5, 2]] = u32::MAX; // sentinel - skipped
knn_indices[[5, 3]] = u32::MAX; // sentinel - skipped
```
This is a specialized tool for the core algorithm. Wrap it in validation/error handling for production use.
## License
BSD-3-Clause (see LICENSE file)
## References
- Original paper: McInnes, L., Healy, J., & Melville, J. (2018). UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv:1802.03426
- Hogwild! SGD: Recht, B., et al. (2011). Hogwild!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent. NIPS 2011
- Python umap-learn: https://github.com/lmcinnes/umap