2 unstable releases (uses new Rust 2024)

| Version | Date |
|---|---|
| 0.2.0 | Nov 11, 2025 |
| 0.1.0 | Nov 11, 2025 |
#290 in Machine learning · 665KB · 13K SLoC
# MLMF - Machine Learning Model Files

MLMF (Machine Learning Model Files) is a Rust crate for working with ML model files. It provides loading, saving, conversion, and dynamic name-mapping capabilities for transformer models across multiple formats, including SafeTensors, GGUF, ONNX, PyTorch, and AWQ, behind a unified, efficient API that eliminates code duplication across model file operations.
## Features

- **Architecture Detection**: Automatically detects the model architecture (LLaMA, GPT-2, GPT-NeoX) from tensor names
- **Multiple Formats**: Comprehensive support for SafeTensors, GGUF, ONNX, PyTorch, and AWQ formats
- **Name Mapping**: Intelligent tensor name mapping between HuggingFace and custom formats
- **Memory Efficient**: Memory-mapped loading for large models (30GB+)
- **Device Management**: Automatic CUDA detection with CPU fallback
- **Progress Reporting**: Optional progress callbacks for long-running operations
- **Type Safety**: Comprehensive error handling with detailed context
- **Model Conversion**: Direct format conversion with batch processing and progress tracking
## Quick Start

Add `mlmf` to your `Cargo.toml`:

```toml
[dependencies]
mlmf = "0.1"
```
### Basic Usage
```rust
use mlmf::{loader, LoadOptions};
use candle_core::{DType, Device};

// Load a LLaMA model from SafeTensors
let device = Device::cuda_if_available(0).unwrap_or(Device::Cpu);
let options = LoadOptions {
    device: device.clone(),
    dtype: DType::F16,
    use_mmap: true,
    validate_cuda: false,
    progress: Some(mlmf::progress::default_progress()),
};
let loaded_model = loader::load_safetensors("./models/llama-7b", options)?;

// Access components
let var_builder = loaded_model.var_builder;
let config = loaded_model.config;
let name_mapper = loaded_model.name_mapper;

// Use the name mapper to convert HF names to your format
if let Some(mapped_name) = name_mapper.map_name("model.layers.0.self_attn.q_proj.weight") {
    println!("Mapped name: {}", mapped_name);
}
```
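The HF names the mapper consumes follow a regular `model.layers.<idx>.<component>` scheme. As a standalone illustration of that structure (not MLMF's internal implementation), a minimal parser might look like:

```rust
// Sketch: split an HF LLaMA-style tensor name into layer index and component.
// Illustrative only; MLMF's actual name mapper is more general.
fn parse_hf_name(name: &str) -> Option<(usize, String)> {
    // Expect "model.layers.<idx>.<component...>"
    let rest = name.strip_prefix("model.layers.")?;
    let (idx_str, component) = rest.split_once('.')?;
    let idx = idx_str.parse().ok()?;
    Some((idx, component.to_string()))
}

fn main() {
    let (layer, component) = parse_hf_name("model.layers.0.self_attn.q_proj.weight").unwrap();
    println!("layer {layer}: {component}");
}
```

Non-layer tensors such as `model.embed_tokens.weight` fall outside the scheme and return `None`, which is why a real mapper also needs per-architecture rules for embeddings and norms.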
### Architecture Detection

```rust
use mlmf::name_mapping::{Architecture, TensorNameMapper};

let tensor_names = vec![
    "model.embed_tokens.weight".to_string(),
    "model.layers.0.self_attn.q_proj.weight".to_string(),
    "model.norm.weight".to_string(),
];

let mapper = TensorNameMapper::from_tensor_names(&tensor_names)?;
assert_eq!(mapper.architecture(), Architecture::LLaMA);
```
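Detection of this kind boils down to recognizing per-family tensor-name prefixes: HF LLaMA checkpoints use `model.layers.`, GPT-2 uses `transformer.h.`, and GPT-NeoX uses `gpt_neox.layers.`. A simplified, self-contained sketch of the idea (the enum and rules here are illustrative, not MLMF's actual logic):

```rust
// Simplified prefix-based architecture detection, inspired by the behavior
// described above. Not MLMF's implementation.
#[derive(Debug, PartialEq)]
enum Arch {
    LLaMA,
    Gpt2,
    GptNeoX,
    Unknown,
}

fn detect(names: &[&str]) -> Arch {
    if names.iter().any(|n| n.starts_with("model.layers.")) {
        Arch::LLaMA
    } else if names.iter().any(|n| n.starts_with("transformer.h.")) {
        Arch::Gpt2
    } else if names.iter().any(|n| n.starts_with("gpt_neox.layers.")) {
        Arch::GptNeoX
    } else {
        Arch::Unknown
    }
}

fn main() {
    let llama = ["model.embed_tokens.weight", "model.layers.0.self_attn.q_proj.weight"];
    assert_eq!(detect(&llama), Arch::LLaMA);
}
```

Because detection only inspects names, it runs in well under the <100ms budget quoted below even for large checkpoints.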
### Model Conversion

```rust
use mlmf::conversion::{convert_model, ConversionFormat, ConversionOptions};
use std::path::Path;

// Convert from SafeTensors to ONNX
let options = ConversionOptions::default();
let result = convert_model(
    Path::new("model.safetensors"),
    Path::new("model.onnx"),
    ConversionFormat::ONNX,
    options,
)?;
println!("Conversion completed in {:.2}s", result.duration.as_secs_f64());
```
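For batch processing, each output path can be derived from its input by swapping the extension for the target format. A hedged standard-library sketch (the file names are hypothetical, and MLMF's batch API may differ):

```rust
use std::path::{Path, PathBuf};

// Sketch: derive a conversion output path by replacing the extension
// with the target format's. Illustrative helper, not part of MLMF's API.
fn output_path(input: &Path, target_ext: &str) -> PathBuf {
    input.with_extension(target_ext)
}

fn main() {
    for name in ["model-a.safetensors", "model-b.safetensors"] {
        let out = output_path(Path::new(name), "onnx");
        println!("{name} -> {}", out.display());
    }
}
```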
## Architecture
MLMF provides a modular architecture with the following components:
- `loader`: High-level loading API
- `conversion`: Direct model format conversion with batch processing
- `name_mapping`: Architecture detection and tensor name mapping
- `config`: HuggingFace config parsing with field aliases
- `formats`: Format-specific loaders and exporters (SafeTensors, GGUF, ONNX, PyTorch, AWQ)
- `validation`: CUDA validation and dtype checking
- `progress`: Progress reporting utilities
## Supported Models
- LLaMA Family: LLaMA 2/3, TinyLlama, Qwen, Mistral
- GPT Family: GPT-2, GPT-J
- GPT-NeoX Family: GPT-NeoX, Pythia, StableLM
## Examples

See the `examples/` directory for complete working examples:

- `load_llama.rs` - Loading LLaMA models from SafeTensors
- `load_gpt2.rs` - Loading GPT-2 models
- `load_gguf.rs` - Loading quantized GGUF models
## Performance
MLMF is optimized for performance:
- Memory-mapped loading: Loads 70B models (130GB) in ~10 seconds
- Architecture detection: Typically completes in <100ms
- Zero-copy: Direct tensor access without unnecessary copying
- Incremental builds: Changes compile in <10 seconds
## Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
## License
This project is licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE)
- MIT License (LICENSE-MIT)
at your option.
## Dependencies

~26-35MB, ~670K SLoC