Powdered Metal — High-performance LLM fine-tuning framework for Apple Silicon, written in Rust.
PMetal is a machine learning framework that brings Unsloth-style optimizations to macOS. It leverages custom Metal shaders and the MLX framework to achieve state-of-the-art training throughput on Apple Silicon GPUs.
```bash
# Clone the repository
git clone https://github.com/epistates/pmetal.git
cd pmetal

# Build in release mode
cargo build --release
```

```bash
# LoRA fine-tuning with auto-detected max-seq-len and sequence packing
./target/release/pmetal train \
    --model qwen/Qwen3-0.6B-Base \
    --dataset path/to/train.jsonl \
    --output ./output \
    --lora-r 16 \
    --batch-size 4 \
    --learning-rate 2e-4
```

```bash
# Inference with thinking mode enabled
./target/release/pmetal infer \
    --model qwen/Qwen3-0.6B-Base \
    --lora ./output/lora_weights.safetensors \
    --prompt "Does absolute truth exist?" \
    --chat \
    --show-thinking
```

PMetal is organized as a Rust workspace with 15 specialized crates:
```
pmetal/
├── pmetal-core         # Foundation: configs, traits, types
├── pmetal-metal        # Custom Metal GPU kernels
├── pmetal-mlx          # MLX backend integration (KV cache, RoPE, etc.)
├── pmetal-models       # LLM architectures (Llama, Qwen, DeepSeek, etc.)
├── pmetal-lora         # LoRA/QLoRA training implementations
├── pmetal-trainer      # Training loops (SFT, DPO, GRPO)
├── pmetal-data         # Dataset loading and preprocessing
├── pmetal-hub          # HuggingFace Hub integration
├── pmetal-distill      # Knowledge distillation
├── pmetal-merge        # Model merging (SLERP, TIES, DARE)
├── pmetal-gguf         # GGUF format with imatrix quantization
├── pmetal-mhc          # Manifold-Constrained Hyper-Connections
├── pmetal-distributed  # Distributed training support
├── pmetal-vocoder      # BigVGAN neural vocoder
└── pmetal-cli          # Command-line interface
```
```
                    ┌─────────────────┐
                    │   pmetal-cli    │
                    └────────┬────────┘
                             │
         ┌───────────────────┼───────────────────┐
         │                   │                   │
         ▼                   ▼                   ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│  pmetal-trainer │ │   pmetal-lora   │ │   pmetal-data   │
└────────┬────────┘ └────────┬────────┘ └────────┬────────┘
         │                   │                   │
         └───────────────────┼───────────────────┘
                             │
         ┌───────────────────┼───────────────────┐
         │                   │                   │
         ▼                   ▼                   ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│  pmetal-models  │ │   pmetal-mlx    │ │  pmetal-metal   │
└────────┬────────┘ └────────┬────────┘ └────────┬────────┘
         │                   │                   │
         └───────────────────┼───────────────────┘
                             │
                             ▼
                    ┌─────────────────┐
                    │   pmetal-core   │
                    └─────────────────┘
```
| Family | Variants | LoRA | QLoRA | Full FT |
|---|---|---|---|---|
| Llama | 2, 3, 3.1, 3.2, 3.3 | ✓ | ✓ | ✓ |
| Llama 4 | Scout, Maverick | ✓ | - | ✓ |
| Qwen | 2, 2.5, 3, 3-MoE | ✓ | - | ✓ |
| DeepSeek | V3, V3.2, V3.2-Speciale | ✓ | - | ✓ |
| Mistral | 7B, 8x7B | ✓ | ✓ | ✓ |
| Gemma | 2, 3 | ✓ | - | ✓ |
| Phi | 3, 4 | ✓ | - | ✓ |
| Cohere | Command R | ✓ | - | ✓ |
| Granite | 3.0, 3.1 | ✓ | - | ✓ |
| NemotronH | Hybrid (Mamba+Attention) | ✓ | - | ✓ |
| StarCoder2 | 3B, 7B, 15B | ✓ | - | ✓ |
| RecurrentGemma | Griffin | ✓ | - | ✓ |
| Jamba | 1.5 | ✓ | - | ✓ |
| GPT-OSS | 20B, 120B | ✓ | - | - |
Architecture implementations exist but are not yet integrated into the CLI dispatcher.
| Family | Variants | Status |
|---|---|---|
| Pixtral | 12B | Architecture implemented |
| Qwen2-VL | 2B, 7B | Architecture implemented |
| MLlama | 3.2-Vision | Architecture implemented |
| CLIP | ViT-L/14 | Architecture implemented |
| Whisper | Base, Small, Medium, Large | Architecture implemented |
| Family | Variants | Status |
|---|---|---|
| Flux | 1-dev, 1-schnell | Dispatcher + pipeline implemented |
- Supervised Fine-Tuning (SFT): Standard next-token prediction
- LoRA: Low-Rank Adaptation with configurable rank and alpha
- QLoRA: 4-bit quantized base weights with LoRA adapters
- DoRA: Weight-Decomposed Low-Rank Adaptation
- DPO: Direct Preference Optimization for RLHF
- GRPO: Group Relative Policy Optimization
- DAPO: Decoupled Clip and Dynamic Sampling Policy Optimization
- GSPO: Group Sequence Policy Optimization (fixes GRPO length bias)
- PPO: Proximal Policy Optimization
- ORPO: Odds Ratio Preference Optimization (reference-free)
- SimPO: Simple Preference Optimization
- KTO: Kahneman-Tversky Optimization (unpaired preference data)
- Online DPO: Online Direct Preference Optimization with reward models
- Diffusion Training: LLaDA-style masked diffusion for language models
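As a concrete reference point for the preference-optimization methods above, here is a minimal CPU sketch of the DPO objective on a single preference pair. The function name and scalar log-probability interface are illustrative, not PMetal's actual API; real trainers compute these quantities batched on the GPU.

```rust
// Illustrative DPO objective (not PMetal's API):
//   loss = -log(sigmoid(beta * ((pi_w - ref_w) - (pi_l - ref_l))))
// where pi_* / ref_* are the summed log-probs of the chosen/rejected
// responses under the policy and the frozen reference model.

fn sigmoid(x: f64) -> f64 {
    1.0 / (1.0 + (-x).exp())
}

/// DPO loss for one preference pair, given per-response log-probabilities.
fn dpo_loss(
    policy_chosen: f64,
    policy_rejected: f64,
    ref_chosen: f64,
    ref_rejected: f64,
    beta: f64,
) -> f64 {
    let logits = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected));
    -sigmoid(logits).ln()
}

fn main() {
    // The policy prefers the chosen response more than the reference does,
    // so the loss falls below ln(2), its value at initialization (where
    // policy == reference and the implicit reward margin is zero).
    let loss = dpo_loss(-10.0, -14.0, -11.0, -12.0, 0.1);
    println!("DPO loss: {:.4}", loss);
    assert!(loss < std::f64::consts::LN_2);
}
```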
Custom Metal shaders provide significant speedups:
- FlashAttention: O(n) memory attention with fused softmax
- Fused LoRA: Combined forward pass for adapter layers
- Fused Cross-Entropy: Unsloth-style chunked loss computation
- Fused RoPE: Rotary position embeddings in-kernel
- Fused Sampler: JIT-compiled token sampling
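The core idea behind the fused cross-entropy kernel can be sketched on the CPU: accumulate the logsumexp over the vocabulary chunk by chunk, so the full softmax is never materialized at once. This is an illustrative sketch of the technique, not the Metal kernel itself.

```rust
// Chunked cross-entropy: -log softmax(logits)[target], streaming over
// `chunk` logits at a time. Numerically stable via a running max and a
// rescaled partial sum, so memory stays O(chunk) instead of O(vocab).
fn chunked_cross_entropy(logits: &[f64], target: usize, chunk: usize) -> f64 {
    let mut running_max = f64::NEG_INFINITY;
    let mut running_sum = 0.0; // sum of exp(logit - running_max) so far
    for block in logits.chunks(chunk) {
        let block_max = block.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
        let new_max = running_max.max(block_max);
        // Rescale the old partial sum to the new max before adding this chunk.
        running_sum = running_sum * (running_max - new_max).exp()
            + block.iter().map(|&x| (x - new_max).exp()).sum::<f64>();
        running_max = new_max;
    }
    let logsumexp = running_max + running_sum.ln();
    logsumexp - logits[target]
}

fn main() {
    let logits = [2.0, -1.0, 0.5, 3.0, -0.5, 1.5];
    // The chunked result matches the one-shot computation for any chunk size.
    let full = chunked_cross_entropy(&logits, 3, logits.len());
    let streamed = chunked_cross_entropy(&logits, 3, 2);
    assert!((full - streamed).abs() < 1e-12);
    println!("loss = {:.4}", streamed);
}
```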
Efficiently pack multiple sequences into single batches:

```bash
--use-sequence-packing   # Enable packing (99.7% efficiency)
--max-seq-len 2048       # Maximum packed sequence length
```

Trade compute for memory on large models:

```bash
--gradient-checkpointing # Enable memory-efficient training
```

Supported formats for training data:
ShareGPT (conversations):

```json
{"conversations": [{"from": "human", "value": "..."}, {"from": "gpt", "value": "..."}]}
```

Alpaca (instruction):

```json
{"instruction": "...", "input": "...", "output": "..."}
```

Messages (chat):

```json
{"messages": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
```

| Parameter | Default | Description |
|---|---|---|
| `--lora-r` | 16 | LoRA rank |
| `--lora-alpha` | 32.0 | LoRA scaling factor (2x rank) |
| `--batch-size` | 4 | Micro-batch size |
| `--learning-rate` | 2e-4 | Learning rate |
| `--max-seq-len` | 0 | Max seq len (0 = auto-detect) |
| `--epochs` | 1 | Number of training epochs |
| `--max-grad-norm` | 1.0 | Gradient clipping threshold |
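To see how `--lora-r` and `--lora-alpha` interact, here is a minimal sketch of a LoRA forward pass: the adapter update B·A·x is scaled by alpha / r, which is why the default alpha (32.0) is twice the default rank (16), giving a scale of 2.0. The matrix layout and names here are illustrative, not PMetal's internals.

```rust
// Dense matrix-vector product over row-major f64 matrices.
fn matvec(m: &[Vec<f64>], x: &[f64]) -> Vec<f64> {
    m.iter()
        .map(|row| row.iter().zip(x).map(|(a, b)| a * b).sum())
        .collect()
}

/// LoRA forward: y = W·x + (alpha / r) · B·(A·x)
fn lora_forward(
    w: &[Vec<f64>], // frozen base weight, d_out × d_in
    a: &[Vec<f64>], // adapter down-projection, r × d_in
    b: &[Vec<f64>], // adapter up-projection, d_out × r
    x: &[f64],
    alpha: f64,
) -> Vec<f64> {
    let r = a.len() as f64;
    let scale = alpha / r; // e.g. alpha = 32.0, r = 16 -> scale = 2.0
    let base = matvec(w, x);
    let update = matvec(b, &matvec(a, x));
    base.iter().zip(&update).map(|(y, u)| y + scale * u).collect()
}

fn main() {
    let w = vec![vec![1.0, 0.0], vec![0.0, 1.0]]; // identity base weight
    let a = vec![vec![1.0, 1.0]];                 // rank r = 1
    let b = vec![vec![0.5], vec![0.5]];
    let y = lora_forward(&w, &a, &b, &[1.0, 2.0], 2.0);
    // A·x = 3, scale = 2/1, update = [3, 3]; base [1, 2] -> y = [4, 5]
    assert_eq!(y, vec![4.0, 5.0]);
}
```

With alpha set to zero the adapter contributes nothing and the output equals the frozen base projection, which is why freshly initialized adapters (B = 0 in practice) leave the model's behavior unchanged.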
| Parameter | Default | Description |
|---|---|---|
| `--temperature` | Model default | Sampling temperature |
| `--top-k` | Model default | Top-k sampling |
| `--top-p` | Model default | Nucleus (top-p) sampling |
| `--max-tokens` | 256 | Maximum generation length |
| `--repetition-penalty` | 1.0 | Repetition penalty |
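A sketch of how the sampling flags above compose: temperature rescales the logits, top-k keeps the k highest-probability tokens, and top-p (nucleus) keeps the smallest set whose cumulative probability exceeds p. This is an illustrative CPU version, not the fused Metal sampler.

```rust
/// Return the token indices that survive top-k then top-p filtering,
/// ordered by descending probability.
fn filter_candidates(logits: &[f64], temperature: f64, top_k: usize, top_p: f64) -> Vec<usize> {
    // Softmax over temperature-scaled logits (max-subtracted for stability).
    let scaled: Vec<f64> = logits.iter().map(|&l| l / temperature).collect();
    let max = scaled.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = scaled.iter().map(|&l| (l - max).exp()).collect();
    let total: f64 = exps.iter().sum();

    // Sort token indices by probability, descending.
    let mut order: Vec<usize> = (0..logits.len()).collect();
    order.sort_by(|&i, &j| exps[j].partial_cmp(&exps[i]).unwrap());

    // Keep at most top_k tokens, cutting off once cumulative prob > top_p.
    let mut kept = Vec::new();
    let mut cumulative = 0.0;
    for &i in order.iter().take(top_k) {
        kept.push(i);
        cumulative += exps[i] / total;
        if cumulative > top_p {
            break;
        }
    }
    kept
}

fn main() {
    // One dominant token: nucleus filtering collapses to near-greedy.
    let kept = filter_candidates(&[5.0, 1.0, 0.0, -1.0], 1.0, 3, 0.9);
    assert_eq!(kept, vec![0]);
}
```

The actual sampler would then draw from the renormalized distribution over `kept`; raising `--temperature` flattens the distribution, so more tokens survive the same `--top-p` cutoff.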
```bash
# Debug build
cargo build

# Release build with optimizations
cargo build --release

# Run tests
cargo test --all

# Run clippy
cargo clippy --all
```

To add a new model:

- Implement the `CausalLMModel` trait in `pmetal-models`
- Add architecture detection in `dispatcher.rs`
- Create a LoRA wrapper in `pmetal-lora` if needed
- Update the model registry
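The first step above follows a standard Rust pattern: a new architecture implements a shared trait that the dispatcher and LoRA wrappers build on. The trait below is a hypothetical stand-in; PMetal's real `CausalLMModel` in `pmetal-models` operates on MLX arrays and its actual signature will differ.

```rust
/// Hypothetical stand-in for the crate's causal-LM abstraction
/// (illustrative shape only, not the real trait).
trait CausalLMModel {
    /// Map input token ids to next-token logits (flattened per position).
    fn forward(&self, input_ids: &[u32]) -> Vec<f64>;
    fn vocab_size(&self) -> usize;
}

/// A toy "new architecture" with a trivial forward pass.
struct MyNewArch {
    vocab: usize,
}

impl CausalLMModel for MyNewArch {
    fn forward(&self, input_ids: &[u32]) -> Vec<f64> {
        // A real model would run embedding, attention, and MLP blocks here;
        // this stub just emits uniform logits of the right shape.
        vec![0.0; input_ids.len() * self.vocab]
    }
    fn vocab_size(&self) -> usize {
        self.vocab
    }
}

fn main() {
    let model = MyNewArch { vocab: 8 };
    let logits = model.forward(&[1, 2, 3]);
    // One logit vector per input position.
    assert_eq!(logits.len(), 3 * model.vocab_size());
}
```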
Run the included benchmarks:

```bash
# FFI overhead benchmark
cargo bench --bench ffi_overhead
```

If you see "cannot execute tool 'metal'":

```bash
xcodebuild -downloadComponent MetalToolchain
```

If you run out of memory, try these options:

- Reduce `--batch-size`
- Enable `--gradient-checkpointing`
- Use `--use-sequence-packing` for variable-length data
- Reduce `--max-seq-len`
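To see why `--use-sequence-packing` saves memory on variable-length data: instead of padding every sequence to the batch maximum, sequences are packed into shared rows of length `--max-seq-len`. A greedy first-fit sketch (PMetal's actual packer may use a different strategy):

```rust
/// Pack sequence lengths into rows of capacity `max_seq_len`, first-fit.
/// Returns one Vec of sequence indices per packed batch row.
fn pack_sequences(lengths: &[usize], max_seq_len: usize) -> Vec<Vec<usize>> {
    let mut bins: Vec<(usize, Vec<usize>)> = Vec::new(); // (tokens used, members)
    for (i, &len) in lengths.iter().enumerate() {
        // First-fit: place each sequence into the first row with room left.
        match bins.iter_mut().find(|(used, _)| used + len <= max_seq_len) {
            Some((used, members)) => {
                *used += len;
                members.push(i);
            }
            None => bins.push((len, vec![i])),
        }
    }
    bins.into_iter().map(|(_, members)| members).collect()
}

fn main() {
    // Six variable-length sequences fit in 2 packed rows of 2048 tokens,
    // instead of 6 rows each padded to 2048.
    let rows = pack_sequences(&[1000, 900, 600, 500, 400, 300], 2048);
    assert_eq!(rows.len(), 2);
}
```

Packed rows need block-diagonal attention masks so tokens from different sequences cannot attend to each other; that bookkeeping is what the training pipeline handles for you.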
Licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
- MLX - Apple's machine learning framework
- mlx-rs - Rust bindings for MLX
- Unsloth - Inspiration for fused kernel optimizations
- HuggingFace - Model hub and tokenizers