
robitec97/gemma3.c


gemma3.c

gemma3.c is a from‑scratch CPU inference engine for the Gemma 3 4B IT model.

✨ Highlights

  • ⚙️ 100% Pure C (C11) – zero external dependencies
  • 🧠 Full Gemma 3 architecture – GQA, hybrid attention, SwiGLU
  • 🗺️ Memory‑mapped weights – BF16 SafeTensors via mmap
  • 🔤 Native SentencePiece tokenizer – 262K vocab
  • 🌊 Streaming output – token‑by‑token callbacks
  • 💬 Interactive chat mode
  • 📦 CLI + Library API
  • 🐧 Linux/macOS native, 🪟 Windows via WSL (recommended) or MinGW
  • 🔗 OpenBLAS support (optional) – BLAS-accelerated matrix operations
  • 🧵 Multi-threaded inference – Thread pool for parallel computation

🚀 Quick Start

⚠️ POSIX‑first: native on Linux/macOS. On Windows use WSL or MinGW (no mmap).

1️⃣ Download model

export HF_TOKEN=your_token_here
pip install huggingface_hub
python download_model.py

2️⃣ Build

make

3️⃣ Run

# Single prompt
./gemma3 -m ./gemma-3-4b-it -p "Explain quantum computing simply."

OpenBLAS builds: the make blas and make blas-threads targets require an OpenBLAS installation:

  • Linux: sudo apt install libopenblas-dev
  • macOS: brew install openblas

📥 Model Download

The included Python script:

  • Handles HuggingFace auth
  • Downloads all shards
  • Resumes broken downloads
  • Verifies integrity

python download_model.py --token YOUR_HF_TOKEN

Manual alternatives: huggingface-cli or git lfs.


🛠️ Build Targets

make              # Release build (default)
make debug        # Debug symbols
make fast         # Native optimizations (-march=native -ffast-math)
make threads      # Thread pool parallelization
make blas         # OpenBLAS acceleration (requires libopenblas)
make blas-threads # OpenBLAS + threads (best performance)
make clean        # Remove build artifacts
make help         # Show all targets

🧪 CLI Options

-m <path>    Model directory
-p <text>    Prompt
-i           Interactive mode
-s <text>    System prompt
-n <n>       Max tokens
-t <f>       Temperature
-k <n>       Top‑k
--top-p <f>  Top‑p
-c <n>       Context size
--seed <n>   RNG seed
-v           Verbose

📚 Library Example

gemma3_ctx *ctx = gemma3_load_dir("./gemma-3-4b-it");

gemma3_gen_params params = gemma3_default_params();
char *out = gemma3_generate(ctx, "Hello!", &params, NULL, NULL);
printf("%s\n", out);
free(out);

gemma3_free(ctx);

🧠 Model Specs

Param     Value
Vocab     262,208
Layers    34
Hidden    2,560
Heads     8 (4 KV, GQA)
Context   128K
Pattern   5 local : 1 global

💾 Memory

  • Weights: ~8 GB on disk (BF16)
  • Runtime RAM: ~3 GB total

Reduce usage:

./gemma3 -m ./gemma-3-4b-it -c 512 -p "Hello"

⚡ Performance (CPU)

  • Prefill: ~2–5 tok/s
  • Generation: ~1–3 tok/s

For better performance:

make fast          # Single-threaded with native optimizations
make threads       # Multi-core parallelization
make blas-threads  # Best performance (requires OpenBLAS)

⚠️ Limitations

  • CPU only
  • Text only
  • No quantization (yet)

🪪 License

MIT License. Model weights under Google’s Gemma license.


If you ever wanted to see Gemma 3 breathe in pure C, this is it.
