GGUF quantization is an umbrella term for an LLM quantization ecosystem that includes:
- GGML (tensor library for machine learning);
- llama.cpp (LLM inference engine mostly targeting CPUs);
- GGUF (binary file format for storing quantized models).
GGUF quantization implements Post-Training Quantization (PTQ): given an already-trained Llama-like model in high precision, it reduces the bit width of each individual weight. The resulting checkpoint requires less memory and thus facilitates inference on consumer-grade hardware.
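To make the idea concrete, here is a minimal sketch of the simplest flavor of PTQ: round-to-nearest block quantization, loosely modeled on llama.cpp's `Q8_0` type (one shared scale per block of 32 int8 weights). The function names and layout here are illustrative assumptions, not the actual llama.cpp/ggml API; the real kernels are written in C with a different memory layout and many optimizations.

```python
import numpy as np

def quantize_q8_block(weights: np.ndarray, block_size: int = 32):
    """Toy round-to-nearest 8-bit quantization with one scale per block.

    Illustrative only: real GGUF quant types (Q8_0, Q4_K, ...) use
    different storage layouts and optimized C kernels inside ggml.
    """
    assert weights.size % block_size == 0
    blocks = weights.reshape(-1, block_size).astype(np.float32)
    # One scale per block: map the largest magnitude onto the int8 range.
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 127.0
    scales[scales == 0] = 1.0  # avoid division by zero for all-zero blocks
    q = np.clip(np.round(blocks / scales), -127, 127).astype(np.int8)
    return q, scales.astype(np.float16)

def dequantize_q8_block(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Recover approximate fp32 weights from int8 values and block scales."""
    return (q.astype(np.float32) * scales.astype(np.float32)).ravel()

# Each weight now costs 8 bits plus a shared fp16 scale per 32 weights
# (~8.5 bits effective), versus 16 or 32 bits in the original checkpoint.
w = np.random.randn(4096).astype(np.float32)
q, s = quantize_q8_block(w)
w_hat = dequantize_q8_block(q, s)
print("max abs error:", np.abs(w - w_hat).max())
```

The more aggressive GGUF types push below 8 bits by shrinking the block payload and spending the savings on extra per-block metadata; the scale-per-block structure above is the common skeleton.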
GGUF was inspired by previous PTQ methods, including GPTQ, AWQ, QLoRA and QuIP#. But unlike most prior work that came out of research labs, the GGUF ecosystem was developed by the prolific open-source contributor Georgi Gerganov and a few others.
Writing docs and papers is simply not their priority; see this comment:
As the ecosystem has rapidly grown, people have become confused about the various algorithm iterations and quantization settings.
This repository serves as unofficial documentation for the GGUF quantization ecosystem.
It is written mostly by hand, by a human. Any sections written by AI will be clearly flagged as such.
Contributions are more than welcome! If you find mistakes or omissions, feel free to submit a pull request.
Just a few simple rules:
- Reliable references: PRs should be supported by reliable references (e.g. code and author comments from the official llama.cpp repository). Medium articles and Reddit threads don't qualify.
- No AI slop: We all know when something is written by AI. Please only contribute when you have a human urge for expression.