GGUF Quantization Docs (Unofficial)

Table of Contents

Explainers

Practical Guides

What is GGUF quantization?

GGUF quantization is an umbrella term for an LLM quantization ecosystem that includes:

  • GGML (tensor library for machine learning);
  • llama.cpp (LLM inference engine mostly targeting CPUs);
  • GGUF (binary file format for storing quantized models).

GGUF quantization implements Post-Training Quantization (PTQ): given an already-trained Llama-like model in high precision, it reduces the bit width of each individual weight. The resulting checkpoint requires less memory and thus facilitates inference on consumer-grade hardware.
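To make the idea concrete, here is a minimal, illustrative sketch of block-wise round-to-nearest quantization in NumPy. It is not the actual scheme behind llama.cpp's quant types (e.g. Q4_K_M), but it shows the core PTQ idea: each block of weights is replaced by low-bit integers plus a shared scale.

```python
import numpy as np

def quantize_4bit(weights: np.ndarray, block_size: int = 32):
    """Toy block-wise 4-bit round-to-nearest quantization (illustration only).

    Each block of `block_size` weights shares one fp16 scale, so the stored
    size is roughly 4 bits per weight plus a small per-block overhead.
    """
    blocks = weights.reshape(-1, block_size)
    # One scale per block, chosen so the largest magnitude maps into [-8, 7].
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 7.0
    scales[scales == 0] = 1.0  # avoid division by zero for all-zero blocks
    q = np.clip(np.round(blocks / scales), -8, 7).astype(np.int8)
    return q, scales.astype(np.float16)

def dequantize(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Recover an approximation of the original weights."""
    return (q.astype(np.float32) * scales.astype(np.float32)).reshape(-1)

w = np.random.randn(1024).astype(np.float32)  # stand-in for one weight tensor
q, s = quantize_4bit(w)
w_hat = dequantize(q, s)
print("max abs error:", np.abs(w - w_hat).max())
```

The real llama.cpp quant formats are more elaborate (nested "super-blocks", per-block minimums, importance-weighted variants), but they all trade a small reconstruction error like the one printed above for a much smaller file.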

Who built it? Why are there no official docs?

GGUF was inspired by previous PTQ methods, including GPTQ, AWQ, QLoRA and QuIP#. But unlike most prior work that came out of research labs, the GGUF ecosystem was developed by the prolific open-source contributor Georgi Gerganov and a few others.

Writing docs and papers is simply not their priority; see this comment:

No Papers

As the ecosystem rapidly grew, users became confused about the various algorithm iterations and settings.

What is this repository?

This repository serves as unofficial documentation for the GGUF quantization ecosystem.

It is written mostly by a human. Any sections written by AI will be clearly flagged as ⚠️🤖.

Contributing

Contributions are more than welcome! If you find mistakes or omissions, feel free to submit a pull request.

Just a few simple rules:

  • Reliable references: PRs should be supported by reliable references (e.g. code and author comments from the official llama.cpp repository). Medium articles and Reddit threads don't qualify.
  • No AI slop: We all know when something is written by AI. Please only contribute when you have a human urge for expression.
