sergeyklay/factly


Factly


Factly is a modern CLI tool designed to evaluate the factuality of Large Language Models (LLMs) on the Massive Multitask Language Understanding (MMLU) benchmark. It provides a robust framework for prompt engineering experiments and factual accuracy assessment.

Features

  • Evaluate LLM factuality on the MMLU benchmark with detailed results
  • Support for various prompt engineering experiments via configurable system instructions
  • Generate comparative visualizations of factuality scores across models and prompts
  • Structured output for easy analysis and comparison
  • Built with modern Python tooling (Python 3.12, uv, click, pydantic)
  • Extensible and reproducible evaluation workflows

Note

Currently, only OpenAI models are supported.
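Since evaluations run against OpenAI models, you will need OpenAI credentials before invoking the tool. A minimal sketch, assuming Factly (like most OpenAI-based tools) reads the standard `OPENAI_API_KEY` environment variable; the key below is a placeholder:

```shell
# Make your OpenAI API key available to the CLI
# (placeholder value -- substitute your real key)
export OPENAI_API_KEY="sk-..."
```

Setting the variable in your shell profile keeps it available across sessions.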

Quick Start

```shell
# Run MMLU evaluation with default settings
factly mmlu

# Run MMLU evaluation and generate plots
factly mmlu --plot

# Get help on all available options
factly mmlu --help

# Get help on all available commands
factly --help
```

That's it! The tool uses optimized default parameters and saves all outputs to the output directory.

Note

For detailed installation instructions, see the Installation Guide; for usage instructions, use cases, examples, and advanced configuration options, see the Usage Guide.

Project Information

Factly is released under the MIT License, its documentation lives at Read the Docs, the code on GitHub, and the latest release on PyPI. It's rigorously tested on Python 3.12+.

If you'd like to contribute to Factly, you're most welcome!

Support

Should you have any questions or remarks, find a bug, or run into something you can't do with Factly, please open an issue.
