Evalica [ɛˈʋalit͡sa] (eh-vah-lee-tsah) is a Python library that transforms pairwise comparisons into ranked lists of items. It offers high-performance Rust implementations of the corresponding methods via PyO3 and also provides naïve Python implementations for most of them. Evalica is fully compatible with NumPy arrays and pandas data frames.
The logo was created using Recraft.
Imagine that we would like to rank several meals and have the following dataset of three comparisons produced by food experts.
| Item X | Item Y | Winner | 
|---|---|---|
| pizza | burger | x | 
| burger | sushi | y | 
| pizza | sushi | tie | 
Given this hypothetical example, Evalica takes these three columns and computes a score for each item from the pairwise comparisons according to the chosen model. Note that the first argument is the column Item X, the second argument is the column Item Y, and the third argument corresponds to the column Winner.
>>> from evalica import elo, Winner
>>> result = elo(
...     ['pizza', 'burger', 'pizza'],
...     ['burger', 'sushi', 'sushi'],
...     [Winner.X, Winner.Y, Winner.Draw],
... )
>>> result.scores
pizza     1014.972058
burger     970.647200
sushi     1014.380742
Name: elo, dtype: float64

As a result, we obtain Elo scores of our items. In this example, pizza was the most favoured item, sushi was the runner-up, and burger was the least preferred item.
| Item | Score | 
|---|---|
| pizza | 1014.97 | 
| burger | 970.65 | 
| sushi | 1014.38 | 
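To make the idea behind these scores more concrete, here is a minimal pure-Python sketch of a textbook sequential Elo update applied to the same three comparisons. It is an illustration only, not Evalica's implementation: the function name elo_sketch is hypothetical, the outcomes are plain strings rather than the Winner enum, and the constants (initial rating 1000, K-factor 32, scale 400, base 10) are assumptions, so the exact numbers will differ from the output shown above.

```python
from collections import defaultdict


def elo_sketch(xs, ys, winners, initial=1000.0, k=32.0, scale=400.0, base=10.0):
    """Textbook sequential Elo; all constants are illustrative assumptions.

    Each entry of `winners` is one of 'x', 'y', or 'tie'.
    """
    scores = defaultdict(lambda: initial)
    # Share of the "win" credited to item X for each possible outcome.
    outcome_for_x = {"x": 1.0, "y": 0.0, "tie": 0.5}

    for x, y, winner in zip(xs, ys, winners):
        rating_x, rating_y = scores[x], scores[y]
        # Probability that X beats Y under the logistic Elo model.
        expected_x = 1.0 / (1.0 + base ** ((rating_y - rating_x) / scale))
        delta = k * (outcome_for_x[winner] - expected_x)
        # The update is zero-sum: whatever X gains, Y loses.
        scores[x] = rating_x + delta
        scores[y] = rating_y - delta

    return dict(scores)


ratings = elo_sketch(
    ["pizza", "burger", "pizza"],
    ["burger", "sushi", "sushi"],
    ["x", "y", "tie"],
)

# Print the items from the highest to the lowest rating.
for item, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{item}: {rating:.2f}")
```

Because the update is sequential, the result depends on the order in which the comparisons are processed; the ordering of the items in this sketch (pizza, then sushi, then burger) happens to agree with the table above.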
Evalica also provides a simple command-line interface, allowing the use of these methods in shell scripts and for prototyping.
$ evalica -i food.csv bradley-terry                
item,score,rank
Tacos,2.509025136024378,1
Sushi,1.1011561298265815,2
Burger,0.8549063627182466,3
Pasta,0.7403814336665869,4
Pizza,0.5718366915548537,5

Refer to the food.csv file as an input example.
Evalica has a built-in Gradio application that can be launched as python3 -m evalica.gradio. Please make sure that the library is installed with the gradio extra: pip install evalica[gradio].
| Method | In Python | In Rust | 
|---|---|---|
| Counting | ✅ | ✅ | 
| Average Win Rate | ✅ | ✅ | 
| Bradley–Terry | ✅ | ✅ | 
| Elo | ✅ | ✅ | 
| Eigenvalue | ✅ | ✅ | 
| PageRank | ✅ | ✅ | 
| Newman | ✅ | ✅ | 
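As a rough illustration of the two simplest methods in this table, the sketch below computes counting scores and average win rates in pure Python for the three food comparisons from the example above. It is not Evalica's code: the helper names win_matrix, counting_sketch, and average_win_rate_sketch are hypothetical, outcomes are plain strings, and counting a tie as half a win for each side is an assumption made for this example.

```python
from collections import defaultdict


def win_matrix(xs, ys, winners):
    """Build wins[i][j]: how often item i beat item j (a tie counts as 0.5 for both)."""
    wins = defaultdict(lambda: defaultdict(float))
    share_for_x = {"x": 1.0, "y": 0.0, "tie": 0.5}
    for x, y, winner in zip(xs, ys, winners):
        share = share_for_x[winner]
        wins[x][y] += share
        wins[y][x] += 1.0 - share
    return wins


def counting_sketch(wins):
    """Score of an item is its total number of (weighted) wins."""
    return {item: sum(row.values()) for item, row in wins.items()}


def average_win_rate_sketch(wins):
    """Score of an item is its win rate averaged over the opponents it has met."""
    scores = {}
    for item, row in wins.items():
        rates = [row[other] / (row[other] + wins[other][item]) for other in row]
        scores[item] = sum(rates) / len(rates)
    return scores


wins = win_matrix(
    ["pizza", "burger", "pizza"],
    ["burger", "sushi", "sushi"],
    ["x", "y", "tie"],
)
counts = counting_sketch(wins)
rates = average_win_rate_sketch(wins)
print(counts)
print(rates)
```

Both helpers work from the same win matrix, which is why the matrix is built once and reused; the remaining methods in the table need more involved iterative or spectral computations and are not sketched here.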
Evalica is a mixed Rust/Python project that uses PyO3, so it requires setting up the Maturin build system.
To set up the environment, we recommend using the uv package manager, as demonstrated in our test suite:
$ uv venv
$ uv pip install maturin
$ source .venv/bin/activate
$ maturin develop --uv --extras dev,docs,gradio

In case uv is not available, you can use the following workaround:
$ python3 -m venv venv
$ source venv/bin/activate
$ pip install maturin
$ maturin develop --extras dev,docs,gradio

We welcome pull requests on GitHub: https://github.com/dustalov/evalica. To contribute, fork the repository, create a separate branch for your changes, and submit a pull request.
- Ustalov, D. Reliable, Reproducible, and Really Fast Leaderboards with Evalica. 2025. Proceedings of the 31st International Conference on Computational Linguistics: System Demonstrations. 46–53. arXiv: 2412.11314 [cs.CL].
@inproceedings{Ustalov:25,
  author    = {Ustalov, Dmitry},
  title     = {{Reliable, Reproducible, and Really Fast Leaderboards with Evalica}},
  year      = {2025},
  booktitle = {Proceedings of the 31st International Conference on Computational Linguistics: System Demonstrations},
  pages     = {46--53},
  address   = {Abu Dhabi, UAE},
  publisher = {Association for Computational Linguistics},
  eprint    = {2412.11314},
  eprinttype = {arxiv},
  eprintclass = {cs.CL},
  url       = {https://aclanthology.org/2025.coling-demos.6},
  language  = {english},
}

The code for replicating the experiments is available in the coling2025 directory.
Copyright (c) 2024–2025 Dmitry Ustalov. See LICENSE for details.