UzLiB is the first comprehensive multiple-choice question benchmark designed to evaluate the linguistic abilities of Large Language Models (LLMs) in the Uzbek language. It measures how well AI models understand correct Uzbek language forms, usage, and nuances.
For a detailed background on the motivation and creation process, please refer to our blog post (in Uzbek).
UzLiB questions were sourced from popular quizzes on the following Telegram channels specializing in Uzbek linguistics:
Each question underwent manual verification and standardization to ensure quality and consistency. Processing also included conversion to Latin script and shuffling of the answer choices.
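As a rough illustration of the shuffling step, the sketch below permutes the answer choices of one item while tracking the correct answer. The field names (`question`, `choices`, `answer`) are hypothetical and may differ from the actual schema in data/:

```python
import random

def shuffle_choices(item: dict, seed: int = 0) -> dict:
    # Hypothetical record layout: 'choices' is a list of answer strings and
    # 'answer' is the index of the correct choice before shuffling.
    rng = random.Random(seed)
    order = list(range(len(item["choices"])))
    rng.shuffle(order)
    return {
        "question": item["question"],
        "choices": [item["choices"][i] for i in order],
        "answer": order.index(item["answer"]),  # new position of the correct choice
    }

example = {"question": "Qaysi yozuv to'g'ri?", "choices": ["to'gri", "togri", "to'g'ri"], "answer": 2}
print(shuffle_choices(example, seed=42))
```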
Original (raw, pre-standardization) and processed versions of the benchmark are available in the data/ folder.
The results presented in the leaderboard were obtained using a consistent prompt template and standard generation parameters (temperature=1.0, top-p=0.95) across all models, reflecting typical usage scenarios. Evaluation scripts and model outputs are provided for transparency.
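As an illustration of these settings, the sketch below queries an OpenAI-compatible endpoint with the same generation parameters. It assumes the `openai>=1.0` Python client; the model name and prompt wording are placeholders, not the exact template used for the leaderboard:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Placeholder prompt; the actual template used for the leaderboard lives in this repo.
prompt = (
    "Answer the following question with only the letter of the correct choice.\n\n"
    "Savol: ...\nA) ...\nB) ...\nC) ...\nD) ..."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": prompt}],
    temperature=1.0,      # generation parameters used for the leaderboard
    top_p=0.95,
)
print(response.choices[0].message.content)
```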
- Python 3.8+
- API keys for relevant LLM services (if applicable).
- Access to models you wish to evaluate.
- Clone the repository:

  ```bash
  git clone https://github.com/tahrirchi/uzlib.git
  cd uzlib/
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Set up environment variables:

  ```bash
  cp .env.sample .env
  # Edit .env with your API keys and service endpoints
  ```
To evaluate a specific model:
```bash
python run_uzlib.py --model_name MODEL_NAME
```

Supported models for leaderboard replication are listed in utils.py.
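For context, scores on a multiple-choice benchmark like this are typically computed as accuracy: the fraction of items where the letter the model returns matches the gold answer. The helper below illustrates that idea with hypothetical parsing; it is not the actual logic of run_uzlib.py:

```python
import re
from typing import List, Optional

def extract_choice(model_output: str) -> Optional[str]:
    # Take the first standalone A-D letter in the model's reply, if any.
    match = re.search(r"\b([A-D])\b", model_output.upper())
    return match.group(1) if match else None

def accuracy(predictions: List[str], gold: List[str]) -> float:
    correct = sum(extract_choice(p) == g for p, g in zip(predictions, gold))
    return correct / len(gold)

print(accuracy(["The answer is C.", "B", "Javob: A"], ["C", "B", "D"]))  # ~0.67
```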
For efficient local evaluation of compatible open-source models (e.g., Mistral, Llama, Gemma families):
- Start the vLLM OpenAI-compatible server:

  ```bash
  vllm serve MODEL_NAME --api-key token-abc123
  ```

  For the behbudiy/Llama-3.1-8B-Instuct-Uz model, use this special command:

  ```bash
  vllm serve behbudiy/Llama-3.1-8B-Instuct-Uz --api-key token-abc123 --chat-template "{% for message in messages %}{{'<|begin_of_text|>' if loop.first else ''}}<|start_header_id|>{{ message.role }}<|end_header_id|>\n\n{{ message.content }}\n\n<|eot_id|>{% endfor %}{% if add_generation_prompt %}<|start_header_id|>assistant<|end_header_id|>\n\n{% endif %}"
  ```

  For the behbudiy/Mistral-Nemo-Instruct-Uz model, use this special command:

  ```bash
  vllm serve behbudiy/Mistral-Nemo-Instruct-Uz --api-key token-abc123 --tokenizer_mode mistral --config_format mistral --load_format mistral
  ```

  For the bxod/Llama-3.2-1B-Instruct-uz and bxod/Llama-3.2-3B-Instruct-uz models, use this special command:

  ```bash
  vllm serve bxod/Llama-3.2-3B-Instruct-uz --api-key token-abc123 --chat-template "{% for message in messages %}{% if message['role'] == 'system' %}<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n{{ message['content'] }}<|eot_id|>{% elif message['role'] == 'user' %}<|start_header_id|>user<|end_header_id|>\n{{ message['content'] }}<|eot_id|>{% elif message['role'] == 'assistant' %}<|start_header_id|>assistant<|end_header_id|>\n{{ message['content'] }}<|eot_id|>{% endif %}{% endfor %}{% if add_generation_prompt %}<|start_header_id|>assistant<|end_header_id|>\n{% endif %}"
  ```
- Evaluate the deployed model with run_uzlib.py as described above.
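Once the server is running, it can be queried like any other OpenAI-compatible endpoint. The snippet below is a minimal connectivity check assuming vLLM's default address (http://localhost:8000/v1) and the --api-key value from the command above; the evaluation script itself reads its endpoints from .env and utils.py as described earlier:

```python
from openai import OpenAI

# Point an OpenAI-compatible client at the local vLLM server (default port 8000).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="token-abc123")

response = client.chat.completions.create(
    model="MODEL_NAME",  # must match the model name passed to `vllm serve`
    messages=[{"role": "user", "content": "Salom! Respond with one word."}],
    temperature=1.0,
    top_p=0.95,
)
print(response.choices[0].message.content)
```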
After running evaluations (outputs are stored in artifacts/), update the leaderboard:
```bash
python generate_leaderboard.py
```

Contributions are welcome! To evaluate new models:
- Add the model configuration in utils.py.
- Implement or adjust the client interaction logic if necessary.
- Run the evaluation and consider submitting results via a Pull Request.
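The exact registry format is defined in utils.py; purely as a hypothetical illustration (all names and fields below are assumptions, so mirror the existing entries rather than this sketch), a new configuration might look like:

```python
# Hypothetical entry; copy the structure of the existing configurations in utils.py.
NEW_MODEL_CONFIG = {
    "model_name": "my-org/my-uzbek-model",   # value passed via --model_name
    "client": "openai_compatible",           # which client interaction logic to use
    "base_url": "http://localhost:8000/v1",  # e.g. a local vLLM endpoint
}
```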
If you use UzLiB in your work, please cite it as follows:
```bibtex
@misc{Shopulatov2025UzLiB,
      title={{UzLiB: A Benchmark for Evaluating LLMs on Uzbek Linguistics}},
      author={Abror Shopulatov},
      year={2025},
      howpublished={\url{https://huggingface.co/datasets/tahrirchi/uzlib}},
      note={Accessed: YYYY-MM-DD}
}
```

(Please update the note field with the date you accessed the resource.)
For inquiries regarding the benchmark or code, please contact [email protected].