Crispy reranking models from Mixedbread. State-of-the-art models for search relevance, powered by reinforcement learning.
- State-of-the-art performance - Outperforms leading open and closed-source rerankers on major benchmarks
- 100+ languages - Strong multilingual support out of the box
- Long context - Handles up to 8k tokens by default (32k supported on v2 models)
- Code & SQL - Excellent at ranking code snippets and technical content
- Function Call Ranking - Supports reranking of function calls for multi-tool agents
- Fast inference - 8x faster than comparable models
- Easy integration - Drop-in improvement for existing search systems
- Open source - Apache 2.0-licensed, easy to customize
- Managed API - For production use with additional features. We support embeddings, reranking, and an end-to-end multi-modal retrieval solution.
Install the package:

```bash
pip install -U mxbai-rerank
```

```python
from mxbai_rerank import MxbaiRerankV2
# Initialize the reranker
reranker = MxbaiRerankV2("mixedbread-ai/mxbai-rerank-base-v2") # or large-v2
# Example query and documents
query = "Who wrote 'To Kill a Mockingbird'?"
documents = [
"'To Kill a Mockingbird' is a novel by Harper Lee published in 1960.",
"The novel 'Moby-Dick' was written by Herman Melville.",
"Harper Lee was born in 1926 in Monroeville, Alabama."
]
results = reranker.rank(query=query, documents=documents)
print(results)
```
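Continuing the quickstart, `rank` also accepts optional arguments to trim and enrich the output. The parameter names below (`top_k`, `return_documents`) are assumed to match the current mxbai-rerank interface; verify them against the documentation for your installed version:

```python
# Keep only the 3 best matches and include the document text in the results.
# `top_k` and `return_documents` are assumed parameter names; check the
# mxbai-rerank docs for your installed version if they differ.
top_results = reranker.rank(
    query=query,
    documents=documents,
    top_k=3,
    return_documents=True,
)
for result in top_results:
    print(result)
```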
We offer multiple model variants. For more details, see our mxbai-rerank-v2 technical blog post.

- mxbai-rerank-base-v2 (0.5B) - Best balance of speed and accuracy
- mxbai-rerank-large-v2 (1.5B) - Highest accuracy, still with excellent speed
For more details, see our mxbai-rerank-v1 technical blog post.
- mxbai-rerank-xsmall-v1 (0.1B) - Fastest inference, lower accuracy
- mxbai-rerank-base-v1 (0.2B) - Smaller, faster model
- mxbai-rerank-large-v1 (1.5B) - Large model with highest accuracy
| Model | BEIR Avg | Multilingual | Chinese | Code Search | Latency (s) |
|---|---|---|---|---|---|
| mxbai-rerank-large-v2 | 57.49 | 29.79 | 84.16 | 32.05 | 0.89 |
| mxbai-rerank-base-v2 | 55.57 | 28.56 | 83.70 | 31.73 | 0.67 |
| mxbai-rerank-large-v1 | 49.32 | 21.88 | 72.53 | 30.72 | 2.24 |
*Latency measured on an A100 GPU.
The v2 models automatically use Flash Attention 2 when available for faster inference:
```bash
pip install flash-attn --no-build-isolation
```

You can adjust the maximum sequence length when initializing the model:

```python
reranker = MxbaiRerankV2(
    "mixedbread-ai/mxbai-rerank-base-v2",
    max_length=8192,  # default; can be adjusted up to the model limit (32k for v2 models)
)
```

The v2 models also accept a task-specific instruction, for example when ranking code:

```python
results = reranker.rank(
    query=query,
    documents=documents,
    instruction="Figure out the best code snippet for the user query.",
)
```
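The same interface covers the function-call ranking use case listed above: tool or function descriptions are just documents to score against the user's request. The snippet below is an illustrative sketch; the tool descriptions and instruction text are invented for the example:

```python
from mxbai_rerank import MxbaiRerankV2

reranker = MxbaiRerankV2("mixedbread-ai/mxbai-rerank-base-v2")

# Hypothetical tool descriptions for a multi-tool agent; any free-text
# description of a function or API call works as a document.
tools = [
    "get_weather(city: str) - Returns the current weather for a city.",
    "send_email(to: str, subject: str, body: str) - Sends an email.",
    "search_flights(origin: str, destination: str, date: str) - Finds available flights.",
]

user_request = "Book me something to get from Berlin to Paris next Friday."

results = reranker.rank(
    query=user_request,
    documents=tools,
    instruction="Select the function that best fulfills the user's request.",
)
print(results)  # the flight-search tool should rank highest
```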
For managed API access with additional features, such as object reranking and instructions:

```python
from mixedbread import Mixedbread
mxbai = Mixedbread(api_key="YOUR_API_KEY")
results = mxbai.rerank(
model="mixedbread-ai/mxbai-rerank-large-v2",
query="your query",
input=["doc1", "doc2", "doc3"]
)
```
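A minimal sketch of consuming the API response; the attribute names (`data`, `index`, `score`) follow the typical rerank response shape and should be checked against the API reference:

```python
# Iterate over the ranked results returned by the managed API.
# `data`, `index`, and `score` are assumed attribute names; consult the
# Mixedbread API reference for the authoritative schema.
for item in results.data:
    print(item.index, item.score)
```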
The models were trained using a three-step process:

- GRPO (Guided Reinforcement Prompt Optimization)
- Contrastive Learning (see the illustrative sketch after this list)
- Preference Learning
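As a rough illustration of the contrastive-learning step (not the actual mxbai-rerank training code), a cross-encoder reranker can be trained so that the positive document scores above sampled negatives using a listwise softmax cross-entropy:

```python
import torch
import torch.nn.functional as F

def contrastive_ranking_loss(scores: torch.Tensor) -> torch.Tensor:
    """Generic listwise contrastive loss.

    `scores` has shape (batch, 1 + num_negatives), with the positive
    document's score in column 0. This is an illustrative sketch, not the
    mxbai-rerank training objective.
    """
    targets = torch.zeros(scores.size(0), dtype=torch.long, device=scores.device)
    return F.cross_entropy(scores, targets)

# Example: 2 queries, each with 1 positive and 3 negative candidate scores.
scores = torch.tensor([[2.1, 0.3, -0.5, 0.0],
                       [1.7, 1.2, -1.0, 0.4]])
print(contrastive_ranking_loss(scores))
```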
For more details, check our technical blog post or preprint paper.
Paper following soon.
If you use this work, please cite:
```bibtex
@article{li2025prorank,
title={ProRank: Prompt Warmup via Reinforcement Learning for Small Language Models Reranking},
author={Li, Xianming and Shakir, Aamir and Huang, Rui and Lipp, Julius and Li, Jing},
journal={arXiv preprint arXiv:2506.03487},
year={2025}
}
```

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit a pull request or report an issue on GitHub.