Massive Text Embedding Benchmark

MTEB is a Python framework for evaluating embeddings and retrieval systems for both text and images. MTEB covers more than 1000 languages and diverse tasks, from classics like classification and clustering to use-case-specific tasks such as legal, code, or healthcare retrieval.

To get started with mteb, check out our documentation.

Overview
📈 Leaderboard	The interactive leaderboard of the benchmark
Get Started
🏃 Get Started	Overview of how to use mteb
🤖 Defining Models	How to use existing models and define custom ones
📋 Selecting tasks	How to select tasks, benchmarks, splits etc.
🏭 Running Evaluation	How to run evaluations, including cache management, speeding up evaluations, etc.
📊 Loading Results	How to load and work with existing model results
Overview
📋 Tasks	Overview of available tasks
📐 Benchmarks	Overview of available benchmarks
🤖 Models	Overview of available models
Contributing
🤖 Adding a model	How to submit a model to MTEB and to the leaderboard
👩‍💻 Adding a dataset	How to add a new task/dataset to MTEB
👩‍💻 Adding a benchmark	How to add a new benchmark to MTEB and to the leaderboard
🤝 Contributing	How to contribute to MTEB and set it up for development
