tomaarsen

Organizations

@nltk @huggingface @embeddings-benchmark @Hugging-Face-Helping-Hand

Starred repositories

👷 Build compute kernels

Nix 213 34 Updated Jan 16, 2026

Fast BM25 search engine with category theory abstractions

Python 7 Updated Jan 14, 2026
Python 24 9 Updated May 19, 2025
Python 67 6 Updated Dec 12, 2025

Multimodal reranker training and benchmarks

Python 4 Updated Dec 1, 2025

HSEB: Hybrid Search Engine Benchmark

Python 20 2 Updated Oct 5, 2025

Nearly Inference Free Embeddings: make your RAG queries 500x faster

Python 69 4 Updated Nov 16, 2025

GenAI Agent Framework, the Pydantic way

Python 14,318 1,546 Updated Jan 17, 2026

Fast Diversification for Search & Retrieval

Python 462 27 Updated Nov 17, 2025

Pruna is a model optimization framework built for developers, enabling you to deliver faster, more efficient models with minimal overhead.

Python 1,073 77 Updated Jan 16, 2026

Post-training with Tinker

Python 2,739 296 Updated Jan 17, 2026

Connect any LLM to your internal knowledge sources and chat with it in real time alongside your team. OSS alternative to NotebookLM, Perplexity, and Glean. Join our Discord: https://discord.gg/ejRN…

Python 12,479 1,082 Updated Jan 16, 2026

Swift Package to implement a transformers-like API in Swift

Swift 1,233 159 Updated Jan 6, 2026
Python 2,260 199 Updated Nov 29, 2025

Parkiet is a 1.6B parameter Dutch text-to-speech model (TTS)

Python 56 3 Updated Sep 30, 2025

Vocabulary Trimming (VT) is a model compression technique, which reduces a multilingual LM vocabulary to a target language by deleting irrelevant tokens from its vocabulary. This repository contain…

Python 61 5 Updated Oct 25, 2024

Sparse Embedding Compression for Scalable Retrieval in Recommender Systems

Python 33 2 Updated Nov 21, 2025

A tool for generating embeddings of classes organized into an ontology

HTML 7 3 Updated Oct 6, 2025

💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows

Python 12,030 765 Updated Jan 16, 2026

The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V…

Python 36,203 5,104 Updated Jan 9, 2026

Build, enrich, and transform datasets using AI models with no code

TypeScript 1,616 137 Updated Oct 23, 2025

A massively multilingual modern encoder language model

Python 123 9 Updated Oct 13, 2025

RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100% private RAG application on your personal device.

Python 8,645 766 Updated Jan 16, 2026

Knowledgeable Embedding: Injecting dynamically updatable entity knowledge into embeddings to enhance RAG

Python 14 Updated Aug 31, 2025
Python 721 47 Updated Nov 30, 2025

Embedding Atlas is a tool that provides interactive visualizations for large embeddings. It allows you to visualize, cross-filter, and search embeddings and metadata.

TypeScript 4,531 252 Updated Jan 13, 2026

Scalable, fast, and disk-friendly vector search in Postgres, the successor of pgvecto.rs.

Rust 1,499 53 Updated Jan 10, 2026

Data transformation framework for AI. Ultra performant, with incremental processing. 🌟 Star if you like it!

Rust 5,862 432 Updated Jan 17, 2026

Decase any tokenizer

Python 5 1 Updated Aug 1, 2025