tomaarsen

Tom Aarsen tomaarsen

Sentence Transformers, SetFit & NLTK maintainer, ML Engineer & Fellow @ 🤗Hugging Face

902 followers · 0 following

Achievements

x4 x3 x3

Highlights

1 security advisory credit

Starred repositories

huggingface / kernel-builder

👷 Build compute kernels

Nix 213 34 Updated Jan 16, 2026

aiexplorations / vajra_bm25

Fast BM25 search engine with category theory abstractions

Python 7 Updated Jan 14, 2026

xhluca / bm25-benchmarks

Python 24 9 Updated May 19, 2025

datologyai / luxical

Python 67 6 Updated Dec 12, 2025

UlrickBL / multimodal_reranker

Multimodal reranker training and benchmarks

Python 4 Updated Dec 1, 2025

hseb-benchmark / hseb

HSEB: Hybrid Search Engine Benchmark

Python 20 2 Updated Oct 5, 2025

stephantul / pynife

Nearly Inference Free Embeddings: make your RAG queries 500x faster

Python 69 4 Updated Nov 16, 2025

pydantic / pydantic-ai

GenAI Agent Framework, the Pydantic way

Python 14,318 1,546 Updated Jan 17, 2026

Pringled / pyversity

Fast Diversification for Search & Retrieval

Python 462 27 Updated Nov 17, 2025

PrunaAI / pruna

Pruna is a model optimization framework built for developers, enabling you to deliver faster, more efficient models with minimal overhead.

Python 1,073 77 Updated Jan 16, 2026

thinking-machines-lab / tinker-cookbook

Post-training with Tinker

Python 2,739 296 Updated Jan 17, 2026

MODSetter / SurfSense

Connect any LLM to your internal knowledge sources and chat with it in real time alongside your team. OSS alternative to NotebookLM, Perplexity, and Glean. Join our Discord: https://discord.gg/ejRN…

Python 12,479 1,082 Updated Jan 16, 2026

huggingface / swift-transformers

Swift Package to implement a transformers-like API in Swift

Swift 1,233 159 Updated Jan 6, 2026

Mega4alik / ollm

Python 2,260 199 Updated Nov 29, 2025

pevers / parkiet

Parkiet is a 1.6B parameter Dutch text-to-speech model (TTS)

Python 56 3 Updated Sep 30, 2025

asahi417 / lm-vocab-trimmer

Vocabulary Trimming (VT) is a model compression technique, which reduces a multilingual LM vocabulary to a target language by deleting irrelevant tokens from its vocabulary. This repository contain…

Python 61 5 Updated Oct 25, 2024

recombee / CompresSAE

Sparse Embedding Compression for Scalable Retrieval in Recommender Systems

Python 33 2 Updated Nov 21, 2025

david4096 / on2vec

A tool for generating embeddings of classes organized into an ontology

HTML 7 3 Updated Oct 6, 2025

thinking-machines-lab / batch_invariant_ops

Python 949 71 Updated Nov 4, 2025

neuml / txtai

💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows

Python 12,030 765 Updated Jan 16, 2026

huggingface / pytorch-image-models

The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V…

Python 36,203 5,104 Updated Jan 9, 2026

huggingface / aisheets

Build, enrich, and transform datasets using AI models with no code

TypeScript 1,616 137 Updated Oct 23, 2025

JHU-CLSP / mmBERT

A massively multilingual modern encoder language model

Python 123 9 Updated Oct 13, 2025

yichuan-w / LEANN

RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100% private RAG application on your personal device.

Python 8,645 766 Updated Jan 16, 2026

knowledgeable-embedding / knowledgeable-embedding

Knowledgeable Embedding: Injecting dynamically updatable entity knowledge into embeddings to enhance RAG

Python 14 Updated Aug 31, 2025

NVlabs / Jet-Nemotron

Python 721 47 Updated Nov 30, 2025

apple / embedding-atlas

Embedding Atlas is a tool that provides interactive visualizations for large embeddings. It allows you to visualize, cross-filter, and search embeddings and metadata.

TypeScript 4,531 252 Updated Jan 13, 2026

tensorchord / VectorChord

Scalable, fast, and disk-friendly vector search in Postgres, the successor of pgvecto.rs.

Rust 1,499 53 Updated Jan 10, 2026

cocoindex-io / cocoindex

Data transformation framework for AI. Ultra performant, with incremental processing. 🌟 Star if you like it!

Rust 5,862 432 Updated Jan 17, 2026

stephantul / tokenizer-decasing

Decase any tokenizer

Python 5 1 Updated Aug 1, 2025