Stars
ClinVec: Unified Embeddings of Clinical Codes Enable Knowledge-Grounded AI in Medicine
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
Extracting and Exploring Adjacency-Induced Correlations from Image Data
Let's make video diffusion practical!
Active Learning for Text Classification in Python
KAG is a logical form-guided reasoning and retrieval framework based on OpenSPG engine and LLMs. It is used to build logical reasoning and factual Q&A solutions for professional domain knowledge ba…
InstructLab Core package. Use this to chat with a model and execute the InstructLab workflow to train a model using custom taxonomy data.
Anafora is a web-based raw text annotation tool
An open-source RAG-based tool for chatting with your documents.
AAAI 2025: Adapting to Non-Stationary Environments: Multi-Armed Bandit Enhanced Retrieval-Augmented Generation on Knowledge Graphs
Implementation of papers in 100 lines of code.
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
[EMNLP2025] "LightRAG: Simple and Fast Retrieval-Augmented Generation"
A massively parallel, high-level programming language
A Fast, Adaptive, Stable, and Transferable Topic Model (NeurIPS 2024)
A Topic Modeling System Toolkit (ACL 2024 Demo)
Aligned Neural Topic Model (ANTM) for Exploring Evolving Topics: a dynamic neural topic model that uses document embeddings (data2vec) to compute clusters of semantically similar documents at diffe…
Superfast CUDA implementation of Word2Vec and Latent Dirichlet Allocation (LDA)
Treeffuser is an easy-to-use package for probabilistic prediction and probabilistic regression on tabular data with tree-based diffusion models.
Bayesian Analysis with Python (Second Edition)
"DeepDPM: Deep Clustering With An Unknown Number of Clusters" [Ronen, Finder, and Freifeld, CVPR 2022]
GLM-130B: An Open Bilingual Pre-Trained Model (ICLR 2023)
Fast and memory-efficient exact attention