Stars
MiroTrain is an efficient, algorithm-first framework for research agents.
MiroFlow is an agent framework that enables tool-use agent tasks, featuring a reproducible GAIA score of 82.4%.
MiroThinker is an open-source deep research agent optimized for research and prediction. It achieves an 80.8% Avg@8 score on the challenging GAIA benchmark.
Repository for MuSiQue: Multi-hop Questions via Single-hop Question Composition, TACL 2022
Synthetic data curation for post-training and structured data extraction
Financial News Aggregator - Real Time & Query API for Financial News
Democratizing Internet-scale financial data.
A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
The definitive Web UI for local AI, with powerful features and easy setup.
Instruct-tune LLaMA on consumer hardware
A collection of libraries to optimise AI model performance
Running large language models on a single GPU for throughput-oriented scenarios.
Power CLI and Workflow manager for LLMs (core package)
OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and can retrieve information dynamically to do so.
AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data…
Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, but it should work with any Hugging Face text dataset.
Dataset of GPT-2 outputs for research in detection, biases, and more
Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.
The simplest, fastest repository for training/finetuning medium-sized GPTs.
Hackable and optimized Transformers building blocks, supporting a composable construction.
Implementation of "Glancing Transformer for Non-Autoregressive Neural Machine Translation"
Cracking the Coding Interview 6th Ed. Python Solutions