- Peking University
- Beijing
Stars
A curated list of fellowships for graduate students in Computer Science and related fields.
An R package to quantify uncertainty for transfer errors
A package for statistically rigorous scientific discovery using machine learning. Implements prediction-powered inference.
A professionally curated list of awesome Conformal Prediction videos, tutorials, books, papers, PhD and MSc theses, articles and open-source libraries.
Lightweight, useful implementation of conformal prediction on real data.
The official repository for our ACL 2024 paper: Are LLM-based Evaluators Confusing NLG Quality Criteria?
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery 🧑‍🔬
A high-throughput and memory-efficient inference and serving engine for LLMs
Arena-Hard-Auto: An automatic LLM benchmark.
A simulation framework for RLHF and alternatives. Develop your RLHF method without collecting human data.
The official repository for our EMNLP 2024 paper, Themis: A Reference-free NLG Evaluation Language Model with Flexibility and Interpretability.
Statsmodels: statistical modeling and econometrics in Python
Tools for evaluating the performance of MT metrics on data from recent WMT metrics shared tasks.
Fast computation of Krippendorff's alpha agreement measure in Python.
[COLM 2024] OpenAgents: An Open Platform for Language Agents in the Wild
A computer algebra system written in pure Python
Seahorse is a dataset for multilingual, multi-faceted summarization evaluation. It consists of 96K summaries with human ratings along 6 quality dimensions: comprehensibility, repetition, grammar, a…
Welcome to the Llama Cookbook! This is your go-to guide for building with Llama: getting started with inference, fine-tuning, and RAG. We also show you how to solve end-to-end problems using Llama mode…
A package to evaluate factuality of long-form generation. Original implementation of our EMNLP 2023 paper "FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation"
ACL2023 - AlignScore, a metric for factual consistency evaluation.
Resources for paper "Reference Matters: Benchmarking Factual Error Correction for Dialogue Summarization with Fine-grained Evaluation Framework"
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
Awesome-LLM: a curated list of Large Language Models
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
Download market data from Yahoo! Finance's API