University of Cambridge
- Tonga
(UTC +08:00) - https://cartus.github.io/
- @ZhijiangG
Highlights
- Pro
Stars
A curated list of resources on Reinforcement Learning with Verifiable Rewards (RLVR) and the reasoning capability boundary of Large Language Models (LLMs).
Code for "Language Models Can Learn from Verbal Feedback Without Scalar Rewards"
[NeurIPS 2025🔥] Main source code of the SRPO framework.
Code repo for FaStFact: Faster, Stronger Long-Form Factuality Evaluations in LLMs.
Official Repo for SvS: A Self-play with Variational Problem Synthesis strategy for RLVR training
Afterburner: Reinforcement Learning Facilitates Self-Improving Code Efficiency Optimization
The official implementation of "Depth-Breadth Synergy in RLVR: Unlocking LLM Reasoning Gains with Adaptive Exploration"
[ICML 2025🔥] ParallelComp: Parallel Long-Context Compressor for Length Extrapolation
An LLM framework for deep and efficient scientific peer review
Extract information from various climate scientific graphics to combat misinformation and support scientific communication
Latest Advances on Federated LLM Learning
[NeurIPS'25] EffiBench-X: A Multi-Language Benchmark for Measuring Efficiency of LLM-Generated Code
Code, benchmark and environment for "ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows"
[NeurIPS 2025 D&B (Spotlight🌟)] TIME: A Multi-level Benchmark for Temporal Reasoning of LLMs in Real-World Scenarios
[PVLDB 2024 Best Paper Nomination] TFB: Towards Comprehensive and Fair Benchmarking of Time Series Forecasting Methods
[NeurIPS 2025] Atom of Thoughts for Markov LLM Test-Time Scaling
This repository provides a valuable reference for researchers in the field of multimodality; start your exploration of RL-based Reasoning MLLMs here!
MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources
A series of technical reports on Slow Thinking with LLMs
The official GitHub repository of the paper "Recent advances in large language model benchmarks against data contamination: From static to dynamic evaluation"
Efficient triton implementation of Native Sparse Attention.
CiteCheck: Towards Accurate Citation Faithfulness Detection
Official code for the paper, "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning"
Official Repo for Open-Reasoner-Zero
Latest Advances on System-2 Reasoning