    Repositories list

    • SCSS
      4101 Updated Nov 12, 2025
    • Python
      0100 Updated Nov 5, 2025
    • MCTS-RAG

      Public
      Data and Code for EMNLP 2025 Findings Paper "MCTS-RAG: Enhancing Retrieval-Augmented Generation with Monte Carlo Tree Search"
      Python
      117930 Updated Nov 4, 2025
    • MDCure

      Public
      MDCure: A Scalable Pipeline for Multi-Document Instruction-Following (ACL 2025)
      Python
      2901 Updated Nov 3, 2025
    • MetaFaith

      Public
      MetaFaith: Faithful Natural Language Uncertainty Expression in LLMs (EMNLP 2025)
      Python
      1700 Updated Nov 3, 2025
    • Repository for the paper "CourtReasoner: Can LLM Agents Reason Like Judges?"
      Python
      0100 Updated Oct 30, 2025
    • Python
      0610 Updated Oct 15, 2025
    • MSRS

      Public
      Data and Code for COLM 2025 Paper "MSRS: Evaluating Multi-Source Retrieval-Augmented Generation"
      Python
      53300 Updated Aug 29, 2025
    • RoMMath

      Public
      Data and Code for NAACL 2025 paper "Are Multimodal LLMs Robust Against Adversarial Perturbations? RoMMath: A Systematic Evaluation on Multimodal Math Reasoning"
      Python
      0200 Updated Aug 26, 2025
    • SciArena

      Public
      Analysis code for NeurIPS 2025 paper "SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks"
      Python
      85421 Updated Aug 6, 2025
    • LimitGen

      Public
      Data and Code for ACL 2025 Paper "Can LLMs Identify Critical Limitations within Scientific Research? A Systematic Evaluation on AI Research Papers"
      Jupyter Notebook
      0710 Updated Jul 24, 2025
    • AbGen

      Public
      Data and code for the ACL 2025 paper "AbGen: Evaluating Large Language Models in Ablation Study Design and Evaluation for Scientific Research"
      Python
      2400 Updated Jul 24, 2025
    • SciSketch

      Public
      Python
      0100 Updated Jun 20, 2025
    • cpsc477

      Public
      Course website for CPSC 477/577 Natural Language Processing Spring 2025 at Yale University
      SCSS
      0100 Updated Apr 10, 2025
    • Physics

      Public
      Python
      22120 Updated Apr 1, 2025
    • MMVU

      Public
      Data and Code for CVPR 2025 paper "MMVU: Measuring Expert-Level Multi-Discipline Video Understanding"
      Python
      17500 Updated Feb 28, 2025
    • SciDQA

      Public
      Python
      0400 Updated Feb 26, 2025
    • M3SciQA

      Public
      Python
      11110 Updated Jan 13, 2025
    • Data and Code for ACL 2024 paper "DocMath-Eval: Evaluating Math Reasoning Capabilities of LLMs in Understanding Long and Specialized Documents"
      Python
      02320 Updated Dec 21, 2024
    • Jupyter Notebook
      0300 Updated Nov 18, 2024
    • TAIL

      Public
      A Toolkit for Automatic and Realistic Long-Context Large Language Model Evaluation
      Python
      0600 Updated Nov 14, 2024
    • TOMATO

      Public
      Python
      03430 Updated Nov 8, 2024
    • COMAL

      Public
      Python
      0100 Updated Oct 31, 2024
    • ReIFE

      Public
      Python
      0200 Updated Oct 10, 2024
    • ODSum

      Public
      Data and code for paper "ODSum: New Benchmarks for Open Domain Multi-Document Summarization"
      Python
      21110 Updated Sep 20, 2024
    • MRoSE

      Public
      Python
      0000 Updated Sep 19, 2024
    • Data and Code for the paper "FinanceMath: Knowledge-Intensive Math Reasoning in Finance Domains"
      Python
      52410 Updated Aug 10, 2024
    • refdpo

      Public
      Python
      11610 Updated Jul 23, 2024
    • This is the repo for the ACL 2024 Findings paper "Unveiling the Spectrum of Data Contamination in Language Model: A Survey from Detection to Remediation"
      0900 Updated Jun 27, 2024
    • Python
      21300 Updated May 16, 2024