
LLM4IR-Survey

This is the collection of papers related to large language models for information retrieval. These papers are organized according to our survey paper Large Language Models for Information Retrieval: A Survey.

Feel free to contact us if you find a mistake or have any advice. Email: [email protected] and [email protected].

🌟 Citation

Please kindly cite our paper if it helps your research:

@article{LLM4IRSurvey,
    author={Yutao Zhu and
            Huaying Yuan and
            Shuting Wang and
            Jiongnan Liu and
            Wenhan Liu and
            Chenlong Deng and
            Haonan Chen and
            Zhicheng Dou and
            Ji-Rong Wen},
    title={Large Language Models for Information Retrieval: A Survey},
    journal={CoRR},
    volume={abs/2308.07107},
    year={2023},
    url={https://arxiv.org/abs/2308.07107},
    eprinttype={arXiv},
    eprint={2308.07107}
}

🚀 Update Log

  • Version 4 [2025-09-17]

    • Search Agent: We reformulated the search agent section.
    • Reranker: We added several listwise rerankers and a new section, "Reasoning-intensive Rerankers".
  • Version 3 [2024-09-03]

    • We refined the background to pay more attention to IR.
    • Rewriter: We added a new section, "Formats of Rewritten Queries", to provide a clearer classification and incorporate up-to-date methods.
    • Retriever: We incorporated up-to-date methods that utilize LLMs to enlarge the dataset used for training retrievers or to improve the overall structure and design of retriever systems.
    • Reranker: We have added some unsupervised rerankers, several studies focusing on training data augmentation, and discussions on the limitations of LLM rerankers.
    • Reader: We added the latest studies on readers, particularly enriching the works in the active reader section.
    • Search Agent: We added the latest studies on static and dynamic search agents, particularly enriching the works in benchmarking and self-planning.
  • Version 2 [2024-01-19]

    • We added a new section to introduce search agents, which represent an innovative approach to integrating LLMs with IR systems.
    • Rewriter: We added recent works on LLM-based query rewriting, most of which focus on conversational search.
    • Retriever: We added the latest techniques that leverage LLMs to expand the training corpus for retrievers or to enhance retrievers' architectures.
    • Reranker: We added recent LLM-based ranking works to each of the three parts: Utilizing LLMs as Supervised Rerankers, Utilizing LLMs as Unsupervised Rerankers, and Utilizing LLMs for Training Data Augmentation.
    • Reader: We added the latest studies in LLM-enhanced reader area, including a section introducing the reference compression technique, a section discussing the applications of LLM-enhanced readers, and a section analyzing the characteristics of LLM-enhanced readers.
    • Future Direction: We added a section about search agents and a section discussing the bias introduced by incorporating LLMs into IR systems.

📋 Table of Contents

📄 Paper List

Query Rewriter

Prompting Methods

  1. Query2doc: Query Expansion with Large Language Models, Wang et al., arXiv 2023. [Paper]
  2. Generative and Pseudo-Relevant Feedback for Sparse, Dense and Learned Sparse Retrieval, Mackie et al., arXiv 2023. [Paper]
  3. Generative Relevance Feedback with Large Language Models, Mackie et al., SIGIR 2023 (short paper). [Paper]
  4. GRM: Generative Relevance Modeling Using Relevance-Aware Sample Estimation for Document Retrieval, Mackie et al., arXiv 2023. [Paper]
  5. Large Language Models Know Your Contextual Search Intent: A Prompting Framework for Conversational Search, Mao et al., arXiv 2023. [Paper]
  6. Precise Zero-Shot Dense Retrieval without Relevance Labels, Gao et al., ACL 2023. [Paper]
  7. Query Expansion by Prompting Large Language Models, Jagerman et al., arXiv 2023. [Paper]
  8. Large Language Models are Strong Zero-Shot Retriever, Shen et al., arXiv 2023. [Paper]
  9. Enhancing Conversational Search: Large Language Model-Aided Informative Query Rewriting, Ye et al., EMNLP 2023 (Findings). [Paper]
  10. Can Generative LLMs Create Query Variants for Test Collections? An Exploratory Study, Alaofi et al., SIGIR 2023 (short paper). [Paper]
  11. Corpus-Steered Query Expansion with Large Language Models, Lei et al., EACL 2024 (Short Paper). [Paper]
  12. Large Language Model Based Long-tail Query Rewriting in Taobao Search, Peng et al., WWW 2024. [Paper]
  13. Can Query Expansion Improve Generalization of Strong Cross-Encoder Rankers?, Li et al., SIGIR 2024. [Paper]
  14. Query Performance Prediction using Relevance Judgments Generated by Large Language Models, Meng et al., arXiv 2024. [Paper]
  15. RaFe: Ranking Feedback Improves Query Rewriting for RAG, Mao et al., arXiv 2024. [Paper]
  16. Crafting the Path: Robust Query Rewriting for Information Retrieval, Baek et al., arXiv 2024. [Paper]
  17. Query Rewriting for Retrieval-Augmented Large Language Models, Ma et al., arXiv 2023. [Paper]

Fine-tuning Methods

  1. QUILL: Query Intent with Large Language Models using Retrieval Augmentation and Multi-stage Distillation, Srinivasan et al., EMNLP 2022 (Industry). [Paper] (This paper explores fine-tuning methods in baseline experiments.)

Knowledge Distillation Methods

  1. QUILL: Query Intent with Large Language Models using Retrieval Augmentation and Multi-stage Distillation, Srinivasan et al., EMNLP 2022 (Industry). [Paper]
  2. Knowledge Refinement via Interaction Between Search Engines and Large Language Models, Feng et al., arXiv 2023. [Paper]
  3. Query Rewriting for Retrieval-Augmented Large Language Models, Ma et al., arXiv 2023. [Paper]

Retriever

Leveraging LLMs to Generate Search Data

  1. InPars: Data Augmentation for Information Retrieval using Large Language Models, Bonifacio et al., arXiv 2022. [Paper]
  2. Pre-training with Large Language Model-based Document Expansion for Dense Passage Retrieval, Ma et al., arXiv 2023. [Paper]
  3. InPars-v2: Large Language Models as Efficient Dataset Generators for Information Retrieval, Jeronymo et al., arXiv 2023. [Paper]
  4. Promptagator: Few-shot Dense Retrieval From 8 Examples, Dai et al., ICLR 2023. [Paper]
  5. AugTriever: Unsupervised Dense Retrieval by Scalable Data Augmentation, Meng et al., arXiv 2023. [Paper]
  6. UDAPDR: Unsupervised Domain Adaptation via LLM Prompting and Distillation of Rerankers, Saad-Falcon et al., arXiv 2023. [Paper]
  7. Soft Prompt Tuning for Augmenting Dense Retrieval with Large Language Models, Peng et al., arXiv 2023. [Paper]
  8. CONVERSER: Few-shot Conversational Dense Retrieval with Synthetic Data Generation, Huang et al., ACL 2023. [Paper]
  9. Leveraging LLMs for Synthesizing Training Data Across Many Languages in Multilingual Dense Retrieval, Thakur et al., arXiv 2023. [Paper]
  10. Questions Are All You Need to Train a Dense Passage Retriever, Sachan et al., ACL 2023. [Paper]
  11. Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators, Chen et al., EMNLP 2023. [Paper]
  12. Gecko: Versatile Text Embeddings Distilled from Large Language Models, Lee et al., arXiv 2024. [Paper]
  13. Improving Text Embeddings with Large Language Models, Wang et al., ACL 2024. [Paper]

Employing LLMs to Enhance Model Architecture

  1. Text and Code Embeddings by Contrastive Pre-Training, Neelakantan et al., arXiv 2022. [Paper]
  2. Fine-Tuning LLaMA for Multi-Stage Text Retrieval, Ma et al., arXiv 2023. [Paper]
  3. Large Dual Encoders Are Generalizable Retrievers, Ni et al., EMNLP 2022. [Paper]
  4. Task-aware Retrieval with Instructions, Asai et al., ACL 2023 (Findings). [Paper]
  5. Transformer memory as a differentiable search index, Tay et al., NeurIPS 2022. [Paper]
  6. Large Language Models are Built-in Autoregressive Search Engines, Ziems et al., ACL 2023 (Findings). [Paper]
  7. ChatRetriever: Adapting Large Language Models for Generalized and Robust Conversational Dense Retrieval, Mao et al., arXiv 2024. [Paper]
  8. How Does Generative Retrieval Scale to Millions of Passages?, Pradeep et al., ACL 2023. [Paper]
  9. CorpusLM: Towards a Unified Language Model on Corpus for Knowledge-Intensive Tasks, Li et al., SIGIR 2024. [Paper]

Reranker

Utilizing LLMs as Supervised Rerankers

  1. Multi-Stage Document Ranking with BERT, Nogueira et al., arXiv 2019. [Paper]
  2. Document Ranking with a Pretrained Sequence-to-Sequence Model, Nogueira et al., EMNLP 2020 (Findings). [Paper]
  3. Text-to-Text Multi-view Learning for Passage Re-ranking, Ju et al., SIGIR 2021 (Short Paper). [Paper]
  4. The Expando-Mono-Duo Design Pattern for Text Ranking with Pretrained Sequence-to-Sequence Models, Pradeep et al., arXiv 2021. [Paper]
  5. RankT5: Fine-Tuning T5 for Text Ranking with Ranking Losses, Zhuang et al., SIGIR 2023 (Short Paper). [Paper]
  6. Fine-Tuning LLaMA for Multi-Stage Text Retrieval, Ma et al., arXiv 2023. [Paper]
  7. A Two-Stage Adaptation of Large Language Models for Text Ranking, Zhang et al., ACL 2024 (Findings). [Paper]
  8. Rank-without-GPT: Building GPT-Independent Listwise Rerankers on Open-Source Large Language Models, Zhang et al., arXiv 2023. [Paper]
  9. ListT5: Listwise Reranking with Fusion-in-Decoder Improves Zero-shot Retrieval, Yoon et al., ACL 2024. [Paper]
  10. Q-PEFT: Query-dependent Parameter Efficient Fine-tuning for Text Reranking with Large Language Models, Peng et al., arXiv 2024. [Paper]
  11. Leveraging Passage Embeddings for Efficient Listwise Reranking with Large Language Models, Liu et al., arXiv 2024. [Paper]

Utilizing LLMs as Unsupervised Rerankers

  1. Holistic Evaluation of Language Models, Liang et al., arXiv 2022. [Paper]
  2. Improving Passage Retrieval with Zero-Shot Question Generation, Sachan et al., EMNLP 2022. [Paper]
  3. Discrete Prompt Optimization via Constrained Generation for Zero-shot Re-ranker, Cho et al., ACL 2023 (Findings). [Paper]
  4. Open-source Large Language Models are Strong Zero-shot Query Likelihood Models for Document Ranking, Zhuang et al., EMNLP 2023 (Findings). [Paper]
  5. PaRaDe: Passage Ranking using Demonstrations with Large Language Models, Drozdov et al., EMNLP 2023 (Findings). [Paper]
  6. Beyond Yes and No: Improving Zero-Shot LLM Rankers via Scoring Fine-Grained Relevance Labels, Zhuang et al., arXiv 2023. [Paper]
  7. Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agent, Sun et al., EMNLP 2023. [Paper]
  8. Zero-Shot Listwise Document Reranking with a Large Language Model, Ma et al., arXiv 2023. [Paper]
  9. Found in the Middle: Permutation Self-Consistency Improves Listwise Ranking in Large Language Models, Tang et al., arXiv 2023. [Paper]
  10. Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting, Qin et al., NAACL 2024 (Findings). [Paper]
  11. A Setwise Approach for Effective and Highly Efficient Zero-shot Ranking with Large Language Models, Zhuang et al., SIGIR 2024. [Paper]
  12. InstUPR: Instruction-based Unsupervised Passage Reranking with Large Language Models, Huang and Chen, arXiv 2024. [Paper]
  13. Generating Diverse Criteria On-the-Fly to Improve Point-wise LLM Rankers, Guo et al., arXiv 2024. [Paper]
  14. DemoRank: Selecting Effective Demonstrations for Large Language Models in Ranking Task, Liu et al., arXiv 2024. [Paper]
  15. An Investigation of Prompt Variations for Zero-shot LLM-based Rankers, Sun et al., arXiv 2024. [Paper]
  16. TourRank: Utilizing Large Language Models for Documents Ranking with a Tournament-Inspired Strategy, Chen et al., arXiv 2024. [Paper]
  17. Top-Down Partitioning for Efficient List-Wise Ranking, Parry et al., arXiv 2024. [Paper]
  18. PRP-Graph: Pairwise Ranking Prompting to LLMs with Graph Aggregation for Effective Text Re-ranking, Luo et al., ACL 2024. [Paper]
  19. Consolidating Ranking and Relevance Predictions of Large Language Models through Post-Processing, Yan et al., EMNLP 2024. [Paper]
  20. Sliding Windows Are Not the End: Exploring Full Ranking with Long-Context Large Language Models, Liu et al., ACL 2025. [Paper]
  21. CoRanking: Collaborative Ranking with Small and Large Ranking Agents, Liu et al., EMNLP 2025 (Findings). [Paper]
  22. APEER : Automatic Prompt Engineering Enhances Large Language Model Reranking, Jin et al., WWW 2025. [Paper]

Utilizing LLMs for Training Data Augmentation

  1. ExaRanker: Explanation-Augmented Neural Ranker, Ferraretto et al., SIGIR 2023 (Short Paper). [Paper]
  2. InPars-Light: Cost-Effective Unsupervised Training of Efficient Rankers, Boytsov et al., arXiv 2023. [Paper]
  3. Generating Synthetic Documents for Cross-Encoder Re-Rankers, Askari et al., arXiv 2023. [Paper]
  4. Instruction Distillation Makes Large Language Models Efficient Zero-shot Rankers, Sun et al., arXiv 2023. [Paper]
  5. RankVicuna: Zero-Shot Listwise Document Reranking with Open-Source Large Language Models, Pradeep et al., arXiv 2023. [Paper]
  6. RankZephyr: Effective and Robust Zero-Shot Listwise Reranking is a Breeze!, Pradeep et al., arXiv 2023. [Paper]
  7. ExaRanker-Open: Synthetic Explanation for IR using Open-Source LLMs, Ferraretto et al., arXiv 2024. [Paper]
  8. Expand, Highlight, Generate: RL-driven Document Generation for Passage Reranking, Askari et al., EMNLP 2023. [Paper]
  9. FIRST: Faster Improved Listwise Reranking with Single Token Decoding, Reddy et al., arXiv 2024. [Paper]

Reasoning-intensive Rerankers

  1. ReasonRank: Empowering Passage Ranking with Strong Reasoning Ability, Liu et al., arXiv 2025. [Paper]
  2. Rank1: Test-Time Compute for Reranking in Information Retrieval, Weller et al., arXiv 2025. [Paper]
  3. Rank-K: Test-Time Reasoning for Listwise Reranking, Yang et al., arXiv 2025. [Paper]
  4. REARANK: Reasoning Re-ranking Agent via Reinforcement Learning, Zhang et al., arXiv 2025. [Paper]
  5. Rank-R1: Enhancing Reasoning in LLM-based Document Rerankers via Reinforcement Learning, Zhuang et al., arXiv 2025. [Paper]
  6. TFRank: Think-Free Reasoning Enables Practical Pointwise LLM Ranking, Fan et al., arXiv 2025. [Paper]

Reader

Passive Reader

  1. REALM: Retrieval-Augmented Language Model Pre-Training, Guu et al., ICML 2020. [Paper]
  2. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, Lewis et al., NeurIPS 2020. [Paper]
  3. REPLUG: Retrieval-Augmented Black-Box Language Models, Shi et al., arXiv 2023. [Paper]
  4. Atlas: Few-shot Learning with Retrieval Augmented Language Models, Izacard et al., JMLR 2023. [Paper]
  5. Internet-augmented Language Models through Few-shot Prompting for Open-domain Question Answering, Lazaridou et al., arXiv 2022. [Paper]
  6. Rethinking with Retrieval: Faithful Large Language Model Inference, He et al., arXiv 2023. [Paper]
  7. FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation, Vu et al., arXiv 2023. [Paper]
  8. Enabling Large Language Models to Generate Text with Citations, Gao et al., EMNLP 2023. [Paper]
  9. Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models, Yu et al., arXiv 2023. [Paper]
  10. Improving Retrieval-Augmented Large Language Models via Data Importance Learning, Lyu et al., arXiv 2023. [Paper]
  11. Search Augmented Instruction Learning, Luo et al., EMNLP 2023 (Findings). [Paper]
  12. RA-DIT: Retrieval-Augmented Dual Instruction Tuning, Lin et al., arXiv 2023. [Paper]
  13. Improving Language Models by Retrieving from Trillions of Tokens, Borgeaud et al., ICML 2022. [Paper]
  14. In-Context Retrieval-Augmented Language Models, Ram et al., arXiv 2023. [Paper]
  15. Interleaving Retrieval with Chain-of-thought Reasoning for Knowledge-intensive Multi-step Questions, Trivedi et al., ACL 2023. [Paper]
  16. Improving Language Models via Plug-and-Play Retrieval Feedback, Yu et al., arXiv 2023. [Paper]
  17. Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy, Shao et al., EMNLP 2023 (Findings). [Paper]
  18. Retrieval-Generation Synergy Augmented Large Language Models, Feng et al., arXiv 2023. [Paper]
  19. Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection, Asai et al., arXiv 2023. [Paper]
  20. Active Retrieval Augmented Generation, Jiang et al., EMNLP 2023. [Paper]

Active Reader

  1. Measuring and Narrowing the Compositionality Gap in Language Models, Press et al., arXiv 2022. [Paper]
  2. DEMONSTRATE–SEARCH–PREDICT: Composing Retrieval and Language Models for Knowledge-intensive NLP, Khattab et al., arXiv 2022. [Paper]
  3. Answering Questions by Meta-Reasoning over Multiple Chains of Thought, Yoran et al., arXiv 2023. [Paper]
  4. PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative Large Language Models as Decision Makers, Lee et al., arXiv 2024. [Paper]
  5. Learning to Plan for Retrieval-Augmented Large Language Models from Knowledge Graphs, Wang et al., arXiv 2024. [Paper]

Compressor

  1. LeanContext: Cost-Efficient Domain-Specific Question Answering Using LLMs, Arefeen et al., arXiv 2023. [Paper]
  2. RECOMP: Improving Retrieval-Augmented LMs with Compression and Selective Augmentation, Xu et al., arXiv 2023. [Paper]
  3. TCRA-LLM: Token Compression Retrieval Augmented Large Language Model for Inference Cost Reduction, Liu et al., EMNLP 2023 (Findings). [Paper]
  4. Learning to Filter Context for Retrieval-Augmented Generation, Wang et al., arXiv 2023. [Paper]

Analysis

  1. Lost in the Middle: How Language Models Use Long Contexts, Liu et al., arXiv 2023. [Paper]
  2. Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation, Ren et al., arXiv 2023. [Paper]
  3. Exploring the Integration Strategies of Retriever and Large Language Models, Liu et al., arXiv 2023. [Paper]
  4. Characterizing Attribution and Fluency Tradeoffs for Retrieval-Augmented Large Language Models, Aksitov et al., arXiv 2023. [Paper]
  5. When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories, Mallen et al., ACL 2023. [Paper]

Applications

  1. Augmenting Black-box LLMs with Medical Textbooks for Clinical Question Answering, Wang et al., arXiv 2023. [Paper]
  2. ATLANTIC: Structure-Aware Retrieval-Augmented Language Model for Interdisciplinary Science, Munikoti et al., arXiv 2023. [Paper]
  3. Crosslingual Retrieval Augmented In-context Learning for Bangla, Li et al., arXiv 2023. [Paper]
  4. Clinfo.ai: An Open-Source Retrieval-Augmented Large Language Model System for Answering Medical Questions using Scientific Literature, Lozano et al., arXiv 2023. [Paper]
  5. Enhancing Financial Sentiment Analysis via Retrieval Augmented Large Language Models, Zhang et al., ICAIF 2023. [Paper]
  6. Interpretable Long-Form Legal Question Answering with Retrieval-Augmented Large Language Models, Louis et al., arXiv 2023. [Paper]
  7. RETA-LLM: A Retrieval-Augmented Large Language Model Toolkit, Liu et al., arXiv 2023. [Paper]
  8. Chameleon: a Heterogeneous and Disaggregated Accelerator System for Retrieval-Augmented Language Models, Jiang et al., arXiv 2023. [Paper]
  9. RaLLe: A Framework for Developing and Evaluating Retrieval-Augmented Large Language Models, Hoshi et al., EMNLP 2023. [Paper]
  10. Don't Forget Private Retrieval: Distributed Private Similarity Search for Large Language Models, Zyskind et al., arXiv 2023. [Paper]

Search Agent

Information Seeking Module

  1. A Cognitive Writing Perspective for Constrained Long-Form Text Generation, Wan et al., ACL 2025 (Findings). [Paper]
  2. CoSearchAgent: A Lightweight Collaborative Search Agent with Large Language Models, Gong et al., SIGIR 2024. [Paper]
  3. Search-o1: Agentic Search-Enhanced Large Reasoning Models, Li et al., arXiv 2025. [Paper]
  4. Agent Laboratory: Using LLM Agents as Research Assistants, Schmidgall et al., arXiv 2025. [Paper]
  5. The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery, Lu et al., arXiv 2024. [Paper]
  6. Auto-RAG: Autonomous Retrieval-Augmented Generation for Large Language Models, Yu et al., arXiv 2024. [Paper]
  7. SimpleDeepSearcher: Deep Information Seeking via Web-Powered Reasoning Trajectory Synthesis, Sun et al., arXiv 2025. [Paper]
  8. Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning, Dong et al., arXiv 2025. [Paper]
  9. ZeroSearch: Incentivize the Search Capability of LLMs without Searching, Sun et al., arXiv 2025. [Paper]
  10. Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution, Qiu et al., arXiv 2025. [Paper]

Benchmarks and Resources

  1. TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension, Joshi et al., ACL 2017. [Paper]
  2. Measuring Short-Form Factuality in Large Language Models, Wei et al., arXiv 2024. [Paper]
  3. When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories, Mallen et al., ACL 2023. [Paper]
  4. Natural Questions: a Benchmark for Question Answering Research, Kwiatkowski et al., ACL 2019. [Paper]
  5. HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering, Yang et al., EMNLP 2018. [Paper]
  6. Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of Reasoning Steps, Ho et al., COLING 2020. [Paper]
  7. Humanity's Last Exam, Phan et al., arXiv 2025. [Paper]
  8. BrowseComp: A Simple Yet Challenging Benchmark for Browsing Agents, Wei et al., arXiv 2025. [Paper]
  9. GAIA: A Benchmark for General AI Assistants, Mialon et al., ICLR 2024. [Paper]
  10. AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?, Yoran et al., EMNLP 2024. [Paper]
  11. Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks, Fourney et al., arXiv 2024. [Paper]
  12. SWE-bench: Can Language Models Resolve Real-World GitHub Issues?, Jimenez et al., arXiv 2023. [Paper]
  13. OctoPack: Instruction Tuning Code Large Language Models, Muennighoff et al., ICLR 2024. [Paper]
  14. MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering, Chan et al., ICLR 2025. [Paper]
  15. MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation, Huang et al., ICML 2024. [Paper]
  16. RE-Bench: Evaluating Frontier AI R&D Capabilities of Language Model Agents against Human Experts, Wijk et al., arXiv 2024. [Paper]
  17. ResearchTown: Simulator of Human Research Community, Yu et al., arXiv 2024. [Paper]
  18. WebArena: A Realistic Web Environment for Building Autonomous Agents, Zhou et al., ICLR 2024. [Paper]
  19. SPA-Bench: A Comprehensive Benchmark for Smartphone Agent Evaluation, Chen et al., ICLR 2025. [Paper]
  20. WebWalker: Benchmarking LLMs in Web Traversal, Wu et al., ACL 2025. [Paper]
  21. WebDancer: Towards Autonomous Information Seeking Agency, Wu et al., arXiv 2025. [Paper]
  22. WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization, Tao et al., arXiv 2025. [Paper]
  23. WebSailor: Navigating Super-human Reasoning for Web Agent, Li et al., arXiv 2025. [Paper]

Other Resources

  1. ACL 2023 Tutorial: Retrieval-based Language Models and Applications, Asai et al., ACL 2023. [Link]
  2. A Survey of Large Language Models, Zhao et al., arXiv 2023. [Paper]
  3. Information Retrieval Meets Large Language Models: A Strategic Report from Chinese IR Community, Ai et al., arXiv 2023. [Paper]
