Renmin University of China
This research introduces LLaDA, an 8-billion-parameter large language diffusion model, demonstrating that a non-autoregressive architecture can achieve general-purpose capabilities competitive with leading autoregressive models such as LLaMA3 8B. The model performs comparably in in-context learning and instruction following, while uniquely overcoming the "reversal curse" that afflicts traditional autoregressive LLMs.
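As a rough illustration of the masked-denoising generation style behind diffusion language models like LLaDA, the sketch below fills a fully masked sequence over several steps and remasks the least-confident predictions each round; the predictor interface, mask sentinel, and remasking schedule are illustrative assumptions, not LLaDA's exact procedure.

```python
# Hypothetical sketch of iterative masked-denoising generation (the core idea
# behind diffusion LLMs such as LLaDA). The predictor interface is assumed.
import numpy as np

MASK = -1  # sentinel id for a masked position

def dummy_predictor(tokens: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Stand-in for the mask predictor: returns (predicted_ids, confidences)."""
    rng = np.random.default_rng(0)
    ids = rng.integers(0, 1000, size=tokens.shape)
    conf = rng.random(size=tokens.shape)
    return ids, conf

def diffusion_generate(length: int, steps: int = 8, predictor=dummy_predictor) -> np.ndarray:
    tokens = np.full(length, MASK)
    for step in range(steps):
        masked = tokens == MASK
        if not masked.any():
            break
        ids, conf = predictor(tokens)    # predict every masked token at once
        tokens[masked] = ids[masked]     # tentatively fill all masked positions
        # Remask the least-confident predictions so later steps can revise them;
        # the kept fraction grows linearly with the step index.
        keep_frac = (step + 1) / steps
        n_remask = int(round((1 - keep_frac) * masked.sum()))
        if n_remask > 0:
            cand = np.where(masked)[0]
            order = cand[np.argsort(conf[cand])]  # lowest confidence first
            tokens[order[:n_remask]] = MASK
    return tokens

print(diffusion_generate(16))
```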
Agentic Reinforced Policy Optimization (ARPO) trains Large Language Models to act as agents in multi-turn tool interaction scenarios. The method leverages observed increases in token entropy after tool calls to adaptively branch sampling, which leads to superior performance on 13 reasoning and deep search benchmarks while achieving comparable accuracy with half the tool-call budget of previous methods.
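A minimal sketch of the entropy-triggered branching idea: during rollout collection, the sampler branches only when the next-token distribution right after a tool call is uncertain. The policy interface, threshold, and branch factor below are assumptions for illustration, not ARPO's exact recipe.

```python
# Hedged sketch of entropy-triggered adaptive branching during rollout collection.
import math
import random

def token_entropy(probs):
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

class DummyPolicy:
    """Stand-in agent: emits a step of text, a tool result, and next-token probs."""
    def step(self, trajectory):
        probs = [random.random() for _ in range(8)]
        z = sum(probs)
        return " <think/>", " <tool_result/>", [p / z for p in probs]

def collect_rollouts(policy, prompt, max_turns=3, entropy_threshold=1.8, branch_factor=2):
    rollouts = [prompt]
    for _ in range(max_turns):
        expanded = []
        for traj in rollouts:
            text, tool_result, next_probs = policy.step(traj)
            traj = traj + text + tool_result
            # Branch only when the distribution right after the tool call is uncertain;
            # in a real system each branch would be sampled independently and diverge.
            if token_entropy(next_probs) > entropy_threshold:
                expanded.extend([traj] * branch_factor)
            else:
                expanded.append(traj)
        rollouts = expanded
    return rollouts

print(len(collect_rollouts(DummyPolicy(), "Question: ...")))
```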
FlowRL reimagines large language model post-training by optimizing for reward distribution matching instead of simple reward maximization, yielding improved accuracy on math and code benchmarks and substantially greater diversity in reasoning paths.
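To make the distribution-matching idea concrete, here is a hedged sketch of a squared-residual loss with a learned log-partition term, which is minimized when the policy's probabilities are proportional to exponentiated rewards; the symbols and exact functional form are assumptions, not necessarily FlowRL's precise objective.

```python
# Hedged sketch of a reward-distribution-matching objective: push log-probabilities
# toward scaled rewards via a trajectory-balance-style squared loss.
import numpy as np

def distribution_matching_loss(logp_policy: np.ndarray,
                               rewards: np.ndarray,
                               log_z: float,
                               beta: float = 1.0) -> float:
    """Mean squared residual of  log_z + log pi(y|x) - beta * r(x, y)  over a batch."""
    residual = log_z + logp_policy - beta * rewards
    return float(np.mean(residual ** 2))

# Toy batch: at the optimum the policy assigns probability proportional to exp(beta * reward).
logp = np.log(np.array([0.5, 0.3, 0.2]))
r = np.array([2.0, 1.5, 1.0])
print(distribution_matching_loss(logp, r, log_z=1.3))
```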
The research introduces "Pass@k Training," a novel Reinforcement Learning with Verifiable Rewards (RLVR) method that utilizes the Pass@k metric as a reward signal to enhance exploration in large language models. This approach consistently improves both Pass@k and Pass@1 performance, allowing a 7B model trained with a combined strategy to surpass larger closed-source LLMs like Grok-2, GPT-4o, and Claude-3.7-Sonnet on challenging reasoning benchmarks.
This comprehensive survey from a large multi-institutional collaboration examines "Latent Reasoning" in Large Language Models, an emerging paradigm that performs multi-step inference entirely within the model's high-bandwidth continuous hidden states to overcome the limitations of natural language-based explicit reasoning. It highlights the significant bandwidth advantage of latent representations (approximately 2700x higher) and provides a unified taxonomy of current methodologies.
Researchers from Renmin University, Microsoft Research Asia, Shanghai Jiao Tong University, and BIGAI introduce an entropy-based advantage shaping method for Reinforcement Learning (RL) fine-tuning of Large Language Models. This method consistently improves multi-step reasoning capabilities, yielding substantial gains on Pass@K metrics across mathematical reasoning benchmarks by encouraging deeper and more exploratory reasoning chains.
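The core operation can be sketched as adding a small, clipped entropy bonus to the usual advantage so that uncertain, exploratory tokens are reinforced slightly more; the coefficient and clipping scheme below are assumptions, not the paper's exact settings.

```python
# Hedged sketch of entropy-based advantage shaping for RL fine-tuning.
import numpy as np

def shape_advantages(advantages: np.ndarray,
                     token_entropies: np.ndarray,
                     alpha: float = 0.1,
                     clip: float = 1.0) -> np.ndarray:
    bonus = alpha * np.minimum(token_entropies, clip)   # keep the bonus bounded
    return advantages + bonus

adv = np.array([0.2, -0.1, 0.5])
ent = np.array([2.3, 0.4, 1.1])          # per-token policy entropies
print(shape_advantages(adv, ent))
```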
This survey paper formalizes the concept of LLM-as-a-Judge, categorizes existing methods, and empirically evaluates strategies to enhance reliability and mitigate biases. It finds that GPT-4-turbo leads in performance and that majority voting among multiple evaluations effectively improves the reliability of LLM-based judgments.
MemOS provides an operating system-inspired framework for Large Language Models, treating memory as a first-class, schedulable, and evolvable system resource to overcome limitations in long-context reasoning and knowledge evolution. This system consistently outperforms state-of-the-art baselines on memory-intensive benchmarks like LOCOMO and significantly reduces inference latency through KV-based memory acceleration.
Researchers introduced a method to jailbreak Large Language Models by humanizing them and applying social science-based persuasion techniques, demonstrating high attack success rates (over 92% on Llama-2 and GPT models) and proposing adaptive defenses that significantly mitigate these vulnerabilities.
Renmin University researchers developed the Adaptive Temporal Motion Guided Graph Convolution Network (ATM-GCN), a framework that explicitly models temporal dependencies and motion features via an adaptively weighted graph convolutional network for micro-expression recognition. This approach establishes new state-of-the-art results on challenging datasets such as CAS(ME)^3 and Composite, demonstrating improved accuracy by effectively capturing subtle facial dynamics.
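The building block can be sketched as a graph convolution whose adjacency is learned and adaptively re-normalized; the shapes, softmax normalization, and ReLU below are illustrative assumptions rather than the ATM-GCN layer's exact design.

```python
# Hedged sketch of a graph-convolution step with an adaptively weighted adjacency.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def adaptive_graph_conv(node_feats: np.ndarray,      # (N, C_in) facial-region features
                        learned_adj: np.ndarray,     # (N, N) learnable adjacency logits
                        weight: np.ndarray) -> np.ndarray:  # (C_in, C_out) projection
    adj = softmax(learned_adj, axis=-1)              # adaptively re-weight edges
    return np.maximum(adj @ node_feats @ weight, 0)  # aggregate neighbours, then ReLU

rng = np.random.default_rng(0)
out = adaptive_graph_conv(rng.normal(size=(5, 8)), rng.normal(size=(5, 5)), rng.normal(size=(8, 4)))
print(out.shape)  # (5, 4)
```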
AFLOW introduces an automated framework for generating and optimizing agentic workflows for Large Language Models, reformulating workflow optimization as a search problem over code-represented workflows. The system leverages Monte Carlo Tree Search with LLM-based optimization to iteratively refine workflows, yielding a 19.5% average performance improvement over existing automated methods while enabling smaller, more cost-effective LLMs to achieve performance parity with larger models.
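A hedged skeleton of searching over code-represented workflows with an MCTS-style loop (select, expand via a proposed modification, evaluate, back-propagate); the node fields and the optimizer/evaluator stand-ins are assumptions, not AFLOW's implementation.

```python
# Hedged sketch of MCTS over workflow candidates with stand-in LLM and evaluator.
import math
import random

class Node:
    def __init__(self, workflow, parent=None):
        self.workflow, self.parent = workflow, parent
        self.children, self.visits, self.value = [], 0, 0.0
    def ucb(self, c=1.4):
        if self.visits == 0:
            return float("inf")
        return self.value / self.visits + c * math.sqrt(math.log(self.parent.visits) / self.visits)

def propose_modification(workflow):     # stand-in for an LLM-based optimizer
    return workflow + [f"step_{random.randint(0, 99)}"]

def evaluate(workflow):                 # stand-in for running a validation benchmark
    return random.random()

def mcts(root_workflow, iterations=20):
    root = Node(root_workflow)
    for _ in range(iterations):
        node = root
        while node.children:                                           # selection
            node = max(node.children, key=Node.ucb)
        child = Node(propose_modification(node.workflow), parent=node) # expansion
        node.children.append(child)
        score = evaluate(child.workflow)                               # evaluation
        while child:                                                   # back-propagation
            child.visits += 1
            child.value += score
            child = child.parent
    return max(root.children, key=lambda n: n.value / n.visits).workflow

print(mcts(["generate_answer"]))
```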
VeriFree proposes a novel approach to training large language models (LLMs) for general reasoning that eliminates the need for explicit verifiers by directly optimizing the likelihood of a reference answer conditioned on the model's generated reasoning trace. The method achieves substantial accuracy gains on challenging general reasoning benchmarks like MMLU-Pro and SuperGPQA, and demonstrates improved learning efficiency and transferability of reasoning skills to diverse domains.
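The verifier-free signal can be sketched as scoring each sampled reasoning trace by the probability the model assigns to the reference answer conditioned on the prompt plus that trace; the model interface below is a stand-in assumption.

```python
# Hedged sketch of a verifier-free learning signal for general reasoning.
import math
import random

class DummyLM:
    def answer_logprob(self, prompt: str, trace: str, reference_answer: str) -> float:
        """Stand-in for the summed token log-probs of the reference answer."""
        return -random.uniform(0.5, 5.0)

def verifier_free_scores(model, prompt, traces, reference_answer):
    # The probability of the reference answer under each trace acts as a soft reward;
    # no external verifier or answer extraction is needed.
    return [math.exp(model.answer_logprob(prompt, t, reference_answer)) for t in traces]

print(verifier_free_scores(DummyLM(), "Q: ...", ["trace A", "trace B"], "42"))
```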
An open-source framework, ToolLLM, enables large language models to master over 16,000 real-world APIs, featuring the ToolBench instruction-tuning dataset and the ToolLLaMA model, which achieves performance comparable to ChatGPT.
This paper introduces a framework that enhances large reasoning models with agentic search and knowledge-refinement capabilities.
This survey provides a comprehensive and systematic review of Large Language Model-based autonomous agents, proposing a unified architectural framework and categorizing their construction methods, diverse applications, and evaluation strategies. The work synthesizes over 100 recent papers, identifying commonalities, taxonomies, and outlining critical challenges for future research in the field.
This survey presents the first comprehensive review of memory mechanisms for Large Language Model (LLM)-based agents, categorizing memory by sources, forms, and operations. It details how memory facilitates agent self-evolution and supports diverse applications while outlining current limitations and future research directions in this field.
Researchers from Renmin University of China and Ant Group develop Variance-Reduced Preference Optimization (VRPO), a framework that makes Direct Preference Optimization effective for Masked Diffusion Models by addressing the high variance and bias introduced by Monte Carlo ELBO estimation. VRPO combines three unbiased techniques: an increased sampling budget, optimal allocation of samples across timesteps, and antithetic sampling that shares random variables between the policy and reference models. The resulting model, LLaDA 1.5, achieves consistent improvements in mathematical reasoning (+4.7 on GSM8K), code generation (+3.0 on HumanEval), and alignment tasks (+4.3 on Arena-Hard), performs competitively against autoregressive models, and establishes the first successful application of RL-based alignment to large language diffusion models.
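The variance-reduction intuition behind the shared-noise (antithetic-style) technique can be shown with a toy experiment: when estimating a difference of two Monte Carlo expectations, reusing the same random draws for both terms cancels much of the noise. The integrands below are purely illustrative assumptions standing in for policy and reference ELBO terms.

```python
# Hedged illustration of variance reduction by sharing random variables between
# two Monte Carlo estimates whose difference is what we actually care about.
import numpy as np

rng = np.random.default_rng(0)

def f_policy(u):     # stand-in for a per-sample policy ELBO term
    return np.sin(u) + 1.0

def f_reference(u):  # stand-in for a per-sample reference ELBO term
    return np.sin(u) + 0.8

def diff_estimate(n, shared_noise: bool):
    u1 = rng.random(n)
    u2 = u1 if shared_noise else rng.random(n)   # share random variables or not
    return f_policy(u1).mean() - f_reference(u2).mean()

independent = [diff_estimate(64, shared_noise=False) for _ in range(2000)]
coupled = [diff_estimate(64, shared_noise=True) for _ in range(2000)]
print("var(independent) =", np.var(independent))
print("var(shared noise) =", np.var(coupled))    # markedly smaller
```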
R1-Searcher, from Renmin University of China, introduces a two-stage outcome-based reinforcement learning framework that enables Large Language Models to autonomously invoke and leverage external search systems. This approach significantly outperforms strong RAG baselines on multi-hop question answering benchmarks and demonstrates robust generalization to out-of-domain and online search scenarios.
This survey paper defines and applies a 'full-stack' safety concept for Large Language Models (LLMs), systematically analyzing safety concerns across their entire lifecycle from data to deployment and commercialization. The collaboration synthesizes findings from over 900 papers, providing a unified taxonomy of attacks and defenses while identifying key insights and future research directions for LLM and LLM-agent safety.
LLaDA-V introduces a purely diffusion-based Multimodal Large Language Model (MLLM) capable of end-to-end training and sampling across modalities. It performs competitively with, and often surpasses, autoregressive MLLMs such as LLaMA3-V across various multimodal benchmarks, demonstrating enhanced data scalability in several key domains.