Renmin University of China
This research introduces LLaDA, an 8-billion-parameter large language diffusion model, demonstrating that a non-autoregressive architecture can achieve general-purpose capabilities competitive with leading autoregressive models such as LLaMA3 8B. The model performs comparably in in-context learning and instruction following, while uniquely overcoming the "reversal curse" that afflicts traditional autoregressive LLMs.
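As a rough illustration of the masked-denoising generation style behind diffusion language models like LLaDA, the sketch below fills a fully masked sequence over several steps and remasks the least-confident predictions each round; the predictor interface, mask sentinel, and remasking schedule are illustrative assumptions, not LLaDA's exact procedure.

```python
# Hypothetical sketch of iterative masked-denoising generation (the core idea
# behind diffusion LLMs such as LLaDA). The predictor interface is assumed.
import numpy as np

MASK = -1  # sentinel id for a masked position

def dummy_predictor(tokens: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Stand-in for the mask predictor: returns (predicted_ids, confidences)."""
    rng = np.random.default_rng(0)
    ids = rng.integers(0, 1000, size=tokens.shape)
    conf = rng.random(size=tokens.shape)
    return ids, conf

def diffusion_generate(length: int, steps: int = 8, predictor=dummy_predictor) -> np.ndarray:
    tokens = np.full(length, MASK)
    for step in range(steps):
        masked = tokens == MASK
        if not masked.any():
            break
        ids, conf = predictor(tokens)    # predict every masked token at once
        tokens[masked] = ids[masked]     # tentatively fill all masked positions
        # Remask the least-confident predictions so later steps can revise them;
        # the kept fraction grows linearly with the step index.
        keep_frac = (step + 1) / steps
        n_remask = int(round((1 - keep_frac) * masked.sum()))
        if n_remask > 0:
            cand = np.where(masked)[0]
            order = cand[np.argsort(conf[cand])]  # lowest confidence first
            tokens[order[:n_remask]] = MASK
    return tokens

print(diffusion_generate(16))
```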
Agentic Reinforced Policy Optimization (ARPO) trains Large Language Models to act as agents in multi-turn tool interaction scenarios. The method leverages observed increases in token entropy after tool calls to adaptively branch sampling, which leads to superior performance on 13 reasoning and deep search benchmarks while achieving comparable accuracy with half the tool-call budget of previous methods.
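A minimal sketch of the entropy-triggered branching idea: during rollout collection, the sampler branches only when the next-token distribution right after a tool call is uncertain. The policy interface, threshold, and branch factor below are assumptions for illustration, not ARPO's exact recipe.

```python
# Hedged sketch of entropy-triggered adaptive branching during rollout collection.
import math
import random

def token_entropy(probs):
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

class DummyPolicy:
    """Stand-in agent: emits a step of text, a tool result, and next-token probs."""
    def step(self, trajectory):
        probs = [random.random() for _ in range(8)]
        z = sum(probs)
        return " <think/>", " <tool_result/>", [p / z for p in probs]

def collect_rollouts(policy, prompt, max_turns=3, entropy_threshold=1.8, branch_factor=2):
    rollouts = [prompt]
    for _ in range(max_turns):
        expanded = []
        for traj in rollouts:
            text, tool_result, next_probs = policy.step(traj)
            traj = traj + text + tool_result
            # Branch only when the distribution right after the tool call is uncertain;
            # in a real system each branch would be sampled independently and diverge.
            if token_entropy(next_probs) > entropy_threshold:
                expanded.extend([traj] * branch_factor)
            else:
                expanded.append(traj)
        rollouts = expanded
    return rollouts

print(len(collect_rollouts(DummyPolicy(), "Question: ...")))
```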
FlowRL reimagines large language model post-training by optimizing for reward distribution matching instead of simple reward maximization, yielding improved accuracy on math and code benchmarks and substantially greater diversity in reasoning paths.
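To make the distribution-matching idea concrete, here is a hedged sketch of a squared-residual loss with a learned log-partition term, which is minimized when the policy's probabilities are proportional to exponentiated rewards; the symbols and exact functional form are assumptions, not necessarily FlowRL's precise objective.

```python
# Hedged sketch of a reward-distribution-matching objective: push log-probabilities
# toward scaled rewards via a trajectory-balance-style squared loss.
import numpy as np

def distribution_matching_loss(logp_policy: np.ndarray,
                               rewards: np.ndarray,
                               log_z: float,
                               beta: float = 1.0) -> float:
    """Mean squared residual of  log_z + log pi(y|x) - beta * r(x, y)  over a batch."""
    residual = log_z + logp_policy - beta * rewards
    return float(np.mean(residual ** 2))

# Toy batch: at the optimum the policy assigns probability proportional to exp(beta * reward).
logp = np.log(np.array([0.5, 0.3, 0.2]))
r = np.array([2.0, 1.5, 1.0])
print(distribution_matching_loss(logp, r, log_z=1.3))
```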
The research introduces "Pass@k Training," a novel Reinforcement Learning with Verifiable Rewards (RLVR) method that utilizes the Pass@k metric as a reward signal to enhance exploration in large language models. This approach consistently improves both Pass@k and Pass@1 performance, allowing a 7B model trained with a combined strategy to surpass larger closed-source LLMs like Grok-2, GPT-4o, and Claude-3.7-Sonnet on challenging reasoning benchmarks.
This comprehensive survey from a large multi-institutional collaboration examines "Latent Reasoning" in Large Language Models, an emerging paradigm that performs multi-step inference entirely within the model's high-bandwidth continuous hidden states to overcome the limitations of natural language-based explicit reasoning. It highlights the significant bandwidth advantage of latent representations (approximately 2700x higher) and provides a unified taxonomy of current methodologies.
Researchers from Renmin University, Microsoft Research Asia, Shanghai Jiao Tong University, and BIGAI introduce an entropy-based advantage shaping method for Reinforcement Learning (RL) fine-tuning of Large Language Models. This method consistently improves multi-step reasoning capabilities, yielding substantial gains on Pass@K metrics across mathematical reasoning benchmarks by encouraging deeper and more exploratory reasoning chains.
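The core operation can be sketched as adding a small, clipped entropy bonus to the usual advantage so that uncertain, exploratory tokens are reinforced slightly more; the coefficient and clipping scheme below are assumptions, not the paper's exact settings.

```python
# Hedged sketch of entropy-based advantage shaping for RL fine-tuning.
import numpy as np

def shape_advantages(advantages: np.ndarray,
                     token_entropies: np.ndarray,
                     alpha: float = 0.1,
                     clip: float = 1.0) -> np.ndarray:
    bonus = alpha * np.minimum(token_entropies, clip)   # keep the bonus bounded
    return advantages + bonus

adv = np.array([0.2, -0.1, 0.5])
ent = np.array([2.3, 0.4, 1.1])          # per-token policy entropies
print(shape_advantages(adv, ent))
```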
This survey paper formalizes the concept of LLM-as-a-Judge, categorizes existing methods, and empirically evaluates strategies to enhance reliability and mitigate biases. It finds that GPT-4-turbo leads in performance and that majority voting among multiple evaluations effectively improves the reliability of LLM-based judgments.
MemOS provides an operating system-inspired framework for Large Language Models, treating memory as a first-class, schedulable, and evolvable system resource to overcome limitations in long-context reasoning and knowledge evolution. This system consistently outperforms state-of-the-art baselines on memory-intensive benchmarks like LOCOMO and significantly reduces inference latency through KV-based memory acceleration.
Researchers introduced a method to jailbreak Large Language Models by humanizing them and applying social science-based persuasion techniques, demonstrating high attack success rates (over 92% on Llama-2 and GPT models) and proposing adaptive defenses that significantly mitigate these vulnerabilities.
Renmin University researchers developed the Adaptive Temporal Motion Guided Graph Convolution Network (ATM-GCN), a framework that explicitly models temporal dependencies and motion features via an adaptively weighted graph convolutional network for micro-expression recognition. This approach establishes new state-of-the-art results on challenging datasets such as CAS(ME)^3 and Composite, demonstrating improved accuracy by effectively capturing subtle facial dynamics.
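The building block can be sketched as a graph convolution whose adjacency is learned and adaptively re-normalized; the shapes, softmax normalization, and ReLU below are illustrative assumptions rather than the ATM-GCN layer's exact design.

```python
# Hedged sketch of a graph-convolution step with an adaptively weighted adjacency.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def adaptive_graph_conv(node_feats: np.ndarray,      # (N, C_in) facial-region features
                        learned_adj: np.ndarray,     # (N, N) learnable adjacency logits
                        weight: np.ndarray) -> np.ndarray:  # (C_in, C_out) projection
    adj = softmax(learned_adj, axis=-1)              # adaptively re-weight edges
    return np.maximum(adj @ node_feats @ weight, 0)  # aggregate neighbours, then ReLU

rng = np.random.default_rng(0)
out = adaptive_graph_conv(rng.normal(size=(5, 8)), rng.normal(size=(5, 5)), rng.normal(size=(8, 4)))
print(out.shape)  # (5, 4)
```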
AFLOW introduces an automated framework for generating and optimizing agentic workflows for Large Language Models, reformulating workflow optimization as a search problem over code-represented workflows. The system leverages Monte Carlo Tree Search with LLM-based optimization to iteratively refine workflows, yielding a 19.5% average performance improvement over existing automated methods while enabling smaller, more cost-effective LLMs to achieve performance parity with larger models.
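A hedged skeleton of searching over code-represented workflows with an MCTS-style loop (select, expand via a proposed modification, evaluate, back-propagate); the node fields and the optimizer/evaluator stand-ins are assumptions, not AFLOW's implementation.

```python
# Hedged sketch of MCTS over workflow candidates with stand-in LLM and evaluator.
import math
import random

class Node:
    def __init__(self, workflow, parent=None):
        self.workflow, self.parent = workflow, parent
        self.children, self.visits, self.value = [], 0, 0.0
    def ucb(self, c=1.4):
        if self.visits == 0:
            return float("inf")
        return self.value / self.visits + c * math.sqrt(math.log(self.parent.visits) / self.visits)

def propose_modification(workflow):     # stand-in for an LLM-based optimizer
    return workflow + [f"step_{random.randint(0, 99)}"]

def evaluate(workflow):                 # stand-in for running a validation benchmark
    return random.random()

def mcts(root_workflow, iterations=20):
    root = Node(root_workflow)
    for _ in range(iterations):
        node = root
        while node.children:                                           # selection
            node = max(node.children, key=Node.ucb)
        child = Node(propose_modification(node.workflow), parent=node) # expansion
        node.children.append(child)
        score = evaluate(child.workflow)                               # evaluation
        while child:                                                   # back-propagation
            child.visits += 1
            child.value += score
            child = child.parent
    return max(root.children, key=lambda n: n.value / n.visits).workflow

print(mcts(["generate_answer"]))
```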
VeriFree proposes a novel approach to training large language models (LLMs) for general reasoning that eliminates the need for explicit verifiers by directly optimizing the likelihood of a reference answer conditioned on the model's generated reasoning trace. The method achieves substantial accuracy gains on challenging general reasoning benchmarks like MMLU-Pro and SuperGPQA, and demonstrates improved learning efficiency and transferability of reasoning skills to diverse domains.
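The verifier-free signal can be sketched as scoring each sampled reasoning trace by the probability the model assigns to the reference answer conditioned on the prompt plus that trace; the model interface below is a stand-in assumption.

```python
# Hedged sketch of a verifier-free learning signal for general reasoning.
import math
import random

class DummyLM:
    def answer_logprob(self, prompt: str, trace: str, reference_answer: str) -> float:
        """Stand-in for the summed token log-probs of the reference answer."""
        return -random.uniform(0.5, 5.0)

def verifier_free_scores(model, prompt, traces, reference_answer):
    # The probability of the reference answer under each trace acts as a soft reward;
    # no external verifier or answer extraction is needed.
    return [math.exp(model.answer_logprob(prompt, t, reference_answer)) for t in traces]

print(verifier_free_scores(DummyLM(), "Q: ...", ["trace A", "trace B"], "42"))
```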
An open-source framework, ToolLLM, enables large language models to master over 16,000 real-world APIs, featuring the ToolBench instruction-tuning dataset and the ToolLLaMA model, which achieves performance comparable to ChatGPT.
This paper introduces a framework that enhances large reasoning models with agentic search and knowledge-refinement capabilities.
This survey provides a comprehensive and systematic review of Large Language Model-based autonomous agents, proposing a unified architectural framework and categorizing their construction methods, diverse applications, and evaluation strategies. The work synthesizes over 100 recent papers, identifying commonalities, taxonomies, and outlining critical challenges for future research in the field.
This survey presents the first comprehensive review of memory mechanisms for Large Language Model (LLM)-based agents, categorizing memory by sources, forms, and operations. It details how memory facilitates agent self-evolution and supports diverse applications while outlining current limitations and future research directions in this field.
Researchers from Renmin University of China and Ant Group develop Variance-Reduced Preference Optimization (VRPO), a framework that makes Direct Preference Optimization effective for Masked Diffusion Models by addressing the high variance and bias introduced by Monte Carlo ELBO estimation. VRPO combines three unbiased techniques: an increased sampling budget, optimal allocation of samples across timesteps, and antithetic sampling that shares random variables between the policy and reference models. The resulting model, LLaDA 1.5, achieves consistent improvements in mathematical reasoning (+4.7 on GSM8K), code generation (+3.0 on HumanEval), and alignment tasks (+4.3 on Arena-Hard), performs competitively against autoregressive models, and establishes the first successful application of RL-based alignment to large language diffusion models.
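The variance-reduction intuition behind the shared-noise (antithetic-style) technique can be shown with a toy experiment: when estimating a difference of two Monte Carlo expectations, reusing the same random draws for both terms cancels much of the noise. The integrands below are purely illustrative assumptions standing in for policy and reference ELBO terms.

```python
# Hedged illustration of variance reduction by sharing random variables between
# two Monte Carlo estimates whose difference is what we actually care about.
import numpy as np

rng = np.random.default_rng(0)

def f_policy(u):     # stand-in for a per-sample policy ELBO term
    return np.sin(u) + 1.0

def f_reference(u):  # stand-in for a per-sample reference ELBO term
    return np.sin(u) + 0.8

def diff_estimate(n, shared_noise: bool):
    u1 = rng.random(n)
    u2 = u1 if shared_noise else rng.random(n)   # share random variables or not
    return f_policy(u1).mean() - f_reference(u2).mean()

independent = [diff_estimate(64, shared_noise=False) for _ in range(2000)]
coupled = [diff_estimate(64, shared_noise=True) for _ in range(2000)]
print("var(independent) =", np.var(independent))
print("var(shared noise) =", np.var(coupled))    # markedly smaller
```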
R1-Searcher, from Renmin University of China, introduces a two-stage outcome-based reinforcement learning framework that enables Large Language Models to autonomously invoke and leverage external search systems. This approach significantly outperforms strong RAG baselines on multi-hop question answering benchmarks and demonstrates robust generalization to out-of-domain and online search scenarios.
This survey paper defines and applies a 'full-stack' safety concept for Large Language Models (LLMs), systematically analyzing safety concerns across their entire lifecycle from data to deployment and commercialization. The collaboration synthesizes findings from over 900 papers, providing a unified taxonomy of attacks and defenses while identifying key insights and future research directions for LLM and LLM-agent safety.
LLaDA-V introduces a purely diffusion-based Multimodal Large Language Model (MLLM) capable of end-to-end training and sampling across modalities. It performs competitively with, and often surpasses, autoregressive MLLMs such as LLaMA3-V across various multimodal benchmarks, demonstrating enhanced data scalability in several key domains.