Credit goes to www.alphaxiv.org

Discover, Discuss, and Read arXiv papers

Discover new, recommended papers

Researchers introduced a predictive framework for Reinforcement Learning (RL) in Large Language Models (LLMs) using a sigmoidal compute-performance curve, enabling performance extrapolation from smaller runs. Their ScaleRL recipe, validated across more than 100,000 GPU-hours of training, achieves an asymptotic reward of 0.61 on verifiable math problems, outperforming established methods while scaling predictably across model size, generation length, and multi-task settings.
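The extrapolation idea can be sketched as fitting a saturating compute-performance curve to small runs and evaluating it at a larger compute budget. The functional form, parameter grid, and run data below are illustrative assumptions, not the paper's exact recipe:

```python
def sigmoid_reward(compute, r_max, c_mid, b):
    """Saturating compute-performance curve: reward approaches the
    asymptote r_max as training compute grows (illustrative form)."""
    return r_max / (1.0 + (c_mid / compute) ** b)

def fit_and_extrapolate(runs, target_compute):
    """Naive grid-search least-squares fit over small runs, then
    extrapolate. `runs` is a list of (gpu_hours, reward) pairs."""
    best, best_err = None, float("inf")
    for r_max in [x / 100 for x in range(40, 81)]:        # 0.40 .. 0.80
        for c_mid in [100, 300, 1000, 3000, 10000]:
            for b in [0.5, 1.0, 1.5, 2.0]:
                err = sum((sigmoid_reward(c, r_max, c_mid, b) - r) ** 2
                          for c, r in runs)
                if err < best_err:
                    best, best_err = (r_max, c_mid, b), err
    return sigmoid_reward(target_compute, *best), best

# Hypothetical observations from cheap small-scale runs:
small_runs = [(500, 0.20), (1000, 0.30), (2000, 0.40), (4000, 0.48)]
pred, params = fit_and_extrapolate(small_runs, 100_000)
```

With this toy data the fit predicts a reward near the curve's asymptote at 100,000 GPU-hours, which is the kind of extrapolation the framework makes from smaller runs.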
Representation Autoencoders (RAEs) redefine the latent space for Diffusion Transformers (DiT) by utilizing frozen, pretrained visual encoders with lightweight decoders. This framework achieves state-of-the-art image generation, obtaining an FID of 1.13 on ImageNet 512x512, and demonstrates up to 47x faster convergence than prior DiT models.
A tutorial developed by the University of Oxford and Hugging Face guides readers through modern robot learning, detailing the transition from classical methods to data-driven, learning-based paradigms. It provides conceptual understanding and practical tools using the `lerobot` open-source library, covering Reinforcement Learning, Imitation Learning, and generalist Vision-Language-Action policies with end-to-end examples.
Researchers at The University of Hong Kong developed RAG-ANYTHING, an all-in-one framework that addresses the text-centric limitation of existing Retrieval-Augmented Generation (RAG) systems by uniformly processing text, images, tables, and equations. This system leverages a dual-graph construction and cross-modal hybrid retrieval to achieve 63.4% accuracy on DocBench and 42.8% on MMLongBench, showing improved performance, particularly on long, multimodal documents.
Tensor Logic introduces a foundational language for AI, demonstrating that neural, symbolic, and statistical paradigms can be unified under a single mathematical construct: the tensor equation. The framework enables sound and transparent reasoning directly within embedding spaces, offering tunable control over the spectrum from deductive to analogical inference.
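The core construct, a tensor equation, can be illustrated by recasting a classic Datalog rule as tensor algebra: summing over a shared index plays the role of the logical join. The tiny relation and entity names below are our own illustration, using boolean matrices as 0/1 tensors:

```python
# Entities: 0=Ann, 1=Bob, 2=Cal. parent[x][y] = 1 iff x is a parent of y.
ENTITIES = ["Ann", "Bob", "Cal"]
parent = [
    [0, 1, 0],   # Ann -> Bob
    [0, 0, 1],   # Bob -> Cal
    [0, 0, 0],
]

def tensor_join(a, b):
    """Tensor equation C[x, z] = A[x, y] * B[y, z]: the shared index y
    is summed out, then clamped to {0, 1}. This is the Datalog rule
    c(x, z) :- a(x, y), b(y, z) expressed as tensor algebra."""
    n = len(a)
    return [[min(1, sum(a[x][y] * b[y][z] for y in range(n)))
             for z in range(n)] for x in range(n)]

grandparent = tensor_join(parent, parent)
```

Replacing the 0/1 entries with learned real-valued tensors is what lets the same equation interpolate between strict deduction and softer, embedding-space inference.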
Researchers at Harvard University developed power sampling, a training-free method leveraging the Metropolis-Hastings algorithm to sample from a sharpened distribution of a base large language model. This technique unlocks latent reasoning capabilities, achieving single-shot performance comparable to or exceeding reinforcement learning post-training methods across various tasks, while also preserving generation diversity.
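The mechanics can be sketched on a toy discrete distribution: an independence Metropolis-Hastings chain targets the sharpened distribution p(x)^α while proposing from the base p itself (standing in here for the base LLM). The distribution and α below are hypothetical:

```python
import random

def mh_power_sample(p, alpha, steps, seed=0):
    """Independence Metropolis-Hastings targeting p(x)**alpha,
    with the base distribution p as the proposal. Returns visit
    counts over the chain."""
    rng = random.Random(seed)
    xs = list(p)
    weights = [p[x] for x in xs]
    x = rng.choices(xs, weights=weights)[0]
    counts = {k: 0 for k in xs}
    for _ in range(steps):
        x_new = rng.choices(xs, weights=weights)[0]
        # Acceptance ratio for target p^alpha under proposal p:
        # (p(x')^a / p(x)^a) * (p(x) / p(x')) = (p(x') / p(x))**(a - 1)
        if rng.random() < min(1.0, (p[x_new] / p[x]) ** (alpha - 1)):
            x = x_new
        counts[x] += 1
    return counts

base = {"A": 0.5, "B": 0.3, "C": 0.2}   # base model's answer distribution
sharp = mh_power_sample(base, alpha=4.0, steps=20_000)
```

With α = 4 the chain concentrates far more mass on the mode "A" than the base distribution does (roughly 0.87 versus 0.5 under p^4), which is the sense in which sharpening "unlocks" the base model's most likely answer without any retraining.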
Researchers from S-Lab NTU, SenseTime Research, and Xi’an Jiaotong University introduced NEO, a family of native vision-language models built on a unified primitive and end-to-end training. NEO demonstrates competitive performance against modular VLMs and surpasses other native approaches on various benchmarks, despite using significantly less pre-training and SFT data.
Researchers from ICT, CAS and collaborating institutions present the first comprehensive survey of Vibe Coding, a novel LLM-powered software development methodology, formalizing its processes and outlining five distinct development models. The work thoroughly analyzes the ecosystem's infrastructure, revealing critical challenges in human-AI collaboration and a shift in developer roles.
Researchers at the International Digital Economy Academy (IDEA) introduced Rex-Omni, a 3-billion-parameter Multimodal Large Language Model capable of unifying various visual perception tasks. The model achieves state-of-the-art or highly competitive performance across 11 diverse benchmarks by integrating robust language understanding with precise object localization.
Researchers from Renmin University of China and Kuaishou Technology developed Agentic Entropy-Balanced Policy Optimization (AEPO), an algorithm designed to stabilize and enhance the training of web agents by dynamically balancing entropy during rollout and policy updates. AEPO achieved 47.6% Pass@1 on the GAIA benchmark and reduced tool calls by approximately half compared to other RL methods, demonstrating improved performance and training stability on complex, multi-turn tasks.
NVIDIA researchers introduce VLA-0, a Vision-Language-Action model that achieves state-of-the-art robotic manipulation by directly representing robot actions as numerical text strings and fine-tuning an unmodified Vision-Language Model. This minimalist design outperforms more complex or extensively pretrained alternatives in both simulation and real-world tasks.
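The action-as-text idea can be sketched as a simple discretize-and-format scheme: each continuous action dimension becomes an integer written out as ordinary text, which an unmodified VLM can emit token by token. The bin count and value range here are illustrative assumptions, not necessarily the paper's exact tokenization:

```python
def action_to_text(action, low=-1.0, high=1.0, bins=1000):
    """Discretize each continuous action dimension into an integer
    bin and emit it as plain text, so an off-the-shelf VLM can
    output actions as an ordinary token sequence."""
    ids = [round((a - low) / (high - low) * (bins - 1)) for a in action]
    return " ".join(str(min(bins - 1, max(0, i))) for i in ids)

def text_to_action(text, low=-1.0, high=1.0, bins=1000):
    """Parse a generated string back into continuous action values."""
    return [low + int(tok) / (bins - 1) * (high - low)
            for tok in text.split()]

cmd = action_to_text([0.25, -0.5, 0.0, 1.0])   # e.g. a 4-DoF command
recovered = text_to_action(cmd)
```

The round trip loses at most half a bin width per dimension, so with enough bins the text representation is effectively lossless for control purposes, and no special action head or vocabulary change is needed.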
Researchers from a collaborative team including Shanghai Qizhi Institute and Tsinghua University developed RL-100, a unified framework for real-world robotic manipulation that achieves 100% success rates across seven challenging tasks by combining imitation learning with iterative offline and online reinforcement learning. This framework addresses the latency of multi-step diffusion policies through consistency distillation, enabling high-frequency control at up to 378 Hz while outperforming human experts in efficiency.
BitNet Distillation by Microsoft Research provides a three-stage framework for converting full-precision Large Language Models into efficient 1.58-bit models for downstream tasks. It achieves up to 10x memory savings and a 2.65x CPU inference speedup while maintaining task performance comparable to full-precision models, addressing the challenge of deploying LLMs at scale.
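The 1.58-bit format corresponds to ternary weights. A minimal sketch of absmean-style ternary quantization (scale by the mean absolute value, then round each weight to {-1, 0, +1}), written over plain Python lists purely for illustration:

```python
def quantize_ternary(weights, eps=1e-8):
    """Absmean-style 1.58-bit quantization: scale by the mean
    absolute weight, then round and clip to the ternary set
    {-1, 0, +1}."""
    gamma = sum(abs(w) for w in weights) / len(weights) + eps
    q = [max(-1, min(1, round(w / gamma))) for w in weights]
    return q, gamma

def dequantize(q, gamma):
    """Approximate reconstruction: scale ternary codes back up."""
    return [v * gamma for v in q]

w = [0.9, -0.05, 0.4, -1.2, 0.02, -0.6]
q, gamma = quantize_ternary(w)
```

Each weight then needs only log2(3) ≈ 1.58 bits, and matrix multiplies reduce to additions and subtractions of the scale, which is where the memory and CPU-speed gains come from.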
AnyUp introduces a universal method for generating high-resolution feature maps from diverse low-resolution vision encoders without requiring model-specific retraining. The approach achieves state-of-the-art performance across various dense prediction tasks and generalizes robustly to unseen feature types and resolutions.
NP-Edit, developed by researchers at Carnegie Mellon University and Adobe, introduces a training paradigm for instruction-following image editing models that eliminates the need for paired input-target data. The system leverages differentiable feedback from Vision-Language Models and a distribution matching loss, achieving competitive performance and often outperforming larger models in few-step generation on benchmarks such as GEdit-Benchmark and DreamBooth.
Researchers from The Chinese University of Hong Kong developed a framework for assessing large language models' ability to design functional, physically simulated machines using a novel environment and agentic workflows. They demonstrated that while LLMs can generate functional designs, they require advanced techniques like iterative refinement and reinforcement learning to overcome limitations in spatial and physical reasoning.
PaddleOCR-VL, an ultra-compact 0.9B parameter vision-language model from Baidu's PaddlePaddle Team, enables efficient and accurate multilingual document parsing by extracting structured information from complex documents in 109 languages. It achieves state-of-the-art performance on benchmarks like OmniDocBench v1.5 with an overall score of 92.56, while demonstrating 15.8% higher page throughput and consuming 40% less GPU memory compared to leading baselines.
LabOS is an AI co-scientist system, developed by researchers at Stanford and Princeton, that integrates a self-evolving AI agent with an XR-enabled physical lab interface to accelerate scientific discovery. It achieved over 90% accuracy in real-time error detection for lab procedures and successfully identified novel targets in cancer immunotherapy and cell fusion research.
Researchers from Shanghai Jiao Tong University and Alibaba Group introduce a method that leverages attention mechanisms to identify a "preplan-and-anchor" reasoning rhythm within Large Language Models. This understanding enables a fine-grained reinforcement learning approach, leading to improved performance and efficiency across diverse reasoning benchmarks.
DriveVLA-W0 integrates world modeling into Vision-Language-Action (VLA) models for autonomous driving, utilizing future image prediction as a dense self-supervision signal. This framework amplifies data scaling laws, enabling VLAs to achieve state-of-the-art performance and enhanced generalization by learning robust environmental representations.