Updated on 2025.11.12

Table of Contents

LLM Reasoning

Publish Date	Title	Authors	PDF	Code
2025-07-23	InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation	Shuai Yang et.al.	2507.17520	null
2025-07-23	MultiNRC: A Challenging and Native Multilingual Reasoning Evaluation Benchmark for LLMs	Alexander R. Fabbri et.al.	2507.17476	null
2025-07-23	HiProbe-VAD: Video Anomaly Detection via Hidden States Probing in Tuning-Free Multimodal LLMs	Zhaolin Cai et.al.	2507.17394	null
2025-07-23	Leveraging Knowledge Graphs and LLM Reasoning to Identify Operational Bottlenecks for Warehouse Planning Assistance	Rishi Parekh et.al.	2507.17273	null
2025-07-22	Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning	Junhao Shen et.al.	2507.16814	null
2025-07-22	Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning	Ang Li et.al.	2507.16746	null
2025-07-23	WAKENLLM: Evaluating Reasoning Potential and Stability in LLMs via Fine-Grained Benchmarking	Zipeng Ling et.al.	2507.16199	null
2025-07-21	Expert-Guided LLM Reasoning for Battery Discovery: From AI-Driven Hypothesis to Synthesis and Characterization	Shengchao Liu et.al.	2507.16110	null
2025-07-21	The Impact of Language Mixing on Bilingual LLM Reasoning	Yihao Li et.al.	2507.15849	null
2025-07-21	EgoPrune: Efficient Token Pruning for Egomotion Video Reasoning in Embodied Agent	Jiaao Li et.al.	2507.15428	null
2025-07-20	LEKIA: A Framework for Architectural Alignment via Expert Knowledge Injection	Boning Zhao et.al.	2507.14944	null
2025-07-18	A Simple "Try Again" Can Elicit Multi-Turn LLM Reasoning	Licheng Liu et.al.	2507.14295	null
2025-07-18	Team of One: Cracking Complex Video QA with Model Synergy	Jun Xie et.al.	2507.13820	null
2025-07-17	The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner	Zhouqi Hua et.al.	2507.13332	null
2025-07-17	Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark	Junsu Kim et.al.	2507.13314	null
2025-07-17	HATS: Hindi Analogy Test Set for Evaluating Reasoning in Large Language Models	Ashray Gupta et.al.	2507.13238	null
2025-07-17	Probabilistic Soundness Guarantees in LLM Reasoning Chains	Weiqiu You et.al.	2507.12948	null
2025-07-16	Reasoning Strategies in Large Language Models: Can They Follow, Prefer, and Optimize?	Yanjian Zhang et.al.	2507.11423	null
2025-07-15	KisMATH: Do LLMs Have Knowledge of Implicit Structures in Mathematical Reasoning?	Soumadeep Saha et.al.	2507.11408	null
2025-07-15	Guiding LLM Decision-Making with Fairness Reward Models	Zara Hall et.al.	2507.11344	null
2025-07-15	MSA at ImageCLEF 2025 Multimodal Reasoning: Multilingual Multimodal Reasoning With Ensemble Vision Language Models	Seif Ahmed et.al.	2507.11114	null
2025-07-15	Learning to Tune Like an Expert: Interpretable and Scene-Aware Navigation via MLLM Reasoning and CVAE-Based Adaptation	Yanbo Wang et.al.	2507.11001	null
2025-07-15	Modeling Understanding of Story-Based Analogies Using Large Language Models	Kalit Inani et.al.	2507.10957	null
2025-07-14	Foundation Model Driven Robotics: A Comprehensive Review	Muhammad Tayyab Khan et.al.	2507.10087	null
2025-07-13	Reframing SAR Target Recognition as Visual Reasoning: A Chain-of-Thought Dataset with Multimodal LLMs	Chaoran Li et.al.	2507.09535	null
2025-07-11	GraphRunner: A Multi-Stage Framework for Efficient and Accurate Graph-Based Retrieval	Savini Kashmira et.al.	2507.08945	null
2025-07-11	Leanabell-Prover-V2: Verifier-integrated Reasoning for Formal Theorem Proving via Reinforcement Learning	Xingguang Ji et.al.	2507.08649	null
2025-07-11	ChainEdit: Propagating Ripple Effects in LLM Knowledge Editing through Logical Rule-Guided Chains	Zilu Dong et.al.	2507.08427	null
2025-07-10	ALCo-FM: Adaptive Long-Context Foundation Model for Accident Prediction	Pinaki Prasad Guha Neogi et.al.	2507.08153	null
2025-07-10	MIRA: A Novel Framework for Fusing Modalities in Medical RAG	Jinhong Wang et.al.	2507.07902	null
2025-07-10	The Synergy Dilemma of Long-CoT SFT and RL: Investigating Post-Training Techniques for Reasoning VLMs	Jierun Chen et.al.	2507.07562	null
2025-07-10	RLEP: Reinforcement Learning with Experience Replay for LLM Reasoning	Hongzhi Zhang et.al.	2507.07451	null
2025-07-11	StarDojo: Benchmarking Open-Ended Behaviors of Agentic Multimodal LLMs in Production-Living Simulations with Stardew Valley	Weihao Tan et.al.	2507.07445	null
2025-07-09	MagiC: Evaluating Multimodal Cognition Toward Grounded Visual Reasoning	Chengfei Wu et.al.	2507.07297	null
2025-07-07	DeepRetro: Retrosynthetic Pathway Discovery using Iterative LLM Reasoning	Shreyas Vinaya Sathyanarayana et.al.	2507.07060	null
2025-07-09	First Return, Entropy-Eliciting Explore	Tianyu Zheng et.al.	2507.07017	null
2025-07-09	Learning Deliberately, Acting Intuitively: Unlocking Test-Time Reasoning in Multimodal LLMs	Yahan Yu et.al.	2507.06999	null
2025-07-09	Are They All Good? Evaluating the Quality of CoTs in LLM-based Code Generation	Binquan Zhang et.al.	2507.06980	null
2025-07-10	Rethinking Verification for LLM Code Generation: From Generation to Testing	Zihan Ma et.al.	2507.06920	null
2025-07-09	From Data-Centric to Sample-Centric: Enhancing LLM Reasoning via Progressive Optimization	Xinjie Chen et.al.	2507.06573	null
2025-07-13	Perception-Aware Policy Optimization for Multimodal Reasoning	Zhenhailong Wang et.al.	2507.06448	null
2025-07-08	Enhancing Scientific Visual Question Answering through Multimodal Reasoning and Ensemble Modeling	Prahitha Movva et.al.	2507.06183	null
2025-07-10	Skywork-R1V3 Technical Report	Wei Shen et.al.	2507.06167	null
2025-07-08	KERAG_R: Knowledge-Enhanced Retrieval-Augmented Generation for Recommendation	Zeyuan Meng et.al.	2507.05863	null
2025-07-09	Measuring how changes in code readability attributes affect code quality evaluation by Large Language Models	Igor Regis da Silva Simoes et.al.	2507.05289	null
2025-07-07	Spatio-Temporal LLM: Reasoning about Environments and Actions	Haozhen Zheng et.al.	2507.05258	null
2025-07-07	Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning	Yana Wei et.al.	2507.05255	null
2025-07-07	MARBLE: A Multi-Agent Rule-Based LLM Reasoning Engine for Accident Severity Prediction	Kaleem Ullah Qasim et.al.	2507.04893	null
2025-07-17	DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge	Wenyao Zhang et.al.	2507.04447	null
2025-07-05	CoT-Segmenter: Enhancing OOD Detection in Dense Road Scenes via Chain-of-Thought Reasoning	Jeonghyo Song et.al.	2507.03984	null
2025-07-04	Effects of structure on reasoning in instance-level Self-Discover	Sachith Gunasekara et.al.	2507.03347	null
2025-07-03	RCA Copilot: Transforming Network Data into Actionable Insights via Large Language Models	Alexander Shan et.al.	2507.03224	null
2025-07-03	Improving LLM Reasoning for Vulnerability Detection via Group Relative Policy Optimization	Marco Simoni et.al.	2507.03051	null
2025-07-02	Look-Back: Implicit Visual Re-focusing in MLLM Reasoning	Shuo Yang et.al.	2507.03019	null
2025-07-01	From Answers to Rationales: Self-Aligning Multimodal Reasoning with Answer-Oriented Chain-of-Thought	Wentao Tan et.al.	2507.02984	null
2025-06-26	Large Language Model Agent for Modular Task Execution in Drug Discovery	Janghoon Ock et.al.	2507.02925	null
2025-07-03	MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs	Purbesh Mitra et.al.	2507.02851	null
2025-07-03	Scaling LLM Planning: NL2FLOW for Parametric Problem Generation and Rigorous Evaluation	Jungkoo Kang et.al.	2507.02253	null
2025-07-02	Reasoning on a Budget: A Survey of Adaptive and Controllable Test-Time Compute in LLMs	Mohammad Ali Alomrani et.al.	2507.02076	null
2025-07-02	GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning	GLM-V Team et.al.	2507.01006	null
2025-07-01	HumanoidGen: Data Generation for Bimanual Dexterous Manipulation via LLM Reasoning	Zhi Jing et.al.	2507.00833	null
2025-07-01	Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning	Maggie Huan et.al.	2507.00432	null
2025-07-01	Causal Prompting for Implicit Sentiment Analysis with Large Language Models	Jing Ren et.al.	2507.00389	null
2025-06-22	TalentMine: LLM-Based Extraction and Question-Answering from Multimodal Talent Tables	Varun Mannam et.al.	2507.00041	null
2025-07-03	Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers	Zhaochen Su et.al.	2506.23918	null
2025-06-30	Interactive Reasoning: Visualizing and Controlling Chain-of-Thought Reasoning in Large Language Models	Rock Yuren Pang et.al.	2506.23678	null
2025-06-30	MMReason: An Open-Ended Multi-Modal Multi-Step Reasoning Benchmark for MLLMs Toward AGI	Huanjin Yao et.al.	2506.23563	null
2025-06-29	Are Large Language Models Capable of Deep Relational Reasoning? Insights from DeepSeek-R1 and Benchmark Comparisons	Chi Chiu So et.al.	2506.23128	null
2025-06-29	Decoding Memes: Benchmarking Narrative Role Classification across Multilingual and Multimodal Models	Shivam Sharma et.al.	2506.23122	null
2025-06-28	MARBLE: A Hard Benchmark for Multimodal Spatial Reasoning and Planning	Yulun Jiang et.al.	2506.22992	null
2025-06-26	APO: Enhancing Reasoning Ability of MLLMs via Asymmetric Policy Optimization	Minjie Hong et.al.	2506.21655	null
2025-06-24	FrankenBot: Brain-Morphic Modular Orchestration for Robotic Manipulation with Vision-Language Models	Shiyi Wang et.al.	2506.21627	null
2025-06-30	FinEval-KR: A Financial Domain Evaluation Framework for Large Language Models' Knowledge and Reasoning	Shaoyu Dou et.al.	2506.21591	null
2025-06-11	Debunk and Infer: Multimodal Fake News Detection via Diffusion-Generated Evidence and LLM Reasoning	Kaiying Yan et.al.	2506.21557	null
2025-06-26	HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context	Qize Yang et.al.	2506.21277	null
2025-06-26	Unveiling Causal Reasoning in Large Language Models: Reality or Mirage?	Haoang Chi et.al.	2506.21215	null
2025-06-25	MultiFinRAG: An Optimized Multimodal Retrieval-Augmented Generation (RAG) Framework for Financial Question Answering	Chinmay Gondhalekar et.al.	2506.20821	null
2025-06-25	Generative AI for Vulnerability Detection in 6G Wireless Networks: Advances, Case Study, and Future Directions	Shuo Yang et.al.	2506.20488	null
2025-06-24	KnowMap: Efficient Knowledge-Driven Task Adaptation for LLMs	Kelin Fu et.al.	2506.19527	null
2025-06-24	MSR-Align: Policy-Grounded Multimodal Alignment for Safety-Aware Reasoning in Vision-Language Models	Yinan Xia et.al.	2506.19257	null
2025-06-25	Thought Anchors: Which LLM Reasoning Steps Matter?	Paul C. Bogdan et.al.	2506.19143	null
2025-06-23	Finding Clustering Algorithms in the Transformer Architecture	Kenneth L. Clarkson et.al.	2506.19125	null
2025-06-23	Human-Aligned Faithfulness in Toxicity Explanations of LLMs	Ramaravind K. Mothilal et.al.	2506.19113	null
2025-06-23	Baba is LLM: Reasoning in a Game with Dynamic Rules	Fien van Wetten et.al.	2506.19095	null
2025-06-23	OMEGA: Can LLMs Reason Outside the Box in Math? Evaluating Exploratory, Compositional, and Transformative Generalization	Yiyou Sun et.al.	2506.18880	null
2025-06-24	ReDit: Reward Dithering for Improved LLM Policy Optimization	Chenxing Wei et.al.	2506.18631	null
2025-06-22	Adapting Vision-Language Models for Evaluating World Models	Mariya Hendriksen et.al.	2506.17967	null
2025-06-20	Aha Moment Revisited: Are VLMs Truly Capable of Self Verification in Inference-time Scaling?	Mingyuan Wu et.al.	2506.17417	null
2025-06-14	CORONA: A Coarse-to-Fine Framework for Graph-based Recommendation with Large Language Models	Junze Chen et.al.	2506.17281	null
2025-06-25	No Free Lunch: Rethinking Internal Feedback for LLM Reasoning	Yanzhi Zhang et.al.	2506.17219	null
2025-06-20	Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens	Zeyuan Yang et.al.	2506.17218	link
2025-06-20	MEXA: Towards General Multimodal Reasoning with Dynamic Multi-Expert Aggregation	Shoubin Yu et.al.	2506.17113	link
2025-06-20	MUCAR: Benchmarking Multilingual Cross-Modal Ambiguity Resolution for Multimodal Large Language Models	Xiaolong Wang et.al.	2506.17046	null
2025-06-20	LaVi: Efficient Large Vision-Language Models via Internal Feature Modulation	Tongtian Yue et.al.	2506.16691	null
2025-06-19	GeoGuess: Multimodal Reasoning based on Hierarchy of Visual Information in Street View	Fenghua Cheng et.al.	2506.16633	null
2025-06-19	History-Augmented Vision-Language Models for Frontier-Based Zero-Shot Object Navigation	Mobin Habibpour et.al.	2506.16623	null
2025-06-19	How Far Can Off-the-Shelf Multimodal Large Language Models Go in Online Episodic Memory Question Answering?	Giuseppe Lando et.al.	2506.16450	null
2025-06-19	TrajSceneLLM: A Multimodal Perspective on Semantic GPS Trajectory Analysis	Chunhou Ji et.al.	2506.16401	link
2025-07-17	SHREC: A Framework for Advancing Next-Generation Computational Phenotyping with Large Language Models	Sarah Pungitore et.al.	2506.16359	null
2025-06-19	GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning	Yi Chen et.al.	2506.16141	link
2025-06-23	SLR: An Automated Synthesis Framework for Scalable Logical Reasoning	Lukas Helff et.al.	2506.15787	null
2025-06-18	CC-LEARN: Cohort-based Consistency Learning	Xiao Ye et.al.	2506.15662	null
2025-06-18	MEGC2025: Micro-Expression Grand Challenge on Spot Then Recognize and Visual Question Answering	Xinqi Fan et.al.	2506.15298	null
2025-06-17	Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective	Zhoujun Cheng et.al.	2506.14965	link
2025-06-17	Structured Moral Reasoning in Language Models: A Value-Grounded Evaluation Framework	Mohna Chakraborty et.al.	2506.14948	null
2025-06-17	PeRL: Permutation-Enhanced Reinforcement Learning for Interleaved Vision-Language Reasoning	Yizhen Zhang et.al.	2506.14907	link
2025-06-12	FedNano: Toward Lightweight Federated Tuning for Pretrained Multimodal Large Language Models	Yao Zhang et.al.	2506.14824	null
2025-06-17	RadFabric: Agentic AI System with Reasoning Capability for Radiology	Wenting Chen et.al.	2506.14142	null
2025-06-17	A Hierarchical Test Platform for Vision Language Model (VLM)-Integrated Real-World Autonomous Driving	Yupeng Zhou et.al.	2506.14100	null
2025-06-16	How Does LLM Reasoning Work for Code? A Survey and a Call to Action	Ira Ceka et.al.	2506.13932	null
2025-06-16	VL-GenRM: Enhancing Vision-Language Verification via Vision Experts and Iterative Training	Jipeng Zhang et.al.	2506.13888	null
2025-06-16	LocationReasoner: Evaluating LLMs on Real-World Site Selection Reasoning	Miho Koda et.al.	2506.13841	link
2025-06-16	Steering LLM Thinking with Budget Guidance	Junyan Li et.al.	2506.13752	link
2025-06-16	Decompositional Reasoning for Graph Retrieval with Large Language Models	Valentin Six et.al.	2506.13380	null
2025-07-10	Thought Crime: Backdoors and Emergent Misalignment in Reasoning Models	James Chua et.al.	2506.13206	null
2025-06-16	FinLMM-R1: Enhancing Financial Reasoning in LMM through Scalable Data and Reward Design	Kai Lan et.al.	2506.13066	null
2025-06-26	Metis-RISE: RL Incentivizes and SFT Enhances Multimodal Reasoning Model Learning	Haibo Qiu et.al.	2506.13056	null
2025-06-20	Domain Specific Benchmarks for Evaluating Multimodal Large Language Models	Khizar Anjum et.al.	2506.12958	null
2025-06-15	SciDA: Scientific Dynamic Assessor of LLMs	Junting Zhou et.al.	2506.12909	null
2025-06-14	Graph of Verification: Structured Verification of LLM Reasoning with Directed Acyclic Graphs	Jiwei Fang et.al.	2506.12509	null
2025-06-14	Advances in LLMs with Focus on Reasoning, Adaptability, Efficiency and Ethics	Asifullah Khan et.al.	2506.12365	null
2025-06-22	MM-R5: MultiModal Reasoning-Enhanced ReRanker via Reinforcement Learning for Document Retrieval	Mingjun Xu et.al.	2506.12364	null
2025-06-13	Tracing LLM Reasoning Processes with Strategic Games: A Framework for Planning, Revision, and Resource-Constrained Decision Making	Xiaopeng Yuan et.al.	2506.12012	null
2025-06-22	How Visual Representations Map to Language Feature Space in Multimodal LLMs	Constantin Venhoff et.al.	2506.11976	null
2025-06-13	LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?	Zihan Zheng et.al.	2506.11928	null
2025-06-13	EasyARC: Evaluating Vision Language Models on True Visual Reasoning	Mert Unsal et.al.	2506.11595	null
2025-06-13	VFaith: Do Large Multimodal Models Really Reason on Seen Images Rather than Previous Memories?	Jiachen Yu et.al.	2506.11571	null
2025-07-04	LearnAlign: Reasoning Data Selection for Reinforcement Learning in Large Language Models Based on Improved Gradient Alignment	Shipeng Li et.al.	2506.11480	null
2025-06-09	KokushiMD-10: Benchmark for Evaluating Large Language Models on Ten Japanese National Healthcare Licensing Examinations	Junyu Liu et.al.	2506.11114	null
2025-06-13	MMMG: A Massive, Multidisciplinary, Multi-Tier Generation Benchmark for Text-to-Image Reasoning	Yuxuan Luo et.al.	2506.10963	null
2025-06-12	Improving Named Entity Transcription with Contextual LLM-based Revision	Viet Anh Trinh et.al.	2506.10779	null
2025-06-12	NeuralNexus at BEA 2025 Shared Task: Retrieval-Augmented Prompting for Mistake Identification in AI Tutors	Numaan Naeem et.al.	2506.10627	link
2025-06-25	Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning	Yuhao Zhou et.al.	2506.10521	null
2025-06-12	Reliable Reasoning Path: Distilling Effective Guidance for LLM Reasoning with Knowledge Graphs	Yilin Xiao et.al.	2506.10508	null
2025-06-16	Specification and Evaluation of Multi-Agent LLM Systems -- Prototype and Cybersecurity Applications	Felix Härer et.al.	2506.10467	link
2025-06-12	Fast on the Easy, Deep on the Hard: Efficient Reasoning via Powered Length Penalty	Zehui Ling et.al.	2506.10446	null
2025-06-12	Optimus-3: Towards Generalist Multimodal Minecraft Agents with Scalable Task Experts	Zaijing Li et.al.	2506.10357	null
2025-06-12	Code Execution as Grounded Supervision for LLM Reasoning	Dongwon Jung et.al.	2506.10343	link
2025-06-11	ChartReasoner: Code-Driven Modality Bridging for Long-Chain Reasoning in Chart Question Answering	Caijun Jia et.al.	2506.10116	null
2025-06-19	Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing	Junfei Wu et.al.	2506.09965	link
2025-06-11	Causal Sufficiency and Necessity Improves Chain-of-Thought Reasoning	Xiangning Yu et.al.	2506.09853	null
2025-06-11	AD^2-Bench: A Hierarchical CoT Benchmark for MLLM in Autonomous Driving under Adverse Conditions	Zhaoyang Wei et.al.	2506.09557	null
2025-06-11	Athena: Enhancing Multimodal Reasoning with Data-efficient Process Reward Models	Shuai Wang et.al.	2506.09532	null
2025-06-13	e3: Learning to Explore Enables Extrapolation of Test-Time Compute for LLMs	Amrith Setlur et.al.	2506.09026	null
2025-06-10	Learning to Reason Across Parallel Samples for LLM Reasoning	Jianing Qi et.al.	2506.09014	null
2025-06-10	SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning	Xiao Liang et.al.	2506.08989	link
2025-06-10	Consistent Paths Lead to Truth: Self-Rewarding Reinforcement Learning for LLM Reasoning	Kongcheng Zhang et.al.	2506.08745	link
2025-06-10	Safe and Economical UAV Trajectory Planning in Low-Altitude Airspace: A Hybrid DRL-LLM Approach with Compliance Awareness	Yanwei Gong et.al.	2506.08532	null
2025-06-10	Reinforce LLM Reasoning through Multi-Agent Reflection	Yurun Yuan et.al.	2506.08379	null
2025-06-18	Wait, We Don't Need to "Wait"! Removing Thinking Tokens Improves Reasoning Efficiency	Chenlong Wang et.al.	2506.08343	null
2025-06-09	From Debate to Equilibrium: Belief-Driven Multi-Agent LLM Reasoning via Bayesian Nash Equilibrium	Xie Yi et.al.	2506.08292	link
2025-06-09	Automatic Generation of Inference Making Questions for Reading Comprehension Assessments	Wanjing Anya Ma et.al.	2506.08260	link
2025-06-12	Play to Generalize: Learning to Reason Through Game Play	Yunfei Xie et.al.	2506.08011	link
2025-06-11	Decoupling the Image Perception and Multimodal Reasoning for Reasoning Segmentation with Digital Twin Representations	Yizhen Li et.al.	2506.07943	null
2025-06-09	WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning	Jie Yang et.al.	2506.07905	link
2025-06-10	Guideline Forest: Experience-Induced Multi-Guideline Reasoning with Stepwise Aggregation	Jiaxiang Chen et.al.	2506.07820	null
2025-06-11	AbstRaL: Augmenting LLMs' Reasoning by Reinforcing Abstract Thinking	Silin Gao et.al.	2506.07751	null
2025-06-10	Synthesis by Design: Controlled Data Generation via Structural Guidance	Lei Xu et.al.	2506.07664	null
2025-06-11	SAFEFLOW: A Principled Protocol for Trustworthy and Transactional Autonomous Agent Systems	Peiran Li et.al.	2506.07564	null
2025-06-09	SELT: Self-Evaluation Tree Search for LLMs with Task Decomposition	Mengsong Wu et.al.	2506.07557	null
2025-06-09	Learning What Reinforcement Learning Can't: Interleaved Online Fine-Tuning for Hardest Questions	Lu Ma et.al.	2506.07527	link
2025-06-11	MedChat: A Multi-Agent Framework for Multimodal Diagnosis with Large Language Models	Philip R. Liu et.al.	2506.07400	link
2025-06-09	Improving LLM Reasoning through Interpretable Role-Playing Steering	Anyi Wang et.al.	2506.07335	null
2025-06-08	Overclocking LLM Reasoning: Monitoring and Controlling Thinking Path Lengths in LLMs	Roy Eisenstadt et.al.	2506.07240	null
2025-06-08	Advancing Multimodal Reasoning Capabilities of Multimodal Large Language Models via Visual Perception Reward	Tong Xiao et.al.	2506.07218	null
2025-06-08	Flattery in Motion: Benchmarking and Analyzing Sycophancy in Video-LLMs	Wenrui Zhou et.al.	2506.07180	null
2025-06-08	Learning Compact Vision Tokens for Efficient Large Multimodal Models	Hao Tang et.al.	2506.07138	link
2025-06-08	Theorem-of-Thought: A Multi-Agent Framework for Abductive, Deductive, and Inductive Reasoning in Language Models	Samir Abdaljalil et.al.	2506.07106	null
2025-06-12	Chain-of-Code Collapse: Reasoning Failures in LLMs via Adversarial Prompting in Code Generation	Jaechul Roh et.al.	2506.06971	link
2025-06-07	Boosting LLM Reasoning via Spontaneous Self-Correction	Xutong Zhao et.al.	2506.06923	null
2025-06-07	Harnessing Vision-Language Models for Time Series Anomaly Detection	Zelin He et.al.	2506.06836	null
2025-06-07	VisioMath: Benchmarking Figure-based Mathematical Reasoning in LMMs	Can Li et.al.	2506.06727	null
2025-06-07	Curriculum Reinforcement Learning from Easy to Hard Tasks Improves LLM Reasoning	Shubham Parashar et.al.	2506.06632	null
2025-06-14	RARL: Improving Medical VLM Reasoning and Generalization with Reinforcement Learning and LoRA under Data and Hardware Constraints	Tan-Hanh Pham et.al.	2506.06600	null
2025-06-06	SIGMA: Refining Large Language Model Reasoning via Sibling-Guided Monte Carlo Augmentation	Yanwei Ren et.al.	2506.06470	null
2025-06-06	Saffron-1: Towards an Inference Scaling Paradigm for LLM Safety Assurance	Ruizhong Qiu et.al.	2506.06444	link
2025-06-06	PuzzleWorld: A Benchmark for Multimodal, Open-Ended Reasoning in Puzzlehunts	Hengzhi Li et.al.	2506.06211	null
2025-06-06	Route-and-Reason: Scaling Large Language Model Reasoning with Reinforced Model Router	Chenyang Shao et.al.	2506.05901	null
2025-06-06	BioMol-MQA: A Multi-Modal Question Answering Dataset For LLM Reasoning Over Bio-Molecular Interactions	Saptarshi Sengupta et.al.	2506.05766	null
2025-06-05	MORSE-500: A Programmatically Controllable Video Benchmark to Stress-Test Multimodal Reasoning	Zikui Cai et.al.	2506.05523	null
2025-06-05	DiCoRe: Enhancing Zero-shot Event Detection via Divergent-Convergent LLM Reasoning	Tanmay Parekh et.al.	2506.05128	null
2025-06-09	Reason-to-Recommend: Using Interaction-of-Thought Reasoning to Enhance LLM Recommendation	Keyu Zhao et.al.	2506.05069	null
2025-06-12	Context Is Not Comprehension	Alex Pan et.al.	2506.04907	null
2025-06-05	ICPC-Eval: Probing the Frontiers of LLM Reasoning with Competitive Programming Contests	Shiyi Xu et.al.	2506.04894	link
2025-06-10	Evaluation is All You Need: Strategic Overclaiming of LLM Reasoning Capabilities Through Evaluation Design	Lin Sun et.al.	2506.04734	null
2025-06-05	Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation	Yuyang Wanyan et.al.	2506.04614	null
2025-06-05	MuSciClaims: Multimodal Scientific Claim Verification	Yash Kumar Lal et.al.	2506.04585	null
2025-06-04	Matching Markets Meet LLMs: Algorithmic Reasoning with Ranked Preferences	Hadi Hosseini et.al.	2506.04478	null
2025-06-04	RSVP: Reasoning Segmentation via Visual Prompting and Multi-modal Chain-of-Thought	Yi Lu et.al.	2506.04277	null
2025-06-04	Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning	Shuang Chen et.al.	2506.04207	null
2025-06-04	R-Search: Empowering LLM Reasoning with Search via Multi-Reward Reinforcement Learning	Qingfei Zhao et.al.	2506.04185	link
2025-06-04	MMR-V: What's Left Unsaid? A Benchmark for Multimodal Deep Reasoning in Videos	Kejian Zhu et.al.	2506.04141	null
2025-06-04	Graph Counselor: Adaptive Graph Exploration via Multi-Agent Synergy to Enhance LLM Reasoning	Junqi Gao et.al.	2506.03939	link
2025-06-04	Reason from Future: Reverse Thought Chain Enhances LLM Reasoning	Yinlong Xu et.al.	2506.03673	null
2025-06-16	Zero-Shot Temporal Interaction Localization for Egocentric Videos	Erhang Zhang et.al.	2506.03662	link
2025-06-04	MiMo-VL Technical Report	Xiaomi LLM-Core Team et.al.	2506.03569	link
2025-06-04	Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback	Xiaoying Zhang et.al.	2506.03106	null
2025-06-04	Demystifying Reasoning Dynamics with Mutual Information: Thinking Tokens are Information Peaks in LLM Reasoning	Chen Qian et.al.	2506.02867	link
2025-06-14	TL;DR: Too Long, Do Re-weighting for Efficient LLM Reasoning Compression	Zhong-Zhi Li et.al.	2506.02678	link
2025-06-03	A Smart Multimodal Healthcare Copilot with Powerful LLM Reasoning	Xuejiao Zhao et.al.	2506.02470	link
2025-06-02	Act Only When It Pays: Efficient Reinforcement Learning for LLM Reasoning via Selective Rollouts	Haizhong Zheng et.al.	2506.02177	null
2025-06-02	Knowledge or Reasoning? A Close Look at How LLMs Think Across Domains	Juncheng Wu et.al.	2506.02126	null
2025-06-02	Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning	Shenzhi Wang et.al.	2506.01939	null
2025-06-02	Read it in Two Steps: Translating Extremely Low-Resource Languages with Code-Augmented Grammar Books	Chen Zhang et.al.	2506.01796	null
2025-06-02	R2SM: Referring and Reasoning for Selective Masks	Yu-Lin Shih et.al.	2506.01795	null
2025-06-02	SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning	Zhongwei Wan et.al.	2506.01713	null
2025-06-02	K12Vista: Exploring the Boundaries of MLLMs in K-12 Education	Chong Li et.al.	2506.01676	null
2025-06-02	EvolveNav: Self-Improving Embodied Reasoning for LLM-Based Vision-Language Navigation	Bingqian Lin et.al.	2506.01551	null
2025-06-02	Compiler Optimization via LLM Reasoning for Efficient Model Serving	Sujun Tang et.al.	2506.01374	null
2025-06-02	The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning	Xinyu Zhu et.al.	2506.01347	link
2025-06-01	GThinker: Towards General Multimodal Reasoning via Cue-Guided Rethinking	Yufei Zhan et.al.	2506.01078	link
2025-06-01	Enhancing LLM Reasoning for Time Series Classification by Tailored Thinking and Fused Decision	Jiahui Zhou et.al.	2506.00807	null
2025-05-31	Beyond Context to Cognitive Appraisal: Emotion Reasoning as a Theory of Mind Benchmark for Large Language Models	Gerard Christopher Yeo et.al.	2506.00334	null
2025-05-30	Tournament of Prompts: Evolving LLM Instructions Through Structured Debates and Elo Ratings	Anirudh Nair et.al.	2506.00178	null
2025-05-30	Werewolf: A Straightforward Game Framework with TTS for Improved User Engagement	Qihui Fan et.al.	2506.00160	null
2025-05-28	Rethinking Hybrid Retrieval: When Small Embeddings and LLM Re-ranking Beat Bigger Models	Arjun Rao et.al.	2506.00049	null
2025-05-30	Open CaptchaWorld: A Comprehensive Web-based Platform for Testing and Benchmarking Multimodal LLM Agents	Yaxin Luo et.al.	2505.24878	link
2025-05-30	Agent-X: Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks	Tajamul Ashraf et.al.	2505.24876	link
2025-05-30	Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning	Shuyao Xu et.al.	2505.24850	link
2025-05-30	Random Rule Forest (RRF): Interpretable Ensembles of LLM-Generated Questions for Predicting Startup Success	Ben Griffin et.al.	2505.24622	null
2025-06-10	Can Slow-thinking LLMs Reason Over Time? Empirical Studies in Time Series Forecasting	Jiahao Wang et.al.	2505.24511	link
2025-05-30	Reason-SVG: Hybrid Reward RL for Aha-Moments in Vector Graphics Generation	Ximing Xing et.al.	2505.24499	null
2025-05-30	How Much Backtracking is Enough? Exploring the Interplay of SFT and RL in Enhancing LLM Reasoning	Hongyi James Cai et.al.	2505.24273	null
2025-06-02	MIRAGE: Assessing Hallucination in Multimodal Reasoning Chains of MLLM	Bowen Dong et.al.	2505.24238	null
2025-05-30	Semi-structured LLM Reasoners Can Be Rigorously Audited	Jixuan Leng et.al.	2505.24217	null
2025-05-30	HardTests: Synthesizing High-Quality Test Cases for LLM Coding	Zhongmou He et.al.	2505.24098	null
2025-05-29	Preemptive Hallucination Reduction: An Input-Level Approach for Multimodal Language Model	Nokimul Hasan Arif et.al.	2505.24007	null
2025-05-29	VisualSphinx: Large-Scale Synthetic Vision Logic Puzzles for RL	Yichen Feng et.al.	2505.23977	null
2025-05-29	Infi-Med: Low-Resource Medical MLLMs with Robust Reasoning Evaluation	Zeyu Liu et.al.	2505.23867	null
2025-05-29	Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought	Yunze Man et.al.	2505.23766	null
2025-06-03	DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning	Ziyin Zhang et.al.	2505.23754	link
2025-05-29	Don't Take the Premise for Granted: Evaluating the Premise Critique Ability of Large Language Models	Jinzhe Li et.al.	2505.23715	link
2025-05-29	Can LLMs Reason Abstractly Over Math Word Problems Without CoT? Disentangling Abstract Formulation From Arithmetic Computation	Ziling Cheng et.al.	2505.23701	null
2025-05-29	Probability-Consistent Preference Optimization for Enhanced LLM Reasoning	Yunqiao Yang et.al.	2505.23540	link
2025-05-29	Diversity-Aware Policy Optimization for Large Language Model Reasoning	Jian Yao et.al.	2505.23433	null
2025-05-29	GAM-Agent: Game-Theoretic and Uncertainty-Aware Collaboration for Complex Visual Reasoning	Jusheng Zhang et.al.	2505.23399	null
2025-06-05	MMBoundary: Advancing MLLM Knowledge Boundary Awareness through Reasoning Step Confidence Calibration	Zhitao He et.al.	2505.23224	link
2025-05-29	Elicit and Enhance: Advancing Multimodal Reasoning in Medical Scenarios	Linjie Mu et.al.	2505.23118	null
2025-06-06	Infi-MMR: Curriculum-based Unlocking Multimodal Reasoning via Phased Reinforcement Learning in Multimodal Small Language Models	Zeyu Liu et.al.	2505.23091	null
2025-05-29	Case-Based Reasoning Enhances the Predictive Power of LLMs in Drug-Drug Interaction	Guangyi Liu et.al.	2505.23034	null
2025-05-29	StrucSum: Graph-Structured Reasoning for Long Document Extractive Summarization with LLMs	Haohan Yuan et.al.	2505.22950	null
2025-05-28	VidText: Towards Comprehensive Evaluation for Video Text Understanding	Zhoufaran Yang et.al.	2505.22810	link
2025-05-28	Decomposing Elements of Problem Solving: What "Math" Does RL Teach?	Tian Qin et.al.	2505.22756	link
2025-05-28	AutoL2S: Auto Long-Short Reasoning for Efficient Large Language Models	Feng Luo et.al.	2505.22662	null
2025-05-28	SAM-R1: Leveraging SAM for Reward Feedback in Multimodal Segmentation via Reinforcement Learning	Jiaqi Huang et.al.	2505.22596	null
2025-05-28	ClaimPKG: Enhancing Claim Verification via Pseudo-Subgraph Generation with Lightweight Specialized LLM	Hoang Pham et.al.	2505.22552	null
2025-05-28	Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO	Lai Wei et.al.	2505.22453	link
2025-05-29	Pangu Embedded: An Efficient Dual-system LLM Reasoner with Metacognition	Hanting Chen et.al.	2505.22375	null
2025-05-28	Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start	Lai Wei et.al.	2505.22334	link
2025-05-28	If Pigs Could Fly... Can LLMs Logically Reason Through Counterfactuals?	Ishwar B Balappanawar et.al.	2505.22318	null
2025-05-28	Rethinking the Unsolvable: When In-Context Search Meets Test-Time Scaling	Fanzeng Xia et.al.	2505.22290	null
2025-05-28	What Makes a Good Reasoning Chain? Uncovering Structural Patterns in Long Chain-of-Thought Reasoning	Gangwei Jiang et.al.	2505.22148	null
2025-05-28	OmniAD: Detect and Understand Industrial Anomaly via Multimodal Reasoning	Shifang Zhao et.al.	2505.22039	null
2025-05-27	Towards Safety Reasoning in LLMs: AI-agentic Deliberation for Policy-embedded CoT Data Creation	Tharindu Kumarage et.al.	2505.21784	null
2025-05-27	Don't Think Longer, Think Wisely: Optimizing Thinking Dynamics for Large Reasoning Models	Sohyun An et.al.	2505.21765	null
2025-05-27	R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing	Tianyu Fu et.al.	2505.21600	link
2025-05-31	More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models	Chengzhi Liu et.al.	2505.21523	null
2025-05-27	Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?	Junhao Cheng et.al.	2505.21374	link
2025-05-27	MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs	Jiakang Yuan et.al.	2505.21327	null
2025-05-27	Walk Before You Run! Concise LLM Reasoning via Reinforcement Learning	Mingyang Song et.al.	2505.21178	null
2025-05-27	DisasterM3: A Remote Sensing Vision-Language Dataset for Disaster Damage Assessment and Response	Junjue Wang et.al.	2505.21089	null
2025-06-04	LLMs Think, But Not In Your Flow: Reasoning-Level Personalization for Black-Box Large Language Models	Jieyong Kim et.al.	2505.21082	null
2025-05-27	Def-DTS: Deductive Reasoning for Open-domain Dialogue Topic Segmentation	Seungmin Lee et.al.	2505.21033	null
2025-05-27	Reason-Align-Respond: Aligning LLM Reasoning with Knowledge Graphs for KGQA	Xiangqing Shen et.al.	2505.20971	null
2025-05-28	VLM Can Be a Good Assistant: Enhancing Embodied Visual Tracking with Self-Improving Vision-Language Models	Kui Wu et.al.	2505.20718	null
2025-05-27	Accelerating RL for LLM Reasoning with Optimal Advantage Regression	Kianté Brantley et.al.	2505.20686	null
2025-05-27	Can Past Experience Accelerate LLM Reasoning?	Bo Pan et.al.	2505.20643	null
2025-05-26	Beyond Markovian: Reflective Exploration via Bayes-Adaptive RL for LLM Reasoning	Shenao Zhang et.al.	2505.20561	null
2025-05-26	Enhancing Logical Reasoning in Language Models via Symbolically-Guided Monte Carlo Process Supervision	Xingwei Tan et.al.	2505.20415	null
2025-05-23	Guided by Gut: Efficient Test-Time Scaling with Reinforced Intrinsic Confidence	Amirhosein Ghasemabadi et.al.	2505.20325	null
2025-05-26	KnowTrace: Bootstrapping Iterative Retrieval-Augmented Generation with Structured Knowledge Tracing	Rui Li et.al.	2505.20245	link
2025-06-04	DreamPRM: Domain-Reweighted Process Reward Model for Multimodal Reasoning	Qi Cao et.al.	2505.20241	null
2025-05-26	THiNK: Can Large Language Models Think-aloud?	Yongan Yu et.al.	2505.20184	link
2025-05-26	Visual Abstract Thinking Empowers Multimodal Reasoning	Dairu Liu et.al.	2505.20164	link
2025-05-26	Prismatic Synthesis: Gradient-based Data Diversification Boosts Generalization in LLM Reasoning	Jaehun Jung et.al.	2505.20161	null
2025-05-26	Agentic 3D Scene Generation with Spatially Contextualized VLMs	Xinhang Liu et.al.	2505.20129	null
2025-05-26	REARANK: Reasoning Re-ranking Agent via Reinforcement Learning	Le Zhang et.al.	2505.20046	link
2025-05-26	An Explainable Diagnostic Framework for Neurodegenerative Dementias via Reinforcement-Optimized LLM Reasoning	Andrew Zamai et.al.	2505.19954	null
2025-05-26	Multimodal Reasoning Agent for Zero-Shot Composed Image Retrieval	Rong-Cheng Tu et.al.	2505.19952	null
2025-05-26	Which Data Attributes Stimulate Math and Code Reasoning? An Investigation via Influence Functions	Siqi Kou et.al.	2505.19949	null
2025-05-26	HS-STAR: Hierarchical Sampling for Self-Taught Reasoners via Difficulty Estimation and Budget Reallocation	Feng Xiong et.al.	2505.19866	null
2025-05-26	Deciphering Trajectory-Aided LLM Reasoning: An Optimization Perspective	Junnan Liu et.al.	2505.19815	link
2025-05-26	MLLM-Guided VLM Fine-Tuning with Joint Inference for Zero-Shot Composed Image Retrieval	Rong-Cheng Tu et.al.	2505.19707	null
2025-05-26	Point-RFT: Improving Multimodal Reasoning with Visually Grounded Reinforcement Finetuning	Minheng Ni et.al.	2505.19702	null
2025-05-26	Large Language Models' Reasoning Stalls: An Investigation into the Capabilities of Frontier Models	Lachlan McGinness et.al.	2505.19676	null
2025-05-26	Interleaved Reasoning for Large Language Models via Reinforcement Learning	Roy Xie et.al.	2505.19640	null
2025-05-26	Self-Reflective Planning with Knowledge Graphs: Enhancing LLM Reasoning Reliability for Question Answering	Jiajun Zhu et.al.	2505.19410	null
2025-05-25	SituatedThinker: Grounding LLM Reasoning with Real-World through Situated Thinking	Junnan Liu et.al.	2505.19300	link
2025-05-28	VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use	Mingyuan Wu et.al.	2505.19255	null
2025-05-25	ASPO: Adaptive Sentence-Level Preference Optimization for Fine-Grained Multimodal Reasoning	Yeyuan Wang et.al.	2505.19100	null
2025-05-30	SeePhys: Does Seeing Help Thinking? -- Benchmarking Vision-Based Physics Reasoning	Kun Xiang et.al.	2505.19099	link
2025-05-25	SATORI-R1: Incentivizing Multimodal Reasoning with Spatial Grounding and Verifiable Rewards	Chuming Shen et.al.	2505.19094	link
2025-05-25	ReFineVLA: Reasoning-Aware Teacher-Guided Transfer Fine-Tuning	Tuan Van Vo et.al.	2505.19080	null
2025-05-25	Can Large Language Models Infer Causal Relationships from Real-World Text?	Ryan Saklad et.al.	2505.18931	null
2025-05-24	Don't Look Only Once: Towards Multimodal Interactive Reasoning with Selective Visual Revisitation	Jiwan Chung et.al.	2505.18842	null
2025-05-24	Enhancing LLMs' Reasoning-Intensive Multimedia Search Capabilities through Fine-Tuning and Reinforcement Learning	Jinzheng Li et.al.	2505.18831	null
2025-05-24	How Is LLM Reasoning Distracted by Irrelevant Context? An Analysis Using a Controlled Benchmark	Minglai Yang et.al.	2505.18761	link
2025-05-24	GainRAG: Preference Alignment in Retrieval-Augmented Generation through Gain Signal Synthesis	Yi Jiang et.al.	2505.18710	link
2025-05-24	Steering LLM Reasoning Through Bias-Only Adaptation	Viacheslav Sinii et.al.	2505.18706	null
2025-05-31	ChartGalaxy: A Dataset for Infographic Chart Understanding and Generation	Zhen Li et.al.	2505.18668	link
2025-05-24	Unraveling Misinformation Propagation in LLM Reasoning	Yiyang Feng et.al.	2505.18555	link
2025-05-23	One Demo Is All It Takes: Planning Domain Derivation with LLMs from A Single Demonstration	Jinbang Huang et.al.	2505.18382	null
2025-05-23	Seeing Beyond Words: MatVQA for Challenging Visual-Scientific Reasoning in Materials Science	Sifan Wu et.al.	2505.18319	null
2025-05-23	Beyond Distillation: Pushing the Limits of Medical LLM Reasoning with Minimalist Rule-Based RL	Che Liu et.al.	2505.17952	null
2025-05-23	Stepwise Reasoning Checkpoint Analysis: A Test Time Scaling Method to Enhance LLMs' Reasoning	Zezhong Wang et.al.	2505.17829	null
2025-05-23	Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning	Michael Hassid et.al.	2505.17813	null
2025-05-23	Towards General Continuous Memory for Vision-Language Models	Wenyi Wu et.al.	2505.17670	null
2025-05-23	EVADE: Multimodal Benchmark for Evasive Content Detection in E-Commerce Applications	Ancheng Xu et.al.	2505.17654	null
2025-05-29	Rethinking the Sampling Criteria in Reinforcement Learning for LLM Reasoning: A Competence-Difficulty Alignment Perspective	Deyang Kong et.al.	2505.17652	null
2025-05-27	Navigate the Unknown: Enhancing LLM Reasoning with Intrinsic Motivation Guided Exploration	Jingtong Gao et.al.	2505.17621	null
2025-05-23	MMMG: a Comprehensive and Reliable Evaluation Suite for Multitask Multimodal Generation	Jihan Yao et.al.	2505.17613	null
2025-05-23	On the Design of KL-Regularized Policy Gradient Algorithms for LLM Reasoning	Yifan Zhang et.al.	2505.17508	null
2025-05-23	From Reasoning to Generalization: Knowledge-Augmented LLMs for ARC Benchmark	Chao Lei et.al.	2505.17482	null
2025-05-23	Hydra: Structured Cross-Source Enhanced Large Language Model Reasoning	Xingyu Tan et.al.	2505.17464	null
2025-05-23	LeTS: Learning to Think-and-Search via Process-and-Outcome Reward Hybridization	Qi Zhang et.al.	2505.17447	null
2025-05-23	Misaligning Reasoning with Answers -- A Framework for Assessing LLM CoT Robustness	Enyi Jiang et.al.	2505.17406	null
2025-05-22	LiloDriver: A Lifelong Learning Framework for Closed-loop Motion Planning in Long-tail Autonomous Driving Scenarios	Huaiyuan Yao et.al.	2505.17209	link
2025-05-21	NeSyGeo: A Neuro-Symbolic Framework for Multimodal Geometric Reasoning Data Generation	Weiming Wu et.al.	2505.17121	null
2025-05-21	Systematic Evaluation of Machine-Generated Reasoning and PHQ-9 Labeling for Depression Detection Using Large Language Models	Zongru Shao et.al.	2505.17119	null
2025-05-21	Swarm Intelligence Enhanced Reasoning: A Density-Driven Framework for LLM-Based Multi-Agent Optimization	Ying Zhu et.al.	2505.17115	null
2025-05-21	CAMA: Enhancing Multimodal In-Context Learning with Context-Aware Modulated Attention	Yanshu Li et.al.	2505.17097	null
2025-05-22	ARB: A Comprehensive Arabic Multimodal Reasoning Benchmark	Sara Ghaboura et.al.	2505.17021	link
2025-05-22	SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward	Kaixuan Fan et.al.	2505.17018	link
2025-05-22	$\text{R}^2\text{ec}$: Towards Large Recommender Models with Reasoning	Runyang You et.al.	2505.16994	link
2025-05-22	Don't "Overthink" Passage Reranking: Is Reasoning Truly Necessary?	Nour Jedidi et.al.	2505.16886	null
2025-05-26	DeepRec: Towards a Deep Dive Into the Item Space with Large Language Model Based Recommendation	Bowen Zheng et.al.	2505.16810	null
2025-05-22	Two-way Evidence self-Alignment based Dual-Gated Reasoning Enhancement	Kexin Zhang et.al.	2505.16806	null
2025-05-22	Reasoning Beyond Language: A Comprehensive Survey on Latent Chain-of-Thought Reasoning	Xinghao Chen et.al.	2505.16782	link
2025-05-22	Collaboration among Multiple Large Language Models for Medical Question Answering	Kexin Shang et.al.	2505.16648	null
2025-05-27	Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains	Wenhui Tan et.al.	2505.16552	null
2025-05-22	SATURN: SAT-based Reinforcement Learning to Unleash Language Model Reasoning	Huanyu Liu et.al.	2505.16368	link
2025-05-22	EquivPruner: Boosting Efficiency and Quality in LLM-Based Search via Action Pruning	Jiawei Liu et.al.	2505.16312	link
2025-05-22	Augmenting LLM Reasoning with Dynamic Notes Writing for Complex QA	Rishabh Maheshwary et.al.	2505.16293	null
2025-05-22	Training-Free Reasoning and Reflection in MLLMs	Hongchen Wei et.al.	2505.16151	null
2025-05-22	Distilling the Implicit Multi-Branch Structure in LLMs' Reasoning via Reinforcement Learning	Shicheng Xu et.al.	2505.16142	null
2025-05-26	Abstractions-of-Thought: Intermediate Representations for LLM Reasoning in Hardware Design	Matthew DeLorenzo et.al.	2505.15873	null
2025-05-21	LENS: Multi-level Evaluation of Multimodal Reasoning with Large Language Models	Ruilin Yao et.al.	2505.15616	null
2025-05-21	Chain-of-Focus: Adaptive Visual Search and Zooming for Multimodal Reasoning via RL	Xintong Zhang et.al.	2505.15436	null
2025-05-21	Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning	Yurun Yuan et.al.	2505.15311	null
2025-05-21	Deliberation on Priors: Trustworthy Reasoning of Large Language Models on Knowledge Graphs	Jie Ma et.al.	2505.15210	link
2025-05-21	Prolonged Reasoning Is Not All You Need: Certainty-Based Adaptive Routing for Efficient LLM/MLLM Reasoning	Jinghui Lu et.al.	2505.15154	null
2025-05-21	The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning	Shivam Agarwal et.al.	2505.15134	link
2025-05-21	Learning to Rank Chain-of-Thought: An Energy-Based Approach with Outcome Supervision	Eric Hanchen Jiang et.al.	2505.14999	null
2025-05-20	Self-Evolving Curriculum for LLM Reasoning	Xiaoyin Chen et.al.	2505.14970	null
2025-05-20	MORALISE: A Structured Benchmark for Moral Alignment in Visual Language Models	Xiao Lin et.al.	2505.14728	null
2025-05-18	KGAlign: Joint Semantic-Structural Knowledge Encoding for Multimodal Fake News Detection	Tuan-Vinh La et.al.	2505.14714	link
2025-05-23	Emerging Properties in Unified Multimodal Pretraining	Chaorui Deng et.al.	2505.14683	null
2025-05-27	General-Reasoner: Advancing LLM Reasoning Across All Domains	Xueguang Ma et.al.	2505.14652	null
2025-05-22	TinyV: Reducing False Negatives in Verification Improves RL for LLM Reasoning	Zhangchen Xu et.al.	2505.14625	link
2025-05-20	SATBench: Benchmarking LLMs' Logical Reasoning via Automated Puzzle Generation from SAT Formulas	Anjiang Wei et.al.	2505.14615	null
2025-05-21	KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation	Jiajun Shi et.al.	2505.14552	link
2025-05-23	Unearthing Gems from Stones: Policy Optimization with Negative Sample Augmentation for LLM Reasoning	Zhaohui Yang et.al.	2505.14403	null
2025-05-26	DeepEyes: Incentivizing "Thinking with Images" via Reinforcement Learning	Ziwei Zheng et.al.	2505.14362	link
2025-05-20	Reinforcement Learning vs. Distillation: Understanding Accuracy and Capability in LLM Reasoning	Minwu Kim et.al.	2505.14216	link
2025-05-20	RL of Thoughts: Navigating LLM Reasoning with Inference-time Reinforcement Learning	Qianyue Hao et.al.	2505.14140	null
2025-05-20	Code2Logic: Game-Code-Driven Data Synthesis for Enhancing VLMs General Reasoning	Jingqi Tong et.al.	2505.13886	link
2025-05-20	Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning	Jiwon Song et.al.	2505.13866	link
2025-05-18	RAGXplain: From Explainable Evaluation to Actionable Guidance of RAG Pipelines	Dvir Cohen et.al.	2505.13538	null
2025-05-16	IRLBench: A Multi-modal, Culturally Grounded, Parallel Irish-English Benchmark for Open-Ended LLM Reasoning Evaluation	Khanh-Tung Tran et.al.	2505.13498	link
2025-05-19	MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision	Lingxiao Du et.al.	2505.13427	link
2025-05-19	MR. Judge: Multimodal Reasoner as a Judge	Renjie Pi et.al.	2505.13403	null
2025-05-20	Sense and Sensitivity: Examining the Influence of Semantic Recall on Long Context Code Reasoning	Adam Štorek et.al.	2505.13353	null
2025-05-19	Thinking Short and Right Over Thinking Long: Serving LLM Reasoning Efficiently and Accurately	Yuhang Wang et.al.	2505.13326	null
2025-05-19	Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space	Hengli Li et.al.	2505.13308	link
2025-05-19	RBF++: Quantifying and Optimizing Reasoning Boundaries across Measurable and Unmeasurable Capabilities for Chain-of-Thought Reasoning	Qiguang Chen et.al.	2505.13307	link
2025-05-19	Unlocking the Potential of Difficulty Prior in RL-based Multimodal Reasoning	Mingrui Chen et.al.	2505.13261	null
2025-05-23	SAKURA: On the Multi-hop Reasoning of Large Audio-Language Models Based on Speech and Audio Information	Chih-Kai Yang et.al.	2505.13237	link
2025-05-21	Hearing from Silence: Reasoning Audio Descriptions from Silent Videos via Vision-Language Model	Yong Ren et.al.	2505.13062	null
2025-05-25	Fractured Chain-of-Thought Reasoning	Baohao Liao et.al.	2505.12992	null
2025-05-19	DGRO: Enhancing LLM Reasoning via Exploration-Exploitation Control and Reward Variance Management	Xuerui Su et.al.	2505.12951	null
2025-05-19	The Traitors: Deception and Trust in Multi-Agent Language Model Simulations	Pedro M. P. Curvo et.al.	2505.12923	link
2025-05-19	AdaToken-3D: Dynamic Spatial Gating for Efficient 3D Large Multimodal-Models Reasoning	Kai Zhang et.al.	2505.12782	null
2025-05-19	Incentivizing Multimodal Reasoning in Large Models for Direct Robot Manipulation	Weiliang Tang et.al.	2505.12744	null
2025-05-18	Reasoning-CV: Fine-tuning Powerful Reasoning LLMs for Knowledge-Assisted Claim Verification	Zhi Zheng et.al.	2505.12348	link
2025-05-18	LogicOCR: Do Your Large Multimodal Models Excel at Logical Reasoning on Text-Rich Images?	Maoyuan Ye et.al.	2505.12307	link
2025-05-18	MMS-VPR: Multimodal Street-Level Visual Place Recognition Dataset and Benchmark	Yiwei Ou et.al.	2505.12254	null
2025-05-17	Do Code LLMs Do Static Analysis?	Chia-Yi Su et.al.	2505.12118	link
2025-05-17	Solve-Detect-Verify: Inference-Time Scaling with Flexible Generative Verifier	Jianyuan Zhong et.al.	2505.11966	null
2025-05-22	PRS-Med: Position Reasoning Segmentation with Vision-Language Model in Medical Imaging	Quoc-Huy Trinh et.al.	2505.11872	null
2025-05-17	Not All Thoughts are Generated Equal: Efficient LLM Reasoning via Multi-Turn Reinforcement Learning	Yansong Ning et.al.	2505.11827	link
2025-05-16	REMOR: Automated Peer Review Generation with LLM Reasoning and Multi-Objective Reinforcement Learning	Pawin Taechoyotin et.al.	2505.11718	null
2025-05-16	Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner	Wenchuan Zhang et.al.	2505.11404	link
2025-05-23	SelfBudgeter: Adaptive Token Allocation for Efficient LLM Reasoning	Zheng Li et.al.	2505.11274	null
2025-05-24	Human-Aligned Bench: Fine-Grained Assessment of Reasoning Ability in MLLMs vs. Humans	Yansheng Qiu et.al.	2505.11141	null
2025-05-16	Scaling Reasoning can Improve Factuality in Large Language Models	Mike Zhang et.al.	2505.11140	link
2025-05-16	Humans expect rationality and cooperation from LLM opponents in strategic games	Darija Barak et.al.	2505.11011	null
2025-05-16	Vaiage: A Multi-Agent Solution to Personalized Travel Planning	Binwen Liu et.al.	2505.10922	null
2025-05-15	Mining Hidden Thoughts from Texts: Evaluating Continual Pretraining with Synthetic Data for LLM Reasoning	Yoichi Ishibashi et.al.	2505.10182	null
2025-05-15	XRAG: Cross-lingual Retrieval-Augmented Generation	Wei Liu et.al.	2505.10089	null
2025-05-13	The Truth Becomes Clearer Through Debate! Multi-Agent Systems with Large Language Models Unmask Fake News	Yuhan Liu et.al.	2505.08532	null
2025-05-13	Learning Like Humans: Advancing LLM Reasoning Capabilities via Adaptive Difficulty Curriculum Learning and Expert-Guided Self-Reformulation	Enci Zhang et.al.	2505.08364	null
2025-05-12	KAQG: A Knowledge-Graph-Enhanced RAG for Difficulty-Controlled Question Generation	Ching Han Chen et.al.	2505.07618	null
2025-05-12	How well do LLMs reason over tabular data, really?	Cornelius Wolff et.al.	2505.07453	null
2025-05-12	Skywork-VL Reward: An Effective Reward Model for Multimodal Understanding and Reasoning	Xiaokun Wang et.al.	2505.07263	null
2025-05-12	Critique Before Thinking: Mitigating Hallucination through Rationale-Augmented Instruction Tuning	Zexian Yang et.al.	2505.07172	null
2025-05-11	Seed1.5-VL Technical Report	Dong Guo et.al.	2505.07062	null
2025-05-17	Bridging AI and Carbon Capture: A Dataset for LLMs in Ionic Liquids and CBE Research	Gaurab Sarkar et.al.	2505.06964	link
2025-05-11	UniDiffGrasp: A Unified Framework Integrating VLM Reasoning and VLM-Guided Part Diffusion for Open-Vocabulary Constrained Grasping with Dual Arms	Xueyang Guo et.al.	2505.06832	null
2025-05-11	Overview of the NLPCC 2025 Shared Task 4: Multi-modal, Multilingual, and Multi-hop Medical Instructional Video Question Answering Challenge	Bin Li et.al.	2505.06814	null
2025-05-10	STRIVE: Structured Representation Integrating VLM Reasoning for Efficient Object Navigation	Haokun Zhu et.al.	2505.06729	null
2025-05-17	Learn to Think: Bootstrapping LLM Reasoning Capability Through Graph Representation Learning	Hang Gao et.al.	2505.06321	link
2025-05-07	Q-Heart: ECG Question Answering via Knowledge-Informed Multimodal LLMs	Hung Manh Pham et.al.	2505.06296	null
2025-05-09	From Millions of Tweets to Actionable Insights: Leveraging LLMs for User Profiling	Vahid Rahimzadeh et.al.	2505.06184	null
2025-05-12	APOLLO: Automated LLM and Lean Collaboration for Advanced Formal Reasoning	Azim Ospanov et.al.	2505.05758	null
2025-05-09	Evolutionary thoughts: integration of large language models and evolutionary algorithms	Antonio Jimeno Yepes et.al.	2505.05756	link
2025-05-08	Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models	Yunxin Li et.al.	2505.04921	link
2025-05-07	Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers	Kusha Sareen et.al.	2505.04842	null
2025-05-06	Advancing Conversational Diagnostic AI with Multimodal Reasoning	Khaled Saab et.al.	2505.04653	null
2025-05-07	SToLa: Self-Adaptive Touch-Language Framework with Tactile Commonsense Reasoning in Open-Ended Scenarios	Ning Cheng et.al.	2505.04201	null
2025-05-20	On-Device LLM for Context-Aware Wi-Fi Roaming	Ju-Hyung Lee et.al.	2505.04174	link
2025-05-06	X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains	Qianchu Liu et.al.	2505.03981	null
2025-04-30	When Reasoning Beats Scale: A 1.5B Reasoning Model Outranks 13B LLMs as Discriminator	Md Fahim Anjum et.al.	2505.03786	link
2025-05-06	The Steganographic Potentials of Language Models	Artem Karpov et.al.	2505.03439	null
2025-05-12	Geospatial Mechanistic Interpretability of Large Language Models	Stef De Sabbata et.al.	2505.03368	link
2025-05-03	Accelerating Large Language Model Reasoning via Speculative Search	Zhihai Wang et.al.	2505.02865	null
2025-05-05	HyperTree Planning: Enhancing LLM Reasoning via Hierarchical Thinking	Runquan Gui et.al.	2505.02322	null
2025-05-04	DriveAgent: Multi-Agent Structured Reasoning with LLM and Multimodal Sensor Fusion for Autonomous Driving	Xinmeng Hou et.al.	2505.02123	link
2025-05-04	R-Bench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation	Meng-Hao Guo et.al.	2505.02018	null
2025-05-02	VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations for Synthetic Videos	Zongxia Li et.al.	2505.01481	link
2025-05-01	Reasoning Capabilities and Invariability of Large Language Models	Alessandro Raganato et.al.	2505.00776	link
2025-04-30	Audo-Sight: Enabling Ambient Interaction For Blind And Visually Impaired Individuals	Bhanuja Ainary et.al.	2505.00153	null
2025-05-02	Rosetta-PL: Propositional Logic as a Benchmark for Large Language Model Reasoning	Shaun Baek et.al.	2505.00001	null
2025-05-21	Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models	Guanghao Zhou et.al.	2504.21277	null
2025-05-09	Token-Efficient RL for LLM Reasoning	Alan Lee et.al.	2504.20834	null
2025-04-29	Token-Efficient Prompt Injection Attack: Provoking Cessation in LLM Reasoning via Adaptive Token Compression	Yu Cui et.al.	2504.20493	null
2025-04-30	VideoMultiAgents: A Multi-Agent Framework for Video Question Answering	Noriyuki Kugo et.al.	2504.20091	link
2025-04-28	From LLM Reasoning to Autonomous AI Agents: A Comprehensive Review	Mohamed Amine Ferrag et.al.	2504.19678	null
2025-05-17	SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning	Jiaqi Chen et.al.	2504.19162	null
2025-04-27	CipherBank: Exploring the Boundary of LLM Reasoning Capabilities through Cryptography Challenges	Yu Li et.al.	2504.19093	null
2025-04-24	Training Large Language Models to Reason via EM Policy Gradient	Tianbing Xu et.al.	2504.18587	null
2025-05-08	MultiMind: Enhancing Werewolf Agents with Multimodal Reasoning and Theory of Mind	Zheng Zhang et.al.	2504.18039	null
2025-05-13	DeepDistill: Enhancing LLM Reasoning Capabilities via Large-Scale Difficulty-Graded Data Training	Xiaoyu Tian et.al.	2504.17565	null
2025-04-25	Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning	Chris et.al.	2504.16656	link
2025-04-27	Think2SQL: Reinforce LLM Reasoning Capabilities for Text2SQL	Simone Papicchio et.al.	2504.15077	null
2025-04-20	a1: Steep Test-time Scaling Law via Environment Augmented Generation	Lingrui Mei et.al.	2504.14597	null
2025-04-20	CoLoTa: A Dataset for Entity-based Commonsense Reasoning over Long-Tail Knowledge	Armin Toroghi et.al.	2504.14462	null
2025-04-19	Improving RL Exploration for LLM Reasoning through Retrospective Replay	Shihan Dou et.al.	2504.14363	null
2025-05-21	An Empirical Study of LLM Reasoning Ability Under Strict Output Length Constraint	Yi Sun et.al.	2504.14350	null
2025-04-22	SRPO: A Cross-Domain Implementation of Large-Scale Reinforcement Learning on LLM	Xiaojiang Zhang et.al.	2504.14286	null
2025-04-19	CODECRASH: Stress Testing LLM Reasoning under Structural and Semantic Perturbations	Man Ho Lam et.al.	2504.14119	null
2025-04-18	Think Deep, Think Fast: Investigating Efficiency of Verifier-free Inference-time-scaling Methods	Junlin Wang et.al.	2504.14047	null
2025-03-26	3MDBench: Medical Multimodal Multi-agent Dialogue Benchmark	Ivan Sviridov et.al.	2504.13861	link
2025-05-16	Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?	Yang Yue et.al.	2504.13837	null
2025-04-18	Prejudge-Before-Think: Enhancing Large Language Models at Test-Time by Process Prejudge Reasoning	Jianing Wang et.al.	2504.13500	link
2025-04-17	Can LLMs reason over extended multilingual contexts? Towards long-context evaluation beyond retrieval and haystacks	Amey Hengle et.al.	2504.12845	null
2025-05-19	GraphOmni: A Comprehensive and Extendable Benchmark Framework for Large Language Models on Graph-theoretic Tasks	Hao Xu et.al.	2504.12764	link
2025-04-17	Embodied-R: Collaborative Framework for Activating Embodied Spatial Reasoning in Foundation Models via Reinforcement Learning	Baining Zhao et.al.	2504.12680	link
2025-04-17	VLMGuard-R1: Proactive Safety Alignment for VLMs via Reasoning-Driven Prompt Optimization	Menglan Chen et.al.	2504.12661	null
2025-04-24	GeoSense: Evaluating Identification and Application of Geometric Principles in Multimodal Reasoning	Liangyu Xu et.al.	2504.12597	null
2025-04-13	HM-RAG: Hierarchical Multi-Agent Multimodal Retrieval Augmented Generation	Pei Liu et.al.	2504.12330	link
2025-04-16	d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning	Siyan Zhao et.al.	2504.12216	null
2025-04-16	Could Thinking Multilingually Empower LLM Reasoning?	Changjiang Gao et.al.	2504.11833	link
2025-04-15	A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce	Wei Xiong et.al.	2504.11343	link
2025-04-15	MMC: Iterative Refinement of VLM Reasoning via MCTS-based Multimodal Critique	Shuhang Liu et.al.	2504.11009	null
2025-05-14	CLASH: Evaluating Language Models on Judging High-Stakes Dilemmas from Multiple Perspectives	Ayoung Lee et.al.	2504.10823	null
2025-04-14	Weight-of-Thought Reasoning: Exploring Neural Network Weights for Enhanced LLM Reasoning	Saif Punjwani et.al.	2504.10646	link
2025-04-30	VisualPuzzles: Decoupling Multimodal Reasoning Evaluation from Domain Knowledge	Yueqi Song et.al.	2504.10342	null
2025-04-14	SlowFastVAD: Video Anomaly Detection via Integrating Simple Detector and RAG-Enhanced Vision-Language Model	Zongcan Ding et.al.	2504.10320	null
2025-04-14	PRM-BAS: Enhancing Multimodal Reasoning through PRM-guided Beam Annealing Search	Pengfei Hu et.al.	2504.10222	null
2025-04-15	Breaking the Data Barrier -- Building GUI Agents Through Task Generalization	Junlei Zhang et.al.	2504.10127	link
2025-04-14	CodeRAG: Supportive Code Retrieval on Bigraph for Real-World Code Generation	Jia Li et.al.	2504.10046	null
2025-04-13	Short-Path Prompting in LLMs: Analyzing Reasoning Instability and Solutions for Robust Performance	Zuoli Tang et.al.	2504.09586	null
2025-04-13	Draw with Thought: Unleashing Multimodal Reasoning for Scientific Diagram Generation	Zhiqing Cui et.al.	2504.09479	null
2025-04-12	NoTeS-Bank: Benchmarking Neural Transcription and Search for Scientific Notes Understanding	Aniket Pal et.al.	2504.09249	null
2025-04-12	A Survey of Frontiers in LLM Reasoning: Inference Scaling, Learning to Reason, and Agentic Systems	Zixuan Ke et.al.	2504.09037	null
2025-04-11	Mixed Signals: Decoding VLMs' Reasoning and Underlying Bias in Vision-Language Conflict	Pouya Pezeshkpour et.al.	2504.08974	null
2025-05-08	VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning	Haozhe Wang et.al.	2504.08837	null
2025-04-06	AdaptRec: A Self-Adaptive Framework for Sequential Recommendations with Large Language Models	Tong Zhang et.al.	2504.08786	null
2025-04-01	Accelerating Causal Network Discovery of Alzheimer Disease Biomarkers via Scientific Literature-based Retrieval Augmented Generation	Xiaofan Zhou et.al.	2504.08768	null
2025-04-11	Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning	Fangzhi Xu et.al.	2504.08672	link
2025-04-11	VLMT: Vision-Language Multimodal Transformer for Multimodal Multi-hop Question Answering	Qi Zhi Lim et.al.	2504.08269	null
2025-04-15	Kimi-VL Technical Report	Kimi Team et.al.	2504.07491	link
2025-04-02	DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning	Sara Vera Marjanović et.al.	2504.07128	null
2025-04-09	KG-LLM-Bench: A Scalable Benchmark for Evaluating LLM Reasoning on Textualized Knowledge Graphs	Elan Markowitz et.al.	2504.07087	null
2025-04-09	DeduCE: Deductive Consistency as a Framework to Evaluate LLM Reasoning	Atharva Pandey et.al.	2504.07080	null
2025-04-09	To Backtrack or Not to Backtrack: When Sequential Search Limits Model Reasoning	Tian Qin et.al.	2504.07052	null
2025-04-09	SCI-Reason: A Dataset with Chain-of-Thought Rationales for Complex Multimodal Reasoning in Academic Areas	Chenghao Ma et.al.	2504.06637	null
2025-04-08	FEABench: Evaluating Language Models on Multiphysics Reasoning Ability	Nayantara Mudur et.al.	2504.06260	link
2025-04-23	Right Question is Already Half the Answer: Fully Unsupervised LLM Reasoning Incentivization	Qingyang Zhang et.al.	2504.05812	link
2025-04-08	MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models	Pengfei Zhou et.al.	2504.05782	link
2025-04-08	Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought	Yi Peng et.al.	2504.05599	null
2025-04-06	ZeroED: Hybrid Zero-shot Error Detection through Large Language Model Reasoning	Wei Ni et.al.	2504.05345	null
2025-04-07	Debate Only When Necessary: Adaptive Multiagent Collaboration for Efficient LLM Reasoning	Sugyeong Eo et.al.	2504.05047	null
2025-04-07	LEO-MINI: An Efficient Multimodal Large Language Model using Conditional Token Reduction and Mixture of Multi-Modal Experts	Yimu Wang et.al.	2504.04653	null
2025-04-06	Hierarchical Planning for Complex Tasks with Knowledge Graph-RAG and Symbolic Verification	Cristina Cornelio et.al.	2504.04578	null
2025-04-06	Planning Safety Trajectories with Dual-Phase, Physics-Informed, and Transportation Knowledge-Driven Large Language Models	Rui Gan et.al.	2504.04562	link
2025-04-06	Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning	Xuerui Su et.al.	2504.04524	link
2025-04-06	Geo-OLM: Enabling Sustainable Earth Observation Studies with Cost-Efficient Open Language Models & State-Driven Workflows	Dimitrios Stamoulis et.al.	2504.04319	null
2025-04-04	Language Models Are Implicitly Continuous	Samuele Marro et.al.	2504.03933	link
2025-04-04	Have Large Language Models Learned to Reason? A Characterization via 3-SAT Phase Transition	Rishi Hazra et.al.	2504.03930	null
2025-04-07	MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models	Wulin Xie et.al.	2504.03641	null
2025-04-04	Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning (v1)	Jing Bi et.al.	2504.03151	null
2025-04-04	LightPROF: A Lightweight Reasoning Framework for Large Language Model on Knowledge Graph	Tu Ao et.al.	2504.03137	null
2025-04-25	Generative Evaluation of Complex Reasoning in Large Language Models	Haowei Lin et.al.	2504.02810	link
2025-04-10	Affordable AI Assistants with Knowledge Graph of Thoughts	Maciej Besta et.al.	2504.02670	null
2025-04-03	LexPam: Legal Procedure Awareness-Guided Mathematical Reasoning	Kepu Zhang et.al.	2504.02590	null
2025-04-03	AnesBench: Multi-Dimensional Evaluation of LLM Reasoning in Anesthesiology	Xiang Feng et.al.	2504.02404	link
2025-04-02	A Survey of Scaling in Large Language Model Reasoning	Zihan Chen et.al.	2504.02181	null
2025-04-02	Exploring LLM Reasoning Through Controlled Prompt Variations	Giannis Chatziveroglou et.al.	2504.02111	link
2025-04-02	Advancing AI-Scientist Understanding: Making LLM Think Like a Physicist with Interpretable Reasoning	Yinggan Xu et.al.	2504.01911	null
2025-04-02	TransientTables: Evaluating LLMs' Reasoning on Temporally Evolving Semi-structured Tables	Abhilash Shankarampeta et.al.	2504.01879	null
2025-04-02	Cross-Lingual Consistency: A Novel Inference Framework for Advancing Reasoning in Large Language Models	Zhiwei Yu et.al.	2504.01857	null
2025-04-03	GTR: Graph-Table-RAG for Cross-Table Question Answering	Jiaru Zou et.al.	2504.01346	null
2025-04-01	When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning	Nishad Singhi et.al.	2504.01005	null
2025-04-01	How Difficulty-Aware Staged Reinforcement Learning Enhances LLMs' Reasoning Capabilities: A Preliminary Experimental Study	Yunjie Ji et.al.	2504.00829	null
2025-04-02	FortisAVQA and MAVEN: a Benchmark Dataset and Debiasing Framework for Robust Multimodal Reasoning	Jie Ma et.al.	2504.00487	link
2025-04-01	Agentic Multimodal AI for Hyperpersonalized B2B and B2C Advertising in Competitive Markets: An AI-Driven Competitive Advertising Framework	Sakhinana Sagar Srinivas et.al.	2504.00338	null
2025-03-31	Do Large Language Models Exhibit Spontaneous Rational Deception?	Samuel M. Taylor et.al.	2504.00285	null
2025-03-31	SVLA: A Unified Speech-Vision-Language Assistant with Multimodal Reasoning and Speech Generation	Ngoc Dung Huynh et.al.	2503.24164	null
2025-03-31	Boosting MLLM Reasoning with Text-Debiased Hint-GRPO	Qihan Huang et.al.	2503.23905	null
2025-03-31	WinoWhat: A Parallel Corpus of Paraphrased WinoGrande Sentences with Common Sense Categorization	Ine Gevers et.al.	2503.23779	null
2025-03-30	Evolutionary Prompt Optimization Discovers Emergent Multimodal Reasoning Strategies in Vision-Language Models	Sid Bharthulwar et.al.	2503.23503	null
2025-03-29	The Reasoning-Memorization Interplay in Language Models Is Mediated by a Single Direction	Yihuai Hong et.al.	2503.23084	null
2025-04-03	Cognitive Prompts Using Guilford's Structure of Intellect Model	Oliver Kramer et.al.	2503.22036	null
2025-03-27	SWI: Speaking with Intent in Large Language Models	Yuwei Yin et.al.	2503.21544	link
2025-03-27	Cultivating Game Sense for Yourself: Making VLMs Gaming Experts	Wenxuan Lu et.al.	2503.21263	null
2025-03-27	Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning	Huajie Tan et.al.	2503.20752	null
2025-03-26	Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging	Han Wu et.al.	2503.20641	link
2025-03-25	Gemini Robotics: Bringing AI into the Physical World	Gemini Robotics Team et.al.	2503.20020	null
2025-03-25	VisualQuest: A Diverse Image Dataset for Evaluating Visual Recognition in LLMs	Kelaiti Xiao et.al.	2503.19936	null
2025-04-06	A Multi-Agent Framework Integrating Large Language Models and Generative AI for Accelerated Metamaterial Design	Jie Tian et.al.	2503.19889	null
2025-03-25	Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking	Xiaoyu Tian et.al.	2503.19855	null
2025-03-24	Training-Free Personalization via Retrieval and Reasoning on Fingerprints	Deepayan Das et.al.	2503.18623	null
2025-03-23	Mind with Eyes: from Language Reasoning to Multimodal Reasoning	Zhiyu Lin et.al.	2503.18071	null
2025-04-19	Reason2Attack: Jailbreaking Text-to-Image Models via LLM Reasoning	Chenyu Zhang et.al.	2503.17987	null
2025-03-23	MedPlan: A Two-Stage RAG-Based System for Personalized Medical Plan Generation	Hsin-Ling Hsu et.al.	2503.17900	null
2025-03-22	A Modular Dataset to Demonstrate LLM Abstraction Capability	Adam Atanas et.al.	2503.17645	null
2025-03-22	ConSol: Sequential Probability Ratio Testing to Find Consistent LLM Reasoning Paths Efficiently	Jaeyeon Lee et.al.	2503.17587	link
2025-03-21	LEMMA: Learning from Errors for MatheMatical Advancement in LLMs	Zhuoshi Pan et.al.	2503.17439	link
2025-03-21	V-Seek: Accelerating LLM Reasoning on Open-hardware Server-class RISC-V Platforms	Javier J. Poveda Rodrigo et.al.	2503.17422	null
2025-03-21	Dancing with Critiques: Enhancing LLM Reasoning with Stepwise Natural Language Self-Critique	Yansi Li et.al.	2503.17363	null
2025-03-21	OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement	Yihe Deng et.al.	2503.17352	link
2025-03-21	LLM+MAP: Bimanual Robot Task Planning using Large Language Models and Planning Domain Definition Language	Kun Chu et.al.	2503.17309	link
2025-03-21	Does Chain-of-Thought Reasoning Help Mobile GUI Agent? An Empirical Study	Li Zhang et.al.	2503.16788	link
2025-03-20	Towards Agentic Recommender Systems in the Era of Multimodal Large Language Models	Chengkai Huang et.al.	2503.16734	null
2025-03-21	MKG-Rank: Enhancing Large Language Models with Knowledge Graph for Multilingual Medical Question Answering	Feiyang Li et.al.	2503.16131	null
2025-03-20	Entropy-based Exploration Conduction for Multi-step Reasoning	Jinghan Zhang et.al.	2503.15848	null
2025-03-19	LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning	Federico Cocchi et.al.	2503.15621	link
2025-03-19	EfficientLLaVA: Generalizable Auto-Pruning for Large Vision-language Models	Yinan Liang et.al.	2503.15369	null
2025-04-01	Envisioning an AI-Enhanced Mental Health Ecosystem	Kellie Yu Hui Sim et.al.	2503.14883	null
2025-03-19	Think Like Human Developers: Harnessing Community Knowledge for Structured Code Reasoning	Chengran Yang et.al.	2503.14838	null
2025-03-18	Temporal Consistency for LLM Reasoning Process Error Identification	Jiacheng Guo et.al.	2503.14495	link
2025-03-21	Bridging Social Psychology and LLM Reasoning: Conflict-Aware Meta-Review Generation via Cognitive Alignment	Wei Chen et.al.	2503.13879	null
2025-03-18	Empowering GraphRAG with Knowledge Filtering and Integration	Kai Guo et.al.	2503.13804	null
2025-03-15	Cognitive Activation and Chaotic Dynamics in Large Language Models: A Quasi-Lyapunov Analysis of Reasoning Mechanisms	Xiaojian Li et.al.	2503.13530	null
2025-03-14	RAG-KG-IL: A Multi-Agent Hybrid Framework for Reducing Hallucinations and Enhancing LLM Reasoning through RAG and Incremental Knowledge Graph Learning Integration	Hong Qing Yu et.al.	2503.13514	null
2025-03-17	A Comprehensive Survey on Multi-Agent Cooperative Decision-Making: Scenarios, Approaches, Challenges and Perspectives	Weiqiang Jin et.al.	2503.13415	null
2025-03-17	MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research	James Burgess et.al.	2503.13399	link
2025-03-17	Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning	Hai-Long Sun et.al.	2503.13360	null
2025-03-17	Aligning Vision to Language: Text-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning	Junming Liu et.al.	2503.12972	null
2025-03-17	R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization	Jingyi Zhang et.al.	2503.12937	link
2025-03-28	Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation	Songjun Tu et.al.	2503.12854	link
2025-03-18	DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding	Xinyu Ma et.al.	2503.12797	link
2025-03-16	MPBench: A Comprehensive Multimodal Reasoning Benchmark for Process Errors Identification	Zhaopan Xu et.al.	2503.12505	null
2025-03-31	Will Pre-Training Ever End? A First Step Toward Next-Generation Foundation MLLMs via Self-Improving Systematic Cognition	Xiaoying Zhang et.al.	2503.12303	link
2025-03-20	Applications of Large Language Model Reasoning in Feature Generation	Dharani Chandra et.al.	2503.11989	null
2025-03-14	Neutralizing Bias in LLM Reasoning using Entailment Graphs	Liang Cheng et.al.	2503.11614	link
2025-03-14	VERIFY: A Benchmark of Visual Explanation and Reasoning for Investigating Multimodal Reasoning Fidelity	Jing Bi et.al.	2503.11557	null
2025-03-14	RESPONSE: Benchmarking the Ability of Language Models to Undertake Commonsense Reasoning in Crisis Situation	Aissatou Diallo et.al.	2503.11348	null
2025-03-13	Chat-TS: Enhancing Multi-Modal Reasoning Over Time-Series and Natural Language Data	Paul Quinlan et.al.	2503.10883	null
2025-03-18	R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization	Yi Yang et.al.	2503.10615	link
2025-03-15	VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search	Yiming Jia et.al.	2503.10582	null
2025-03-13	VisualPRM: An Effective Process Reward Model for Multimodal Reasoning	Weiyun Wang et.al.	2503.10291	null
2025-03-18	"Well, Keep Thinking": Enhancing LLM Reasoning with Adaptive Injection Decoding	Hyunbin Jin et.al.	2503.10167	null
2025-03-13	How Do Multimodal Large Language Models Handle Complex Multimodal Reasoning? Placing Them in An Extensible Escape Game	Ziyue Wang et.al.	2503.10042	link
2025-04-08	Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning	Bowen Jin et.al.	2503.09516	link
2025-03-12	MindGYM: Enhancing Vision-Language Models via Synthetic Self-Challenging Questions	Zhe Xu et.al.	2503.09499	link
2025-03-12	A Survey on Enhancing Causal Reasoning Ability of Large Language Models	Xin Li et.al.	2503.09326	null
2025-03-11	Seeing and Reasoning with Confidence: Supercharging Multimodal LLMs with an Uncertainty-Aware Agentic Framework	Zhuo Zhi et.al.	2503.08308	null
2025-03-11	FASIONAD++: Integrating High-Level Instruction and Information Bottleneck in FAt-Slow fusION Systems for Enhanced Safety in Autonomous Driving with Adaptive Feedback	Kangan Qian et.al.	2503.08162	null
2025-03-05	An Optimization Algorithm for Multimodal Data Alignment	Wei Zhang et.al.	2503.07636	null
2025-03-11	LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL	Yingzhe Peng et.al.	2503.07536	null
2025-03-10	MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning	Fanqing Meng et.al.	2503.07365	link
2025-03-10	Dynamic Path Navigation for Motion Agents with LLM Reasoning	Yubo Zhao et.al.	2503.07323	null
2025-03-11	Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models	Wenxuan Huang et.al.	2503.06749	link
2025-03-09	Graph Retrieval-Augmented LLM for Conversational Recommendation Systems	Zhangchi Qiu et.al.	2503.06430	null
2025-03-08	Can Atomic Step Decomposition Enhance the Self-structured Reasoning of Multimodal Large Models?	Kun Xiang et.al.	2503.06252	link
2025-03-15	Integrating Chain-of-Thought for Multimodal Alignment: A Study on 3D Vision-Language Learning	Yanjun Chen et.al.	2503.06232	null
2025-03-08	KnowLogic: A Benchmark for Commonsense Reasoning via Knowledge-Driven Data Synthesis	Weidong Zhan et.al.	2503.06218	link
2025-03-07	Extracting and Emulsifying Cultural Explanation to Improve Multilingual Capability of LLMs	Hamin Koo et.al.	2503.05846	null
2025-03-07	Memory-augmented Query Reconstruction for LLM-based Knowledge Graph Reasoning	Mufan Xu et.al.	2503.05193	null
2025-03-07	Rewarding Curse: Analyze and Mitigate Reward Modeling Issues for LLM Reasoning	Jiachun Li et.al.	2503.05188	null
2025-03-07	Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching	Simon A. Aytes et.al.	2503.05179	link
2025-03-10	R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model	Hengguang Zhou et.al.	2503.05132	link
2025-03-04	Learning from Failures in Multi-Attempt Reinforcement Learning	Stephen Chung et.al.	2503.04808	null
2025-03-15	Can LLMs Reason About Program Semantics? A Comprehensive Evaluation of LLMs on Formal Specification Inference	Thanh Le-Cong et.al.	2503.04779	null
2025-03-06	Better Process Supervision with Bi-directional Rewarding Signals	Wenxiang Chen et.al.	2503.04618	null
2025-04-02	SOLAR: Scalable Optimization of Large-scale Architecture for Reasoning	Chen Li et.al.	2503.04530	null
2025-03-07	Question-Aware Gaussian Experts for Audio-Visual Question Answering	Hongyeob Kim et.al.	2503.04459	link
2025-03-06	Disparities in LLM Reasoning Accuracy and Explanations: A Case Study on African American English	Runtao Zhou et.al.	2503.04099	null
2025-03-06	ReasonGraph: Visualisation of Reasoning Paths	Zongqian Li et.al.	2503.03979	link
2025-03-05	Process-based Self-Rewarding Language Models	Shimao Zhang et.al.	2503.03746	link
2025-03-05	COSINT-Agent: A Knowledge-Driven Multimodal Agent for Chinese Open Source Intelligence	Wentao Li et.al.	2503.03215	null
2025-03-04	The First Few Tokens Are All You Need: An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reasoning Models	Ke Ji et.al.	2503.02875	null
2025-03-04	Audio-Reasoner: Improving Reasoning Capability in Large Audio Language Models	Zhifei Xie et.al.	2503.02318	null
2025-03-04	LLM-TabFlow: Synthetic Tabular Data Generation with Inter-column Logical Relationship Preservation	Yunbo Long et.al.	2503.02161	null
2025-03-03	CorrA: Leveraging Large Language Models for Dynamic Obstacle Avoidance of Autonomous Vehicles	Shanting Wang et.al.	2503.02076	null
2025-03-03	Graph-Augmented Reasoning: Evolving Step-by-Step Knowledge Graph Retrieval for LLM Reasoning	Wenjie Wu et.al.	2503.01642	null
2025-03-03	Pragmatic Inference Chain (PIC) Improving LLMs' Reasoning of Authentic Implicit Toxic Language	Xi Chen et.al.	2503.01539	null
2025-03-03	CognitiveDrone: A VLA Model and Evaluation Benchmark for Real-Time Cognitive Task Solving and Reasoning in UAVs	Artem Lykov et.al.	2503.01378	null
2025-03-06	SRAG: Structured Retrieval-Augmented Generation for Multi-Entity Question Answering over Wikipedia Graph	Teng Lin et.al.	2503.01346	null
2025-03-03	MINT: Multi-modal Chain of Thought in Unified Generative Models for Enhanced Image Generation	Yi Wang et.al.	2503.01298	null
2025-02-28	Personalized Causal Graph Reasoning for LLMs: A Case Study on Dietary Recommendations	Zhongqi Yang et.al.	2503.00134	null
2025-02-28	Contextualizing biological perturbation experiments through language	Menghua Wu et.al.	2502.21290	link
2025-02-28	Rectifying Belief Space via Unlearning to Harness LLMs' Reasoning	Ayana Niwa et.al.	2502.20620	null
2025-02-27	FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving	Guizhen Chen et.al.	2502.20238	link
2025-02-27	Collaborative Stance Detection via Small-Large Language Model Consistency Verification	Yu Yan et.al.	2502.19954	link
2025-02-27	Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models	Yuan Sui et.al.	2502.19918	null
2025-02-27	Order Doesn't Matter, But Reasoning Does: Training LLMs with Order-Centric Augmentation	Qianxi He et.al.	2502.19907	null
2025-03-21	Towards Multimodal Large-Language Models for Parent-Child Interaction: A Focus on Joint Attention	Weiyan Shi et.al.	2502.19877	null
2025-03-05	Weaker LLMs' Opinions Also Matter: Mixture of Opinions Enhances LLM's Mathematical Reasoning	Yanan Chen et.al.	2502.19622	null
2025-02-26	General Reasoning Requires Learning to Reason from the Get-go	Seungwook Han et.al.	2502.19402	null
2025-02-26	BIG-Bench Extra Hard	Mehran Kazemi et.al.	2502.19187	link
2025-02-25	Scalable Best-of-N Selection for Large Language Models via Self-Certainty	Zhewei Kang et.al.	2502.18581	link
2025-02-25	SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution	Yuxiang Wei et.al.	2502.18449	null
2025-02-25	Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning	Wenkai Yang et.al.	2502.18080	null
2025-02-21	Improving Value-based Process Verifier via Structural Prior Injection	Zetian Sun et.al.	2502.17498	null
2025-02-24	Making LLMs Reason? The Intermediate Language Problem in Neurosymbolic Approaches	Alexander Beiser et.al.	2502.17216	null
2025-02-24	Shakti-VLMs: Scalable Vision-Language Models for Enterprise AI	Syed Abdul Gaffar Shakhadri et.al.	2502.17092	null
2025-02-24	Understanding the Uncertainty of LLM Explanations: A Perspective Based on Reasoning Topology	Longchao Da et.al.	2502.17026	null
2025-02-24	All-in-one: Understanding and Generation in Multimodal Reasoning with the MAIA Benchmark	Davide Testa et.al.	2502.16989	null
2025-02-24	AutoLogi: Automated Generation of Logic Puzzles for Evaluating Reasoning Abilities of Large Language Models	Qin Zhu et.al.	2502.16906	link
2025-02-24	The Blessing of Reasoning: LLM-Based Contrastive Explanations in Black-Box Recommender Systems	Yuyan Wang et.al.	2502.16759	null
2025-02-23	Reasoning about Affordances: Causal and Compositional Reasoning in LLMs	Magnus F. Gjerde et.al.	2502.16606	null
2025-02-22	ThinkBench: Dynamic Out-of-Distribution Evaluation for Robust LLM Reasoning	Shulin Huang et.al.	2502.16268	null
2025-02-27	Dynamic Parallel Tree Search for Efficient LLM Reasoning	Yifu Ding et.al.	2502.16235	null
2025-02-22	Patterns Over Principles: The Fragility of Inductive Reasoning in LLMs under Noisy Observations	Chunyang Li et.al.	2502.16169	link
2025-03-04	Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models	Qianqi Yan et.al.	2502.16033	null
2025-02-21	MutaGReP: Execution-Free Repository-Grounded Plan Search for Code-Use	Zaid Khan et.al.	2502.15872	null
2025-02-21	Do Multilingual LLMs Think In English?	Lisa Schut et.al.	2502.15603	null
2025-02-21	Evaluating Social Biases in LLM Reasoning	Xuyang Wu et.al.	2502.15361	null
2025-02-21	Stepwise Informativeness Search for Improving LLM Reasoning	Siyuan Wang et.al.	2502.15335	null
2025-02-21	Latent Factor Models Meets Instructions: Goal-conditioned Latent Factor Discovery without Task Supervision	Zhouhang Xie et.al.	2502.15147	null
2025-02-19	SIFT: Grounding LLM Reasoning in Contexts via Stickers	Zihao Zeng et.al.	2502.14922	link
2025-02-18	Think Inside the JSON: Reinforcement Strategy for Strict LLM Schema Adherence	Bhavik Agarwal et.al.	2502.14905	null
2025-03-04	Exploring Advanced Techniques for Visual Question Answering: A Comprehensive Comparison	Aiswarya Baby et.al.	2502.14827	null
2025-02-20	Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning	Tian Xie et.al.	2502.14768	link
2025-02-19	Enhancing LLM-Based Recommendations Through Personalized Reasoning	Jiahao Liu et.al.	2502.13845	link
2025-02-19	MCTS-KBQA: Monte Carlo Tree Search for Knowledge Base Question Answering	Guanming Xiong et.al.	2502.13428	null
2025-02-19	MM-Verify: Enhancing Multimodal Reasoning with Chain-of-Thought Verification	Linzhuang Sun et.al.	2502.13383	link
2025-02-22	Grounding LLM Reasoning with Knowledge Graphs	Alfonso Amayuelas et.al.	2502.13247	null
2025-02-18	Theorem Prover as a Judge for Synthetic Data Generation	Joshua Ong Jun Leang et.al.	2502.13137	null
2025-02-18	Flow-of-Options: Diversified and Improved LLM Reasoning by Thinking Through Options	Lakshmi Nair et.al.	2502.12929	link
2025-02-18	S$^2$R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning	Ruotian Ma et.al.	2502.12853	link
2025-02-18	CutPaste&Find: Efficient Multimodal Hallucination Detector with Visual-aid Knowledge Base	Cong-Duy Nguyen et.al.	2502.12591	null
2025-02-18	Inference-Time Computations for LLM Reasoning and Planning: A Benchmark and Insights	Shubham Parashar et.al.	2502.12521	null
2025-02-18	HopRAG: Multi-Hop Reasoning for Logic-Aware Retrieval-Augmented Generation	Hao Liu et.al.	2502.12442	null
2025-02-17	Evaluating Step-by-step Reasoning Traces: A Survey	Jinu Lee et.al.	2502.12289	null
2025-02-17	SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs	Yige Xu et.al.	2502.12134	link
2025-02-17	TokenSkip: Controllable Chain-of-Thought Compression in LLMs	Heming Xia et.al.	2502.12067	link
2025-02-17	Hypothesis-Driven Theory-of-Mind Reasoning for Large Language Models	Hyunwoo Kim et.al.	2502.11881	null
2025-02-17	Code-Vision: Evaluating Multimodal LLMs Logic Understanding and Code Generation Capabilities	Hanbin Wang et.al.	2502.11829	link
2025-02-17	Language Models Can See Better: Visual Contrastive Decoding For LLM Multimodal Reasoning	Yuqi Pang et.al.	2502.11751	link
2025-02-17	DeFiScope: Detecting Various DeFi Price Manipulations with LLM Reasoning	Juantao Zhong et.al.	2502.11521	null
2025-02-16	Don't Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls	Ante Wang et.al.	2502.11183	link
2025-02-16	LogiDynamics: Unraveling the Dynamics of Logical Inference in Large Language Model Reasoning	Tianshi Zheng et.al.	2502.11176	null
2025-02-15	A Tutorial on LLM Reasoning: Relevant Methods behind ChatGPT o1	Jun Wang et.al.	2502.10867	null
2025-02-28	USER-VLM 360: Personalized Vision Language Models with User-aware Tuning for Social Human-Robot Interactions	Hamed Rahimi et.al.	2502.10636	null
2025-02-14	Do Large Language Models Reason Causally Like Us? Even Better?	Hanna M. Dettki et.al.	2502.10215	null
2025-02-14	MathConstruct: Challenging LLM Reasoning with Constructive Proofs	Mislav Balunović et.al.	2502.10197	null
2025-02-13	MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency	Dongzhi Jiang et.al.	2502.09621	null
2025-02-14	EnigmaEval: A Benchmark of Long Multimodal Reasoning Challenges	Clinton J. Wang et.al.	2502.08859	null
2025-02-11	CIRCUIT: A Benchmark for Circuit Interpretation and Reasoning Capabilities of LLMs	Lejla Skelic et.al.	2502.07980	null
2025-02-05	Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment	Cheryl Li et.al.	2502.07803	null
2025-02-17	Bag of Tricks for Inference-time Computation of LLM Reasoning	Fan Liu et.al.	2502.07191	link
2025-02-15	Self-Supervised Prompt Optimization	Jinyu Xiang et.al.	2502.06855	link
2025-02-06	Vision-Integrated LLMs for Autonomous Driving Assistance: Human Performance Comparison and Trust Evaluation	Namhee Kim et.al.	2502.06843	null
2025-02-04	Policy Guided Tree Search for Enhanced LLM Reasoning	Yang Li et.al.	2502.06813	null
2025-03-11	ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates	Ling Yang et.al.	2502.06772	link
2025-02-10	Resurrecting saturated LLM benchmarks with adversarial encoding	Igor Ivanov et.al.	2502.06738	null
2025-02-13	LawGPT: Knowledge-Guided Data Generation and Its Application to Legal LLM	Zhi Zhou et.al.	2502.06572	link
2025-02-09	A Generative Framework for Bidirectional Image-Report Understanding in Chest Radiography	Nicholas Evans et.al.	2502.05926	null
2025-02-08	Evaluating Vision-Language Models for Emotion Recognition	Sree Bhattacharyya et.al.	2502.05660	null
2025-02-07	GSM-Infinite: How Do Your LLMs Behave over Infinitely Increasing Context Length and Reasoning Complexity?	Yang Zhou et.al.	2502.05252	link
2025-02-07	Adaptive Graph of Thoughts: Test-Time Adaptive Reasoning Unifying Chain, Tree, and Graph Structures	Tushar Pandey et.al.	2502.05078	link
2025-02-07	Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research	Junde Wu et.al.	2502.04644	link
2025-02-05	Enhancing Reasoning to Adapt Large Language Models for Domain-Specific Applications	Bo Wen et.al.	2502.04384	link
2025-02-05	Limitations of Large Language Models in Clinical Problem-Solving Arising from Inflexible Reasoning	Jonathan Kim et.al.	2502.04381	null
2025-02-04	Investigating the Robustness of Deductive Reasoning with Large Language Models	Fabian Hoppe et.al.	2502.04352	null
2025-02-04	Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search	Maohao Shen et.al.	2502.02508	null
2025-02-04	CoAT: Chain-of-Associated-Thoughts Framework for Enhancing Large Language Models Reasoning	Jianfeng Pan et.al.	2502.02390	null
2025-02-08	Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking	Jinyang Wu et.al.	2502.02339	null
2025-02-04	Mitigating Object Hallucinations in Large Vision-Language Models via Attention Calibration	Younan Zhu et.al.	2502.01969	null
2025-01-31	Improving Rule-based Reasoning in LLMs via Neurosymbolic Representations	Varun Dhanraj et.al.	2502.01657	null
2025-02-03	Position: Empowering Time Series Reasoning with Multimodal LLMs	Yaxuan Kong et.al.	2502.01477	null
2025-02-03	ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning	Bill Yuchen Lin et.al.	2502.01100	null
2025-02-16	Learning Autonomous Code Integration for Math Language Models	Haozhe Wang et.al.	2502.00691	null
2025-02-13	Bridging Internal Probability and Self-Consistency for Effective and Efficient LLM Reasoning	Zhi Zhou et.al.	2502.00511	null
2025-02-14	Reward-Guided Speculative Decoding for Efficient LLM Reasoning	Baohao Liao et.al.	2501.19324	null
2025-01-31	Efficient Reasoning with Hidden Thinking	Xuan Shen et.al.	2501.19201	link
2025-01-31	BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning	Han Zhong et.al.	2501.18858	null
2025-01-28	A Stochastic Dynamical Theory of LLM Self-Adversariality: Modeling Severity Drift as a Critical Process	Jack David Carson et.al.	2501.16783	null
2025-01-27	Explaining GitHub Actions Failures with Large Language Models: Challenges, Insights, and Limitations	Pablo Valenzuela-Toledo et.al.	2501.16495	null
2025-01-27	Large Models in Dialogue for Active Perception and Anomaly Detection	Tzoulio Chamiti et.al.	2501.16300	link
2025-01-26	TensorLLM: Tensorising Multi-Head Attention for Enhanced Reasoning and Compression in LLMs	Yuxuan Gu et.al.	2501.15674	link
2025-01-28	Rethinking External Slow-Thinking: From Snowball Errors to Probability of Correct Reasoning	Zeyu Gan et.al.	2501.15602	link
2025-01-26	Error Classification of Large Language Models on Math Word Problems: A Dynamically Adaptive Framework	Yuhong Sun et.al.	2501.15581	null
2025-02-15	Option-ID Based Elimination For Multiple Choice Questions	Zhenhao Zhu et.al.	2501.15175	link
2025-01-24	Domaino1s: Guiding LLM Reasoning for Explainable Answers in High-Stakes Domains	Xu Chu et.al.	2501.14431	null
2025-02-12	GraphSOS: Graph Sampling and Order Selection to Help LLMs Understand Graphs Better	Xu Chu et.al.	2501.14427	null
2025-01-23	Pseudocode-Injection Magic: Enabling LLMs to Tackle Graph Computational Tasks	Chang Gong et.al.	2501.13731	null
2025-02-10	Cognitive Paradigms for Evaluating VLMs on Visual Reasoning Task	Mohit Vaishnav et.al.	2501.13620	null
2025-01-22	EvidenceMap: Unleashing the Power of Small Language Models with Evidence Analysis for Biomedical Question Answering	Chang Zong et.al.	2501.12746	null
2025-01-17	LLM Reasoner and Automated Planner: A new NPC approach	Israel Puerta-Merino et.al.	2501.10106	null
2025-01-22	FRAG: A Flexible Modular Framework for Retrieval-Augmented Generation based on Knowledge Graphs	Zengyi Gao et.al.	2501.09957	null
2025-01-17	Evolving Deeper LLM Thinking	Kuang-Huei Lee et.al.	2501.09891	null
2025-01-23	Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models	Fengli Xu et.al.	2501.09686	null
2025-01-15	Multimodal LLMs Can Reason about Aesthetics in Zero-Shot	Ruixiang Jiang et.al.	2501.09012	link
2025-02-10	Ensemble of Large Language Models for Curated Labeling and Rating of Free-text Data	Jiaxing Qiu et.al.	2501.08413	link
2025-01-14	Reasoning with Graphs: Structuring Implicit Knowledge to Enhance LLMs Reasoning	Haoyu Han et.al.	2501.07845	null
2025-01-09	Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark	Yunzhuo Hao et.al.	2501.05444	link
2025-01-08	Enhancing Financial VQA in Vision Language Models using Intermediate Structured Representations	Archita Srivastava et.al.	2501.04675	null
2025-01-08	DRIVINGVQA: Analyzing Visual Chain-of-Thought Reasoning of Vision Language Models in Real-World Scenarios with Driving Theory Tests	Charles Corbière et.al.	2501.04671	null
2025-01-08	Understanding Before Reasoning: Enhancing Chain-of-Thought with Iterative Summarization Pre-Prompting	Dong-Hai Zhu et.al.	2501.04341	link
2025-01-07	Reasoning-Enhanced Self-Training for Long-Form Personalized Text Generation	Alireza Salemi et.al.	2501.04167	null
2025-01-07	Socratic Questioning: Learn to Self-guide Multimodal Reasoning in the Wild	Wanpeng Hu et.al.	2501.02964	link
2025-01-06	KG-CF: Knowledge Graph Completion with Context Filtering under the Guidance of Large Language Models	Zaiyi Zheng et.al.	2501.02711	null
2025-01-04	Table as Thought: Exploring Structured Thoughts in LLM Reasoning	Zhenjie Sun et.al.	2501.02152	null
2025-01-03	Recursive Decomposition of Logical Thoughts: Framework for Superior Reasoning and Knowledge Propagation in Large Language Models	Kaleem Ullah Qasim et.al.	2501.02026	null
2025-01-02	Enhancing Reasoning through Process Supervision with Monte Carlo Tree Search	Shuangtao Li et.al.	2501.01478	null
2025-01-02	HetGCoT-Rec: Heterogeneous Graph-Enhanced Chain-of-Thought LLM Reasoning for Journal Recommendation	Runsong Jia et.al.	2501.01203	null
2025-01-03	Enhancing LLM Reasoning with Multi-Path Collaborative Reactive and Reflection agents	Chengbo He et.al.	2501.00430	null
2024-12-31	EQUATOR: A Deterministic Framework for Evaluating LLM Reasoning with Open-Ended Questions. # v1.0.0-beta	Raymond Bernard et.al.	2501.00257	null
2024-12-30	Efficiently Serving LLM Reasoning Programs with Certaindex	Yichao Fu et.al.	2412.20993	null
2024-12-28	LLM Reasoning Engine: Specialized Training for Enhanced Mathematical Reasoning	Shuguang Chen et.al.	2412.20227	null
2025-02-17	Token-Budget-Aware LLM Reasoning	Tingxu Han et.al.	2412.18547	link
2024-12-23	StructTest: Benchmarking LLMs' Reasoning through Compositional Structured Outputs	Hailin Chen et.al.	2412.18011	null
2025-02-09	Evaluating LLM Reasoning in the Operations Research Domain with ORQA	Mahdi Mostajabdaveh et.al.	2412.17874	link
2024-12-23	Diving into Self-Evolving Training for Multimodal Reasoning	Wei Liu et.al.	2412.17451	null
2024-12-21	SilVar: Speech Driven Multimodal Model for Reasoning Visual Question Answering and Object Localization	Tan-Hanh Pham et.al.	2412.16771	null
2024-12-20	PruneVid: Visual Token Pruning for Efficient Video Large Language Models	Xiaohu Huang et.al.	2412.16117	link
2024-12-19	Eliciting Causal Abilities in Large Language Models for Reasoning Tasks	Yajing Wang et.al.	2412.15314	link
2024-12-19	Critical-Questions-of-Thought: Steering LLM reasoning with Argumentative Querying	Federico Castagna et.al.	2412.15177	link
2024-12-19	Progressive Multimodal Reasoning via Active Retrieval	Guanting Dong et.al.	2412.14835	null
2024-12-19	FiVL: A Framework for Improved Vision-Language Alignment	Estelle Aflalo et.al.	2412.14672	null
2024-12-19	FaultExplainer: Leveraging Large Language Models for Interpretable Fault Detection and Diagnosis	Abdullah Khan et.al.	2412.14492	link
2024-12-18	Cognition Chain for Explainable Psychological Stress Detection on Social Media	Xin Wang et.al.	2412.14009	link
2024-12-27	Cracking the Code of Hallucination in LVLMs with Vision-aware Head Divergence	Jinghan He et.al.	2412.13949	null
2025-02-16	Do Language Models Understand Time?	Xi Ding et.al.	2412.13845	link
2024-12-18	Beyond Outcomes: Transparent Assessment of LLM Reasoning in Games	Wenye Lin et.al.	2412.13602	link
2024-12-17	ClarityEthic: Explainable Moral Judgment Utilizing Contrastive Ethical Insights from Large Language Models	Yuxi Sun et.al.	2412.12848	null
2024-12-12	A NotSo Simple Way to Beat Simple Bench	Soham Sane et.al.	2412.12173	null
2024-12-11	What Makes In-context Learning Effective for Mathematical Reasoning: A Theoretical Analysis	Jiayu Liu et.al.	2412.12157	null
2025-02-18	A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges	Yibo Yan et.al.	2412.11936	null
2024-12-24	Stepwise Reasoning Error Disruption Attack of LLMs	Jingyu Peng et.al.	2412.11934	null
2024-12-16	Leveraging Retrieval-Augmented Tags for Large Vision-Language Understanding in Complex Scenes	Antonio Carlos Rivera et.al.	2412.11396	null
2024-12-15	SceneLLM: Implicit Language Reasoning in LLM for Dynamic Scene Graph Generation	Hang Zhang et.al.	2412.11026	null
2024-12-15	Entropy-Regularized Process Reward Model	Hanning Zhang et.al.	2412.11006	link
2024-12-14	Optimizing Vision-Language Interactions Through Decoder-Only Models	Kaito Tanaka et.al.	2412.10758	null
2024-12-14	Chasing Progress, Not Perfection: Revisiting Strategies for End-to-End LLM Plan Generation	Sukai Huang et.al.	2412.10675	null
2024-12-14	Thinking with Knowledge Graphs: Enhancing LLM Reasoning Through Structured Data	Xue Wu et.al.	2412.10654	null
2024-12-13	EVLM: Self-Reflective Multimodal Reasoning for Cross-Dimensional Visual Editing	Umar Khalid et.al.	2412.10566	null
2024-12-13	Atomic Learning Objectives Labeling: A High-Resolution Approach for Physics Education	Naiming Liu et.al.	2412.09914	null
2025-01-18	Neptune: The Long Orbit to Benchmarking Long Video Understanding	Arsha Nagrani et.al.	2412.09582	link
2025-02-14	Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning	Zhenni Bi et.al.	2412.09078	link
2024-12-11	Training Large Language Models to Reason in a Continuous Latent Space	Shibo Hao et.al.	2412.06769	link
2025-01-23	GameArena: Evaluating LLM Reasoning through Live Computer Games	Lanxiang Hu et.al.	2412.06394	null
2024-12-08	Language hooks: a modular framework for augmenting LLM reasoning that decouples tool usage from the model and its prompt	Damien de Mijolla et.al.	2412.05967	null
2024-12-06	MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale	Jarvis Guo et.al.	2412.05237	null
2024-12-05	Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction	Yiheng Xu et.al.	2412.04454	null
2024-12-05	SocialMind: LLM-based Proactive AR Social Assistive System with Human-like Perception for In-situ Live Interactions	Bufang Yang et.al.	2412.04036	null
2024-12-04	DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation	Qingdong He et.al.	2412.03255	null
2024-12-03	Explainable CTR Prediction via LLM Reasoning	Xiaohan Yu et.al.	2412.02588	null
2025-02-12	NYT-Connections: A Deceptively Simple Text Classification Task that Stumps System-1 Thinkers	Angel Yahir Loredo Lopez et.al.	2412.01621	null
2025-01-13	Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability	Zicheng Lin et.al.	2411.19943	link
2024-11-29	TQA-Bench: Evaluating LLMs for Multi-Table Question Answering with Scalable Context and Symbolic Extension	Zipeng Qiu et.al.	2411.19504	link
2024-11-29	COLD: Causal reasOning in cLosed Daily activities	Abhinav Joshi et.al.	2411.19500	link
2024-12-16	Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning	Di Zhang et.al.	2411.18203	null
2024-11-26	NEMO: Can Multimodal LLMs Identify Attribute-Modified Objects?	Jiaxuan Li et.al.	2411.17794	null
2024-11-25	Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision	Zhiheng Xi et.al.	2411.16579	null
2024-11-22	On the Impact of Fine-Tuning on Chain-of-Thought Reasoning	Elita Lobo et.al.	2411.15382	null
2024-11-21	Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models	Yuhao Dong et.al.	2411.14432	link
2024-11-20	BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games	Davide Paglieri et.al.	2411.13543	null
2024-11-20	Hints of Prompt: Enhancing Visual Representation for Multimodal LLMs in Autonomous Driving	Hao Zhou et.al.	2411.13076	null
2024-11-15	Thinking Before Looking: Improving Multimodal LLM Reasoning via Mitigating Visual Hallucination	Haojie Zheng et.al.	2411.12591	link
2024-12-23	Enhancing Reasoning Capabilities of LLMs via Principled Synthetic Logic Corpus	Terufumi Morishita et.al.	2411.12498	link
2024-11-18	Semantic-Geometric-Physical-Driven Robot Manipulation Skill Transfer via Skill Library and Tactile Representation	Mingchao Qi et.al.	2411.11714	link
2024-12-31	Enhancing LLM Reasoning with Reward-guided Tree Search	Jinhao Jiang et.al.	2411.11694	null
2024-12-15	A dataset of questions on decision-theoretic reasoning in Newcomb-like problems	Caspar Oesterheld et.al.	2411.10588	link
2024-11-15	Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization	Weiyun Wang et.al.	2411.10442	null
2025-01-09	LLaVA-CoT: Let Vision Language Models Reason Step-by-Step	Guowei Xu et.al.	2411.10440	link
2024-11-15	Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level	Andong Deng et.al.	2411.09921	null
2024-11-14	Comprehensive and Practical Evaluation of Retrieval-Augmented Generation Systems for Medical Question Answering	Nghia Trung Ngo et.al.	2411.09213	null
2024-11-13	Tree-of-Table: Unleashing the Power of LLMs for Enhanced Large-Scale Table Understanding	Deyi Ji et.al.	2411.08516	null
2024-11-18	What Do Learning Dynamics Reveal About Generalization in LLM Reasoning?	Katie Kang et.al.	2411.07681	link
2024-11-27	Self-Training Meets Consistency: Improving LLMs' Reasoning With Consistency-Driven Rationale Evaluation	Jaehyeok Lee et.al.	2411.06387	link
2024-11-09	A Picture is Worth A Thousand Numbers: Enabling LLMs Reason about Time Series via Visualization	Haoxin Liu et.al.	2411.06018	null
2024-11-11	LLMs as Method Actors: A Model for Prompt Engineering and Architecture	Colin Doyle et.al.	2411.05778	link
2024-11-12	Kwai-STaR: Transform LLMs into State-Transition Reasoners	Xingyu Lu et.al.	2411.04799	null
2024-11-21	Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding	Haolin Chen et.al.	2411.04282	link
2024-11-05	CrowdGenUI: Enhancing LLM-Based UI Widget Generation with a Crowdsourced Preference Library	Yimeng Liu et.al.	2411.03477	null
2025-01-27	MetRex: A Benchmark for Verilog Code Metric Reasoning Using LLMs	Manar Abdelatty et.al.	2411.03471	link
2024-11-04	RuAG: Learned-rule-augmented Generation for Large Language Models	Yudi Zhang et.al.	2411.03349	null
2024-10-30	Vision-Language Models Can Self-Improve Reasoning via Reflection	Kanzhi Cheng et.al.	2411.00855	null
2024-11-01	Mitigating Tail Narrowing in LLM Self-Improvement via Socratic-Guided Sampling	Yiwen Ding et.al.	2411.00750	link
2024-11-01	STEM-POM: Evaluating Language Models Math-Symbol Reasoning in Document Parsing	Jiaru Zou et.al.	2411.00387	null
2024-11-08	GRS-QA -- Graph Reasoning-Structured Question Answering Dataset	Anish Pahilajani et.al.	2411.00369	null
2024-10-31	Thought Space Explorer: Navigating and Expanding Thought Space for Large Language Model Reasoning	Jinghan Zhang et.al.	2410.24155	null
2024-10-31	RL-STaR: Theoretical Analysis of Reinforcement Learning Frameworks for Self-Taught Reasoner	Fu-Chieh Chang et.al.	2410.23912	null
2024-10-31	OCEAN: Offline Chain-of-thought Evaluation and Alignment in Large Language Models	Junda Wu et.al.	2410.23703	null
2024-10-30	ReasoningRec: Bridging Personalized Recommendations and Human-Interpretable Explanations through LLM Reasoning	Millennium Bismay et.al.	2410.23180	link
2024-10-30	On Memorization of Large Language Models in Logical Reasoning	Chulin Xie et.al.	2410.23123	null
2024-10-28	Causal Interventions on Causal Paths: Mapping GPT-2's Reasoning From Syntax to Semantics	Isabelle Lee et.al.	2410.21353	null
2024-10-28	Guide-LLM: An Embodied LLM Agent and Text-Based Topological Map for Robotic Guidance of People with Visual Impairments	Sangmim Song et.al.	2410.20666	null
2024-10-25	Cooperative Strategic Planning Enhances Reasoning Capabilities in Large Language Models	Danqing Wang et.al.	2410.20007	null
2024-10-25	Can Stories Help LLMs Reason? Curating Information Space Through Narrative	Vahid Sadiri Javadi et.al.	2410.19221	null
2024-10-18	Make LLMs better zero-shot reasoners: Structure-orientated autonomous reasoning	Pengfei He et.al.	2410.19000	link
2024-10-25	CLR-Bench: Evaluating Large Language Models in College-level Reasoning	Junnan Dong et.al.	2410.17558	null
2024-10-28	Non-myopic Generation of Language Models for Reasoning and Planning	Chang Ma et.al.	2410.17195	link
2024-11-06	Improving Causal Reasoning in Large Language Models: A Survey	Longxuan Yu et.al.	2410.16676	link
2024-10-22	A Statistical Analysis of LLMs' Self-Evaluation Using Proverbs	Ryosuke Sonoda et.al.	2410.16640	null
2024-10-21	Rulebreakers Challenge: Revealing a Blind Spot in Large Language Models' Reasoning with Formal Logic	Jason Chan et.al.	2410.16502	null
2024-11-27	On Designing Effective RL Reward at Training Time for LLM Reasoning	Jiaxuan Gao et.al.	2410.15115	null
2025-01-28	Paths-over-Graph: Knowledge Graph Empowered Large Language Model Reasoning	Xingyu Tan et.al.	2410.14211	null
2024-10-21	Unconstrained Model Merging for Enhanced LLM Reasoning	Yiming Zhang et.al.	2410.13699	null
2024-10-16	Graph-constrained Reasoning: Faithful Reasoning on Knowledge Graphs with Large Language Models	Linhao Luo et.al.	2410.13080	link
2024-10-16	KcMF: A Knowledge-compliant Framework for Schema and Entity Matching with Fine-tuning-free LLMs	Yongqin Xu et.al.	2410.12480	null
2024-10-17	Enhancing LLM Trading Performance with Fact-Subjectivity Aware Reasoning	Qian Wang et.al.	2410.12464	link
2024-10-16	Reversal of Thought: Enhancing Large Language Models with Preference-Guided Reverse Reasoning Warm-up	Jiahao Yuan et.al.	2410.12323	link
2024-10-16	Exploiting LLMs' Reasoning Capability to Infer Implicit Concepts in Legal Information Retrieval	Hai-Long Nguyen et.al.	2410.12154	null
2024-10-15	Planning Anything with Rigor: General-Purpose Zero-Shot Planning with LLM-based Formalized Programming	Yilun Hao et.al.	2410.12112	null
2024-10-12	OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models	Jun Wang et.al.	2410.09671	null
2024-10-11	P-FOLIO: Evaluating and Improving Logical Reasoning with Abundant Human-Written Reasoning Chains	Simeng Han et.al.	2410.09207	null
2024-10-11	Aerial Vision-and-Language Navigation via Semantic-Topo-Metric Representation Guided LLM Reasoning	Yunpeng Gao et.al.	2410.08500	null
2024-10-10	SG-Nav: Online 3D Scene Graph Prompting for LLM-based Zero-shot Object Navigation	Hang Yin et.al.	2410.08189	null
2024-10-10	Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning	Amrith Setlur et.al.	2410.08146	null
2024-10-10	Automatic Curriculum Expert Iteration for Reliable LLM Reasoning	Zirui Zhao et.al.	2410.07627	link
2024-10-09	Boosting Few-Shot Detection with Large Language Models and Layout-to-Image Synthesis	Ahmed Abdullah et.al.	2410.06841	null
2024-10-09	Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning	Xiyao Wang et.al.	2410.06508	null
2025-01-02	Filtering Discomforting Recommendations with Large Language Models	Jiahao Liu et.al.	2410.05411	null
2024-10-05	Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification	Zhenwen Liang et.al.	2410.05318	null
2024-10-06	Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval	Pengcheng Jiang et.al.	2410.04585	link
2024-10-03	The Role of Deductive and Inductive Reasoning in Large Language Models	Chengkun Cai et.al.	2410.02892	null
2024-10-02	Not All LLM Reasoners Are Created Equal	Arian Hosseini et.al.	2410.01748	null
2024-12-25	Interpretable Contrastive Monte Carlo Tree Search Reasoning	Zitian Gao et.al.	2410.01707	link
2024-10-02	VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment	Amirhossein Kazemnejad et.al.	2410.01679	link
2024-10-02	AHP-Powered LLM Reasoning for Multi-Criteria Evaluation of Open-Ended Responses	Xiaotian Lu et.al.	2410.01246	null
2024-10-01	Self-controller: Controlling LLMs with Multi-round Step-by-step Self-awareness	Xiao Peng et.al.	2410.00359	null
2024-10-01	Insight: A Multi-Modal Diagnostic Pipeline using LLMs for Ocular Surface Disease Diagnosis	Chun-Hsiao Yeh et.al.	2410.00292	null
2024-10-08	GUNDAM: Aligning Large Language Models with Graph Understanding	Sheng Ouyang et.al.	2409.20053	null
2024-09-27	Open-Nav: Exploring Zero-Shot Vision-and-Language Navigation in Continuous Environment with Open-Source LLMs	Yanyuan Qiao et.al.	2409.18794	null
2024-10-23	Proof of Thought: Neurosymbolic Program Synthesis allows Robust and Interpretable Reasoning	Debargha Ganguly et.al.	2409.17270	null
2024-09-20	CSCE: Boosting LLM Reasoning by Simultaneous Enhancing of Causal Significance and Consistency	Kangsheng Wang et.al.	2409.17174	null
2024-09-20	Mufu: Multilingual Fused Learning for Low-Resource Translation with LLM	Zheng Wei Lim et.al.	2409.13949	null
2024-09-19	SituationAdapt: Contextual UI Optimization in Mixed Reality with Situation Awareness via LLM Reasoning	Zhipeng Li et.al.	2409.12836	null
2024-10-04	Unlocking Reasoning Potential in Large Language Models by Scaling Code-form Planning	Jiaxin Wen et.al.	2409.12452	link
2024-12-16	Enhancing Logical Reasoning in Large Language Models through Graph-based Synthetic Data	Jiaming Zhou et.al.	2409.12437	link
2024-09-18	MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning	Justin Chih-Yao Chen et.al.	2409.12147	link
2024-11-05	Improving LLM Reasoning with Multi-Agent Tree-of-Thought Validator Agent	Fatemeh Haji et.al.	2409.11527	link
2024-09-16	Enhancing RL Safety with Counterfactual LLM Reasoning	Dennis Gross et.al.	2409.10188	link
2024-09-11	Think Together and Work Better: Combining Humans' and LLMs' Think-Aloud Outcomes for Effective Text Evaluation	SeongYeub Chu et.al.	2409.07355	link

LLM Evaluation

Publish Date	Title	Authors	PDF	Code
2025-07-22	Mind the Gap: Evaluating the Representativeness of Quantitative Medical Language Reasoning LLM Benchmarks for African Disease Burdens	Fred Mutisya et.al.	2507.16322	null
2025-07-18	Document Haystack: A Long Context Multimodal Image/Document Understanding Vision LLM Benchmark	Goeric Huybrechts et.al.	2507.15882	null
2025-07-21	Left Leaning Models: AI Assumptions on Economic Policy	Maxim Chupilkin et.al.	2507.15771	null
2025-07-21	From Queries to Criteria: Understanding How Astronomers Evaluate LLMs	Alina Hyk et.al.	2507.15715	null
2025-07-21	Evaluating Text Style Transfer: A Nine-Language Benchmark for Text Detoxification	Vitaly Protasov et.al.	2507.15557	null
2025-07-15	LLM-based ambiguity detection in natural language instructions for collaborative surgical robots	Ana Davila et.al.	2507.11525	null
2025-07-15	DCR: Quantifying Data Contamination in LLMs Evaluation	Cheng Xu et.al.	2507.11405	null
2025-07-17	SWE-MERA: A Dynamic Benchmark for Agenticly Evaluating Large Language Models on Software Engineering Tasks	Pavel Adamenko et.al.	2507.11059	null
2025-07-11	OpenCodeReasoning-II: A Simple Test Time Scaling Approach via Self-Critique	Wasi Uddin Ahmad et.al.	2507.09075	null
2025-07-18	From KMMLU-Redux to KMMLU-Pro: A Professional Korean Benchmark Suite for LLM Evaluation	Seokhee Hong et.al.	2507.08924	null
2025-07-11	A Third Paradigm for LLM Evaluation: Dialogue Game-Based Evaluation using clembench	David Schlangen et.al.	2507.08491	null
2025-07-07	Train-before-Test Harmonizes Language Model Rankings	Guanhua Zhang et.al.	2507.05195	null
2025-07-13	SymbolicThought: Integrating Language Models and Symbolic Reasoning for Consistent and Interpretable Human Relationship Understanding	Runcong Zhao et.al.	2507.04189	null
2025-07-09	Skewed Score: A statistical framework to assess autograders	Magda Dubois et.al.	2507.03772	null
2025-07-12	Eka-Eval: A Comprehensive Evaluation Framework for Large Language Models in Indian Languages	Samridhi Raj Sinha et.al.	2507.01853	null
2025-07-01	Pitfalls of Evaluating Language Models with Open Benchmarks	Md. Najib Hasan et.al.	2507.00460	null
2025-06-30	AutoEvoEval: An Automated Framework for Evolving Close-Ended LLM Evaluation Data	JiaRu Wu et.al.	2506.23735	null
2025-06-27	WildSpeech-Bench: Benchmarking Audio LLMs in Natural Speech Conversation	Jian Zhang et.al.	2506.21875	null
2025-06-25	DuoGPT: Training-free Dual Sparsity through Activation-aware Pruning in LLMs	Ruokai Yin et.al.	2506.20194	null
2025-06-23	Smart-LLaMA-DPO: Reinforced Large Language Model for Explainable Smart Contract Vulnerability Detection	Lei Yu et.al.	2506.18245	null
2025-06-22	The Democratic Paradox in Large Language Models' Underestimation of Press Freedom	I. Loaiza et.al.	2506.18045	null
2025-06-21	CodeMorph: Mitigating Data Leakage in Large Language Model Assessment	Hongzhou Rao et.al.	2506.17627	null
2025-06-20	Re-Evaluating Code LLM Benchmarks Under Semantic Mutation	Zhiyuan Pan et.al.	2506.17369	null
2025-06-19	LMR-BENCH: Evaluating LLM Agent's Ability on Reproducing Language Modeling Research	Shuo Yan et.al.	2506.17335	null
2025-06-20	Do We Need Large VLMs for Spotting Soccer Actions?	Ritabrata Chakraborty et.al.	2506.17144	null
2025-06-17	SFT-GO: Supervised Fine-Tuning with Group Optimization for Large Language Models	Gyuhak Kim et.al.	2506.15021	null
2025-06-19	MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation	Xueqing Peng et.al.	2506.14028	null
2025-06-18	The NordDRG AI Benchmark for Large Language Models	Tapio Pitkäranta et.al.	2506.13790	link
2025-06-20	Domain Specific Benchmarks for Evaluating Multimodal Large Language Models	Khizar Anjum et.al.	2506.12958	null
2025-06-06	The Scales of Justitia: A Comprehensive Survey on Safety Evaluation of LLMs	Songyang Liu et.al.	2506.11094	null
2025-05-22	NSW-EPNews: A News-Augmented Benchmark for Electricity Price Forecasting with LLMs	Zhaoge Bi et.al.	2506.11050	null
2025-04-23	Impact of Comments on LLM Comprehension of Legacy Code	Rock Sabetto et.al.	2506.11007	null
2025-06-12	LLM-Driven Personalized Answer Generation and Evaluation	Mohammadreza Molavi et.al.	2506.10829	null
2025-06-11	Textual Bayes: Quantifying Uncertainty in LLM-Based Systems	Brendan Leigh Ross et.al.	2506.10060	null
2025-06-16	Metritocracy: Representative Metrics for Lite Benchmarks	Ariel Procaccia et.al.	2506.09813	null
2025-06-10	Breaking the ICE: Exploring promises and challenges of benchmarks for Inference Carbon & Energy estimation for LLMs	Samarth Sikand et.al.	2506.08727	null
2025-06-10	Sample Efficient Demonstration Selection for In-Context Learning	Kiran Purohit et.al.	2506.08607	link
2025-06-09	How Benchmark Prediction from Fewer Data Misses the Mark	Guanhua Zhang et.al.	2506.07673	link
2025-06-09	Beyond Benchmarks: A Novel Framework for Domain-Specific LLM Evaluation and Knowledge Mapping	Nitin Sharma et.al.	2506.07658	null
2025-06-09	Vuyko Mistral: Adapting LLMs for Low-Resource Dialectal Translation	Roman Kyslyi et.al.	2506.07617	null
2025-06-05	LLM-First Search: Self-Guided Exploration of the Solution Space	Nathan Herr et.al.	2506.05213	link
2025-06-05	Debatable Intelligence: Benchmarking LLM Judges via Debate Speech Evaluation	Noy Sternlicht et.al.	2506.05062	link
2025-06-04	BEAR: BGP Event Analysis and Reporting	Hanqing Li et.al.	2506.04514	link
2025-06-04	N$^2$: A Unified Python Package and Test Bench for Nearest Neighbor-Based Matrix Completion	Caleb Chin et.al.	2506.04166	link
2025-06-04	Establishing Trustworthy LLM Evaluation via Shortcut Neuron Analysis	Kejian Zhu et.al.	2506.04142	null
2025-06-03	NetPress: Dynamically Generated LLM Benchmarks for Network Applications	Yajie Zhou et.al.	2506.03231	link
2025-06-04	PC-MoE: Memory-Efficient and Privacy-Preserving Collaborative Training for Mixture-of-Experts LLMs	Ze Yu Zhang et.al.	2506.02965	null
2025-06-02	Multilingual Definition Modeling	Edison Marrese-Taylor et.al.	2506.01489	null
2025-06-01	Taming LLMs by Scaling Learning Rates with Gradient Grouping	Siyuan Li et.al.	2506.01049	null
2025-06-06	Data Swarms: Optimizable Generation of Synthetic Evaluation Data	Shangbin Feng et.al.	2506.00741	null
2025-05-31	AgentAuditor: Human-Level Safety and Security Evaluation for LLM Agents	Hanjun Luo et.al.	2506.00641	null
2025-05-31	BenchHub: A Unified Benchmark Suite for Holistic and Customizable LLM Evaluation	Eunsu Kim et.al.	2506.00482	null
2025-05-30	MetaFaith: Faithful Natural Language Uncertainty Expression in LLMs	Gabrielle Kaili-May Liu et.al.	2505.24858	link
2025-05-30	Benchmarking Large Language Models for Cryptanalysis and Mismatched-Generalization	Utsav Maskey et.al.	2505.24621	null
2025-05-30	Simulating Training Data Leakage in Multiple-Choice Benchmarks for LLM Evaluation	Naila Shafirni Hidayat et.al.	2505.24263	link
2025-05-29	Is Your Model Fairly Certain? Uncertainty-Aware Fairness Evaluation for LLMs	Yinong Oliver Wang et.al.	2505.23996	null
2025-05-29	Revisiting Uncertainty Estimation and Calibration of Large Language Models	Linwei Tao et.al.	2505.23854	null
2025-05-28	Benchmarking Abstract and Reasoning Abilities Through A Theoretical Perspective	Qingchuan Ma et.al.	2505.23833	link
2025-06-24	MemAscend: System Memory Optimization for SSD-Offloaded LLM Fine-Tuning	Yong-Cheng Liaw et.al.	2505.23254	null
2025-07-03	Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding	Chengyue Wu et.al.	2505.22618	null
2025-05-29	Pangu Embedded: An Efficient Dual-system LLM Reasoner with Metacognition	Hanting Chen et.al.	2505.22375	null
2025-05-28	ReliableEval: A Recipe for Stochastic LLM Evaluation via Method of Moments	Gili Lior et.al.	2505.22169	null
2025-05-28	Found in Translation: Measuring Multilingual LLM Consistency as Simple as Translate then Evaluate	Ashim Gupta et.al.	2505.21999	null
2025-05-21	SIMCOPILOT: Evaluating Large Language Models for Copilot-Style Code Generation	Mingchao Jiang et.al.	2505.21514	null
2025-05-26	Dynamically Learned Test-Time Model Routing in Language Model Zoos with Service Level Guarantees	Herbert Woisetschläger et.al.	2505.19947	null
2025-05-26	BizFinBench: A Business-Driven Real-World Financial Benchmark for Evaluating LLMs	Guilong Lu et.al.	2505.19457	link
2025-05-25	Likert or Not: LLM Absolute Relevance Judgments on Fine-Grained Ordinal Scales	Charles Godfrey et.al.	2505.19334	null
2025-05-25	Can Large Language Models Infer Causal Relationships from Real-World Text?	Ryan Saklad et.al.	2505.18931	null
2025-05-24	MedScore: Factuality Evaluation of Free-Form Medical Answers	Heyuan Huang et.al.	2505.18452	link
2025-05-23	How Can I Publish My LLM Benchmark Without Giving the True Answers Away?	Takashi Ishida et.al.	2505.18102	null
2025-05-23	ELSPR: Evaluator LLM Training Data Self-Purification on Non-Transitive Preferences via Tournament Graph Reconstruction	Yan Yu et.al.	2505.17691	null
2025-05-22	CaseReportBench: An LLM Benchmark Dataset for Dense Information Extraction in Clinical Case Reports	Xiao Yu Cindy Zhang et.al.	2505.17265	null
2025-05-21	NEXT-EVAL: Next Evaluation of Traditional and LLM Web Data Record Extraction	Soyeon Kim et.al.	2505.17125	null
2025-05-21	Any Large Language Model Can Be a Reliable Judge: Debiasing with a Reasoning-based Bias Detector	Haoyan Yang et.al.	2505.17100	null
2025-05-22	AGENTIF: Benchmarking Instruction Following of Large Language Models in Agentic Scenarios	Yunjia Qi et.al.	2505.16944	link
2025-05-22	CASTILLO: Characterizing Response Length Distributions of Large Language Models	Daniel F. Perez-Ramirez et.al.	2505.16881	link
2025-05-21	Reverse Engineering Human Preferences with Reinforcement Learning	Lisa Alazraki et.al.	2505.15795	null
2025-05-21	An Empirical Study of the Anchoring Effect in LLMs: Existence, Mechanism, and Potential Mitigations	Yiming Huang et.al.	2505.15392	null
2025-05-21	Lost in Benchmarks? Rethinking Large Language Model Benchmarking with Item Response Theory	Hongli Zhou et.al.	2505.15055	link
2025-05-20	FisherSFT: Data-Efficient Supervised Fine-Tuning of Language Models Using Information Gain	Rohan Deb et.al.	2505.14826	null
2025-05-20	Breaking Down Video LLM Benchmarks: Knowledge, Spatial Perception, or True Temporal Understanding?	Bo Feng et.al.	2505.14321	null
2025-05-29	YESciEval: Robust LLM-as-a-Judge for Scientific Question Answering	Jennifer D'Souza et.al.	2505.14279	null
2025-05-20	Think-J: Learning to Think for Generative LLM-as-a-Judge	Hui Huang et.al.	2505.14268	link
2025-05-19	4Hammer: a board-game reinforcement learning environment for the hour long time frame	Massimo Fioravanti et.al.	2505.13638	link
2025-05-18	KG-QAGen: A Knowledge-Graph-Based Framework for Systematic Question Generation and Long-Context LLM Evaluation	Nikita Tatarinov et.al.	2505.12495	link
2025-05-17	Tiny QA Benchmark++: Ultra-Lightweight, Synthetic Multilingual Dataset Generation & Smoke-Tests for Continuous LLM Evaluation	Vincent Koc et.al.	2505.12058	link
2025-05-21	Model Performance-Guided Evaluation Data Selection for Effective Prompt Optimization	Ximing Dong et.al.	2505.10736	null
2025-05-13	A suite of LMs comprehend puzzle statements as well as humans	Adele E Goldberg et.al.	2505.08996	null
2025-05-13	Towards Contamination Resistant Benchmarks	Rahmatullah Musawi et.al.	2505.08389	null
2025-05-12	A Case Study Investigating the Role of Generative AI in Quality Evaluations of Epics in Agile Software Development	Werner Geyer et.al.	2505.07664	null
2025-05-09	LLMs Get Lost In Multi-Turn Conversation	Philippe Laban et.al.	2505.06120	link
2025-05-15	Healthy LLMs? Benchmarking LLM Knowledge of UK Government Public Health Information	Joshua Harris et.al.	2505.06046	null
2025-05-02	Cer-Eval: Certifiable and Cost-Efficient Evaluation Framework for LLMs	Ganghua Wang et.al.	2505.03814	null
2025-05-29	am-ELO: A Stable Framework for Arena-based LLM Evaluation	Zirui Liu et.al.	2505.03475	null
2025-05-05	Developing A Framework to Support Human Evaluation of Bias in Generated Free Response Text	Jennifer Healey et.al.	2505.03053	null
2025-05-01	Unlearning Sensitive Information in Multimodal LLMs: Benchmark and Attack-Defense Evaluation	Vaidehi Patil et.al.	2505.01456	link
2025-04-30	A Report on the llms evaluating the high school questions	Zhu Jiawei et.al.	2505.00057	null
2025-04-30	RDF-Based Structured Quality Assessment Representation of Multilingual LLM Evaluations	Jonas Gwozdz et.al.	2504.21605	null
2025-04-30	Confidence in Large Language Model Evaluation: A Bayesian Approach to Limited-Sample Challenges	Xiao Xiao et.al.	2504.21303	null
2025-04-27	LLM-Evaluation Tropes: Perspectives on the Validity of LLM-Evaluations	Laura Dietz et.al.	2504.19076	null
2025-04-23	Agree to Disagree? A Meta-Evaluation of LLM Misgendering	Arjun Subramonian et.al.	2504.17075	link
2025-04-23	IberBench: LLM Evaluation on Iberian Languages	José Ángel González et.al.	2504.16921	null
2025-04-23	Private Federated Learning using Preference-Optimized Synthetic Data	Charlie Hou et.al.	2504.16438	link
2025-04-29	Virology Capabilities Test (VCT): A Multimodal Virology Q&A Benchmark	Jasper Götting et.al.	2504.16137	null
2025-05-16	DMind Benchmark: Toward a Holistic Assessment of LLM Capabilities across the Web3 Domain	Enhao Huang et.al.	2504.16116	null
2025-04-22	Automated Creativity Evaluation for Large Language Models: A Reference-Based Approach	Ruizhe Li et.al.	2504.15784	null
2025-04-20	Meta-Thinking in LLMs via Multi-Agent Reinforcement Learning: A Survey	Ahsan Bilal et.al.	2504.14520	null
2025-04-20	Information Diffusion and Preferential Attachment in a Network of Large Language Models	Adit Jain et.al.	2504.14438	null
2025-04-18	MEQA: A Meta-Evaluation Framework for Question & Answer LLM Benchmarks	Jaime Raldua Veuthey et.al.	2504.14039	null
2025-04-17	ZeroSumEval: Scaling LLM Evaluation with Inter-Model Competition	Haidar Khan et.al.	2504.12562	link
2025-04-17	ELAB: Extensive LLM Alignment Benchmark in Persian Language	Zahra Pourbahman et.al.	2504.12553	null
2025-04-16	MOS: Towards Effective Smart Contract Vulnerability Detection through Mixture-of-Experts Tuning of Large Language Models	Hang Yuan et.al.	2504.12234	null
2025-04-17	Déjà Vu: Multilingual LLM Evaluation through the Lens of Machine Translation Evaluation	Julia Kreutzer et.al.	2504.11829	null
2025-04-14	HELIOS: Adaptive Model And Early-Exit Selection for Efficient LLM Inference Serving	Avinash Kumar et.al.	2504.10724	null
2025-05-19	Large Language Models Could Be Rote Learners	Yuyang Xu et.al.	2504.08300	null
2025-05-30	DeepSeek-R1 vs. o3-mini: How Well can Reasoning LLMs Evaluate MT and Summarization?	Daniil Larionov et.al.	2504.08120	null
2025-05-15	Model Utility Law: Evaluating LLMs beyond Performance through Mechanism Interpretable Metric	Yixin Cao et.al.	2504.07440	link
2025-06-20	TALE: A Tool-Augmented Framework for Reference-Free Evaluation of Large Language Models	Sher Badshah et.al.	2504.07385	null
2025-04-08	NativQA Framework: Enabling LLMs with Native, Local, and Everyday Knowledge	Firoj Alam et.al.	2504.05995	null
2025-04-09	How Accurately Do Large Language Models Understand Code?	Sabaat Haroon et.al.	2504.04372	null
2025-04-04	Do LLM Evaluators Prefer Themselves for a Reason?	Wei-Lin Chen et.al.	2504.03846	link
2025-04-15	Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning	Kai Ye et.al.	2504.03784	null
2025-04-04	Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency	Erik Johannes Husom et.al.	2504.03360	null
2025-04-02	YourBench: Easy Custom Evaluation Sets for Everyone	Sumuk Shashidhar et.al.	2504.01833	link
2025-04-08	Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems?	Kai Yan et.al.	2504.00509	null
2025-04-01	HRET: A Self-Evolving LLM Evaluation Toolkit for Korean	Hanwool Lee et.al.	2503.22968	null
2025-03-27	CLAIMCHECK: How Grounded are LLM Critiques of Scientific Papers?	Jiefu Ou et.al.	2503.21717	link
2025-03-27	Evaluating book summaries from internal knowledge in Large Language Models: a cross-model and semantic consistency approach	Javier Coronado-Blázquez et.al.	2503.21613	null
2025-05-19	Challenging the Boundaries of Reasoning: An Olympiad-Level Math Benchmark for Large Language Models	Haoxiang Sun et.al.	2503.21380	link
2025-03-25	FLEX: A Benchmark for Evaluating Robustness of Fairness in Large Language Models	Dahyun Jung et.al.	2503.19540	link
2025-05-30	LLM Benchmarking with LLaMA2: Evaluating Code Development Performance Across Multiple Programming Languages	Patrick Diehl et.al.	2503.19217	null
2025-03-28	Overtrained Language Models Are Harder to Fine-Tune	Jacob Mitchell Springer et.al.	2503.19206	null
2025-03-25	Decorum: A Language-Based Approach For Style-Conditioned Synthesis of Indoor 3D Scenes	Kelly O. Marshall et.al.	2503.18155	null
2025-05-14	Evaluating Clinical Competencies of Large Language Models with a General Practice Benchmark	Zheqing Li et.al.	2503.17599	null
2025-03-20	The Emperor's New Clothes in Benchmarking? A Rigorous Examination of Mitigation Strategies for LLM Benchmark Data Contamination	Yifan Sun et.al.	2503.16402	link
2025-03-20	Fùxì: A Benchmark for Evaluating Language Models on Ancient Chinese Text Understanding and Generation	Shangqing Zhao et.al.	2503.15837	link
2025-06-08	Right Answer, Wrong Score: Uncovering the Inconsistencies of LLM Evaluation in Multiple-Choice Question Answering	Francesco Maria Molfese et.al.	2503.14996	null
2025-03-13	It is Too Many Options: Pitfalls of Multiple-Choice Questions in Generative AI and Medical Education	Shrutika Singh et.al.	2503.13508	null
2025-03-17	REPA: Russian Error Types Annotation for Evaluating Text Generation and Judgment Capabilities	Alexander Pugachev et.al.	2503.13102	null
2025-03-14	V-STaR: Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning	Zixu Cheng et.al.	2503.11495	null
2025-06-03	OASST-ETC Dataset: Alignment Signals from Eye-tracking Analysis of LLM Responses	Angela Lopez-Cardona et.al.	2503.10927	link
2025-03-13	Chat-TS: Enhancing Multi-Modal Reasoning Over Time-Series and Natural Language Data	Paul Quinlan et.al.	2503.10883	null
2025-03-13	Commenting Higher-level Code Unit: Full Code, Reduced Code, or Hierarchical Code Summarization	Weisong Sun et.al.	2503.10737	null
2025-03-12	Medical Large Language Model Benchmarks Should Prioritize Construct Validity	Ahmed Alaa et.al.	2503.10694	null
2025-04-17	ZeroSumEval: An Extensible Framework For Scaling LLM Evaluation with Inter-Model Competition	Hisham A. Alyahya et.al.	2503.10673	link
2025-05-20	RouterEval: A Comprehensive Benchmark for Routing LLMs to Explore Model-level Scaling Up in LLMs	Zhongzhan Huang et.al.	2503.10657	link
2025-05-26	MMLU-ProX: A Multilingual Benchmark for Advanced Large Language Model Evaluation	Weihao Xuan et.al.	2503.10497	null
2025-03-12	Safer or Luckier? LLMs as Safety Evaluators Are Not Robust to Artifacts	Hongyu Chen et.al.	2503.09347	null
2025-03-08	SmartBench: Is Your LLM Truly a Good Chinese Smartphone Assistant?	Xudong Lu et.al.	2503.06029	null
2025-03-07	SINdex: Semantic INconsistency Index for Hallucination Detection in LLMs	Samir Abdaljalil et.al.	2503.05980	null
2025-03-07	RocketEval: Efficient Automated LLM Evaluation via Grading Checklist	Tianjun Wei et.al.	2503.05142	link
2025-02-09	Peeking Behind Closed Doors: Risks of LLM Evaluation by Private Data Curators	Hritik Bansal et.al.	2503.04756	null
2025-03-07	Human Implicit Preference-Based Policy Fine-tuning for Multi-Agent Reinforcement Learning in USV Swarm	Hyeonjun Kim et.al.	2503.03796	null
2025-03-04	SAGE: Steering and Refining Dialog Generation with State-Action Augmentation	Yizhe Zhang et.al.	2503.03040	link
2025-05-28	Position: Don't Use the CLT in LLM Evals With Fewer Than a Few Hundred Datapoints	Sam Bowyer et.al.	2503.01747	null
2025-03-04	DOVE: A Large-Scale Multi-Dimensional Predictions Dataset Towards Meaningful LLM Evaluation	Eliya Habba et.al.	2503.01622	null
2025-03-03	None of the Above, Less of the Right: Parallel Patterns between Humans and LLMs on Multi-Choice Questions Answering	Zhi Rui Tam et.al.	2503.01550	null
2025-03-03	SwiLTra-Bench: The Swiss Legal Translation Benchmark	Joel Niklaus et.al.	2503.01372	null
2025-03-03	LLM-Advisor: An LLM Benchmark for Cost-efficient Path Planning across Multiple Terrains	Ling Xiao et.al.	2503.01236	null
2025-03-02	FunBench: Benchmarking Fundus Reading Skills of MLLMs	Qijie Wei et.al.	2503.00901	null
2025-03-02	Towards Efficient Educational Chatbots: Benchmarking RAG Frameworks	Umar Ali Khan et.al.	2503.00781	null
2025-04-12	Evaluating Personalized Tool-Augmented LLMs from the Perspectives of Personalization and Proactivity	Yupu Hao et.al.	2503.00771	link
2025-03-01	U-NIAH: Unified RAG and LLM Evaluation for Long Context Needle-In-A-Haystack	Yunfan Gao et.al.	2503.00353	link
2025-02-28	Jawaher: A Multidialectal Dataset of Arabic Proverbs for LLM Benchmarking	Samar M. Magdy et.al.	2503.00231	null
2025-02-28	Consistency Evaluation of News Article Summaries Generated by Large (and Small) Language Models	Colleen Gilhuly et.al.	2502.20647	null
2025-05-23	Is Your Paper Being Reviewed by an LLM? Benchmarking AI Text Detection in Peer Review	Sungduk Yu et.al.	2502.19614	null
2025-02-26	Exploring Graph Tasks with Pure LLMs: A Comprehensive Benchmark and Investigation	Yuxiang Wang et.al.	2502.18771	link
2025-02-23	Recent Advances in Large Langauge Model Benchmarks against Data Contamination: From Static to Dynamic Evaluation	Simin Chen et.al.	2502.17521	link
2025-05-23	Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective	Chengyin Xu et.al.	2502.17262	null
2025-02-24	Detecting Benchmark Contamination Through Watermarking	Tom Sander et.al.	2502.17259	null
2025-02-24	Predicting Liquidity-Aware Bond Yields using Causal GANs and Deep Reinforcement Learning with LLM Evaluation	Jaskaran Singh Walia et.al.	2502.17011	null
2025-02-24	AlphaAgent: LLM-Driven Alpha Mining with Regularized Exploration to Counteract Alpha Decay	Ziyi Tang et.al.	2502.16789	link
2025-01-30	Retrieval Augmented Generation Based LLM Evaluation For Protocol State Machine Inference With Chain-of-Thought Reasoning	Youssef Maklad et.al.	2502.15727	null
2025-03-10	Prompt-to-Leaderboard	Evan Frick et.al.	2502.14855	link
2025-03-28	SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines	M-A-P Team et.al.	2502.14739	null
2025-02-20	SEA-HELM: Southeast Asian Holistic Evaluation of Language Models	Yosephine Susanto et.al.	2502.14301	null
2025-02-20	Transfer-Prompting: Enhancing Cross-Task Adaptation in Large Language Models via Dual-Stage Prompts Optimization	Yupeng Chang et.al.	2502.14211	link
2025-02-19	Which of These Best Describes Multiple Choice Evaluation with LLMs? A) Forced B) Flawed C) Fixable D) All of the Above	Nishant Balepur et.al.	2502.14127	null
2025-02-19	STEER-ME: Assessing the Microeconomic Reasoning of Large Language Models	Narun Raman et.al.	2502.13119	null
2025-02-18	HPSS: Heuristic Prompting Strategy Search for LLM Evaluators	Bosi Wen et.al.	2502.13031	null
2025-05-23	None of the Others: a General Technique to Distinguish Reasoning from Memorization in Multiple-Choice LLM Evaluation Benchmarks	Eva Sánchez Salido et.al.	2502.12896	null
2025-04-08	Safe at the Margins: A General Approach to Safety Alignment in Low-Resource English Languages -- A Singlish Case Study	Isaac Lim et.al.	2502.12485	null
2025-02-17	Deviation Ratings: A General, Clone-Invariant Rating Method	Luke Marris et.al.	2502.11645	null
2025-02-21	TituLLMs: A Family of Bangla LLMs with Comprehensive Benchmarking	Shahriar Kabir Nahin et.al.	2502.11187	null
2025-02-15	Rule-Bottleneck Reinforcement Learning: Joint Explanation and Decision Optimization for Resource Allocation with Language Agents	Mauricio Tec et.al.	2502.10732	null
2025-03-02	An Empirical Analysis of Uncertainty in Large Language Model Evaluations	Qiujie Xie et.al.	2502.10709	link
2025-02-25	Accelerating Unbiased LLM Evaluation via Synthetic Feedback	Zhaoyi Zhou et.al.	2502.10563	link
2025-02-14	MathConstruct: Challenging LLM Reasoning with Constructive Proofs	Mislav Balunović et.al.	2502.10197	null
2025-02-13	Enhancing Jailbreak Attacks via Compliance-Refusal-Based Initialization	Amit Levi et.al.	2502.09755	null
2025-02-13	NestQuant: Nested Lattice Quantization for Matrix Products and LLMs	Semyon Savkin et.al.	2502.09720	null
2025-02-12	The Science of Evaluating Foundation Models	Jiayi Yuan et.al.	2502.09670	null
2025-02-13	Copilot Arena: A Platform for Code LLM Evaluation in the Wild	Wayne Chi et.al.	2502.09328	null
2025-02-12	Revisiting 3D LLM Benchmarks: Are We Really Testing 3D Capabilities?	Jiahe Jin et.al.	2502.08503	link
2025-02-11	Forget What You Know about LLMs Evaluations -- LLMs are Like a Chameleon	Nurit Cohen-Inger et.al.	2502.07445	link
2025-02-10	Evaluating the Systematic Reasoning Abilities of Large Language Models through Graph Coloring	Alex Heyman et.al.	2502.07087	link
2025-02-10	Multi-turn Evaluation of Anthropomorphic Behaviours in Large Language Models	Lujain Ibrahim et.al.	2502.07077	null
2025-02-07	LLM-Supported Natural Language to Bash Translation	Finnian Westenfelder et.al.	2502.06858	link
2025-02-15	Self-Supervised Prompt Optimization	Jinyu Xiang et.al.	2502.06855	link
2025-02-10	Resurrecting saturated LLM benchmarks with adversarial encoding	Igor Ivanov et.al.	2502.06738	null
2025-02-10	Automatic Evaluation of Healthcare LLMs Beyond Question-Answering	Anna Arias-Duart et.al.	2502.06666	null
2025-02-10	Unbiased Evaluation of Large Language Models from a Causal Perspective	Meilin Chen et.al.	2502.06655	null
2025-02-10	LessLeak-Bench: A First Investigation of Data Leakage in LLMs Across 83 Software Engineering Benchmarks	Xin Zhou et.al.	2502.06215	null
2025-02-05	Aero-LLM: A Distributed Framework for Secure UAV Communication and Intelligent Decision-Making	Balakrishnan Dharmalingam et.al.	2502.05220	null
2025-02-06	TruthFlow: Truthful LLM Generation via Representation Flow Correction	Hanyu Wang et.al.	2502.04556	null
2025-02-05	How do Humans and Language Models Reason About Creativity? A Comparative Analysis	Antonio Laverghetta Jr. et.al.	2502.03253	null
2025-03-22	On Zero-Initialized Attention: Optimal Prompt and Gating Factor Estimation	Nghiem T. Diep et.al.	2502.03029	null
2025-02-02	LLM-Powered Benchmark Factory: Reliable, Generic, and Efficient	Peiwen Yuan et.al.	2502.01683	link
2025-02-02	HASSLE-free: A unified Framework for Sparse plus Low-Rank Matrix Decomposition for LLMs	Mehdi Makni et.al.	2502.00899	null
2025-02-01	DUET: Optimizing Training Data Mixtures via Feedback from Unseen Evaluation Tasks	Zhiliang Chen et.al.	2502.00270	link
2025-01-30	Overestimation in LLM Evaluation: A Controlled Large-Scale Study on Data Contamination's Impact on Machine Translation	Muhammed Yusuf Kocyigit et.al.	2501.18771	null
2025-01-31	ExeCoder: Empowering Large Language Models with Executability Representation for Code Translation	Minghua He et.al.	2501.18460	null
2025-02-01	LLM Evaluation Based on Aerospace Manufacturing Expertise: Automated Generation and Multi-Model Question Answering	Beiming Liu et.al.	2501.17183	null
2025-03-18	An LLM Benchmark for Addressee Recognition in Multi-modal Multi-party Dialogue	Koji Inoue et.al.	2501.16643	null
2025-01-26	HardML: A Benchmark For Evaluating Data Science And Machine Learning knowledge and reasoning in AI	Tidor-Vlad Pricope et.al.	2501.15627	null
2025-01-23	Question Answering on Patient Medical Records with Private Fine-Tuned LLMs	Sara Kothari et.al.	2501.13687	null
2025-01-10	CodEv: An Automated Grading Framework Leveraging Large Language Models for Consistent and Constructive Feedback	En-Qi Tseng et.al.	2501.10421	null
2025-01-15	Towards Multilingual LLM Evaluation for Baltic and Nordic languages: A study on Lithuanian History	Yevhen Kostiuk et.al.	2501.09154	null
2025-01-13	Benchmarking Abstractive Summarisation: A Dataset of Human-authored Summaries of Norwegian News Articles	Samia Touileb et.al.	2501.07718	null
2025-01-03	FLAME: Financial Large-Language Model Assessment and Metrics Evaluation	Jiayu Guo et.al.	2501.06211	link
2025-01-07	MTRAG: A Multi-Turn Conversational Benchmark for Evaluating Retrieval-Augmented Generation Systems	Yannis Katsis et.al.	2501.03468	link
2025-01-05	Evaluating Large Language Models Against Human Annotators in Latent Content Analysis: Sentiment, Political Leaning, Emotional Intensity, and Sarcasm	Ljubisa Bojic et.al.	2501.02532	null
2025-01-04	LLMzSzŁ: a comprehensive LLM benchmark for Polish	Krzysztof Jassem et.al.	2501.02266	null
2025-03-25	VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM	Yuqian Yuan et.al.	2501.00599	link
2025-01-04	Setting Standards in Turkish NLP: TR-MMLU for Large Language Model Evaluation	M. Ali Bayram et.al.	2501.00593	null
2024-12-31	Echoes in AI: Quantifying Lack of Plot Diversity in LLM Outputs	Weijia Xu et.al.	2501.00273	null
2024-12-30	EVOLVE: Emotion and Visual Output Learning via LLM Evaluation	Jordan Sinclair et.al.	2412.20632	null
2024-12-24	Muse: A Multimodal Conversational Recommendation Dataset with Scenario-Grounded User Profiles	Zihan Wang et.al.	2412.18416	null
2024-12-24	A Statistical Framework for Ranking LLM-Based Chatbots	Siavash Ameli et.al.	2412.18407	link
2025-01-25	DeepCRCEval: Revisiting the Evaluation of Code Review Comment Generation	Junyi Lu et.al.	2412.18291	null
2024-12-23	CARL-GT: Evaluating Causal Reasoning Capabilities of Large Language Models	Ruibo Tu et.al.	2412.17970	link
2025-01-02	Baichuan4-Finance Technical Report	Hanyu Zhang et.al.	2412.15270	null
2024-12-19	ObjVariantEnsemble: Advancing Point Cloud LLM Evaluation in Challenging Scenes with Subtly Distinguished Objects	Qihang Cao et.al.	2412.14837	null
2024-12-18	AntiLeak-Bench: Preventing Data Contamination by Automatically Constructing Benchmarks with Updated Real-World Knowledge	Xiaobao Wu et.al.	2412.13670	link
2025-02-16	Mind Your Theory: Theory of Mind Goes Deeper Than Reasoning	Eitan Wagner et.al.	2412.13631	null
2025-02-17	OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain	Shuting Wang et.al.	2412.13018	link
2024-12-10	How to Choose a Threshold for an Evaluation Metric for Large Language Models	Bhaskarjit Sarmah et.al.	2412.12148	null
2024-12-15	Dual Traits in Probabilistic Reasoning of Large Language Models	Shenxiong Li et.al.	2412.11009	link
2024-12-30	LLM-as-an-Interviewer: Beyond Static Testing Through Dynamic LLM Evaluation	Eunsu Kim et.al.	2412.10424	null
2024-12-13	Cultural Evolution of Cooperation among LLM Agents	Aron Vallinder et.al.	2412.10270	null
2024-12-12	Towards Understanding the Robustness of LLM-based Evaluations under Perturbations	Manav Chaudhary et.al.	2412.09269	null
2024-12-10	BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities	Sahal Shaji Mullappilly et.al.	2412.07769	link
2025-02-28	PediaBench: A Comprehensive Chinese Pediatric Dataset for Benchmarking Large Language Models	Qian Zhang et.al.	2412.06287	link
2024-12-02	AI Benchmarks and Datasets for LLM Evaluation	Todor Ivanov et.al.	2412.01020	null
2024-11-30	Evaluating the Consistency of LLM Evaluators	Noah Lee et.al.	2412.00543	null
2024-11-29	MIMDE: Exploring the Use of Synthetic vs Human Data for Evaluating Multi-Insight Multi-Document Extraction Tasks	John Francis et.al.	2411.19689	null
2024-11-29	Beyond Surface Structure: A Causal Assessment of LLMs' Comprehension Ability	Yujin Han et.al.	2411.19456	link
2024-11-27	Is my Meeting Summary Good? Estimating Quality with a Multi-LLM Evaluator	Frederic Kirstein et.al.	2411.18444	null
2025-01-17	CS-Eval: A Comprehensive Large Language Model Benchmark for CyberSecurity	Zhengmin Yu et.al.	2411.16239	link
2024-11-25	SAGEval: The frontiers of Satisfactory Agent based NLG Evaluation for reference-free open-ended text	Reshmi Ghosh et.al.	2411.16077	null
2024-11-26	Do LLMs Agree on the Creativity Evaluation of Alternative Uses?	Abdullah Al Rabeyah et.al.	2411.15560	null
2025-02-17	Ranking Unraveled: Recipes for LLM Rankings in Head-to-Head AI Combat	Roland Daynauth et.al.	2411.14483	link
2024-11-21	Lost in Inference: Rediscovering the Role of Natural Language Inference for Large Language Models	Lovish Madaan et.al.	2411.14103	null
2024-11-21	An Evaluation-Driven Approach to Designing LLM Agents: Process and Architecture	Boming Xia et.al.	2411.13768	null
2024-11-21	A Framework for Evaluating LLMs Under Task Indeterminacy	Luke Guerdan et.al.	2411.13760	null
2024-11-12	Large Language Models as Neurolinguistic Subjects: Identifying Internal Representations for Form and Meaning	Linyang He et.al.	2411.07533	null
2024-11-13	Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models	Yancheng He et.al.	2411.07140	null
2024-11-09	Golden Touchstone: A Comprehensive Bilingual Benchmark for Evaluating Financial Large Language Models	Xiaojun Wu et.al.	2411.06272	link
2025-02-09	ProverbEval: Exploring LLM Evaluation Challenges for Low-resource Language Understanding	Israel Abebe Azime et.al.	2411.05049	null
2024-11-07	Bayesian Calibration of Win Rate Estimation with LLM Evaluators	Yicheng Gao et.al.	2411.04424	link
2024-11-05	Enhancing LLM Evaluations: The Garbling Trick	William F. Bradley et.al.	2411.01533	null
2025-02-19	Varco Arena: A Tournament Approach to Reference-Free Benchmarking Large Language Models	Seonil Son et.al.	2411.01281	null
2025-02-07	Mastering the Craft of Data Synthesis for CodeLLMs	Meng Chen et.al.	2411.00005	link
2024-10-28	Project MPG: towards a generalized performance benchmark for LLM capabilities	Lucas Spangher et.al.	2410.22368	null
2024-10-29	Self-Preference Bias in LLM-as-a-Judge	Koki Wataoka et.al.	2410.21819	null
2024-10-28	Unveiling Context-Aware Criteria in Self-Assessing LLMs	Taneesh Gupta et.al.	2410.21545	null
2024-10-27	LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization	Jui-Nan Yen et.al.	2410.20625	link
2024-10-26	Limitations of the LLM-as-a-Judge Approach for Evaluating LLM Outputs in Expert Knowledge Tasks	Annalisa Szymanski et.al.	2410.20266	null
2024-10-23	MiLoRA: Efficient Mixture of Low-Rank Adaptation for Large Language Models Fine-tuning	Jingfan Zhang et.al.	2410.18035	null
2025-02-21	Towards Automated Penetration Testing: Introducing LLM Benchmark, Analysis, and Improvements	Isamu Isozaki et.al.	2410.17141	link
2024-10-21	CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution	Maosong Cao et.al.	2410.16256	link
2025-01-26	mHumanEval -- A Multilingual Benchmark to Evaluate Large Language Models for Code Generation	Nishat Raihan et.al.	2410.15037	link
2024-10-19	CAP: Data Contamination Detection via Consistency Amplification	Yi Zhao et.al.	2410.15005	null
2024-10-18	Enabling Scalable Evaluation of Bias Patterns in Medical LLMs	Hamed Fayyaz et.al.	2410.14763	link
2024-11-06	Diverging Preferences: When do Annotators Disagree and do Models Know?	Michael JQ Zhang et.al.	2410.14632	null
2024-10-18	Combining Entropy and Matrix Nuclear Norm for Enhanced Evaluation of Language Models	James Vo et.al.	2410.14480	null
2024-10-21	BenTo: Benchmark Task Reduction with In-Context Transferability	Hongyu Zhao et.al.	2410.13804	link
2024-10-16	BenchmarkCards: Large Language Model and Risk Reporting	Anna Sokol et.al.	2410.12974	null
2025-02-01	Language Model Preference Evaluation with Multiple Weak Evaluators	Zhengyu Hu et.al.	2410.12869	link
2024-10-11	Enterprise Benchmarks for Large Language Model Evaluation	Bing Zhang et.al.	2410.12857	link
2024-10-16	An Automatic and Cost-Efficient Peer-Review Framework for Language Generation Evaluation	Junjie Chen et.al.	2410.12265	null
2024-10-15	Leaving the barn door open for Clever Hans: Simple features predict LLM benchmark answers	Lorenzo Pacchiardi et.al.	2410.11672	link
2024-10-15	Black-box Uncertainty Quantification Method for LLM-as-a-Judge	Nico Wagner et.al.	2410.11594	null
2024-10-14	Jailbreak Instruction-Tuned LLMs via end-of-sentence MLP Re-weighting	Yifan Luo et.al.	2410.10150	null
2024-12-13	HARDMath: A Benchmark Dataset for Challenging Problems in Applied Mathematics	Jingxuan Fan et.al.	2410.09988	link
2024-10-15	LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models	Han Qiu et.al.	2410.09962	link
2024-10-17	Towards Multilingual LLM Evaluation for European Languages	Klaudia Thellmann et.al.	2410.08928	null
2024-10-11	Test-driven Software Experimentation with LASSO: an LLM Benchmarking Example	Marcus Kessel et.al.	2410.08911	null
2024-10-10	Assessing Episodic Memory in LLMs with Sequence Order Recall Tasks	Mathis Pink et.al.	2410.08133	null
2025-02-03	COMPL-AI Framework: A Technical Interpretation and LLM Benchmarking Suite for the EU Artificial Intelligence Act	Philipp Guldimann et.al.	2410.07959	link
2024-11-06	News Reporter: A Multi-lingual LLM Framework for Broadcast T.V News	Tarun Jain et.al.	2410.07520	null
2024-10-09	Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates	Xiaosen Zheng et.al.	2410.07137	link
2024-10-09	ReIFE: Re-evaluating Instruction-Following Evaluation	Yixin Liu et.al.	2410.07069	link
2024-10-08	Active Evaluation Acquisition for Efficient LLM Benchmarking	Yang Li et.al.	2410.05952	null
2024-10-07	TurtleBench: Evaluating Top Language Models via Real-World Yes/No Puzzles	Qingchen Yu et.al.	2410.05262	link
2024-10-01	Language Enhanced Model for Eye (LEME): An Open-Source Ophthalmology-Specific Large Language Model	Aidan Gilson et.al.	2410.03740	null
2024-10-04	TICKing All the Boxes: Generated Checklists Improve LLM Evaluation and Generation	Jonathan Cook et.al.	2410.03608	null
2024-10-04	Towards Reproducible LLM Evaluation: Quantifying Uncertainty in LLM Benchmark Scores	Robert E. Blackwell et.al.	2410.03492	null
2024-10-29	AIME: AI System Optimization via Multiple LLM Evaluators	Bhrij Patel et.al.	2410.03131	null
2024-10-02	Comparing Criteria Development Across Domain Experts, Lay Users, and Models in Large Language Model Evaluation	Annalisa Szymanski et.al.	2410.02054	null
2024-10-02	Knowledge-Driven Feature Selection and Engineering for Genotype Data with Large Language Models	Joseph Lee et.al.	2410.01795	link
2024-10-03	Extending Context Window of Large Language Models from a Distributional Perspective	Yingsheng Wu et.al.	2410.01490	link
2024-10-02	ConServe: Harvesting GPUs for Low-Latency and High-Throughput Large Language Model Serving	Yifan Qiao et.al.	2410.01228	null
2024-10-01	ViDAS: Vision-based Danger Assessment and Scoring	Pranav Gupta et.al.	2410.00477	null
2024-10-01	PclGPT: A Large Language Model for Patronizing and Condescending Language Detection	Hongbo Wang et.al.	2410.00361	link
2024-11-26	LexEval: A Comprehensive Chinese Legal Benchmark for Evaluating Large Language Models	Haitao Li et.al.	2409.20288	link
2024-09-29	Does RAG Introduce Unfairness in LLMs? Evaluating Fairness in Retrieval-Augmented Generation Systems	Xuyang Wu et.al.	2409.19804	link
2024-10-19	Can Large Language Models Analyze Graphs like Professionals? A Benchmark, Datasets and Models	Xin Li et.al.	2409.19667	link
2024-10-05	IDGen: Item Discrimination Induced Prompt Generation for LLM Evaluation	Fan Lin et.al.	2409.18892	link
2024-12-13	A Character-Centric Creative Story Generation via Imagination	Kyeongman Park et.al.	2409.16667	null
2024-09-25	Judgment of Thoughts: Courtroom of the Binary Logical Reasoning in Large Language Models	Sungjune Park et.al.	2409.16635	null
2024-12-18	Kalahi: A handcrafted, grassroots cultural LLM evaluation suite for Filipino	Jann Railey Montalan et.al.	2409.15380	link
2024-12-16	MQM-APE: Toward High-Quality Error Annotation Predictors with Automatic Post-Editing in LLM Translation Evaluators	Qingyu Lu et.al.	2409.14335	link
2024-09-21	ChemEval: A Comprehensive Multi-Level Chemical Evaluation for Large Language Models	Yuqing Huang et.al.	2409.13989	link
2024-12-17	AraDiCE: Benchmarks for Dialectal and Cultural Capabilities in LLMs	Basel Mousi et.al.	2409.11404	null
2024-10-02	LLM-as-a-Judge & Reward Model: What They Can and Cannot Do	Guijin Son et.al.	2409.11239	null
2024-12-08	Towards Data Contamination Detection for Modern Large Language Models: Limitations, Inconsistencies, and Oracle Challenges	Vinay Samuel et.al.	2409.09927	link
2024-09-13	Cracking the Code: Multi-domain LLM Evaluation on Real-World Professional Exams in Indonesia	Fajri Koto et.al.	2409.08564	null
2024-09-09	Assessing SPARQL capabilities of Large Language Models	Lars-Peter Meyer et.al.	2409.05925	link
2024-10-08	LongGenBench: Benchmarking Long-Form Generation in Long Context LLMs	Yuhao Wu et.al.	2409.02076	link
2024-10-14	Polyrating: A Cost-Effective and Bias-Aware Rating System for LLM Evaluation	Jasper Dekoninck et.al.	2409.00696	null
2024-08-26	Evaluating ChatGPT on Nuclear Domain-Specific Data	Muhammad Anwar et.al.	2409.00090	null
2024-08-28	LLMSecCode: Evaluating Large Language Models for Secure Coding	Anton Rydén et.al.	2408.16100	link
2024-08-26	LLM-3D Print: Large Language Models To Monitor and Control 3D Printing	Yayati Jadhav et.al.	2408.14307	null
2024-08-26	Epidemic Information Extraction for Event-Based Surveillance using Large Language Models	Sergio Consoli et.al.	2408.14277	null
2024-10-04	MobileQuant: Mobile-friendly Quantization for On-device Language Models	Fuwen Tan et.al.	2408.13933	link
2024-08-23	LalaEval: A Holistic Human Evaluation Framework for Domain-Specific Large Language Models	Chongyan Sun et.al.	2408.13338	null
2024-08-23	Open Llama2 Model for the Lithuanian Language	Artūras Nakvosas et.al.	2408.12963	null
2024-08-23	LIMP: Large Language Model Enhanced Intent-aware Mobility Prediction	Songwei Li et.al.	2408.12832	link
2024-12-20	Recording for Eyes, Not Echoing to Ears: Contextualized Spoken-to-Written Conversion of ASR Transcripts	Jiaqing Liu et.al.	2408.09688	null
2024-08-20	Constructing Domain-Specific Evaluation Sets for LLM-as-a-judge	Ravi Raju et.al.	2408.08808	null
2024-10-16	The Fellowship of the LLMs: Multi-Agent Workflows for Synthetic Preference Optimization Dataset Generation	Samee Arif et.al.	2408.08688	link
2024-10-19	Persona is a Double-edged Sword: Mitigating the Negative Impact of Role-playing Prompts in Zero-shot Reasoning Tasks	Junseok Kim et.al.	2408.08631	null

LLM MLLM

Publish Date	Title	Authors	PDF	Code
2025-07-23	Yume: An Interactive World Generation Model	Xiaofeng Mao et.al.	2507.17744	null
2025-07-23	Flow Matching Meets Biology and Life Science: A Survey	Zihao Li et.al.	2507.17731	null
2025-07-23	BetterCheck: Towards Safeguarding VLMs for Automotive Perception Systems	Malsha Ashani Mahawatta Dona et.al.	2507.17722	null
2025-07-23	AI Telephone Surveying: Automating Quantitative Data Collection with an AI Interviewer	Danny D. Leybzon et.al.	2507.17718	null
2025-07-23	HydraOpt: Navigating the Efficiency-Performance Trade-off of Adapter Merging	Taha Ceritli et.al.	2507.17706	null
2025-07-23	Towards Greater Leverage: Scaling Laws for Efficient Mixture-of-Experts Language Models	Changxin Tian et.al.	2507.17702	null
2025-07-23	Thinking Isn't an Illusion: Overcoming the Limitations of Reasoning Models via Tool Augmentations	Zhao Song et.al.	2507.17699	null
2025-07-23	Symbiotic Agents: A Novel Paradigm for Trustworthy AGI-driven Networks	Ilias Chatzistefanidis et.al.	2507.17695	null
2025-07-23	Simulating multiple human perspectives in socio-ecological systems using large language models	Yongchao Zeng et.al.	2507.17680	null
2025-07-23	See the Forest and the Trees: A Synergistic Reasoning Framework for Knowledge-Based Visual Question Answering	Junjie Wang et.al.	2507.17659	null
2025-07-23	CNS-Bench: Benchmarking Image Classifier Robustness Under Continuous Nuisance Shifts	Olaf Dünkel et.al.	2507.17651	null
2025-07-23	Who Attacks, and Why? Using LLMs to Identify Negative Campaigning in 18M Tweets across 19 Countries	Victor Hartman et.al.	2507.17636	null
2025-07-23	A Hybrid Early-Exit Algorithm for Large Language Models Based on Space Alignment Decoding (SPADE)	Bowen Zheng et.al.	2507.17618	null
2025-07-23	CodeReasoner: Enhancing the Code Reasoning Ability with Reinforcement Learning	Lingxiao Tang et.al.	2507.17548	null
2025-07-23	Anticipate, Simulate, Reason (ASR): A Comprehensive Generative AI Framework for Combating Messaging Scams	Xue Wen Tan et.al.	2507.17543	null
2025-07-23	AssertFlip: Reproducing Bugs via Inversion of LLM-Generated Passing Tests	Lara Khatib et.al.	2507.17542	null
2025-07-23	Constructing Ophthalmic MLLM for Positioning-diagnosis Collaboration Through Clinical Cognitive Chain Reasoning	Xinyao Liu et.al.	2507.17539	null
2025-07-23	Enabling Cyber Security Education through Digital Twins and Generative AI	Vita Santa Barletta et.al.	2507.17518	null
2025-07-23	URPO: A Unified Reward & Policy Optimization Framework for Large Language Models	Songshuo Lu et.al.	2507.17515	null
2025-07-23	HOTA: Hamiltonian framework for Optimal Transport Advection	Nazar Buzun et.al.	2507.17513	null
2025-07-23	Unsupervised anomaly detection using Bayesian flow networks: application to brain FDG PET in the context of Alzheimer's disease	Hugues Roy et.al.	2507.17486	null
2025-07-23	An Uncertainty-Driven Adaptive Self-Alignment Framework for Large Language Models	Haoran Sun et.al.	2507.17477	null
2025-07-23	MultiNRC: A Challenging and Native Multilingual Reasoning Evaluation Benchmark for LLMs	Alexander R. Fabbri et.al.	2507.17476	null
2025-07-23	BGM-HAN: A Hierarchical Attention Network for Accurate and Fair Decision Assessment on Semi-Structured Profiles	Junhua Liu et.al.	2507.17472	null
2025-07-23	ERMV: Editing 4D Robotic Multi-view images to enhance embodied agents	Chang Nie et.al.	2507.17462	null
2025-07-23	Reasoning-Driven Retrosynthesis Prediction with Large Language Models via Reinforcement Learning	Situo Zhang et.al.	2507.17448	null
2025-07-23	Each to Their Own: Exploring the Optimal Embedding in RAG	Shiting Chen et.al.	2507.17442	null
2025-07-23	A Comprehensive Evaluation on Quantization Techniques for Large Language Models	Yutong Liu et.al.	2507.17417	null
2025-07-23	HiProbe-VAD: Video Anomaly Detection via Hidden States Probing in Tuning-Free Multimodal LLMs	Zhaolin Cai et.al.	2507.17394	null
2025-07-23	Investigating Training Data Detection in AI Coders	Tianlin Li et.al.	2507.17389	null
2025-07-23	Confidence Calibration in Vision-Language-Action Models	Thomas P Zollo et.al.	2507.17383	null
2025-07-23	Language-Conditioned Open-Vocabulary Mobile Manipulation with Pretrained Models	Shen Tan et.al.	2507.17379	null
2025-07-23	DynaSearcher: Dynamic Knowledge Graph Augmented Search Agent via Multi-Reward Reinforcement Learning	Chuzhan Hao et.al.	2507.17365	null
2025-07-23	RoadBench: A Vision-Language Foundation Model and Benchmark for Road Damage Understanding	Xi Xiao et.al.	2507.17353	null
2025-07-23	CartoonAlive: Towards Expressive Live2D Modeling from Single Portraits	Chao He et.al.	2507.17327	null
2025-07-23	Application of Whisper in Clinical Practice: the Post-Stroke Speech Assessment during a Naming Task	Milena Davudova et.al.	2507.17326	null
2025-07-23	R-Stitch: Dynamic Trajectory Stitching for Efficient Reasoning	Zhuokun Chen et.al.	2507.17307	null
2025-07-23	A Versatile Pathology Co-pilot via Reasoning Enhanced Multimodal Large Language Model	Zhe Xu et.al.	2507.17303	null
2025-07-23	Exploring the Potential of LLMs for Serendipity Evaluation in Recommender Systems	Li Kang et.al.	2507.17290	null
2025-07-23	Triple X: A LLM-Based Multilingual Speech Recognition System for the INTERSPEECH2025 MLC-SLM Challenge	Miaomiao Gao et.al.	2507.17288	null
2025-07-23	Fully Automated SAM for Single-source Domain Generalization in Medical Image Segmentation	Huanli Zhuo et.al.	2507.17281	null
2025-07-23	Leveraging Knowledge Graphs and LLM Reasoning to Identify Operational Bottlenecks for Warehouse Planning Assistance	Rishi Parekh et.al.	2507.17273	null
2025-07-23	Seed&Steer: Guiding Large Language Models with Compilable Prefix and Branch Signals for Unit Test Generation	Shuaiyu Zhou et.al.	2507.17271	null
2025-07-23	Understanding Prompt Programming Tasks and Questions	Jenny T. Liang et.al.	2507.17264	null
2025-07-23	Tab-MIA: A Benchmark Dataset for Membership Inference Attacks on Tabular Data in LLMs	Eyal German et.al.	2507.17259	null
2025-07-23	Agent Identity Evals: Measuring Agentic Identity	Elija Perrier et.al.	2507.17257	null
2025-07-23	Rethinking VAE: From Continuous to Discrete Representations Without Probabilistic Assumptions	Songxuan Shi et.al.	2507.17255	null
2025-07-23	R4ec: A Reasoning, Reflection, and Refinement Framework for Recommendation Systems	Hao Gu et.al.	2507.17249	null
2025-07-23	Perceptual Classifiers: Detecting Generative Images using Perceptual Features	Krishna Srikar Durbha et.al.	2507.17240	null
2025-07-23	MaskedCLIP: Bridging the Masked and CLIP Space for Semi-Supervised Medical Vision-Language Pre-training	Lei Zhu et.al.	2507.17239	null
2025-07-23	A Highly Clean Recipe Dataset with Ingredient States Annotation for State Probing Task	Mashiro Toyooka et.al.	2507.17232	null
2025-07-23	PIG-Nav: Key Insights for Pretrained Image Goal Navigation Models	Jiansong Wan et.al.	2507.17220	null
2025-07-23	The Pluralistic Moral Gap: Understanding Judgment and Value Differences between Humans and Large Language Models	Giuseppe Russo et.al.	2507.17216	null
2025-07-23	EFS: Evolutionary Factor Searching for Sparse Portfolio Optimization Using Large Language Models	Haochen Luo et.al.	2507.17211	null
2025-07-23	HypoChainer: A Collaborative System Combining LLMs and Knowledge Graphs for Hypothesis-Driven Scientific Discovery	Haoran Jiang et.al.	2507.17209	null
2025-07-23	Filter-And-Refine: A MLLM Based Cascade System for Industrial-Scale Video Content Moderation	Zixuan Wang et.al.	2507.17204	null
2025-07-23	DesignLab: Designing Slides Through Iterative Detection and Correction	Jooyeol Yun et.al.	2507.17202	null
2025-07-23	Vec2Face+ for Face Dataset Generation	Haiyu Wu et.al.	2507.17192	null
2025-07-23	LLM Meets the Sky: Heuristic Multi-Agent Reinforcement Learning for Secure Heterogeneous UAV Networks	Lijie Zheng et.al.	2507.17188	null
2025-07-23	SKA-Bench: A Fine-Grained Benchmark for Evaluating Structured Knowledge Understanding of LLMs	Zhiqiang Liu et.al.	2507.17178	null
2025-07-23	Improving LLMs' Generalized Reasoning Abilities by Graph Problems	Qifan Zhang et.al.	2507.17168	null
2025-07-23	Can LLMs Write CI? A Study on Automatic Generation of GitHub Actions Configurations	Taher A. Ghaleb et.al.	2507.17165	null
2025-07-23	DOOMGAN: High-Fidelity Dynamic Identity Obfuscation Ocular Generative Morphing	Bharath Krishnamurthy et.al.	2507.17158	null
2025-07-23	UNICE: Training A Universal Image Contrast Enhancer	Ruodai Cui et.al.	2507.17157	null
2025-07-23	CogDual: Enhancing Dual Cognition of LLMs via Reinforcement Learning with Implicit Rule-Based Rewards	Cheng Liu et.al.	2507.17147	null
2025-07-23	SADA: Stability-guided Adaptive Diffusion Acceleration	Ting Jiang et.al.	2507.17135	null
2025-07-23	Resilient Multi-Agent Negotiation for Medical Supply Chains: Integrating LLMs and Blockchain for Transparent Coordination	Mariam ALMutairi et.al.	2507.17134	null
2025-07-23	BrownoutServe: SLO-Aware Inference Serving under Bursty Workloads for MoE-based LLMs	Jianmin Hu et.al.	2507.17133	null
2025-07-23	Enabling Self-Improving Agents to Learn at Test Time With Human-In-The-Loop Guidance	Yufei He et.al.	2507.17131	null
2025-07-23	BucketServe: Bucket-Based Dynamic Batching for Smart and Efficient LLM Inference Serving	Wanyi Zheng et.al.	2507.17120	null
2025-07-23	HySafe-AI: Hybrid Safety Architectural Analysis Framework for AI Systems: A Case Study	Mandar Pitale et.al.	2507.17118	null
2025-07-23	Probabilistic Graphical Models: A Concise Tutorial	Jacqueline Maasch et.al.	2507.17116	null
2025-07-23	Enhancing Transferability and Consistency in Cross-Domain Recommendations via Supervised Disentanglement	Yuhan Wang et.al.	2507.17112	null
2025-07-23	Reinforcement Learning Fine-Tunes a Sparse Subnetwork in Large Language Models	Andrii Balashov et.al.	2507.17107	null
2025-07-22	Risk In Context: Benchmarking Privacy Leakage of Foundation Models in Synthetic Tabular Data Generation	Jessup Byun et.al.	2507.17066	null
2025-07-22	Parallelism Meets Adaptiveness: Scalable Documents Understanding in Multi-Agent LLM Systems	Chengxuan Xia et.al.	2507.17061	null
2025-07-22	Toward Scalable Video Narration: A Training-free Approach Using Multimodal Large Language Models	Tz-Ying Wu et.al.	2507.17050	null
2025-07-22	Controllable Hybrid Captioner for Improved Long-form Video Understanding	Kuleen Sasse et.al.	2507.17047	null
2025-07-22	Write, Rank, or Rate: Comparing Methods for Studying Visualization Affordances	Chase Stokes et.al.	2507.17024	null
2025-07-22	Causal Graph Fuzzy LLMs: A First Introduction and Applications in Time Series Forecasting	Omid Orang et.al.	2507.17016	null
2025-07-22	Can External Validation Tools Improve Annotation Quality for LLM-as-a-Judge?	Arduin Findeis et.al.	2507.17015	null
2025-07-22	Multi-Label Classification with Generative AI Models in Healthcare: A Case Study of Suicidality and Risk Factors	Ming Huang et.al.	2507.17009	null
2025-07-22	Bringing Balance to Hand Shape Classification: Mitigating Data Imbalance Through Generative Models	Gaston Gustavo Rios et.al.	2507.17008	null
2025-07-22	PyG 2.0: Scalable Learning on Real World Graphs	Matthias Fey et.al.	2507.16991	null
2025-07-22	Obscured but Not Erased: Evaluating Nationality Bias in LLMs via Name-Based Bias Benchmarks	Giulio Pelosio et.al.	2507.16989	null
2025-07-22	Leveraging Synthetic Data for Question Answering with Multilingual LLMs in the Agricultural Domain	Rishemjit Kaur et.al.	2507.16974	null
2025-07-22	LLM4MEA: Data-free Model Extraction Attacks on Sequential Recommenders via Large Language Models	Shilong Zhao et.al.	2507.16969	null
2025-07-22	Harnessing RLHF for Robust Unanswerability Recognition and Trustworthy Response Generation in LLMs	Shuyuan Lin et.al.	2507.16951	null
2025-07-22	AI-based Clinical Decision Support for Primary Care: A Real-World Study	Robert Korom et.al.	2507.16947	null
2025-07-22	AURA: A Multi-Modal Medical Agent for Understanding, Reasoning & Annotation	Nima Fathi et.al.	2507.16940	null
2025-07-22	SiLQ: Simple Large Language Model Quantization-Aware Training	Steven K. Esser et.al.	2507.16933	null
2025-07-22	Stellar Mass-Dispersion Measure Correlations Constrain Baryonic Feedback in Fast Radio Burst Host Galaxies	Calvin Leung et.al.	2507.16816	null
2025-07-22	LingBench++: A Linguistically-Informed Benchmark and Reasoning Framework for Multi-Step and Cross-Cultural Inference with LLMs	Da-Chen Lian et.al.	2507.16809	null
2025-07-22	Rethinking LLM-Based RTL Code Optimization Via Timing Logic Metamorphosis	Zhihao Xu et.al.	2507.16808	null
2025-07-23	Agentar-Fin-R1: Enhancing Financial Intelligence through Domain Expertise, Training Efficiency, and Advanced Reasoning	Yanjun Zheng et.al.	2507.16802	null
2025-07-23	Test-Time-Matching: Decouple Personality, Memory, and Linguistic Style in LLM-based Role-Playing Language Agent	Xiaoyu Zhan et.al.	2507.16799	null
2025-07-22	Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning	Helena Casademunt et.al.	2507.16795	null
2025-07-22	ChatChecker: A Framework for Dialogue System Testing and Evaluation Through Non-cooperative User Simulation	Roman Mayr et.al.	2507.16792	null
2025-07-22	Enhancing Domain Diversity in Synthetic Data Face Recognition with Dataset Fusion	Anjith George et.al.	2507.16790	null
2025-07-22	Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning	Hongyin Luo et.al.	2507.16784	null
2025-07-22	Cooling Matters: Benchmarking Large Language Models and Vision-Language Models on Liquid-Cooled Versus Air-Cooled H100 GPU Systems	Imran Latif et.al.	2507.16781	null
2025-07-22	When LLMs Copy to Think: Uncovering Copy-Guided Attacks in Reasoning LLMs	Yue Li et.al.	2507.16773	null
2025-07-22	WGRAMMAR: Leverage Prior Knowledge to Accelerate Structured Decoding	Ran Wang et.al.	2507.16768	null
2025-07-22	Never Come Up Empty: Adaptive HyDE Retrieval for Improving LLM Developer Support	Fangjian Lei et.al.	2507.16754	null
2025-07-22	CMP: A Composable Meta Prompt for SAM-Based Cross-Domain Few-Shot Segmentation	Shuai Chen et.al.	2507.16753	null
2025-07-22	Collaborative Inference and Learning between Edge SLMs and Cloud LLMs: A Survey of Algorithms, Execution, and Open Challenges	Senyao Li et.al.	2507.16731	null
2025-07-22	Deliberative Searcher: Improving LLM Reliability via Reinforcement Learning with constraints	Zhenyun Yin et.al.	2507.16727	null
2025-07-22	Enhancing Remote Sensing Vision-Language Models Through MLLM and LLM-Based High-Quality Image-Text Dataset Generation	Yiguo He et.al.	2507.16716	null
2025-07-22	Advancing Risk and Quality Assurance: A RAG Chatbot for Improved Regulatory Compliance	Lars Hillebrand et.al.	2507.16711	null
2025-07-22	Biases in LLM-Generated Musical Taste Profiles for Recommendation	Bruno Sguerra et.al.	2507.16708	null
2025-07-22	FISHER: A Foundation Model for Multi-Modal Industrial Signal Comprehensive Representation	Pingyi Fan et.al.	2507.16696	null
2025-07-22	Generating Search Explanations using Large Language Models	Arif Laksito et.al.	2507.16692	null
2025-07-22	PICACO: Pluralistic In-Context Value Alignment of LLMs via Total Correlation Optimization	Han Jiang et.al.	2507.16679	null
2025-07-22	Custom Algorithm-based Fault Tolerance for Attention Layers in Transformers	Vasileios Titopoulos et.al.	2507.16676	null
2025-07-22	Meta-Learning for Cold-Start Personalization in Prompt-Tuned LLMs	Yushang Zhao et.al.	2507.16672	null
2025-07-22	VulCoCo: A Simple Yet Effective Method for Detecting Vulnerable Code Clones	Tan Bui et.al.	2507.16661	null
2025-07-22	P-CoT: A Pedagogically-motivated Participatory Chain-of-Thought Prompting for Phonological Reasoning in LLMs	Dongjun Jang et.al.	2507.16656	null
2025-07-22	Towards Automated Regulatory Compliance Verification in Financial Auditing with Large Language Models	Armin Berger et.al.	2507.16642	null
2025-07-22	Step-Audio 2 Technical Report	Boyong Wu et.al.	2507.16632	null
2025-07-22	Automatic Fine-grained Segmentation-assisted Report Generation	Frederic Jonske et.al.	2507.16623	null
2025-07-22	On the Effectiveness of LLM-as-a-judge for Code Generation and Summarization	Giuseppe Crupi et.al.	2507.16587	null
2025-07-22	LLMxCPG: Context-Aware Vulnerability Detection Through Code Property Graph-Guided Large Language Models	Ahmed Lekssays et.al.	2507.16585	null
2025-07-22	From Text to Actionable Intelligence: Automating STIX Entity and Relationship Extraction	Ahmed Lekssays et.al.	2507.16576	null
2025-07-22	Pixels to Principles: Probing Intuitive Physics Understanding in Multimodal Language Models	Mohamad Ballout et.al.	2507.16572	null
2025-07-22	TTMBA: Towards Text To Multiple Sources Binaural Audio Generation	Yuxuan He et.al.	2507.16564	null
2025-07-22	Exploring Gender Bias in Large Language Models: An In-depth Dive into the German Language	Kristin Gnadt et.al.	2507.16557	null
2025-07-22	Alternative Loss Function in Evaluation of Transformer Models	Jakub Michańków et.al.	2507.16548	null
2025-07-22	Learning Text Styles: A Study on Transfer, Attribution, and Verification	Zhiqiang Hu et.al.	2507.16530	null
2025-07-22	Spatial 3D-LLM: Exploring Spatial Awareness in 3D Vision-Language Models	Xiaoyan Wang et.al.	2507.16524	null
2025-07-22	C2-Evo: Co-Evolving Multimodal Data and Model for Self-Improving Reasoning	Xiuwei Chen et.al.	2507.16518	null
2025-07-22	The Ever-Evolving Science Exam	Junying Wang et.al.	2507.16514	null
2025-07-22	Agentic RAG with Knowledge Graphs for Complex Multi-Hop Reasoning in Real-World Applications	Jean Lelong et.al.	2507.16507	null
2025-07-22	ICR Probe: Tracking Hidden State Dynamics for Reliable Hallucination Detection in LLMs	Zhenliang Zhang et.al.	2507.16488	null
2025-07-22	ACT: Bridging the Gap in Code Translation through Synthetic Data Generation & Adaptive Training	Shreya Saxena et.al.	2507.16478	null
2025-07-22	Learning Temporal Abstractions via Variational Homomorphisms in Option-Induced Abstract MDPs	Chang Li et.al.	2507.16473	null
2025-07-22	Towards Enforcing Company Policy Adherence in Agentic Workflows	Naama Zwerdling et.al.	2507.16459	null
2025-07-22	An approach to measuring the performance of Automatic Speech Recognition (ASR) models in the context of Large Language Model (LLM) powered applications	Sujith Pulikodan et.al.	2507.16456	null
2025-07-22	VGGT-Long: Chunk it, Loop it, Align it -- Pushing VGGT's Limits on Kilometer-scale Long RGB Sequences	Kai Deng et.al.	2507.16443	null
2025-07-22	Exploring Large Language Models for Analyzing and Improving Method Names in Scientific Code	Gunnar Larsen et.al.	2507.16439	null
2025-07-22	Identifying Pre-training Data in LLMs: A Neuron Activation-Based Detection Framework	Hongyi Tang et.al.	2507.16414	null
2025-07-22	GG-BBQ: German Gender Bias Benchmark for Question Answering	Shalaka Satheesh et.al.	2507.16410	null
2025-07-22	Improving Code LLM Robustness to Prompt Perturbations via Layer-Aware Model Editing	Shuhan Liu et.al.	2507.16407	null
2025-07-22	Sparse-View 3D Reconstruction: Recent Advances and Open Challenges	Tanveer Younis et.al.	2507.16406	null
2025-07-22	LLM-Driven Collaborative Model for Untangling Commits via Explicit and Implicit Dependency Reasoning	Bo Hou et.al.	2507.16395	null
2025-07-22	Are Foundation Models All You Need for Zero-shot Face Presentation Attack Detection?	Lazaro Janier Gonzalez-Sole et.al.	2507.16393	null
2025-07-22	A general model for frictional contacts in colloidal systems	Kay Hofmann et.al.	2507.16388	null
2025-07-22	Application of LLM Guided Reinforcement Learning in Formation Control with Collision Avoidance	Chenhao Yao et.al.	2507.16382	null
2025-07-22	Depth Gives a False Sense of Privacy: LLM Internal States Inversion	Tian Dong et.al.	2507.16372	null
2025-07-22	One Polyp Identifies All: One-Shot Polyp Segmentation with SAM via Cascaded Priors and Iterative Prompt Evolution	Xinyu Mao et.al.	2507.16337	null
2025-07-22	Re:Form -- Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny	Chuanhao Yan et.al.	2507.16331	null
2025-07-22	DREAM: Scalable Red Teaming for Text-to-Image Generative Systems via Distribution Modeling	Boheng Li et.al.	2507.16329	null
2025-07-22	M-SpecGene: Generalized Foundation Model for RGBT Multispectral Vision	Kailai Zhou et.al.	2507.16318	null
2025-07-22	Perovskite-R1: A Domain-Specialized LLM for Intelligent Discovery of Precursor Additives and Experimental Design	Xin-De Wang et.al.	2507.16307	null
2025-07-22	Talking Like a Phisher: LLM-Based Attacks on Voice Phishing Classifiers	Wenhao Li et.al.	2507.16291	null
2025-07-22	Dens3R: A Foundation Model for 3D Geometry Prediction	Xianze Fang et.al.	2507.16290	null
2025-07-22	Time to Split: Exploring Data Splitting Strategies for Offline Evaluation of Sequential Recommenders	Danil Gusak et.al.	2507.16289	null
2025-07-22	Beyond Label Semantics: Language-Guided Action Anatomy for Few-shot Action Recognition	Zefeng Qian et.al.	2507.16287	null
2025-07-22	Reducing GPU Memory Fragmentation via Spatio-Temporal Planning for Efficient Large-Scale Model Training	Zixiao Huang et.al.	2507.16274	null
2025-07-22	Beyond Isolated Dots: Benchmarking Structured Table Construction as Deep Knowledge Extraction	Tianyun Zhong et.al.	2507.16271	null
2025-07-22	iShumei-Chinchunmei at SemEval-2025 Task 4: A balanced forgetting and retention multi-task framework using effective unlearning loss	Yujian Sun et.al.	2507.16263	null
2025-07-22	Edge-case Synthesis for Fisheye Object Detection: A Data-centric Perspective	Seunghyeon Kim et.al.	2507.16254	null
2025-07-22	Efficient RL for optimizing conversation level outcomes with an LLM-based tutor	Hyunji Nam et.al.	2507.16252	null
2025-07-22	eX-NIDS: A Framework for Explainable Network Intrusion Detection Leveraging Large Language Models	Paul R. B. Houssel et.al.	2507.16241	null
2025-07-22	Scale Your Instructions: Enhance the Instruction-Following Fidelity of Unified Image Generation Model by Self-Adaptive Attention Scaling	Chao Zhou et.al.	2507.16240	null
2025-07-22	LLM-Enhanced Reranking for Complementary Product Recommendation	Zekun Xu et.al.	2507.16237	null
2025-07-22	Voice-based AI Agents: Filling the Economic Gaps in Digital Health Delivery	Bo Wen et.al.	2507.16229	null
2025-07-22	Distilled Large Language Model in Confidential Computing Environment for System-on-Chip Design	Dong Ben et.al.	2507.16226	null
2025-07-22	Towards Compute-Optimal Many-Shot In-Context Learning	Shahriar Golchin et.al.	2507.16217	null
2025-07-22	Advancing Visual Large Language Model for Multi-granular Versatile Perception	Wentao Xiang et.al.	2507.16213	null
2025-07-22	LOCOFY Large Design Models -- Design to code conversion solution	Sohaib Muhammad et.al.	2507.16208	null
2025-07-22	A Human-Centered Approach to Identifying Promises, Risks, & Challenges of Text-to-Image Generative AI in Radiology	Katelyn Morrison et.al.	2507.16207	null
2025-07-22	RealBench: Benchmarking Verilog Generation Models with Real-World IP Designs	Pengwei Jin et.al.	2507.16200	null
2025-07-22	WakenLLM: A Fine-Grained Benchmark for Evaluating LLM Reasoning Potential and Reasoning Process Stability	Zipeng Ling et.al.	2507.16199	null
2025-07-22	Do Large Language Models Have a Planning Theory of Mind? Evidence from MindGames: a Multi-Step Persuasion Task	Jared Moore et.al.	2507.16196	null
2025-07-22	Emergent Cognitive Convergence via Implementation: A Structured Loop Reflecting Four Theories of Mind (A Position Paper)	Myung Ho Kim et.al.	2507.16184	null
2025-07-22	LLM Data Selection and Utilization via Dynamic Bi-level Optimization	Yang Yu et.al.	2507.16178	null
2025-07-22	SpiroLLM: Finetuning Pretrained LLMs to Understand Spirogram Time Series with Clinical Validation in COPD Reporting	Shuhao Mei et.al.	2507.16145	null
2025-07-22	Disability Across Cultures: A Human-Centered Audit of Ableism in Western and Indic LLMs	Mahika Phutane et.al.	2507.16130	null
2025-07-22	Benchmarking LLM Privacy Recognition for Social Robot Decision Making	Dakota Sullivan et.al.	2507.16124	null
2025-07-22	PUSA V1.0: Surpassing Wan-I2V with $500 Training Cost by Vectorized Timestep Adaptation	Yaofang Liu et.al.	2507.16116	null
2025-07-21	Expert-Guided LLM Reasoning for Battery Discovery: From AI-Driven Hypothesis to Synthesis and Characterization	Shengchao Liu et.al.	2507.16110	null
2025-07-21	Efficient Compositional Multi-tasking for On-device Large Language Models	Ondrej Bohdal et.al.	2507.16083	null
2025-07-21	The Prompt Makes the Person(a): A Systematic Evaluation of Sociodemographic Persona Prompting for Large Language Models	Marlene Lutz et.al.	2507.16076	null
2025-07-21	Deep Researcher with Test-Time Diffusion	Rujun Han et.al.	2507.16075	null
2025-07-21	Compositional Coordination for Multi-Robot Teams with Large Language Models	Zhehui Huang et.al.	2507.16068	null
2025-07-21	AI-Powered Commit Explorer (APCE)	Yousab Grees et.al.	2507.16063	null
2025-07-21	AutoMeet: a proof-of-concept study of genAI to automate meetings in automotive engineering	Simon Baeuerle et.al.	2507.16054	null
2025-07-21	Making REST APIs Agent-Ready: From OpenAPI to Model Context Protocol Servers for Tool-Augmented LLMs	Meriem Mastouri et.al.	2507.16044	null
2025-07-21	A Pilot Study on LLM-Based Agentic Translation from Android to iOS: Pitfalls and Insights	Zhili Zeng et.al.	2507.16037	null
2025-07-21	From Logic to Language: A Trust Index for Problem Solving with LLMs	Tehseen Rug et.al.	2507.16028	null
2025-07-21	AI, Expert or Peer? -- Examining the Impact of Perceived Feedback Source on Pre-Service Teachers Feedback Perception and Uptake	Lucas Jasper Jacobsen et.al.	2507.16013	null
2025-07-21	Diffusion Beats Autoregressive in Data-Constrained Settings	Mihir Prabhudesai et.al.	2507.15857	null
2025-07-21	Latent Denoising Makes Good Visual Tokenizers	Jiawei Yang et.al.	2507.15856	null
2025-07-21	Gemini 2.5 Pro Capable of Winning Gold at IMO 2025	Yichen Huang et.al.	2507.15855	null
2025-07-21	The Other Mind: How Language Models Exhibit Human Temporal Cognition	Lingyu Li et.al.	2507.15851	null
2025-07-21	3LM: Bridging Arabic, STEM, and Code through Benchmarking	Basma El Amel Boussaha et.al.	2507.15850	null
2025-07-21	The Impact of Language Mixing on Bilingual LLM Reasoning	Yihao Li et.al.	2507.15849	null
2025-07-21	FASTGEN: Fast and Cost-Effective Synthetic Tabular Data Generation with LLMs	Anh Nguyen et.al.	2507.15839	null
2025-07-21	Just Ask for Music (JAM): Multimodal and Personalized Natural Language Music Recommendation	Alessandro B. Melchiorre et.al.	2507.15826	null
2025-07-21	ACS: An interactive framework for conformal selection	Yu Gui et.al.	2507.15825	null
2025-07-21	Can Your Model Separate Yolks with a Water Bottle? Benchmarking Physical Commonsense Understanding in Video Generation Models	Enes Sanli et.al.	2507.15824	null
2025-07-21	Do AI models help produce verified bug fixes?	Li Huang et.al.	2507.15822	null
2025-07-21	LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra	Seth Karten et.al.	2507.15815	null
2025-07-21	Diffusion models for multivariate subsurface generation and efficient probabilistic inversion	Roberto Miele et.al.	2507.15809	null
2025-07-21	True Multimodal In-Context Learning Needs Attention to the Visual Context	Shuo Chen et.al.	2507.15807	null
2025-07-21	ConformalSAM: Unlocking the Potential of Foundational Segmentation Models in Semi-Supervised Semantic Segmentation with Conformal Prediction	Danhui Chen et.al.	2507.15803	null
2025-07-21	Regularized Low-Rank Adaptation for Few-Shot Organ Segmentation	Ghassen Baklouti et.al.	2507.15793	null
2025-07-21	Small LLMs Do Not Learn a Generalizable Theory of Mind via Reinforcement Learning	Sneheel Sarangi et.al.	2507.15788	null
2025-07-21	Reservoir Computing as a Language Model	Felix Köster et.al.	2507.15779	null
2025-07-21	Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR	Jiakang Wang et.al.	2507.15778	null
2025-07-21	Left Leaning Models: AI Assumptions on Economic Policy	Maxim Chupilkin et.al.	2507.15771	null
2025-07-21	A Framework for Analyzing Abnormal Emergence in Service Ecosystems Through LLM-based Agent Intention Mining	Yifan Shen et.al.	2507.15770	null
2025-07-21	GasAgent: A Multi-Agent Framework for Automated Gas Optimization in Smart Contracts	Jingyi Zheng et.al.	2507.15761	null
2025-07-21	Understanding Large Language Models' Ability on Interdisciplinary Research	Yuanhao Shen et.al.	2507.15736	null
2025-07-21	Gaze-supported Large Language Model Framework for Bi-directional Human-Robot Interaction	Jens V. Rüppel et.al.	2507.15729	null
2025-07-21	TokensGen: Harnessing Condensed Tokens for Long Video Generation	Wenqi Ouyang et.al.	2507.15728	null
2025-07-21	A Practical Investigation of Spatially-Controlled Image Generation with Transformers	Guoxuan Xia et.al.	2507.15724	null
2025-07-21	BEnchmarking LLMs for Ophthalmology (BELO) for Ophthalmological Knowledge and Reasoning	Sahana Srinivasan et.al.	2507.15717	null
2025-07-21	Chinchunmei at SemEval-2025 Task 11: Boosting the Large Language Model's Capability of Emotion Perception using Contrastive Learning	Tian Li et.al.	2507.15714	null
2025-07-21	Is Large Language Model Performance on Reasoning Tasks Impacted by Different Ways Questions Are Asked?	Seok Hwan Song et.al.	2507.15707	null
2025-07-21	Estimating Rate-Distortion Functions Using the Energy-Based Model	Shitong Wu et.al.	2507.15700	null
2025-07-21	CoLD: Counterfactually-Guided Length Debiasing for Process Reward Models	Congmin Zheng et.al.	2507.15698	null
2025-07-21	Surfacing Variations to Calibrate Perceived Reliability of MLLM-generated Image Descriptions	Meng Chen et.al.	2507.15692	null
2025-07-21	P3: Prompts Promote Prompting	Xinyu Zhang et.al.	2507.15675	null
2025-07-21	BugScope: Learn to Find Bugs Like Human	Jinyao Guo et.al.	2507.15671	null
2025-07-21	VeriRAG: A Retrieval-Augmented Framework for Automated RTL Testability Repair	Haomin Qi et.al.	2507.15664	null
2025-07-21	SustainDiffusion: Optimising the Social and Environmental Sustainability of Stable Diffusion Models	Giordano d'Aloisio et.al.	2507.15663	null
2025-07-21	HW-MLVQA: Elucidating Multilingual Handwritten Document Understanding with a Comprehensive VQA Benchmark	Aniket Pal et.al.	2507.15655	null
2025-07-21	Extracting Visual Facts from Intermediate Layers for Mitigating Hallucinations in Multimodal Large Language Models	Haoran Zhou et.al.	2507.15652	null
2025-07-21	Data Mixing Agent: Learning to Re-weight Domains for Continual Pre-training	Kailai Yang et.al.	2507.15640	null
2025-07-21	DHEvo: Data-Algorithm Based Heuristic Evolution for Generalizable MILP Solving	Zhihao Zhang et.al.	2507.15615	null
2025-07-21	Multi-Stage Prompt Inference Attacks on Enterprise LLM Systems	Andrii Balashov et.al.	2507.15613	null
2025-07-21	CylinderPlane: Nested Cylinder Representation for 3D-aware Image Generation	Ru Jia et.al.	2507.15606	null
2025-07-21	Applying the Chinese Wall Reverse Engineering Technique to Large Language Model Code Editing	Manatsawin Hanmongkolchai et.al.	2507.15599	null
2025-07-21	Learning to Extract Rational Evidence via Reinforcement Learning for Retrieval-Augmented Generation	Xinping Zhao et.al.	2507.15586	null
2025-07-21	DynImg: Key Frames with Visual Prompts are Good Representation for Multi-Modal Video Understanding	Xiaoyi Bao et.al.	2507.15569	null
2025-07-21	Evaluating Text Style Transfer: A Nine-Language Benchmark for Text Detoxification	Vitaly Protasov et.al.	2507.15557	null
2025-07-21	Efficient Routing of Inference Requests across LLM Instances in Cloud-Edge Computing	Shibo Yu et.al.	2507.15553	null
2025-07-21	RankMixer: Scaling Up Ranking Models in Industrial Recommenders	Jie Zhu et.al.	2507.15551	null
2025-07-21	PhysGym: Benchmarking LLMs in Interactive Physics Discovery with Controlled Priors	Yimeng Chen et.al.	2507.15550	null
2025-07-21	LLM world models are mental: Output layer evidence of brittle world model use in LLM mechanical reasoning	Cole Robertson et.al.	2507.15521	null
2025-07-21	HAMLET: Hyperadaptive Agent-based Modeling for Live Embodied Theatrics	Sizhou Chen et.al.	2507.15518	null
2025-07-21	Step-level Verifier-guided Hybrid Test-Time Scaling for Large Language Models	Kaiyan Chang et.al.	2507.15512	null
2025-07-21	ASPERA: A Simulated Environment to Evaluate Planning for Complex Action Execution	Alexandru Coca et.al.	2507.15501	null
2025-07-21	PhishIntentionLLM: Uncovering Phishing Website Intentions through Multi-Agent Retrieval-Augmented Generation	Wenhao Li et.al.	2507.15419	null
2025-07-21	PDEformer-2: A Versatile Foundation Model for Two-Dimensional Partial Differential Equations	Zhanhong Ye et.al.	2507.15409	null
2025-07-21	PiMRef: Detecting and Explaining Ever-evolving Spear Phishing Emails with Knowledge Base Invariants	Ruofan Liu et.al.	2507.15393	null
2025-07-21	DAViD: Data-efficient and Accurate Vision Models from Synthetic Data	Fatemeh Saleh et.al.	2507.15365	null
2025-07-21	Revisiting the Effect of Grid-Following Converter on Frequency Dynamics -- Part I: Center of Inertia	Jiahao Liu et.al.	2507.15358	null
2025-07-21	Metaphor and Large Language Models: When Surface Features Matter More than Deep Understanding	Elisa Sanchez-Bayona et.al.	2507.15357	null
2025-07-21	RAD: Retrieval High-quality Demonstrations to Enhance Decision-making	Lu Guo et.al.	2507.15356	null
2025-07-21	Scaling Decentralized Learning with FLock	Zehua Cheng et.al.	2507.15349	null
2025-07-21	Probing Information Distribution in Transformer Architectures through Entropy Analysis	Amedeo Buonanno et.al.	2507.15347	null
2025-07-21	StackTrans: From Large Language Model to Large Pushdown Automata Model	Kechi Zhang et.al.	2507.15343	null
2025-07-21	Reasoning Models are Test Exploiters: Rethinking Multiple-Choice	Narun Raman et.al.	2507.15337	null
2025-07-21	On the Inevitability of Left-Leaning Political Bias in Aligned Language Models	Thilo Hagendorff et.al.	2507.15328	null
2025-07-21	BenchDepth: Are We on the Right Way to Evaluate Depth Foundation Models?	Zhenyu Li et.al.	2507.15321	null
2025-07-21	Butterfly Effects in Toolchains: A Comprehensive Analysis of Failed Parameter Filling in LLM Tool-Agent Systems	Qian Xiong et.al.	2507.15296	null
2025-07-21	A Novel Self-Evolution Framework for Large Language Models	Haoran Sun et.al.	2507.15281	null
2025-07-21	ChiMed 2.0: Advancing Chinese Medical Dataset in Facilitating Large Language Modeling	Yuanhe Tian et.al.	2507.15275	null
2025-07-21	Conditional Video Generation for High-Efficiency Video Compression	Fangqiu Yi et.al.	2507.15269	null
2025-07-21	IM-Chat: A Multi-agent LLM-based Framework for Knowledge Transfer in Injection Molding Industry	Junhyeong Lee et.al.	2507.15268	null
2025-07-21	VLM-UDMC: VLM-Enhanced Unified Decision-Making and Motion Control for Urban Autonomous Driving	Haichao Liu et.al.	2507.15266	null
2025-07-21	CHORDS: Diffusion Sampling Accelerator with Multi-core Hierarchical ODE Solvers	Jiaqi Han et.al.	2507.15260	null
2025-07-21	MEETI: A Multimodal ECG Dataset from MIMIC-IV-ECG with Signals, Images, Features and Interpretations	Deyun Zhang et.al.	2507.15255	null
2025-07-21	Input Reduction Enhanced LLM-based Program Repair	Boyang Yang et.al.	2507.15251	null
2025-07-21	FreeCus: Free Lunch Subject-driven Customization in Diffusion Transformers	Yanbing Zhang et.al.	2507.15249	null
2025-07-21	SPAR: Scholar Paper Retrieval with LLM-based Agents for Enhanced Academic Search	Xiaofeng Shi et.al.	2507.15245	null
2025-07-21	Mammo-SAE: Interpreting Breast Cancer Concept Learning with Sparse Autoencoders	Krishna Kanth Nakka et.al.	2507.15227	null
2025-07-21	Solving Formal Math Problems by Decomposition and Iterative Reflection	Yichi Zhou et.al.	2507.15225	null
2025-07-21	SimdBench: Benchmarking Large Language Models for SIMD-Intrinsic Code Generation	Yibo He et.al.	2507.15224	null
2025-07-21	Hierarchical Part-based Generative Model for Realistic 3D Blood Vessel	Siqi Chen et.al.	2507.15223	null
2025-07-21	Improving Joint Embedding Predictive Architecture with Diffusion Noise	Yuping Qiu et.al.	2507.15216	null
2025-07-21	Collaborative Distillation Strategies for Parameter-Efficient Language Model Deployment	Xiandong Meng et.al.	2507.15198	null
2025-07-21	Better Models and Algorithms for Learning Ising Models from Dynamics	Jason Gaitonde et.al.	2507.15173	null
2025-07-20	What Level of Automation is "Good Enough"? A Benchmark of Large Language Models for Meta-Analysis Data Extraction	Lingbo Li et.al.	2507.15152	null
2025-07-20	Enhancing Visual Planning with Auxiliary Tasks and Multi-token Prediction	Ce Zhang et.al.	2507.15130	null
2025-07-20	AnalogFed: Federated Discovery of Analog Circuit Topologies with Generative AI	Qiufeng Li et.al.	2507.15104	null
2025-07-20	Filling the Gap: Is Commonsense Knowledge Generation useful for Natural Language Inference?	Chathuri Jayaweera et.al.	2507.15100	null
2025-07-20	BleedOrigin: Dynamic Bleeding Source Localization in Endoscopic Submucosal Dissection via Dual-Stage Detection and Tracking	Mengya Xu et.al.	2507.15094	null
2025-07-20	A Penalty Goes a Long Way: Measuring Lexical Diversity in Synthetic Texts Under Prompt-Influenced Length Variations	Vijeta Deshpande et.al.	2507.15092	null
2025-07-20	Aesthetics is Cheap, Show me the Text: An Empirical Evaluation of State-of-the-Art Generative Models for OCR	Peirong Zhang et.al.	2507.15085	null
2025-07-20	Time-RA: Towards Time Series Reasoning for Anomaly with LLM Feedback	Yiyuan Yang et.al.	2507.15066	null
2025-07-20	WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization	Zhengwei Tao et.al.	2507.15061	null
2025-07-20	LibLMFuzz: LLM-Augmented Fuzz Target Generation for Black-box Libraries	Ian Hardgrove et.al.	2507.15058	null
2025-07-20	Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding	Yuanhan Zhang et.al.	2507.15028	null
2025-07-20	Deep Generative Models in Condition and Structural Health Monitoring: Opportunities, Limitations and Future Outlook	Xin Yang et.al.	2507.15026	null
2025-07-20	Survey of GenAI for Automotive Software Development: From Requirements to Executable Code	Nenad Petrovic et.al.	2507.15025	null
2025-07-20	RefCritic: Training Long Chain-of-Thought Critic Models with Refinement Feedback	Qiaoyu Tang et.al.	2507.15024	null
2025-07-20	EduThink4AI: Translating Educational Critical Thinking into Multi-Agent LLM Systems	Xinmeng Hou et.al.	2507.15015	null
2025-07-20	Language Integration in Fine-Tuning Multimodal Large Language Models for Image-Based Regression	Roy H. Jennings et.al.	2507.14997	null
2025-07-18	Franca: Nested Matryoshka Clustering for Scalable Visual Representation Learning	Shashanka Venkataramanan et.al.	2507.14137	null
2025-07-18	NoHumansRequired: Autonomous High-Quality Image Editing Triplet Mining	Maksim Kuprashevich et.al.	2507.14119	null
2025-07-18	CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning	Xiaoya Li et.al.	2507.14111	null
2025-07-18	Automated Interpretation of Non-Destructive Evaluation Contour Maps Using Large Language Models for Bridge Condition Assessment	Viraj Nishesh Darji et.al.	2507.14107	null
2025-07-18	Generative AI-Driven High-Fidelity Human Motion Simulation	Hari Iyer et.al.	2507.14097	null
2025-07-18	Lessons from the TREC Plain Language Adaptation of Biomedical Abstracts (PLABA) track	Brian Ondov et.al.	2507.14096	null
2025-07-18	DPMT: Dual Process Multi-scale Theory of Mind Framework for Real-time Human-AI Collaboration	Xiyun Li et.al.	2507.14088	null
2025-07-18	DENSE: Longitudinal Progress Note Generation with Temporal Modeling of Heterogeneous Clinical Notes Across Hospital Visits	Garapati Keerthana et.al.	2507.14079	null
2025-07-18	Foundation Models as Class-Incremental Learners for Dermatological Image Classification	Mohamed Elkhayat et.al.	2507.14050	null
2025-07-18	Evaluating the Effectiveness of Cost-Efficient Large Language Models in Benchmark Biomedical Tasks	Israt Jahan et.al.	2507.14045	null
2025-07-18	TGIF: Talker Group-Informed Familiarization of Target Speaker Extraction	Tsun-An Hsieh et.al.	2507.14044	null
2025-07-18	Architecting Human-AI Cocreation for Technical Services -- Interaction Modes and Contingency Factors	Jochen Wulf et.al.	2507.14034	null
2025-07-18	KROMA: Ontology Matching with Knowledge Retrieval and Large Language Models	Lam Nguyen et.al.	2507.14032	null
2025-07-18	Moodifier: MLLM-Enhanced Emotion-Driven Image Editing	Jiarong Ye et.al.	2507.14024	null
2025-07-18	Efficient Temporal Tokenization for Mobility Prediction with Large Language Models	Haoyu He et.al.	2507.14017	null
2025-07-18	Leveraging Pathology Foundation Models for Panoptic Segmentation of Melanoma in H&E Images	Jiaqi Lv et.al.	2507.13974	null
2025-07-18	DUALRec: A Hybrid Sequential and Language Model Framework for Context-Aware Movie Recommendation	Yitong Li et.al.	2507.13957	null
2025-07-18	Cross-modal Causal Intervention for Alzheimer's Disease Prediction	Yutao Jin et.al.	2507.13956	null
2025-07-18	Exploiting Primacy Effect To Improve Large Language Models	Bianca Raimondi et.al.	2507.13949	null
2025-07-18	Generalist Forecasting with Frozen Video Models via Latent Diffusion	Jacob C Walker et.al.	2507.13942	null
2025-07-18	Preprint: Did I Just Browse A Website Written by LLMs?	Sichang "Steven" He et.al.	2507.13933	null
2025-07-18	Enhancing LiDAR Point Features with Foundation Model Priors for 3D Object Detection	Yujian Mo et.al.	2507.13899	null
2025-07-18	Using LLMs to identify features of personal and professional skills in an open-response situational judgment test	Cole Walsh et.al.	2507.13881	null
2025-07-18	Large Language Models as Innovators: A Framework to Leverage Latent Space Exploration for Novelty Discovery	Mateusz Bystroński et.al.	2507.13874	null
2025-07-18	SPARQL Query Generation with LLMs: Measuring the Impact of Training Data Memorization and Knowledge Injection	Aleksandr Gashkov et.al.	2507.13859	null
2025-07-18	InTraVisTo: Inside Transformer Visualisation Tool	Nicolò Brunello et.al.	2507.13858	null
2025-07-18	DistFlow: A Fully Distributed RL Framework for Scalable and Efficient LLM Post-Training	Zhixin Wang et.al.	2507.13833	null
2025-07-18	Question-Answer Extraction from Scientific Articles Using Knowledge Graphs and Large Language Models	Hosein Azarbonyad et.al.	2507.13827	null
2025-07-18	RAG-based Architectures for Drug Side Effect Retrieval in LLMs	Shad Nygren et.al.	2507.13822	null
2025-07-18	Team of One: Cracking Complex Video QA with Model Synergy	Jun Xie et.al.	2507.13820	null
2025-07-18	CodeEdu: A Multi-Agent Collaborative Platform for Personalized Coding Education	Jianing Zhao et.al.	2507.13814	null
2025-07-18	SkySense V2: A Unified Foundation Model for Multi-modal Remote Sensing	Yingying Zhang et.al.	2507.13812	null
2025-07-18	On-the-Fly Fine-Tuning of Foundational Neural Network Potentials: A Bayesian Neural Network Approach	Tim Rensmeyer et.al.	2507.13805	null
2025-07-18	MolPIF: A Parameter Interpolation Flow Model for Molecule Generation	Yaowei Jin et.al.	2507.13762	null
2025-07-18	PRIDE -- Parameter-Efficient Reduction of Identity Discrimination for Equality in LLMs	Maluna Menke et.al.	2507.13743	null
2025-07-18	Can Synthetic Images Conquer Forgetting? Beyond Unexplored Doubts in Few-Shot Class-Incremental Learning	Junsu Kim et.al.	2507.13739	null
2025-07-18	DailyLLM: Context-Aware Activity Log Generation Using Multi-Modal Sensors and LLMs	Ye Tian et.al.	2507.13737	null
2025-07-18	The Judge Variable: Challenging Judge-Agnostic Legal Judgment Prediction	Guillaume Zambrano et.al.	2507.13732	null
2025-07-18	LLaPipe: LLM-Guided Reinforcement Learning for Automated Data Preparation Pipeline Construction	Jing Chang et.al.	2507.13712	null
2025-07-18	CogniQ-H: A Soft Hierarchical Reinforcement Learning Paradigm for Automated Data Preparation	Jing Chang et.al.	2507.13710	null
2025-07-18	Consistent Explainers or Unreliable Narrators? Understanding LLM-generated Group Recommendations	Cedric Waterschoot et.al.	2507.13705	null
2025-07-18	TopicAttack: An Indirect Prompt Injection Attack via Topic Transition	Yulin Chen et.al.	2507.13686	null
2025-07-18	LoopServe: An Adaptive Dual-phase LLM Inference Acceleration System for Multi-Turn Dialogues	Haoyang Li et.al.	2507.13681	null
2025-07-18	KiC: Keyword-inspired Cascade for Cost-Efficient Text Generation with LLMs	Woo-Chan Kim et.al.	2507.13666	null
2025-07-18	CU-ICU: Customizing Unsupervised Instruction-Finetuned Language Models for ICU Datasets via Text-to-Text Transfer Transformer	Teerapong Panboonyuen et.al.	2507.13655	null
2025-07-18	Towards channel foundation models (CFMs): Motivations, methodologies and opportunities	Jun Jiang et.al.	2507.13637	null
2025-07-18	Large Language Models in Cybersecurity: Applications, Vulnerabilities, and Defense Techniques	Niveen O. Jaffal et.al.	2507.13629	null
2025-07-18	BifrostRAG: Bridging Dual Knowledge Graphs for Multi-Hop Question Answering in Construction Safety	Yuxin Zhang et.al.	2507.13625	null
2025-07-18	Seed-X: Building Strong Multilingual Translation LLM with 7B Parameters	Shanbo Cheng et.al.	2507.13618	null
2025-07-18	Linguistic and Embedding-Based Profiling of Texts generated by Humans and Large Language Models	Sergio E. Zanotto et.al.	2507.13614	null
2025-07-18	CoTasks: Chain-of-Thought based Video Instruction Tuning Tasks	Yanan Wang et.al.	2507.13609	null
2025-07-18	GIFT: Gradient-aware Immunization of diffusion models against malicious Fine-Tuning with safe concepts retention	Amro Abdalla et.al.	2507.13598	null
2025-07-17	A Collaborative Framework Integrating Large Language Model and Chemical Fragment Space: Mutual Inspiration for Lead Design	Hao Tuo et.al.	2507.13580	null
2025-07-17	Learning Pluralistic User Preferences through Reinforcement Learning Fine-tuned Summaries	Hyunji Nam et.al.	2507.13579	null
2025-07-17	LLM-Based Community Surveys for Operational Decision Making in Interconnected Utility Infrastructures	Adaeze Okeukwu-Ogbonnaya et.al.	2507.13577	null
2025-07-17	Apple Intelligence Foundation Language Models: Tech Report 2025	Hanzhi Zhou et.al.	2507.13575	null
2025-07-17	Temporal Adaptation of Pre-trained Foundation Models for Music Structure Analysis	Yixiao Zhang et.al.	2507.13572	null
2025-07-17	A Data-Centric Framework for Addressing Phonetic and Prosodic Challenges in Russian Speech Generative Models	Kirill Borodin et.al.	2507.13563	null
2025-07-17	Demystifying Feature Requests: Leveraging LLMs to Refine Feature Requests in Open-Source Software	Pragyan K C et.al.	2507.13555	null
2025-07-17	GOFAI meets Generative AI: Development of Expert Systems by means of Large Language Models	Eduardo C. Garrido-Merchán et.al.	2507.13550	null
2025-07-17	A Computational Approach to Modeling Conversational Systems: Analyzing Large-Scale Quasi-Patterned Dialogue Flows	Mohamed Achref Ben Ammar et.al.	2507.13544	null
2025-07-17	Provable Low-Frequency Bias of In-Context Learning of Representations	Yongyi Yang et.al.	2507.13540	null
2025-07-17	Revisiting Prompt Engineering: A Comprehensive Evaluation for LLM-based Personalized Recommendation	Genki Kusano et.al.	2507.13525	null
2025-07-17	Humans learn to prefer trustworthy AI over human partners	Yaomin Jiang et.al.	2507.13524	null
2025-07-17	GraphTrafficGPT: Enhancing Traffic Management Through Graph-Based AI Agent Coordination	Nabil Abdelaziz Ferhat Taleb et.al.	2507.13511	null
2025-07-17	Fake or Real: The Impostor Hunt in Texts for Space Operations	Agata Kaczmarek et.al.	2507.13508	null
2025-07-17	Revisiting LLM Value Probing Strategies: Are They Robust and Expressive?	Siqi Shen et.al.	2507.13490	null
2025-07-17	Paper Summary Attack: Jailbreaking LLMs through LLM Safety Papers	Liang Lin et.al.	2507.13474	null
2025-07-17	ERR@HRI 2.0 Challenge: Multimodal Detection of Errors and Failures in Human-Robot Conversations	Shiye Cao et.al.	2507.13468	null
2025-07-17	"PhyWorldBench": A Comprehensive Evaluation of Physical Realism in Text-to-Video Models	Jing Gu et.al.	2507.13428	null
2025-07-17	VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding	Shihao Wang et.al.	2507.13353	null
2025-07-17	Hierarchical Rectified Flow Matching with Mini-Batch Couplings	Yichi Zhang et.al.	2507.13350	null
2025-07-17	Imbalance in Balance: Online Concept Balancing in Generation Models	Yukai Shi et.al.	2507.13345	null
2025-07-17	Comparing Apples to Oranges: A Dataset & Analysis of LLM Humour Understanding from Traditional Puns to Topical Jokes	Tyler Loakman et.al.	2507.13335	null
2025-07-17	A Survey of Context Engineering for Large Language Models	Lingrui Mei et.al.	2507.13334	null
2025-07-17	The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner	Zhouqi Hua et.al.	2507.13332	null
2025-07-17	GeoReg: Weight-Constrained Few-Shot Regression for Socio-Economic Estimation using LLM	Kyeongjin Ahn et.al.	2507.13323	null
2025-07-17	Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark	Junsu Kim et.al.	2507.13314	null
2025-07-17	The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations	Carlos Arriaga et.al.	2507.13302	null
2025-07-17	AbGen: Evaluating Large Language Models in Ablation Study Design and Evaluation for Scientific Research	Yilun Zhao et.al.	2507.13300	null
2025-07-17	Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management	Luis Gasco et.al.	2507.13275	null
2025-07-17	Automating Steering for Safe Multimodal Large Language Models	Lyucheng Wu et.al.	2507.13255	null
2025-07-17	RemVerse: Supporting Reminiscence Activities for Older Adults through AI-Assisted Virtual Reality	Ruohao Li et.al.	2507.13247	null
2025-07-17	HATS: Hindi Analogy Test Set for Evaluating Reasoning in Large Language Models	Ashray Gupta et.al.	2507.13238	null
2025-07-17	Enhancing Cross-task Transfer of Large Language Models via Activation Steering	Xinyu Tang et.al.	2507.13236	null
2025-07-17	VITA: Vision-to-Action Flow Matching Policy	Dechen Gao et.al.	2507.13231	null
2025-07-18	MoTM: Towards a Foundation Model for Time Series Imputation based on Continuous Modeling	Etienne Le Naour et.al.	2507.13207	null
2025-07-18	Automatically assessing oral narratives of Afrikaans and isiXhosa children	Retief Louw et.al.	2507.13205	null
2025-07-17	Black Box Deployed -- Functional Criteria for Artificial Moral Agents in the LLM Era	Matthew E. Brophy et.al.	2507.13175	null
2025-07-17	SHIELD: A Secure and Highly Enhanced Integrated Learning for Robust Deepfake Detection against Adversarial Attacks	Kutub Uddin et.al.	2507.13170	null
2025-07-17	Online Rounding for Set Cover under Subset Arrivals	Jarosław Byrka et.al.	2507.13159	null
2025-07-17	Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities	Hao Sun et.al.	2507.13158	null
2025-07-17	Multi-population GAN Training: Analyzing Co-Evolutionary Algorithms	Walter P. Casas et.al.	2507.13157	null
2025-07-17	SE-VLN: A Self-Evolving Vision-Language Navigation Framework Based on Multimodal Large Language Models	Xiangyu Dong et.al.	2507.13152	null
2025-07-17	DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model	Maulana Bisyir Azhari et.al.	2507.13145	null
2025-07-17	RIDAS: A Multi-Agent Framework for AI-RAN with Representation- and Intention-Driven Agents	Kuiyuan Ding et.al.	2507.13140	null
2025-07-17	Detecting LLM-generated Code with Subtle Modification by Adversarial Training	Xin Yin et.al.	2507.13123	null
2025-07-17	A Computational Framework to Identify Self-Aspects in Text	Jaya Caporusso et.al.	2507.13115	null
2025-07-17	R^2MoE: Redundancy-Removal Mixture of Experts for Lifelong Concept Learning	Xiaohan Guo et.al.	2507.13107	null
2025-07-17	Intelligent Virtual Sonographer (IVS): Enhancing Physician-Robot-Patient Communication	Tianyu Song et.al.	2507.13052	null
2025-07-17	MAD-Spear: A Conformity-Driven Prompt Injection Attack on Multi-Agent Debate Systems	Yu Cui et.al.	2507.13038	null
2025-07-17	Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities	Liuyi Wang et.al.	2507.13019	null
2025-07-17	Teach Old SAEs New Domain Tricks with Boosting	Nikita Koriagin et.al.	2507.12990	null
2025-07-17	A Distributed Generative AI Approach for Heterogeneous Multi-Domain Environments under Data Sharing constraints	Youssef Tawfilis et.al.	2507.12979	null
2025-07-17	UniSLU: Unified Spoken Language Understanding from Heterogeneous Cross-Task Datasets	Zhichao Sheng et.al.	2507.12951	null
2025-07-17	Insights into a radiology-specialised multimodal large language model with sparse autoencoders	Kenza Bouzid et.al.	2507.12950	null
2025-07-17	Probabilistic Soundness Guarantees in LLM Reasoning Chains	Weiqiu You et.al.	2507.12948	null
2025-07-17	Analysis of Image-and-Text Uncertainty Propagation in Multimodal Large Language Models with Cardiac MR-Based Applications	Yucheng Tang et.al.	2507.12945	null
2025-07-17	Unleashing Vision Foundation Models for Coronary Artery Segmentation: Parallel ViT-CNN Encoding and Variational Fusion	Caixia Dong et.al.	2507.12938	null
2025-07-17	Argus: Leveraging Multiview Images for Improved 3-D Scene Understanding With Large Language Models	Yifan Xu et.al.	2507.12916	null
2025-07-17	Agentar-DeepFinance-300K: A Large-Scale Financial Dataset via Systematic Chain-of-Thought Synthesis Optimization	Xiaoke Zhao et.al.	2507.12901	null
2025-07-17	Generalist Bimanual Manipulation via Foundation Video Diffusion Models	Yao Feng et.al.	2507.12898	null
2025-07-17	DiffRhythm+: Controllable and Flexible Full-Length Song Generation with Preference Optimization	Huakang Chen et.al.	2507.12890	null
2025-07-17	VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks	Jian Yao et.al.	2507.12885	null
2025-07-17	Generative Multi-Target Cross-Domain Recommendation	Jinqiu Jin et.al.	2507.12871	null
2025-07-17	Supervised Fine Tuning on Curated Data is Reinforcement Learning (and can be improved)	Chongli Qin et.al.	2507.12856	null
2025-07-17	DEMONSTRATE: Zero-shot Language to Robotic Control via Multi-task Demonstration Learning	Rahel Rickenbach et.al.	2507.12855	null
2025-07-17	AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning	Yiming Ren et.al.	2507.12841	null
2025-07-17	Bridging the Gap: Leveraging Retrieval-Augmented Generation to Better Understand Public Concerns about Vaccines	Muhammad Javed et.al.	2507.12840	null
2025-07-17	MCoT-RE: Multi-Faceted Chain-of-Thought and Re-Ranking for Training-Free Zero-Shot Composed Image Retrieval	Jeong-Woo Park et.al.	2507.12819	null
2025-07-17	Large Language Models' Internal Perception of Symbolic Music	Andrew Shin et.al.	2507.12808	null
2025-07-17	Semantic-guided Fine-tuning of Foundation Model for Long-tailed Visual Recognition	Yufei Peng et.al.	2507.12807	null
2025-07-17	MCPEval: Automatic MCP-based Deep Evaluation for AI Agent Models	Zhiwei Liu et.al.	2507.12806	null
2025-07-17	DeQA-Doc: Adapting DeQA-Score to Document Image Quality Assessment	Junjie Gao et.al.	2507.12796	null
2025-07-17	Learning Robust Negation Text Representations	Thinh Hung Truong et.al.	2507.12782	null
2025-07-17	A Comprehensive Survey of Electronic Health Record Modeling: From Deep Learning Approaches to Large Language Models	Weijieying Ren et.al.	2507.12774	null
2025-07-17	Local Representative Token Guided Merging for Text-to-Image Generation	Min-Jeong Lee et.al.	2507.12771	null
2025-07-17	Think-Before-Draw: Decomposing Emotion Semantics & Fine-Grained Controllable Expressive Talking Head Generation	Hanlei Shi et.al.	2507.12761	null
2025-07-17	osmAG-LLM: Zero-Shot Open-Vocabulary Object Navigation via Semantic Maps and Large Language Models Reasoning	Fujing Xie et.al.	2507.12753	null
2025-07-17	Multimodal-Guided Dynamic Dataset Pruning for Robust and Efficient Data-Centric Learning	Suorong Yang et.al.	2507.12750	null
2025-07-17	Strategy Adaptation in Large Language Model Werewolf Agents	Fuya Nakamori et.al.	2507.12732	null
2025-07-17	PinFM: Foundation Model for User Activity Sequences at a Billion-scale Visual Discovery Platform	Xiangyi Chen et.al.	2507.12704	null
2025-07-17	Pixel Perfect MegaMed: A Megapixel-Scale Vision-Language Foundation Model for Generating High Resolution Medical Images	Zahra TehraniNasab et.al.	2507.12698	null
2025-07-16	Improving Drug Identification in Overdose Death Surveillance using Large Language Models	Arthur J. Funnell et.al.	2507.12679	null
2025-07-16	ParaStudent: Generating and Evaluating Realistic Student Code by Teaching LLMs to Struggle	Mihran Miroyan et.al.	2507.12674	null
2025-07-16	The first open machine translation system for the Chechen language	Abu-Viskhan A. Umishov et.al.	2507.12672	null
2025-07-16	Single Conversation Methodology: A Human-Centered Protocol for AI-Assisted Software Development	Salvador D. Escobedo et.al.	2507.12665	null
2025-07-16	VLMgineer: Vision Language Models as Robotic Toolsmiths	George Jiayuan Gao et.al.	2507.12644	null
2025-07-16	NLI4VolVis: Natural Language Interaction for Volume Visualization via LLM Multi-Agents and Editable 3D Gaussian Splatting	Kuangshi Ai et.al.	2507.12621	null
2025-07-16	BootSeer: Analyzing and Mitigating Initialization Bottlenecks in Large-Scale LLM Training	Rui Li et.al.	2507.12619	null
2025-07-16	Learning What Matters: Probabilistic Task Selection via Mutual Information for Model Finetuning	Prateek Chanda et.al.	2507.12612	null
2025-07-16	Enhancing In-Domain and Out-Domain EmoFake Detection via Cooperative Multilingual Speech Foundation Models	Orchid Chetia Phukan et.al.	2507.12595	null
2025-07-16	Assay2Mol: large language model-based drug design using BioAssay context	Yifan Deng et.al.	2507.12574	null
2025-07-16	Mono-InternVL-1.5: Towards Cheaper and Faster Monolithic Multimodal Large Language Models	Gen Luo et.al.	2507.12566	null
2025-07-17	PhysX: Physical-Grounded 3D Asset Generation	Ziang Cao et.al.	2507.12465	null
2025-07-16	CytoSAE: Interpretable Cell Embeddings for Hematology	Muhammed Furkan Dasdelen et.al.	2507.12464	null
2025-07-16	Mitigating Object Hallucinations via Sentence-Level Early Intervention	Shangpin Peng et.al.	2507.12455	null
2025-07-16	Can We Predict Alignment Before Models Finish Thinking? Towards Monitoring Misaligned Reasoning Models	Yik Siu Chan et.al.	2507.12428	null
2025-07-16	Advancing Retrieval-Augmented Generation for Structured Enterprise and Internal Data	Chandana Cheerla et.al.	2507.12425	null
2025-07-16	SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories?	Xinyi He et.al.	2507.12415	null
2025-07-16	Modeling Feasible Locomotion of Nanobots for Cancer Detection and Treatment	Noble Harasha et.al.	2507.12400	null
2025-07-16	Assessing the Value of Visual Input: A Benchmark of Multimodal Large Language Models for Robotic Path Planning	Jacinto Colan et.al.	2507.12391	null
2025-07-16	Web-Browsing LLMs Can Access Social Media Profiles and Infer User Demographics	Meysam Alizadeh et.al.	2507.12372	null
2025-07-16	Beyond Single Models: Enhancing LLM Detection of Ambiguity in Requests through Debate	Ana Davila et.al.	2507.12370	null
2025-07-16	GitChameleon: Evaluating AI Code Generation Against Python Library Version Incompatibilities	Diganta Misra et.al.	2507.12367	null
2025-07-16	Thought Purity: Defense Paradigm For Chain-of-Thought Attack	Zihao Xue et.al.	2507.12314	null
2025-07-16	Chain-of-Descriptions: Improving Code LLMs for VHDL Code Generation and Summarization	Prashanth Vijayaraghavan et.al.	2507.12308	null
2025-07-16	Humans are more gullible than LLMs in believing common psychological myths	Bevan Koopman et.al.	2507.12296	null
2025-07-16	Text-ADBench: Text Anomaly Detection Benchmark based on LLMs Embedding	Feng Xiao et.al.	2507.12295	null
2025-07-16	SHACL Validation in the Presence of Ontologies: Semantics and Rewriting Techniques	Anouk Oudshoorn et.al.	2507.12286	null
2025-07-16	FADE: Adversarial Concept Erasure in Flow Models	Zixuan Fu et.al.	2507.12283	null
2025-07-17	Next-Gen Museum Guides: Autonomous Navigation and Visitor Interaction with an Agentic Robot	Luca Garello et.al.	2507.12273	null
2025-07-16	Improving Contextual ASR via Multi-grained Fusion with Large Language Models	Shilin Zhou et.al.	2507.12252	null
2025-07-16	Generate to Ground: Multimodal Text Conditioning Boosts Phrase Grounding in Medical Vision-Language Models	Felix Nützel et.al.	2507.12236	null
2025-07-16	MGFFD-VLM: Multi-Granularity Prompt Learning for Face Forgery Detection with VLM	Tao Chen et.al.	2507.12232	null
2025-07-16	Xiangqi-R1: Enhancing Spatial Strategic Reasoning in LLMs for Chinese Chess via Reinforcement Learning	Yuhao Chen et.al.	2507.12215	null
2025-07-16	Draw an Ugly Person An Exploration of Generative AIs Perceptions of Ugliness	Garyoung Kim et.al.	2507.12212	null
2025-07-16	BuildEvo: Designing Building Energy Consumption Forecasting Heuristics via LLM-driven Evolution	Subin Lin et.al.	2507.12207	null
2025-07-16	Toward Efficient SpMV in Sparse LLMs via Block Extraction and Compressed Storage	Junqing Lin et.al.	2507.12205	null
2025-07-16	RODS: Robust Optimization Inspired Diffusion Sampling for Detecting and Reducing Hallucination in Generative Models	Yiqi Tian et.al.	2507.12201	null
2025-07-16	Multi-Component VAE with Gaussian Markov Random Field	Fouad Oubari et.al.	2507.12165	null
2025-07-16	PRISM: Distributed Inference for Foundation Models at Edge	Muhammad Azlan Qazi et.al.	2507.12145	null
2025-07-16	Overview of the Sensemaking Task at the ELOQUENT 2025 Lab: LLMs as Teachers, Students and Evaluators	Pavel Šindelář et.al.	2507.12143	null
2025-07-16	RiemannLoRA: A Unified Riemannian Framework for Ambiguity-Free LoRA Optimization	Vladimir Bogachev et.al.	2507.12142	null
2025-07-16	Room Impulse Response Generation Conditioned on Acoustic Parameters	Silvia Arellano et.al.	2507.12136	null
2025-07-16	Iterative Augmentation with Summarization Refinement (IASR) Evaluation for Unstructured Survey data Modeling and Analysis	Payal Bhattad et.al.	2507.12126	null
2025-07-16	Open-Vocabulary Indoor Object Grounding with 3D Hierarchical Scene Graph	Sergey Linok et.al.	2507.12123	null
2025-07-16	DeepShade: Enable Shade Simulation by Text-conditioned Image Generation	Longchao Da et.al.	2507.12103	null
2025-07-16	LLAMA: Multi-Feedback Smart Contract Fuzzing Framework with LLM-Guided Seed Generation	Keke Gai et.al.	2507.12084	null
2025-07-16	Findings of MEGA: Maths Explanation with LLMs using the Socratic Method for Active Learning	Tosin Adewumi et.al.	2507.12079	null
2025-07-16	Evaluating the Ability of Large Language Models to Reason about Cardinal Directions, Revisited	Anthony G Cohn et.al.	2507.12059	null
2025-07-16	FloGAN: Scenario-Based Urban Mobility Flow Generation via Conditional GANs and Dynamic Region Decoupling	Seanglidet Yean et.al.	2507.12053	null
2025-07-16	A Comparative Approach to Assessing Linguistic Creativity of Large Language Models and Humans	Anca Dinu et.al.	2507.12039	null
2025-07-16	3D-MoRe: Unified Modal-Contextual Reasoning for Embodied Question Answering	Rongtao Xu et.al.	2507.12026	null
2025-07-16	EME-TTS: Unlocking the Emphasis and Emotion Link in Speech Synthesis	Haoxun Li et.al.	2507.12015	null
2025-07-16	DSSD: Efficient Edge-Device Deployment and Collaborative Inference via Distributed Split Speculative Decoding	Jiahong Ning et.al.	2507.12000	null
2025-07-16	Can LLMs Find Fraudsters? Multi-level LLM Enhanced Graph Fraud Detection	Tairan Huang et.al.	2507.11997	null
2025-07-16	Robust Planning for Autonomous Vehicles with Diffusion-Based Failure Samplers	Juanran Wang et.al.	2507.11991	null
2025-07-16	Aime: Towards Fully-Autonomous Multi-Agent Framework	Yexuan Shi et.al.	2507.11988	null
2025-07-16	Simplifications are Absolutists: How Simplified Language Reduces Word Sense Awareness in LLM-Generated Definitions	Lukas Ellinger et.al.	2507.11981	null
2025-07-16	Value-Based Large Language Model Agent Simulation for Mutual Evaluation of Trust and Interpersonal Closeness	Yuki Sakamoto et.al.	2507.11979	null
2025-07-16	Graph Representations for Reading Comprehension Analysis using Large Language Model and Eye-Tracking Biomarker	Yuhong Zhang et.al.	2507.11972	null
2025-07-16	Watch, Listen, Understand, Mislead: Tri-modal Adversarial Attacks on Short Videos for Content Appropriateness Evaluation	Sahid Hossain Mustakim et.al.	2507.11968	null
2025-07-16	Toxicity-Aware Few-Shot Prompting for Low-Resource Singlish Translation	Ziyu Ge et.al.	2507.11966	null
2025-07-16	PoTPTQ: A Two-step Power-of-Two Post-training for LLMs	Xinyu Wang et.al.	2507.11959	null
2025-07-16	The benefits of query-based KGQA systems for complex and temporal questions in LLM era	Artem Alekseev et.al.	2507.11954	null
2025-07-16	BlockBPE: Parallel BPE Tokenization	Amos You et.al.	2507.11941	null
2025-07-16	A Multi-Level Similarity Approach for Single-View Object Grasping: Matching, Planning, and Fine-Tuning	Hao Chen et.al.	2507.11938	null
2025-07-16	A Survey of Deep Learning for Geometry Problem Solving	Jianzhe Ma et.al.	2507.11936	null
2025-07-16	Hyperphantasia: A Benchmark for Evaluating the Mental Visualization Capabilities of Multimodal LLMs	Mohammad Shahab Sepehri et.al.	2507.11932	null
2025-07-16	From Generative to Episodic: Sample-Efficient Replicable Reinforcement Learning	Max Hopkins et.al.	2507.11926	null
2025-07-16	Marco-Bench-MIF: On Multilingual Instruction-Following Capability of Large Language Models	Bo Zeng et.al.	2507.11882	null
2025-07-16	DualReward: A Dynamic Reinforcement Learning Framework for Cloze Tests Distractor Generation	Tianyou Huang et.al.	2507.11875	null
2025-07-16	CosmoFlow: Scale-Aware Representation Learning for Cosmology with Flow Matching	Sidharth Kannan et.al.	2507.11842	null
2025-07-16	The Evolving Role of Large Language Models in Scientific Innovation: Evaluator, Collaborator, and Scientist	Haoxuan Zhang et.al.	2507.11810	null
2025-07-16	Tracing Facts or just Copies? A critical investigation of the Competitions of Mechanisms in Large Language Models	Dante Campregher et.al.	2507.11809	null
2025-07-15	Enforcing Latent Euclidean Geometry in Single-Cell VAEs for Manifold Interpolation	Alessandro Palma et.al.	2507.11789	null
2025-07-15	Foundation Models for Brain Signals: A Critical Review of Current Progress and Future Directions	Gayal Kuruppu et.al.	2507.11783	null
2025-07-15	Large-scale distributed synchronization systems, using a cancel-on-completion redundancy mechanism	Alexander Stolyar et.al.	2507.11779	null
2025-07-15	Scaling laws for activation steering with Llama 2 models and refusal mechanisms	Sheikh Abdur Raheem Ali et.al.	2507.11771	null
2025-07-15	LLMs are Bayesian, in Expectation, not in Realization	Leon Chlon et.al.	2507.11768	null
2025-07-15	Beyond Task-Specific Reasoning: A Unified Conditional Generative Framework for Abstract Visual Reasoning	Fan Shi et.al.	2507.11761	null
2025-07-15	CRABS: A syntactic-semantic pincer strategy for bounding LLM interpretation of Python notebooks	Meng Li et.al.	2507.11742	null
2025-07-15	Auto-Formulating Dynamic Programming Problems with Large Language Models	Chenyu Zhou et.al.	2507.11737	null
2025-07-15	Subgraph Generation for Generalizing on Out-of-Distribution Links	Jay Revolinsky et.al.	2507.11710	null
2025-07-15	MetaLint: Generalizable Idiomatic Code Quality Analysis through Instruction-Following and Easy-to-Hard Generalization	Atharva Naik et.al.	2507.11687	null
2025-07-15	Let's Think in Two Steps: Mitigating Agreement Bias in MLLMs with Self-Grounded Verification	Moises Andrade et.al.	2507.11662	null
2025-07-15	Deep Generative Methods and Tire Architecture Design	Fouad Oubari et.al.	2507.11639	null
2025-07-15	Interpretable Prediction of Lymph Node Metastasis in Rectal Cancer MRI Using Variational Autoencoders	Benjamin Keel et.al.	2507.11638	null
2025-07-15	MapIQ: Benchmarking Multimodal Large Language Models for Map Question Answering	Varun Srivastava et.al.	2507.11625	null
2025-07-15	k-Contextuality as a Heuristic for Memory Separations in Learning	Mariesa H. Teo et.al.	2507.11604	null
2025-07-15	SToFM: a Multi-scale Foundation Model for Spatial Transcriptomics	Suyuan Zhao et.al.	2507.11588	null
2025-07-15	Towards Depth Foundation Model: Recent Trends in Vision-Based Depth Estimation	Zhen Xu et.al.	2507.11540	null
2025-07-15	Streaming 4D Visual Geometry Transformer	Dong Zhuo et.al.	2507.11539	null
2025-07-15	DrafterBench: Benchmarking Large Language Models for Tasks Automation in Civil Engineering	Yinsheng Li et.al.	2507.11527	null
2025-07-15	LLM-based ambiguity detection in natural language instructions for collaborative surgical robots	Ana Davila et.al.	2507.11525	null
2025-07-15	AirLLM: Diffusion Policy-based Adaptive LoRA for Remote Fine-Tuning of LLM over the Air	Shiyi Yang et.al.	2507.11515	null
2025-07-15	HUG-VAS: A Hierarchical NURBS-Based Generative Model for Aortic Geometry Synthesis and Controllable Editing	Pan Du et.al.	2507.11474	null
2025-07-15	LRMR: LLM-Driven Relational Multi-node Ranking for Lymph Node Metastasis Assessment in Rectal Cancer	Yaoxian Dong et.al.	2507.11457	null
2025-07-15	Implementing Adaptations for Vision AutoRegressive Model	Kaif Shaikh et.al.	2507.11441	null
2025-07-15	Towards Reliable Objective Evaluation Metrics for Generative Singing Voice Separation Models	Paul A. Bereuter et.al.	2507.11427	null
2025-07-16	Reasoning Strategies in Large Language Models: Can They Follow, Prefer, and Optimize?	Yanjian Zhang et.al.	2507.11423	null
2025-07-15	Quantifying the Energy Consumption and Carbon Emissions of LLM Inference via Simulations	Miray Özcan et.al.	2507.11417	null
2025-07-15	Seq vs Seq: An Open Suite of Paired Encoders and Decoders	Orion Weller et.al.	2507.11412	null
2025-07-15	KisMATH: Do LLMs Have Knowledge of Implicit Structures in Mathematical Reasoning?	Soumadeep Saha et.al.	2507.11408	null
2025-07-15	EXAONE 4.0: Unified Large Language Models Integrating Non-reasoning and Reasoning Modes	LG AI Research et.al.	2507.11407	null
2025-07-15	DCR: Quantifying Data Contamination in LLMs Evaluation	Cheng Xu et.al.	2507.11405	null
2025-07-15	Step-wise Policy for Rare-tool Knowledge (SPaRK): Offline RL that Drives Diverse Tool Use in LLMs	Gabriel Bo et.al.	2507.11371	null
2025-07-15	From Chaos to Automation: Enabling the Use of Unstructured Data for Robotic Process Automation	Kelly Kurowski et.al.	2507.11364	null
2025-07-15	What is the Best Process Model Representation? A Comparative Analysis for Process Modeling with Large Language Models	Alexis Brissard et.al.	2507.11356	null
2025-07-15	Foundation Models for Logistics: Toward Certifiable, Conversational Planning Interfaces	Yunhao Yang et.al.	2507.11352	null
2025-07-15	RefModel: Detecting Refactorings using Foundation Models	Pedro Simões et.al.	2507.11346	null
2025-07-15	Guiding LLM Decision-Making with Fairness Reward Models	Zara Hall et.al.	2507.11344	null
2025-07-15	MonoMVSNet: Monocular Priors Guided Multi-View Stereo Network	Jianfei Jiang et.al.	2507.11333	null
2025-07-15	Automated Novelty Evaluation of Academic Paper: A Collaborative Approach Integrating Human and Large Language Model Knowledge	Wenqing Wu et.al.	2507.11330	null
2025-07-15	Internal Value Alignment in Large Language Models through Controlled Value Vector Activation	Haoran Jin et.al.	2507.11316	null
2025-07-15	LRCTI: A Large Language Model-Based Framework for Multi-Step Evidence Retrieval and Reasoning in Cyber Threat Intelligence Credibility Verification	Fengxiao Tang et.al.	2507.11310	null
2025-07-15	Dr.Copilot: A Multi-Agent Prompt Optimized Assistant for Improving Patient-Doctor Communication in Romanian	Andrei Niculae et.al.	2507.11299	null
2025-07-15	Opus: A Prompt Intention Framework for Complex Workflow Generation	Théo Fagnoni et.al.	2507.11288	null
2025-07-15	Taming Uncertainty via Automation: Observing, Analyzing, and Optimizing Agentic AI Systems	Dany Moshkovich et.al.	2507.11277	null
2025-07-15	FMC: Formalization of Natural Language Mathematical Competition Problems	Jiaxuan Xie et.al.	2507.11275	null
2025-07-15	KV-Latent: Dimensional-level KV Cache Reduction with Frequency-aware Rotary Positional Embedding	Luohe Shi et.al.	2507.11273	null
2025-07-15	An Empirical Study of Multi-Agent RAG for Real-World University Admissions Counseling	Anh Nguyen-Duc et.al.	2507.11272	null
2025-07-15	MFGDiffusion: Mask-Guided Smoke Synthesis for Enhanced Forest Fire Detection	Guanghao Wu et.al.	2507.11252	null
2025-07-15	Generative Click-through Rate Prediction with Applications to Search Advertising	Lingwei Kong et.al.	2507.11246	null
2025-07-15	NarrLV: Towards a Comprehensive Narrative-Centric Evaluation for Long Video Generation Models	X. Feng et.al.	2507.11245	null
2025-07-15	Sparse Autoencoders Can Capture Language-Specific Concepts Across Diverse Languages	Lyzander Marciano Andrylie et.al.	2507.11230	null
2025-07-15	An Agentic Flow for Finite State Machine Extraction using Prompt Chaining	Fares Wael et.al.	2507.11222	null
2025-07-15	EsBBQ and CaBBQ: The Spanish and Catalan Bias Benchmarks for Question Answering	Valle Ruiz-Fernández et.al.	2507.11216	null
2025-07-15	Role-Playing LLM-Based Multi-Agent Support Framework for Detecting and Addressing Family Communication Bias	Rushia Harada et.al.	2507.11210	null
2025-07-15	Temperature and Persona Shape LLM Agent Consensus With Minimal Accuracy Gains in Qualitative Coding	Conrad Borchers et.al.	2507.11198	null
2025-07-15	Mixture of Experts in Large Language Models	Danyang Zhang et.al.	2507.11181	null
2025-07-15	Latent Space Consistency for Sparse-View CT Reconstruction	Duoyou Chen et.al.	2507.11152	null
2025-07-15	What Should LLMs Forget? Quantifying Personal Data in LLMs for Right-to-Be-Forgotten Requests	Dimitri Staufer et.al.	2507.11128	null
2025-07-15	MSA at ImageCLEF 2025 Multimodal Reasoning: Multilingual Multimodal Reasoning With Ensemble Vision Language Models	Seif Ahmed et.al.	2507.11114	null
2025-07-15	Multi-Trigger Poisoning Amplifies Backdoor Vulnerabilities in LLMs	Sanhanat Sivapiromrat et.al.	2507.11112	null
2025-07-15	KptLLM++: Towards Generic Keypoint Comprehension with Large Language Model	Jie Yang et.al.	2507.11102	null
2025-07-15	The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs	Zichen Wen et.al.	2507.11097	null
2025-07-15	EditGen: Harnessing Cross-Attention Control for Instruction-Based Auto-Regressive Audio Editing	Vassilis Sioros et.al.	2507.11096	null
2025-07-15	Beyond Traditional Algorithms: Leveraging LLMs for Accurate Cross-Border Entity Identification	Andres Azqueta-Gavaldón et.al.	2507.11086	null
2025-07-15	Function-to-Style Guidance of LLMs for Code Translation	Longhui Zhang et.al.	2507.11083	null
2025-07-15	Tactical Decision for Multi-UGV Confrontation with a Vision-Language Model-Based Commander	Li Wang et.al.	2507.11079	null
2025-07-15	LogTinyLLM: Tiny Large Language Models Based Contextual Log Anomaly Detection	Isaiah Thompson Ocansey et.al.	2507.11071	null
2025-07-15	SWE-MERA: A Dynamic Benchmark for Agenticly Evaluating Large Language Models on Software Engineering Tasks	Pavel Adamenko et.al.	2507.11059	null
2025-07-15	LLM-Augmented Symptom Analysis for Cardiovascular Disease Risk Prediction: A Clinical NLP	Haowei Yang et.al.	2507.11052	null
2025-07-15	Aligned Query Expansion: Efficient Query Expansion for Information Retrieval through LLM Alignment	Adam Yang et.al.	2507.11042	null
2025-07-15	Functional Emotion Modeling in Biomimetic Reinforcement Learning	Louis Wang et.al.	2507.11027	null
2025-07-15	Incentivizing Knowledge Transfers	Zhonghong Kuang et.al.	2507.11018	null
2025-07-15	First-Order Error Matters: Accurate Compensation for Quantized Large Language Models	Xingyu Zheng et.al.	2507.11017	null
2025-07-15	SIMCODE: A Benchmark for Natural Language to ns-3 Network Simulation Code Generation	Tasnim Ahmed et.al.	2507.11014	null
2025-07-15	Learning to Tune Like an Expert: Interpretable and Scene-Aware Navigation via MLLM Reasoning and CVAE-Based Adaptation	Yanbo Wang et.al.	2507.11001	null
2025-07-15	Teach Me Sign: Stepwise Prompting LLM for Sign Language Production	Zhaoyi An et.al.	2507.10972	null
2025-07-15	DS@GT at eRisk 2025: From prompts to predictions, benchmarking early depression detection with conversational agent based assessments and temporal attention models	Anthony Miyaguchi et.al.	2507.10958	null
2025-07-15	Modeling Understanding of Story-Based Analogies Using Large Language Models	Kalit Inani et.al.	2507.10957	null
2025-07-15	Towards Practical Benchmarking of Data Cleaning Techniques: On Generating Authentic Errors via Large Language Models	Xinyuan Liu et.al.	2507.10934	null
2025-07-15	Artificial Finance: How AI Thinks About Money	Orhan Erdem et.al.	2507.10933	null
2025-07-15	Enhancing Safe and Controllable Protein Generation via Knowledge Preference Optimization	Yuhao Wang et.al.	2507.10923	null
2025-07-15	HanjaBridge: Resolving Semantic Ambiguity in Korean LLMs via Hanja-Augmented Pre-Training	Seungho Choi et.al.	2507.10920	null
2025-07-15	LLM-Driven Dual-Level Multi-Interest Modeling for Recommendation	Ziyan Wang et.al.	2507.10917	null
2025-07-15	Lessons Learned from Evaluation of LLM based Multi-agents in Safer Therapy Recommendation	Yicong Wu et.al.	2507.10911	null
2025-07-15	Evaluating Generated Commit Messages with Large Language Models	Qunhong Zeng et.al.	2507.10906	null
2025-07-15	LiLM-RDB-SFC: Lightweight Language Model with Relational Database-Guided DRL for Optimized SFC Provisioning	Parisa Fard Moshiri et.al.	2507.10903	null
2025-07-15	Object-Centric Mobile Manipulation through SAM2-Guided Perception and Imitation Learning	Wang Zhicheng et.al.	2507.10899	null
2025-07-15	LLMATCH: A Unified Schema Matching Framework with Large Language Models	Sha Wang et.al.	2507.10897	null
2025-07-15	Learning from Imperfect Data: Robust Inference of Dynamic Systems using Simulation-based Generative Model	Hyunwoo Cho et.al.	2507.10884	null
2025-07-15	From Alerts to Intelligence: A Novel LLM-Aided Framework for Host-based Intrusion Detection	Danyu Sun et.al.	2507.10873	null
2025-07-14	WhisperKit: On-device Real-time ASR with Billion-Scale Transformers	Atila Orhon et.al.	2507.10860	null
2025-07-14	MultiVox: Benchmarking Voice Assistants for Multimodal Interactions	Ramaneswaran Selvakumar et.al.	2507.10859	null
2025-07-14	LLMs on Trial: Evaluating Judicial Fairness for Large Language Models	Yiran Hu et.al.	2507.10852	null
2025-07-14	LLM-Guided Agentic Object Detection for Open-World Understanding	Furkan Mumcu et.al.	2507.10844	null
2025-07-14	REAL-IoT: Characterizing GNN Intrusion Detection Robustness under Practical Adversarial Attack	Zhonghao Zhan et.al.	2507.10836	null
2025-07-14	Supporting SENĆOTEN Language Documentation Efforts with Automatic Speech Recognition	Mengzhe Geng et.al.	2507.10827	null
2025-07-14	Semantic Context for Tool Orchestration	Robert Müller et.al.	2507.10820	null
2025-07-14	How Robust are LLM-Generated Library Imports? An Empirical Study using Stack Overflow	Jasmine Latendresse et.al.	2507.10818	null
2025-07-14	Versatile and Generalizable Manipulation via Goal-Conditioned Reinforcement Learning with Grounded Object Detection	Huiyi Wang et.al.	2507.10814	null
2025-07-14	Automated Thematic Analyses Using LLMs: Xylazine Wound Management Social Media Chatter Use Case	JaMor Hairston et.al.	2507.10803	null
2025-07-14	Can Multimodal Foundation Models Understand Schematic Diagrams? An Empirical Study on Information-Seeking QA over Scientific Papers	Yilun Zhao et.al.	2507.10787	null
2025-07-14	Warehouse Spatial Question Answering with LLM Agent	Hsiang-Wei Huang et.al.	2507.10778	null
2025-07-14	rt-RISeg: Real-Time Model-Free Robot Interactive Segmentation for Active Instance-Level Object Understanding	Howard H. Qian et.al.	2507.10776	null
2025-07-14	Spatial Reasoners for Continuous Variables in Any Domain	Bart Pogodzinski et.al.	2507.10768	null
2025-07-14	Integrating Biological Knowledge for Robust Microscopy Image Profiling on De Novo Cell Lines	Jiayuan Chen et.al.	2507.10737	null
2025-07-14	Bridging Brains and Machines: A Unified Frontier in Neuroscience, Artificial Intelligence, and Neuromorphic Systems	Sohan Shankar et.al.	2507.10722	null
2025-07-14	Exploring User Security and Privacy Attitudes and Concerns Toward the Use of General-Purpose LLM Chatbots for Mental Health	Jabari Kwesi et.al.	2507.10695	null
2025-07-14	Machine-learning inference of stellar properties using integrated photometric and spectroscopic data	Ilay Kamai et.al.	2507.10666	null
2025-07-14	Emulating Dark Matter Halo Merger Trees with Graph Generative Models	Tri Nguyen et.al.	2507.10652	null
2025-07-14	MP1: Mean Flow Tames Policy Learning in 1-step for Robotic Manipulation	Juyi Sheng et.al.	2507.10543	null
2025-07-14	Fusing LLM Capabilities with Routing Data	Tao Feng et.al.	2507.10540	null
2025-07-14	Graph World Model	Tao Feng et.al.	2507.10539	null
2025-07-14	CodeJudgeBench: Benchmarking LLM-as-a-Judge for Coding Tasks	Hongchao Jiang et.al.	2507.10535	null
2025-07-14	Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination	Mingqi Wu et.al.	2507.10532	null
2025-07-14	Accurate generation of chemical reaction transition states by conditional flow matching	Ping Tuo et.al.	2507.10530	null
2025-07-14	Chat with AI: The Surprising Turn of Real-time Video Communication from Human to AI	Jiangkai Wu et.al.	2507.10510	null
2025-07-14	Scene-Aware Conversational ADAS with Generative AI for Real-Time Driver Assistance	Kyungtae Han et.al.	2507.10500	null
2025-07-14	Can You Detect the Difference?	İsmail Tarım et.al.	2507.10475	null
2025-07-14	MLAR: Multi-layer Large Language Model-based Robotic Process Automation Applicant Tracking	Mohamed T. Younes et.al.	2507.10472	null
2025-07-14	An Empirical Evaluation of AI-Powered Non-Player Characters' Perceived Realism and Performance in Virtual Reality Environments	Mikko Korkiakoski et.al.	2507.10469	null
2025-07-14	Logic layer Prompt Control Injection (LPCI): A Novel Security Vulnerability Class in Agentic Systems	Hammad Atta et.al.	2507.10457	null
2025-07-14	Text-Visual Semantic Constrained AI-Generated Image Quality Assessment	Qiang Li et.al.	2507.10432	null
2025-07-14	Towards Emotion Co-regulation with LLM-powered Socially Assistive Robots: Integrating LLM Prompts and Robotic Behaviors to Support Parent-Neurodivergent Child Dyads	Jing Li et.al.	2507.10427	null
2025-07-14	Zorse: Optimizing LLM Training Efficiency on Heterogeneous GPU Clusters	Runsheng Benson Guo et.al.	2507.10392	null
2025-07-14	Test-Time Canonicalization by Foundation Models for Robust Perception	Utkarsh Singhal et.al.	2507.10375	null
2025-07-14	Using AI to replicate human experimental results: a motion study	Rosa Illan Castillo et.al.	2507.10342	null
2025-07-14	Grammar-Guided Evolutionary Search for Discrete Prompt Optimisation	Muzhaffar Hazman et.al.	2507.10326	null
2025-07-14	Mind the Gap: Aligning Vision Foundation Models to Image Feature Matching	Yuhan Liu et.al.	2507.10318	null
2025-07-14	Recognizing Dementia from Neuropsychological Tests with State Space Models	Liming Wang et.al.	2507.10311	null
2025-07-14	DisCo: Towards Distinct and Coherent Visual Encapsulation in Video MLLMs	Jiahe Zhao et.al.	2507.10302	null
2025-07-14	FaceLLM: A Multimodal Large Language Model for Face Understanding	Hatef Otroshi Shahreza et.al.	2507.10300	null
2025-07-14	Prompt Informed Reinforcement Learning for Visual Coverage Path Planning	Venkat Margapuri et.al.	2507.10284	null
2025-07-14	Cross-Timeslot Optimization for Distributed GPU Inference Using Reinforcement Learning	Chengze Du et.al.	2507.10259	null
2025-07-14	Synthesizing Near-Boundary OOD Samples for Out-of-Distribution Detection	Jinglun Li et.al.	2507.10225	null
2025-07-14	Absher: A Benchmark for Evaluating Large Language Models Understanding of Saudi Dialects	Renad Al-Monef et.al.	2507.10216	null
2025-07-14	A Training-Free, Task-Agnostic Framework for Enhancing MLLM Performance on High-Resolution Images	Jaeseong Lee et.al.	2507.10202	null
2025-07-14	History Matching under Uncertainty of Geological Scenarios with Implicit Geological Realism Control with Generative Deep Learning and Graph Convolutions	Gleb Shishaev et.al.	2507.10201	null
2025-07-14	Natural Language-based Assessment of L2 Oral Proficiency using LLMs	Stefano Bannò et.al.	2507.10200	null
2025-07-14	Breaking the Myth: Can Small Models Infer Postconditions Too?	Gehao Zhang et.al.	2507.10182	null
2025-07-14	Pimba: A Processing-in-Memory Acceleration for Post-Transformer Large Language Model Serving	Wonung Kim et.al.	2507.10178	null
2025-07-14	Abusive text transformation using LLMs	Rohitash Chandra et.al.	2507.10177	null
2025-07-14	Task-Based Flexible Feature Distillation for LLMs	Khouloud Saadi et.al.	2507.10155	null
2025-07-14	Past-Future Scheduler for LLM Serving under SLA Guarantees	Ruihao Gong et.al.	2507.10150	null
2025-07-14	DEARLi: Decoupled Enhancement of Recognition and Localization for Semi-supervised Panoptic Segmentation	Ivan Martinović et.al.	2507.10118	null
2025-07-14	Accelerating Automatic Program Repair with Dual Retrieval-Augmented Fine-Tuning and Patch Generation on Large Language Models	Hanyang Guo et.al.	2507.10103	null
2025-07-14	Fusing Large Language Models with Temporal Transformers for Time Series Forecasting	Chen Su et.al.	2507.10098	null
2025-07-14	Towards High Supervised Learning Utility Training Data Generation: Data Pruning and Column Reordering	Tung Sum Thomas Kwok et.al.	2507.10088	null
2025-07-14	Foundation Model Driven Robotics: A Comprehensive Review	Muhammad Tayyab Khan et.al.	2507.10087	null
2025-07-14	Cultural Bias in Large Language Models: Evaluating AI Agents through Moral Questionnaires	Simon Münker et.al.	2507.10073	null
2025-07-14	ElasticMM: Efficient Multimodal LLMs Serving with Elastic Multimodal Parallelism	Zedong Liu et.al.	2507.10069	null
2025-07-14	LLMShot: Reducing snapshot testing maintenance via LLMs	Ergün Batuhan Kaynak et.al.	2507.10062	null
2025-07-14	GeLaCo: An Evolutionary Approach to Layer Compression	David Ponce et.al.	2507.10059	null
2025-07-14	Explicit Vulnerability Generation with LLMs: An Investigation Beyond Adversarial Attacks	Emir Bosnak et.al.	2507.10054	null
2025-07-14	Automating SPARQL Query Translations between DBpedia and Wikidata	Malte Christian Bartels et.al.	2507.10045	null
2025-07-14	Towards Applying Large Language Models to Complement Single-Cell Foundation Models	Steven Palayew et.al.	2507.10039	null
2025-07-14	EAT: QoS-Aware Edge-Collaborative AIGC Task Scheduling via Attention-Guided Diffusion Reinforcement Learning	Zhifei Xu et.al.	2507.10026	null
2025-07-14	Qualitative Study for LLM-assisted Design Study Process: Strategies, Challenges, and Roles	Shaolun Ruan et.al.	2507.10024	null
2025-07-14	The Man Behind the Sound: Demystifying Audio Private Attribute Profiling via Multimodal Large Language Model Agents	Lixu Wang et.al.	2507.10016	null
2025-07-14	(Almost) Free Modality Stitching of Foundation Models	Jaisidh Singh et.al.	2507.10015	null
2025-07-14	Protective Factor-Aware Dynamic Influence Learning for Suicide Risk Prediction on Social Media	Jun Li et.al.	2507.10008	null
2025-07-14	Deep Hidden Cognition Facilitates Reliable Chain-of-Thought Reasoning	Zijun Chen et.al.	2507.10007	null
2025-07-14	Differentially Private Federated Low Rank Adaptation Beyond Fixed-Matrix	Ming Wen et.al.	2507.09990	null
2025-07-14	Demonstrating the Octopi-1.5 Visual-Tactile-Language Model	Samson Yu et.al.	2507.09985	null
2025-07-14	Tiny Reward Models	Sarah Pan et.al.	2507.09973	null
2025-07-14	AnalogTester: A Large Language Model-Based Framework for Automatic Testbench Generation in Analog Circuit Design	Weiyu Chen et.al.	2507.09965	null
2025-07-14	DeepSeek: Paradigm Shifts and Technical Evolution in Large AI Models	Luolin Xiong et.al.	2507.09955	null
2025-07-14	Can GPT-4o mini and Gemini 2.0 Flash Predict Fine-Grained Fashion Product Attributes? A Zero-Shot Analysis	Shubham Shukla et.al.	2507.09950	null
2025-07-14	Iceberg: Enhancing HLS Modeling with Synthetic Data	Zijian Ding et.al.	2507.09948	null
2025-07-14	Green-LLM: Optimal Workload Allocation for Environmentally-Aware Distributed Inference	Jiaming Cheng et.al.	2507.09942	null
2025-07-14	Memorization Sinks: Isolating Memorization during LLM Training	Gaurav R. Ghosal et.al.	2507.09937	null
2025-07-14	Enhancing Retrieval Augmented Generation with Hierarchical Text Segmentation Chunking	Hai Toan Nguyen et.al.	2507.09935	null
2025-07-14	Mechanistic Interpretability of LoRA-Adapted Language Models for Nuclear Reactor Safety Applications	Yoon Pyo Lee et.al.	2507.09931	null
2025-07-14	Solving dynamic portfolio selection problems via score-based diffusion models	Ahmad Aghapour et.al.	2507.09916	null
2025-07-14	Crucial-Diff: A Unified Diffusion Model for Crucial Image and Annotation Synthesis in Data-scarce Scenarios	Siyue Yao et.al.	2507.09915	null
2025-07-14	TolerantECG: A Foundation Model for Imperfect Electrocardiogram	Huynh Nguyen Dang et.al.	2507.09887	null
2025-07-14	VerifyBench: A Systematic Benchmark for Evaluating Reasoning Verifiers Across Domains	Xuzhao Li et.al.	2507.09884	null
2025-07-14	AdaBrain-Bench: Benchmarking Brain Foundation Models for Brain-Computer Interface Applications	Jiamin Wu et.al.	2507.09882	null
2025-07-14	Covering a Few Submodular Constraints and Applications	Tanvi Bajpai et.al.	2507.09879	null
2025-07-14	ViTCoT: Video-Text Interleaved Chain-of-Thought for Boosting Video Understanding in Large Language Models	Yongheng Zhang et.al.	2507.09876	null
2025-07-14	Function Induction and Task Generalization: An Interpretability Study with Off-by-One Addition	Qinyuan Ye et.al.	2507.09875	null
2025-07-14	Turning the Tide: Repository-based Code Reflection	Wei Zhang et.al.	2507.09866	null
2025-07-14	A Survey on MLLM-based Visually Rich Document Understanding: Methods, Challenges, and Emerging Trends	Yihao Ding et.al.	2507.09861	null
2025-07-14	Model-Grounded Symbolic Artificial Intelligence Systems Learning and Reasoning with Model-Grounded Symbolic Artificial Intelligence Systems	Aniruddha Chattopadhyay et.al.	2507.09854	null
2025-07-14	Rethinking Prompt Optimization: Reinforcement, Diversification, and Migration in Blackbox LLMs	MohammadReza Davari et.al.	2507.09839	null
2025-07-14	Generative Audio Language Modeling with Continuous-valued Tokens and Masked Next-Token Prediction	Shu-wen Yang et.al.	2507.09834	null
2025-07-13	Generative Cognitive Diagnosis	Jiatong Li et.al.	2507.09831	null
2025-07-13	Measuring What Matters: A Framework for Evaluating Safety Risks in Real-World LLM Applications	Jia Yi Goh et.al.	2507.09820	null
2025-07-13	VRU-Accident: A Vision-Language Benchmark for Video Question Answering and Dense Captioning for Accident Scene Understanding	Younggun Kim et.al.	2507.09815	null
2025-07-13	A Scalable and Efficient Signal Integration System for Job Matching	Ping Liu et.al.	2507.09797	null
2025-07-13	CADmium: Fine-Tuning Code Language Models for Text-Driven Sequential CAD Design	Prashant Govindarajan et.al.	2507.09792	null
2025-07-13	Prompting for Performance: Exploring LLMs for Configuring Software	Helge Spieker et.al.	2507.09790	null
2025-07-13	TinyTroupe: An LLM-powered Multiagent Persona Simulation Toolkit	Paulo Salem et.al.	2507.09788	null
2025-07-13	Efficient Molecular Conformer Generation with SO(3)-Averaged Flow Matching and Reflow	Zhonglin Cao et.al.	2507.09785	null
2025-07-13	Do we need equivariant models for molecule generation?	Ewa M. Nowara et.al.	2507.09753	null
2025-07-13	Sound and Complete Neuro-symbolic Reasoning with LLM-Grounded Interpretations	Bradley P. Allen et.al.	2507.09751	null
2025-07-13	BrainFLORA: Uncovering Brain Concept Representation via Multimodal Neural Embeddings	Dongyang Li et.al.	2507.09747	null
2025-07-13	Enhancing Trading Performance Through Sentiment Analysis with Large Language Models: Evidence from the S&P 500	Haojie Liu et.al.	2507.09739	null
2025-07-13	Continental scale habitat modelling with artificial intelligence and multimodal earth observation	Sara Si-Moussi et.al.	2507.09732	null
2025-07-13	Large Language Models Encode Semantics in Low-Dimensional Linear Subspaces	Baturay Saglam et.al.	2507.09709	null
2025-07-13	MCEval: A Dynamic Framework for Fair Multilingual Cultural Evaluation of LLMs	Shulin Huang et.al.	2507.09701	null
2025-07-13	ExpStar: Towards Automatic Commentary Generation for Multi-discipline Scientific Experiments	Jiali Chen et.al.	2507.09693	null
2025-07-13	Prompt2DEM: High-Resolution DEMs for Urban and Open Environments from Global Prompts Using a Monocular Foundation Model	Osher Rafaeli et.al.	2507.09681	null
2025-07-13	Can AI Rely on the Systematicity of Truth? The Challenge of Modelling Normative Domains	Matthieu Queloz et.al.	2507.09676	null
2025-07-13	Is Quantization a Deal-breaker? Empirical Insights from Large Code Models	Saima Afrin et.al.	2507.09665	null
2025-07-13	Towards Concise and Adaptive Thinking in Large Reasoning Models: A Survey	Jason Zhu et.al.	2507.09662	null
2025-07-13	Negotiating Comfort: Simulating Personality-Driven LLM Agents in Shared Residential Social Networks	Ann Nedime Nese Rende et.al.	2507.09657	null
2025-07-13	Cultivating Pluralism In Algorithmic Monoculture: The Community Alignment Dataset	Lily Hong Zhang et.al.	2507.09650	null
2025-07-13	Can Group Relative Policy Optimization Improve Thai Legal Reasoning and Question Answering?	Pawitsapak Akarajaradwong et.al.	2507.09638	null
2025-07-13	Demystifying Flux Architecture	Or Greenberg et.al.	2507.09595	null
2025-07-11	Lumos-1: On Autoregressive Video Generation from a Unified Model Perspective	Hangjie Yuan et.al.	2507.08801	null
2025-07-11	NeuralOS: Towards Simulating Operating Systems via Neural Generative Models	Luke Rivard et.al.	2507.08800	null
2025-07-11	One Token to Fool LLM-as-a-Judge	Yulai Zhao et.al.	2507.08794	null
2025-07-11	From One to More: Contextual Part Latents for 3D Generation	Shaocong Dong et.al.	2507.08772	null
2025-07-11	BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity	Chenyang Song et.al.	2507.08771	null
2025-07-11	Multilingual Multimodal Software Developer for Code Generation	Linzheng Chai et.al.	2507.08719	null
2025-07-11	Unreal is all you need: Multimodal ISAC Data Simulation with Only One Engine	Kongwu Huang et.al.	2507.08716	null
2025-07-11	KG-Attention: Knowledge Graph-Guided Attention at Test-Time via Bidirectional Information Aggregation	Songlin Zhai et.al.	2507.08704	null
2025-07-11	ByDeWay: Boost Your multimodal LLM with DEpth prompting in a Training-Free Way	Rajarshi Roy et.al.	2507.08679	null
2025-07-11	LLMCup: Ranking-Enhanced Comment Updating with LLMs	Hua Ge et.al.	2507.08671	null
2025-07-11	KELPS: A Framework for Verified Multi-Language Autoformalization via Semantic-Syntactic Alignment	Jiyao Zhang et.al.	2507.08665	null
2025-07-11	Introspection of Thought Helps AI Agents	Haoran Sun et.al.	2507.08664	null
2025-07-11	Leanabell-Prover-V2: Verifier-integrated Reasoning for Formal Theorem Proving via Reinforcement Learning	Xingguang Ji et.al.	2507.08649	null
2025-07-11	DatasetAgent: A Novel Multi-Agent System for Auto-Constructing Datasets from Real-World Images	Haoran Sun et.al.	2507.08648	null
2025-07-11	NL in the Middle: Code Translation with LLMs and Intermediate Representations	Chi-en Amy Tai et.al.	2507.08627	null
2025-07-11	A comprehensive study of LLM-based argument classification: from LLAMA through GPT-4o to Deepseek-R1	Marcin Pietroń et.al.	2507.08621	null
2025-07-11	Agentic Large Language Models for Conceptual Systems Engineering and Design	Soheyl Massoudi et.al.	2507.08619	null
2025-07-11	AgentsNet: Coordination and Collaborative Reasoning in Multi-Agent LLMs	Florian Grötschla et.al.	2507.08616	null
2025-07-11	Emergent Natural Language with Communication Games for Improving Image Captioning Capabilities without Additional Data	Parag Dutta et.al.	2507.08610	null
2025-07-11	Unlocking Speech Instruction Data Potential with Query Rewriting	Yonghua Hei et.al.	2507.08603	null
2025-07-11	Visual Semantic Description Generation with MLLMs for Image-Text Matching	Junyu Chen et.al.	2507.08590	null
2025-07-11	To Trade or Not to Trade: An Agentic Approach to Estimating Market Risk Improves Trading Decisions	Dimitrios Emmanoulopoulos et.al.	2507.08584	null
2025-07-11	Large Multi-modal Model Cartographic Map Comprehension for Textual Locality Georeferencing	Kalana Wijegunarathna et.al.	2507.08575	null
2025-07-11	AbbIE: Autoregressive Block-Based Iterative Encoder for Efficient Sequence Modeling	Preslav Aleksandrov et.al.	2507.08567	null
2025-07-11	FreeAudio: Training-Free Timing Planning for Controllable Long-Form Text-to-Audio Generation	Yuxuan Jiang et.al.	2507.08557	null
2025-07-11	White-Basilisk: A Hybrid Model for Code Vulnerability Detection	Ioannis Lamprou et.al.	2507.08540	null
2025-07-11	The AI Language Proficiency Monitor -- Tracking the Progress of LLMs on Multilingual Benchmarks	David Pomerenke et.al.	2507.08538	null
2025-07-11	A Multi-granularity Concept Sparse Activation and Hierarchical Knowledge Graph Fusion Framework for Rare Disease Diagnosis	Mingda Zhang et.al.	2507.08529	null
2025-07-11	InferLog: Accelerating LLM Inference for Online Log Parsing via ICL-oriented Prefix Caching	Yilun Wang et.al.	2507.08523	null
2025-07-11	Advancing Multimodal LLMs by Large-Scale 3D Visual Instruction Dataset Generation	Liu He et.al.	2507.08513	null
2025-07-11	From Language to Logic: A Bi-Level Framework for Structured Reasoning	Keying Yang et.al.	2507.08501	null
2025-07-11	Semantic-Augmented Latent Topic Modeling with LLM-in-the-Loop	Mengze Hong et.al.	2507.08498	null
2025-07-11	LLaPa: A Vision-Language Model Framework for Counterfactual-Aware Procedural Planning	Shibo Sun et.al.	2507.08496	null
2025-07-11	A Third Paradigm for LLM Evaluation: Dialogue Game-Based Evaluation using clembench	David Schlangen et.al.	2507.08491	null
2025-07-11	ILT-Iterative LoRA Training through Focus-Feedback-Fix for Multilingual Speech Recognition	Qingliang Meng et.al.	2507.08477	null
2025-07-11	SynBridge: Bridging Reaction States via Discrete Flow for Bidirectional Reaction Prediction	Haitao Lin et.al.	2507.08475	null
2025-07-11	Using Large Language Models for Legal Decision-Making in Austrian Value-Added Tax Law: An Experimental Study	Marina Luketina et.al.	2507.08468	null
2025-07-11	F3-Net: Foundation Model for Full Abnormality Segmentation of Medical Images with Flexible Input Modality Requirement	Seyedeh Sahar Taheri Otaghsara et.al.	2507.08460	null
2025-07-11	Diagnosing Failures in Large Language Models' Answers: Integrating Error Attribution into Evaluation Framework	Zishan Xu et.al.	2507.08459	null
2025-07-11	A document is worth a structured record: Principled inductive bias design for document recognition	Benjamin Meyer et.al.	2507.08458	null
2025-07-11	CUE-RAG: Towards Accurate and Cost-Efficient Graph-Based RAG via Multi-Partite Graph and Query-Driven Iterative Retrieval	Yaodong Su et.al.	2507.08445	null
2025-07-11	Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation	Anlin Zheng et.al.	2507.08441	null
2025-07-11	Finding Common Ground: Using Large Language Models to Detect Agreement in Multi-Agent Decision Conferences	Selina Heller et.al.	2507.08440	null
2025-07-11	xpSHACL: Explainable SHACL Validation using Retrieval-Augmented Generation and Large Language Models	Gustavo Correa Publio et.al.	2507.08432	null
2025-07-11	ChainEdit: Propagating Ripple Effects in LLM Knowledge Editing through Logical Rule-Guided Chains	Zilu Dong et.al.	2507.08427	null
2025-07-11	Generative artificial intelligence and hybrid models to accelerate LES in reactive flows: Application to hydrogen/methane combustion	Xiangrui Zou et.al.	2507.08426	null
2025-07-11	A Survey of Large Language Models in Discipline-specific Research: Challenges, Methods and Opportunities	Lu Xiang et.al.	2507.08425	null
2025-07-11	InstaScene: Towards Complete 3D Instance Decomposition and Reconstruction from Cluttered Scenes	Zesong Yang et.al.	2507.08416	null
2025-07-11	Multi-modal Mutual-Guidance Conditional Prompt Learning for Vision-Language Models	Shijun Yang et.al.	2507.08410	null
2025-07-11	PanMatch: Unleashing the Potential of Large Vision Models for Unified Matching Models	Yongjian Zhang et.al.	2507.08400	null
2025-07-11	Understanding Driving Risks using Large Language Models: Toward Elderly Driver Assessment	Yuki Yoshihara et.al.	2507.08367	null
2025-07-11	Leveraging Machine Learning and Enhanced Parallelism Detection for BPMN Model Generation from Text	Phuong Nam Lê et.al.	2507.08362	null
2025-07-11	Cycle Context Verification for In-Context Medical Image Segmentation	Shishuai Hu et.al.	2507.08357	null
2025-07-11	Exploring Design of Multi-Agent LLM Dialogues for Research Ideation	Keisuke Ueda et.al.	2507.08350	null
2025-07-11	What Factors Affect LLMs and RLLMs in Financial Question Answering?	Peng Wang et.al.	2507.08339	null
2025-07-11	CoCo-Bot: Energy-based Composable Concept Bottlenecks for Interpretable Generative Models	Sangwon Kim et.al.	2507.08334	null
2025-07-11	CRMAgent: A Multi-Agent LLM System for E-Commerce CRM Message Template Generation	Yinzhu Quan et.al.	2507.08325	null
2025-07-11	Generative AI in Science: Applications, Challenges, and Emerging Questions	Ryan Harries et.al.	2507.08310	null
2025-07-11	Improving MLLM's Document Image Machine Translation via Synchronously Self-reviewing Its OCR Proficiency	Yupu Liang et.al.	2507.08309	null
2025-07-11	M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning	Inclusion AI et.al.	2507.08306	null
2025-07-11	KAT-V1: Kwai-AutoThink Technical Report	Zizheng Zhan et.al.	2507.08297	null
2025-07-11	Invariant-based Robust Weights Watermark for Large Language Models	Qingxiao Guo et.al.	2507.08288	null
2025-07-11	Lightweight Safety Guardrails via Synthetic Data and RL-guided Adversarial Training	Aleksei Ilin et.al.	2507.08284	null
2025-07-11	Agent Safety Alignment via Reinforcement Learning	Zeyang Sha et.al.	2507.08270	null
2025-07-11	A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning	Hiroshi Yoshihara et.al.	2507.08267	null
2025-07-11	CL3R: 3D Reconstruction and Contrastive Learning for Enhanced Robotic Manipulation Representations	Wenbo Cui et.al.	2507.08262	null
2025-07-11	Quantum-Accelerated Neural Imputation with Large Language Models (LLMs)	Hossein Jamali et.al.	2507.08255	null
2025-07-11	Raptor: Scalable Train-Free Embeddings for 3D Medical Volumes Leveraging Pretrained 2D Foundation Models	Ulzee An et.al.	2507.08254	null
2025-07-11	Leveraging Large Language Models for Classifying App Users' Feedback	Yasaman Abedini et.al.	2507.08250	null
2025-07-11	Time Variation in the TeV Cosmic Ray Anisotropy with IceCube and Energy Dependence of the Solar Dipole	Perri Zilberman et.al.	2507.08242	null
2025-07-11	Data Generation without Function Estimation	Hadi Daneshmand et.al.	2507.08239	null
2025-07-11	InsightBuild: LLM-Powered Causal Reasoning in Smart Building Systems	Pinaki Prasad Guha Neogi et.al.	2507.08235	null
2025-07-11	Can LLMs Reliably Simulate Real Students' Abilities in Mathematics and Reading Comprehension?	KV Aditya Srivatsa et.al.	2507.08232	null
2025-07-11	Making VLMs More Robot-Friendly: Self-Critical Distillation of Low-Level Procedural Reasoning	Chan Young Park et.al.	2507.08224	null
2025-07-10	Effect of Static vs. Conversational AI-Generated Messages on Colorectal Cancer Screening Intent: a Randomized Controlled Trial	Neil K. R. Sehgal et.al.	2507.08211	null
2025-07-10	Reasoning and Behavioral Equilibria in LLM-Nash Games: From Mindsets to Actions	Quanyan Zhu et.al.	2507.08208	null
2025-07-10	A Dynamic Stackelberg Game Framework for Agentic AI Defense Against LLM Jailbreaking	Zhengye Han et.al.	2507.08207	null
2025-07-10	TruthTorchLM: A Comprehensive Library for Predicting Truthfulness in LLM Outputs	Duygu Nur Yaldiz et.al.	2507.08203	null
2025-07-10	Consciousness as a Jamming Phase	Kaichen Ouyang et.al.	2507.08197	null
2025-07-10	CTRLS: Chain-of-Thought Reasoning via Latent State-Transition	Junda Wu et.al.	2507.08182	null
2025-07-10	Analysis of Propaganda in Tweets From Politically Biased Sources	Vivek Sharma et.al.	2507.08169	null
2025-07-10	KP-A: A Unified Network Knowledge Plane for Catalyzing Agentic Network Intelligence	Yun Tang et.al.	2507.08164	null
2025-07-10	ALCo-FM: Adaptive Long-Context Foundation Model for Accident Prediction	Pinaki Prasad Guha Neogi et.al.	2507.08153	null
2025-07-10	Distilling Empathy from Large Language Models	Henry J. Xie et.al.	2507.08151	null
2025-07-10	Compactor: Calibrated Query-Agnostic KV Cache Compression with Approximate Leverage Scores	Vivek Chari et.al.	2507.08143	null
2025-07-10	GRASP: Generic Reasoning And SPARQL Generation across Knowledge Graphs	Sebastian Walter et.al.	2507.08107	null
2025-07-10	Low-rank Momentum Factorization for Memory Efficient Training	Pouria Mahdavinia et.al.	2507.08091	null
2025-07-10	Quantile Reward Policy Optimization: Alignment with Pointwise Regression and Exact Partition Functions	Simon Matrenok et.al.	2507.08068	null
2025-07-10	Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs	Ziyue Li et.al.	2507.07996	null
2025-07-10	Multigranular Evaluation for Brain Visual Decoding	Weihao Xia et.al.	2507.07993	null
2025-07-10	Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs	Jeongseok Hyun et.al.	2507.07990	null
2025-07-10	Automating Expert-Level Medical Reasoning Evaluation of Large Language Models	Shuang Zhou et.al.	2507.07988	null
2025-07-10	OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding	JingLi Lin et.al.	2507.07984	null
2025-07-10	Performance and Practical Considerations of Large and Small Language Models in Clinical Decision Support in Rheumatology	Sabine Felde et.al.	2507.07983	null
2025-07-10	Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling	Haoyu Wu et.al.	2507.07982	null
2025-07-10	Defending Against Prompt Injection With a Few DefensiveTokens	Sizhe Chen et.al.	2507.07974	null
2025-07-10	Scaling RL to Long Videos	Yukang Chen et.al.	2507.07966	null
2025-07-10	Dynamic Chunking for End-to-End Hierarchical Sequence Modeling	Sukjun Hwang et.al.	2507.07955	null
2025-07-10	Input Conditioned Layer Dropping in Speech Foundation Models	Abdul Hannan et.al.	2507.07954	null
2025-07-10	Low Resource Reconstruction Attacks Through Benign Prompts	Sol Yarkoni et.al.	2507.07947	null
2025-07-10	Can Large Language Models Improve Phishing Defense? A Large-Scale Controlled Experiment on Warning Dialogue Explanations	Federico Maria Cau et.al.	2507.07916	null
2025-07-10	MIRA: A Novel Framework for Fusing Modalities in Medical RAG	Jinhong Wang et.al.	2507.07902	null
2025-07-10	An Integrated Framework of Prompt Engineering and Multidimensional Knowledge Graphs for Legal Dispute Analysis	Mingda Zhang et.al.	2507.07893	null
2025-07-10	Automating MD simulations for Proteins using Large language Models: NAMD-Agent	Achuth Chandrasekhar et.al.	2507.07887	null
2025-07-10	Opting Out of Generative AI: a Behavioral Experiment on the Role of Education in Perplexity AI Avoidance	Roberto Ulloa et.al.	2507.07881	null
2025-07-10	LISTEN: Lightweight Industrial Sound-representable Transformer for Edge Notification	Changheon Han et.al.	2507.07879	null
2025-07-10	Mitigating Watermark Stealing Attacks in Generative Models via Multi-Key Watermarking	Toluwani Aremu et.al.	2507.07871	null
2025-07-10	DocCHA: Towards LLM-Augmented Interactive Online diagnosis System	Xinyi Liu et.al.	2507.07870	null
2025-07-10	THUNDER: Tile-level Histopathology image UNDERstanding benchmark	Pierre Marza et.al.	2507.07860	null
2025-07-10	From Ambiguity to Accuracy: The Transformative Effect of Coreference Resolution on Retrieval-Augmented Generation systems	Youngjoon Jang et.al.	2507.07847	null
2025-07-10	Towards Benchmarking Foundation Models for Tabular Data With Text	Martin Mráz et.al.	2507.07829	null
2025-07-10	MoSE: Skill-by-Skill Mixture-of-Expert Learning for Autonomous Driving	Lu Xu et.al.	2507.07818	null
2025-07-10	Understanding and Controlling Repetition Neurons and Induction Heads in In-Context Learning	Nhi Hoai Doan et.al.	2507.07810	null
2025-07-10	SecureSpeech: Prompt-based Speaker and Content Protection	Belinda Soh Hui Hui et.al.	2507.07799	null
2025-07-10	Measuring AI Alignment with Human Flourishing	Elizabeth Hilliard et.al.	2507.07787	null
2025-07-10	Where are we with calibration under dataset shift in image classification?	Mélanie Roschewitz et.al.	2507.07780	null
2025-07-10	A Unified Empirical Risk Minimization Framework for Flexible N-Tuples Weak Supervision	Shuying Huang et.al.	2507.07771	null
2025-07-10	Structured Prompts, Better Outcomes? Exploring the Effects of a Structured Interface with ChatGPT in a Graduate Robotics Course	Jerome Brender et.al.	2507.07767	null
2025-07-10	When Large Language Models Meet Law: Dual-Lens Taxonomy, Technical Advances, and Ethical Governance	Peizhang Shao et.al.	2507.07748	null
2025-07-10	On the capabilities of LLMs for classifying and segmenting time series of fruit picking motions into primitive actions	Eleni Konstantinidou et.al.	2507.07745	null
2025-07-10	GuardVal: Dynamic Large Language Model Jailbreak Evaluation for Comprehensive Safety Testing	Peiyan Zhang et.al.	2507.07735	null
2025-07-10	Not All Preferences are What You Need for Post-Training: Selective Alignment Strategy for Preference Optimization	Zhijin Dong et.al.	2507.07725	null
2025-07-10	KeyKnowledgeRAG (K^2RAG): An Enhanced RAG method for improved LLM question-answering capabilities	Hruday Markondapatnaikuni et.al.	2507.07695	null
2025-07-10	From Domain Documents to Requirements: Retrieval-Augmented Generation in the Space Industry	Chetan Arora et.al.	2507.07689	null
2025-07-10	Rationale-Enhanced Decoding for Multi-modal Chain-of-Thought	Shin'ya Yamaguchi et.al.	2507.07685	null
2025-07-10	Accelerating Transposed Convolutions on FPGA-based Edge Devices	Jude Haris et.al.	2507.07683	null
2025-07-10	Prompt Engineering for Requirements Engineering: A Literature Review and Roadmap	Kaicheng Huang et.al.	2507.07682	null
2025-07-10	PlanQA: A Benchmark for Spatial Reasoning in LLMs using Structured Representations	Fedor Rodionov et.al.	2507.07644	null
2025-07-10	FrugalRAG: Learning to retrieve and reason for multi-hop QA	Abhinav Java et.al.	2507.07634	null
2025-07-10	T-GVC: Trajectory-Guided Generative Video Coding at Ultra-Low Bitrates	Zhitao Wang et.al.	2507.07633	null
2025-07-10	Exploring the Limits of Model Compression in LLMs: A Knowledge Distillation Study on QA Tasks	Joyeeta Datta et.al.	2507.07630	null
2025-07-10	SpatialViz-Bench: Automatically Generated Spatial Visualization Reasoning Tasks for MLLMs	Siting Wang et.al.	2507.07610	null
2025-07-10	Enhancing Vaccine Safety Surveillance: Extracting Vaccine Mentions from Emergency Department Triage Notes Using Fine-Tuned Large Language Models	Sedigh Khademi et.al.	2507.07599	null
2025-07-10	NexViTAD: Few-shot Unsupervised Cross-Domain Defect Detection via Vision Foundation Models and Multi-Task Learning	Tianwei Mu et.al.	2507.07579	null
2025-07-10	Single-to-mix Modality Alignment with Multimodal Large Language Model for Document Image Machine Translation	Yupu Liang et.al.	2507.07572	null
2025-07-10	CEA-LIST at CheckThat! 2025: Evaluating LLMs as Detectors of Bias and Opinion in Text	Akram Elbouanani et.al.	2507.07539	null
2025-07-10	MAPEX: Modality-Aware Pruning of Experts for Remote Sensing Foundation Models	Joelle Hanna et.al.	2507.07527	null
2025-07-10	Toward Real-World Chinese Psychological Support Dialogues: CPsDD Dataset and a Co-Evolving Multi-Agent System	Yuanchen Shi et.al.	2507.07509	null
2025-07-10	PLAN-TUNING: Post-Training Language Models to Learn Step-by-Step Planning for Complex Problem Solving	Mihir Parmar et.al.	2507.07495	null
2025-07-10	Sparse Autoencoders Reveal Interpretable Structure in Small Gene Language Models	Haoxiang Guan et.al.	2507.07486	null
2025-07-10	Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models	Kaiqu Liang et.al.	2507.07484	null
2025-07-10	General purpose models for the chemical sciences	Nawaf Alampara et.al.	2507.07456	null
2025-07-10	RLEP: Reinforcement Learning with Experience Replay for LLM Reasoning	Hongzhi Zhang et.al.	2507.07451	null
2025-07-10	StarDojo: Benchmarking Open-Ended Behaviors of Agentic Multimodal LLMs in Production-Living Simulations with Stardew Valley	Weihao Tan et.al.	2507.07445	null
2025-07-10	SAND: Boosting LLM Agents with Self-Taught Action Deliberation	Yu Xia et.al.	2507.07441	null
2025-07-10	Towards Interpretable Time Series Foundation Models	Matthieu Boileau et.al.	2507.07439	null
2025-07-10	Neural networks leverage nominally quantum and post-quantum representations	Paul M. Riechers et.al.	2507.07432	null
2025-07-10	DrugMCTS: a drug repurposing framework combining multi-agent, RAG and Monte Carlo Tree Search	Zerui Yang et.al.	2507.07426	null
2025-07-10	Corvid: Improving Multimodal Large Language Models Towards Chain-of-Thought Reasoning	Jingjing Jiang et.al.	2507.07424	null
2025-07-10	May I have your Attention? Breaking Fine-Tuning based Prompt Injection Defenses using Architecture-Aware Attacks	Nishit V. Pandya et.al.	2507.07417	null
2025-07-10	EPIC: Efficient Prompt Interaction for Text-Image Classification	Xinyao Yu et.al.	2507.07415	null
2025-07-10	GNN-CNN: An Efficient Hybrid Model of Convolutional and Graph Neural Networks for Text Representation	Fardin Rastakhiz et.al.	2507.07414	null
2025-07-10	Hybrid LLM-Enhanced Intrusion Detection for Zero-Day Threats in IoT Networks	Mohammad F. Al-Hammouri et.al.	2507.07413	null
2025-07-10	Phishing Detection in the Gen-AI Era: Quantized LLMs vs Classical Models	Jikesh Thapa et.al.	2507.07406	null
2025-07-10	KVFlow: Efficient Prefix Caching for Accelerating LLM-Based Multi-Agent Workflows	Zaifeng Pan et.al.	2507.07400	null
2025-07-10	Behave Your Motion: Habit-preserved Cross-category Animal Motion Transfer	Zhimin Zhang et.al.	2507.07394	null
2025-07-10	Learning Collective Variables from Time-lagged Generation	Seonghyun Park et.al.	2507.07390	null
2025-07-10	Bradley-Terry and Multi-Objective Reward Modeling Are Complementary	Zhiwei Zhang et.al.	2507.07375	null
2025-07-10	PacGDC: Label-Efficient Generalizable Depth Completion with Projection Ambiguity and Consistency	Haotian Wang et.al.	2507.07374	null
2025-07-09	On the Impossibility of Separating Intelligence from Judgment: The Computational Intractability of Filtering for AI Alignment	Sarah Ball et.al.	2507.07341	null
2025-07-09	Bridging the Plausibility-Validity Gap by Fine-Tuning a Reasoning-Enhanced LLM for Chemical Synthesis and Discovery	Malikussaid et.al.	2507.07328	null
2025-07-09	Frontier LLMs Still Struggle with Simple Reasoning Tasks	Alan Malek et.al.	2507.07313	null
2025-07-09	Multi-Agent Retrieval-Augmented Framework for Evidence-Based Counterspeech Against Health Misinformation	Anirban Saha Anik et.al.	2507.07307	null
2025-07-09	Application of LLMs to Multi-Robot Path Planning and Task Allocation	Ashish Kumar et.al.	2507.07302	null
2025-07-09	Time Series Foundation Models for Multivariate Financial Time Series Forecasting	Ben A. Marconi et.al.	2507.07296	null
2025-07-09	Thermodynamic Prediction Enabled by Automatic Dataset Building and Machine Learning	Juejing Liu et.al.	2507.07293	null
2025-07-09	Open Source Planning & Control System with Language Agents for Autonomous Scientific Discovery	Licong Xu et.al.	2507.07257	null
2025-07-09	A Language-Driven Framework for Improving Personalized Recommendations: Merging LLMs with Traditional Algorithms	Aaron Goldstein et.al.	2507.07251	null
2025-07-09	Medical Red Teaming Protocol of Language Models: On the Importance of User Perspectives in Healthcare Settings	Minseon Kim et.al.	2507.07248	null
2025-07-09	Attentions Under the Microscope: A Comparative Study of Resource Utilization for Variants of Self-Attention	Zhengyu Tian et.al.	2507.07247	null
2025-07-09	An Information-Theoretic Perspective on Multi-LLM Uncertainty Estimation	Maya Kruse et.al.	2507.07236	null
2025-07-09	SynthTextEval: Synthetic Text Data Generation and Evaluation for High-Stakes Domains	Krithika Ramesh et.al.	2507.07229	null
2025-07-09	Compute Can't Handle the Truth: Why Communication Tax Prioritizes Memory and Interconnects in Modern AI Infrastructure	Myoungsoo Jung et.al.	2507.07223	null
2025-07-09	Neurosymbolic Feature Extraction for Identifying Forced Labor in Supply Chains	Zili Wang et.al.	2507.07217	null
2025-07-09	Scale leads to compositional generalization	Florian Redhardt et.al.	2507.07207	null
2025-07-09	State-Inference-Based Prompting for Natural Language Trading with Game NPCs	Minkyung Kim et.al.	2507.07203	null
2025-07-09	A Survey on Long-Video Storytelling Generation: Architectures, Consistency, and Cinematic Quality	Mohamed Elmoghany et.al.	2507.07202	null
2025-07-09	Combining Pre-Trained Models for Enhanced Feature Representation in Reinforcement Learning	Elia Piccoli et.al.	2507.07197	null
2025-07-09	Bridging the Last Mile of Prediction: Enhancing Time Series Forecasting with Conditional Guided Flow Matching	Huibo Xu et.al.	2507.07192	null
2025-07-09	Prompt Perturbations Reveal Human-Like Biases in LLM Survey Responses	Jens Rupprecht et.al.	2507.07188	null
2025-07-09	Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs	Itay Itzhak et.al.	2507.07186	null
2025-07-09	Interpretable EEG-to-Image Generation with Semantic Prompts	Arshak Rezvani et.al.	2507.07157	null
2025-07-09	Evaluating Retrieval-Augmented Generation Agents for Autonomous Scientific Discovery in Astrophysics	Xueqing Xu et.al.	2507.07155	null
2025-07-09	Towards Multimodal Understanding via Stable Diffusion as a Task-Aware Feature Extractor	Vatsal Agarwal et.al.	2507.07106	null
2025-07-09	Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation from Diffusion Models	Tiezheng Zhang et.al.	2507.07104	null
2025-07-09	Evaluating Attribute Confusion in Fashion Text-to-Image Generation	Ziyue Liu et.al.	2507.07079	null
2025-07-09	5C Prompt Contracts: A Minimalist, Creative-Friendly, Token-Efficient Design Framework for Individual and SME LLM Usage	Ugur Ari et.al.	2507.07045	null
2025-07-09	UniConv: Unifying Retrieval and Response Generation for Large Language Models in Conversations	Fengran Mo et.al.	2507.07030	null
2025-07-09	First Return, Entropy-Eliciting Explore	Tianyu Zheng et.al.	2507.07017	null
2025-07-09	Integrating Pathology Foundation Models and Spatial Transcriptomics for Cellular Decomposition from Histology Images	Yutong Sun et.al.	2507.07013	null
2025-07-09	GNN-ViTCap: GNN-Enhanced Multiple Instance Learning with Vision Transformers for Whole Slide Image Classification and Captioning	S M Taslim Uddin Raju et.al.	2507.07006	null
2025-07-09	Learning Deliberately, Acting Intuitively: Unlocking Test-Time Reasoning in Multimodal LLMs	Yahan Yu et.al.	2507.06999	null
2025-07-09	MCA-RG: Enhancing LLMs with Medical Concept Alignment for Radiology Report Generation	Qilong Xing et.al.	2507.06992	null
2025-07-09	Are They All Good? Evaluating the Quality of CoTs in LLM-based Code Generation	Binquan Zhang et.al.	2507.06980	null
2025-07-09	Hallucinating 360°: Panoramic Street-View Generation via Local Scenes Diffusion and Probabilistic Prompting	Fei Teng et.al.	2507.06971	null
2025-07-09	Scaling Towards the Information Boundary of Instruction Set: InfinityInstruct-Subject Technical Report	Li Du et.al.	2507.06968	null
2025-07-09	Investigating the Robustness of Retrieval-Augmented Generation at the Query Level	Sezen Perçin et.al.	2507.06956	null
2025-07-09	What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models	Keyon Vafa et.al.	2507.06952	null
2025-07-09	Rethinking Verification for LLM Code Generation: From Generation to Testing	Zihan Ma et.al.	2507.06920	null
2025-07-09	Exploring LLMs for Predicting Tutor Strategy and Student Outcomes in Dialogues	Fareya Ikram et.al.	2507.06910	null
2025-07-09	MultiJustice: A Chinese Dataset for Multi-Party, Multi-Charge Legal Prediction	Xiao Wang et.al.	2507.06909	null
2025-07-09	SCoRE: Streamlined Corpus-based Relation Extraction using Multi-Label Contrastive Learning and Bayesian kNN	Luca Mariotti et.al.	2507.06895	null
2025-07-09	Developing and Maintaining an Open-Source Repository of AI Evaluations: Challenges and Insights	Alexandra Abbas et.al.	2507.06893	null
2025-07-09	Squeeze the Soaked Sponge: Efficient Off-policy Reinforcement Finetuning for Large Language Model	Jing Liang et.al.	2507.06892	null
2025-07-09	DiffSpectra: Molecular Structure Elucidation from Spectra using Diffusion Models	Liang Wang et.al.	2507.06853	null
2025-07-09	The Dark Side of LLMs Agent-based Attacks for Complete Computer Takeover	Matteo Lupinacci et.al.	2507.06850	null
2025-07-09	Physics-Grounded Motion Forecasting via Equation Discovery for Trajectory-Guided Image-to-Video Generation	Tao Feng et.al.	2507.06830	null
2025-07-09	Adaptive Termination for Multi-round Parallel Reasoning: An Universal Semantic Entropy-Guided Framework	Zenan Xu et.al.	2507.06829	null
2025-07-09	Democratizing High-Fidelity Co-Speech Gesture Video Generation	Xu Yang et.al.	2507.06812	null
2025-07-09	Text to model via SysML: Automated generation of dynamical system computational models from unstructured natural language text via enhanced System Modeling Language diagrams	Matthew Anderson Hendricks et.al.	2507.06803	null
2025-07-09	Efficient Industrial sLLMs through Domain Adaptive Continual Pretraining: Method, Evaluation and Applications	Seonwu Kim et.al.	2507.06795	null
2025-07-09	Checklist Engineering Empowers Multilingual LLM Judges	Mohammad Ghiasvand Mohammadkhani et.al.	2507.06774	null
2025-07-09	Leveraging LLMs for Semantic Conflict Detection via Unit Test Generation	Nathalia Barbosa et.al.	2507.06762	null
2025-07-09	LOVON: Legged Open-Vocabulary Object Navigator	Daojie Peng et.al.	2507.06747	null
2025-07-09	PenTest2.0: Towards Autonomous Privilege Escalation Using GenAI	Haitham S. Al-Sinani et.al.	2507.06742	null
2025-07-09	Hierarchical Feature Alignment for Gloss-Free Sign Language Translation	Sobhan Asasi et.al.	2507.06732	null
2025-07-09	On the Effect of Uncertainty on Layer-wise Inference Dynamics	Sunwoo Kim et.al.	2507.06722	null
2025-07-09	A Neural Representation Framework with LLM-Driven Spatial Reasoning for Open-Vocabulary 3D Visual Grounding	Zhenyang Liu et.al.	2507.06719	null
2025-07-09	CLI-RAG: A Retrieval-Augmented Framework for Clinically Structured and Context Aware Text Generation with LLMs	Garapati Keerthana et.al.	2507.06715	null
2025-07-09	Elite Polarization in European Parliamentary Speeches: a Novel Measurement Approach Using Large Language Models	Gennadii Iakovlev et.al.	2507.06658	null
2025-07-09	Deep Disentangled Representation Network for Treatment Effect Estimation	Hui Meng et.al.	2507.06650	null
2025-07-09	EXAONE Path 2.0: Pathology Foundation Model with End-to-End Supervision	Myungjang Pyeon et.al.	2507.06639	null
2025-07-09	UniOD: A Universal Model for Outlier Detection across Diverse Domains	Dazhi Fu et.al.	2507.06624	null
2025-07-09	Expediting data extraction using a large language model (LLM) and scoping review protocol: a methodological study within a complex scoping review	James Stewart-Evans et.al.	2507.06623	null
2025-07-09	FuDoBa: Fusing Document and Knowledge Graph-based Representations with Bayesian Optimisation	Boshko Koloski et.al.	2507.06622	null
2025-07-09	Denoising Multi-Beta VAE: Representation Learning for Disentanglement and Generation	Anshuk Uppal et.al.	2507.06613	null
2025-07-09	From Data-Centric to Sample-Centric: Enhancing LLM Reasoning via Progressive Optimization	Xinjie Chen et.al.	2507.06573	null
2025-07-09	SlimCaching: Edge Caching of Mixture-of-Experts for Distributed Inference	Qian Chen et.al.	2507.06567	null
2025-07-09	The Flaws of Others: An LLM-driven Framework for Scientific Knowledge Production	Juan B. Gutiérrez et.al.	2507.06565	null
2025-07-09	SkyVLN: Vision-and-Language Navigation and NMPC Control for UAVs in Urban Environments	Tianshun Li et.al.	2507.06564	null
2025-07-09	SPEAR: Subset-sampled Performance Evaluation via Automated Ground Truth Generation for RAG	Zou Yuheng et.al.	2507.06554	null
2025-07-09	Large Language Model for Extracting Complex Contract Information in Industrial Scenes	Yunyang Cao et.al.	2507.06539	null
2025-07-09	InvestAlign: Overcoming Data Scarcity in Aligning Large Language Models with Investor Decision-Making Processes under Herd Behavior	Huisheng Wang et.al.	2507.06528	null
2025-07-09	FIFA: Unified Faithfulness Evaluation Framework for Text-to-Video and Video-to-Text Generation	Liqiang Jing et.al.	2507.06523	null
2025-07-09	SpindleKV: A Novel KV Cache Reduction Method Balancing Both Shallow and Deep Layers	Zicong Tang et.al.	2507.06517	null
2025-07-09	QUEST: Query Optimization in Unstructured Document Analysis	Zhaoze Sun et.al.	2507.06515	null
2025-07-09	Towards LLM-based Root Cause Analysis of Hardware Design Failures	Siyu Qiu et.al.	2507.06512	null
2025-07-09	Bilateral Collaboration with Large Vision-Language Models for Open Vocabulary Human-Object Interaction Detection	Yupeng Hu et.al.	2507.06510	null
2025-07-09	GR-LLMs: Recent Advances in Generative Recommendation Based on Large Language Models	Zhen Yang et.al.	2507.06507	null
2025-07-09	Pun Intended: Multi-Agent Translation of Wordplay with Contrastive Learning and Phonetic-Semantic Embeddings	Russell Taylor et.al.	2507.06506	null
2025-07-09	MoFE-Time: Mixture of Frequency Domain Experts for Time-Series Forecasting Models	Yiwen Liu et.al.	2507.06502	null
2025-07-09	On the Robustness of Verbal Confidence of LLMs in Adversarial Attacks	Stephen Obadinma et.al.	2507.06489	null
2025-07-09	Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning	Ziyang Wang et.al.	2507.06485	null
2025-07-09	3D-Generalist: Self-Improving Vision-Language-Action Models for Crafting 3D Worlds	Fan-Yun Sun et.al.	2507.06484	null
2025-07-09	Learning Japanese with Jouzu: Interaction Outcomes with Stylized Dialogue Fictional Agents	Zackary Rackauckas et.al.	2507.06483	null
2025-07-09	IMPACT: Industrial Machine Perception via Acoustic Cognitive Transformer	Changheon Han et.al.	2507.06481	null
2025-07-09	Generative Lagrangian data assimilation for ocean dynamics under extreme sparsity	Niloofar Asefi et.al.	2507.06479	null
2025-07-09	Foundation Model Self-Play: Open-Ended Strategy Innovation via Foundation Models	Aaron Dharna et.al.	2507.06466	null
2025-07-09	Evaluating Efficiency and Novelty of LLM-Generated Code for Graph Analysis	Atieh Barati Nia et.al.	2507.06463	null
2025-07-08	A Semantic Parsing Framework for End-to-End Time Normalization	Xin Su et.al.	2507.06450	null
2025-07-08	Perception-Aware Policy Optimization for Multimodal Reasoning	Zhenhailong Wang et.al.	2507.06448	null
2025-07-08	Exploring Task Performance with Interpretable Models via Sparse Auto-Encoders	Shun Wang et.al.	2507.06427	null
2025-07-08	Reward Models Can Improve Themselves: Reward-Guided Adversarial Failure Mode Discovery for Robust Reward Modeling	Pankayaraj Pathmanathan et.al.	2507.06419	null
2025-07-08	PAST: A multimodal single-cell foundation model for histopathology and spatial transcriptomics in cancer	Changchun Yang et.al.	2507.06418	null
2025-07-08	Voltage Regulation in Distribution Systems with Data Center Loads	Yize Chen et.al.	2507.06416	null
2025-07-08	An AI-Driven Thermal-Fluid Testbed for Advanced Small Modular Reactors: Integration of Digital Twin and Large Language Models	Doyeong Lim et.al.	2507.06399	null
2025-07-08	SLDB: An End-To-End Heterogeneous System-on-Chip Benchmark Suite for LLM-Aided Design	Elisavet Lydia Alvanaki et.al.	2507.06376	null
2025-07-08	Bridging AI and Software Security: A Comparative Vulnerability Assessment of LLM Agent Deployment Paradigms	Tarek Gasmi et.al.	2507.06323	null
2025-07-08	Too Human to Model: The Uncanny Valley of LLMs in Social Simulation -- When Generative Language Agents Misalign with Modelling Principles	Yongchao Zeng et.al.	2507.06310	null
2025-07-08	Humans overrely on overconfident language models, across languages	Neil Rathi et.al.	2507.06306	null
2025-07-08	RSRefSeg 2: Decoupling Referring Remote Sensing Image Segmentation with Foundation Models	Keyan Chen et.al.	2507.06231	null
2025-07-08	Efficiency-Effectiveness Reranking FLOPs for LLM-based Rerankers	Zhiyuan Peng et.al.	2507.06223	null
2025-07-08	Is Diversity All You Need for Scalable Robotic Manipulation?	Modi Shi et.al.	2507.06219	null
2025-07-08	A Survey on Latent Reasoning	Rui-Jie Zhu et.al.	2507.06203	null
2025-07-08	UQLM: A Python Package for Uncertainty Quantification in Large Language Models	Dylan Bouchard et.al.	2507.06196	null
2025-07-08	SQLBarber: A System Leveraging Large Language Models to Generate Customized and Realistic SQL Workloads	Jiale Lao et.al.	2507.06192	null
2025-07-08	Hidden Prompts in Manuscripts Exploit AI-Assisted Peer Review	Zhicheng Lin et.al.	2507.06185	null
2025-07-08	Data-Semantics-Aware Recommendation of Diverse Pivot Tables	Whanhee Cho et.al.	2507.06171	null
2025-07-09	Skywork-R1V3 Technical Report	Wei Shen et.al.	2507.06167	null
2025-07-08	Evaluation of Habitat Robotics using Large Language Models	William Li et.al.	2507.06157	null
2025-07-08	Large Language Models Predict Human Well-being -- But Not Equally Everywhere	Pat Pataranutaporn et.al.	2507.06141	null
2025-07-08	Coding Triangle: How Does Large Language Model Understand Code?	Taolin Zhang et.al.	2507.06138	null
2025-07-08	PrefixAgent: An LLM-Powered Design Framework for Efficient Prefix Adder Optimization	Dongsheng Zuo et.al.	2507.06127	null
2025-07-09	Omni-Video: Democratizing Unified Video Understanding and Generation	Zhiyu Tan et.al.	2507.06119	null
2025-07-08	Speech Quality Assessment Model Based on Mixture of Experts: System-Level Performance Enhancement and Utterance-Level Challenge Analysis	Xintong Hu et.al.	2507.06116	null
2025-07-08	Reflections Unlock: Geometry-Aware Reflection Disentanglement in 3D Gaussian Splatting for Photorealistic Scenes Rendering	Jiayi Song et.al.	2507.06103	null
2025-07-09	FEVO: Financial Knowledge Expansion and Reasoning Evolution for Large Language Models	Bo Pang et.al.	2507.06057	null
2025-07-08	Entropy-Memorization Law: Evaluating Memorization Difficulty of Data in LLMs	Yizhan Huang et.al.	2507.06056	null
2025-07-08	Kernel Trace Distance: Quantum Statistical Metric between Measures through RKHS Density Operators	Arturo Castellanos et.al.	2507.06055	null
2025-07-08	Hierarchical Interaction Summarization and Contrastive Prompting for Explainable Recommendations	Yibin Liu et.al.	2507.06044	null
2025-07-08	CAVGAN: Unifying Jailbreak and Defense of LLMs via Generative Adversarial Attacks on their Internal Representations	Xiaohu Li et.al.	2507.06043	null
2025-07-08	CogniSQL-R1-Zero: Lightweight Reinforced Reasoning for Efficient SQL Generation	Kushal Gajjar et.al.	2507.06013	null
2025-07-08	DocIE@XLLM25: In-Context Learning for Information Extraction using Fully Synthetic Demonstrations	Nicholas Popovič et.al.	2507.05997	null
2025-07-08	Development and Evaluation of HopeBot: an LLM-based chatbot for structured and interactive PHQ-9 depression screening	Zhijun Guo et.al.	2507.05984	null
2025-07-08	Multi-Agent Debate Strategies to Enhance Requirements Engineering with Large Language Models	Marc Oriol et.al.	2507.05981	null
2025-07-08	RabakBench: Scaling Human Annotations to Construct Localized Multilingual Safety Benchmarks for Low-Resource Languages	Gabriel Chua et.al.	2507.05980	null
2025-07-08	Automatic Synthesis of High-Quality Triplet Data for Composed Image Retrieval	Haiwen Li et.al.	2507.05970	null
2025-07-08	OpenFActScore: Open-Source Atomic Evaluation of Factuality in Text Generation	Lucas Fonseca Lage et.al.	2507.05965	null
2025-07-08	Evaluation of Large Language Model-Driven AutoML in Data and Model Management from Human-Centered Perspective	Jiapeng Yao et.al.	2507.05962	null
2025-07-08	A Wireless Foundation Model for Multi-Task Prediction	Yucheng Sheng et.al.	2507.05938	null
2025-07-08	BlueLM-2.5-3B Technical Report	Baojiao Xiong et.al.	2507.05934	null
2025-07-08	Few-shot text-based emotion detection	Teodor-George Marchitan et.al.	2507.05918	null
2025-07-08	Best-of-N through the Smoothing Lens: KL Divergence and Regret Analysis	Gholamali Aminian et.al.	2507.05913	null
2025-07-08	AI-Reporter: A Path to a New Genre of Scientific Communication	Gerd Graßhoff et.al.	2507.05903	null
2025-07-08	Psychometric Item Validation Using Virtual Respondents with Trait-Response Mediators	Sungjib Lim et.al.	2507.05890	null
2025-07-08	Current Practices for Building LLM-Powered Reasoning Tools Are Ad Hoc -- and We Can Do Better	Aaron Bembenek et.al.	2507.05886	null
2025-07-08	RecRankerEval: A Flexible and Extensible Framework for Top-k LLM-based Recommendation	Zeyuan Meng et.al.	2507.05880	null
2025-07-08	KERAG_R: Knowledge-Enhanced Retrieval-Augmented Generation for Recommendation	Zeyuan Meng et.al.	2507.05863	null
2025-07-08	USIGAN: Unbalanced Self-Information Feature Transport for Weakly Paired Image IHC Virtual Staining	Yue Peng et.al.	2507.05843	null
2025-07-08	Video Event Reasoning and Prediction by Fusing World Knowledge from LLMs with Vision Foundation Models	Léa Dubois et.al.	2507.05822	null
2025-07-08	2D Instance Editing in 3D Space	Yuhuan Xie et.al.	2507.05819	null
2025-07-08	Affective-ROPTester: Capability and Bias Analysis of LLMs in Predicting Retinopathy of Prematurity	Shuai Zhao et.al.	2507.05816	null
2025-07-08	Just Say Better or Worse: A Human-AI Collaborative Framework for Medical Image Segmentation Without Manual Annotations	Yizhe Zhang et.al.	2507.05815	null
2025-07-08	Improving Robustness of Foundation Models in Domain Adaptation with Soup-Adapters	Marco Roschkowski et.al.	2507.05807	null
2025-07-08	DREAM: Document Reconstruction via End-to-end Autoregressive Model	Xin Li et.al.	2507.05805	null
2025-07-08	Creating a customisable freely-accessible Socratic AI physics tutor	Eugenio Tufino et.al.	2507.05795	null
2025-07-08	TalkFashion: Intelligent Virtual Try-On Assistant Based on Multimodal Large Language Model	Yujie Hu et.al.	2507.05790	null
2025-07-08	Flippi: End To End GenAI Assistant for E-Commerce	Anand A. Rajasekar et.al.	2507.05788	null
2025-07-08	Text-Guided Token Communication for Wireless Image Transmission	Bole Liu et.al.	2507.05781	null
2025-07-08	LeAD: The LLM Enhanced Planning System Converged with End-to-end Autonomous Driving	Yuhang Zhang et.al.	2507.05754	null
2025-07-08	Jigsaw: Training Multi-Billion-Parameter AI Weather Models with Optimized Model Parallelism	Deifilia Kieckhefen et.al.	2507.05753	null
2025-07-08	DocTalk: Scalable Graph-based Dialogue Synthesis for Enhancing LLM Conversational Capabilities	Jing Yang Lee et.al.	2507.05750	null
2025-07-08	Tissue Concepts v2: a Supervised Foundation Model for whole slide images	Till Nicke et.al.	2507.05742	null
2025-07-08	When Transformers Meet Recommenders: Integrating Self-Attentive Sequential Recommendation with Fine-Tuned LLMs	Kechen Liu et.al.	2507.05733	null
2025-07-08	ContextASR-Bench: A Massive Contextual Speech Recognition Benchmark	He Wang et.al.	2507.05727	null
2025-07-08	Large Language Models for Agent-Based Modelling: Current and possible uses across the modelling cycle	Loïs Vanhée et.al.	2507.05723	null
2025-07-08	HIRAG: Hierarchical-Thought Instruction-Tuning Retrieval-Augmented Generation	YiHan Jiao et.al.	2507.05714	null
2025-07-08	DRAGON: Dynamic RAG Benchmark On News	Fedor Chernogorskii et.al.	2507.05713	null
2025-07-08	Smoothie-Qwen: Post-Hoc Smoothing to Reduce Language Bias in Multilingual LLMs	SeungWon Ji et.al.	2507.05686	null
2025-07-08	MedGen: Unlocking Medical Video Generation by Scaling Granularly-annotated Medical Videos	Rongsheng Wang et.al.	2507.05675	null
2025-07-08	Integrating Diffusion-based Multi-task Learning with Online Reinforcement Learning for Robust Quadruped Robot Control	Xinyao Qin et.al.	2507.05674	null
2025-07-08	TuneShield: Mitigating Toxicity in Conversational AI while Fine-tuning on Untrusted Data	Aravind Cheruvu et.al.	2507.05660	null
2025-07-08	LLMs are Introvert	Litian Zhang et.al.	2507.05638	null
2025-07-08	SARA: Selective and Adaptive Retrieval-augmented Generation with Context Compression	Yiqiao Jin et.al.	2507.05633	null
2025-07-08	Enhancing Student Learning with LLM-Generated Retrieval Practice Questions: An Empirical Study in Data Science Courses	Yuan An et.al.	2507.05629	null
2025-07-08	DreamGrasp: Zero-Shot 3D Multi-Object Reconstruction from Partial-View Images for Robotic Manipulation	Young Hun Kim et.al.	2507.05627	null
2025-07-08	Flipping Knowledge Distillation: Leveraging Small Models' Expertise to Enhance LLMs in Text Matching	Mingzhe Li et.al.	2507.05617	null
2025-07-08	Domain adaptation of large language models for geotechnical applications	Lei Fan et.al.	2507.05613	null
2025-07-08	MMW: Side Talk Rejection Multi-Microphone Whisper on Smart Glasses	Yang Liu et.al.	2507.05609	null
2025-07-08	Structured Task Solving via Modular Embodied Intelligence: A Case Study on Rubik's Cube	Chongshan Fan et.al.	2507.05607	null
2025-07-08	Self-Review Framework for Enhancing Instruction Following Capability of LLM	Sihyun Park et.al.	2507.05598	null
2025-07-08	PaddleOCR 3.0 Technical Report	Cheng Cui et.al.	2507.05595	null
2025-07-08	MLlm-DR: Towards Explainable Depression Recognition with MultiModal Large Language Models	Wei Zhang et.al.	2507.05591	null
2025-07-08	The Landscape of Memorization in LLMs: Mechanisms, Measurement, and Mitigation	Alexander Xiong et.al.	2507.05578	null
2025-07-08	Beyond Retrieval: Ensembling Cross-Encoders and GPT Rerankers with LLMs for Biomedical QA	Shashank Verma et.al.	2507.05577	null
2025-07-08	Prompt Migration: Stabilizing GenAI Applications with Evolving Large Language Models	Shivani Tripathi et.al.	2507.05573	null
2025-07-08	ReLayout: Integrating Relation Reasoning for Content-aware Layout Generation with Multi-modal Large Language Models	Jiaxu Tian et.al.	2507.05568	null
2025-07-08	Search-based Selection of Metamorphic Relations for Optimized Robustness Testing of Large Language Models	Sangwon Hyun et.al.	2507.05565	null
2025-07-08	Enhancing Test-Time Scaling of Large Language Models with Hierarchical Retrieval-Augmented MCTS	Alex ZH Dou et.al.	2507.05557	null
2025-07-08	A Malliavin calculus approach to score functions in diffusion generative models	Ehsan Mirafzali et.al.	2507.05550	null
2025-07-07	SenseCF: LLM-Prompted Counterfactuals for Intervention and Sensor Data Augmentation	Shovito Barua Soumma et.al.	2507.05541	null
2025-07-07	Conversational Education at Scale: A Multi-LLM Agent Workflow for Procedural Learning and Pedagogic Quality Assessment	Jiahuan Pei et.al.	2507.05528	null
2025-07-07	Empowering Healthcare Practitioners with Language Models: Structuring Speech Transcripts in Two Real-World Clinical Applications	Jean-Philippe Corbeil et.al.	2507.05517	null
2025-07-07	Tool for Supporting Debugging and Understanding of Normative Requirements Using LLMs	Alex Kleijwegt et.al.	2507.05504	null
2025-07-07	MolFORM: Multi-modal Flow Matching for Structure-Based Drug Design	Jie Huang et.al.	2507.05503	null
2025-07-07	Deep Research Comparator: A Platform For Fine-grained Human Annotations of Deep Research Agents	Prahaladh Chandrahasan et.al.	2507.05495	null
2025-07-07	MBFormer: A General Transformer-based Learning Paradigm for Many-body Interactions in Real Materials	Bowen Hou et.al.	2507.05480	null
2025-07-07	Dense and comeager conjugacy classes in zero-dimensional dynamics	Michal Doucha et.al.	2507.05474	null
2025-07-07	Inaugural MOASEI Competition at AAMAS'2025: A Technical Report	Ceferino Patino et.al.	2507.05469	null
2025-07-07	Risk-Aware Aerocapture Guidance Through a Probabilistic Indicator Function	Grace E. Calkins et.al.	2507.05454	null
2025-07-07	On the Semantics of Large Language Models	Martin Schuele et.al.	2507.05448	null
2025-07-07	PhoniTale: Phonologically Grounded Mnemonic Generation for Typologically Distant Language Pairs	Sana Kang et.al.	2507.05444	null
2025-07-07	Mastering Regional 3DGS: Locating, Initializing, and Editing with Diverse 2D Priors	Lanqing Guo et.al.	2507.05426	null
2025-07-07	"Lost-in-the-Later": Framework for Quantifying Contextual Grounding in Large Language Models	Yufei Tao et.al.	2507.05424	null
2025-07-07	Learn Globally, Speak Locally: Bridging the Gaps in Multilingual Reasoning	Jaedong Hwang et.al.	2507.05418	null
2025-07-07	PBE Meets LLM: When Few Examples Aren't Few-Shot Enough	Shuning Zhang et.al.	2507.05403	null
2025-07-07	Neural-Driven Image Editing	Pengfei Zhou et.al.	2507.05397	null
2025-07-07	Controlling What You Share: Assessing Language Model Adherence to Privacy Preferences	Guillem Ramírez et.al.	2507.05391	null
2025-07-07	From General to Specialized: The Need for Foundational Models in Agriculture	Vishal Nedungadi et.al.	2507.05390	null
2025-07-07	Reinforcement Fine-Tuning Naturally Mitigates Forgetting in Continual Post-Training	Song Lai et.al.	2507.05386	null
2025-07-07	Beyond Simple Edits: X-Planner for Complex Instruction-Based Image Editing	Chun-Hsiao Yeh et.al.	2507.05259	null
2025-07-07	Spatio-Temporal LLM: Reasoning about Environments and Actions	Haozhen Zheng et.al.	2507.05258	null
2025-07-07	Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions	Yuanzhe Hu et.al.	2507.05257	null
2025-07-07	Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning	Yana Wei et.al.	2507.05255	null
2025-07-07	Response Attack: Exploiting Contextual Priming to Jailbreak Large Language Models	Ziqi Miao et.al.	2507.05248	null
2025-07-07	Modeling Latent Partner Strategies for Adaptive Zero-Shot Human-Agent Collaboration	Benjamin Li et.al.	2507.05244	null
2025-07-07	StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling	Meng Wei et.al.	2507.05240	null
2025-07-07	All in One: Visual-Description-Guided Unified Point Cloud Segmentation	Zongyan Han et.al.	2507.05211	null
2025-07-07	MedGemma Technical Report	Andrew Sellergren et.al.	2507.05201	null
2025-07-07	CREW-WILDFIRE: Benchmarking Agentic Multi-Agent Collaborations at Scale	Jonathan Hyun et.al.	2507.05178	null
2025-07-07	OpenS2S: Advancing Open-Source End-to-End Empathetic Large Speech Language Model	Chen Wang et.al.	2507.05177	null
2025-07-07	A Dynamical Systems Perspective on the Analysis of Neural Networks	Dennis Chemnitz et.al.	2507.05164	null
2025-07-07	4DSloMo: 4D Reconstruction for High Speed Scene with Asynchronous Capture	Yutian Chen et.al.	2507.05163	null
2025-07-07	AI Generated Text Detection Using Instruction Fine-tuned Large Language and Transformer-Based Models	Chinnappa Guggilla et.al.	2507.05157	null
2025-07-07	Interpretable Mnemonic Generation for Kanji Learning via Expectation-Maximization	Jaewook Lee et.al.	2507.05137	null
2025-07-07	LERa: Replanning with Visual Feedback in Instruction Following	Svyatoslav Pchelintsev et.al.	2507.05135	null
2025-07-07	An Evaluation of Large Language Models on Text Summarization Tasks Using Prompt Engineering Techniques	Walid Mohamed Aly et.al.	2507.05123	null
2025-07-07	LVM4CSI: Enabling Direct Application of Pre-Trained Large Vision Models for Wireless Channel Tasks	Jiajia Guo et.al.	2507.05121	null
2025-07-07	VerifyLLM: LLM-Based Pre-Execution Task Plan Verification for Robots	Danil S. Grigorev et.al.	2507.05118	null
2025-07-07	DICE: Discrete inverse continuity equation for learning population dynamics	Tobias Blickhan et.al.	2507.05107	null
2025-07-07	The Hidden Threat in Plain Text: Attacking RAG Data Loaders	Alberto Castagnaro et.al.	2507.05093	null
2025-07-07	Gaussian approximation for non-linearity parameter estimation in perturbed random fields on the sphere	Claudio Durastanti et.al.	2507.05074	null
2025-07-07	ICAS: Detecting Training Data from Autoregressive Image Generative Models	Hongyao Yu et.al.	2507.05068	null
2025-07-07	Replacing thinking with tool usage enables reasoning in small language models	Corrado Rainone et.al.	2507.05065	null
2025-07-07	What Shapes User Trust in ChatGPT? A Mixed-Methods Study of User Attributes, Trust Dimensions, Task Context, and Societal Perceptions among University Students	Kadija Bouyzourn et.al.	2507.05046	null
2025-07-07	MoLink: Distributed and Efficient Serving Framework for Large Models	Lewei Jin et.al.	2507.05043	null
2025-07-07	Beyond Scaling Curves: Internal Dynamics of Neural Networks Through the NTK Lens	Konstantin Nikolaou et.al.	2507.05035	null
2025-07-07	Estimating Object Physical Properties from RGB-D Vision and Depth Robot Sensors Using Deep Learning	Ricardo Cardoso et.al.	2507.05029	null
2025-07-07	A Generative Diffusion Model for Amorphous Materials	Kai Yang et.al.	2507.05024	null
2025-07-07	Co-DETECT: Collaborative Discovery of Edge Cases in Text Classification	Chenfei Xiong et.al.	2507.05010	null
2025-07-07	Multi-modal Representations for Fine-grained Multi-label Critical View of Safety Recognition	Britty Baby et.al.	2507.05007	null
2025-07-07	From Autonomy to Agency: Agentic Vehicles for Human-Centered Mobility Systems	Jiangbo Yu et.al.	2507.04996	null
2025-07-07	Parameterized Diffusion Optimization enabled Autoregressive Ordinal Regression for Diabetic Retinopathy Grading	Qinkai Yu et.al.	2507.04978	null
2025-07-07	Can Video LLMs Refuse to Answer? Alignment for Answerability in Video Large Language Models	Eunseop Yoon et.al.	2507.04976	null
2025-07-07	The Case for Instance-Optimized LLMs in OLAP Databases	Bardia Mohammadi et.al.	2507.04967	null
2025-07-07	EXPOTION: Facial Expression and Motion Control for Multimodal Music Generation	Fathinah Izzati et.al.	2507.04955	null
2025-07-07	ArtifactsBench: Bridging the Visual-Interactive Gap in LLM Code Generation Evaluation	Chenchen Zhang et.al.	2507.04952	null
2025-07-07	ReLoop: "Seeing Twice and Thinking Backwards" via Closed-loop Training to Mitigate Hallucinations in Multimodal understanding	Jianjiang Yang et.al.	2507.04943	null
2025-07-07	Contextual Light-Particle Interference	Brian Stout et.al.	2507.04935	null
2025-07-07	LIFT: Automating Symbolic Execution Optimization with Large Language Models for AI Networks	Ruoxi Wang et.al.	2507.04931	null
2025-07-07	HV-MMBench: Benchmarking MLLMs for Human-Centric Video Understanding	Yuxuan Cai et.al.	2507.04909	null
2025-07-07	Emergent Semantics Beyond Token Embeddings: Transformer LMs with Frozen Visual Unicode Representations	A. Bochkov et.al.	2507.04886	null
2025-07-07	DoPI: Doctor-like Proactive Interrogation LLM for Traditional Chinese Medicine	Zewen Sun et.al.	2507.04877	null
2025-07-07	Music Boomerang: Reusing Diffusion Models for Data Augmentation and Audio Manipulation	Alexander Fichtinger et.al.	2507.04864	null
2025-07-07	Supporting Software Formal Verification with Large Language Models: An Experimental Study	Weiqi Wang et.al.	2507.04857	null
2025-07-07	Semantically Consistent Discrete Diffusion for 3D Biological Graph Modeling	Chinmay Prabhakar et.al.	2507.04856	null
2025-07-07	Grahak-Nyay: Consumer Grievance Redressal through Large Language Models	Shrey Ganatra et.al.	2507.04854	null
2025-07-07	Dialogue-Based Multi-Dimensional Relationship Extraction from Novels	Yuchen Yan et.al.	2507.04852	null
2025-07-07	Spec-TOD: A Specialized Instruction-Tuned LLM Framework for Efficient Task-Oriented Dialogue Systems	Quang-Vinh Nguyen et.al.	2507.04841	null
2025-07-07	RIPE: Reinforcement Learning on Unlabeled Image Pairs for Robust Keypoint Extraction	Johannes Künzel et.al.	2507.04839	null
2025-07-07	The Geopolitical Determinants of Economic Growth, 1960-2019	Tianyu Fan et.al.	2507.04833	null
2025-07-07	Harnessing Pairwise Ranking Prompting Through Sample-Efficient Ranking Distillation	Junru Wu et.al.	2507.04820	null
2025-07-07	Application and Evaluation of Large Language Models for Forecasting the Impact of Traffic Incidents	George Jagadeesh et.al.	2507.04803	null
2025-07-07	Generalization bounds for score-based generative models: a synthetic proof	Arthur Stéphanovitch et.al.	2507.04794	null
2025-07-07	Reason to Rote: Rethinking Memorization in Reasoning	Yupei Du et.al.	2507.04782	null
2025-07-07	From Imitation to Innovation: The Emergence of AI Unique Artistic Styles and the Challenge of Copyright Protection	Zexi Jia et.al.	2507.04769	null
2025-07-07	ABench-Physics: Benchmarking Physical Reasoning in LLMs via High-Difficulty and Dynamic Physics Problems	Yiming Zhang et.al.	2507.04766	null
2025-07-07	GraphBrep: Learning B-Rep in Graph Structure for Efficient CAD Generation	Weilin Lai et.al.	2507.04765	null
2025-07-07	Intervening to learn and compose disentangled representations	Alex Markham et.al.	2507.04754	null
2025-07-07	Large Language Models for Network Intrusion Detection Systems: Foundations, Implementations, and Future Directions	Shuo Yang et.al.	2507.04752	null
2025-07-07	LLMs as Architects and Critics for Multi-Source Opinion Summarization	Anuj Attri et.al.	2507.04751	null
2025-07-07	LLM-based Question-Answer Framework for Sensor-driven HVAC System Interaction	Sungmin Lee et.al.	2507.04748	null
2025-07-07	Activation Steering for Chain-of-Thought Compression	Seyedarmin Azizi et.al.	2507.04742	null
2025-07-07	ChipSeek-R1: Generating Human-Surpassing RTL with LLM via Hierarchical Reward-Driven Reinforcement Learning	Zhirong Chen et.al.	2507.04736	null
2025-07-07	An analysis of vision-language models for fabric retrieval	Francesco Giuliari et.al.	2507.04735	null
2025-07-07	"This Suits You the Best": Query Focused Comparative Explainable Summarization	Arnav Attri et.al.	2507.04733	null
2025-07-07	Who's the Mole? Modeling and Detecting Intention-Hiding Malicious Agents in LLM-Based Multi-Agent Systems	Yizhe Xie et.al.	2507.04724	null
2025-07-07	LOOM-Scope: a comprehensive and efficient LOng-cOntext Model evaluation framework	Zecheng Tang et.al.	2507.04723	null
2025-07-07	Geometric-Guided Few-Shot Dental Landmark Detection with Human-Centric Foundation Model	Anbang Wang et.al.	2507.04710	null
2025-07-07	Why We Feel What We Feel: Joint Detection of Emotions and Their Opinion Triggers in E-commerce	Arnav Attri et.al.	2507.04708	null
2025-07-07	Tempo-R0: A Video-MLLM for Temporal Video Grounding through Efficient Temporal Sensing Reinforcement Learning	Feng Yue et.al.	2507.04702	null
2025-07-07	XiYan-SQL: A Novel Multi-Generator Framework For Text-to-SQL	Yifu Liu et.al.	2507.04701	null
2025-07-07	A Visual Leap in CLIP Compositionality Reasoning through Generation of Counterfactual Sets	Zexi Jia et.al.	2507.04699	null
2025-07-07	Performance Evaluation of General Purpose Large Language Models for Basic Linear Algebra Subprograms Code Generation	Daichi Mukunoki et.al.	2507.04697	null
2025-07-07	AKEGEN: A LLM-based Tabular Corpus Generator for Evaluating Dataset Discovery in Data Lakes	Zhenwei Dai et.al.	2507.04687	null
2025-07-07	ChangeBridge: Spatiotemporal Image Generation with Multimodal Controls for Remote Sensing	Zhenghui Zhao et.al.	2507.04678	null
2025-07-07	VectorLLM: Human-like Extraction of Structured Building Contours via Multimodal LLMs	Tao Zhang et.al.	2507.04664	null
2025-07-07	MODA: MOdular Duplex Attention for Multimodal Perception, Cognition, and Emotion Understanding	Zhicheng Zhang et.al.	2507.04635	null
2025-07-07	Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models?	Yun Qu et.al.	2507.04632	null
2025-07-07	Learning Robust Stereo Matching in the Wild with Selective Mixture-of-Experts	Yun Wang et.al.	2507.04631	null
2025-07-07	Heterogeneous User Modeling for LLM-based Recommendation	Honghui Bao et.al.	2507.04626	null
2025-07-07	Knowledge-Aware Self-Correction in Language Models via Structured Memory Graphs	Swayamjit Saha et.al.	2507.04625	null
2025-07-07	Hierarchical Intent-guided Optimization with Pluggable LLM-Driven Semantics for Session-based Recommendation	Jinpeng Chen et.al.	2507.04623	null
2025-07-07	Multimodal LLM Integrated Semantic Communications for 6G Immersive Experiences	Yusong Zhang et.al.	2507.04621	null
2025-07-07	any4: Learned 4-bit Numeric Representation for LLMs	Mostafa Elhoushi et.al.	2507.04610	null
2025-07-07	PRIME: Large Language Model Personalization with Cognitive Memory and Thought Processes	Xinliang Frederick Zhang et.al.	2507.04607	null
2025-07-07	QR-LoRA: Efficient and Disentangled Fine-tuning via QR Decomposition for Customized Generation	Jiahui Yang et.al.	2507.04599	null
2025-07-06	Evaluating LLMs on Real-World Forecasting Against Human Superforecasters	Janna Lu et.al.	2507.04562	null
2025-07-06	MambaVideo for Discrete Video Tokenization with Channel-Split Quantization	Dawit Mureja Argaw et.al.	2507.04559	null
2025-07-06	Self-supervised learning of speech representations with Dutch archival data	Nik Vaessen et.al.	2507.04554	null
2025-07-06	Greedy Dynamic Matching	Nick Arnosti et.al.	2507.04551	null
2025-07-06	DP-Fusion: Token-Level Differentially Private Inference for Large Language Models	Rushil Thareja et.al.	2507.04531	null
2025-07-06	DOTResize: Reducing LLM Width via Discrete Optimal Transport-based Neuron Merging	Neha Verma et.al.	2507.04517	null
2025-07-06	Unveiling the Potential of Diffusion Large Language Model in Controllable Generation	Zhen Xiong et.al.	2507.04504	null
2025-07-06	A validity-guided workflow for robust large language model research in psychology	Zhicheng Lin et.al.	2507.04491	null
2025-07-06	Source Attribution in Retrieval-Augmented Generation	Ikhtiyor Nematov et.al.	2507.04480	null
2025-07-06	Model Inversion Attacks on Llama 3: Extracting PII from Large Language Models	Sathesh P. Sivashanmugam et.al.	2507.04478	null
2025-07-06	The role of large language models in UI/UX design: A systematic literature review	Ammar Ahmed et.al.	2507.04469	null
2025-07-06	GradOT: Training-free Gradient-preserving Offsite-tuning for Large Language Models	Kai Yao et.al.	2507.04455	null
2025-07-06	ESSA: Evolutionary Strategies for Scalable Alignment	Daria Korotyshova et.al.	2507.04453	null
2025-07-03	MultiGen: Using Multimodal Generation in Simulation to Learn Multimodal Policies in Real	Renhao Wang et.al.	2507.02864	null
2025-07-03	RefTok: Reference-Based Tokenization for Video Generation	Xiang Fan et.al.	2507.02862	null
2025-07-03	Less is Enough: Training-Free Video Diffusion Acceleration via Runtime-Adaptive Caching	Xin Zhou et.al.	2507.02860	null
2025-07-03	Bootstrapping Grounded Chain-of-Thought in Multimodal LLMs for Data-Efficient Model Adaptation	Jiaer Xia et.al.	2507.02859	null
2025-07-03	Requirements Elicitation Follow-Up Question Generation	Yuchen Shen et.al.	2507.02858	null
2025-07-03	MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs	Purbesh Mitra et.al.	2507.02851	null
2025-07-03	Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection	Ziqi Miao et.al.	2507.02844	null
2025-07-03	LLM-Driven Treatment Effect Estimation Under Inference Time Text Confounding	Yuchen Ma et.al.	2507.02843	null
2025-07-03	StepHint: Multi-level Stepwise Hints Enhance Reinforcement Learning to Reason	Kaiyi Zhang et.al.	2507.02841	null
2025-07-03	ExPO: Unlocking Hard Reasoning with Self-Explanation-Guided Reinforcement Learning	Ruiyang Zhou et.al.	2507.02834	null
2025-07-03	SynapseRoute: An Auto-Route Switching Framework on Dual-State Large Language Model	Wencheng Zhang et.al.	2507.02822	null
2025-07-03	Multimodal Mathematical Reasoning with Diverse Solving Perspective	Wenhao Shi et.al.	2507.02804	null
2025-07-03	Is Reasoning All You Need? Probing Bias in the Age of Reasoning Language Models	Riccardo Cantini et.al.	2507.02799	null
2025-07-03	No time to train! Training-Free Reference-Based Instance Segmentation	Miguel Espinosa et.al.	2507.02798	null
2025-07-03	From Long Videos to Engaging Clips: A Human-Inspired Video Editing Framework with Multimodal Narrative Understanding	Xiangfeng Wang et.al.	2507.02790	null
2025-07-03	Moral Responsibility or Obedience: What Do We Want from AI?	Joseph Boland et.al.	2507.02788	null
2025-07-03	Self-Correction Bench: Revealing and Addressing the Self-Correction Blind Spot in LLMs	Ken Tsui et.al.	2507.02778	null
2025-07-03	KERAP: A Knowledge-Enhanced Reasoning Approach for Accurate Zero-shot Diagnosis Prediction Using Multi-agent LLMs	Yuzhang Xie et.al.	2507.02773	null
2025-07-03	Grounding Intelligence in Movement	Melanie Segado et.al.	2507.02771	null
2025-07-03	DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment	Ke-Han Lu et.al.	2507.02768	null
2025-07-03	Knowledge Protocol Engineering: A New Paradigm for AI in Domain-Specific Knowledge Work	Guangwei Zhang et.al.	2507.02760	null
2025-07-03	Fast and Simplex: 2-Simplicial Attention in Triton	Aurko Roy et.al.	2507.02754	null
2025-07-03	Who's Sorry Now: User Preferences Among Rote, Empathic, and Explanatory Apologies from LLM Chatbots	Zahra Ashktorab et.al.	2507.02745	null
2025-07-03	Prompt learning with bounding box constraints for medical image segmentation	Mélanie Gaillochet et.al.	2507.02743	null
2025-07-03	Early Signs of Steganographic Capabilities in Frontier LLMs	Artur Zolkowski et.al.	2507.02737	null
2025-07-03	Bourbaki: Self-Generated and Goal-Conditioned MDPs for Theorem Proving	Matthieu Zimmer et.al.	2507.02726	null
2025-07-03	On the Convergence of Large Language Model Optimizer for Black-Box Network Management	Hoon Lee et.al.	2507.02689	null
2025-07-03	Embedding-Based Federated Data Sharing via Differentially Private Conditional VAEs	Francesco Di Salvo et.al.	2507.02671	null
2025-07-03	AIGI-Holmes: Towards Explainable and Generalizable AI-Generated Image Detection via Multimodal Large Language Models	Ziyin Zhou et.al.	2507.02664	null
2025-07-03	Hey AI, Generate Me a Hardware Code! Agentic AI-based Hardware Design & Verification	Deepak Narayan Gadde et.al.	2507.02660	null
2025-07-03	Medical Data Pecking: A Context-Aware Approach for Automated Quality Evaluation of Structured Medical Data	Irena Girshovitz et.al.	2507.02628	null
2025-07-03	VRAgent-R1: Boosting Video Recommendation with MLLM-based Agents via Reinforcement Learning	Siran Chen et.al.	2507.02626	null
2025-07-03	FlowSpec: Continuous Pipelined Speculative Decoding for Efficient Distributed LLM Inference	Xing Liu et.al.	2507.02620	null
2025-07-03	Strategic Intelligence in Large Language Models: Evidence from evolutionary Game Theory	Kenneth Payne et.al.	2507.02618	null
2025-07-03	DynamiCare: A Dynamic Multi-Agent Framework for Interactive and Open-Ended Medical Decision-Making	Tianqi Shang et.al.	2507.02616	null
2025-07-03	De-AntiFake: Rethinking the Protective Perturbations Against Voice Cloning Attacks	Wei Fan et.al.	2507.02606	null
2025-07-03	MPF: Aligning and Debiasing Language Models post Deployment via Multi Perspective Fusion	Xin Guan et.al.	2507.02595	null
2025-07-03	Revisiting Active Learning under (Human) Label Variation	Cornelia Gruber et.al.	2507.02593	null
2025-07-03	Reconstructing Close Human Interaction with Appearance and Proxemics Reasoning	Buzhen Huang et.al.	2507.02565	null
2025-07-03	LLMREI: Automating Requirements Elicitation Interviews with LLMs	Alexander Korn et.al.	2507.02564	null
2025-07-03	Transformers Don't Need LayerNorm at Inference Time: Scaling LayerNorm Removal to GPT-2 XL and the Implications for Mechanistic Interpretability	Luca Baroni et.al.	2507.02559	null
2025-07-03	Clarifying Before Reasoning: A Coq Prover with Structural Context	Yanzhen Lu et.al.	2507.02541	null
2025-07-03	Are You Listening to Me? Fine-Tuning Chatbots for Empathetic Dialogue	Paulo Ricardo Knob et.al.	2507.02537	null
2025-07-03	Meta-Fair: AI-Assisted Fairness Testing of Large Language Models	Miguel Romero-Arjona et.al.	2507.02533	null
2025-07-03	Open-Source System for Multilingual Translation and Cloned Speech Synthesis	Mateo Cámara et.al.	2507.02530	null
2025-07-03	RetrySQL: text-to-SQL training with retry data for self-correcting query generation	Alicja Rączkowska et.al.	2507.02529	null
2025-07-03	Continual Gradient Low-Rank Projection Fine-Tuning for LLMs	Chenxu Wang et.al.	2507.02503	null
2025-07-03	CrowdTrack: A Benchmark for Difficult Multiple Pedestrian Tracking in Real Scenarios	Teng Fu et.al.	2507.02479	null
2025-07-03	System-performance and cost modeling of Large Language Model training and inference	Wenzhe Guo et.al.	2507.02456	null
2025-07-03	Introducing a New Brexit-Related Uncertainty Index: Its Evolution and Economic Consequences	Ismet Gocer et.al.	2507.02439	null
2025-07-03	Toward a Robust and Generalizable Metamaterial Foundation Model	Namjung Kim et.al.	2507.02436	null
2025-07-03	Improving Consistency in Vehicle Trajectory Prediction Through Preference Optimization	Caio Azevedo et.al.	2507.02406	null
2025-07-03	Evaluating Language Models For Threat Detection in IoT Security Logs	Jorge J. Tejero-Fernández et.al.	2507.02390	null
2025-07-03	JoyTTS: LLM-based Spoken Chatbot With Voice Cloning	Fangru Zhou et.al.	2507.02380	null
2025-07-03	Efficient Code LLM Training via Distribution-Consistent and Diversity-Aware Data Selection	Weijie Lyu et.al.	2507.02378	null
2025-07-03	UVLM: Benchmarking Video Language Model for Underwater World Understanding	Xizhe Xue et.al.	2507.02373	null
2025-07-03	Holistic Tokenizer for Autoregressive Image Generation	Anlin Zheng et.al.	2507.02358	null
2025-07-03	Coling-UniA at SciVQA 2025: Few-Shot Example Retrieval and Confidence-Informed Ensembling for Multimodal Large Language Models	Christian Jaumann et.al.	2507.02357	null
2025-07-03	DoMIX: An Efficient Framework for Exploiting Domain Knowledge in Fine-Tuning	Dohoon Kim et.al.	2507.02302	null
2025-07-03	Prompt Disentanglement via Language Guidance and Representation Alignment for Domain Generalization	De Cheng et.al.	2507.02288	null
2025-07-03	Misaligned from Within: Large Language Models Reproduce Our Double-Loop Learning Blindness	Tim Rogers et.al.	2507.02283	null
2025-07-03	Content filtering methods for music recommendation: A review	Terence Zeng et.al.	2507.02282	null
2025-07-03	LaCo: Efficient Layer-wise Compression of Visual Tokens for Multimodal Large Language Models	Juntao Liu et.al.	2507.02279	null
2025-07-03	NLP4Neuro: Sequence-to-sequence learning for neural population decoding	Jacob J. Morra et.al.	2507.02264	null
2025-07-03	Uncertainty-aware Reward Design Process	Yang Yang et.al.	2507.02256	null
2025-07-03	Listwise Preference Alignment Optimization for Tail Item Recommendation	Zihao Li et.al.	2507.02255	null
2025-07-03	Scaling LLM Planning: NL2FLOW for Parametric Problem Generation and Rigorous Evaluation	Jungkoo Kang et.al.	2507.02253	null
2025-07-03	SurgVisAgent: Multimodal Agentic Model for Versatile Surgical Visual Enhancement	Zeyu Lei et.al.	2507.02252	null
2025-07-03	VERBA: Verbalizing Model Differences Using Large Language Models	Shravan Doda et.al.	2507.02241	null
2025-07-03	DecoRTL: A Run-time Decoding Framework for RTL Code Generation with LLMs	Mohammad Akyash et.al.	2507.02226	null
2025-07-03	GDC Cohort Copilot: An AI Copilot for Curating Cohorts from the Genomic Data Commons	Steven Song et.al.	2507.02221	null
2025-07-02	ESTR-CoT: Towards Explainable and Accurate Event Stream based Scene Text Recognition with Chain-of-Thought Reasoning	Xiao Wang et.al.	2507.02200	null
2025-07-02	EvalAssist: A Human-Centered Tool for LLM-as-a-Judge	Zahra Ashktorab et.al.	2507.02186	null
2025-07-02	Computer Science Education in the Age of Generative AI	Russell Beale et.al.	2507.02183	null
2025-07-02	Enhancing COBOL Code Explanations: A Multi-Agents Approach Using Large Language Models	Fangjian Lei et.al.	2507.02182	null
2025-07-02	The Revolution Has Arrived: What the Current State of Large Language Models in Education Implies for the Future	Russell Beale et.al.	2507.02180	null
2025-07-02	Data Diversification Methods In Alignment Enhance Math Performance In LLMs	Berkan Dokmeci et.al.	2507.02173	null
2025-07-02	Reasoning or Not? A Comprehensive Evaluation of Reasoning LLMs for Dialogue Summarization	Keyan Jin et.al.	2507.02145	null
2025-07-02	When LLMs Disagree: Diagnosing Relevance Filtering Bias and Retrieval Divergence in SDG Search	William A. Ingram et.al.	2507.02139	null
2025-07-02	Dissecting the Impact of Mobile DVFS Governors on LLM Inference Performance and Energy Efficiency	Zongpu Zhang et.al.	2507.02135	null
2025-07-02	BACTA-GPT: An AI-Based Bayesian Adaptive Clinical Trial Architect	Krishna Padmanabhan et.al.	2507.02130	null
2025-07-02	Generative Latent Diffusion for Efficient Spatiotemporal Data Reduction	Xiao Li et.al.	2507.02129	null
2025-07-02	CROP: Circuit Retrieval and Optimization with Parameter Guidance using LLMs	Jingyu Pan et.al.	2507.02128	null
2025-07-02	SAKURAONE: Empowering Transparent and Open AI Platforms through Private-Sector HPC Investment in Japan	Fumikazu Konishi et.al.	2507.02124	null
2025-07-02	PAL: Designing Conversational Agents as Scalable, Cooperative Patient Simulators for Palliative-Care Training	Neil K. R. Sehgal et.al.	2507.02122	null
2025-07-02	What Neuroscience Can Teach AI About Learning in Continuously Changing Environments	Daniel Durstewitz et.al.	2507.02103	null
2025-07-02	The Future is Agentic: Definitions, Perspectives, and Open Challenges of Multi-Agent Recommender Systems	Reza Yousefi Maragheh et.al.	2507.02097	null
2025-07-02	Sample Complexity Bounds for Linear Constrained MDPs with a Generative Model	Xingtu Liu et.al.	2507.02089	null
2025-07-02	McBE: A Multi-task Chinese Bias Evaluation Benchmark for Large Language Models	Tian Lan et.al.	2507.02088	null
2025-07-02	Evaluating the Promise and Pitfalls of LLMs in Hiring Decisions	Eitan Anzenberg et.al.	2507.02087	null
2025-07-02	Measuring Scientific Capabilities of Language Models with a Systems Biology Dry Lab	Haonan Duan et.al.	2507.02083	null
2025-07-02	Reasoning on a Budget: A Survey of Adaptive and Controllable Test-Time Compute in LLMs	Mohammad Ali Alomrani et.al.	2507.02076	null
2025-07-02	Large Language Models for Crash Detection in Video: A Survey of Methods, Datasets, and Challenges	Sanjeda Akter et.al.	2507.02074	null
2025-07-02	MGC: A Compiler Framework Exploiting Compositional Blindness in Aligned LLMs for Malware Generation	Lu Yan et.al.	2507.02057	null
2025-07-02	How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks	Rahul Ramachandran et.al.	2507.01955	null
2025-07-02	Test-Time Scaling with Reflective Generative Model	Zixiao Wang et.al.	2507.01951	null
2025-07-02	Kwai Keye-VL Technical Report	Kwai Keye Team et.al.	2507.01949	null
2025-07-02	LongAnimation: Long Animation Generation with Dynamic Global-Local Memory	Nan Chen et.al.	2507.01945	null
2025-07-02	SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars	Xiaosheng Zhao et.al.	2507.01939	null
2025-07-02	The Thin Line Between Comprehension and Persuasion in LLMs	Adrian de Wynter et.al.	2507.01936	null
2025-07-02	Large Language Model-Driven Closed-Loop UAV Operation with Semantic Observations	Wenhao Wang et.al.	2507.01930	null
2025-07-02	A Survey on Vision-Language-Action Models: An Action Tokenization Perspective	Yifan Zhong et.al.	2507.01925	null
2025-07-02	Decision-oriented Text Evaluation	Yu-Shiang Huang et.al.	2507.01923	null
2025-07-02	Gradient-Adaptive Policy Optimization: Towards Multi-Objective Alignment of Large Language Models	Chengao Li et.al.	2507.01915	null
2025-07-02	Reasoning to Edit: Hypothetical Instruction-Based Image Editing with Visual Reasoning	Qingdong He et.al.	2507.01908	null
2025-07-02	AI4Research: A Survey of Artificial Intelligence for Scientific Research	Qiguang Chen et.al.	2507.01903	null
2025-07-02	High-Layer Attention Pruning with Rescaling	Songtao Liu et.al.	2507.01900	null
2025-07-02	MiCoTA: Bridging the Learnability Gap with Intermediate CoT and Teacher Assistants	Dongyi Ding et.al.	2507.01887	null
2025-07-02	Improving GANs by leveraging the quantum noise from real hardware	Hongni Jin et.al.	2507.01886	null
2025-07-02	A computationally frugal open-source foundation model for thoracic disease detection in lung cancer screening programs	Niccolò McConnell et.al.	2507.01881	null
2025-07-02	Towards Foundation Auto-Encoders for Time-Series Anomaly Detection	Gastón García González et.al.	2507.01875	null
2025-07-02	DIY-MKG: An LLM-Based Polyglot Language Learning System	Kenan Tang et.al.	2507.01872	null
2025-07-02	Bridging UI Design and chatbot Interactions: Applying Form-Based Principles to Conversational Agents	Sanjay Krishna Anbalagan et.al.	2507.01862	null
2025-07-02	TypeTele: Releasing Dexterity in Teleoperation by Dexterous Manipulation Types	Yuhao Lin et.al.	2507.01857	null
2025-07-02	Eka-Eval: A Comprehensive Evaluation Framework for Large Language Models in Indian Languages	Samridhi Raj Sinha et.al.	2507.01853	null
2025-07-02	Low-Perplexity LLM-Generated Sequences and Where To Find Them	Arthur Wuhrmann et.al.	2507.01844	null
2025-07-02	Out-of-Distribution Detection Methods Answer the Wrong Questions	Yucen Lily Li et.al.	2507.01831	null
2025-07-02	APRMCTS: Improving LLM-based Automated Program Repair with Iterative Tree Search	Haichuan Hu et.al.	2507.01827	null
2025-07-02	LoRA Fine-Tuning Without GPUs: A CPU-Efficient Meta-Generation Framework for LLMs	Reza Arabpour et.al.	2507.01806	null
2025-07-02	Towards Decentralized and Sustainable Foundation Model Training with the Edge	Leyang Xue et.al.	2507.01803	null
2025-07-02	HCNQA: Enhancing 3D VQA with Hierarchical Concentration Narrowing Supervision	Shengli Zhou et.al.	2507.01800	null
2025-07-02	Robust brain age estimation from structural MRI with contrastive learning	Carlo Alberto Barbano et.al.	2507.01794	null
2025-07-02	Machine learning prediction of a chemical reaction over 8 decades of energy	Daniel Julian et.al.	2507.01793	null
2025-07-02	FreeLoRA: Enabling Training-Free LoRA Fusion for Autoregressive Multi-Subject Personalization	Peng Zheng et.al.	2507.01792	null
2025-07-02	MuRating: A High Quality Data Selecting Approach to Multilingual Large Language Model Pretraining	Zhixun Chen et.al.	2507.01785	null
2025-07-02	Frontiers of Generative AI for Network Optimization: Theories, Limits, and Visions	Bo Yang et.al.	2507.01773	null
2025-07-02	Enhanced Generative Model Evaluation with Clipped Density and Coverage	Nicolas Salvy et.al.	2507.01761	null
2025-07-02	Rethinking Discrete Tokens: Treating Them as Conditions for Continuous Autoregressive Image Synthesis	Peng Zheng et.al.	2507.01756	null
2025-07-02	Tuning without Peeking: Provable Privacy and Generalization Bounds for LLM Post-Training	Ismail Labiad et.al.	2507.01752	null
2025-07-02	LLMs for Legal Subsumption in German Employment Contracts	Oliver Wardas et.al.	2507.01734	null
2025-07-02	Token Communication in the Era of Large Models: An Information Bottleneck-Based Approach	Hao Wei et.al.	2507.01728	null
2025-07-02	Generative flow-based warm start of the variational quantum eigensolver	Hang Zou et.al.	2507.01726	null
2025-07-02	Agent Ideate: A Framework for Product Idea Generation from Patents Using Agentic AI	Gopichand Kanumolu et.al.	2507.01717	null
2025-07-02	Generative modeling of convergence maps based on predicted one-point statistics	Vilasini Tinnaneri Sreekanth et.al.	2507.01707	null
2025-07-02	AdamMeme: Adaptively Probe the Reasoning Capacity of Multimodal Large Language Models on Harmfulness	Zixin Chen et.al.	2507.01702	null
2025-07-02	Graph Representation-based Model Poisoning on Federated LLMs in CyberEdge Networks	Hanlin Cai et.al.	2507.01694	null
2025-07-02	GPT, But Backwards: Exactly Inverting Language Model Outputs	Adrians Skapars et.al.	2507.01693	null
2025-07-02	A generative modeling / Physics-Informed Neural Network approach to random differential equations	Georgios Arampatzis et.al.	2507.01687	null
2025-07-02	Blending Supervised and Reinforcement Fine-Tuning with Prefix Sampling	Zeyu Huang et.al.	2507.01679	null
2025-07-02	AsyncFlow: An Asynchronous Streaming RL Framework for Efficient LLM Post-Training	Zhenyu Han et.al.	2507.01663	null
2025-07-02	SAILViT: Towards Robust and Generalizable Visual Backbones for MLLMs via Gradual Feature Refinement	Weijie Yin et.al.	2507.01643	null
2025-07-02	DaiFu: In-Situ Crash Recovery for Deep Learning Systems	Zilong He et.al.	2507.01628	null
2025-07-02	Chart Question Answering from Real-World Analytical Narratives	Maeve Hutchinson et.al.	2507.01627	null
2025-07-02	Data Agent: A Holistic Architecture for Orchestrating Data+AI Ecosystems	Zhaoyan Sun et.al.	2507.01599	null
2025-07-02	Emotionally Intelligent Task-oriented Dialogue Systems: Architecture, Representation, and Optimisation	Shutong Feng et.al.	2507.01594	null
2025-07-02	A Gift from the Integration of Discriminative and Diffusion-based Generative Learning: Boundary Refinement Remote Sensing Semantic Segmentation	Hao Wang et.al.	2507.01573	null
2025-07-02	Self-Guided Process Reward Optimization with Masked Step Advantage for Process Reinforcement Learning	Wu Fei et.al.	2507.01551	null
2025-07-02	Crafting Hanzi as Narrative Bridges: An AI Co-Creation Workshop for Elderly Migrants	Wen Zhan et.al.	2507.01548	null
2025-07-02	MARVIS: Modality Adaptive Reasoning over VISualizations	Benjamin Feuer et.al.	2507.01544	null
2025-07-02	Is External Information Useful for Stance Detection with LLMs?	Quang Minh Nguyen et.al.	2507.01543	null
2025-07-02	Efficient Out-of-Scope Detection in Dialogue Systems via Uncertainty-Driven LLM Routing	Álvaro Zaera et.al.	2507.01541	null
2025-07-02	Loss Functions in Diffusion Models: A Comparative Study	Dibyanshu Kumar et.al.	2507.01516	null
2025-07-02	SafePTR: Token-Level Jailbreak Defense in Multimodal LLMs via Prune-then-Restore Mechanism	Beitao Chen et.al.	2507.01513	null
2025-07-02	AVC-DPO: Aligned Video Captioning via Direct Preference Optimization	Jiyang Tang et.al.	2507.01492	null
2025-07-02	Agent-as-Tool: A Study on the Hierarchical Decision Making with Reinforcement Learning	Yanfei Zhang et.al.	2507.01489	null
2025-07-02	BioMARS: A Multi-Agent Robotic System for Autonomous Biological Experiments	Yibo Qiu et.al.	2507.01485	null
2025-07-02	Evaluating the Effectiveness of Direct Preference Optimization for Personalizing German Automatic Text Simplifications for Persons with Intellectual Disabilities	Yingqiang Gao et.al.	2507.01479	null
2025-07-02	Representation Entanglement for Generation: Training Diffusion Transformers Is Much Easier Than You Think	Ge Wu et.al.	2507.01467	null
2025-07-02	NOCTIS: Novel Object Cyclic Threshold based Instance Segmentation	Max Gandyra et.al.	2507.01463	null
2025-07-02	Using multi-agent architecture to mitigate the risk of LLM hallucinations	Abd Elrahman Amer et.al.	2507.01446	null
2025-07-02	A Large Language Model for Chemistry and Retrosynthesis Predictions	Yueqing Zhang et.al.	2507.01444	null
2025-07-02	EdgeLoRA: An Efficient Multi-Tenant LLM Serving System on Edge Devices	Zheyu Shen et.al.	2507.01438	null
2025-07-02	Challenges & Opportunities with LLM-Assisted Visualization Retargeting	Luke S. Snyder et.al.	2507.01436	null
2025-07-02	Pensieve Grader: An AI-Powered, Ready-to-Use Platform for Effortless Handwritten STEM Grading	Yoonseok Yang et.al.	2507.01431	null
2025-07-02	TriVLA: A Unified Triple-System-Based Unified Vision-Language-Action Model for General Robot Control	Zhenyang Liu et.al.	2507.01424	null
2025-07-02	Evaluating LLM Agent Collusion in Double Auctions	Kushal Agrawal et.al.	2507.01413	null
2025-07-02	BronchoGAN: Anatomically consistent and domain-agnostic image-to-image translation for video bronchoscopy	Ahmad Soliman et.al.	2507.01387	null
2025-07-02	RALLY: Role-Adaptive LLM-Driven Yoked Navigation for Agentic UAV Swarms	Ziyao Wang et.al.	2507.01378	null
2025-07-02	AI Agents and Agentic AI-Navigating a Plethora of Concepts for Future Manufacturing	Yinwang Ren et.al.	2507.01376	null
2025-07-02	Activation Reward Models for Few-Shot Model Alignment	Tianning Chai et.al.	2507.01368	null
2025-07-02	Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy	Chris Yuhao Liu et.al.	2507.01352	null
2025-07-02	SpeechAccentLLM: A Unified Framework for Foreign Accent Conversion and Text to Speech	Cheng Zhuangfei et.al.	2507.01348	null
2025-07-02	LEDOM: An Open and Fundamental Reverse Language Model	Xunjian Yin et.al.	2507.01335	null
2025-07-02	Symbolic or Numerical? Understanding Physics Problem Solving in Reasoning LLMs	Nifu Dan et.al.	2507.01334	null
2025-07-02	Reasoner for Real-World Event Detection: Scaling Reinforcement Learning via Adaptive Perplexity-Aware Sampling Strategy	Xiaoyun Zhang et.al.	2507.01327	null
2025-07-02	ICLShield: Exploring and Mitigating In-Context Learning Backdoor Attacks	Zhiyao Ren et.al.	2507.01321	null
2025-07-02	La RoSA: Enhancing LLM Efficiency via Layerwise Rotated Sparse Activation	Kai Liu et.al.	2507.01299	null
2025-07-02	Beyond Black-Box AI: Interpretable Hybrid Systems for Dementia Care	Matthew JY Kang et.al.	2507.01282	null
2025-07-02	Rethinking All Evidence: Enhancing Trustworthy Retrieval-Augmented Generation via Conflict-Driven Summarization	Juan Chen et.al.	2507.01281	null
2025-07-02	Evaluating Large Language Models for Multimodal Simulated Ophthalmic Decision-Making in Diabetic Retinopathy and Glaucoma Screening	Cindy Lie Tabuse et.al.	2507.01278	null
2025-07-02	AI Meets Maritime Training: Precision Analytics for Enhanced Safety and Performance	Vishakha Lall et.al.	2507.01274	null
2025-07-02	PULSE: Practical Evaluation Scenarios for Large Multimodal Model Unlearning	Tatsuki Kawakami et.al.	2507.01271	null
2025-07-02	LLM-based Realistic Safety-Critical Driving Video Generation	Yongjie Fu et.al.	2507.01264	null
2025-07-02	GAIus: Combining Genai with Legal Clauses Retrieval for Knowledge-based Assistant	Michał Matak et.al.	2507.01259	null
2025-07-01	Beyond First-Order: Training LLMs with Stochastic Conjugate Subgradients and AdamW	Di Zhang et.al.	2507.01241	null
2025-07-01	PAE MobiLLM: Privacy-Aware and Efficient LLM Fine-Tuning on the Mobile Device via Additive Side-Tuning	Xingke Yang et.al.	2507.01216	null
2025-07-01	2024 NASA SUITS Report: LLM-Driven Immersive Augmented Reality User Interface for Robotics and Space Exploration	Kathy Zhuang et.al.	2507.01206	null
2025-07-01	Escaping Plato's Cave: JAM for Aligning Independently Trained Vision and Language Models	Hyoseo et.al.	2507.01201	null
2025-07-01	Are Large Brainwave Foundation Models Capable Yet? Insights from Fine-tuning	Na Lee et.al.	2507.01196	null
2025-07-01	FlashDP: Private Training Large Language Models with Efficient DP-SGD	Liangyu Wang et.al.	2507.01154	null
2025-07-01	SonoGym: High Performance Simulation for Challenging Surgical Tasks with Robotic Ultrasound	Yunke Ao et.al.	2507.01152	null
2025-07-01	Geometry-aware 4D Video Generation for Robot Manipulation	Zeyi Liu et.al.	2507.01099	null
2025-07-01	A theoretical prediction for the dipole in nearby distances using cosmography	Hayley J. Macpherson et.al.	2507.01095	null
2025-07-02	GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning	GLM-V Team et.al.	2507.01006	null
2025-07-01	Teaching Time Series to See and Speak: Forecasting with Aligned Visual and Textual Perspectives	Sixun Dong et.al.	2506.24124	null
2025-06-30	Calligrapher: Freestyle Text Image Customization	Yue Ma et.al.	2506.24123	null
2025-06-30	TextMesh4D: High-Quality Text-to-4D Mesh Generation	Sisi Dai et.al.	2506.24121	null
2025-06-30	Data Uniformity Improves Training Efficiency and More, with a Convergence Framework Beyond the NTK Regime	Yuqing Wang et.al.	2506.24120	null
2025-06-30	DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World	Xiangtai Li et.al.	2506.24102	null
2025-06-30	Imagine for Me: Creative Conceptual Blending of Real Images and Text via Blended Attention	Wonwoong Cho et.al.	2506.24085	null
2025-06-30	Logit-Gap Steering: Efficient Short-Suffix Jailbreaks for Aligned Large Language Models	Tung-Ling Li et.al.	2506.24056	null
2025-06-30	Agent.xpu: Efficient Scheduling of Agentic LLM Workloads on Heterogeneous SoC	Xinming Wei et.al.	2506.24045	null
2025-06-30	A Survey on Vision-Language-Action Models for Autonomous Driving	Sicong Jiang et.al.	2506.24044	null
2025-06-30	Foundation Models for Zero-Shot Segmentation of Scientific Images without AI-Ready Data	Shubhabrata Mukherjee et.al.	2506.24039	null
2025-06-30	Minimally dissipative multi-bit logical operations	Jérémie Klinger et.al.	2506.24021	null
2025-06-30	Ella: Embodied Social Agents with Lifelong Memory	Hongxin Zhang et.al.	2506.24019	null
2025-06-30	EXPERT: An Explainable Image Captioning Evaluation Metric with Structured Explanations	Hyunjong Kim et.al.	2506.24016	null
2025-06-30	Large Language Models Don't Make Sense of Word Problems. A Scoping Review from a Mathematics Education Perspective	Anselm R. Strohmaier et.al.	2506.24006	null
2025-06-30	Auto-TA: Towards Scalable Automated Thematic Analysis (TA) via Multi-Agent Large Language Models with Reinforcement Learning	Seungjun Yi et.al.	2506.23998	null
2025-06-30	TaP: A Taxonomy-Guided Framework for Automated and Scalable Preference Data Generation	Renren Jin et.al.	2506.23979	null
2025-06-30	Visual and Memory Dual Adapter for Multi-Modal Object Tracking	Boyue Xu et.al.	2506.23972	null
2025-06-30	UMA: A Family of Universal Models for Atoms	Brandon M. Wood et.al.	2506.23971	null
2025-06-30	Unveiling Decision-Making in LLMs for Text Classification: Extraction of influential and interpretable concepts with Sparse Autoencoders	Mathis Le Bail et.al.	2506.23951	null
2025-06-30	AI Risk-Management Standards Profile for General-Purpose AI (GPAI) and Foundation Models	Anthony M. Barrett et.al.	2506.23949	null
2025-07-01	Graft: Integrating the Domain Knowledge via Efficient Parameter Synergy for MLLMs	Yang Dai et.al.	2506.23940	null
2025-06-30	Leveraging the Potential of Prompt Engineering for Hate Speech Detection in Low-Resource Languages	Ruhina Tabasshum Prome et.al.	2506.23930	null
2025-06-30	IMPACT: Inflectional Morphology Probes Across Complex Typologies	Mohammed J. Saeed et.al.	2506.23929	null
2025-06-30	Performance of LLMs on Stochastic Modeling Operations Research Problems: From Theory to Practice	Akshit Kumar et.al.	2506.23924	null
2025-06-30	The Trilemma of Truth in Large Language Models	Germans Savcisens et.al.	2506.23921	null
2025-06-30	World4Omni: A Zero-Shot Framework from Image Generation World Model to Robotic Manipulation	Haonan Chen et.al.	2506.23919	null
2025-06-30	Advancing Multi-Step Mathematical Reasoning in Large Language Models through Multi-Layered Self-Reflection with Auto-Prompting	André de Souza Loureiro et.al.	2506.23888	null
2025-06-30	Scaling Self-Supervised Representation Learning for Symbolic Piano Performance	Louis Bradshaw et.al.	2506.23869	null
2025-06-30	Large Language Models for Statistical Inference: Context Augmentation with Applications to the Two-Sample Problem and Regression	Marc Ratkovic et.al.	2506.23862	null
2025-06-30	Email as the Interface to Generative AI Models: Seamless Administrative Automation	Andres Navarro et.al.	2506.23850	null
2025-06-30	A Survey on Autonomy-Induced Security Risks in Large Model-Based Agents	Hang Su et.al.	2506.23844	null
2025-06-30	Refine Any Object in Any Scene	Ziwei Chen et.al.	2506.23835	null
2025-06-30	Towards the "Digital Me": A vision of authentic Conversational Agents powered by personal Human Digital Twins	Lluís C. Coll et.al.	2506.23826	null
2025-06-30	Flash-VStream: Efficient Real-Time Understanding for Long Video Streams	Haoji Zhang et.al.	2506.23825	null
2025-07-01	The Impact of AI on Educational Assessment: A Framework for Constructive Alignment	Patrick Stokkink et.al.	2506.23815	null
2025-06-30	Leveraging a Multi-Agent LLM-Based System to Educate Teachers in Hate Incidents Management	Ewelina Gajewska et.al.	2506.23774	null
2025-06-30	Software Engineering for Large Language Models: Research Status, Challenges and the Road Ahead	Hongzhou Rao et.al.	2506.23762	null
2025-06-30	A Survey of LLM-based Automated Program Repair: Taxonomies, Design Paradigms, and Applications	Boyang Yang et.al.	2506.23749	null
2025-07-01	Positional Bias in Binary Question Answering: How Uncertainty Shapes Model Preferences	Tiziano Labruna et.al.	2506.23743	null
2025-06-30	AutoEvoEval: An Automated Framework for Evolving Close-Ended LLM Evaluation Data	JiaRu Wu et.al.	2506.23735	null
2025-06-30	Radioactive Watermarks in Diffusion and Autoregressive Image Generative Models	Michel Meintz et.al.	2506.23731	null
2025-06-30	System-Embedded Diffusion Bridge Models	Bartlomiej Sobieski et.al.	2506.23726	null
2025-06-30	PAC Bench: Do Foundation Models Understand Prerequisites for Executing Manipulation Policies?	Atharva Gundawar et.al.	2506.23725	null
2025-06-30	MDPG: Multi-domain Diffusion Prior Guidance for MRI Reconstruction	Lingtong Zhang et.al.	2506.23701	null
2025-06-30	MedSAM-CA: A CNN-Augmented ViT with Attention-Enhanced Multi-Scale Fusion for Medical Image Segmentation	Peiting Tian et.al.	2506.23700	null
2025-06-30	If You Had to Pitch Your Ideal Software -- Evaluating Large Language Models to Support User Scenario Writing for User Experience Experts and Laypersons	Patrick Stadler et.al.	2506.23694	null
2025-06-30	Agent4S: The Transformation of Research Paradigms from the Perspective of Large Language Models	Boyuan Zheng et.al.	2506.23692	null
2025-06-30	SynMotion: Semantic-Visual Adaptation for Motion Customized Video Generation	Shuai Tan et.al.	2506.23690	null
2025-06-30	PokéAI: A Goal-Generating, Battle-Optimizing Multi-agent System for Pokemon Red	Zihao Liu et.al.	2506.23689	null
2025-06-30	Interactive Reasoning: Visualizing and Controlling Chain-of-Thought Reasoning in Large Language Models	Rock Yuren Pang et.al.	2506.23678	null
2025-06-30	Efficient Interleaved Speech Modeling through Knowledge Distillation	Mohammadmahdi Nouriborji et.al.	2506.23670	null
2025-06-30	L0: Reinforcement Learning to Become General Agents	Junjie Zhang et.al.	2506.23667	null
2025-06-30	On the Domain Robustness of Contrastive Vision-Language Models	Mario Koddenbrock et.al.	2506.23663	null
2025-06-30	Multiscale Turbulence Synthesis: Validation in 2D Hydrodynamics	Pierre Lesaffre et.al.	2506.23659	null
2025-06-30	Act-With-Think: Chunk Auto-Regressive Modeling for Generative Recommendation	Yifan Wang et.al.	2506.23643	null
2025-06-30	VAP-Diffusion: Enriching Descriptions with MLLMs for Enhanced Medical Image Generation	Peng Huang et.al.	2506.23641	null
2025-06-30	Unified Multimodal Understanding via Byte-Pair Visual Encoding	Wanpeng Zhang et.al.	2506.23639	null
2025-06-30	Towards Building Private LLMs: Exploring Multi-Node Expert Parallelism on Apple Silicon for Mixture-of-Experts Large Language Model	Mu-Chi Chen et.al.	2506.23635	null
2025-06-30	TurboVSR: Fantastic Video Upscalers and Where to Find Them	Zhongdao Wang et.al.	2506.23618	null
2025-06-30	Evaluating the Simulation of Human Personality-Driven Susceptibility to Misinformation with LLMs	Manuel Pratelli et.al.	2506.23610	null
2025-06-30	PGOV3D: Open-Vocabulary 3D Semantic Segmentation with Partial-to-Global Curriculum	Shiqi Zhang et.al.	2506.23607	null
2025-06-30	SG-LDM: Semantic-Guided LiDAR Generation via Latent-Aligned Diffusion	Zhengkang Xiang et.al.	2506.23606	null
2025-06-30	AI-Generated Lecture Slides for Improving Slide Element Detection and Retrieval	Suyash Maniyar et.al.	2506.23605	null
2025-06-30	SoK: Semantic Privacy in Large Language Models	Baihe Ma et.al.	2506.23603	null
2025-06-30	Semantic-guided Diverse Decoding for Large Language Model	Weijie Shi et.al.	2506.23601	null
2025-06-30	Transition Matching: Scalable and Flexible Generative Modeling	Neta Shaul et.al.	2506.23589	null
2025-06-30	Dataset Distillation via Vision-Language Category Prototype	Yawen Zou et.al.	2506.23580	null
2025-06-30	Evaluating Multi-Agent Defences Against Jailbreaking Attacks on Large Language Models	Maria Carolina Cornelia Wit et.al.	2506.23576	null
2025-06-30	MMReason: An Open-Ended Multi-Modal Multi-Step Reasoning Benchmark for MLLMs Toward AGI	Huanjin Yao et.al.	2506.23563	null
2025-06-30	JAM-Flow: Joint Audio-Motion Synthesis with Flow Matching	Mingi Kwon et.al.	2506.23552	null
2025-06-30	Neural Langevin Machine: a local asymmetric learning rule can be creative	Zhendong Yu et.al.	2506.23546	null
2025-06-30	Comparative Analysis of the Code Generated by Popular Large Language Models (LLMs) for MISRA C++ Compliance	Malik Muhammad Umer et.al.	2506.23535	null
2025-06-30	On Recipe Memorization and Creativity in Large Language Models: Is Your Model a Creative Cook, a Bad Cook, or Merely a Plagiator?	Jan Kvapil et.al.	2506.23527	null
2025-06-30	NEU-ESC: A Comprehensive Vietnamese dataset for Educational Sentiment analysis and topic Classification toward multitask learning	Phan Quoc Hung Mai et.al.	2506.23524	null
2025-07-01	ChemActor: Enhancing Automated Extraction of Chemical Synthesis Actions with LLM-Generated Data	Yu Zhang et.al.	2506.23520	null
2025-06-30	Reinforcement Fine-Tuning Enables MLLMs Learning Novel Tasks Stably	Zhihao Zhang et.al.	2506.23508	null
2025-06-30	LLM-enhanced Action-aware Multi-modal Prompt Tuning for Image-Text Matching	Mengxiao Tian et.al.	2506.23502	null
2025-06-30	Thought-Augmented Planning for LLM-Powered Interactive Recommender Agent	Haocheng Yu et.al.	2506.23485	null
2025-06-30	MTADiffusion: Mask Text Alignment Diffusion Model for Object Inpainting	Jun Huang et.al.	2506.23482	null
2025-06-30	Evaluation of Geolocation Capabilities of Multimodal Large Language Models and Analysis of Associated Privacy Risks	Xian Zhang et.al.	2506.23481	null
2025-06-30	What to Keep and What to Drop: Adaptive Table Filtering Framework	Jang Won June et.al.	2506.23463	null
2025-06-30	Can We Predict the Unpredictable? Leveraging DisasterNet-LLM for Multimodal Disaster Classification	Manaswi Kulahara et.al.	2506.23462	null
2025-06-30	General Signal Model and Capacity Limit for Rydberg Quantum Information System	Jieao Zhu et.al.	2506.23455	null
2025-06-30	PathDiff: Histopathology Image Synthesis with Unpaired Text and Mask Conditions	Mahesh Bhosale et.al.	2506.23440	null
2025-06-29	TuCo: Measuring the Contribution of Fine-Tuning to Individual Responses of LLMs	Felipe Nuti et.al.	2506.23423	null
2025-06-29	Datasets for Fairness in Language Models: An In-Depth Survey	Jiale Zhang et.al.	2506.23411	null
2025-06-29	Do LLMs Dream of Discrete Algorithms?	Claudionor Coelho Jr et.al.	2506.23408	null
2025-06-29	Perspective Dial: Measuring Perspective of Text and Guiding LLM Outputs	Taejin Kim et.al.	2506.23377	null
2025-06-29	Federated Timeline Synthesis: Scalable and Private Methodology For Model Training and Deployment	Pawel Renc et.al.	2506.23358	null
2025-06-29	GeoProg3D: Compositional Visual Reasoning for City-Scale 3D Language Fields	Shunsuke Yasuki et.al.	2506.23352	null
2025-06-29	ATGen: A Framework for Active Text Generation	Akim Tsvigun et.al.	2506.23342	null
2025-06-29	Information Loss in LLMs' Multilingual Translation: The Role of Training Data, Language Proximity, and Language Family	Yumeng Lin et.al.	2506.23340	null
2025-06-29	VALID-Mol: a Systematic Framework for Validated LLM-Assisted Molecular Design	Malikussaid et.al.	2506.23339	null
2025-06-29	XY-Tokenizer: Mitigating the Semantic-Acoustic Conflict in Low-Bitrate Speech Codecs	Yitian Gong et.al.	2506.23325	null
2025-06-29	GATSim: Urban Mobility Simulation with Generative Agents	Qi Liu et.al.	2506.23306	null
2025-07-01	Exposing and Mitigating Calibration Biases and Demographic Unfairness in MLLM Few-Shot In-Context Learning for Medical Image Classification	Xing Shen et.al.	2506.23298	null
2025-06-29	Two Spelling Normalization Approaches Based on Large Language Models	Miguel Domingo et.al.	2506.23288	null
2025-06-29	MoMa: Modulating Mamba for Adapting Image Foundation Models to Video Recognition	Yuhuan Yang et.al.	2506.23283	null
2025-06-29	Autoregressive Denoising Score Matching is a Good Video Anomaly Detector	Hanwen Zhang et.al.	2506.23282	null
2025-06-29	Corrupted by Reasoning: Reasoning Language Models Become Free-Riders in Public Goods Games	David Guzman Piedrahita et.al.	2506.23276	null
2025-06-27	Shape-for-Motion: Precise and Consistent Video Editing with 3D Proxy	Yuhao Liu et.al.	2506.22432	null
2025-06-27	The Automated LLM Speedrunning Benchmark: Reproducing NanoGPT Improvements	Bingchen Zhao et.al.	2506.22419	null
2025-06-27	HyperCLOVA X THINK Technical Report	NAVER Cloud HyperCLOVA X Team et.al.	2506.22403	null
2025-06-27	Refining Czech GEC: Insights from a Multi-Experiment Approach	Petr Pechman et.al.	2506.22402	null
2025-06-27	QuickSilver -- Speeding up LLM Inference through Dynamic Token Halting, KV Skipping, Contextual Token Fusion, and Adaptive Matryoshka Quantization	Danush Khanna et.al.	2506.22396	null
2025-06-27	What Makes ChatGPT Effective for Software Issue Resolution? An Empirical Study of Developer-ChatGPT Conversations in GitHub	Ramtin Ehsani et.al.	2506.22390	null
2025-06-27	Can Video Large Multimodal Models Think Like Doubters-or Double-Down: A Study on Defeasible Video Entailment	Yue Zhang et.al.	2506.22385	null
2025-06-27	Probabilistic Optimality for Inference-time Scaling	Youkang Wang et.al.	2506.22376	null
2025-06-27	Towards Fair Rankings: Leveraging LLMs for Gender Bias Detection and Measurement	Maryam Mousavian et.al.	2506.22372	null
2025-06-27	Can Large Language Models Help Students Prove Software Correctness? An Experimental Study with Dafny	Carolina Carreira et.al.	2506.22370	null
2025-06-27	Concept-Level AI for Telecom: Moving Beyond Large Language Models	Viswanath Kumarskandpriya et.al.	2506.22359	null
2025-06-27	Optimal Estimation of Watermark Proportions in Hybrid AI-Human Texts	Xiang Li et.al.	2506.22343	null
2025-06-27	Evaluating Scoring Bias in LLM-as-a-Judge	Qingquan Li et.al.	2506.22316	null
2025-06-27	Detection of Personal Data in Structured Datasets Using a Large Language Model	Albert Agisha Ntwali et.al.	2506.22305	null
2025-06-27	Unfolding Generative Flows with Koopman Operators: Fast and Interpretable Sampling	Erkan Turan et.al.	2506.22304	null
2025-06-27	Rethinking Visual Token Reduction in LVLMs under Cross-modal Misalignment	Rui Xu et.al.	2506.22283	null
2025-06-27	Public Service Algorithm: towards a transparent, explainable, and scalable content curation for news content based on editorial values	Ahmad Mel et.al.	2506.22270	null
2025-06-27	Towards Operational Data Analytics Chatbots -- Virtual Knowledge Graph is All You Need	Junaid Ahmed Khan et.al.	2506.22267	null
2025-06-27	Projected Compression: Trainable Projection for Efficient Transformer Compression	Maciej Stefaniak et.al.	2506.22255	null
2025-06-27	Adapting University Policies for Generative AI: Opportunities, Challenges, and Policy Solutions in Higher Education	Russell Beale et.al.	2506.22231	null
2025-06-27	Cardiovascular disease classification using radiomics and geometric features from cardiac CT	Ajay Mittal et.al.	2506.22226	null
2025-06-27	Hybrid Generative Modeling for Incomplete Physics: Deep Grey-Box Meets Optimal Transport	Gurjeet Sangra Singh et.al.	2506.22204	null
2025-06-27	EFRame: Deeper Reasoning via Exploration-Filtering-Replay Reinforcement Learning Framework	Chen Wang et.al.	2506.22200	null
2025-06-27	Exploring Modularity of Agentic Systems for Drug Discovery	Laura van Weesep et.al.	2506.22189	null
2025-06-27	A Different Approach to AI Safety: Proceedings from the Columbia Convening on Openness in Artificial Intelligence and AI Safety	Camille François et.al.	2506.22183	null
2025-06-27	Training Language Model to Critique for Better Refinement	Tianshu Yu et.al.	2506.22157	null
2025-06-27	RetFiner: A Vision-Language Refinement Scheme for Retinal Foundation Models	Ronald Fecso et.al.	2506.22149	null
2025-06-27	SAGE: Spliced-Audio Generated Data for Enhancing Foundational Models in Low-Resource Arabic-English Code-Switched Speech Recognition	Muhammad Umar Farooq et.al.	2506.22143	null
2025-06-27	Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs	Shaojie Zhang et.al.	2506.22139	null
2025-06-27	Reasoning in machine vision: learning to think fast and slow	Shaheer U. Saeed et.al.	2506.22075	null
2025-06-27	Query as Test: An Intelligent Driving Test and Data Storage Method for Integrated Cockpit-Vehicle-Road Scenarios	Shengyue Yao et.al.	2506.22068	null
2025-06-27	Lost at the Beginning of Reasoning	Baohao Liao et.al.	2506.22058	null
2025-06-27	Decoding Machine Translationese in English-Chinese News: LLMs vs. NMTs	Delu Kong et.al.	2506.22050	null
2025-06-27	GPAS: Accelerating Convergence of LLM Pretraining via Gradient-Preserving Activation Scaling	Tianhao Chen et.al.	2506.22049	null
2025-06-27	Few-Shot Identity Adaptation for 3D Talking Heads via Global Gaussian Field	Hong Nie et.al.	2506.22044	null
2025-06-27	UniCA: Adapting Time Series Foundation Model to General Covariate-Aware Forecasting	Lu Han et.al.	2506.22039	null
2025-06-27	Can Peter Pan Survive MT? A Stylometric Study of LLMs, NMTs, and HTs in Children's Literature Translation	Delu Kong et.al.	2506.22038	null
2025-06-27	SiPipe: Bridging the CPU-GPU Utilization Gap for Efficient Pipeline-Parallel LLM Inference	Yongchao He et.al.	2506.22033	null
2025-06-27	LMPVC and Policy Bank: Adaptive voice control for industrial robots with code generating LLMs and reusable Pythonic policies	Ossi Parikka et.al.	2506.22028	null
2025-06-27	RoboEnvision: A Long-Horizon Video Generation Model for Multi-Task Robot Manipulation	Liudi Yang et.al.	2506.22007	null
2025-06-27	LeanConjecturer: Automatic Generation of Mathematical Conjectures for Theorem Proving	Naoto Onda et.al.	2506.22005	null
2025-06-27	R1-Track: Direct Application of MLLMs to Visual Object Tracking via Reinforcement Learning	Biao Wang et.al.	2506.21980	null
2025-06-27	TASeg: Text-aware RGB-T Semantic Segmentation based on Fine-tuning Vision Foundation Models	Meng Yu et.al.	2506.21975	null
2025-06-27	Don't Trust Generative Agents to Mimic Communication on Social Networks Unless You Benchmarked their Empirical Realism	Simon Münker et.al.	2506.21974	null
2025-06-27	Advancing Jailbreak Strategies: A Hybrid Approach to Exploiting LLM Vulnerabilities and Bypassing Modern Defenses	Mohamed Ahmed et.al.	2506.21972	null
2025-06-27	Using Large Language Models to Suggest Informative Prior Distributions in Bayesian Statistics	Michael A. Riegler et.al.	2506.21964	null
2025-06-27	PapersPlease: A Benchmark for Evaluating Motivational Values of Large Language Models Based on ERG Theory	Junho Myung et.al.	2506.21961	null
2025-06-27	Optimal Return-to-Go Guided Decision Transformer for Auto-Bidding in Advertisement	Hao Jiang et.al.	2506.21956	null
2025-06-27	Universal Modelling of Autocovariance Functions via Spline Kernels	Lachlan Astfalck et.al.	2506.21953	null
2025-06-27	CAL-RAG: Retrieval-Augmented Multi-Agent Generation for Content-Aware Layout Design	Najmeh Forouzandehmehr et.al.	2506.21934	null
2025-06-27	ARAG: Agentic Retrieval Augmented Generation for Personalized Recommendation	Reza Yousefi Maragheh et.al.	2506.21931	null
2025-06-27	A Survey of LLM Inference Systems	James Pan et.al.	2506.21901	null
2025-06-27	Bias, Accuracy, and Trust: Gender-Diverse Perspectives on Large Language Models	Aimen Gaba et.al.	2506.21898	null
2025-06-27	Exploring Task-Solving Paradigm for Generalized Cross-Domain Face Anti-Spoofing via Reinforcement Fine-Tuning	Fangling Jiang et.al.	2506.21895	null
2025-06-27	Integrating Multi-Modal Sensors: A Review of Fusion Techniques for Intelligent Vehicles	Chuheng Wei et.al.	2506.21885	null
2025-06-27	A Dual-Layered Evaluation of Geopolitical and Cultural Bias in LLMs	Sean Kim et.al.	2506.21881	null
2025-06-27	WildSpeech-Bench: Benchmarking Audio LLMs in Natural Speech Conversation	Jian Zhang et.al.	2506.21875	null
2025-06-27	On the Feasibility of Poisoning Text-to-Image AI Models via Adversarial Mislabeling	Stanley Wu et.al.	2506.21874	null
2025-06-27	Grounding-Aware Token Pruning: Recovering from Drastic Performance Drops in Visual Grounding Caused by Pruning	Tzu-Chun Chien et.al.	2506.21873	null
2025-06-27	RiverEcho: Real-Time Interactive Digital System for Ancient Yellow River Culture	Haofeng Wang et.al.	2506.21865	null
2025-06-27	DeepTalk: Towards Seamless and Smart Speech Interaction with Adaptive Modality-Specific MoE	Hang Shao et.al.	2506.21864	null
2025-06-27	LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs	Boyuan Sun et.al.	2506.21862	null
2025-06-27	SPADE: Spatial Transcriptomics and Pathology Alignment Using a Mixture of Data Experts for an Expressive Latent Space	Ekaterina Redekop et.al.	2506.21857	null
2025-06-27	Skill-Nav: Enhanced Navigation with Versatile Quadrupedal Locomotion via Waypoint Interface	Dewei Wang et.al.	2506.21853	null
2025-06-27	The Consistency Hypothesis in Uncertainty Quantification for Large Language Models	Quan Xiao et.al.	2506.21849	null
2025-06-27	Adversarial Threats in Quantum Machine Learning: A Survey of Attacks and Defenses	Archisman Ghosh et.al.	2506.21842	null
2025-06-27	PARSI: Persian Authorship Recognition via Stylometric Integration	Kourosh Shahnazari et.al.	2506.21840	null
2025-06-27	ProSAM: Enhancing the Robustness of SAM-based Visual Reference Segmentation with Probabilistic Prompts	Xiaoqi Wang et.al.	2506.21835	null
2025-06-27	TaleForge: Interactive Multimodal System for Personalized Story Creation	Minh-Loi Nguyen et.al.	2506.21832	null
2025-06-27	Few-Shot Segmentation of Historical Maps via Linear Probing of Vision Foundation Models	Rafael Sterzinger et.al.	2506.21826	null
2025-06-26	Exploring the change in scientific readability following the release of ChatGPT	Abdulkareem Alsudais et.al.	2506.21825	null
2025-06-26	Exploring the Structure of AI-Induced Language Change in Scientific English	Riley Galpin et.al.	2506.21817	null
2025-06-26	CAT-SG: A Large Dynamic Scene Graph Dataset for Fine-Grained Understanding of Cataract Surgery	Felix Holm et.al.	2506.21813	null
2025-06-26	Towards Transparent AI: A Survey on Explainable Large Language Models	Avash Palikhe et.al.	2506.21812	null
2025-06-26	CitySim: Modeling Urban Behaviors and City Dynamics with Large-Scale LLM-Driven Agent Simulation	Nicolas Bougie et.al.	2506.21805	null
2025-06-26	Multi-task parallelism for robust pre-training of graph foundation models on multi-source, multi-fidelity atomistic modeling data	Massimiliano Lupo Pasini et.al.	2506.21788	null
2025-06-26	MobiVerse: Scaling Urban Mobility Simulation with Hybrid Lightweight Domain-Specific Generator and Large Language Models	Yifan Liu et.al.	2506.21784	null
2025-06-26	Evaluating List Construction and Temporal Understanding capabilities of Large Language Models	Alexandru Dumitru et.al.	2506.21783	null
2025-06-26	M3PO: Massively Multi-Task Model-Based Policy Optimization	Aditya Narendra et.al.	2506.21782	null
2025-06-26	THE-Tree: Can Tracing Historical Evolution Enhance Scientific Verification and Reasoning?	Xin Wang et.al.	2506.21763	null
2025-06-26	(Fact) Check Your Bias	Eivind Morris Bakke et.al.	2506.21745	null
2025-06-26	Hierarchical Reasoning Model	Guan Wang et.al.	2506.21734	null
2025-06-26	Exploring Image Generation via Mutually Exclusive Probability Spaces and Local Correlation Hypothesis	Chenqiu Zhao et.al.	2506.21731	null
2025-06-26	FOCUS: Internal MLLM Representations for Efficient Fine-Grained Visual Question Answering	Liangyu Zhong et.al.	2506.21710	null
2025-06-26	TanDiT: Tangent-Plane Diffusion Transformer for High-Quality 360° Panorama Generation	Hakan Çapuk et.al.	2506.21681	null
2025-06-26	Infrared foundations for quantum geometry I: Catalogue of totally symmetric rank-three field theories	Will Barker et.al.	2506.21662	null
2025-06-26	APO: Enhancing Reasoning Ability of MLLMs via Asymmetric Policy Optimization	Minjie Hong et.al.	2506.21655	null
2025-06-26	Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test	Ziyue Li et.al.	2506.21551	null
2025-06-26	mTSBench: Benchmarking Multivariate Time Series Anomaly Detection and Model Selection at Scale	Xiaona Zhou et.al.	2506.21550	null
2025-06-26	SAM4D: Segment Anything in Camera and LiDAR Streams	Jianyun Xu et.al.	2506.21547	null
2025-06-26	PsyLite Technical Report	Fangjun Ding et.al.	2506.21536	null
2025-06-26	Exploring the Design Space of 3D MLLMs for CT Report Generation	Mohammed Baharoon et.al.	2506.21535	null
2025-06-26	"What's Up, Doc?": Analyzing How Users Seek Health Information in Large-Scale Conversational AI Datasets	Akshay Paruchuri et.al.	2506.21532	null
2025-06-26	Potemkin Understanding in Large Language Models	Marina Mancoridis et.al.	2506.21521	null
2025-06-26	Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge	Boyu Gou et.al.	2506.21506	null
2025-06-26	Bridging Offline and Online Reinforcement Learning for LLMs	Jack Lanchantin et.al.	2506.21495	null
2025-06-26	Global and Local Entailment Learning for Natural World Imagery	Srikumar Sastry et.al.	2506.21476	null
2025-06-26	Efficient and Reuseable Cloud Configuration Search Using Discovery Spaces	Michael Johnston et.al.	2506.21467	null
2025-06-26	ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing	Huadai Liu et.al.	2506.21448	null
2025-06-26	Controllable 3D Placement of Objects with Scene-Aware Diffusion Models	Mohamed Omran et.al.	2506.21446	null
2025-06-26	Text2Cypher Across Languages: Evaluating Foundational Models Beyond English	Makbule Gulcin Ozsoy et.al.	2506.21445	null
2025-06-26	Benchmarking Deep Learning and Vision Foundation Models for Atypical vs. Normal Mitosis Classification with Cross-Dataset Evaluation	Sweta Banerjee et.al.	2506.21444	null
2025-06-26	Domain Knowledge-Enhanced LLMs for Fraud and Concept Drift Detection	Ali Şenol et.al.	2506.21443	null
2025-06-26	Flow-Based Single-Step Completion for Efficient and Expressive Policy Learning	Prajwal Koirala et.al.	2506.21427	null
2025-06-26	XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation	Bowen Chen et.al.	2506.21416	null
2025-06-26	Distributed Cross-Channel Hierarchical Aggregation for Foundation Models	Aristeidis Tsaris et.al.	2506.21411	null
2025-06-26	Scalable Bayesian Low-Rank Adaptation of Large Language Models via Stochastic Variational Subspace Inference	Colin Samplawski et.al.	2506.21408	null
2025-06-26	TableMoE: Neuro-Symbolic Routing for Structured Expert Reasoning in Multimodal Table Understanding	Junwen Zhang et.al.	2506.21393	null
2025-06-26	Early Stopping Tabular In-Context Learning	Jaris Küken et.al.	2506.21387	null
2025-06-26	Leveraging LLM-Assisted Query Understanding for Live Retrieval-Augmented Generation	Guanting Dong et.al.	2506.21384	null
2025-06-26	Canonical Quantization of a Memristive Leaky Integrate-and-Fire Neuron Circuit	Dean Brand et.al.	2506.21363	null
2025-06-26	Structuralist Approach to AI Literary Criticism: Leveraging Greimas Semiotic Square for Large Language Models	Fangzhou Dong et.al.	2506.21360	null
2025-06-26	CoPa-SG: Dense Scene Graphs with Parametric and Proto-Relations	Julian Lorenz et.al.	2506.21357	null
2025-06-26	SMMILE: An Expert-Driven Benchmark for Multimodal Medical In-Context Learning	Melanie Rieff et.al.	2506.21355	null
2025-06-26	DynamicBench: Evaluating Real-Time Report Generation in Large Language Models	Jingyao Li et.al.	2506.21343	null
2025-06-26	Active Inference AI Systems for Scientific Discovery	Karthik Duraisamy et.al.	2506.21329	null
2025-06-26	Latent Prototype Routing: Achieving Near-Perfect Load Balancing in Mixture-of-Experts	Jiajie Yang et.al.	2506.21328	null
2025-06-26	DrishtiKon: Multi-Granular Visual Grounding for Text-Rich Document Images	Badri Vishal Kasuba et.al.	2506.21316	null
2025-06-26	Exploring Adapter Design Tradeoffs for Low Resource Music Generation	Atharva Mehta et.al.	2506.21298	null
2025-06-26	Detecting Referring Expressions in Visually Grounded Dialogue with Autoregressive Language Models	Bram Willemsen et.al.	2506.21294	null
2025-06-26	Small Encoders Can Rival Large Decoders in Detecting Groundedness	Istabrak Abbes et.al.	2506.21288	null
2025-06-26	Double-Checker: Enhancing Reasoning of Slow-Thinking LLMs via Self-Critical Fine-Tuning	Xin Xu et.al.	2506.21285	null
2025-06-26	Hyperspherical Variational Autoencoders Using Efficient Spherical Cauchy Distribution	Lukas Sablica et.al.	2506.21278	null
2025-06-26	HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context	Qize Yang et.al.	2506.21277	null
2025-06-26	Cat and Mouse -- Can Fake Text Generation Outpace Detector Systems?	Andrea McGlinchey et.al.	2506.21274	null
2025-06-26	DiLoCoX: A Low-Communication Large-Scale Training Framework for Decentralized Cluster	Ji Qi et.al.	2506.21263	null
2025-06-26	Agent-RewardBench: Towards a Unified Benchmark for Reward Modeling across Perception, Planning, and Safety in Real-World Multimodal Agents	Tianyi Men et.al.	2506.21252	null
2025-06-26	ACTLLM: Action Consistency Tuned Large Language Model	Jing Bi et.al.	2506.21250	null
2025-06-26	GANet-Seg: Adversarial Learning for Brain Tumor Segmentation with Hybrid Generative Models	Qifei Cui et.al.	2506.21245	null
2025-06-26	Zero-Shot Learning for Obsolescence Risk Forecasting	Elie Saad et.al.	2506.21240	null
2025-06-26	Enhancing Automatic Term Extraction with Large Language Models via Syntactic Retrieval	Yongchan Chun et.al.	2506.21222	null
2025-06-26	Complexity-aware fine-tuning	Andrey Goncharov et.al.	2506.21220	null
2025-06-26	Unveiling Causal Reasoning in Large Language Models: Reality or Mirage?	Haoang Chi et.al.	2506.21215	null
2025-06-26	$T^3$ : Multi-level Tree-based Automatic Program Repair with Large Language Models	Quanming Liu et.al.	2506.21211	null
2025-06-26	BitMark for Infinity: Watermarking Bitwise Autoregressive Image Generative Models	Louis Kerner et.al.	2506.21209	null
2025-06-26	MedPrompt: LLM-CNN Fusion with Weight Routing for Medical Image Segmentation and Classification	Shadman Sobhan et.al.	2506.21199	null
2025-06-26	Prompt-Guided Turn-Taking Prediction	Koji Inoue et.al.	2506.21191	null
2025-06-26	GroundFlow: A Plug-in Module for Temporal Reasoning on 3D Point Cloud Sequential Grounding	Zijun Lin et.al.	2506.21188	null
2025-06-26	Task-Aware KV Compression For Cost-Effective Long Video Understanding	Minghao Qin et.al.	2506.21184	null
2025-06-26	Generative Adversarial Evasion and Out-of-Distribution Detection for UAV Cyber-Attacks	Deepak Kumar Panda et.al.	2506.21142	null
2025-06-26	How Good Are Synthetic Requirements? Evaluating LLM-Generated Datasets for AI4RE	Abdelkarim El-Hajjami et.al.	2506.21138	null
2025-06-26	IPFormer-VideoLLM: Enhancing Multi-modal Video Understanding for Multi-shot Scenes	Yujia Liang et.al.	2506.21116	null
2025-06-26	OracleFusion: Assisting the Decipherment of Oracle Bone Script with Structurally Constrained Semantic Typography	Caoshuo Li et.al.	2506.21101	null
2025-06-26	Enhancing LLM Tool Use with High-quality Instruction Data from Knowledge Graph	Jingwei Wang et.al.	2506.21071	null
2025-06-26	MT2-CSD: A New Dataset and Multi-Semantic Knowledge Fusion Method for Conversational Stance Detection	Fuqiang Niu et.al.	2506.21053	null
2025-06-26	V2X-REALM: Vision-Language Model-Based Robust End-to-End Cooperative Autonomous Driving with Adaptive Long-Tail Modeling	Junwei You et.al.	2506.21041	null
2025-06-26	Little By Little: Continual Learning via Self-Activated Sparse Mixture-of-Rank Adaptive Learning	Haodong Lu et.al.	2506.21035	null
2025-06-26	BLOCKS: Blockchain-supported Cross-Silo Knowledge Sharing for Efficient LLM Services	Zhaojiacheng Zhou et.al.	2506.21033	null
2025-06-26	Large Language Models Acing Chartered Accountancy	Jatin Gupta et.al.	2506.21031	null
2025-06-26	STEP Planner: Constructing cross-hierarchical subgoal tree as an embodied long-horizon task planner	Zhou Tianxing et.al.	2506.21030	null
2025-06-26	Instella-T2I: Pushing the Limits of 1D Discrete Latent Space Image Generation	Ze Wang et.al.	2506.21022	null
2025-06-26	Multimodal Prompt Alignment for Facial Expression Recognition	Fuyan Ma et.al.	2506.21017	null
2025-06-26	HybridQ: Hybrid Classical-Quantum Generative Adversarial Network for Skin Disease Image Generation	Qingyue Jiao et.al.	2506.21015	null
2025-06-26	Distilling Normalizing Flows	Steven Walton et.al.	2506.21003	null
2025-06-26	SAC: A Framework for Measuring and Inducing Personality Traits in LLMs with Dynamic Intensity Control	Adithya Chittem et.al.	2506.20993	null
2025-06-26	Segment Anything in Pathology Images with Natural Language	Zhixuan Chen et.al.	2506.20988	null
2025-06-26	Our Coding Adventure: Using LLMs to Personalise the Narrative of a Tangible Programming Robot for Preschoolers	Martin Ruskov et.al.	2506.20982	null
2025-06-26	Response Quality Assessment for Retrieval-Augmented Generation via Conditional Conformal Factuality	Naihe Feng et.al.	2506.20978	null
2025-06-26	Where is AIED Headed? Key Topics and Emerging Frontiers (2020-2024)	Shihui Feng et.al.	2506.20971	null
2025-06-26	Parallels Between VLA Model Post-Training and Human Motor Learning: Progress, Challenges, and Trends	Tian-Yu Xiang et.al.	2506.20966	null
2025-06-26	Evidence-based diagnostic reasoning with multi-agent copilot for human pathology	Chengkuan Chen et.al.	2506.20964	null
2025-06-26	EraRAG: Efficient and Incremental Retrieval Augmented Generation for Growing Corpora	Fangyuan Zhang et.al.	2506.20963	null
2025-06-26	Hierarchical Sub-action Tree for Continuous Sign Language Recognition	Dejie Yang et.al.	2506.20947	null
2025-06-26	Consistent Zero-shot 3D Texture Synthesis Using Geometry-aware Diffusion and Temporal Video Models	Donggoo Kang et.al.	2506.20946	null
2025-06-26	E-FreeM2: Efficient Training-Free Multi-Scale and Cross-Modal News Verification via MLLMs	Van-Hoang Phan et.al.	2506.20944	null
2025-06-26	Model State Arithmetic for Machine Unlearning	Keivan Rezaei et.al.	2506.20941	null
2025-06-26	ParEval-Repo: A Benchmark Suite for Evaluating LLMs with Repository-level HPC Translation Tasks	Joshua H. Davis et.al.	2506.20938	null
2025-06-26	LLM-guided Chemical Process Optimization with a Multi-Agent Approach	Tong Zeng et.al.	2506.20921	null
2025-06-26	FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language	Guilherme Penedo et.al.	2506.20920	null
2025-06-26	Metadata Enrichment of Long Text Documents using Large Language Models	Manika Lamba et.al.	2506.20918	null
2025-06-26	ZKPROV: A Zero-Knowledge Approach to Dataset Provenance for Large Language Models	Mina Namazi et.al.	2506.20915	null
2025-06-26	FaSTA$^*$: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing	Advait Gupta et.al.	2506.20911	null
2025-06-25	Omniwise: Predicting GPU Kernels Performance with LLMs	Zixian Wang et.al.	2506.20886	null
2025-06-25	MultiHuman-Testbench: Benchmarking Image Generation for Multiple Humans	Shubhankar Borse et.al.	2506.20879	null
2025-06-25	3DGH: 3D Head Generation with Composable Hair and Face	Chengan He et.al.	2506.20875	null
2025-06-25	Engineering RAG Systems for Real-World Applications: Design, Development, and Evaluation	Md Toufique Hasan et.al.	2506.20869	null
2025-06-25	Leaner Training, Lower Leakage: Revisiting Memorization in LLM Fine-Tuning with LoRA	Fei Wang et.al.	2506.20856	null
2025-06-25	Vector Contrastive Learning For Pixel-Wise Pretraining In Medical Vision	Yuting He et.al.	2506.20850	null
2025-06-25	Uncovering Hidden Violent Tendencies in LLMs: A Demographic Analysis via Behavioral Vignettes	Quintin Myers et.al.	2506.20822	null
2025-06-25	MultiFinRAG: An Optimized Multimodal Retrieval-Augmented Generation (RAG) Framework for Financial Question Answering	Chinmay Gondhalekar et.al.	2506.20821	null
2025-06-25	GPU Kernel Scientist: An LLM-Driven Framework for Iterative Kernel Optimization	Martin Andrews et.al.	2506.20807
