Automatically updates arXiv papers about LLM Reasoning, LLM Evaluation, LLM & MLLM, and Video Understanding using GitHub Actions.

Xuchen-Li/llm-arxiv-daily

Updated on 2025.11.12

Table of Contents
  1. LLM Reasoning
  2. LLM Evaluation
  3. LLM MLLM
  4. Video Understanding

LLM Reasoning

Publish Date Title Authors PDF Code
2025-07-23 InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation Shuai Yang et.al. 2507.17520 null
2025-07-23 MultiNRC: A Challenging and Native Multilingual Reasoning Evaluation Benchmark for LLMs Alexander R. Fabbri et.al. 2507.17476 null
2025-07-23 HiProbe-VAD: Video Anomaly Detection via Hidden States Probing in Tuning-Free Multimodal LLMs Zhaolin Cai et.al. 2507.17394 null
2025-07-23 Leveraging Knowledge Graphs and LLM Reasoning to Identify Operational Bottlenecks for Warehouse Planning Assistance Rishi Parekh et.al. 2507.17273 null
2025-07-22 Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning Junhao Shen et.al. 2507.16814 null
2025-07-22 Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning Ang Li et.al. 2507.16746 null
2025-07-23 WAKENLLM: Evaluating Reasoning Potential and Stability in LLMs via Fine-Grained Benchmarking Zipeng Ling et.al. 2507.16199 null
2025-07-21 Expert-Guided LLM Reasoning for Battery Discovery: From AI-Driven Hypothesis to Synthesis and Characterization Shengchao Liu et.al. 2507.16110 null
2025-07-21 The Impact of Language Mixing on Bilingual LLM Reasoning Yihao Li et.al. 2507.15849 null
2025-07-21 EgoPrune: Efficient Token Pruning for Egomotion Video Reasoning in Embodied Agent Jiaao Li et.al. 2507.15428 null
2025-07-20 LEKIA: A Framework for Architectural Alignment via Expert Knowledge Injection Boning Zhao et.al. 2507.14944 null
2025-07-18 A Simple "Try Again" Can Elicit Multi-Turn LLM Reasoning Licheng Liu et.al. 2507.14295 null
2025-07-18 Team of One: Cracking Complex Video QA with Model Synergy Jun Xie et.al. 2507.13820 null
2025-07-17 The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner Zhouqi Hua et.al. 2507.13332 null
2025-07-17 Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark Junsu Kim et.al. 2507.13314 null
2025-07-17 HATS: Hindi Analogy Test Set for Evaluating Reasoning in Large Language Models Ashray Gupta et.al. 2507.13238 null
2025-07-17 Probabilistic Soundness Guarantees in LLM Reasoning Chains Weiqiu You et.al. 2507.12948 null
2025-07-16 Reasoning Strategies in Large Language Models: Can They Follow, Prefer, and Optimize? Yanjian Zhang et.al. 2507.11423 null
2025-07-15 KisMATH: Do LLMs Have Knowledge of Implicit Structures in Mathematical Reasoning? Soumadeep Saha et.al. 2507.11408 null
2025-07-15 Guiding LLM Decision-Making with Fairness Reward Models Zara Hall et.al. 2507.11344 null
2025-07-15 MSA at ImageCLEF 2025 Multimodal Reasoning: Multilingual Multimodal Reasoning With Ensemble Vision Language Models Seif Ahmed et.al. 2507.11114 null
2025-07-15 Learning to Tune Like an Expert: Interpretable and Scene-Aware Navigation via MLLM Reasoning and CVAE-Based Adaptation Yanbo Wang et.al. 2507.11001 null
2025-07-15 Modeling Understanding of Story-Based Analogies Using Large Language Models Kalit Inani et.al. 2507.10957 null
2025-07-14 Foundation Model Driven Robotics: A Comprehensive Review Muhammad Tayyab Khan et.al. 2507.10087 null
2025-07-13 Reframing SAR Target Recognition as Visual Reasoning: A Chain-of-Thought Dataset with Multimodal LLMs Chaoran Li et.al. 2507.09535 null
2025-07-11 GraphRunner: A Multi-Stage Framework for Efficient and Accurate Graph-Based Retrieval Savini Kashmira et.al. 2507.08945 null
2025-07-11 Leanabell-Prover-V2: Verifier-integrated Reasoning for Formal Theorem Proving via Reinforcement Learning Xingguang Ji et.al. 2507.08649 null
2025-07-11 ChainEdit: Propagating Ripple Effects in LLM Knowledge Editing through Logical Rule-Guided Chains Zilu Dong et.al. 2507.08427 null
2025-07-10 ALCo-FM: Adaptive Long-Context Foundation Model for Accident Prediction Pinaki Prasad Guha Neogi et.al. 2507.08153 null
2025-07-10 MIRA: A Novel Framework for Fusing Modalities in Medical RAG Jinhong Wang et.al. 2507.07902 null
2025-07-10 The Synergy Dilemma of Long-CoT SFT and RL: Investigating Post-Training Techniques for Reasoning VLMs Jierun Chen et.al. 2507.07562 null
2025-07-10 RLEP: Reinforcement Learning with Experience Replay for LLM Reasoning Hongzhi Zhang et.al. 2507.07451 null
2025-07-11 StarDojo: Benchmarking Open-Ended Behaviors of Agentic Multimodal LLMs in Production-Living Simulations with Stardew Valley Weihao Tan et.al. 2507.07445 null
2025-07-09 MagiC: Evaluating Multimodal Cognition Toward Grounded Visual Reasoning Chengfei Wu et.al. 2507.07297 null
2025-07-07 DeepRetro: Retrosynthetic Pathway Discovery using Iterative LLM Reasoning Shreyas Vinaya Sathyanarayana et.al. 2507.07060 null
2025-07-09 First Return, Entropy-Eliciting Explore Tianyu Zheng et.al. 2507.07017 null
2025-07-09 Learning Deliberately, Acting Intuitively: Unlocking Test-Time Reasoning in Multimodal LLMs Yahan Yu et.al. 2507.06999 null
2025-07-09 Are They All Good? Evaluating the Quality of CoTs in LLM-based Code Generation Binquan Zhang et.al. 2507.06980 null
2025-07-10 Rethinking Verification for LLM Code Generation: From Generation to Testing Zihan Ma et.al. 2507.06920 null
2025-07-09 From Data-Centric to Sample-Centric: Enhancing LLM Reasoning via Progressive Optimization Xinjie Chen et.al. 2507.06573 null
2025-07-13 Perception-Aware Policy Optimization for Multimodal Reasoning Zhenhailong Wang et.al. 2507.06448 null
2025-07-08 Enhancing Scientific Visual Question Answering through Multimodal Reasoning and Ensemble Modeling Prahitha Movva et.al. 2507.06183 null
2025-07-10 Skywork-R1V3 Technical Report Wei Shen et.al. 2507.06167 null
2025-07-08 KERAG_R: Knowledge-Enhanced Retrieval-Augmented Generation for Recommendation Zeyuan Meng et.al. 2507.05863 null
2025-07-09 Measuring how changes in code readability attributes affect code quality evaluation by Large Language Models Igor Regis da Silva Simoes et.al. 2507.05289 null
2025-07-07 Spatio-Temporal LLM: Reasoning about Environments and Actions Haozhen Zheng et.al. 2507.05258 null
2025-07-07 Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning Yana Wei et.al. 2507.05255 null
2025-07-07 MARBLE: A Multi-Agent Rule-Based LLM Reasoning Engine for Accident Severity Prediction Kaleem Ullah Qasim et.al. 2507.04893 null
2025-07-17 DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge Wenyao Zhang et.al. 2507.04447 null
2025-07-05 CoT-Segmenter: Enhancing OOD Detection in Dense Road Scenes via Chain-of-Thought Reasoning Jeonghyo Song et.al. 2507.03984 null
2025-07-04 Effects of structure on reasoning in instance-level Self-Discover Sachith Gunasekara et.al. 2507.03347 null
2025-07-03 RCA Copilot: Transforming Network Data into Actionable Insights via Large Language Models Alexander Shan et.al. 2507.03224 null
2025-07-03 Improving LLM Reasoning for Vulnerability Detection via Group Relative Policy Optimization Marco Simoni et.al. 2507.03051 null
2025-07-02 Look-Back: Implicit Visual Re-focusing in MLLM Reasoning Shuo Yang et.al. 2507.03019 null
2025-07-01 From Answers to Rationales: Self-Aligning Multimodal Reasoning with Answer-Oriented Chain-of-Thought Wentao Tan et.al. 2507.02984 null
2025-06-26 Large Language Model Agent for Modular Task Execution in Drug Discovery Janghoon Ock et.al. 2507.02925 null
2025-07-03 MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs Purbesh Mitra et.al. 2507.02851 null
2025-07-03 Scaling LLM Planning: NL2FLOW for Parametric Problem Generation and Rigorous Evaluation Jungkoo Kang et.al. 2507.02253 null
2025-07-02 Reasoning on a Budget: A Survey of Adaptive and Controllable Test-Time Compute in LLMs Mohammad Ali Alomrani et.al. 2507.02076 null
2025-07-02 GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning GLM-V Team et.al. 2507.01006 null
2025-07-01 HumanoidGen: Data Generation for Bimanual Dexterous Manipulation via LLM Reasoning Zhi Jing et.al. 2507.00833 null
2025-07-01 Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning Maggie Huan et.al. 2507.00432 null
2025-07-01 Causal Prompting for Implicit Sentiment Analysis with Large Language Models Jing Ren et.al. 2507.00389 null
2025-06-22 TalentMine: LLM-Based Extraction and Question-Answering from Multimodal Talent Tables Varun Mannam et.al. 2507.00041 null
2025-07-03 Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers Zhaochen Su et.al. 2506.23918 null
2025-06-30 Interactive Reasoning: Visualizing and Controlling Chain-of-Thought Reasoning in Large Language Models Rock Yuren Pang et.al. 2506.23678 null
2025-06-30 MMReason: An Open-Ended Multi-Modal Multi-Step Reasoning Benchmark for MLLMs Toward AGI Huanjin Yao et.al. 2506.23563 null
2025-06-29 Are Large Language Models Capable of Deep Relational Reasoning? Insights from DeepSeek-R1 and Benchmark Comparisons Chi Chiu So et.al. 2506.23128 null
2025-06-29 Decoding Memes: Benchmarking Narrative Role Classification across Multilingual and Multimodal Models Shivam Sharma et.al. 2506.23122 null
2025-06-28 MARBLE: A Hard Benchmark for Multimodal Spatial Reasoning and Planning Yulun Jiang et.al. 2506.22992 null
2025-06-26 APO: Enhancing Reasoning Ability of MLLMs via Asymmetric Policy Optimization Minjie Hong et.al. 2506.21655 null
2025-06-24 FrankenBot: Brain-Morphic Modular Orchestration for Robotic Manipulation with Vision-Language Models Shiyi Wang et.al. 2506.21627 null
2025-06-30 FinEval-KR: A Financial Domain Evaluation Framework for Large Language Models' Knowledge and Reasoning Shaoyu Dou et.al. 2506.21591 null
2025-06-11 Debunk and Infer: Multimodal Fake News Detection via Diffusion-Generated Evidence and LLM Reasoning Kaiying Yan et.al. 2506.21557 null
2025-06-26 HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context Qize Yang et.al. 2506.21277 null
2025-06-26 Unveiling Causal Reasoning in Large Language Models: Reality or Mirage? Haoang Chi et.al. 2506.21215 null
2025-06-25 MultiFinRAG: An Optimized Multimodal Retrieval-Augmented Generation (RAG) Framework for Financial Question Answering Chinmay Gondhalekar et.al. 2506.20821 null
2025-06-25 Generative AI for Vulnerability Detection in 6G Wireless Networks: Advances, Case Study, and Future Directions Shuo Yang et.al. 2506.20488 null
2025-06-24 KnowMap: Efficient Knowledge-Driven Task Adaptation for LLMs Kelin Fu et.al. 2506.19527 null
2025-06-24 MSR-Align: Policy-Grounded Multimodal Alignment for Safety-Aware Reasoning in Vision-Language Models Yinan Xia et.al. 2506.19257 null
2025-06-25 Thought Anchors: Which LLM Reasoning Steps Matter? Paul C. Bogdan et.al. 2506.19143 null
2025-06-23 Finding Clustering Algorithms in the Transformer Architecture Kenneth L. Clarkson et.al. 2506.19125 null
2025-06-23 Human-Aligned Faithfulness in Toxicity Explanations of LLMs Ramaravind K. Mothilal et.al. 2506.19113 null
2025-06-23 Baba is LLM: Reasoning in a Game with Dynamic Rules Fien van Wetten et.al. 2506.19095 null
2025-06-23 OMEGA: Can LLMs Reason Outside the Box in Math? Evaluating Exploratory, Compositional, and Transformative Generalization Yiyou Sun et.al. 2506.18880 null
2025-06-24 ReDit: Reward Dithering for Improved LLM Policy Optimization Chenxing Wei et.al. 2506.18631 null
2025-06-22 Adapting Vision-Language Models for Evaluating World Models Mariya Hendriksen et.al. 2506.17967 null
2025-06-20 Aha Moment Revisited: Are VLMs Truly Capable of Self Verification in Inference-time Scaling? Mingyuan Wu et.al. 2506.17417 null
2025-06-14 CORONA: A Coarse-to-Fine Framework for Graph-based Recommendation with Large Language Models Junze Chen et.al. 2506.17281 null
2025-06-25 No Free Lunch: Rethinking Internal Feedback for LLM Reasoning Yanzhi Zhang et.al. 2506.17219 null
2025-06-20 Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens Zeyuan Yang et.al. 2506.17218 link
2025-06-20 MEXA: Towards General Multimodal Reasoning with Dynamic Multi-Expert Aggregation Shoubin Yu et.al. 2506.17113 link
2025-06-20 MUCAR: Benchmarking Multilingual Cross-Modal Ambiguity Resolution for Multimodal Large Language Models Xiaolong Wang et.al. 2506.17046 null
2025-06-20 LaVi: Efficient Large Vision-Language Models via Internal Feature Modulation Tongtian Yue et.al. 2506.16691 null
2025-06-19 GeoGuess: Multimodal Reasoning based on Hierarchy of Visual Information in Street View Fenghua Cheng et.al. 2506.16633 null
2025-06-19 History-Augmented Vision-Language Models for Frontier-Based Zero-Shot Object Navigation Mobin Habibpour et.al. 2506.16623 null
2025-06-19 How Far Can Off-the-Shelf Multimodal Large Language Models Go in Online Episodic Memory Question Answering? Giuseppe Lando et.al. 2506.16450 null
2025-06-19 TrajSceneLLM: A Multimodal Perspective on Semantic GPS Trajectory Analysis Chunhou Ji et.al. 2506.16401 link
2025-07-17 SHREC: A Framework for Advancing Next-Generation Computational Phenotyping with Large Language Models Sarah Pungitore et.al. 2506.16359 null
2025-06-19 GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning Yi Chen et.al. 2506.16141 link
2025-06-23 SLR: An Automated Synthesis Framework for Scalable Logical Reasoning Lukas Helff et.al. 2506.15787 null
2025-06-18 CC-LEARN: Cohort-based Consistency Learning Xiao Ye et.al. 2506.15662 null
2025-06-18 MEGC2025: Micro-Expression Grand Challenge on Spot Then Recognize and Visual Question Answering Xinqi Fan et.al. 2506.15298 null
2025-06-17 Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective Zhoujun Cheng et.al. 2506.14965 link
2025-06-17 Structured Moral Reasoning in Language Models: A Value-Grounded Evaluation Framework Mohna Chakraborty et.al. 2506.14948 null
2025-06-17 PeRL: Permutation-Enhanced Reinforcement Learning for Interleaved Vision-Language Reasoning Yizhen Zhang et.al. 2506.14907 link
2025-06-12 FedNano: Toward Lightweight Federated Tuning for Pretrained Multimodal Large Language Models Yao Zhang et.al. 2506.14824 null
2025-06-17 RadFabric: Agentic AI System with Reasoning Capability for Radiology Wenting Chen et.al. 2506.14142 null
2025-06-17 A Hierarchical Test Platform for Vision Language Model (VLM)-Integrated Real-World Autonomous Driving Yupeng Zhou et.al. 2506.14100 null
2025-06-16 How Does LLM Reasoning Work for Code? A Survey and a Call to Action Ira Ceka et.al. 2506.13932 null
2025-06-16 VL-GenRM: Enhancing Vision-Language Verification via Vision Experts and Iterative Training Jipeng Zhang et.al. 2506.13888 null
2025-06-16 LocationReasoner: Evaluating LLMs on Real-World Site Selection Reasoning Miho Koda et.al. 2506.13841 link
2025-06-16 Steering LLM Thinking with Budget Guidance Junyan Li et.al. 2506.13752 link
2025-06-16 Decompositional Reasoning for Graph Retrieval with Large Language Models Valentin Six et.al. 2506.13380 null
2025-07-10 Thought Crime: Backdoors and Emergent Misalignment in Reasoning Models James Chua et.al. 2506.13206 null
2025-06-16 FinLMM-R1: Enhancing Financial Reasoning in LMM through Scalable Data and Reward Design Kai Lan et.al. 2506.13066 null
2025-06-26 Metis-RISE: RL Incentivizes and SFT Enhances Multimodal Reasoning Model Learning Haibo Qiu et.al. 2506.13056 null
2025-06-20 Domain Specific Benchmarks for Evaluating Multimodal Large Language Models Khizar Anjum et.al. 2506.12958 null
2025-06-15 SciDA: Scientific Dynamic Assessor of LLMs Junting Zhou et.al. 2506.12909 null
2025-06-14 Graph of Verification: Structured Verification of LLM Reasoning with Directed Acyclic Graphs Jiwei Fang et.al. 2506.12509 null
2025-06-14 Advances in LLMs with Focus on Reasoning, Adaptability, Efficiency and Ethics Asifullah Khan et.al. 2506.12365 null
2025-06-22 MM-R5: MultiModal Reasoning-Enhanced ReRanker via Reinforcement Learning for Document Retrieval Mingjun Xu et.al. 2506.12364 null
2025-06-13 Tracing LLM Reasoning Processes with Strategic Games: A Framework for Planning, Revision, and Resource-Constrained Decision Making Xiaopeng Yuan et.al. 2506.12012 null
2025-06-22 How Visual Representations Map to Language Feature Space in Multimodal LLMs Constantin Venhoff et.al. 2506.11976 null
2025-06-13 LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming? Zihan Zheng et.al. 2506.11928 null
2025-06-13 EasyARC: Evaluating Vision Language Models on True Visual Reasoning Mert Unsal et.al. 2506.11595 null
2025-06-13 VFaith: Do Large Multimodal Models Really Reason on Seen Images Rather than Previous Memories? Jiachen Yu et.al. 2506.11571 null
2025-07-04 LearnAlign: Reasoning Data Selection for Reinforcement Learning in Large Language Models Based on Improved Gradient Alignment Shipeng Li et.al. 2506.11480 null
2025-06-09 KokushiMD-10: Benchmark for Evaluating Large Language Models on Ten Japanese National Healthcare Licensing Examinations Junyu Liu et.al. 2506.11114 null
2025-06-13 MMMG: A Massive, Multidisciplinary, Multi-Tier Generation Benchmark for Text-to-Image Reasoning Yuxuan Luo et.al. 2506.10963 null
2025-06-12 Improving Named Entity Transcription with Contextual LLM-based Revision Viet Anh Trinh et.al. 2506.10779 null
2025-06-12 NeuralNexus at BEA 2025 Shared Task: Retrieval-Augmented Prompting for Mistake Identification in AI Tutors Numaan Naeem et.al. 2506.10627 link
2025-06-25 Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning Yuhao Zhou et.al. 2506.10521 null
2025-06-12 Reliable Reasoning Path: Distilling Effective Guidance for LLM Reasoning with Knowledge Graphs Yilin Xiao et.al. 2506.10508 null
2025-06-16 Specification and Evaluation of Multi-Agent LLM Systems -- Prototype and Cybersecurity Applications Felix Härer et.al. 2506.10467 link
2025-06-12 Fast on the Easy, Deep on the Hard: Efficient Reasoning via Powered Length Penalty Zehui Ling et.al. 2506.10446 null
2025-06-12 Optimus-3: Towards Generalist Multimodal Minecraft Agents with Scalable Task Experts Zaijing Li et.al. 2506.10357 null
2025-06-12 Code Execution as Grounded Supervision for LLM Reasoning Dongwon Jung et.al. 2506.10343 link
2025-06-11 ChartReasoner: Code-Driven Modality Bridging for Long-Chain Reasoning in Chart Question Answering Caijun Jia et.al. 2506.10116 null
2025-06-19 Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing Junfei Wu et.al. 2506.09965 link
2025-06-11 Causal Sufficiency and Necessity Improves Chain-of-Thought Reasoning Xiangning Yu et.al. 2506.09853 null
2025-06-11 AD^2-Bench: A Hierarchical CoT Benchmark for MLLM in Autonomous Driving under Adverse Conditions Zhaoyang Wei et.al. 2506.09557 null
2025-06-11 Athena: Enhancing Multimodal Reasoning with Data-efficient Process Reward Models Shuai Wang et.al. 2506.09532 null
2025-06-13 e3: Learning to Explore Enables Extrapolation of Test-Time Compute for LLMs Amrith Setlur et.al. 2506.09026 null
2025-06-10 Learning to Reason Across Parallel Samples for LLM Reasoning Jianing Qi et.al. 2506.09014 null
2025-06-10 SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning Xiao Liang et.al. 2506.08989 link
2025-06-10 Consistent Paths Lead to Truth: Self-Rewarding Reinforcement Learning for LLM Reasoning Kongcheng Zhang et.al. 2506.08745 link
2025-06-10 Safe and Economical UAV Trajectory Planning in Low-Altitude Airspace: A Hybrid DRL-LLM Approach with Compliance Awareness Yanwei Gong et.al. 2506.08532 null
2025-06-10 Reinforce LLM Reasoning through Multi-Agent Reflection Yurun Yuan et.al. 2506.08379 null
2025-06-18 Wait, We Don't Need to "Wait"! Removing Thinking Tokens Improves Reasoning Efficiency Chenlong Wang et.al. 2506.08343 null
2025-06-09 From Debate to Equilibrium: Belief-Driven Multi-Agent LLM Reasoning via Bayesian Nash Equilibrium Xie Yi et.al. 2506.08292 link
2025-06-09 Automatic Generation of Inference Making Questions for Reading Comprehension Assessments Wanjing Anya Ma et.al. 2506.08260 link
2025-06-12 Play to Generalize: Learning to Reason Through Game Play Yunfei Xie et.al. 2506.08011 link
2025-06-11 Decoupling the Image Perception and Multimodal Reasoning for Reasoning Segmentation with Digital Twin Representations Yizhen Li et.al. 2506.07943 null
2025-06-09 WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning Jie Yang et.al. 2506.07905 link
2025-06-10 Guideline Forest: Experience-Induced Multi-Guideline Reasoning with Stepwise Aggregation Jiaxiang Chen et.al. 2506.07820 null
2025-06-11 AbstRaL: Augmenting LLMs' Reasoning by Reinforcing Abstract Thinking Silin Gao et.al. 2506.07751 null
2025-06-10 Synthesis by Design: Controlled Data Generation via Structural Guidance Lei Xu et.al. 2506.07664 null
2025-06-11 SAFEFLOW: A Principled Protocol for Trustworthy and Transactional Autonomous Agent Systems Peiran Li et.al. 2506.07564 null
2025-06-09 SELT: Self-Evaluation Tree Search for LLMs with Task Decomposition Mengsong Wu et.al. 2506.07557 null
2025-06-09 Learning What Reinforcement Learning Can't: Interleaved Online Fine-Tuning for Hardest Questions Lu Ma et.al. 2506.07527 link
2025-06-11 MedChat: A Multi-Agent Framework for Multimodal Diagnosis with Large Language Models Philip R. Liu et.al. 2506.07400 link
2025-06-09 Improving LLM Reasoning through Interpretable Role-Playing Steering Anyi Wang et.al. 2506.07335 null
2025-06-08 Overclocking LLM Reasoning: Monitoring and Controlling Thinking Path Lengths in LLMs Roy Eisenstadt et.al. 2506.07240 null
2025-06-08 Advancing Multimodal Reasoning Capabilities of Multimodal Large Language Models via Visual Perception Reward Tong Xiao et.al. 2506.07218 null
2025-06-08 Flattery in Motion: Benchmarking and Analyzing Sycophancy in Video-LLMs Wenrui Zhou et.al. 2506.07180 null
2025-06-08 Learning Compact Vision Tokens for Efficient Large Multimodal Models Hao Tang et.al. 2506.07138 link
2025-06-08 Theorem-of-Thought: A Multi-Agent Framework for Abductive, Deductive, and Inductive Reasoning in Language Models Samir Abdaljalil et.al. 2506.07106 null
2025-06-12 Chain-of-Code Collapse: Reasoning Failures in LLMs via Adversarial Prompting in Code Generation Jaechul Roh et.al. 2506.06971 link
2025-06-07 Boosting LLM Reasoning via Spontaneous Self-Correction Xutong Zhao et.al. 2506.06923 null
2025-06-07 Harnessing Vision-Language Models for Time Series Anomaly Detection Zelin He et.al. 2506.06836 null
2025-06-07 VisioMath: Benchmarking Figure-based Mathematical Reasoning in LMMs Can Li et.al. 2506.06727 null
2025-06-07 Curriculum Reinforcement Learning from Easy to Hard Tasks Improves LLM Reasoning Shubham Parashar et.al. 2506.06632 null
2025-06-14 RARL: Improving Medical VLM Reasoning and Generalization with Reinforcement Learning and LoRA under Data and Hardware Constraints Tan-Hanh Pham et.al. 2506.06600 null
2025-06-06 SIGMA: Refining Large Language Model Reasoning via Sibling-Guided Monte Carlo Augmentation Yanwei Ren et.al. 2506.06470 null
2025-06-06 Saffron-1: Towards an Inference Scaling Paradigm for LLM Safety Assurance Ruizhong Qiu et.al. 2506.06444 link
2025-06-06 PuzzleWorld: A Benchmark for Multimodal, Open-Ended Reasoning in Puzzlehunts Hengzhi Li et.al. 2506.06211 null
2025-06-06 Route-and-Reason: Scaling Large Language Model Reasoning with Reinforced Model Router Chenyang Shao et.al. 2506.05901 null
2025-06-06 BioMol-MQA: A Multi-Modal Question Answering Dataset For LLM Reasoning Over Bio-Molecular Interactions Saptarshi Sengupta et.al. 2506.05766 null
2025-06-05 MORSE-500: A Programmatically Controllable Video Benchmark to Stress-Test Multimodal Reasoning Zikui Cai et.al. 2506.05523 null
2025-06-05 DiCoRe: Enhancing Zero-shot Event Detection via Divergent-Convergent LLM Reasoning Tanmay Parekh et.al. 2506.05128 null
2025-06-09 Reason-to-Recommend: Using Interaction-of-Thought Reasoning to Enhance LLM Recommendation Keyu Zhao et.al. 2506.05069 null
2025-06-12 Context Is Not Comprehension Alex Pan et.al. 2506.04907 null
2025-06-05 ICPC-Eval: Probing the Frontiers of LLM Reasoning with Competitive Programming Contests Shiyi Xu et.al. 2506.04894 link
2025-06-10 Evaluation is All You Need: Strategic Overclaiming of LLM Reasoning Capabilities Through Evaluation Design Lin Sun et.al. 2506.04734 null
2025-06-05 Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation Yuyang Wanyan et.al. 2506.04614 null
2025-06-05 MuSciClaims: Multimodal Scientific Claim Verification Yash Kumar Lal et.al. 2506.04585 null
2025-06-04 Matching Markets Meet LLMs: Algorithmic Reasoning with Ranked Preferences Hadi Hosseini et.al. 2506.04478 null
2025-06-04 RSVP: Reasoning Segmentation via Visual Prompting and Multi-modal Chain-of-Thought Yi Lu et.al. 2506.04277 null
2025-06-04 Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning Shuang Chen et.al. 2506.04207 null
2025-06-04 R-Search: Empowering LLM Reasoning with Search via Multi-Reward Reinforcement Learning Qingfei Zhao et.al. 2506.04185 link
2025-06-04 MMR-V: What's Left Unsaid? A Benchmark for Multimodal Deep Reasoning in Videos Kejian Zhu et.al. 2506.04141 null
2025-06-04 Graph Counselor: Adaptive Graph Exploration via Multi-Agent Synergy to Enhance LLM Reasoning Junqi Gao et.al. 2506.03939 link
2025-06-04 Reason from Future: Reverse Thought Chain Enhances LLM Reasoning Yinlong Xu et.al. 2506.03673 null
2025-06-16 Zero-Shot Temporal Interaction Localization for Egocentric Videos Erhang Zhang et.al. 2506.03662 link
2025-06-04 MiMo-VL Technical Report Xiaomi LLM-Core Team et.al. 2506.03569 link
2025-06-04 Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback Xiaoying Zhang et.al. 2506.03106 null
2025-06-04 Demystifying Reasoning Dynamics with Mutual Information: Thinking Tokens are Information Peaks in LLM Reasoning Chen Qian et.al. 2506.02867 link
2025-06-14 TL;DR: Too Long, Do Re-weighting for Efficient LLM Reasoning Compression Zhong-Zhi Li et.al. 2506.02678 link
2025-06-03 A Smart Multimodal Healthcare Copilot with Powerful LLM Reasoning Xuejiao Zhao et.al. 2506.02470 link
2025-06-02 Act Only When It Pays: Efficient Reinforcement Learning for LLM Reasoning via Selective Rollouts Haizhong Zheng et.al. 2506.02177 null
2025-06-02 Knowledge or Reasoning? A Close Look at How LLMs Think Across Domains Juncheng Wu et.al. 2506.02126 null
2025-06-02 Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning Shenzhi Wang et.al. 2506.01939 null
2025-06-02 Read it in Two Steps: Translating Extremely Low-Resource Languages with Code-Augmented Grammar Books Chen Zhang et.al. 2506.01796 null
2025-06-02 R2SM: Referring and Reasoning for Selective Masks Yu-Lin Shih et.al. 2506.01795 null
2025-06-02 SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning Zhongwei Wan et.al. 2506.01713 null
2025-06-02 K12Vista: Exploring the Boundaries of MLLMs in K-12 Education Chong Li et.al. 2506.01676 null
2025-06-02 EvolveNav: Self-Improving Embodied Reasoning for LLM-Based Vision-Language Navigation Bingqian Lin et.al. 2506.01551 null
2025-06-02 Compiler Optimization via LLM Reasoning for Efficient Model Serving Sujun Tang et.al. 2506.01374 null
2025-06-02 The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning Xinyu Zhu et.al. 2506.01347 link
2025-06-01 GThinker: Towards General Multimodal Reasoning via Cue-Guided Rethinking Yufei Zhan et.al. 2506.01078 link
2025-06-01 Enhancing LLM Reasoning for Time Series Classification by Tailored Thinking and Fused Decision Jiahui Zhou et.al. 2506.00807 null
2025-05-31 Beyond Context to Cognitive Appraisal: Emotion Reasoning as a Theory of Mind Benchmark for Large Language Models Gerard Christopher Yeo et.al. 2506.00334 null
2025-05-30 Tournament of Prompts: Evolving LLM Instructions Through Structured Debates and Elo Ratings Anirudh Nair et.al. 2506.00178 null
2025-05-30 Werewolf: A Straightforward Game Framework with TTS for Improved User Engagement Qihui Fan et.al. 2506.00160 null
2025-05-28 Rethinking Hybrid Retrieval: When Small Embeddings and LLM Re-ranking Beat Bigger Models Arjun Rao et.al. 2506.00049 null
2025-05-30 Open CaptchaWorld: A Comprehensive Web-based Platform for Testing and Benchmarking Multimodal LLM Agents Yaxin Luo et.al. 2505.24878 link
2025-05-30 Agent-X: Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks Tajamul Ashraf et.al. 2505.24876 link
2025-05-30 Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning Shuyao Xu et.al. 2505.24850 link
2025-05-30 Random Rule Forest (RRF): Interpretable Ensembles of LLM-Generated Questions for Predicting Startup Success Ben Griffin et.al. 2505.24622 null
2025-06-10 Can Slow-thinking LLMs Reason Over Time? Empirical Studies in Time Series Forecasting Jiahao Wang et.al. 2505.24511 link
2025-05-30 Reason-SVG: Hybrid Reward RL for Aha-Moments in Vector Graphics Generation Ximing Xing et.al. 2505.24499 null
2025-05-30 How Much Backtracking is Enough? Exploring the Interplay of SFT and RL in Enhancing LLM Reasoning Hongyi James Cai et.al. 2505.24273 null
2025-06-02 MIRAGE: Assessing Hallucination in Multimodal Reasoning Chains of MLLM Bowen Dong et.al. 2505.24238 null
2025-05-30 Semi-structured LLM Reasoners Can Be Rigorously Audited Jixuan Leng et.al. 2505.24217 null
2025-05-30 HardTests: Synthesizing High-Quality Test Cases for LLM Coding Zhongmou He et.al. 2505.24098 null
2025-05-29 Preemptive Hallucination Reduction: An Input-Level Approach for Multimodal Language Model Nokimul Hasan Arif et.al. 2505.24007 null
2025-05-29 VisualSphinx: Large-Scale Synthetic Vision Logic Puzzles for RL Yichen Feng et.al. 2505.23977 null
2025-05-29 Infi-Med: Low-Resource Medical MLLMs with Robust Reasoning Evaluation Zeyu Liu et.al. 2505.23867 null
2025-05-29 Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought Yunze Man et.al. 2505.23766 null
2025-06-03 DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning Ziyin Zhang et.al. 2505.23754 link
2025-05-29 Don't Take the Premise for Granted: Evaluating the Premise Critique Ability of Large Language Models Jinzhe Li et.al. 2505.23715 link
2025-05-29 Can LLMs Reason Abstractly Over Math Word Problems Without CoT? Disentangling Abstract Formulation From Arithmetic Computation Ziling Cheng et.al. 2505.23701 null
2025-05-29 Probability-Consistent Preference Optimization for Enhanced LLM Reasoning Yunqiao Yang et.al. 2505.23540 link
2025-05-29 Diversity-Aware Policy Optimization for Large Language Model Reasoning Jian Yao et.al. 2505.23433 null
2025-05-29 GAM-Agent: Game-Theoretic and Uncertainty-Aware Collaboration for Complex Visual Reasoning Jusheng Zhang et.al. 2505.23399 null
2025-06-05 MMBoundary: Advancing MLLM Knowledge Boundary Awareness through Reasoning Step Confidence Calibration Zhitao He et.al. 2505.23224 link
2025-05-29 Elicit and Enhance: Advancing Multimodal Reasoning in Medical Scenarios Linjie Mu et.al. 2505.23118 null
2025-06-06 Infi-MMR: Curriculum-based Unlocking Multimodal Reasoning via Phased Reinforcement Learning in Multimodal Small Language Models Zeyu Liu et.al. 2505.23091 null
2025-05-29 Case-Based Reasoning Enhances the Predictive Power of LLMs in Drug-Drug Interaction Guangyi Liu et.al. 2505.23034 null
2025-05-29 StrucSum: Graph-Structured Reasoning for Long Document Extractive Summarization with LLMs Haohan Yuan et.al. 2505.22950 null
2025-05-28 VidText: Towards Comprehensive Evaluation for Video Text Understanding Zhoufaran Yang et.al. 2505.22810 link
2025-05-28 Decomposing Elements of Problem Solving: What "Math" Does RL Teach? Tian Qin et.al. 2505.22756 link
2025-05-28 AutoL2S: Auto Long-Short Reasoning for Efficient Large Language Models Feng Luo et.al. 2505.22662 null
2025-05-28 SAM-R1: Leveraging SAM for Reward Feedback in Multimodal Segmentation via Reinforcement Learning Jiaqi Huang et.al. 2505.22596 null
2025-05-28 ClaimPKG: Enhancing Claim Verification via Pseudo-Subgraph Generation with Lightweight Specialized LLM Hoang Pham et.al. 2505.22552 null
2025-05-28 Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO Lai Wei et.al. 2505.22453 link
2025-05-29 Pangu Embedded: An Efficient Dual-system LLM Reasoner with Metacognition Hanting Chen et.al. 2505.22375 null
2025-05-28 Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start Lai Wei et.al. 2505.22334 link
2025-05-28 If Pigs Could Fly... Can LLMs Logically Reason Through Counterfactuals? Ishwar B Balappanawar et.al. 2505.22318 null
2025-05-28 Rethinking the Unsolvable: When In-Context Search Meets Test-Time Scaling Fanzeng Xia et.al. 2505.22290 null
2025-05-28 What Makes a Good Reasoning Chain? Uncovering Structural Patterns in Long Chain-of-Thought Reasoning Gangwei Jiang et.al. 2505.22148 null
2025-05-28 OmniAD: Detect and Understand Industrial Anomaly via Multimodal Reasoning Shifang Zhao et.al. 2505.22039 null
2025-05-27 Towards Safety Reasoning in LLMs: AI-agentic Deliberation for Policy-embedded CoT Data Creation Tharindu Kumarage et.al. 2505.21784 null
2025-05-27 Don't Think Longer, Think Wisely: Optimizing Thinking Dynamics for Large Reasoning Models Sohyun An et.al. 2505.21765 null
2025-05-27 R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing Tianyu Fu et.al. 2505.21600 link
2025-05-31 More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models Chengzhi Liu et.al. 2505.21523 null
2025-05-27 Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? Junhao Cheng et.al. 2505.21374 link
2025-05-27 MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs Jiakang Yuan et.al. 2505.21327 null
2025-05-27 Walk Before You Run! Concise LLM Reasoning via Reinforcement Learning Mingyang Song et.al. 2505.21178 null
2025-05-27 DisasterM3: A Remote Sensing Vision-Language Dataset for Disaster Damage Assessment and Response Junjue Wang et.al. 2505.21089 null
2025-06-04 LLMs Think, But Not In Your Flow: Reasoning-Level Personalization for Black-Box Large Language Models Jieyong Kim et.al. 2505.21082 null
2025-05-27 Def-DTS: Deductive Reasoning for Open-domain Dialogue Topic Segmentation Seungmin Lee et.al. 2505.21033 null
2025-05-27 Reason-Align-Respond: Aligning LLM Reasoning with Knowledge Graphs for KGQA Xiangqing Shen et.al. 2505.20971 null
2025-05-28 VLM Can Be a Good Assistant: Enhancing Embodied Visual Tracking with Self-Improving Vision-Language Models Kui Wu et.al. 2505.20718 null
2025-05-27 Accelerating RL for LLM Reasoning with Optimal Advantage Regression Kianté Brantley et.al. 2505.20686 null
2025-05-27 Can Past Experience Accelerate LLM Reasoning? Bo Pan et.al. 2505.20643 null
2025-05-26 Beyond Markovian: Reflective Exploration via Bayes-Adaptive RL for LLM Reasoning Shenao Zhang et.al. 2505.20561 null
2025-05-26 Enhancing Logical Reasoning in Language Models via Symbolically-Guided Monte Carlo Process Supervision Xingwei Tan et.al. 2505.20415 null
2025-05-23 Guided by Gut: Efficient Test-Time Scaling with Reinforced Intrinsic Confidence Amirhosein Ghasemabadi et.al. 2505.20325 null
2025-05-26 KnowTrace: Bootstrapping Iterative Retrieval-Augmented Generation with Structured Knowledge Tracing Rui Li et.al. 2505.20245 link
2025-06-04 DreamPRM: Domain-Reweighted Process Reward Model for Multimodal Reasoning Qi Cao et.al. 2505.20241 null
2025-05-26 THiNK: Can Large Language Models Think-aloud? Yongan Yu et.al. 2505.20184 link
2025-05-26 Visual Abstract Thinking Empowers Multimodal Reasoning Dairu Liu et.al. 2505.20164 link
2025-05-26 Prismatic Synthesis: Gradient-based Data Diversification Boosts Generalization in LLM Reasoning Jaehun Jung et.al. 2505.20161 null
2025-05-26 Agentic 3D Scene Generation with Spatially Contextualized VLMs Xinhang Liu et.al. 2505.20129 null
2025-05-26 REARANK: Reasoning Re-ranking Agent via Reinforcement Learning Le Zhang et.al. 2505.20046 link
2025-05-26 An Explainable Diagnostic Framework for Neurodegenerative Dementias via Reinforcement-Optimized LLM Reasoning Andrew Zamai et.al. 2505.19954 null
2025-05-26 Multimodal Reasoning Agent for Zero-Shot Composed Image Retrieval Rong-Cheng Tu et.al. 2505.19952 null
2025-05-26 Which Data Attributes Stimulate Math and Code Reasoning? An Investigation via Influence Functions Siqi Kou et.al. 2505.19949 null
2025-05-26 HS-STAR: Hierarchical Sampling for Self-Taught Reasoners via Difficulty Estimation and Budget Reallocation Feng Xiong et.al. 2505.19866 null
2025-05-26 Deciphering Trajectory-Aided LLM Reasoning: An Optimization Perspective Junnan Liu et.al. 2505.19815 link
2025-05-26 MLLM-Guided VLM Fine-Tuning with Joint Inference for Zero-Shot Composed Image Retrieval Rong-Cheng Tu et.al. 2505.19707 null
2025-05-26 Point-RFT: Improving Multimodal Reasoning with Visually Grounded Reinforcement Finetuning Minheng Ni et.al. 2505.19702 null
2025-05-26 Large Language Models' Reasoning Stalls: An Investigation into the Capabilities of Frontier Models Lachlan McGinness et.al. 2505.19676 null
2025-05-26 Interleaved Reasoning for Large Language Models via Reinforcement Learning Roy Xie et.al. 2505.19640 null
2025-05-26 Self-Reflective Planning with Knowledge Graphs: Enhancing LLM Reasoning Reliability for Question Answering Jiajun Zhu et.al. 2505.19410 null
2025-05-25 SituatedThinker: Grounding LLM Reasoning with Real-World through Situated Thinking Junnan Liu et.al. 2505.19300 link
2025-05-28 VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use Mingyuan Wu et.al. 2505.19255 null
2025-05-25 ASPO: Adaptive Sentence-Level Preference Optimization for Fine-Grained Multimodal Reasoning Yeyuan Wang et.al. 2505.19100 null
2025-05-30 SeePhys: Does Seeing Help Thinking? -- Benchmarking Vision-Based Physics Reasoning Kun Xiang et.al. 2505.19099 link
2025-05-25 SATORI-R1: Incentivizing Multimodal Reasoning with Spatial Grounding and Verifiable Rewards Chuming Shen et.al. 2505.19094 link
2025-05-25 ReFineVLA: Reasoning-Aware Teacher-Guided Transfer Fine-Tuning Tuan Van Vo et.al. 2505.19080 null
2025-05-25 Can Large Language Models Infer Causal Relationships from Real-World Text? Ryan Saklad et.al. 2505.18931 null
2025-05-24 Don't Look Only Once: Towards Multimodal Interactive Reasoning with Selective Visual Revisitation Jiwan Chung et.al. 2505.18842 null
2025-05-24 Enhancing LLMs' Reasoning-Intensive Multimedia Search Capabilities through Fine-Tuning and Reinforcement Learning Jinzheng Li et.al. 2505.18831 null
2025-05-24 How Is LLM Reasoning Distracted by Irrelevant Context? An Analysis Using a Controlled Benchmark Minglai Yang et.al. 2505.18761 link
2025-05-24 GainRAG: Preference Alignment in Retrieval-Augmented Generation through Gain Signal Synthesis Yi Jiang et.al. 2505.18710 link
2025-05-24 Steering LLM Reasoning Through Bias-Only Adaptation Viacheslav Sinii et.al. 2505.18706 null
2025-05-31 ChartGalaxy: A Dataset for Infographic Chart Understanding and Generation Zhen Li et.al. 2505.18668 link
2025-05-24 Unraveling Misinformation Propagation in LLM Reasoning Yiyang Feng et.al. 2505.18555 link
2025-05-23 One Demo Is All It Takes: Planning Domain Derivation with LLMs from A Single Demonstration Jinbang Huang et.al. 2505.18382 null
2025-05-23 Seeing Beyond Words: MatVQA for Challenging Visual-Scientific Reasoning in Materials Science Sifan Wu et.al. 2505.18319 null
2025-05-23 Beyond Distillation: Pushing the Limits of Medical LLM Reasoning with Minimalist Rule-Based RL Che Liu et.al. 2505.17952 null
2025-05-23 Stepwise Reasoning Checkpoint Analysis: A Test Time Scaling Method to Enhance LLMs' Reasoning Zezhong Wang et.al. 2505.17829 null
2025-05-23 Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning Michael Hassid et.al. 2505.17813 null
2025-05-23 Towards General Continuous Memory for Vision-Language Models Wenyi Wu et.al. 2505.17670 null
2025-05-23 EVADE: Multimodal Benchmark for Evasive Content Detection in E-Commerce Applications Ancheng Xu et.al. 2505.17654 null
2025-05-29 Rethinking the Sampling Criteria in Reinforcement Learning for LLM Reasoning: A Competence-Difficulty Alignment Perspective Deyang Kong et.al. 2505.17652 null
2025-05-27 Navigate the Unknown: Enhancing LLM Reasoning with Intrinsic Motivation Guided Exploration Jingtong Gao et.al. 2505.17621 null
2025-05-23 MMMG: a Comprehensive and Reliable Evaluation Suite for Multitask Multimodal Generation Jihan Yao et.al. 2505.17613 null
2025-05-23 On the Design of KL-Regularized Policy Gradient Algorithms for LLM Reasoning Yifan Zhang et.al. 2505.17508 null
2025-05-23 From Reasoning to Generalization: Knowledge-Augmented LLMs for ARC Benchmark Chao Lei et.al. 2505.17482 null
2025-05-23 Hydra: Structured Cross-Source Enhanced Large Language Model Reasoning Xingyu Tan et.al. 2505.17464 null
2025-05-23 LeTS: Learning to Think-and-Search via Process-and-Outcome Reward Hybridization Qi Zhang et.al. 2505.17447 null
2025-05-23 Misaligning Reasoning with Answers -- A Framework for Assessing LLM CoT Robustness Enyi Jiang et.al. 2505.17406 null
2025-05-22 LiloDriver: A Lifelong Learning Framework for Closed-loop Motion Planning in Long-tail Autonomous Driving Scenarios Huaiyuan Yao et.al. 2505.17209 link
2025-05-21 NeSyGeo: A Neuro-Symbolic Framework for Multimodal Geometric Reasoning Data Generation Weiming Wu et.al. 2505.17121 null
2025-05-21 Systematic Evaluation of Machine-Generated Reasoning and PHQ-9 Labeling for Depression Detection Using Large Language Models Zongru Shao et.al. 2505.17119 null
2025-05-21 Swarm Intelligence Enhanced Reasoning: A Density-Driven Framework for LLM-Based Multi-Agent Optimization Ying Zhu et.al. 2505.17115 null
2025-05-21 CAMA: Enhancing Multimodal In-Context Learning with Context-Aware Modulated Attention Yanshu Li et.al. 2505.17097 null
2025-05-22 ARB: A Comprehensive Arabic Multimodal Reasoning Benchmark Sara Ghaboura et.al. 2505.17021 link
2025-05-22 SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward Kaixuan Fan et.al. 2505.17018 link
2025-05-22 $\text{R}^2\text{ec}$: Towards Large Recommender Models with Reasoning Runyang You et.al. 2505.16994 link
2025-05-22 Don't "Overthink" Passage Reranking: Is Reasoning Truly Necessary? Nour Jedidi et.al. 2505.16886 null
2025-05-26 DeepRec: Towards a Deep Dive Into the Item Space with Large Language Model Based Recommendation Bowen Zheng et.al. 2505.16810 null
2025-05-22 Two-way Evidence self-Alignment based Dual-Gated Reasoning Enhancement Kexin Zhang et.al. 2505.16806 null
2025-05-22 Reasoning Beyond Language: A Comprehensive Survey on Latent Chain-of-Thought Reasoning Xinghao Chen et.al. 2505.16782 link
2025-05-22 Collaboration among Multiple Large Language Models for Medical Question Answering Kexin Shang et.al. 2505.16648 null
2025-05-27 Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains Wenhui Tan et.al. 2505.16552 null
2025-05-22 SATURN: SAT-based Reinforcement Learning to Unleash Language Model Reasoning Huanyu Liu et.al. 2505.16368 link
2025-05-22 EquivPruner: Boosting Efficiency and Quality in LLM-Based Search via Action Pruning Jiawei Liu et.al. 2505.16312 link
2025-05-22 Augmenting LLM Reasoning with Dynamic Notes Writing for Complex QA Rishabh Maheshwary et.al. 2505.16293 null
2025-05-22 Training-Free Reasoning and Reflection in MLLMs Hongchen Wei et.al. 2505.16151 null
2025-05-22 Distilling the Implicit Multi-Branch Structure in LLMs' Reasoning via Reinforcement Learning Shicheng Xu et.al. 2505.16142 null
2025-05-26 Abstractions-of-Thought: Intermediate Representations for LLM Reasoning in Hardware Design Matthew DeLorenzo et.al. 2505.15873 null
2025-05-21 LENS: Multi-level Evaluation of Multimodal Reasoning with Large Language Models Ruilin Yao et.al. 2505.15616 null
2025-05-21 Chain-of-Focus: Adaptive Visual Search and Zooming for Multimodal Reasoning via RL Xintong Zhang et.al. 2505.15436 null
2025-05-21 Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning Yurun Yuan et.al. 2505.15311 null
2025-05-21 Deliberation on Priors: Trustworthy Reasoning of Large Language Models on Knowledge Graphs Jie Ma et.al. 2505.15210 link
2025-05-21 Prolonged Reasoning Is Not All You Need: Certainty-Based Adaptive Routing for Efficient LLM/MLLM Reasoning Jinghui Lu et.al. 2505.15154 null
2025-05-21 The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning Shivam Agarwal et.al. 2505.15134 link
2025-05-21 Learning to Rank Chain-of-Thought: An Energy-Based Approach with Outcome Supervision Eric Hanchen Jiang et.al. 2505.14999 null
2025-05-20 Self-Evolving Curriculum for LLM Reasoning Xiaoyin Chen et.al. 2505.14970 null
2025-05-20 MORALISE: A Structured Benchmark for Moral Alignment in Visual Language Models Xiao Lin et.al. 2505.14728 null
2025-05-18 KGAlign: Joint Semantic-Structural Knowledge Encoding for Multimodal Fake News Detection Tuan-Vinh La et.al. 2505.14714 link
2025-05-23 Emerging Properties in Unified Multimodal Pretraining Chaorui Deng et.al. 2505.14683 null
2025-05-27 General-Reasoner: Advancing LLM Reasoning Across All Domains Xueguang Ma et.al. 2505.14652 null
2025-05-22 TinyV: Reducing False Negatives in Verification Improves RL for LLM Reasoning Zhangchen Xu et.al. 2505.14625 link
2025-05-20 SATBench: Benchmarking LLMs' Logical Reasoning via Automated Puzzle Generation from SAT Formulas Anjiang Wei et.al. 2505.14615 null
2025-05-21 KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation Jiajun Shi et.al. 2505.14552 link
2025-05-23 Unearthing Gems from Stones: Policy Optimization with Negative Sample Augmentation for LLM Reasoning Zhaohui Yang et.al. 2505.14403 null
2025-05-26 DeepEyes: Incentivizing "Thinking with Images" via Reinforcement Learning Ziwei Zheng et.al. 2505.14362 link
2025-05-20 Reinforcement Learning vs. Distillation: Understanding Accuracy and Capability in LLM Reasoning Minwu Kim et.al. 2505.14216 link
2025-05-20 RL of Thoughts: Navigating LLM Reasoning with Inference-time Reinforcement Learning Qianyue Hao et.al. 2505.14140 null
2025-05-20 Code2Logic: Game-Code-Driven Data Synthesis for Enhancing VLMs General Reasoning Jingqi Tong et.al. 2505.13886 link
2025-05-20 Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning Jiwon Song et.al. 2505.13866 link
2025-05-18 RAGXplain: From Explainable Evaluation to Actionable Guidance of RAG Pipelines Dvir Cohen et.al. 2505.13538 null
2025-05-16 IRLBench: A Multi-modal, Culturally Grounded, Parallel Irish-English Benchmark for Open-Ended LLM Reasoning Evaluation Khanh-Tung Tran et.al. 2505.13498 link
2025-05-19 MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision Lingxiao Du et.al. 2505.13427 link
2025-05-19 MR. Judge: Multimodal Reasoner as a Judge Renjie Pi et.al. 2505.13403 null
2025-05-20 Sense and Sensitivity: Examining the Influence of Semantic Recall on Long Context Code Reasoning Adam Štorek et.al. 2505.13353 null
2025-05-19 Thinking Short and Right Over Thinking Long: Serving LLM Reasoning Efficiently and Accurately Yuhang Wang et.al. 2505.13326 null
2025-05-19 Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space Hengli Li et.al. 2505.13308 link
2025-05-19 RBF++: Quantifying and Optimizing Reasoning Boundaries across Measurable and Unmeasurable Capabilities for Chain-of-Thought Reasoning Qiguang Chen et.al. 2505.13307 link
2025-05-19 Unlocking the Potential of Difficulty Prior in RL-based Multimodal Reasoning Mingrui Chen et.al. 2505.13261 null
2025-05-23 SAKURA: On the Multi-hop Reasoning of Large Audio-Language Models Based on Speech and Audio Information Chih-Kai Yang et.al. 2505.13237 link
2025-05-21 Hearing from Silence: Reasoning Audio Descriptions from Silent Videos via Vision-Language Model Yong Ren et.al. 2505.13062 null
2025-05-25 Fractured Chain-of-Thought Reasoning Baohao Liao et.al. 2505.12992 null
2025-05-19 DGRO: Enhancing LLM Reasoning via Exploration-Exploitation Control and Reward Variance Management Xuerui Su et.al. 2505.12951 null
2025-05-19 The Traitors: Deception and Trust in Multi-Agent Language Model Simulations Pedro M. P. Curvo et.al. 2505.12923 link
2025-05-19 AdaToken-3D: Dynamic Spatial Gating for Efficient 3D Large Multimodal-Models Reasoning Kai Zhang et.al. 2505.12782 null
2025-05-19 Incentivizing Multimodal Reasoning in Large Models for Direct Robot Manipulation Weiliang Tang et.al. 2505.12744 null
2025-05-18 Reasoning-CV: Fine-tuning Powerful Reasoning LLMs for Knowledge-Assisted Claim Verification Zhi Zheng et.al. 2505.12348 link
2025-05-18 LogicOCR: Do Your Large Multimodal Models Excel at Logical Reasoning on Text-Rich Images? Maoyuan Ye et.al. 2505.12307 link
2025-05-18 MMS-VPR: Multimodal Street-Level Visual Place Recognition Dataset and Benchmark Yiwei Ou et.al. 2505.12254 null
2025-05-17 Do Code LLMs Do Static Analysis? Chia-Yi Su et.al. 2505.12118 link
2025-05-17 Solve-Detect-Verify: Inference-Time Scaling with Flexible Generative Verifier Jianyuan Zhong et.al. 2505.11966 null
2025-05-22 PRS-Med: Position Reasoning Segmentation with Vision-Language Model in Medical Imaging Quoc-Huy Trinh et.al. 2505.11872 null
2025-05-17 Not All Thoughts are Generated Equal: Efficient LLM Reasoning via Multi-Turn Reinforcement Learning Yansong Ning et.al. 2505.11827 link
2025-05-16 REMOR: Automated Peer Review Generation with LLM Reasoning and Multi-Objective Reinforcement Learning Pawin Taechoyotin et.al. 2505.11718 null
2025-05-16 Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner Wenchuan Zhang et.al. 2505.11404 link
2025-05-23 SelfBudgeter: Adaptive Token Allocation for Efficient LLM Reasoning Zheng Li et.al. 2505.11274 null
2025-05-24 Human-Aligned Bench: Fine-Grained Assessment of Reasoning Ability in MLLMs vs. Humans Yansheng Qiu et.al. 2505.11141 null
2025-05-16 Scaling Reasoning can Improve Factuality in Large Language Models Mike Zhang et.al. 2505.11140 link
2025-05-16 Humans expect rationality and cooperation from LLM opponents in strategic games Darija Barak et.al. 2505.11011 null
2025-05-16 Vaiage: A Multi-Agent Solution to Personalized Travel Planning Binwen Liu et.al. 2505.10922 null
2025-05-15 Mining Hidden Thoughts from Texts: Evaluating Continual Pretraining with Synthetic Data for LLM Reasoning Yoichi Ishibashi et.al. 2505.10182 null
2025-05-15 XRAG: Cross-lingual Retrieval-Augmented Generation Wei Liu et.al. 2505.10089 null
2025-05-13 The Truth Becomes Clearer Through Debate! Multi-Agent Systems with Large Language Models Unmask Fake News Yuhan Liu et.al. 2505.08532 null
2025-05-13 Learning Like Humans: Advancing LLM Reasoning Capabilities via Adaptive Difficulty Curriculum Learning and Expert-Guided Self-Reformulation Enci Zhang et.al. 2505.08364 null
2025-05-12 KAQG: A Knowledge-Graph-Enhanced RAG for Difficulty-Controlled Question Generation Ching Han Chen et.al. 2505.07618 null
2025-05-12 How well do LLMs reason over tabular data, really? Cornelius Wolff et.al. 2505.07453 null
2025-05-12 Skywork-VL Reward: An Effective Reward Model for Multimodal Understanding and Reasoning Xiaokun Wang et.al. 2505.07263 null
2025-05-12 Critique Before Thinking: Mitigating Hallucination through Rationale-Augmented Instruction Tuning Zexian Yang et.al. 2505.07172 null
2025-05-11 Seed1.5-VL Technical Report Dong Guo et.al. 2505.07062 null
2025-05-17 Bridging AI and Carbon Capture: A Dataset for LLMs in Ionic Liquids and CBE Research Gaurab Sarkar et.al. 2505.06964 link
2025-05-11 UniDiffGrasp: A Unified Framework Integrating VLM Reasoning and VLM-Guided Part Diffusion for Open-Vocabulary Constrained Grasping with Dual Arms Xueyang Guo et.al. 2505.06832 null
2025-05-11 Overview of the NLPCC 2025 Shared Task 4: Multi-modal, Multilingual, and Multi-hop Medical Instructional Video Question Answering Challenge Bin Li et.al. 2505.06814 null
2025-05-10 STRIVE: Structured Representation Integrating VLM Reasoning for Efficient Object Navigation Haokun Zhu et.al. 2505.06729 null
2025-05-17 Learn to Think: Bootstrapping LLM Reasoning Capability Through Graph Representation Learning Hang Gao et.al. 2505.06321 link
2025-05-07 Q-Heart: ECG Question Answering via Knowledge-Informed Multimodal LLMs Hung Manh Pham et.al. 2505.06296 null
2025-05-09 From Millions of Tweets to Actionable Insights: Leveraging LLMs for User Profiling Vahid Rahimzadeh et.al. 2505.06184 null
2025-05-12 APOLLO: Automated LLM and Lean Collaboration for Advanced Formal Reasoning Azim Ospanov et.al. 2505.05758 null
2025-05-09 Evolutionary thoughts: integration of large language models and evolutionary algorithms Antonio Jimeno Yepes et.al. 2505.05756 link
2025-05-08 Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models Yunxin Li et.al. 2505.04921 link
2025-05-07 Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers Kusha Sareen et.al. 2505.04842 null
2025-05-06 Advancing Conversational Diagnostic AI with Multimodal Reasoning Khaled Saab et.al. 2505.04653 null
2025-05-07 SToLa: Self-Adaptive Touch-Language Framework with Tactile Commonsense Reasoning in Open-Ended Scenarios Ning Cheng et.al. 2505.04201 null
2025-05-20 On-Device LLM for Context-Aware Wi-Fi Roaming Ju-Hyung Lee et.al. 2505.04174 link
2025-05-06 X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains Qianchu Liu et.al. 2505.03981 null
2025-04-30 When Reasoning Beats Scale: A 1.5B Reasoning Model Outranks 13B LLMs as Discriminator Md Fahim Anjum et.al. 2505.03786 link
2025-05-06 The Steganographic Potentials of Language Models Artem Karpov et.al. 2505.03439 null
2025-05-12 Geospatial Mechanistic Interpretability of Large Language Models Stef De Sabbata et.al. 2505.03368 link
2025-05-03 Accelerating Large Language Model Reasoning via Speculative Search Zhihai Wang et.al. 2505.02865 null
2025-05-05 HyperTree Planning: Enhancing LLM Reasoning via Hierarchical Thinking Runquan Gui et.al. 2505.02322 null
2025-05-04 DriveAgent: Multi-Agent Structured Reasoning with LLM and Multimodal Sensor Fusion for Autonomous Driving Xinmeng Hou et.al. 2505.02123 link
2025-05-04 R-Bench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation Meng-Hao Guo et.al. 2505.02018 null
2025-05-02 VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations for Synthetic Videos Zongxia Li et.al. 2505.01481 link
2025-05-01 Reasoning Capabilities and Invariability of Large Language Models Alessandro Raganato et.al. 2505.00776 link
2025-04-30 Audo-Sight: Enabling Ambient Interaction For Blind And Visually Impaired Individuals Bhanuja Ainary et.al. 2505.00153 null
2025-05-02 Rosetta-PL: Propositional Logic as a Benchmark for Large Language Model Reasoning Shaun Baek et.al. 2505.00001 null
2025-05-21 Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models Guanghao Zhou et.al. 2504.21277 null
2025-05-09 Token-Efficient RL for LLM Reasoning Alan Lee et.al. 2504.20834 null
2025-04-29 Token-Efficient Prompt Injection Attack: Provoking Cessation in LLM Reasoning via Adaptive Token Compression Yu Cui et.al. 2504.20493 null
2025-04-30 VideoMultiAgents: A Multi-Agent Framework for Video Question Answering Noriyuki Kugo et.al. 2504.20091 link
2025-04-28 From LLM Reasoning to Autonomous AI Agents: A Comprehensive Review Mohamed Amine Ferrag et.al. 2504.19678 null
2025-05-17 SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning Jiaqi Chen et.al. 2504.19162 null
2025-04-27 CipherBank: Exploring the Boundary of LLM Reasoning Capabilities through Cryptography Challenges Yu Li et.al. 2504.19093 null
2025-04-24 Training Large Language Models to Reason via EM Policy Gradient Tianbing Xu et.al. 2504.18587 null
2025-05-08 MultiMind: Enhancing Werewolf Agents with Multimodal Reasoning and Theory of Mind Zheng Zhang et.al. 2504.18039 null
2025-05-13 DeepDistill: Enhancing LLM Reasoning Capabilities via Large-Scale Difficulty-Graded Data Training Xiaoyu Tian et.al. 2504.17565 null
2025-04-25 Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning Chris et.al. 2504.16656 link
2025-04-27 Think2SQL: Reinforce LLM Reasoning Capabilities for Text2SQL Simone Papicchio et.al. 2504.15077 null
2025-04-20 a1: Steep Test-time Scaling Law via Environment Augmented Generation Lingrui Mei et.al. 2504.14597 null
2025-04-20 CoLoTa: A Dataset for Entity-based Commonsense Reasoning over Long-Tail Knowledge Armin Toroghi et.al. 2504.14462 null
2025-04-19 Improving RL Exploration for LLM Reasoning through Retrospective Replay Shihan Dou et.al. 2504.14363 null
2025-05-21 An Empirical Study of LLM Reasoning Ability Under Strict Output Length Constraint Yi Sun et.al. 2504.14350 null
2025-04-22 SRPO: A Cross-Domain Implementation of Large-Scale Reinforcement Learning on LLM Xiaojiang Zhang et.al. 2504.14286 null
2025-04-19 CODECRASH: Stress Testing LLM Reasoning under Structural and Semantic Perturbations Man Ho Lam et.al. 2504.14119 null
2025-04-18 Think Deep, Think Fast: Investigating Efficiency of Verifier-free Inference-time-scaling Methods Junlin Wang et.al. 2504.14047 null
2025-03-26 3MDBench: Medical Multimodal Multi-agent Dialogue Benchmark Ivan Sviridov et.al. 2504.13861 link
2025-05-16 Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? Yang Yue et.al. 2504.13837 null
2025-04-18 Prejudge-Before-Think: Enhancing Large Language Models at Test-Time by Process Prejudge Reasoning Jianing Wang et.al. 2504.13500 link
2025-04-17 Can LLMs reason over extended multilingual contexts? Towards long-context evaluation beyond retrieval and haystacks Amey Hengle et.al. 2504.12845 null
2025-05-19 GraphOmni: A Comprehensive and Extendable Benchmark Framework for Large Language Models on Graph-theoretic Tasks Hao Xu et.al. 2504.12764 link
2025-04-17 Embodied-R: Collaborative Framework for Activating Embodied Spatial Reasoning in Foundation Models via Reinforcement Learning Baining Zhao et.al. 2504.12680 link
2025-04-17 VLMGuard-R1: Proactive Safety Alignment for VLMs via Reasoning-Driven Prompt Optimization Menglan Chen et.al. 2504.12661 null
2025-04-24 GeoSense: Evaluating Identification and Application of Geometric Principles in Multimodal Reasoning Liangyu Xu et.al. 2504.12597 null
2025-04-13 HM-RAG: Hierarchical Multi-Agent Multimodal Retrieval Augmented Generation Pei Liu et.al. 2504.12330 link
2025-04-16 d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning Siyan Zhao et.al. 2504.12216 null
2025-04-16 Could Thinking Multilingually Empower LLM Reasoning? Changjiang Gao et.al. 2504.11833 link
2025-04-15 A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce Wei Xiong et.al. 2504.11343 link
2025-04-15 MMC: Iterative Refinement of VLM Reasoning via MCTS-based Multimodal Critique Shuhang Liu et.al. 2504.11009 null
2025-05-14 CLASH: Evaluating Language Models on Judging High-Stakes Dilemmas from Multiple Perspectives Ayoung Lee et.al. 2504.10823 null
2025-04-14 Weight-of-Thought Reasoning: Exploring Neural Network Weights for Enhanced LLM Reasoning Saif Punjwani et.al. 2504.10646 link
2025-04-30 VisualPuzzles: Decoupling Multimodal Reasoning Evaluation from Domain Knowledge Yueqi Song et.al. 2504.10342 null
2025-04-14 SlowFastVAD: Video Anomaly Detection via Integrating Simple Detector and RAG-Enhanced Vision-Language Model Zongcan Ding et.al. 2504.10320 null
2025-04-14 PRM-BAS: Enhancing Multimodal Reasoning through PRM-guided Beam Annealing Search Pengfei Hu et.al. 2504.10222 null
2025-04-15 Breaking the Data Barrier -- Building GUI Agents Through Task Generalization Junlei Zhang et.al. 2504.10127 link
2025-04-14 CodeRAG: Supportive Code Retrieval on Bigraph for Real-World Code Generation Jia Li et.al. 2504.10046 null
2025-04-13 Short-Path Prompting in LLMs: Analyzing Reasoning Instability and Solutions for Robust Performance Zuoli Tang et.al. 2504.09586 null
2025-04-13 Draw with Thought: Unleashing Multimodal Reasoning for Scientific Diagram Generation Zhiqing Cui et.al. 2504.09479 null
2025-04-12 NoTeS-Bank: Benchmarking Neural Transcription and Search for Scientific Notes Understanding Aniket Pal et.al. 2504.09249 null
2025-04-12 A Survey of Frontiers in LLM Reasoning: Inference Scaling, Learning to Reason, and Agentic Systems Zixuan Ke et.al. 2504.09037 null
2025-04-11 Mixed Signals: Decoding VLMs' Reasoning and Underlying Bias in Vision-Language Conflict Pouya Pezeshkpour et.al. 2504.08974 null
2025-05-08 VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning Haozhe Wang et.al. 2504.08837 null
2025-04-06 AdaptRec: A Self-Adaptive Framework for Sequential Recommendations with Large Language Models Tong Zhang et.al. 2504.08786 null
2025-04-01 Accelerating Causal Network Discovery of Alzheimer Disease Biomarkers via Scientific Literature-based Retrieval Augmented Generation Xiaofan Zhou et.al. 2504.08768 null
2025-04-11 Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning Fangzhi Xu et.al. 2504.08672 link
2025-04-11 VLMT: Vision-Language Multimodal Transformer for Multimodal Multi-hop Question Answering Qi Zhi Lim et.al. 2504.08269 null
2025-04-15 Kimi-VL Technical Report Kimi Team et.al. 2504.07491 link
2025-04-02 DeepSeek-R1 Thoughtology: Let's think about LLM Reasoning Sara Vera Marjanović et.al. 2504.07128 null
2025-04-09 KG-LLM-Bench: A Scalable Benchmark for Evaluating LLM Reasoning on Textualized Knowledge Graphs Elan Markowitz et.al. 2504.07087 null
2025-04-09 DeduCE: Deductive Consistency as a Framework to Evaluate LLM Reasoning Atharva Pandey et.al. 2504.07080 null
2025-04-09 To Backtrack or Not to Backtrack: When Sequential Search Limits Model Reasoning Tian Qin et.al. 2504.07052 null
2025-04-09 SCI-Reason: A Dataset with Chain-of-Thought Rationales for Complex Multimodal Reasoning in Academic Areas Chenghao Ma et.al. 2504.06637 null
2025-04-08 FEABench: Evaluating Language Models on Multiphysics Reasoning Ability Nayantara Mudur et.al. 2504.06260 link
2025-04-23 Right Question is Already Half the Answer: Fully Unsupervised LLM Reasoning Incentivization Qingyang Zhang et.al. 2504.05812 link
2025-04-08 MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models Pengfei Zhou et.al. 2504.05782 link
2025-04-08 Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought Yi Peng et.al. 2504.05599 null
2025-04-06 ZeroED: Hybrid Zero-shot Error Detection through Large Language Model Reasoning Wei Ni et.al. 2504.05345 null
2025-04-07 Debate Only When Necessary: Adaptive Multiagent Collaboration for Efficient LLM Reasoning Sugyeong Eo et.al. 2504.05047 null
2025-04-07 LEO-MINI: An Efficient Multimodal Large Language Model using Conditional Token Reduction and Mixture of Multi-Modal Experts Yimu Wang et.al. 2504.04653 null
2025-04-06 Hierarchical Planning for Complex Tasks with Knowledge Graph-RAG and Symbolic Verification Cristina Cornelio et.al. 2504.04578 null
2025-04-06 Planning Safety Trajectories with Dual-Phase, Physics-Informed, and Transportation Knowledge-Driven Large Language Models Rui Gan et.al. 2504.04562 link
2025-04-06 Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning Xuerui Su et.al. 2504.04524 link
2025-04-06 Geo-OLM: Enabling Sustainable Earth Observation Studies with Cost-Efficient Open Language Models & State-Driven Workflows Dimitrios Stamoulis et.al. 2504.04319 null
2025-04-04 Language Models Are Implicitly Continuous Samuele Marro et.al. 2504.03933 link
2025-04-04 Have Large Language Models Learned to Reason? A Characterization via 3-SAT Phase Transition Rishi Hazra et.al. 2504.03930 null
2025-04-07 MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models Wulin Xie et.al. 2504.03641 null
2025-04-04 Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning (v1) Jing Bi et.al. 2504.03151 null
2025-04-04 LightPROF: A Lightweight Reasoning Framework for Large Language Model on Knowledge Graph Tu Ao et.al. 2504.03137 null
2025-04-25 Generative Evaluation of Complex Reasoning in Large Language Models Haowei Lin et.al. 2504.02810 link
2025-04-10 Affordable AI Assistants with Knowledge Graph of Thoughts Maciej Besta et.al. 2504.02670 null
2025-04-03 LexPam: Legal Procedure Awareness-Guided Mathematical Reasoning Kepu Zhang et.al. 2504.02590 null
2025-04-03 AnesBench: Multi-Dimensional Evaluation of LLM Reasoning in Anesthesiology Xiang Feng et.al. 2504.02404 link
2025-04-02 A Survey of Scaling in Large Language Model Reasoning Zihan Chen et.al. 2504.02181 null
2025-04-02 Exploring LLM Reasoning Through Controlled Prompt Variations Giannis Chatziveroglou et.al. 2504.02111 link
2025-04-02 Advancing AI-Scientist Understanding: Making LLM Think Like a Physicist with Interpretable Reasoning Yinggan Xu et.al. 2504.01911 null
2025-04-02 TransientTables: Evaluating LLMs' Reasoning on Temporally Evolving Semi-structured Tables Abhilash Shankarampeta et.al. 2504.01879 null
2025-04-02 Cross-Lingual Consistency: A Novel Inference Framework for Advancing Reasoning in Large Language Models Zhiwei Yu et.al. 2504.01857 null
2025-04-03 GTR: Graph-Table-RAG for Cross-Table Question Answering Jiaru Zou et.al. 2504.01346 null
2025-04-01 When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning Nishad Singhi et.al. 2504.01005 null
2025-04-01 How Difficulty-Aware Staged Reinforcement Learning Enhances LLMs' Reasoning Capabilities: A Preliminary Experimental Study Yunjie Ji et.al. 2504.00829 null
2025-04-02 FortisAVQA and MAVEN: a Benchmark Dataset and Debiasing Framework for Robust Multimodal Reasoning Jie Ma et.al. 2504.00487 link
2025-04-01 Agentic Multimodal AI for Hyperpersonalized B2B and B2C Advertising in Competitive Markets: An AI-Driven Competitive Advertising Framework Sakhinana Sagar Srinivas et.al. 2504.00338 null
2025-03-31 Do Large Language Models Exhibit Spontaneous Rational Deception? Samuel M. Taylor et.al. 2504.00285 null
2025-03-31 SVLA: A Unified Speech-Vision-Language Assistant with Multimodal Reasoning and Speech Generation Ngoc Dung Huynh et.al. 2503.24164 null
2025-03-31 Boosting MLLM Reasoning with Text-Debiased Hint-GRPO Qihan Huang et.al. 2503.23905 null
2025-03-31 WinoWhat: A Parallel Corpus of Paraphrased WinoGrande Sentences with Common Sense Categorization Ine Gevers et.al. 2503.23779 null
2025-03-30 Evolutionary Prompt Optimization Discovers Emergent Multimodal Reasoning Strategies in Vision-Language Models Sid Bharthulwar et.al. 2503.23503 null
2025-03-29 The Reasoning-Memorization Interplay in Language Models Is Mediated by a Single Direction Yihuai Hong et.al. 2503.23084 null
2025-04-03 Cognitive Prompts Using Guilford's Structure of Intellect Model Oliver Kramer et.al. 2503.22036 null
2025-03-27 SWI: Speaking with Intent in Large Language Models Yuwei Yin et.al. 2503.21544 link
2025-03-27 Cultivating Game Sense for Yourself: Making VLMs Gaming Experts Wenxuan Lu et.al. 2503.21263 null
2025-03-27 Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning Huajie Tan et.al. 2503.20752 null
2025-03-26 Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging Han Wu et.al. 2503.20641 link
2025-03-25 Gemini Robotics: Bringing AI into the Physical World Gemini Robotics Team et.al. 2503.20020 null
2025-03-25 VisualQuest: A Diverse Image Dataset for Evaluating Visual Recognition in LLMs Kelaiti Xiao et.al. 2503.19936 null
2025-04-06 A Multi-Agent Framework Integrating Large Language Models and Generative AI for Accelerated Metamaterial Design Jie Tian et.al. 2503.19889 null
2025-03-25 Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking Xiaoyu Tian et.al. 2503.19855 null
2025-03-24 Training-Free Personalization via Retrieval and Reasoning on Fingerprints Deepayan Das et.al. 2503.18623 null
2025-03-23 Mind with Eyes: from Language Reasoning to Multimodal Reasoning Zhiyu Lin et.al. 2503.18071 null
2025-04-19 Reason2Attack: Jailbreaking Text-to-Image Models via LLM Reasoning Chenyu Zhang et.al. 2503.17987 null
2025-03-23 MedPlan: A Two-Stage RAG-Based System for Personalized Medical Plan Generation Hsin-Ling Hsu et.al. 2503.17900 null
2025-03-22 A Modular Dataset to Demonstrate LLM Abstraction Capability Adam Atanas et.al. 2503.17645 null
2025-03-22 ConSol: Sequential Probability Ratio Testing to Find Consistent LLM Reasoning Paths Efficiently Jaeyeon Lee et.al. 2503.17587 link
2025-03-21 LEMMA: Learning from Errors for MatheMatical Advancement in LLMs Zhuoshi Pan et.al. 2503.17439 link
2025-03-21 V-Seek: Accelerating LLM Reasoning on Open-hardware Server-class RISC-V Platforms Javier J. Poveda Rodrigo et.al. 2503.17422 null
2025-03-21 Dancing with Critiques: Enhancing LLM Reasoning with Stepwise Natural Language Self-Critique Yansi Li et.al. 2503.17363 null
2025-03-21 OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement Yihe Deng et.al. 2503.17352 link
2025-03-21 LLM+MAP: Bimanual Robot Task Planning using Large Language Models and Planning Domain Definition Language Kun Chu et.al. 2503.17309 link
2025-03-21 Does Chain-of-Thought Reasoning Help Mobile GUI Agent? An Empirical Study Li Zhang et.al. 2503.16788 link
2025-03-20 Towards Agentic Recommender Systems in the Era of Multimodal Large Language Models Chengkai Huang et.al. 2503.16734 null
2025-03-21 MKG-Rank: Enhancing Large Language Models with Knowledge Graph for Multilingual Medical Question Answering Feiyang Li et.al. 2503.16131 null
2025-03-20 Entropy-based Exploration Conduction for Multi-step Reasoning Jinghan Zhang et.al. 2503.15848 null
2025-03-19 LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning Federico Cocchi et.al. 2503.15621 link
2025-03-19 EfficientLLaVA: Generalizable Auto-Pruning for Large Vision-language Models Yinan Liang et.al. 2503.15369 null
2025-04-01 Envisioning an AI-Enhanced Mental Health Ecosystem Kellie Yu Hui Sim et.al. 2503.14883 null
2025-03-19 Think Like Human Developers: Harnessing Community Knowledge for Structured Code Reasoning Chengran Yang et.al. 2503.14838 null
2025-03-18 Temporal Consistency for LLM Reasoning Process Error Identification Jiacheng Guo et.al. 2503.14495 link
2025-03-21 Bridging Social Psychology and LLM Reasoning: Conflict-Aware Meta-Review Generation via Cognitive Alignment Wei Chen et.al. 2503.13879 null
2025-03-18 Empowering GraphRAG with Knowledge Filtering and Integration Kai Guo et.al. 2503.13804 null
2025-03-15 Cognitive Activation and Chaotic Dynamics in Large Language Models: A Quasi-Lyapunov Analysis of Reasoning Mechanisms Xiaojian Li et.al. 2503.13530 null
2025-03-14 RAG-KG-IL: A Multi-Agent Hybrid Framework for Reducing Hallucinations and Enhancing LLM Reasoning through RAG and Incremental Knowledge Graph Learning Integration Hong Qing Yu et.al. 2503.13514 null
2025-03-17 A Comprehensive Survey on Multi-Agent Cooperative Decision-Making: Scenarios, Approaches, Challenges and Perspectives Weiqiang Jin et.al. 2503.13415 null
2025-03-17 MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research James Burgess et.al. 2503.13399 link
2025-03-17 Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning Hai-Long Sun et.al. 2503.13360 null
2025-03-17 Aligning Vision to Language: Text-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning Junming Liu et.al. 2503.12972 null
2025-03-17 R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization Jingyi Zhang et.al. 2503.12937 link
2025-03-28 Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation Songjun Tu et.al. 2503.12854 link
2025-03-18 DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding Xinyu Ma et.al. 2503.12797 link
2025-03-16 MPBench: A Comprehensive Multimodal Reasoning Benchmark for Process Errors Identification Zhaopan Xu et.al. 2503.12505 null
2025-03-31 Will Pre-Training Ever End? A First Step Toward Next-Generation Foundation MLLMs via Self-Improving Systematic Cognition Xiaoying Zhang et.al. 2503.12303 link
2025-03-20 Applications of Large Language Model Reasoning in Feature Generation Dharani Chandra et.al. 2503.11989 null
2025-03-14 Neutralizing Bias in LLM Reasoning using Entailment Graphs Liang Cheng et.al. 2503.11614 link
2025-03-14 VERIFY: A Benchmark of Visual Explanation and Reasoning for Investigating Multimodal Reasoning Fidelity Jing Bi et.al. 2503.11557 null
2025-03-14 RESPONSE: Benchmarking the Ability of Language Models to Undertake Commonsense Reasoning in Crisis Situation Aissatou Diallo et.al. 2503.11348 null
2025-03-13 Chat-TS: Enhancing Multi-Modal Reasoning Over Time-Series and Natural Language Data Paul Quinlan et.al. 2503.10883 null
2025-03-18 R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization Yi Yang et.al. 2503.10615 link
2025-03-15 VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search Yiming Jia et.al. 2503.10582 null
2025-03-13 VisualPRM: An Effective Process Reward Model for Multimodal Reasoning Weiyun Wang et.al. 2503.10291 null
2025-03-18 "Well, Keep Thinking": Enhancing LLM Reasoning with Adaptive Injection Decoding Hyunbin Jin et.al. 2503.10167 null
2025-03-13 How Do Multimodal Large Language Models Handle Complex Multimodal Reasoning? Placing Them in An Extensible Escape Game Ziyue Wang et.al. 2503.10042 link
2025-04-08 Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning Bowen Jin et.al. 2503.09516 link
2025-03-12 MindGYM: Enhancing Vision-Language Models via Synthetic Self-Challenging Questions Zhe Xu et.al. 2503.09499 link
2025-03-12 A Survey on Enhancing Causal Reasoning Ability of Large Language Models Xin Li et.al. 2503.09326 null
2025-03-11 Seeing and Reasoning with Confidence: Supercharging Multimodal LLMs with an Uncertainty-Aware Agentic Framework Zhuo Zhi et.al. 2503.08308 null
2025-03-11 FASIONAD++: Integrating High-Level Instruction and Information Bottleneck in FAt-Slow fusION Systems for Enhanced Safety in Autonomous Driving with Adaptive Feedback Kangan Qian et.al. 2503.08162 null
2025-03-05 An Optimization Algorithm for Multimodal Data Alignment Wei Zhang et.al. 2503.07636 null
2025-03-11 LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL Yingzhe Peng et.al. 2503.07536 null
2025-03-10 MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning Fanqing Meng et.al. 2503.07365 link
2025-03-10 Dynamic Path Navigation for Motion Agents with LLM Reasoning Yubo Zhao et.al. 2503.07323 null
2025-03-11 Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models Wenxuan Huang et.al. 2503.06749 link
2025-03-09 Graph Retrieval-Augmented LLM for Conversational Recommendation Systems Zhangchi Qiu et.al. 2503.06430 null
2025-03-08 Can Atomic Step Decomposition Enhance the Self-structured Reasoning of Multimodal Large Models? Kun Xiang et.al. 2503.06252 link
2025-03-15 Integrating Chain-of-Thought for Multimodal Alignment: A Study on 3D Vision-Language Learning Yanjun Chen et.al. 2503.06232 null
2025-03-08 KnowLogic: A Benchmark for Commonsense Reasoning via Knowledge-Driven Data Synthesis Weidong Zhan et.al. 2503.06218 link
2025-03-07 Extracting and Emulsifying Cultural Explanation to Improve Multilingual Capability of LLMs Hamin Koo et.al. 2503.05846 null
2025-03-07 Memory-augmented Query Reconstruction for LLM-based Knowledge Graph Reasoning Mufan Xu et.al. 2503.05193 null
2025-03-07 Rewarding Curse: Analyze and Mitigate Reward Modeling Issues for LLM Reasoning Jiachun Li et.al. 2503.05188 null
2025-03-07 Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching Simon A. Aytes et.al. 2503.05179 link
2025-03-10 R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model Hengguang Zhou et.al. 2503.05132 link
2025-03-04 Learning from Failures in Multi-Attempt Reinforcement Learning Stephen Chung et.al. 2503.04808 null
2025-03-15 Can LLMs Reason About Program Semantics? A Comprehensive Evaluation of LLMs on Formal Specification Inference Thanh Le-Cong et.al. 2503.04779 null
2025-03-06 Better Process Supervision with Bi-directional Rewarding Signals Wenxiang Chen et.al. 2503.04618 null
2025-04-02 SOLAR: Scalable Optimization of Large-scale Architecture for Reasoning Chen Li et.al. 2503.04530 null
2025-03-07 Question-Aware Gaussian Experts for Audio-Visual Question Answering Hongyeob Kim et.al. 2503.04459 link
2025-03-06 Disparities in LLM Reasoning Accuracy and Explanations: A Case Study on African American English Runtao Zhou et.al. 2503.04099 null
2025-03-06 ReasonGraph: Visualisation of Reasoning Paths Zongqian Li et.al. 2503.03979 link
2025-03-05 Process-based Self-Rewarding Language Models Shimao Zhang et.al. 2503.03746 link
2025-03-05 COSINT-Agent: A Knowledge-Driven Multimodal Agent for Chinese Open Source Intelligence Wentao Li et.al. 2503.03215 null
2025-03-04 The First Few Tokens Are All You Need: An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reasoning Models Ke Ji et.al. 2503.02875 null
2025-03-04 Audio-Reasoner: Improving Reasoning Capability in Large Audio Language Models Zhifei Xie et.al. 2503.02318 null
2025-03-04 LLM-TabFlow: Synthetic Tabular Data Generation with Inter-column Logical Relationship Preservation Yunbo Long et.al. 2503.02161 null
2025-03-03 CorrA: Leveraging Large Language Models for Dynamic Obstacle Avoidance of Autonomous Vehicles Shanting Wang et.al. 2503.02076 null
2025-03-03 Graph-Augmented Reasoning: Evolving Step-by-Step Knowledge Graph Retrieval for LLM Reasoning Wenjie Wu et.al. 2503.01642 null
2025-03-03 Pragmatic Inference Chain (PIC) Improving LLMs' Reasoning of Authentic Implicit Toxic Language Xi Chen et.al. 2503.01539 null
2025-03-03 CognitiveDrone: A VLA Model and Evaluation Benchmark for Real-Time Cognitive Task Solving and Reasoning in UAVs Artem Lykov et.al. 2503.01378 null
2025-03-06 SRAG: Structured Retrieval-Augmented Generation for Multi-Entity Question Answering over Wikipedia Graph Teng Lin et.al. 2503.01346 null
2025-03-03 MINT: Multi-modal Chain of Thought in Unified Generative Models for Enhanced Image Generation Yi Wang et.al. 2503.01298 null
2025-02-28 Personalized Causal Graph Reasoning for LLMs: A Case Study on Dietary Recommendations Zhongqi Yang et.al. 2503.00134 null
2025-02-28 Contextualizing biological perturbation experiments through language Menghua Wu et.al. 2502.21290 link
2025-02-28 Rectifying Belief Space via Unlearning to Harness LLMs' Reasoning Ayana Niwa et.al. 2502.20620 null
2025-02-27 FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving Guizhen Chen et.al. 2502.20238 link
2025-02-27 Collaborative Stance Detection via Small-Large Language Model Consistency Verification Yu Yan et.al. 2502.19954 link
2025-02-27 Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models Yuan Sui et.al. 2502.19918 null
2025-02-27 Order Doesn't Matter, But Reasoning Does: Training LLMs with Order-Centric Augmentation Qianxi He et.al. 2502.19907 null
2025-03-21 Towards Multimodal Large-Language Models for Parent-Child Interaction: A Focus on Joint Attention Weiyan Shi et.al. 2502.19877 null
2025-03-05 Weaker LLMs' Opinions Also Matter: Mixture of Opinions Enhances LLM's Mathematical Reasoning Yanan Chen et.al. 2502.19622 null
2025-02-26 General Reasoning Requires Learning to Reason from the Get-go Seungwook Han et.al. 2502.19402 null
2025-02-26 BIG-Bench Extra Hard Mehran Kazemi et.al. 2502.19187 link
2025-02-25 Scalable Best-of-N Selection for Large Language Models via Self-Certainty Zhewei Kang et.al. 2502.18581 link
2025-02-25 SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution Yuxiang Wei et.al. 2502.18449 null
2025-02-25 Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning Wenkai Yang et.al. 2502.18080 null
2025-02-21 Improving Value-based Process Verifier via Structural Prior Injection Zetian Sun et.al. 2502.17498 null
2025-02-24 Making LLMs Reason? The Intermediate Language Problem in Neurosymbolic Approaches Alexander Beiser et.al. 2502.17216 null
2025-02-24 Shakti-VLMs: Scalable Vision-Language Models for Enterprise AI Syed Abdul Gaffar Shakhadri et.al. 2502.17092 null
2025-02-24 Understanding the Uncertainty of LLM Explanations: A Perspective Based on Reasoning Topology Longchao Da et.al. 2502.17026 null
2025-02-24 All-in-one: Understanding and Generation in Multimodal Reasoning with the MAIA Benchmark Davide Testa et.al. 2502.16989 null
2025-02-24 AutoLogi: Automated Generation of Logic Puzzles for Evaluating Reasoning Abilities of Large Language Models Qin Zhu et.al. 2502.16906 link
2025-02-24 The Blessing of Reasoning: LLM-Based Contrastive Explanations in Black-Box Recommender Systems Yuyan Wang et.al. 2502.16759 null
2025-02-23 Reasoning about Affordances: Causal and Compositional Reasoning in LLMs Magnus F. Gjerde et.al. 2502.16606 null
2025-02-22 ThinkBench: Dynamic Out-of-Distribution Evaluation for Robust LLM Reasoning Shulin Huang et.al. 2502.16268 null
2025-02-27 Dynamic Parallel Tree Search for Efficient LLM Reasoning Yifu Ding et.al. 2502.16235 null
2025-02-22 Patterns Over Principles: The Fragility of Inductive Reasoning in LLMs under Noisy Observations Chunyang Li et.al. 2502.16169 link
2025-03-04 Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models Qianqi Yan et.al. 2502.16033 null
2025-02-21 MutaGReP: Execution-Free Repository-Grounded Plan Search for Code-Use Zaid Khan et.al. 2502.15872 null
2025-02-21 Do Multilingual LLMs Think In English? Lisa Schut et.al. 2502.15603 null
2025-02-21 Evaluating Social Biases in LLM Reasoning Xuyang Wu et.al. 2502.15361 null
2025-02-21 Stepwise Informativeness Search for Improving LLM Reasoning Siyuan Wang et.al. 2502.15335 null
2025-02-21 Latent Factor Models Meets Instructions: Goal-conditioned Latent Factor Discovery without Task Supervision Zhouhang Xie et.al. 2502.15147 null
2025-02-19 SIFT: Grounding LLM Reasoning in Contexts via Stickers Zihao Zeng et.al. 2502.14922 link
2025-02-18 Think Inside the JSON: Reinforcement Strategy for Strict LLM Schema Adherence Bhavik Agarwal et.al. 2502.14905 null
2025-03-04 Exploring Advanced Techniques for Visual Question Answering: A Comprehensive Comparison Aiswarya Baby et.al. 2502.14827 null
2025-02-20 Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning Tian Xie et.al. 2502.14768 link
2025-02-19 Enhancing LLM-Based Recommendations Through Personalized Reasoning Jiahao Liu et.al. 2502.13845 link
2025-02-19 MCTS-KBQA: Monte Carlo Tree Search for Knowledge Base Question Answering Guanming Xiong et.al. 2502.13428 null
2025-02-19 MM-Verify: Enhancing Multimodal Reasoning with Chain-of-Thought Verification Linzhuang Sun et.al. 2502.13383 link
2025-02-22 Grounding LLM Reasoning with Knowledge Graphs Alfonso Amayuelas et.al. 2502.13247 null
2025-02-18 Theorem Prover as a Judge for Synthetic Data Generation Joshua Ong Jun Leang et.al. 2502.13137 null
2025-02-18 Flow-of-Options: Diversified and Improved LLM Reasoning by Thinking Through Options Lakshmi Nair et.al. 2502.12929 link
2025-02-18 S$^2$R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning Ruotian Ma et.al. 2502.12853 link
2025-02-18 CutPaste&Find: Efficient Multimodal Hallucination Detector with Visual-aid Knowledge Base Cong-Duy Nguyen et.al. 2502.12591 null
2025-02-18 Inference-Time Computations for LLM Reasoning and Planning: A Benchmark and Insights Shubham Parashar et.al. 2502.12521 null
2025-02-18 HopRAG: Multi-Hop Reasoning for Logic-Aware Retrieval-Augmented Generation Hao Liu et.al. 2502.12442 null
2025-02-17 Evaluating Step-by-step Reasoning Traces: A Survey Jinu Lee et.al. 2502.12289 null
2025-02-17 SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs Yige Xu et.al. 2502.12134 link
2025-02-17 TokenSkip: Controllable Chain-of-Thought Compression in LLMs Heming Xia et.al. 2502.12067 link
2025-02-17 Hypothesis-Driven Theory-of-Mind Reasoning for Large Language Models Hyunwoo Kim et.al. 2502.11881 null
2025-02-17 Code-Vision: Evaluating Multimodal LLMs Logic Understanding and Code Generation Capabilities Hanbin Wang et.al. 2502.11829 link
2025-02-17 Language Models Can See Better: Visual Contrastive Decoding For LLM Multimodal Reasoning Yuqi Pang et.al. 2502.11751 link
2025-02-17 DeFiScope: Detecting Various DeFi Price Manipulations with LLM Reasoning Juantao Zhong et.al. 2502.11521 null
2025-02-16 Don't Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls Ante Wang et.al. 2502.11183 link
2025-02-16 LogiDynamics: Unraveling the Dynamics of Logical Inference in Large Language Model Reasoning Tianshi Zheng et.al. 2502.11176 null
2025-02-15 A Tutorial on LLM Reasoning: Relevant Methods behind ChatGPT o1 Jun Wang et.al. 2502.10867 null
2025-02-28 USER-VLM 360: Personalized Vision Language Models with User-aware Tuning for Social Human-Robot Interactions Hamed Rahimi et.al. 2502.10636 null
2025-02-14 Do Large Language Models Reason Causally Like Us? Even Better? Hanna M. Dettki et.al. 2502.10215 null
2025-02-14 MathConstruct: Challenging LLM Reasoning with Constructive Proofs Mislav Balunović et.al. 2502.10197 null
2025-02-13 MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency Dongzhi Jiang et.al. 2502.09621 null
2025-02-14 EnigmaEval: A Benchmark of Long Multimodal Reasoning Challenges Clinton J. Wang et.al. 2502.08859 null
2025-02-11 CIRCUIT: A Benchmark for Circuit Interpretation and Reasoning Capabilities of LLMs Lejla Skelic et.al. 2502.07980 null
2025-02-05 Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment Cheryl Li et.al. 2502.07803 null
2025-02-17 Bag of Tricks for Inference-time Computation of LLM Reasoning Fan Liu et.al. 2502.07191 link
2025-02-15 Self-Supervised Prompt Optimization Jinyu Xiang et.al. 2502.06855 link
2025-02-06 Vision-Integrated LLMs for Autonomous Driving Assistance: Human Performance Comparison and Trust Evaluation Namhee Kim et.al. 2502.06843 null
2025-02-04 Policy Guided Tree Search for Enhanced LLM Reasoning Yang Li et.al. 2502.06813 null
2025-03-11 ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates Ling Yang et.al. 2502.06772 link
2025-02-10 Resurrecting saturated LLM benchmarks with adversarial encoding Igor Ivanov et.al. 2502.06738 null
2025-02-13 LawGPT: Knowledge-Guided Data Generation and Its Application to Legal LLM Zhi Zhou et.al. 2502.06572 link
2025-02-09 A Generative Framework for Bidirectional Image-Report Understanding in Chest Radiography Nicholas Evans et.al. 2502.05926 null
2025-02-08 Evaluating Vision-Language Models for Emotion Recognition Sree Bhattacharyya et.al. 2502.05660 null
2025-02-07 GSM-Infinite: How Do Your LLMs Behave over Infinitely Increasing Context Length and Reasoning Complexity? Yang Zhou et.al. 2502.05252 link
2025-02-07 Adaptive Graph of Thoughts: Test-Time Adaptive Reasoning Unifying Chain, Tree, and Graph Structures Tushar Pandey et.al. 2502.05078 link
2025-02-07 Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research Junde Wu et.al. 2502.04644 link
2025-02-05 Enhancing Reasoning to Adapt Large Language Models for Domain-Specific Applications Bo Wen et.al. 2502.04384 link
2025-02-05 Limitations of Large Language Models in Clinical Problem-Solving Arising from Inflexible Reasoning Jonathan Kim et.al. 2502.04381 null
2025-02-04 Investigating the Robustness of Deductive Reasoning with Large Language Models Fabian Hoppe et.al. 2502.04352 null
2025-02-04 Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search Maohao Shen et.al. 2502.02508 null
2025-02-04 CoAT: Chain-of-Associated-Thoughts Framework for Enhancing Large Language Models Reasoning Jianfeng Pan et.al. 2502.02390 null
2025-02-08 Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking Jinyang Wu et.al. 2502.02339 null
2025-02-04 Mitigating Object Hallucinations in Large Vision-Language Models via Attention Calibration Younan Zhu et.al. 2502.01969 null
2025-01-31 Improving Rule-based Reasoning in LLMs via Neurosymbolic Representations Varun Dhanraj et.al. 2502.01657 null
2025-02-03 Position: Empowering Time Series Reasoning with Multimodal LLMs Yaxuan Kong et.al. 2502.01477 null
2025-02-03 ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning Bill Yuchen Lin et.al. 2502.01100 null
2025-02-16 Learning Autonomous Code Integration for Math Language Models Haozhe Wang et.al. 2502.00691 null
2025-02-13 Bridging Internal Probability and Self-Consistency for Effective and Efficient LLM Reasoning Zhi Zhou et.al. 2502.00511 null
2025-02-14 Reward-Guided Speculative Decoding for Efficient LLM Reasoning Baohao Liao et.al. 2501.19324 null
2025-01-31 Efficient Reasoning with Hidden Thinking Xuan Shen et.al. 2501.19201 link
2025-01-31 BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning Han Zhong et.al. 2501.18858 null
2025-01-28 A Stochastic Dynamical Theory of LLM Self-Adversariality: Modeling Severity Drift as a Critical Process Jack David Carson et.al. 2501.16783 null
2025-01-27 Explaining GitHub Actions Failures with Large Language Models: Challenges, Insights, and Limitations Pablo Valenzuela-Toledo et.al. 2501.16495 null
2025-01-27 Large Models in Dialogue for Active Perception and Anomaly Detection Tzoulio Chamiti et.al. 2501.16300 link
2025-01-26 TensorLLM: Tensorising Multi-Head Attention for Enhanced Reasoning and Compression in LLMs Yuxuan Gu et.al. 2501.15674 link
2025-01-28 Rethinking External Slow-Thinking: From Snowball Errors to Probability of Correct Reasoning Zeyu Gan et.al. 2501.15602 link
2025-01-26 Error Classification of Large Language Models on Math Word Problems: A Dynamically Adaptive Framework Yuhong Sun et.al. 2501.15581 null
2025-02-15 Option-ID Based Elimination For Multiple Choice Questions Zhenhao Zhu et.al. 2501.15175 link
2025-01-24 Domaino1s: Guiding LLM Reasoning for Explainable Answers in High-Stakes Domains Xu Chu et.al. 2501.14431 null
2025-02-12 GraphSOS: Graph Sampling and Order Selection to Help LLMs Understand Graphs Better Xu Chu et.al. 2501.14427 null
2025-01-23 Pseudocode-Injection Magic: Enabling LLMs to Tackle Graph Computational Tasks Chang Gong et.al. 2501.13731 null
2025-02-10 Cognitive Paradigms for Evaluating VLMs on Visual Reasoning Task Mohit Vaishnav et.al. 2501.13620 null
2025-01-22 EvidenceMap: Unleashing the Power of Small Language Models with Evidence Analysis for Biomedical Question Answering Chang Zong et.al. 2501.12746 null
2025-01-17 LLM Reasoner and Automated Planner: A new NPC approach Israel Puerta-Merino et.al. 2501.10106 null
2025-01-22 FRAG: A Flexible Modular Framework for Retrieval-Augmented Generation based on Knowledge Graphs Zengyi Gao et.al. 2501.09957 null
2025-01-17 Evolving Deeper LLM Thinking Kuang-Huei Lee et.al. 2501.09891 null
2025-01-23 Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models Fengli Xu et.al. 2501.09686 null
2025-01-15 Multimodal LLMs Can Reason about Aesthetics in Zero-Shot Ruixiang Jiang et.al. 2501.09012 link
2025-02-10 Ensemble of Large Language Models for Curated Labeling and Rating of Free-text Data Jiaxing Qiu et.al. 2501.08413 link
2025-01-14 Reasoning with Graphs: Structuring Implicit Knowledge to Enhance LLMs Reasoning Haoyu Han et.al. 2501.07845 null
2025-01-09 Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark Yunzhuo Hao et.al. 2501.05444 link
2025-01-08 Enhancing Financial VQA in Vision Language Models using Intermediate Structured Representations Archita Srivastava et.al. 2501.04675 null
2025-01-08 DRIVINGVQA: Analyzing Visual Chain-of-Thought Reasoning of Vision Language Models in Real-World Scenarios with Driving Theory Tests Charles Corbière et.al. 2501.04671 null
2025-01-08 Understanding Before Reasoning: Enhancing Chain-of-Thought with Iterative Summarization Pre-Prompting Dong-Hai Zhu et.al. 2501.04341 link
2025-01-07 Reasoning-Enhanced Self-Training for Long-Form Personalized Text Generation Alireza Salemi et.al. 2501.04167 null
2025-01-07 Socratic Questioning: Learn to Self-guide Multimodal Reasoning in the Wild Wanpeng Hu et.al. 2501.02964 link
2025-01-06 KG-CF: Knowledge Graph Completion with Context Filtering under the Guidance of Large Language Models Zaiyi Zheng et.al. 2501.02711 null
2025-01-04 Table as Thought: Exploring Structured Thoughts in LLM Reasoning Zhenjie Sun et.al. 2501.02152 null
2025-01-03 Recursive Decomposition of Logical Thoughts: Framework for Superior Reasoning and Knowledge Propagation in Large Language Models Kaleem Ullah Qasim et.al. 2501.02026 null
2025-01-02 Enhancing Reasoning through Process Supervision with Monte Carlo Tree Search Shuangtao Li et.al. 2501.01478 null
2025-01-02 HetGCoT-Rec: Heterogeneous Graph-Enhanced Chain-of-Thought LLM Reasoning for Journal Recommendation Runsong Jia et.al. 2501.01203 null
2025-01-03 Enhancing LLM Reasoning with Multi-Path Collaborative Reactive and Reflection agents Chengbo He et.al. 2501.00430 null
2024-12-31 EQUATOR: A Deterministic Framework for Evaluating LLM Reasoning with Open-Ended Questions. # v1.0.0-beta Raymond Bernard et.al. 2501.00257 null
2024-12-30 Efficiently Serving LLM Reasoning Programs with Certaindex Yichao Fu et.al. 2412.20993 null
2024-12-28 LLM Reasoning Engine: Specialized Training for Enhanced Mathematical Reasoning Shuguang Chen et.al. 2412.20227 null
2025-02-17 Token-Budget-Aware LLM Reasoning Tingxu Han et.al. 2412.18547 link
2024-12-23 StructTest: Benchmarking LLMs' Reasoning through Compositional Structured Outputs Hailin Chen et.al. 2412.18011 null
2025-02-09 Evaluating LLM Reasoning in the Operations Research Domain with ORQA Mahdi Mostajabdaveh et.al. 2412.17874 link
2024-12-23 Diving into Self-Evolving Training for Multimodal Reasoning Wei Liu et.al. 2412.17451 null
2024-12-21 SilVar: Speech Driven Multimodal Model for Reasoning Visual Question Answering and Object Localization Tan-Hanh Pham et.al. 2412.16771 null
2024-12-20 PruneVid: Visual Token Pruning for Efficient Video Large Language Models Xiaohu Huang et.al. 2412.16117 link
2024-12-19 Eliciting Causal Abilities in Large Language Models for Reasoning Tasks Yajing Wang et.al. 2412.15314 link
2024-12-19 Critical-Questions-of-Thought: Steering LLM reasoning with Argumentative Querying Federico Castagna et.al. 2412.15177 link
2024-12-19 Progressive Multimodal Reasoning via Active Retrieval Guanting Dong et.al. 2412.14835 null
2024-12-19 FiVL: A Framework for Improved Vision-Language Alignment Estelle Aflalo et.al. 2412.14672 null
2024-12-19 FaultExplainer: Leveraging Large Language Models for Interpretable Fault Detection and Diagnosis Abdullah Khan et.al. 2412.14492 link
2024-12-18 Cognition Chain for Explainable Psychological Stress Detection on Social Media Xin Wang et.al. 2412.14009 link
2024-12-27 Cracking the Code of Hallucination in LVLMs with Vision-aware Head Divergence Jinghan He et.al. 2412.13949 null
2025-02-16 Do Language Models Understand Time? Xi Ding et.al. 2412.13845 link
2024-12-18 Beyond Outcomes: Transparent Assessment of LLM Reasoning in Games Wenye Lin et.al. 2412.13602 link
2024-12-17 ClarityEthic: Explainable Moral Judgment Utilizing Contrastive Ethical Insights from Large Language Models Yuxi Sun et.al. 2412.12848 null
2024-12-12 A NotSo Simple Way to Beat Simple Bench Soham Sane et.al. 2412.12173 null
2024-12-11 What Makes In-context Learning Effective for Mathematical Reasoning: A Theoretical Analysis Jiayu Liu et.al. 2412.12157 null
2025-02-18 A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges Yibo Yan et.al. 2412.11936 null
2024-12-24 Stepwise Reasoning Error Disruption Attack of LLMs Jingyu Peng et.al. 2412.11934 null
2024-12-16 Leveraging Retrieval-Augmented Tags for Large Vision-Language Understanding in Complex Scenes Antonio Carlos Rivera et.al. 2412.11396 null
2024-12-15 SceneLLM: Implicit Language Reasoning in LLM for Dynamic Scene Graph Generation Hang Zhang et.al. 2412.11026 null
2024-12-15 Entropy-Regularized Process Reward Model Hanning Zhang et.al. 2412.11006 link
2024-12-14 Optimizing Vision-Language Interactions Through Decoder-Only Models Kaito Tanaka et.al. 2412.10758 null
2024-12-14 Chasing Progress, Not Perfection: Revisiting Strategies for End-to-End LLM Plan Generation Sukai Huang et.al. 2412.10675 null
2024-12-14 Thinking with Knowledge Graphs: Enhancing LLM Reasoning Through Structured Data Xue Wu et.al. 2412.10654 null
2024-12-13 EVLM: Self-Reflective Multimodal Reasoning for Cross-Dimensional Visual Editing Umar Khalid et.al. 2412.10566 null
2024-12-13 Atomic Learning Objectives Labeling: A High-Resolution Approach for Physics Education Naiming Liu et.al. 2412.09914 null
2025-01-18 Neptune: The Long Orbit to Benchmarking Long Video Understanding Arsha Nagrani et.al. 2412.09582 link
2025-02-14 Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning Zhenni Bi et.al. 2412.09078 link
2024-12-11 Training Large Language Models to Reason in a Continuous Latent Space Shibo Hao et.al. 2412.06769 link
2025-01-23 GameArena: Evaluating LLM Reasoning through Live Computer Games Lanxiang Hu et.al. 2412.06394 null
2024-12-08 Language hooks: a modular framework for augmenting LLM reasoning that decouples tool usage from the model and its prompt Damien de Mijolla et.al. 2412.05967 null
2024-12-06 MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale Jarvis Guo et.al. 2412.05237 null
2024-12-05 Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction Yiheng Xu et.al. 2412.04454 null
2024-12-05 SocialMind: LLM-based Proactive AR Social Assistive System with Human-like Perception for In-situ Live Interactions Bufang Yang et.al. 2412.04036 null
2024-12-04 DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation Qingdong He et.al. 2412.03255 null
2024-12-03 Explainable CTR Prediction via LLM Reasoning Xiaohan Yu et.al. 2412.02588 null
2025-02-12 NYT-Connections: A Deceptively Simple Text Classification Task that Stumps System-1 Thinkers Angel Yahir Loredo Lopez et.al. 2412.01621 null
2025-01-13 Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability Zicheng Lin et.al. 2411.19943 link
2024-11-29 TQA-Bench: Evaluating LLMs for Multi-Table Question Answering with Scalable Context and Symbolic Extension Zipeng Qiu et.al. 2411.19504 link
2024-11-29 COLD: Causal reasOning in cLosed Daily activities Abhinav Joshi et.al. 2411.19500 link
2024-12-16 Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning Di Zhang et.al. 2411.18203 null
2024-11-26 NEMO: Can Multimodal LLMs Identify Attribute-Modified Objects? Jiaxuan Li et.al. 2411.17794 null
2024-11-25 Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision Zhiheng Xi et.al. 2411.16579 null
2024-11-22 On the Impact of Fine-Tuning on Chain-of-Thought Reasoning Elita Lobo et.al. 2411.15382 null
2024-11-21 Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models Yuhao Dong et.al. 2411.14432 link
2024-11-20 BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games Davide Paglieri et.al. 2411.13543 null
2024-11-20 Hints of Prompt: Enhancing Visual Representation for Multimodal LLMs in Autonomous Driving Hao Zhou et.al. 2411.13076 null
2024-11-15 Thinking Before Looking: Improving Multimodal LLM Reasoning via Mitigating Visual Hallucination Haojie Zheng et.al. 2411.12591 link
2024-12-23 Enhancing Reasoning Capabilities of LLMs via Principled Synthetic Logic Corpus Terufumi Morishita et.al. 2411.12498 link
2024-11-18 Semantic-Geometric-Physical-Driven Robot Manipulation Skill Transfer via Skill Library and Tactile Representation Mingchao Qi et.al. 2411.11714 link
2024-12-31 Enhancing LLM Reasoning with Reward-guided Tree Search Jinhao Jiang et.al. 2411.11694 null
2024-12-15 A dataset of questions on decision-theoretic reasoning in Newcomb-like problems Caspar Oesterheld et.al. 2411.10588 link
2024-11-15 Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization Weiyun Wang et.al. 2411.10442 null
2025-01-09 LLaVA-CoT: Let Vision Language Models Reason Step-by-Step Guowei Xu et.al. 2411.10440 link
2024-11-15 Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level Andong Deng et.al. 2411.09921 null
2024-11-14 Comprehensive and Practical Evaluation of Retrieval-Augmented Generation Systems for Medical Question Answering Nghia Trung Ngo et.al. 2411.09213 null
2024-11-13 Tree-of-Table: Unleashing the Power of LLMs for Enhanced Large-Scale Table Understanding Deyi Ji et.al. 2411.08516 null
2024-11-18 What Do Learning Dynamics Reveal About Generalization in LLM Reasoning? Katie Kang et.al. 2411.07681 link
2024-11-27 Self-Training Meets Consistency: Improving LLMs' Reasoning With Consistency-Driven Rationale Evaluation Jaehyeok Lee et.al. 2411.06387 link
2024-11-09 A Picture is Worth A Thousand Numbers: Enabling LLMs Reason about Time Series via Visualization Haoxin Liu et.al. 2411.06018 null
2024-11-11 LLMs as Method Actors: A Model for Prompt Engineering and Architecture Colin Doyle et.al. 2411.05778 link
2024-11-12 Kwai-STaR: Transform LLMs into State-Transition Reasoners Xingyu Lu et.al. 2411.04799 null
2024-11-21 Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding Haolin Chen et.al. 2411.04282 link
2024-11-05 CrowdGenUI: Enhancing LLM-Based UI Widget Generation with a Crowdsourced Preference Library Yimeng Liu et.al. 2411.03477 null
2025-01-27 MetRex: A Benchmark for Verilog Code Metric Reasoning Using LLMs Manar Abdelatty et.al. 2411.03471 link
2024-11-04 RuAG: Learned-rule-augmented Generation for Large Language Models Yudi Zhang et.al. 2411.03349 null
2024-10-30 Vision-Language Models Can Self-Improve Reasoning via Reflection Kanzhi Cheng et.al. 2411.00855 null
2024-11-01 Mitigating Tail Narrowing in LLM Self-Improvement via Socratic-Guided Sampling Yiwen Ding et.al. 2411.00750 link
2024-11-01 STEM-POM: Evaluating Language Models Math-Symbol Reasoning in Document Parsing Jiaru Zou et.al. 2411.00387 null
2024-11-08 GRS-QA -- Graph Reasoning-Structured Question Answering Dataset Anish Pahilajani et.al. 2411.00369 null
2024-10-31 Thought Space Explorer: Navigating and Expanding Thought Space for Large Language Model Reasoning Jinghan Zhang et.al. 2410.24155 null
2024-10-31 RL-STaR: Theoretical Analysis of Reinforcement Learning Frameworks for Self-Taught Reasoner Fu-Chieh Chang et.al. 2410.23912 null
2024-10-31 OCEAN: Offline Chain-of-thought Evaluation and Alignment in Large Language Models Junda Wu et.al. 2410.23703 null
2024-10-30 ReasoningRec: Bridging Personalized Recommendations and Human-Interpretable Explanations through LLM Reasoning Millennium Bismay et.al. 2410.23180 link
2024-10-30 On Memorization of Large Language Models in Logical Reasoning Chulin Xie et.al. 2410.23123 null
2024-10-28 Causal Interventions on Causal Paths: Mapping GPT-2's Reasoning From Syntax to Semantics Isabelle Lee et.al. 2410.21353 null
2024-10-28 Guide-LLM: An Embodied LLM Agent and Text-Based Topological Map for Robotic Guidance of People with Visual Impairments Sangmim Song et.al. 2410.20666 null
2024-10-25 Cooperative Strategic Planning Enhances Reasoning Capabilities in Large Language Models Danqing Wang et.al. 2410.20007 null
2024-10-25 Can Stories Help LLMs Reason? Curating Information Space Through Narrative Vahid Sadiri Javadi et.al. 2410.19221 null
2024-10-18 Make LLMs better zero-shot reasoners: Structure-orientated autonomous reasoning Pengfei He et.al. 2410.19000 link
2024-10-25 CLR-Bench: Evaluating Large Language Models in College-level Reasoning Junnan Dong et.al. 2410.17558 null
2024-10-28 Non-myopic Generation of Language Models for Reasoning and Planning Chang Ma et.al. 2410.17195 link
2024-11-06 Improving Causal Reasoning in Large Language Models: A Survey Longxuan Yu et.al. 2410.16676 link
2024-10-22 A Statistical Analysis of LLMs' Self-Evaluation Using Proverbs Ryosuke Sonoda et.al. 2410.16640 null
2024-10-21 Rulebreakers Challenge: Revealing a Blind Spot in Large Language Models' Reasoning with Formal Logic Jason Chan et.al. 2410.16502 null
2024-11-27 On Designing Effective RL Reward at Training Time for LLM Reasoning Jiaxuan Gao et.al. 2410.15115 null
2025-01-28 Paths-over-Graph: Knowledge Graph Empowered Large Language Model Reasoning Xingyu Tan et.al. 2410.14211 null
2024-10-21 Unconstrained Model Merging for Enhanced LLM Reasoning Yiming Zhang et.al. 2410.13699 null
2024-10-16 Graph-constrained Reasoning: Faithful Reasoning on Knowledge Graphs with Large Language Models Linhao Luo et.al. 2410.13080 link
2024-10-16 KcMF: A Knowledge-compliant Framework for Schema and Entity Matching with Fine-tuning-free LLMs Yongqin Xu et.al. 2410.12480 null
2024-10-17 Enhancing LLM Trading Performance with Fact-Subjectivity Aware Reasoning Qian Wang et.al. 2410.12464 link
2024-10-16 Reversal of Thought: Enhancing Large Language Models with Preference-Guided Reverse Reasoning Warm-up Jiahao Yuan et.al. 2410.12323 link
2024-10-16 Exploiting LLMs' Reasoning Capability to Infer Implicit Concepts in Legal Information Retrieval Hai-Long Nguyen et.al. 2410.12154 null
2024-10-15 Planning Anything with Rigor: General-Purpose Zero-Shot Planning with LLM-based Formalized Programming Yilun Hao et.al. 2410.12112 null
2024-10-12 OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models Jun Wang et.al. 2410.09671 null
2024-10-11 P-FOLIO: Evaluating and Improving Logical Reasoning with Abundant Human-Written Reasoning Chains Simeng Han et.al. 2410.09207 null
2024-10-11 Aerial Vision-and-Language Navigation via Semantic-Topo-Metric Representation Guided LLM Reasoning Yunpeng Gao et.al. 2410.08500 null
2024-10-10 SG-Nav: Online 3D Scene Graph Prompting for LLM-based Zero-shot Object Navigation Hang Yin et.al. 2410.08189 null
2024-10-10 Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning Amrith Setlur et.al. 2410.08146 null
2024-10-10 Automatic Curriculum Expert Iteration for Reliable LLM Reasoning Zirui Zhao et.al. 2410.07627 link
2024-10-09 Boosting Few-Shot Detection with Large Language Models and Layout-to-Image Synthesis Ahmed Abdullah et.al. 2410.06841 null
2024-10-09 Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning Xiyao Wang et.al. 2410.06508 null
2025-01-02 Filtering Discomforting Recommendations with Large Language Models Jiahao Liu et.al. 2410.05411 null
2024-10-05 Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification Zhenwen Liang et.al. 2410.05318 null
2024-10-06 Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval Pengcheng Jiang et.al. 2410.04585 link
2024-10-03 The Role of Deductive and Inductive Reasoning in Large Language Models Chengkun Cai et.al. 2410.02892 null
2024-10-02 Not All LLM Reasoners Are Created Equal Arian Hosseini et.al. 2410.01748 null
2024-12-25 Interpretable Contrastive Monte Carlo Tree Search Reasoning Zitian Gao et.al. 2410.01707 link
2024-10-02 VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment Amirhossein Kazemnejad et.al. 2410.01679 link
2024-10-02 AHP-Powered LLM Reasoning for Multi-Criteria Evaluation of Open-Ended Responses Xiaotian Lu et.al. 2410.01246 null
2024-10-01 Self-controller: Controlling LLMs with Multi-round Step-by-step Self-awareness Xiao Peng et.al. 2410.00359 null
2024-10-01 Insight: A Multi-Modal Diagnostic Pipeline using LLMs for Ocular Surface Disease Diagnosis Chun-Hsiao Yeh et.al. 2410.00292 null
2024-10-08 GUNDAM: Aligning Large Language Models with Graph Understanding Sheng Ouyang et.al. 2409.20053 null
2024-09-27 Open-Nav: Exploring Zero-Shot Vision-and-Language Navigation in Continuous Environment with Open-Source LLMs Yanyuan Qiao et.al. 2409.18794 null
2024-10-23 Proof of Thought : Neurosymbolic Program Synthesis allows Robust and Interpretable Reasoning Debargha Ganguly et.al. 2409.17270 null
2024-09-20 CSCE: Boosting LLM Reasoning by Simultaneous Enhancing of Casual Significance and Consistency Kangsheng Wang et.al. 2409.17174 null
2024-09-20 Mufu: Multilingual Fused Learning for Low-Resource Translation with LLM Zheng Wei Lim et.al. 2409.13949 null
2024-09-19 SituationAdapt: Contextual UI Optimization in Mixed Reality with Situation Awareness via LLM Reasoning Zhipeng Li et.al. 2409.12836 null
2024-10-04 Unlocking Reasoning Potential in Large Langauge Models by Scaling Code-form Planning Jiaxin Wen et.al. 2409.12452 link
2024-12-16 Enhancing Logical Reasoning in Large Language Models through Graph-based Synthetic Data Jiaming Zhou et.al. 2409.12437 link
2024-09-18 MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning Justin Chih-Yao Chen et.al. 2409.12147 link
2024-11-05 Improving LLM Reasoning with Multi-Agent Tree-of-Thought Validator Agent Fatemeh Haji et.al. 2409.11527 link
2024-09-16 Enhancing RL Safety with Counterfactual LLM Reasoning Dennis Gross et.al. 2409.10188 link
2024-09-11 Think Together and Work Better: Combining Humans' and LLMs' Think-Aloud Outcomes for Effective Text Evaluation SeongYeub Chu et.al. 2409.07355 link

LLM Evaluation

Publish Date Title Authors PDF Code
2025-07-22 Mind the Gap: Evaluating the Representativeness of Quantitative Medical Language Reasoning LLM Benchmarks for African Disease Burdens Fred Mutisya et.al. 2507.16322 null
2025-07-18 Document Haystack: A Long Context Multimodal Image/Document Understanding Vision LLM Benchmark Goeric Huybrechts et.al. 2507.15882 null
2025-07-21 Left Leaning Models: AI Assumptions on Economic Policy Maxim Chupilkin et.al. 2507.15771 null
2025-07-21 From Queries to Criteria: Understanding How Astronomers Evaluate LLMs Alina Hyk et.al. 2507.15715 null
2025-07-21 Evaluating Text Style Transfer: A Nine-Language Benchmark for Text Detoxification Vitaly Protasov et.al. 2507.15557 null
2025-07-15 LLM-based ambiguity detection in natural language instructions for collaborative surgical robots Ana Davila et.al. 2507.11525 null
2025-07-15 DCR: Quantifying Data Contamination in LLMs Evaluation Cheng Xu et.al. 2507.11405 null
2025-07-17 SWE-MERA: A Dynamic Benchmark for Agenticly Evaluating Large Language Models on Software Engineering Tasks Pavel Adamenko et.al. 2507.11059 null
2025-07-11 OpenCodeReasoning-II: A Simple Test Time Scaling Approach via Self-Critique Wasi Uddin Ahmad et.al. 2507.09075 null
2025-07-18 From KMMLU-Redux to KMMLU-Pro: A Professional Korean Benchmark Suite for LLM Evaluation Seokhee Hong et.al. 2507.08924 null
2025-07-11 A Third Paradigm for LLM Evaluation: Dialogue Game-Based Evaluation using clembench David Schlangen et.al. 2507.08491 null
2025-07-07 Train-before-Test Harmonizes Language Model Rankings Guanhua Zhang et.al. 2507.05195 null
2025-07-13 SymbolicThought: Integrating Language Models and Symbolic Reasoning for Consistent and Interpretable Human Relationship Understanding Runcong Zhao et.al. 2507.04189 null
2025-07-09 Skewed Score: A statistical framework to assess autograders Magda Dubois et.al. 2507.03772 null
2025-07-12 Eka-Eval : A Comprehensive Evaluation Framework for Large Language Models in Indian Languages Samridhi Raj Sinha et.al. 2507.01853 null
2025-07-01 Pitfalls of Evaluating Language Models with Open Benchmarks Md. Najib Hasan et.al. 2507.00460 null
2025-06-30 AutoEvoEval: An Automated Framework for Evolving Close-Ended LLM Evaluation Data JiaRu Wu et.al. 2506.23735 null
2025-06-27 WildSpeech-Bench: Benchmarking Audio LLMs in Natural Speech Conversation Jian Zhang et.al. 2506.21875 null
2025-06-25 DuoGPT: Training-free Dual Sparsity through Activation-aware Pruning in LLMs Ruokai Yin et.al. 2506.20194 null
2025-06-23 Smart-LLaMA-DPO: Reinforced Large Language Model for Explainable Smart Contract Vulnerability Detection Lei Yu et.al. 2506.18245 null
2025-06-22 The Democratic Paradox in Large Language Models' Underestimation of Press Freedom I. Loaiza et.al. 2506.18045 null
2025-06-21 CodeMorph: Mitigating Data Leakage in Large Language Model Assessment Hongzhou Rao et.al. 2506.17627 null
2025-06-20 Re-Evaluating Code LLM Benchmarks Under Semantic Mutation Zhiyuan Pan et.al. 2506.17369 null
2025-06-19 LMR-BENCH: Evaluating LLM Agent's Ability on Reproducing Language Modeling Research Shuo Yan et.al. 2506.17335 null
2025-06-20 Do We Need Large VLMs for Spotting Soccer Actions? Ritabrata Chakraborty et.al. 2506.17144 null
2025-06-17 SFT-GO: Supervised Fine-Tuning with Group Optimization for Large Language Models Gyuhak Kim et.al. 2506.15021 null
2025-06-19 MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation Xueqing Peng et.al. 2506.14028 null
2025-06-18 The NordDRG AI Benchmark for Large Language Models Tapio Pitkäranta et.al. 2506.13790 link
2025-06-20 Domain Specific Benchmarks for Evaluating Multimodal Large Language Models Khizar Anjum et.al. 2506.12958 null
2025-06-06 The Scales of Justitia: A Comprehensive Survey on Safety Evaluation of LLMs Songyang Liu et.al. 2506.11094 null
2025-05-22 NSW-EPNews: A News-Augmented Benchmark for Electricity Price Forecasting with LLMs Zhaoge Bi et.al. 2506.11050 null
2025-04-23 Impact of Comments on LLM Comprehension of Legacy Code Rock Sabetto et.al. 2506.11007 null
2025-06-12 LLM-Driven Personalized Answer Generation and Evaluation Mohammadreza Molavi et.al. 2506.10829 null
2025-06-11 Textual Bayes: Quantifying Uncertainty in LLM-Based Systems Brendan Leigh Ross et.al. 2506.10060 null
2025-06-16 Metritocracy: Representative Metrics for Lite Benchmarks Ariel Procaccia et.al. 2506.09813 null
2025-06-10 Breaking the ICE: Exploring promises and challenges of benchmarks for Inference Carbon & Energy estimation for LLMs Samarth Sikand et.al. 2506.08727 null
2025-06-10 Sample Efficient Demonstration Selection for In-Context Learning Kiran Purohit et.al. 2506.08607 link
2025-06-09 How Benchmark Prediction from Fewer Data Misses the Mark Guanhua Zhang et.al. 2506.07673 link
2025-06-09 Beyond Benchmarks: A Novel Framework for Domain-Specific LLM Evaluation and Knowledge Mapping Nitin Sharma et.al. 2506.07658 null
2025-06-09 Vuyko Mistral: Adapting LLMs for Low-Resource Dialectal Translation Roman Kyslyi et.al. 2506.07617 null
2025-06-05 LLM-First Search: Self-Guided Exploration of the Solution Space Nathan Herr et.al. 2506.05213 link
2025-06-05 Debatable Intelligence: Benchmarking LLM Judges via Debate Speech Evaluation Noy Sternlicht et.al. 2506.05062 link
2025-06-04 BEAR: BGP Event Analysis and Reporting Hanqing Li et.al. 2506.04514 link
2025-06-04 N$^2$: A Unified Python Package and Test Bench for Nearest Neighbor-Based Matrix Completion Caleb Chin et.al. 2506.04166 link
2025-06-04 Establishing Trustworthy LLM Evaluation via Shortcut Neuron Analysis Kejian Zhu et.al. 2506.04142 null
2025-06-03 NetPress: Dynamically Generated LLM Benchmarks for Network Applications Yajie Zhou et.al. 2506.03231 link
2025-06-04 PC-MoE: Memory-Efficient and Privacy-Preserving Collaborative Training for Mixture-of-Experts LLMs Ze Yu Zhang et.al. 2506.02965 null
2025-06-02 Multilingual Definition Modeling Edison Marrese-Taylor et.al. 2506.01489 null
2025-06-01 Taming LLMs by Scaling Learning Rates with Gradient Grouping Siyuan Li et.al. 2506.01049 null
2025-06-06 Data Swarms: Optimizable Generation of Synthetic Evaluation Data Shangbin Feng et.al. 2506.00741 null
2025-05-31 AgentAuditor: Human-Level Safety and Security Evaluation for LLM Agents Hanjun Luo et.al. 2506.00641 null
2025-05-31 BenchHub: A Unified Benchmark Suite for Holistic and Customizable LLM Evaluation Eunsu Kim et.al. 2506.00482 null
2025-05-30 MetaFaith: Faithful Natural Language Uncertainty Expression in LLMs Gabrielle Kaili-May Liu et.al. 2505.24858 link
2025-05-30 Benchmarking Large Language Models for Cryptanalysis and Mismatched-Generalization Utsav Maskey et.al. 2505.24621 null
2025-05-30 Simulating Training Data Leakage in Multiple-Choice Benchmarks for LLM Evaluation Naila Shafirni Hidayat et.al. 2505.24263 link
2025-05-29 Is Your Model Fairly Certain? Uncertainty-Aware Fairness Evaluation for LLMs Yinong Oliver Wang et.al. 2505.23996 null
2025-05-29 Revisiting Uncertainty Estimation and Calibration of Large Language Models Linwei Tao et.al. 2505.23854 null
2025-05-28 Benchmarking Abstract and Reasoning Abilities Through A Theoretical Perspective Qingchuan Ma et.al. 2505.23833 link
2025-06-24 MemAscend: System Memory Optimization for SSD-Offloaded LLM Fine-Tuning Yong-Cheng Liaw et.al. 2505.23254 null
2025-07-03 Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding Chengyue Wu et.al. 2505.22618 null
2025-05-29 Pangu Embedded: An Efficient Dual-system LLM Reasoner with Metacognition Hanting Chen et.al. 2505.22375 null
2025-05-28 ReliableEval: A Recipe for Stochastic LLM Evaluation via Method of Moments Gili Lior et.al. 2505.22169 null
2025-05-28 Found in Translation: Measuring Multilingual LLM Consistency as Simple as Translate then Evaluate Ashim Gupta et.al. 2505.21999 null
2025-05-21 SIMCOPILOT: Evaluating Large Language Models for Copilot-Style Code Generation Mingchao Jiang et.al. 2505.21514 null
2025-05-26 Dynamically Learned Test-Time Model Routing in Language Model Zoos with Service Level Guarantees Herbert Woisetschläger et.al. 2505.19947 null
2025-05-26 BizFinBench: A Business-Driven Real-World Financial Benchmark for Evaluating LLMs Guilong Lu et.al. 2505.19457 link
2025-05-25 Likert or Not: LLM Absolute Relevance Judgments on Fine-Grained Ordinal Scales Charles Godfrey et.al. 2505.19334 null
2025-05-25 Can Large Language Models Infer Causal Relationships from Real-World Text? Ryan Saklad et.al. 2505.18931 null
2025-05-24 MedScore: Factuality Evaluation of Free-Form Medical Answers Heyuan Huang et.al. 2505.18452 link
2025-05-23 How Can I Publish My LLM Benchmark Without Giving the True Answers Away? Takashi Ishida et.al. 2505.18102 null
2025-05-23 ELSPR: Evaluator LLM Training Data Self-Purification on Non-Transitive Preferences via Tournament Graph Reconstruction Yan Yu et.al. 2505.17691 null
2025-05-22 CaseReportBench: An LLM Benchmark Dataset for Dense Information Extraction in Clinical Case Reports Xiao Yu Cindy Zhang et.al. 2505.17265 null
2025-05-21 NEXT-EVAL: Next Evaluation of Traditional and LLM Web Data Record Extraction Soyeon Kim et.al. 2505.17125 null
2025-05-21 Any Large Language Model Can Be a Reliable Judge: Debiasing with a Reasoning-based Bias Detector Haoyan Yang et.al. 2505.17100 null
2025-05-22 AGENTIF: Benchmarking Instruction Following of Large Language Models in Agentic Scenarios Yunjia Qi et.al. 2505.16944 link
2025-05-22 CASTILLO: Characterizing Response Length Distributions of Large Language Models Daniel F. Perez-Ramirez et.al. 2505.16881 link
2025-05-21 Reverse Engineering Human Preferences with Reinforcement Learning Lisa Alazraki et.al. 2505.15795 null
2025-05-21 An Empirical Study of the Anchoring Effect in LLMs: Existence, Mechanism, and Potential Mitigations Yiming Huang et.al. 2505.15392 null
2025-05-21 Lost in Benchmarks? Rethinking Large Language Model Benchmarking with Item Response Theory Hongli Zhou et.al. 2505.15055 link
2025-05-20 FisherSFT: Data-Efficient Supervised Fine-Tuning of Language Models Using Information Gain Rohan Deb et.al. 2505.14826 null
2025-05-20 Breaking Down Video LLM Benchmarks: Knowledge, Spatial Perception, or True Temporal Understanding? Bo Feng et.al. 2505.14321 null
2025-05-29 YESciEval: Robust LLM-as-a-Judge for Scientific Question Answering Jennifer D'Souza et.al. 2505.14279 null
2025-05-20 Think-J: Learning to Think for Generative LLM-as-a-Judge Hui Huang et.al. 2505.14268 link
2025-05-19 4Hammer: a board-game reinforcement learning environment for the hour long time frame Massimo Fioravanti et.al. 2505.13638 link
2025-05-18 KG-QAGen: A Knowledge-Graph-Based Framework for Systematic Question Generation and Long-Context LLM Evaluation Nikita Tatarinov et.al. 2505.12495 link
2025-05-17 Tiny QA Benchmark++: Ultra-Lightweight, Synthetic Multilingual Dataset Generation & Smoke-Tests for Continuous LLM Evaluation Vincent Koc et.al. 2505.12058 link
2025-05-21 Model Performance-Guided Evaluation Data Selection for Effective Prompt Optimization Ximing Dong et.al. 2505.10736 null
2025-05-13 A suite of LMs comprehend puzzle statements as well as humans Adele E Goldberg et.al. 2505.08996 null
2025-05-13 Towards Contamination Resistant Benchmarks Rahmatullah Musawi et.al. 2505.08389 null
2025-05-12 A Case Study Investigating the Role of Generative AI in Quality Evaluations of Epics in Agile Software Development Werner Geyer et.al. 2505.07664 null
2025-05-09 LLMs Get Lost In Multi-Turn Conversation Philippe Laban et.al. 2505.06120 link
2025-05-15 Healthy LLMs? Benchmarking LLM Knowledge of UK Government Public Health Information Joshua Harris et.al. 2505.06046 null
2025-05-02 Cer-Eval: Certifiable and Cost-Efficient Evaluation Framework for LLMs Ganghua Wang et.al. 2505.03814 null
2025-05-29 am-ELO: A Stable Framework for Arena-based LLM Evaluation Zirui Liu et.al. 2505.03475 null
2025-05-05 Developing A Framework to Support Human Evaluation of Bias in Generated Free Response Text Jennifer Healey et.al. 2505.03053 null
2025-05-01 Unlearning Sensitive Information in Multimodal LLMs: Benchmark and Attack-Defense Evaluation Vaidehi Patil et.al. 2505.01456 link
2025-04-30 A Report on the llms evaluating the high school questions Zhu Jiawei et.al. 2505.00057 null
2025-04-30 RDF-Based Structured Quality Assessment Representation of Multilingual LLM Evaluations Jonas Gwozdz et.al. 2504.21605 null
2025-04-30 Confidence in Large Language Model Evaluation: A Bayesian Approach to Limited-Sample Challenges Xiao Xiao et.al. 2504.21303 null
2025-04-27 LLM-Evaluation Tropes: Perspectives on the Validity of LLM-Evaluations Laura Dietz et.al. 2504.19076 null
2025-04-23 Agree to Disagree? A Meta-Evaluation of LLM Misgendering Arjun Subramonian et.al. 2504.17075 link
2025-04-23 IberBench: LLM Evaluation on Iberian Languages José Ángel González et.al. 2504.16921 null
2025-04-23 Private Federated Learning using Preference-Optimized Synthetic Data Charlie Hou et.al. 2504.16438 link
2025-04-29 Virology Capabilities Test (VCT): A Multimodal Virology Q&A Benchmark Jasper Götting et.al. 2504.16137 null
2025-05-16 DMind Benchmark: Toward a Holistic Assessment of LLM Capabilities across the Web3 Domain Enhao Huang et.al. 2504.16116 null
2025-04-22 Automated Creativity Evaluation for Large Language Models: A Reference-Based Approach Ruizhe Li et.al. 2504.15784 null
2025-04-20 Meta-Thinking in LLMs via Multi-Agent Reinforcement Learning: A Survey Ahsan Bilal et.al. 2504.14520 null
2025-04-20 Information Diffusion and Preferential Attachment in a Network of Large Language Models Adit Jain et.al. 2504.14438 null
2025-04-18 MEQA: A Meta-Evaluation Framework for Question & Answer LLM Benchmarks Jaime Raldua Veuthey et.al. 2504.14039 null
2025-04-17 ZeroSumEval: Scaling LLM Evaluation with Inter-Model Competition Haidar Khan et.al. 2504.12562 link
2025-04-17 ELAB: Extensive LLM Alignment Benchmark in Persian Language Zahra Pourbahman et.al. 2504.12553 null
2025-04-16 MOS: Towards Effective Smart Contract Vulnerability Detection through Mixture-of-Experts Tuning of Large Language Models Hang Yuan et.al. 2504.12234 null
2025-04-17 Déjà Vu: Multilingual LLM Evaluation through the Lens of Machine Translation Evaluation Julia Kreutzer et.al. 2504.11829 null
2025-04-14 HELIOS: Adaptive Model And Early-Exit Selection for Efficient LLM Inference Serving Avinash Kumar et.al. 2504.10724 null
2025-05-19 Large Language Models Could Be Rote Learners Yuyang Xu et.al. 2504.08300 null
2025-05-30 DeepSeek-R1 vs. o3-mini: How Well can Reasoning LLMs Evaluate MT and Summarization? Daniil Larionov et.al. 2504.08120 null
2025-05-15 Model Utility Law: Evaluating LLMs beyond Performance through Mechanism Interpretable Metric Yixin Cao et.al. 2504.07440 link
2025-06-20 TALE: A Tool-Augmented Framework for Reference-Free Evaluation of Large Language Models Sher Badshah et.al. 2504.07385 null
2025-04-08 NativQA Framework: Enabling LLMs with Native, Local, and Everyday Knowledge Firoj Alam et.al. 2504.05995 null
2025-04-09 How Accurately Do Large Language Models Understand Code? Sabaat Haroon et.al. 2504.04372 null
2025-04-04 Do LLM Evaluators Prefer Themselves for a Reason? Wei-Lin Chen et.al. 2504.03846 link
2025-04-15 Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning Kai Ye et.al. 2504.03784 null
2025-04-04 Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency Erik Johannes Husom et.al. 2504.03360 null
2025-04-02 YourBench: Easy Custom Evaluation Sets for Everyone Sumuk Shashidhar et.al. 2504.01833 link
2025-04-08 Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems? Kai Yan et.al. 2504.00509 null
2025-04-01 HRET: A Self-Evolving LLM Evaluation Toolkit for Korean Hanwool Lee et.al. 2503.22968 null
2025-03-27 CLAIMCHECK: How Grounded are LLM Critiques of Scientific Papers? Jiefu Ou et.al. 2503.21717 link
2025-03-27 Evaluating book summaries from internal knowledge in Large Language Models: a cross-model and semantic consistency approach Javier Coronado-Blázquez et.al. 2503.21613 null
2025-05-19 Challenging the Boundaries of Reasoning: An Olympiad-Level Math Benchmark for Large Language Models Haoxiang Sun et.al. 2503.21380 link
2025-03-25 FLEX: A Benchmark for Evaluating Robustness of Fairness in Large Language Models Dahyun Jung et.al. 2503.19540 link
2025-05-30 LLM Benchmarking with LLaMA2: Evaluating Code Development Performance Across Multiple Programming Languages Patrick Diehl et.al. 2503.19217 null
2025-03-28 Overtrained Language Models Are Harder to Fine-Tune Jacob Mitchell Springer et.al. 2503.19206 null
2025-03-25 Decorum: A Language-Based Approach For Style-Conditioned Synthesis of Indoor 3D Scenes Kelly O. Marshall et.al. 2503.18155 null
2025-05-14 Evaluating Clinical Competencies of Large Language Models with a General Practice Benchmark Zheqing Li et.al. 2503.17599 null
2025-03-20 The Emperor's New Clothes in Benchmarking? A Rigorous Examination of Mitigation Strategies for LLM Benchmark Data Contamination Yifan Sun et.al. 2503.16402 link
2025-03-20 Fùxì: A Benchmark for Evaluating Language Models on Ancient Chinese Text Understanding and Generation Shangqing Zhao et.al. 2503.15837 link
2025-06-08 Right Answer, Wrong Score: Uncovering the Inconsistencies of LLM Evaluation in Multiple-Choice Question Answering Francesco Maria Molfese et.al. 2503.14996 null
2025-03-13 It is Too Many Options: Pitfalls of Multiple-Choice Questions in Generative AI and Medical Education Shrutika Singh et.al. 2503.13508 null
2025-03-17 REPA: Russian Error Types Annotation for Evaluating Text Generation and Judgment Capabilities Alexander Pugachev et.al. 2503.13102 null
2025-03-14 V-STaR: Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning Zixu Cheng et.al. 2503.11495 null
2025-06-03 OASST-ETC Dataset: Alignment Signals from Eye-tracking Analysis of LLM Responses Angela Lopez-Cardona et.al. 2503.10927 link
2025-03-13 Chat-TS: Enhancing Multi-Modal Reasoning Over Time-Series and Natural Language Data Paul Quinlan et.al. 2503.10883 null
2025-03-13 Commenting Higher-level Code Unit: Full Code, Reduced Code, or Hierarchical Code Summarization Weisong Sun et.al. 2503.10737 null
2025-03-12 Medical Large Language Model Benchmarks Should Prioritize Construct Validity Ahmed Alaa et.al. 2503.10694 null
2025-04-17 ZeroSumEval: An Extensible Framework For Scaling LLM Evaluation with Inter-Model Competition Hisham A. Alyahya et.al. 2503.10673 link
2025-05-20 RouterEval: A Comprehensive Benchmark for Routing LLMs to Explore Model-level Scaling Up in LLMs Zhongzhan Huang et.al. 2503.10657 link
2025-05-26 MMLU-ProX: A Multilingual Benchmark for Advanced Large Language Model Evaluation Weihao Xuan et.al. 2503.10497 null
2025-03-12 Safer or Luckier? LLMs as Safety Evaluators Are Not Robust to Artifacts Hongyu Chen et.al. 2503.09347 null
2025-03-08 SmartBench: Is Your LLM Truly a Good Chinese Smartphone Assistant? Xudong Lu et.al. 2503.06029 null
2025-03-07 SINdex: Semantic INconsistency Index for Hallucination Detection in LLMs Samir Abdaljalil et.al. 2503.05980 null
2025-03-07 RocketEval: Efficient Automated LLM Evaluation via Grading Checklist Tianjun Wei et.al. 2503.05142 link
2025-02-09 Peeking Behind Closed Doors: Risks of LLM Evaluation by Private Data Curators Hritik Bansal et.al. 2503.04756 null
2025-03-07 Human Implicit Preference-Based Policy Fine-tuning for Multi-Agent Reinforcement Learning in USV Swarm Hyeonjun Kim et.al. 2503.03796 null
2025-03-04 SAGE: Steering and Refining Dialog Generation with State-Action Augmentation Yizhe Zhang et.al. 2503.03040 link
2025-05-28 Position: Don't Use the CLT in LLM Evals With Fewer Than a Few Hundred Datapoints Sam Bowyer et.al. 2503.01747 null
2025-03-04 DOVE: A Large-Scale Multi-Dimensional Predictions Dataset Towards Meaningful LLM Evaluation Eliya Habba et.al. 2503.01622 null
2025-03-03 None of the Above, Less of the Right: Parallel Patterns between Humans and LLMs on Multi-Choice Questions Answering Zhi Rui Tam et.al. 2503.01550 null
2025-03-03 SwiLTra-Bench: The Swiss Legal Translation Benchmark Joel Niklaus et.al. 2503.01372 null
2025-03-03 LLM-Advisor: An LLM Benchmark for Cost-efficient Path Planning across Multiple Terrains Ling Xiao et.al. 2503.01236 null
2025-03-02 FunBench: Benchmarking Fundus Reading Skills of MLLMs Qijie Wei et.al. 2503.00901 null
2025-03-02 Towards Efficient Educational Chatbots: Benchmarking RAG Frameworks Umar Ali Khan et.al. 2503.00781 null
2025-04-12 Evaluating Personalized Tool-Augmented LLMs from the Perspectives of Personalization and Proactivity Yupu Hao et.al. 2503.00771 link
2025-03-01 U-NIAH: Unified RAG and LLM Evaluation for Long Context Needle-In-A-Haystack Yunfan Gao et.al. 2503.00353 link
2025-02-28 Jawaher: A Multidialectal Dataset of Arabic Proverbs for LLM Benchmarking Samar M. Magdy et.al. 2503.00231 null
2025-02-28 Consistency Evaluation of News Article Summaries Generated by Large (and Small) Language Models Colleen Gilhuly et.al. 2502.20647 null
2025-05-23 Is Your Paper Being Reviewed by an LLM? Benchmarking AI Text Detection in Peer Review Sungduk Yu et.al. 2502.19614 null
2025-02-26 Exploring Graph Tasks with Pure LLMs: A Comprehensive Benchmark and Investigation Yuxiang Wang et.al. 2502.18771 link
2025-02-23 Recent Advances in Large Language Model Benchmarks against Data Contamination: From Static to Dynamic Evaluation Simin Chen et.al. 2502.17521 link
2025-05-23 Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective Chengyin Xu et.al. 2502.17262 null
2025-02-24 Detecting Benchmark Contamination Through Watermarking Tom Sander et.al. 2502.17259 null
2025-02-24 Predicting Liquidity-Aware Bond Yields using Causal GANs and Deep Reinforcement Learning with LLM Evaluation Jaskaran Singh Walia et.al. 2502.17011 null
2025-02-24 AlphaAgent: LLM-Driven Alpha Mining with Regularized Exploration to Counteract Alpha Decay Ziyi Tang et.al. 2502.16789 link
2025-01-30 Retrieval Augmented Generation Based LLM Evaluation For Protocol State Machine Inference With Chain-of-Thought Reasoning Youssef Maklad et.al. 2502.15727 null
2025-03-10 Prompt-to-Leaderboard Evan Frick et.al. 2502.14855 link
2025-03-28 SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines M-A-P Team et.al. 2502.14739 null
2025-02-20 SEA-HELM: Southeast Asian Holistic Evaluation of Language Models Yosephine Susanto et.al. 2502.14301 null
2025-02-20 Transfer-Prompting: Enhancing Cross-Task Adaptation in Large Language Models via Dual-Stage Prompts Optimization Yupeng Chang et.al. 2502.14211 link
2025-02-19 Which of These Best Describes Multiple Choice Evaluation with LLMs? A) Forced B) Flawed C) Fixable D) All of the Above Nishant Balepur et.al. 2502.14127 null
2025-02-19 STEER-ME: Assessing the Microeconomic Reasoning of Large Language Models Narun Raman et.al. 2502.13119 null
2025-02-18 HPSS: Heuristic Prompting Strategy Search for LLM Evaluators Bosi Wen et.al. 2502.13031 null
2025-05-23 None of the Others: a General Technique to Distinguish Reasoning from Memorization in Multiple-Choice LLM Evaluation Benchmarks Eva Sánchez Salido et.al. 2502.12896 null
2025-04-08 Safe at the Margins: A General Approach to Safety Alignment in Low-Resource English Languages -- A Singlish Case Study Isaac Lim et.al. 2502.12485 null
2025-02-17 Deviation Ratings: A General, Clone-Invariant Rating Method Luke Marris et.al. 2502.11645 null
2025-02-21 TituLLMs: A Family of Bangla LLMs with Comprehensive Benchmarking Shahriar Kabir Nahin et.al. 2502.11187 null
2025-02-15 Rule-Bottleneck Reinforcement Learning: Joint Explanation and Decision Optimization for Resource Allocation with Language Agents Mauricio Tec et.al. 2502.10732 null
2025-03-02 An Empirical Analysis of Uncertainty in Large Language Model Evaluations Qiujie Xie et.al. 2502.10709 link
2025-02-25 Accelerating Unbiased LLM Evaluation via Synthetic Feedback Zhaoyi Zhou et.al. 2502.10563 link
2025-02-14 MathConstruct: Challenging LLM Reasoning with Constructive Proofs Mislav Balunović et.al. 2502.10197 null
2025-02-13 Enhancing Jailbreak Attacks via Compliance-Refusal-Based Initialization Amit Levi et.al. 2502.09755 null
2025-02-13 NestQuant: Nested Lattice Quantization for Matrix Products and LLMs Semyon Savkin et.al. 2502.09720 null
2025-02-12 The Science of Evaluating Foundation Models Jiayi Yuan et.al. 2502.09670 null
2025-02-13 Copilot Arena: A Platform for Code LLM Evaluation in the Wild Wayne Chi et.al. 2502.09328 null
2025-02-12 Revisiting 3D LLM Benchmarks: Are We Really Testing 3D Capabilities? Jiahe Jin et.al. 2502.08503 link
2025-02-11 Forget What You Know about LLMs Evaluations -- LLMs are Like a Chameleon Nurit Cohen-Inger et.al. 2502.07445 link
2025-02-10 Evaluating the Systematic Reasoning Abilities of Large Language Models through Graph Coloring Alex Heyman et.al. 2502.07087 link
2025-02-10 Multi-turn Evaluation of Anthropomorphic Behaviours in Large Language Models Lujain Ibrahim et.al. 2502.07077 null
2025-02-07 LLM-Supported Natural Language to Bash Translation Finnian Westenfelder et.al. 2502.06858 link
2025-02-15 Self-Supervised Prompt Optimization Jinyu Xiang et.al. 2502.06855 link
2025-02-10 Resurrecting saturated LLM benchmarks with adversarial encoding Igor Ivanov et.al. 2502.06738 null
2025-02-10 Automatic Evaluation of Healthcare LLMs Beyond Question-Answering Anna Arias-Duart et.al. 2502.06666 null
2025-02-10 Unbiased Evaluation of Large Language Models from a Causal Perspective Meilin Chen et.al. 2502.06655 null
2025-02-10 LessLeak-Bench: A First Investigation of Data Leakage in LLMs Across 83 Software Engineering Benchmarks Xin Zhou et.al. 2502.06215 null
2025-02-05 Aero-LLM: A Distributed Framework for Secure UAV Communication and Intelligent Decision-Making Balakrishnan Dharmalingam et.al. 2502.05220 null
2025-02-06 TruthFlow: Truthful LLM Generation via Representation Flow Correction Hanyu Wang et.al. 2502.04556 null
2025-02-05 How do Humans and Language Models Reason About Creativity? A Comparative Analysis Antonio Laverghetta Jr. et.al. 2502.03253 null
2025-03-22 On Zero-Initialized Attention: Optimal Prompt and Gating Factor Estimation Nghiem T. Diep et.al. 2502.03029 null
2025-02-02 LLM-Powered Benchmark Factory: Reliable, Generic, and Efficient Peiwen Yuan et.al. 2502.01683 link
2025-02-02 HASSLE-free: A unified Framework for Sparse plus Low-Rank Matrix Decomposition for LLMs Mehdi Makni et.al. 2502.00899 null
2025-02-01 DUET: Optimizing Training Data Mixtures via Feedback from Unseen Evaluation Tasks Zhiliang Chen et.al. 2502.00270 link
2025-01-30 Overestimation in LLM Evaluation: A Controlled Large-Scale Study on Data Contamination's Impact on Machine Translation Muhammed Yusuf Kocyigit et.al. 2501.18771 null
2025-01-31 ExeCoder: Empowering Large Language Models with Executability Representation for Code Translation Minghua He et.al. 2501.18460 null
2025-02-01 LLM Evaluation Based on Aerospace Manufacturing Expertise: Automated Generation and Multi-Model Question Answering Beiming Liu et.al. 2501.17183 null
2025-03-18 An LLM Benchmark for Addressee Recognition in Multi-modal Multi-party Dialogue Koji Inoue et.al. 2501.16643 null
2025-01-26 HardML: A Benchmark For Evaluating Data Science And Machine Learning knowledge and reasoning in AI Tidor-Vlad Pricope et.al. 2501.15627 null
2025-01-23 Question Answering on Patient Medical Records with Private Fine-Tuned LLMs Sara Kothari et.al. 2501.13687 null
2025-01-10 CodEv: An Automated Grading Framework Leveraging Large Language Models for Consistent and Constructive Feedback En-Qi Tseng et.al. 2501.10421 null
2025-01-15 Towards Multilingual LLM Evaluation for Baltic and Nordic languages: A study on Lithuanian History Yevhen Kostiuk et.al. 2501.09154 null
2025-01-13 Benchmarking Abstractive Summarisation: A Dataset of Human-authored Summaries of Norwegian News Articles Samia Touileb et.al. 2501.07718 null
2025-01-03 FLAME: Financial Large-Language Model Assessment and Metrics Evaluation Jiayu Guo et.al. 2501.06211 link
2025-01-07 MTRAG: A Multi-Turn Conversational Benchmark for Evaluating Retrieval-Augmented Generation Systems Yannis Katsis et.al. 2501.03468 link
2025-01-05 Evaluating Large Language Models Against Human Annotators in Latent Content Analysis: Sentiment, Political Leaning, Emotional Intensity, and Sarcasm Ljubisa Bojic et.al. 2501.02532 null
2025-01-04 LLMzSzŁ: a comprehensive LLM benchmark for Polish Krzysztof Jassem et.al. 2501.02266 null
2025-03-25 VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM Yuqian Yuan et.al. 2501.00599 link
2025-01-04 Setting Standards in Turkish NLP: TR-MMLU for Large Language Model Evaluation M. Ali Bayram et.al. 2501.00593 null
2024-12-31 Echoes in AI: Quantifying Lack of Plot Diversity in LLM Outputs Weijia Xu et.al. 2501.00273 null
2024-12-30 EVOLVE: Emotion and Visual Output Learning via LLM Evaluation Jordan Sinclair et.al. 2412.20632 null
2024-12-24 Muse: A Multimodal Conversational Recommendation Dataset with Scenario-Grounded User Profiles Zihan Wang et.al. 2412.18416 null
2024-12-24 A Statistical Framework for Ranking LLM-Based Chatbots Siavash Ameli et.al. 2412.18407 link
2025-01-25 DeepCRCEval: Revisiting the Evaluation of Code Review Comment Generation Junyi Lu et.al. 2412.18291 null
2024-12-23 CARL-GT: Evaluating Causal Reasoning Capabilities of Large Language Models Ruibo Tu et.al. 2412.17970 link
2025-01-02 Baichuan4-Finance Technical Report Hanyu Zhang et.al. 2412.15270 null
2024-12-19 ObjVariantEnsemble: Advancing Point Cloud LLM Evaluation in Challenging Scenes with Subtly Distinguished Objects Qihang Cao et.al. 2412.14837 null
2024-12-18 AntiLeak-Bench: Preventing Data Contamination by Automatically Constructing Benchmarks with Updated Real-World Knowledge Xiaobao Wu et.al. 2412.13670 link
2025-02-16 Mind Your Theory: Theory of Mind Goes Deeper Than Reasoning Eitan Wagner et.al. 2412.13631 null
2025-02-17 OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain Shuting Wang et.al. 2412.13018 link
2024-12-10 How to Choose a Threshold for an Evaluation Metric for Large Language Models Bhaskarjit Sarmah et.al. 2412.12148 null
2024-12-15 Dual Traits in Probabilistic Reasoning of Large Language Models Shenxiong Li et.al. 2412.11009 link
2024-12-30 LLM-as-an-Interviewer: Beyond Static Testing Through Dynamic LLM Evaluation Eunsu Kim et.al. 2412.10424 null
2024-12-13 Cultural Evolution of Cooperation among LLM Agents Aron Vallinder et.al. 2412.10270 null
2024-12-12 Towards Understanding the Robustness of LLM-based Evaluations under Perturbations Manav Chaudhary et.al. 2412.09269 null
2024-12-10 BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities Sahal Shaji Mullappilly et.al. 2412.07769 link
2025-02-28 PediaBench: A Comprehensive Chinese Pediatric Dataset for Benchmarking Large Language Models Qian Zhang et.al. 2412.06287 link
2024-12-02 AI Benchmarks and Datasets for LLM Evaluation Todor Ivanov et.al. 2412.01020 null
2024-11-30 Evaluating the Consistency of LLM Evaluators Noah Lee et.al. 2412.00543 null
2024-11-29 MIMDE: Exploring the Use of Synthetic vs Human Data for Evaluating Multi-Insight Multi-Document Extraction Tasks John Francis et.al. 2411.19689 null
2024-11-29 Beyond Surface Structure: A Causal Assessment of LLMs' Comprehension Ability Yujin Han et.al. 2411.19456 link
2024-11-27 Is my Meeting Summary Good? Estimating Quality with a Multi-LLM Evaluator Frederic Kirstein et.al. 2411.18444 null
2025-01-17 CS-Eval: A Comprehensive Large Language Model Benchmark for CyberSecurity Zhengmin Yu et.al. 2411.16239 link
2024-11-25 SAGEval: The frontiers of Satisfactory Agent based NLG Evaluation for reference-free open-ended text Reshmi Ghosh et.al. 2411.16077 null
2024-11-26 Do LLMs Agree on the Creativity Evaluation of Alternative Uses? Abdullah Al Rabeyah et.al. 2411.15560 null
2025-02-17 Ranking Unraveled: Recipes for LLM Rankings in Head-to-Head AI Combat Roland Daynauth et.al. 2411.14483 link
2024-11-21 Lost in Inference: Rediscovering the Role of Natural Language Inference for Large Language Models Lovish Madaan et.al. 2411.14103 null
2024-11-21 An Evaluation-Driven Approach to Designing LLM Agents: Process and Architecture Boming Xia et.al. 2411.13768 null
2024-11-21 A Framework for Evaluating LLMs Under Task Indeterminacy Luke Guerdan et.al. 2411.13760 null
2024-11-12 Large Language Models as Neurolinguistic Subjects: Identifying Internal Representations for Form and Meaning Linyang He et.al. 2411.07533 null
2024-11-13 Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models Yancheng He et.al. 2411.07140 null
2024-11-09 Golden Touchstone: A Comprehensive Bilingual Benchmark for Evaluating Financial Large Language Models Xiaojun Wu et.al. 2411.06272 link
2025-02-09 ProverbEval: Exploring LLM Evaluation Challenges for Low-resource Language Understanding Israel Abebe Azime et.al. 2411.05049 null
2024-11-07 Bayesian Calibration of Win Rate Estimation with LLM Evaluators Yicheng Gao et.al. 2411.04424 link
2024-11-05 Enhancing LLM Evaluations: The Garbling Trick William F. Bradley et.al. 2411.01533 null
2025-02-19 Varco Arena: A Tournament Approach to Reference-Free Benchmarking Large Language Models Seonil Son et.al. 2411.01281 null
2025-02-07 Mastering the Craft of Data Synthesis for CodeLLMs Meng Chen et.al. 2411.00005 link
2024-10-28 Project MPG: towards a generalized performance benchmark for LLM capabilities Lucas Spangher et.al. 2410.22368 null
2024-10-29 Self-Preference Bias in LLM-as-a-Judge Koki Wataoka et.al. 2410.21819 null
2024-10-28 Unveiling Context-Aware Criteria in Self-Assessing LLMs Taneesh Gupta et.al. 2410.21545 null
2024-10-27 LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization Jui-Nan Yen et.al. 2410.20625 link
2024-10-26 Limitations of the LLM-as-a-Judge Approach for Evaluating LLM Outputs in Expert Knowledge Tasks Annalisa Szymanski et.al. 2410.20266 null
2024-10-23 MiLoRA: Efficient Mixture of Low-Rank Adaptation for Large Language Models Fine-tuning Jingfan Zhang et.al. 2410.18035 null
2025-02-21 Towards Automated Penetration Testing: Introducing LLM Benchmark, Analysis, and Improvements Isamu Isozaki et.al. 2410.17141 link
2024-10-21 CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution Maosong Cao et.al. 2410.16256 link
2025-01-26 mHumanEval -- A Multilingual Benchmark to Evaluate Large Language Models for Code Generation Nishat Raihan et.al. 2410.15037 link
2024-10-19 CAP: Data Contamination Detection via Consistency Amplification Yi Zhao et.al. 2410.15005 null
2024-10-18 Enabling Scalable Evaluation of Bias Patterns in Medical LLMs Hamed Fayyaz et.al. 2410.14763 link
2024-11-06 Diverging Preferences: When do Annotators Disagree and do Models Know? Michael JQ Zhang et.al. 2410.14632 null
2024-10-18 Combining Entropy and Matrix Nuclear Norm for Enhanced Evaluation of Language Models James Vo et.al. 2410.14480 null
2024-10-21 BenTo: Benchmark Task Reduction with In-Context Transferability Hongyu Zhao et.al. 2410.13804 link
2024-10-16 BenchmarkCards: Large Language Model and Risk Reporting Anna Sokol et.al. 2410.12974 null
2025-02-01 Language Model Preference Evaluation with Multiple Weak Evaluators Zhengyu Hu et.al. 2410.12869 link
2024-10-11 Enterprise Benchmarks for Large Language Model Evaluation Bing Zhang et.al. 2410.12857 link
2024-10-16 An Automatic and Cost-Efficient Peer-Review Framework for Language Generation Evaluation Junjie Chen et.al. 2410.12265 null
2024-10-15 Leaving the barn door open for Clever Hans: Simple features predict LLM benchmark answers Lorenzo Pacchiardi et.al. 2410.11672 link
2024-10-15 Black-box Uncertainty Quantification Method for LLM-as-a-Judge Nico Wagner et.al. 2410.11594 null
2024-10-14 Jailbreak Instruction-Tuned LLMs via end-of-sentence MLP Re-weighting Yifan Luo et.al. 2410.10150 null
2024-12-13 HARDMath: A Benchmark Dataset for Challenging Problems in Applied Mathematics Jingxuan Fan et.al. 2410.09988 link
2024-10-15 LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models Han Qiu et.al. 2410.09962 link
2024-10-17 Towards Multilingual LLM Evaluation for European Languages Klaudia Thellmann et.al. 2410.08928 null
2024-10-11 Test-driven Software Experimentation with LASSO: an LLM Benchmarking Example Marcus Kessel et.al. 2410.08911 null
2024-10-10 Assessing Episodic Memory in LLMs with Sequence Order Recall Tasks Mathis Pink et.al. 2410.08133 null
2025-02-03 COMPL-AI Framework: A Technical Interpretation and LLM Benchmarking Suite for the EU Artificial Intelligence Act Philipp Guldimann et.al. 2410.07959 link
2024-11-06 News Reporter: A Multi-lingual LLM Framework for Broadcast T.V News Tarun Jain et.al. 2410.07520 null
2024-10-09 Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates Xiaosen Zheng et.al. 2410.07137 link
2024-10-09 ReIFE: Re-evaluating Instruction-Following Evaluation Yixin Liu et.al. 2410.07069 link
2024-10-08 Active Evaluation Acquisition for Efficient LLM Benchmarking Yang Li et.al. 2410.05952 null
2024-10-07 TurtleBench: Evaluating Top Language Models via Real-World Yes/No Puzzles Qingchen Yu et.al. 2410.05262 link
2024-10-01 Language Enhanced Model for Eye (LEME): An Open-Source Ophthalmology-Specific Large Language Model Aidan Gilson et.al. 2410.03740 null
2024-10-04 TICKing All the Boxes: Generated Checklists Improve LLM Evaluation and Generation Jonathan Cook et.al. 2410.03608 null
2024-10-04 Towards Reproducible LLM Evaluation: Quantifying Uncertainty in LLM Benchmark Scores Robert E. Blackwell et.al. 2410.03492 null
2024-10-29 AIME: AI System Optimization via Multiple LLM Evaluators Bhrij Patel et.al. 2410.03131 null
2024-10-02 Comparing Criteria Development Across Domain Experts, Lay Users, and Models in Large Language Model Evaluation Annalisa Szymanski et.al. 2410.02054 null
2024-10-02 Knowledge-Driven Feature Selection and Engineering for Genotype Data with Large Language Models Joseph Lee et.al. 2410.01795 link
2024-10-03 Extending Context Window of Large Language Models from a Distributional Perspective Yingsheng Wu et.al. 2410.01490 link
2024-10-02 ConServe: Harvesting GPUs for Low-Latency and High-Throughput Large Language Model Serving Yifan Qiao et.al. 2410.01228 null
2024-10-01 ViDAS: Vision-based Danger Assessment and Scoring Pranav Gupta et.al. 2410.00477 null
2024-10-01 PclGPT: A Large Language Model for Patronizing and Condescending Language Detection Hongbo Wang et.al. 2410.00361 link
2024-11-26 LexEval: A Comprehensive Chinese Legal Benchmark for Evaluating Large Language Models Haitao Li et.al. 2409.20288 link
2024-09-29 Does RAG Introduce Unfairness in LLMs? Evaluating Fairness in Retrieval-Augmented Generation Systems Xuyang Wu et.al. 2409.19804 link
2024-10-19 Can Large Language Models Analyze Graphs like Professionals? A Benchmark, Datasets and Models Xin Li et.al. 2409.19667 link
2024-10-05 IDGen: Item Discrimination Induced Prompt Generation for LLM Evaluation Fan Lin et.al. 2409.18892 link
2024-12-13 A Character-Centric Creative Story Generation via Imagination Kyeongman Park et.al. 2409.16667 null
2024-09-25 Judgment of Thoughts: Courtroom of the Binary Logical Reasoning in Large Language Models Sungjune Park et.al. 2409.16635 null
2024-12-18 Kalahi: A handcrafted, grassroots cultural LLM evaluation suite for Filipino Jann Railey Montalan et.al. 2409.15380 link
2024-12-16 MQM-APE: Toward High-Quality Error Annotation Predictors with Automatic Post-Editing in LLM Translation Evaluators Qingyu Lu et.al. 2409.14335 link
2024-09-21 ChemEval: A Comprehensive Multi-Level Chemical Evaluation for Large Language Models Yuqing Huang et.al. 2409.13989 link
2024-12-17 AraDiCE: Benchmarks for Dialectal and Cultural Capabilities in LLMs Basel Mousi et.al. 2409.11404 null
2024-10-02 LLM-as-a-Judge & Reward Model: What They Can and Cannot Do Guijin Son et.al. 2409.11239 null
2024-12-08 Towards Data Contamination Detection for Modern Large Language Models: Limitations, Inconsistencies, and Oracle Challenges Vinay Samuel et.al. 2409.09927 link
2024-09-13 Cracking the Code: Multi-domain LLM Evaluation on Real-World Professional Exams in Indonesia Fajri Koto et.al. 2409.08564 null
2024-09-09 Assessing SPARQL capabilities of Large Language Models Lars-Peter Meyer et.al. 2409.05925 link
2024-10-08 LongGenBench: Benchmarking Long-Form Generation in Long Context LLMs Yuhao Wu et.al. 2409.02076 link
2024-10-14 Polyrating: A Cost-Effective and Bias-Aware Rating System for LLM Evaluation Jasper Dekoninck et.al. 2409.00696 null
2024-08-26 Evaluating ChatGPT on Nuclear Domain-Specific Data Muhammad Anwar et.al. 2409.00090 null
2024-08-28 LLMSecCode: Evaluating Large Language Models for Secure Coding Anton Rydén et.al. 2408.16100 link
2024-08-26 LLM-3D Print: Large Language Models To Monitor and Control 3D Printing Yayati Jadhav et.al. 2408.14307 null
2024-08-26 Epidemic Information Extraction for Event-Based Surveillance using Large Language Models Sergio Consoli et.al. 2408.14277 null
2024-10-04 MobileQuant: Mobile-friendly Quantization for On-device Language Models Fuwen Tan et.al. 2408.13933 link
2024-08-23 LalaEval: A Holistic Human Evaluation Framework for Domain-Specific Large Language Models Chongyan Sun et.al. 2408.13338 null
2024-08-23 Open Llama2 Model for the Lithuanian Language Artūras Nakvosas et.al. 2408.12963 null
2024-08-23 LIMP: Large Language Model Enhanced Intent-aware Mobility Prediction Songwei Li et.al. 2408.12832 link
2024-12-20 Recording for Eyes, Not Echoing to Ears: Contextualized Spoken-to-Written Conversion of ASR Transcripts Jiaqing Liu et.al. 2408.09688 null
2024-08-20 Constructing Domain-Specific Evaluation Sets for LLM-as-a-judge Ravi Raju et.al. 2408.08808 null
2024-10-16 The Fellowship of the LLMs: Multi-Agent Workflows for Synthetic Preference Optimization Dataset Generation Samee Arif et.al. 2408.08688 link
2024-10-19 Persona is a Double-edged Sword: Mitigating the Negative Impact of Role-playing Prompts in Zero-shot Reasoning Tasks Junseok Kim et.al. 2408.08631 null

(back to top)

LLM MLLM

Publish Date Title Authors PDF Code
2025-07-23 Yume: An Interactive World Generation Model Xiaofeng Mao et.al. 2507.17744 null
2025-07-23 Flow Matching Meets Biology and Life Science: A Survey Zihao Li et.al. 2507.17731 null
2025-07-23 BetterCheck: Towards Safeguarding VLMs for Automotive Perception Systems Malsha Ashani Mahawatta Dona et.al. 2507.17722 null
2025-07-23 AI Telephone Surveying: Automating Quantitative Data Collection with an AI Interviewer Danny D. Leybzon et.al. 2507.17718 null
2025-07-23 HydraOpt: Navigating the Efficiency-Performance Trade-off of Adapter Merging Taha Ceritli et.al. 2507.17706 null
2025-07-23 Towards Greater Leverage: Scaling Laws for Efficient Mixture-of-Experts Language Models Changxin Tian et.al. 2507.17702 null
2025-07-23 Thinking Isn't an Illusion: Overcoming the Limitations of Reasoning Models via Tool Augmentations Zhao Song et.al. 2507.17699 null
2025-07-23 Symbiotic Agents: A Novel Paradigm for Trustworthy AGI-driven Networks Ilias Chatzistefanidis et.al. 2507.17695 null
2025-07-23 Simulating multiple human perspectives in socio-ecological systems using large language models Yongchao Zeng et.al. 2507.17680 null
2025-07-23 See the Forest and the Trees: A Synergistic Reasoning Framework for Knowledge-Based Visual Question Answering Junjie Wang et.al. 2507.17659 null
2025-07-23 CNS-Bench: Benchmarking Image Classifier Robustness Under Continuous Nuisance Shifts Olaf Dünkel et.al. 2507.17651 null
2025-07-23 Who Attacks, and Why? Using LLMs to Identify Negative Campaigning in 18M Tweets across 19 Countries Victor Hartman et.al. 2507.17636 null
2025-07-23 A Hybrid Early-Exit Algorithm for Large Language Models Based on Space Alignment Decoding (SPADE) Bowen Zheng et.al. 2507.17618 null
2025-07-23 CodeReasoner: Enhancing the Code Reasoning Ability with Reinforcement Learning Lingxiao Tang et.al. 2507.17548 null
2025-07-23 Anticipate, Simulate, Reason (ASR): A Comprehensive Generative AI Framework for Combating Messaging Scams Xue Wen Tan et.al. 2507.17543 null
2025-07-23 AssertFlip: Reproducing Bugs via Inversion of LLM-Generated Passing Tests Lara Khatib et.al. 2507.17542 null
2025-07-23 Constructing Ophthalmic MLLM for Positioning-diagnosis Collaboration Through Clinical Cognitive Chain Reasoning Xinyao Liu et.al. 2507.17539 null
2025-07-23 Enabling Cyber Security Education through Digital Twins and Generative AI Vita Santa Barletta et.al. 2507.17518 null
2025-07-23 URPO: A Unified Reward & Policy Optimization Framework for Large Language Models Songshuo Lu et.al. 2507.17515 null
2025-07-23 HOTA: Hamiltonian framework for Optimal Transport Advection Nazar Buzun et.al. 2507.17513 null
2025-07-23 Unsupervised anomaly detection using Bayesian flow networks: application to brain FDG PET in the context of Alzheimer's disease Hugues Roy et.al. 2507.17486 null
2025-07-23 An Uncertainty-Driven Adaptive Self-Alignment Framework for Large Language Models Haoran Sun et.al. 2507.17477 null
2025-07-23 MultiNRC: A Challenging and Native Multilingual Reasoning Evaluation Benchmark for LLMs Alexander R. Fabbri et.al. 2507.17476 null
2025-07-23 BGM-HAN: A Hierarchical Attention Network for Accurate and Fair Decision Assessment on Semi-Structured Profiles Junhua Liu et.al. 2507.17472 null
2025-07-23 ERMV: Editing 4D Robotic Multi-view images to enhance embodied agents Chang Nie et.al. 2507.17462 null
2025-07-23 Reasoning-Driven Retrosynthesis Prediction with Large Language Models via Reinforcement Learning Situo Zhang et.al. 2507.17448 null
2025-07-23 Each to Their Own: Exploring the Optimal Embedding in RAG Shiting Chen et.al. 2507.17442 null
2025-07-23 A Comprehensive Evaluation on Quantization Techniques for Large Language Models Yutong Liu et.al. 2507.17417 null
2025-07-23 HiProbe-VAD: Video Anomaly Detection via Hidden States Probing in Tuning-Free Multimodal LLMs Zhaolin Cai et.al. 2507.17394 null
2025-07-23 Investigating Training Data Detection in AI Coders Tianlin Li et.al. 2507.17389 null
2025-07-23 Confidence Calibration in Vision-Language-Action Models Thomas P Zollo et.al. 2507.17383 null
2025-07-23 Language-Conditioned Open-Vocabulary Mobile Manipulation with Pretrained Models Shen Tan et.al. 2507.17379 null
2025-07-23 DynaSearcher: Dynamic Knowledge Graph Augmented Search Agent via Multi-Reward Reinforcement Learning Chuzhan Hao et.al. 2507.17365 null
2025-07-23 RoadBench: A Vision-Language Foundation Model and Benchmark for Road Damage Understanding Xi Xiao et.al. 2507.17353 null
2025-07-23 CartoonAlive: Towards Expressive Live2D Modeling from Single Portraits Chao He et.al. 2507.17327 null
2025-07-23 Application of Whisper in Clinical Practice: the Post-Stroke Speech Assessment during a Naming Task Milena Davudova et.al. 2507.17326 null
2025-07-23 R-Stitch: Dynamic Trajectory Stitching for Efficient Reasoning Zhuokun Chen et.al. 2507.17307 null
2025-07-23 A Versatile Pathology Co-pilot via Reasoning Enhanced Multimodal Large Language Model Zhe Xu et.al. 2507.17303 null
2025-07-23 Exploring the Potential of LLMs for Serendipity Evaluation in Recommender Systems Li Kang et.al. 2507.17290 null
2025-07-23 Triple X: A LLM-Based Multilingual Speech Recognition System for the INTERSPEECH2025 MLC-SLM Challenge Miaomiao Gao et.al. 2507.17288 null
2025-07-23 Fully Automated SAM for Single-source Domain Generalization in Medical Image Segmentation Huanli Zhuo et.al. 2507.17281 null
2025-07-23 Leveraging Knowledge Graphs and LLM Reasoning to Identify Operational Bottlenecks for Warehouse Planning Assistance Rishi Parekh et.al. 2507.17273 null
2025-07-23 Seed&Steer: Guiding Large Language Models with Compilable Prefix and Branch Signals for Unit Test Generation Shuaiyu Zhou et.al. 2507.17271 null
2025-07-23 Understanding Prompt Programming Tasks and Questions Jenny T. Liang et.al. 2507.17264 null
2025-07-23 Tab-MIA: A Benchmark Dataset for Membership Inference Attacks on Tabular Data in LLMs Eyal German et.al. 2507.17259 null
2025-07-23 Agent Identity Evals: Measuring Agentic Identity Elija Perrier et.al. 2507.17257 null
2025-07-23 Rethinking VAE: From Continuous to Discrete Representations Without Probabilistic Assumptions Songxuan Shi et.al. 2507.17255 null
2025-07-23 R4ec: A Reasoning, Reflection, and Refinement Framework for Recommendation Systems Hao Gu et.al. 2507.17249 null
2025-07-23 Perceptual Classifiers: Detecting Generative Images using Perceptual Features Krishna Srikar Durbha et.al. 2507.17240 null
2025-07-23 MaskedCLIP: Bridging the Masked and CLIP Space for Semi-Supervised Medical Vision-Language Pre-training Lei Zhu et.al. 2507.17239 null
2025-07-23 A Highly Clean Recipe Dataset with Ingredient States Annotation for State Probing Task Mashiro Toyooka et.al. 2507.17232 null
2025-07-23 PIG-Nav: Key Insights for Pretrained Image Goal Navigation Models Jiansong Wan et.al. 2507.17220 null
2025-07-23 The Pluralistic Moral Gap: Understanding Judgment and Value Differences between Humans and Large Language Models Giuseppe Russo et.al. 2507.17216 null
2025-07-23 EFS: Evolutionary Factor Searching for Sparse Portfolio Optimization Using Large Language Models Haochen Luo et.al. 2507.17211 null
2025-07-23 HypoChainer: A Collaborative System Combining LLMs and Knowledge Graphs for Hypothesis-Driven Scientific Discovery Haoran Jiang et.al. 2507.17209 null
2025-07-23 Filter-And-Refine: A MLLM Based Cascade System for Industrial-Scale Video Content Moderation Zixuan Wang et.al. 2507.17204 null
2025-07-23 DesignLab: Designing Slides Through Iterative Detection and Correction Jooyeol Yun et.al. 2507.17202 null
2025-07-23 Vec2Face+ for Face Dataset Generation Haiyu Wu et.al. 2507.17192 null
2025-07-23 LLM Meets the Sky: Heuristic Multi-Agent Reinforcement Learning for Secure Heterogeneous UAV Networks Lijie Zheng et.al. 2507.17188 null
2025-07-23 SKA-Bench: A Fine-Grained Benchmark for Evaluating Structured Knowledge Understanding of LLMs Zhiqiang Liu et.al. 2507.17178 null
2025-07-23 Improving LLMs' Generalized Reasoning Abilities by Graph Problems Qifan Zhang et.al. 2507.17168 null
2025-07-23 Can LLMs Write CI? A Study on Automatic Generation of GitHub Actions Configurations Taher A. Ghaleb et.al. 2507.17165 null
2025-07-23 DOOMGAN: High-Fidelity Dynamic Identity Obfuscation Ocular Generative Morphing Bharath Krishnamurthy et.al. 2507.17158 null
2025-07-23 UNICE: Training A Universal Image Contrast Enhancer Ruodai Cui et.al. 2507.17157 null
2025-07-23 CogDual: Enhancing Dual Cognition of LLMs via Reinforcement Learning with Implicit Rule-Based Rewards Cheng Liu et.al. 2507.17147 null
2025-07-23 SADA: Stability-guided Adaptive Diffusion Acceleration Ting Jiang et.al. 2507.17135 null
2025-07-23 Resilient Multi-Agent Negotiation for Medical Supply Chains: Integrating LLMs and Blockchain for Transparent Coordination Mariam ALMutairi et.al. 2507.17134 null
2025-07-23 BrownoutServe: SLO-Aware Inference Serving under Bursty Workloads for MoE-based LLMs Jianmin Hu et.al. 2507.17133 null
2025-07-23 Enabling Self-Improving Agents to Learn at Test Time With Human-In-The-Loop Guidance Yufei He et.al. 2507.17131 null
2025-07-23 BucketServe: Bucket-Based Dynamic Batching for Smart and Efficient LLM Inference Serving Wanyi Zheng et.al. 2507.17120 null
2025-07-23 HySafe-AI: Hybrid Safety Architectural Analysis Framework for AI Systems: A Case Study Mandar Pitale et.al. 2507.17118 null
2025-07-23 Probabilistic Graphical Models: A Concise Tutorial Jacqueline Maasch et.al. 2507.17116 null
2025-07-23 Enhancing Transferability and Consistency in Cross-Domain Recommendations via Supervised Disentanglement Yuhan Wang et.al. 2507.17112 null
2025-07-23 Reinforcement Learning Fine-Tunes a Sparse Subnetwork in Large Language Models Andrii Balashov et.al. 2507.17107 null
2025-07-22 Risk In Context: Benchmarking Privacy Leakage of Foundation Models in Synthetic Tabular Data Generation Jessup Byun et.al. 2507.17066 null
2025-07-22 Parallelism Meets Adaptiveness: Scalable Documents Understanding in Multi-Agent LLM Systems Chengxuan Xia et.al. 2507.17061 null
2025-07-22 Toward Scalable Video Narration: A Training-free Approach Using Multimodal Large Language Models Tz-Ying Wu et.al. 2507.17050 null
2025-07-22 Controllable Hybrid Captioner for Improved Long-form Video Understanding Kuleen Sasse et.al. 2507.17047 null
2025-07-22 Write, Rank, or Rate: Comparing Methods for Studying Visualization Affordances Chase Stokes et.al. 2507.17024 null
2025-07-22 Causal Graph Fuzzy LLMs: A First Introduction and Applications in Time Series Forecasting Omid Orang et.al. 2507.17016 null
2025-07-22 Can External Validation Tools Improve Annotation Quality for LLM-as-a-Judge? Arduin Findeis et.al. 2507.17015 null
2025-07-22 Multi-Label Classification with Generative AI Models in Healthcare: A Case Study of Suicidality and Risk Factors Ming Huang et.al. 2507.17009 null
2025-07-22 Bringing Balance to Hand Shape Classification: Mitigating Data Imbalance Through Generative Models Gaston Gustavo Rios et.al. 2507.17008 null
2025-07-22 PyG 2.0: Scalable Learning on Real World Graphs Matthias Fey et.al. 2507.16991 null
2025-07-22 Obscured but Not Erased: Evaluating Nationality Bias in LLMs via Name-Based Bias Benchmarks Giulio Pelosio et.al. 2507.16989 null
2025-07-22 Leveraging Synthetic Data for Question Answering with Multilingual LLMs in the Agricultural Domain Rishemjit Kaur et.al. 2507.16974 null
2025-07-22 LLM4MEA: Data-free Model Extraction Attacks on Sequential Recommenders via Large Language Models Shilong Zhao et.al. 2507.16969 null
2025-07-22 Harnessing RLHF for Robust Unanswerability Recognition and Trustworthy Response Generation in LLMs Shuyuan Lin et.al. 2507.16951 null
2025-07-22 AI-based Clinical Decision Support for Primary Care: A Real-World Study Robert Korom et.al. 2507.16947 null
2025-07-22 AURA: A Multi-Modal Medical Agent for Understanding, Reasoning & Annotation Nima Fathi et.al. 2507.16940 null
2025-07-22 SiLQ: Simple Large Language Model Quantization-Aware Training Steven K. Esser et.al. 2507.16933 null
2025-07-22 Stellar Mass-Dispersion Measure Correlations Constrain Baryonic Feedback in Fast Radio Burst Host Galaxies Calvin Leung et.al. 2507.16816 null
2025-07-22 LingBench++: A Linguistically-Informed Benchmark and Reasoning Framework for Multi-Step and Cross-Cultural Inference with LLMs Da-Chen Lian et.al. 2507.16809 null
2025-07-22 Rethinking LLM-Based RTL Code Optimization Via Timing Logic Metamorphosis Zhihao Xu et.al. 2507.16808 null
2025-07-23 Agentar-Fin-R1: Enhancing Financial Intelligence through Domain Expertise, Training Efficiency, and Advanced Reasoning Yanjun Zheng et.al. 2507.16802 null
2025-07-23 Test-Time-Matching: Decouple Personality, Memory, and Linguistic Style in LLM-based Role-Playing Language Agent Xiaoyu Zhan et.al. 2507.16799 null
2025-07-22 Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning Helena Casademunt et.al. 2507.16795 null
2025-07-22 ChatChecker: A Framework for Dialogue System Testing and Evaluation Through Non-cooperative User Simulation Roman Mayr et.al. 2507.16792 null
2025-07-22 Enhancing Domain Diversity in Synthetic Data Face Recognition with Dataset Fusion Anjith George et.al. 2507.16790 null
2025-07-22 Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning Hongyin Luo et.al. 2507.16784 null
2025-07-22 Cooling Matters: Benchmarking Large Language Models and Vision-Language Models on Liquid-Cooled Versus Air-Cooled H100 GPU Systems Imran Latif et.al. 2507.16781 null
2025-07-22 When LLMs Copy to Think: Uncovering Copy-Guided Attacks in Reasoning LLMs Yue Li et.al. 2507.16773 null
2025-07-22 WGRAMMAR: Leverage Prior Knowledge to Accelerate Structured Decoding Ran Wang et.al. 2507.16768 null
2025-07-22 Never Come Up Empty: Adaptive HyDE Retrieval for Improving LLM Developer Support Fangjian Lei et.al. 2507.16754 null
2025-07-22 CMP: A Composable Meta Prompt for SAM-Based Cross-Domain Few-Shot Segmentation Shuai Chen et.al. 2507.16753 null
2025-07-22 Collaborative Inference and Learning between Edge SLMs and Cloud LLMs: A Survey of Algorithms, Execution, and Open Challenges Senyao Li et.al. 2507.16731 null
2025-07-22 Deliberative Searcher: Improving LLM Reliability via Reinforcement Learning with constraints Zhenyun Yin et.al. 2507.16727 null
2025-07-22 Enhancing Remote Sensing Vision-Language Models Through MLLM and LLM-Based High-Quality Image-Text Dataset Generation Yiguo He et.al. 2507.16716 null
2025-07-22 Advancing Risk and Quality Assurance: A RAG Chatbot for Improved Regulatory Compliance Lars Hillebrand et.al. 2507.16711 null
2025-07-22 Biases in LLM-Generated Musical Taste Profiles for Recommendation Bruno Sguerra et.al. 2507.16708 null
2025-07-22 FISHER: A Foundation Model for Multi-Modal Industrial Signal Comprehensive Representation Pingyi Fan et.al. 2507.16696 null
2025-07-22 Generating Search Explanations using Large Language Models Arif Laksito et.al. 2507.16692 null
2025-07-22 PICACO: Pluralistic In-Context Value Alignment of LLMs via Total Correlation Optimization Han Jiang et.al. 2507.16679 null
2025-07-22 Custom Algorithm-based Fault Tolerance for Attention Layers in Transformers Vasileios Titopoulos et.al. 2507.16676 null
2025-07-22 Meta-Learning for Cold-Start Personalization in Prompt-Tuned LLMs Yushang Zhao et.al. 2507.16672 null
2025-07-22 VulCoCo: A Simple Yet Effective Method for Detecting Vulnerable Code Clones Tan Bui et.al. 2507.16661 null
2025-07-22 P-CoT: A Pedagogically-motivated Participatory Chain-of-Thought Prompting for Phonological Reasoning in LLMs Dongjun Jang et.al. 2507.16656 null
2025-07-22 Towards Automated Regulatory Compliance Verification in Financial Auditing with Large Language Models Armin Berger et.al. 2507.16642 null
2025-07-22 Step-Audio 2 Technical Report Boyong Wu et.al. 2507.16632 null
2025-07-22 Automatic Fine-grained Segmentation-assisted Report Generation Frederic Jonske et.al. 2507.16623 null
2025-07-22 On the Effectiveness of LLM-as-a-judge for Code Generation and Summarization Giuseppe Crupi et.al. 2507.16587 null
2025-07-22 LLMxCPG: Context-Aware Vulnerability Detection Through Code Property Graph-Guided Large Language Models Ahmed Lekssays et.al. 2507.16585 null
2025-07-22 From Text to Actionable Intelligence: Automating STIX Entity and Relationship Extraction Ahmed Lekssays et.al. 2507.16576 null
2025-07-22 Pixels to Principles: Probing Intuitive Physics Understanding in Multimodal Language Models Mohamad Ballout et.al. 2507.16572 null
2025-07-22 TTMBA: Towards Text To Multiple Sources Binaural Audio Generation Yuxuan He et.al. 2507.16564 null
2025-07-22 Exploring Gender Bias in Large Language Models: An In-depth Dive into the German Language Kristin Gnadt et.al. 2507.16557 null
2025-07-22 Alternative Loss Function in Evaluation of Transformer Models Jakub Michańków et.al. 2507.16548 null
2025-07-22 Learning Text Styles: A Study on Transfer, Attribution, and Verification Zhiqiang Hu et.al. 2507.16530 null
2025-07-22 Spatial 3D-LLM: Exploring Spatial Awareness in 3D Vision-Language Models Xiaoyan Wang et.al. 2507.16524 null
2025-07-22 C2-Evo: Co-Evolving Multimodal Data and Model for Self-Improving Reasoning Xiuwei Chen et.al. 2507.16518 null
2025-07-22 The Ever-Evolving Science Exam Junying Wang et.al. 2507.16514 null
2025-07-22 Agentic RAG with Knowledge Graphs for Complex Multi-Hop Reasoning in Real-World Applications Jean Lelong et.al. 2507.16507 null
2025-07-22 ICR Probe: Tracking Hidden State Dynamics for Reliable Hallucination Detection in LLMs Zhenliang Zhang et.al. 2507.16488 null
2025-07-22 ACT: Bridging the Gap in Code Translation through Synthetic Data Generation & Adaptive Training Shreya Saxena et.al. 2507.16478 null
2025-07-22 Learning Temporal Abstractions via Variational Homomorphisms in Option-Induced Abstract MDPs Chang Li et.al. 2507.16473 null
2025-07-22 Towards Enforcing Company Policy Adherence in Agentic Workflows Naama Zwerdling et.al. 2507.16459 null
2025-07-22 An approach to measuring the performance of Automatic Speech Recognition (ASR) models in the context of Large Language Model (LLM) powered applications Sujith Pulikodan et.al. 2507.16456 null
2025-07-22 VGGT-Long: Chunk it, Loop it, Align it -- Pushing VGGT's Limits on Kilometer-scale Long RGB Sequences Kai Deng et.al. 2507.16443 null
2025-07-22 Exploring Large Language Models for Analyzing and Improving Method Names in Scientific Code Gunnar Larsen et.al. 2507.16439 null
2025-07-22 Identifying Pre-training Data in LLMs: A Neuron Activation-Based Detection Framework Hongyi Tang et.al. 2507.16414 null
2025-07-22 GG-BBQ: German Gender Bias Benchmark for Question Answering Shalaka Satheesh et.al. 2507.16410 null
2025-07-22 Improving Code LLM Robustness to Prompt Perturbations via Layer-Aware Model Editing Shuhan Liu et.al. 2507.16407 null
2025-07-22 Sparse-View 3D Reconstruction: Recent Advances and Open Challenges Tanveer Younis et.al. 2507.16406 null
2025-07-22 LLM-Driven Collaborative Model for Untangling Commits via Explicit and Implicit Dependency Reasoning Bo Hou et.al. 2507.16395 null
2025-07-22 Are Foundation Models All You Need for Zero-shot Face Presentation Attack Detection? Lazaro Janier Gonzalez-Sole et.al. 2507.16393 null
2025-07-22 A general model for frictional contacts in colloidal systems Kay Hofmann et.al. 2507.16388 null
2025-07-22 Application of LLM Guided Reinforcement Learning in Formation Control with Collision Avoidance Chenhao Yao et.al. 2507.16382 null
2025-07-22 Depth Gives a False Sense of Privacy: LLM Internal States Inversion Tian Dong et.al. 2507.16372 null
2025-07-22 One Polyp Identifies All: One-Shot Polyp Segmentation with SAM via Cascaded Priors and Iterative Prompt Evolution Xinyu Mao et.al. 2507.16337 null
2025-07-22 Re:Form -- Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny Chuanhao Yan et.al. 2507.16331 null
2025-07-22 DREAM: Scalable Red Teaming for Text-to-Image Generative Systems via Distribution Modeling Boheng Li et.al. 2507.16329 null
2025-07-22 M-SpecGene: Generalized Foundation Model for RGBT Multispectral Vision Kailai Zhou et.al. 2507.16318 null
2025-07-22 Perovskite-R1: A Domain-Specialized LLM for Intelligent Discovery of Precursor Additives and Experimental Design Xin-De Wang et.al. 2507.16307 null
2025-07-22 Talking Like a Phisher: LLM-Based Attacks on Voice Phishing Classifiers Wenhao Li et.al. 2507.16291 null
2025-07-22 Dens3R: A Foundation Model for 3D Geometry Prediction Xianze Fang et.al. 2507.16290 null
2025-07-22 Time to Split: Exploring Data Splitting Strategies for Offline Evaluation of Sequential Recommenders Danil Gusak et.al. 2507.16289 null
2025-07-22 Beyond Label Semantics: Language-Guided Action Anatomy for Few-shot Action Recognition Zefeng Qian et.al. 2507.16287 null
2025-07-22 Reducing GPU Memory Fragmentation via Spatio-Temporal Planning for Efficient Large-Scale Model Training Zixiao Huang et.al. 2507.16274 null
2025-07-22 Beyond Isolated Dots: Benchmarking Structured Table Construction as Deep Knowledge Extraction Tianyun Zhong et.al. 2507.16271 null
2025-07-22 iShumei-Chinchunmei at SemEval-2025 Task 4: A balanced forgetting and retention multi-task framework using effective unlearning loss Yujian Sun et.al. 2507.16263 null
2025-07-22 Edge-case Synthesis for Fisheye Object Detection: A Data-centric Perspective Seunghyeon Kim et.al. 2507.16254 null
2025-07-22 Efficient RL for optimizing conversation level outcomes with an LLM-based tutor Hyunji Nam et.al. 2507.16252 null
2025-07-22 eX-NIDS: A Framework for Explainable Network Intrusion Detection Leveraging Large Language Models Paul R. B. Houssel et.al. 2507.16241 null
2025-07-22 Scale Your Instructions: Enhance the Instruction-Following Fidelity of Unified Image Generation Model by Self-Adaptive Attention Scaling Chao Zhou et.al. 2507.16240 null
2025-07-22 LLM-Enhanced Reranking for Complementary Product Recommendation Zekun Xu et.al. 2507.16237 null
2025-07-22 Voice-based AI Agents: Filling the Economic Gaps in Digital Health Delivery Bo Wen et.al. 2507.16229 null
2025-07-22 Distilled Large Language Model in Confidential Computing Environment for System-on-Chip Design Dong Ben et.al. 2507.16226 null
2025-07-22 Towards Compute-Optimal Many-Shot In-Context Learning Shahriar Golchin et.al. 2507.16217 null
2025-07-22 Advancing Visual Large Language Model for Multi-granular Versatile Perception Wentao Xiang et.al. 2507.16213 null
2025-07-22 LOCOFY Large Design Models -- Design to code conversion solution Sohaib Muhammad et.al. 2507.16208 null
2025-07-22 A Human-Centered Approach to Identifying Promises, Risks, & Challenges of Text-to-Image Generative AI in Radiology Katelyn Morrison et.al. 2507.16207 null
2025-07-22 RealBench: Benchmarking Verilog Generation Models with Real-World IP Designs Pengwei Jin et.al. 2507.16200 null
2025-07-22 WakenLLM: A Fine-Grained Benchmark for Evaluating LLM Reasoning Potential and Reasoning Process Stability Zipeng Ling et.al. 2507.16199 null
2025-07-22 Do Large Language Models Have a Planning Theory of Mind? Evidence from MindGames: a Multi-Step Persuasion Task Jared Moore et.al. 2507.16196 null
2025-07-22 Emergent Cognitive Convergence via Implementation: A Structured Loop Reflecting Four Theories of Mind (A Position Paper) Myung Ho Kim et.al. 2507.16184 null
2025-07-22 LLM Data Selection and Utilization via Dynamic Bi-level Optimization Yang Yu et.al. 2507.16178 null
2025-07-22 SpiroLLM: Finetuning Pretrained LLMs to Understand Spirogram Time Series with Clinical Validation in COPD Reporting Shuhao Mei et.al. 2507.16145 null
2025-07-22 Disability Across Cultures: A Human-Centered Audit of Ableism in Western and Indic LLMs Mahika Phutane et.al. 2507.16130 null
2025-07-22 Benchmarking LLM Privacy Recognition for Social Robot Decision Making Dakota Sullivan et.al. 2507.16124 null
2025-07-22 PUSA V1.0: Surpassing Wan-I2V with $500 Training Cost by Vectorized Timestep Adaptation Yaofang Liu et.al. 2507.16116 null
2025-07-21 Expert-Guided LLM Reasoning for Battery Discovery: From AI-Driven Hypothesis to Synthesis and Characterization Shengchao Liu et.al. 2507.16110 null
2025-07-21 Efficient Compositional Multi-tasking for On-device Large Language Models Ondrej Bohdal et.al. 2507.16083 null
2025-07-21 The Prompt Makes the Person(a): A Systematic Evaluation of Sociodemographic Persona Prompting for Large Language Models Marlene Lutz et.al. 2507.16076 null
2025-07-21 Deep Researcher with Test-Time Diffusion Rujun Han et.al. 2507.16075 null
2025-07-21 Compositional Coordination for Multi-Robot Teams with Large Language Models Zhehui Huang et.al. 2507.16068 null
2025-07-21 AI-Powered Commit Explorer (APCE) Yousab Grees et.al. 2507.16063 null
2025-07-21 AutoMeet: a proof-of-concept study of genAI to automate meetings in automotive engineering Simon Baeuerle et.al. 2507.16054 null
2025-07-21 Making REST APIs Agent-Ready: From OpenAPI to Model Context Protocol Servers for Tool-Augmented LLMs Meriem Mastouri et.al. 2507.16044 null
2025-07-21 A Pilot Study on LLM-Based Agentic Translation from Android to iOS: Pitfalls and Insights Zhili Zeng et.al. 2507.16037 null
2025-07-21 From Logic to Language: A Trust Index for Problem Solving with LLMs Tehseen Rug et.al. 2507.16028 null
2025-07-21 AI, Expert or Peer? -- Examining the Impact of Perceived Feedback Source on Pre-Service Teachers Feedback Perception and Uptake Lucas Jasper Jacobsen et.al. 2507.16013 null
2025-07-21 Diffusion Beats Autoregressive in Data-Constrained Settings Mihir Prabhudesai et.al. 2507.15857 null
2025-07-21 Latent Denoising Makes Good Visual Tokenizers Jiawei Yang et.al. 2507.15856 null
2025-07-21 Gemini 2.5 Pro Capable of Winning Gold at IMO 2025 Yichen Huang et.al. 2507.15855 null
2025-07-21 The Other Mind: How Language Models Exhibit Human Temporal Cognition Lingyu Li et.al. 2507.15851 null
2025-07-21 3LM: Bridging Arabic, STEM, and Code through Benchmarking Basma El Amel Boussaha et.al. 2507.15850 null
2025-07-21 The Impact of Language Mixing on Bilingual LLM Reasoning Yihao Li et.al. 2507.15849 null
2025-07-21 FASTGEN: Fast and Cost-Effective Synthetic Tabular Data Generation with LLMs Anh Nguyen et.al. 2507.15839 null
2025-07-21 Just Ask for Music (JAM): Multimodal and Personalized Natural Language Music Recommendation Alessandro B. Melchiorre et.al. 2507.15826 null
2025-07-21 ACS: An interactive framework for conformal selection Yu Gui et.al. 2507.15825 null
2025-07-21 Can Your Model Separate Yolks with a Water Bottle? Benchmarking Physical Commonsense Understanding in Video Generation Models Enes Sanli et.al. 2507.15824 null
2025-07-21 Do AI models help produce verified bug fixes? Li Huang et.al. 2507.15822 null
2025-07-21 LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra Seth Karten et.al. 2507.15815 null
2025-07-21 Diffusion models for multivariate subsurface generation and efficient probabilistic inversion Roberto Miele et.al. 2507.15809 null
2025-07-21 True Multimodal In-Context Learning Needs Attention to the Visual Context Shuo Chen et.al. 2507.15807 null
2025-07-21 ConformalSAM: Unlocking the Potential of Foundational Segmentation Models in Semi-Supervised Semantic Segmentation with Conformal Prediction Danhui Chen et.al. 2507.15803 null
2025-07-21 Regularized Low-Rank Adaptation for Few-Shot Organ Segmentation Ghassen Baklouti et.al. 2507.15793 null
2025-07-21 Small LLMs Do Not Learn a Generalizable Theory of Mind via Reinforcement Learning Sneheel Sarangi et.al. 2507.15788 null
2025-07-21 Reservoir Computing as a Language Model Felix Köster et.al. 2507.15779 null
2025-07-21 Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR Jiakang Wang et.al. 2507.15778 null
2025-07-21 Left Leaning Models: AI Assumptions on Economic Policy Maxim Chupilkin et.al. 2507.15771 null
2025-07-21 A Framework for Analyzing Abnormal Emergence in Service Ecosystems Through LLM-based Agent Intention Mining Yifan Shen et.al. 2507.15770 null
2025-07-21 GasAgent: A Multi-Agent Framework for Automated Gas Optimization in Smart Contracts Jingyi Zheng et.al. 2507.15761 null
2025-07-21 Understanding Large Language Models' Ability on Interdisciplinary Research Yuanhao Shen et.al. 2507.15736 null
2025-07-21 Gaze-supported Large Language Model Framework for Bi-directional Human-Robot Interaction Jens V. Rüppel et.al. 2507.15729 null
2025-07-21 TokensGen: Harnessing Condensed Tokens for Long Video Generation Wenqi Ouyang et.al. 2507.15728 null
2025-07-21 A Practical Investigation of Spatially-Controlled Image Generation with Transformers Guoxuan Xia et.al. 2507.15724 null
2025-07-21 BEnchmarking LLMs for Ophthalmology (BELO) for Ophthalmological Knowledge and Reasoning Sahana Srinivasan et.al. 2507.15717 null
2025-07-21 Chinchunmei at SemEval-2025 Task 11: Boosting the Large Language Model's Capability of Emotion Perception using Contrastive Learning Tian Li et.al. 2507.15714 null
2025-07-21 Is Large Language Model Performance on Reasoning Tasks Impacted by Different Ways Questions Are Asked? Seok Hwan Song et.al. 2507.15707 null
2025-07-21 Estimating Rate-Distortion Functions Using the Energy-Based Model Shitong Wu et.al. 2507.15700 null
2025-07-21 CoLD: Counterfactually-Guided Length Debiasing for Process Reward Models Congmin Zheng et.al. 2507.15698 null
2025-07-21 Surfacing Variations to Calibrate Perceived Reliability of MLLM-generated Image Descriptions Meng Chen et.al. 2507.15692 null
2025-07-21 P3: Prompts Promote Prompting Xinyu Zhang et.al. 2507.15675 null
2025-07-21 BugScope: Learn to Find Bugs Like Human Jinyao Guo et.al. 2507.15671 null
2025-07-21 VeriRAG: A Retrieval-Augmented Framework for Automated RTL Testability Repair Haomin Qi et.al. 2507.15664 null
2025-07-21 SustainDiffusion: Optimising the Social and Environmental Sustainability of Stable Diffusion Models Giordano d'Aloisio et.al. 2507.15663 null
2025-07-21 HW-MLVQA: Elucidating Multilingual Handwritten Document Understanding with a Comprehensive VQA Benchmark Aniket Pal et.al. 2507.15655 null
2025-07-21 Extracting Visual Facts from Intermediate Layers for Mitigating Hallucinations in Multimodal Large Language Models Haoran Zhou et.al. 2507.15652 null
2025-07-21 Data Mixing Agent: Learning to Re-weight Domains for Continual Pre-training Kailai Yang et.al. 2507.15640 null
2025-07-21 DHEvo: Data-Algorithm Based Heuristic Evolution for Generalizable MILP Solving Zhihao Zhang et.al. 2507.15615 null
2025-07-21 Multi-Stage Prompt Inference Attacks on Enterprise LLM Systems Andrii Balashov et.al. 2507.15613 null
2025-07-21 CylinderPlane: Nested Cylinder Representation for 3D-aware Image Generation Ru Jia et.al. 2507.15606 null
2025-07-21 Applying the Chinese Wall Reverse Engineering Technique to Large Language Model Code Editing Manatsawin Hanmongkolchai et.al. 2507.15599 null
2025-07-21 Learning to Extract Rational Evidence via Reinforcement Learning for Retrieval-Augmented Generation Xinping Zhao et.al. 2507.15586 null
2025-07-21 DynImg: Key Frames with Visual Prompts are Good Representation for Multi-Modal Video Understanding Xiaoyi Bao et.al. 2507.15569 null
2025-07-21 Evaluating Text Style Transfer: A Nine-Language Benchmark for Text Detoxification Vitaly Protasov et.al. 2507.15557 null
2025-07-21 Efficient Routing of Inference Requests across LLM Instances in Cloud-Edge Computing Shibo Yu et.al. 2507.15553 null
2025-07-21 RankMixer: Scaling Up Ranking Models in Industrial Recommenders Jie Zhu et.al. 2507.15551 null
2025-07-21 PhysGym: Benchmarking LLMs in Interactive Physics Discovery with Controlled Priors Yimeng Chen et.al. 2507.15550 null
2025-07-21 LLM world models are mental: Output layer evidence of brittle world model use in LLM mechanical reasoning Cole Robertson et.al. 2507.15521 null
2025-07-21 HAMLET: Hyperadaptive Agent-based Modeling for Live Embodied Theatrics Sizhou Chen et.al. 2507.15518 null
2025-07-21 Step-level Verifier-guided Hybrid Test-Time Scaling for Large Language Models Kaiyan Chang et.al. 2507.15512 null
2025-07-21 ASPERA: A Simulated Environment to Evaluate Planning for Complex Action Execution Alexandru Coca et.al. 2507.15501 null
2025-07-21 PhishIntentionLLM: Uncovering Phishing Website Intentions through Multi-Agent Retrieval-Augmented Generation Wenhao Li et.al. 2507.15419 null
2025-07-21 PDEformer-2: A Versatile Foundation Model for Two-Dimensional Partial Differential Equations Zhanhong Ye et.al. 2507.15409 null
2025-07-21 PiMRef: Detecting and Explaining Ever-evolving Spear Phishing Emails with Knowledge Base Invariants Ruofan Liu et.al. 2507.15393 null
2025-07-21 DAViD: Data-efficient and Accurate Vision Models from Synthetic Data Fatemeh Saleh et.al. 2507.15365 null
2025-07-21 Revisiting the Effect of Grid-Following Converter on Frequency Dynamics -- Part I: Center of Inertia Jiahao Liu et.al. 2507.15358 null
2025-07-21 Metaphor and Large Language Models: When Surface Features Matter More than Deep Understanding Elisa Sanchez-Bayona et.al. 2507.15357 null
2025-07-21 RAD: Retrieval High-quality Demonstrations to Enhance Decision-making Lu Guo et.al. 2507.15356 null
2025-07-21 Scaling Decentralized Learning with FLock Zehua Cheng et.al. 2507.15349 null
2025-07-21 Probing Information Distribution in Transformer Architectures through Entropy Analysis Amedeo Buonanno et.al. 2507.15347 null
2025-07-21 StackTrans: From Large Language Model to Large Pushdown Automata Model Kechi Zhang et.al. 2507.15343 null
2025-07-21 Reasoning Models are Test Exploiters: Rethinking Multiple-Choice Narun Raman et.al. 2507.15337 null
2025-07-21 On the Inevitability of Left-Leaning Political Bias in Aligned Language Models Thilo Hagendorff et.al. 2507.15328 null
2025-07-21 BenchDepth: Are We on the Right Way to Evaluate Depth Foundation Models? Zhenyu Li et.al. 2507.15321 null
2025-07-21 Butterfly Effects in Toolchains: A Comprehensive Analysis of Failed Parameter Filling in LLM Tool-Agent Systems Qian Xiong et.al. 2507.15296 null
2025-07-21 A Novel Self-Evolution Framework for Large Language Models Haoran Sun et.al. 2507.15281 null
2025-07-21 ChiMed 2.0: Advancing Chinese Medical Dataset in Facilitating Large Language Modeling Yuanhe Tian et.al. 2507.15275 null
2025-07-21 Conditional Video Generation for High-Efficiency Video Compression Fangqiu Yi et.al. 2507.15269 null
2025-07-21 IM-Chat: A Multi-agent LLM-based Framework for Knowledge Transfer in Injection Molding Industry Junhyeong Lee et.al. 2507.15268 null
2025-07-21 VLM-UDMC: VLM-Enhanced Unified Decision-Making and Motion Control for Urban Autonomous Driving Haichao Liu et.al. 2507.15266 null
2025-07-21 CHORDS: Diffusion Sampling Accelerator with Multi-core Hierarchical ODE Solvers Jiaqi Han et.al. 2507.15260 null
2025-07-21 MEETI: A Multimodal ECG Dataset from MIMIC-IV-ECG with Signals, Images, Features and Interpretations Deyun Zhang et.al. 2507.15255 null
2025-07-21 Input Reduction Enhanced LLM-based Program Repair Boyang Yang et.al. 2507.15251 null
2025-07-21 FreeCus: Free Lunch Subject-driven Customization in Diffusion Transformers Yanbing Zhang et.al. 2507.15249 null
2025-07-21 SPAR: Scholar Paper Retrieval with LLM-based Agents for Enhanced Academic Search Xiaofeng Shi et.al. 2507.15245 null
2025-07-21 Mammo-SAE: Interpreting Breast Cancer Concept Learning with Sparse Autoencoders Krishna Kanth Nakka et.al. 2507.15227 null
2025-07-21 Solving Formal Math Problems by Decomposition and Iterative Reflection Yichi Zhou et.al. 2507.15225 null
2025-07-21 SimdBench: Benchmarking Large Language Models for SIMD-Intrinsic Code Generation Yibo He et.al. 2507.15224 null
2025-07-21 Hierarchical Part-based Generative Model for Realistic 3D Blood Vessel Siqi Chen et.al. 2507.15223 null
2025-07-21 Improving Joint Embedding Predictive Architecture with Diffusion Noise Yuping Qiu et.al. 2507.15216 null
2025-07-21 Collaborative Distillation Strategies for Parameter-Efficient Language Model Deployment Xiandong Meng et.al. 2507.15198 null
2025-07-21 Better Models and Algorithms for Learning Ising Models from Dynamics Jason Gaitonde et.al. 2507.15173 null
2025-07-20 What Level of Automation is "Good Enough"? A Benchmark of Large Language Models for Meta-Analysis Data Extraction Lingbo Li et.al. 2507.15152 null
2025-07-20 Enhancing Visual Planning with Auxiliary Tasks and Multi-token Prediction Ce Zhang et.al. 2507.15130 null
2025-07-20 AnalogFed: Federated Discovery of Analog Circuit Topologies with Generative AI Qiufeng Li et.al. 2507.15104 null
2025-07-20 Filling the Gap: Is Commonsense Knowledge Generation useful for Natural Language Inference? Chathuri Jayaweera et.al. 2507.15100 null
2025-07-20 BleedOrigin: Dynamic Bleeding Source Localization in Endoscopic Submucosal Dissection via Dual-Stage Detection and Tracking Mengya Xu et.al. 2507.15094 null
2025-07-20 A Penalty Goes a Long Way: Measuring Lexical Diversity in Synthetic Texts Under Prompt-Influenced Length Variations Vijeta Deshpande et.al. 2507.15092 null
2025-07-20 Aesthetics is Cheap, Show me the Text: An Empirical Evaluation of State-of-the-Art Generative Models for OCR Peirong Zhang et.al. 2507.15085 null
2025-07-20 Time-RA: Towards Time Series Reasoning for Anomaly with LLM Feedback Yiyuan Yang et.al. 2507.15066 null
2025-07-20 WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization Zhengwei Tao et.al. 2507.15061 null
2025-07-20 LibLMFuzz: LLM-Augmented Fuzz Target Generation for Black-box Libraries Ian Hardgrove et.al. 2507.15058 null
2025-07-20 Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding Yuanhan Zhang et.al. 2507.15028 null
2025-07-20 Deep Generative Models in Condition and Structural Health Monitoring: Opportunities, Limitations and Future Outlook Xin Yang et.al. 2507.15026 null
2025-07-20 Survey of GenAI for Automotive Software Development: From Requirements to Executable Code Nenad Petrovic et.al. 2507.15025 null
2025-07-20 RefCritic: Training Long Chain-of-Thought Critic Models with Refinement Feedback Qiaoyu Tang et.al. 2507.15024 null
2025-07-20 EduThink4AI: Translating Educational Critical Thinking into Multi-Agent LLM Systems Xinmeng Hou et.al. 2507.15015 null
2025-07-20 Language Integration in Fine-Tuning Multimodal Large Language Models for Image-Based Regression Roy H. Jennings et.al. 2507.14997 null
2025-07-18 Franca: Nested Matryoshka Clustering for Scalable Visual Representation Learning Shashanka Venkataramanan et.al. 2507.14137 null
2025-07-18 NoHumansRequired: Autonomous High-Quality Image Editing Triplet Mining Maksim Kuprashevich et.al. 2507.14119 null
2025-07-18 CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning Xiaoya Li et.al. 2507.14111 null
2025-07-18 Automated Interpretation of Non-Destructive Evaluation Contour Maps Using Large Language Models for Bridge Condition Assessment Viraj Nishesh Darji et.al. 2507.14107 null
2025-07-18 Generative AI-Driven High-Fidelity Human Motion Simulation Hari Iyer et.al. 2507.14097 null
2025-07-18 Lessons from the TREC Plain Language Adaptation of Biomedical Abstracts (PLABA) track Brian Ondov et.al. 2507.14096 null
2025-07-18 DPMT: Dual Process Multi-scale Theory of Mind Framework for Real-time Human-AI Collaboration Xiyun Li et.al. 2507.14088 null
2025-07-18 DENSE: Longitudinal Progress Note Generation with Temporal Modeling of Heterogeneous Clinical Notes Across Hospital Visits Garapati Keerthana et.al. 2507.14079 null
2025-07-18 Foundation Models as Class-Incremental Learners for Dermatological Image Classification Mohamed Elkhayat et.al. 2507.14050 null
2025-07-18 Evaluating the Effectiveness of Cost-Efficient Large Language Models in Benchmark Biomedical Tasks Israt Jahan et.al. 2507.14045 null
2025-07-18 TGIF: Talker Group-Informed Familiarization of Target Speaker Extraction Tsun-An Hsieh et.al. 2507.14044 null
2025-07-18 Architecting Human-AI Cocreation for Technical Services -- Interaction Modes and Contingency Factors Jochen Wulf et.al. 2507.14034 null
2025-07-18 KROMA: Ontology Matching with Knowledge Retrieval and Large Language Models Lam Nguyen et.al. 2507.14032 null
2025-07-18 Moodifier: MLLM-Enhanced Emotion-Driven Image Editing Jiarong Ye et.al. 2507.14024 null
2025-07-18 Efficient Temporal Tokenization for Mobility Prediction with Large Language Models Haoyu He et.al. 2507.14017 null
2025-07-18 Leveraging Pathology Foundation Models for Panoptic Segmentation of Melanoma in H&E Images Jiaqi Lv et.al. 2507.13974 null
2025-07-18 DUALRec: A Hybrid Sequential and Language Model Framework for Context-Aware Movie Recommendation Yitong Li et.al. 2507.13957 null
2025-07-18 Cross-modal Causal Intervention for Alzheimer's Disease Prediction Yutao Jin et.al. 2507.13956 null
2025-07-18 Exploiting Primacy Effect To Improve Large Language Models Bianca Raimondi et.al. 2507.13949 null
2025-07-18 Generalist Forecasting with Frozen Video Models via Latent Diffusion Jacob C Walker et.al. 2507.13942 null
2025-07-18 Preprint: Did I Just Browse A Website Written by LLMs? Sichang "Steven" He et.al. 2507.13933 null
2025-07-18 Enhancing LiDAR Point Features with Foundation Model Priors for 3D Object Detection Yujian Mo et.al. 2507.13899 null
2025-07-18 Using LLMs to identify features of personal and professional skills in an open-response situational judgment test Cole Walsh et.al. 2507.13881 null
2025-07-18 Large Language Models as Innovators: A Framework to Leverage Latent Space Exploration for Novelty Discovery Mateusz Bystroński et.al. 2507.13874 null
2025-07-18 SPARQL Query Generation with LLMs: Measuring the Impact of Training Data Memorization and Knowledge Injection Aleksandr Gashkov et.al. 2507.13859 null
2025-07-18 InTraVisTo: Inside Transformer Visualisation Tool Nicolò Brunello et.al. 2507.13858 null
2025-07-18 DistFlow: A Fully Distributed RL Framework for Scalable and Efficient LLM Post-Training Zhixin Wang et.al. 2507.13833 null
2025-07-18 Question-Answer Extraction from Scientific Articles Using Knowledge Graphs and Large Language Models Hosein Azarbonyad et.al. 2507.13827 null
2025-07-18 RAG-based Architectures for Drug Side Effect Retrieval in LLMs Shad Nygren et.al. 2507.13822 null
2025-07-18 Team of One: Cracking Complex Video QA with Model Synergy Jun Xie et.al. 2507.13820 null
2025-07-18 CodeEdu: A Multi-Agent Collaborative Platform for Personalized Coding Education Jianing Zhao et.al. 2507.13814 null
2025-07-18 SkySense V2: A Unified Foundation Model for Multi-modal Remote Sensing Yingying Zhang et.al. 2507.13812 null
2025-07-18 On-the-Fly Fine-Tuning of Foundational Neural Network Potentials: A Bayesian Neural Network Approach Tim Rensmeyer et.al. 2507.13805 null
2025-07-18 MolPIF: A Parameter Interpolation Flow Model for Molecule Generation Yaowei Jin et.al. 2507.13762 null
2025-07-18 PRIDE -- Parameter-Efficient Reduction of Identity Discrimination for Equality in LLMs Maluna Menke et.al. 2507.13743 null
2025-07-18 Can Synthetic Images Conquer Forgetting? Beyond Unexplored Doubts in Few-Shot Class-Incremental Learning Junsu Kim et.al. 2507.13739 null
2025-07-18 DailyLLM: Context-Aware Activity Log Generation Using Multi-Modal Sensors and LLMs Ye Tian et.al. 2507.13737 null
2025-07-18 The Judge Variable: Challenging Judge-Agnostic Legal Judgment Prediction Guillaume Zambrano et.al. 2507.13732 null
2025-07-18 LLaPipe: LLM-Guided Reinforcement Learning for Automated Data Preparation Pipeline Construction Jing Chang et.al. 2507.13712 null
2025-07-18 CogniQ-H: A Soft Hierarchical Reinforcement Learning Paradigm for Automated Data Preparation Jing Chang et.al. 2507.13710 null
2025-07-18 Consistent Explainers or Unreliable Narrators? Understanding LLM-generated Group Recommendations Cedric Waterschoot et.al. 2507.13705 null
2025-07-18 TopicAttack: An Indirect Prompt Injection Attack via Topic Transition Yulin Chen et.al. 2507.13686 null
2025-07-18 LoopServe: An Adaptive Dual-phase LLM Inference Acceleration System for Multi-Turn Dialogues Haoyang Li et.al. 2507.13681 null
2025-07-18 KiC: Keyword-inspired Cascade for Cost-Efficient Text Generation with LLMs Woo-Chan Kim et.al. 2507.13666 null
2025-07-18 CU-ICU: Customizing Unsupervised Instruction-Finetuned Language Models for ICU Datasets via Text-to-Text Transfer Transformer Teerapong Panboonyuen et.al. 2507.13655 null
2025-07-18 Towards channel foundation models (CFMs): Motivations, methodologies and opportunities Jun Jiang et.al. 2507.13637 null
2025-07-18 Large Language Models in Cybersecurity: Applications, Vulnerabilities, and Defense Techniques Niveen O. Jaffal et.al. 2507.13629 null
2025-07-18 BifrostRAG: Bridging Dual Knowledge Graphs for Multi-Hop Question Answering in Construction Safety Yuxin Zhang et.al. 2507.13625 null
2025-07-18 Seed-X: Building Strong Multilingual Translation LLM with 7B Parameters Shanbo Cheng et.al. 2507.13618 null
2025-07-18 Linguistic and Embedding-Based Profiling of Texts generated by Humans and Large Language Models Sergio E. Zanotto et.al. 2507.13614 null
2025-07-18 CoTasks: Chain-of-Thought based Video Instruction Tuning Tasks Yanan Wang et.al. 2507.13609 null
2025-07-18 GIFT: Gradient-aware Immunization of diffusion models against malicious Fine-Tuning with safe concepts retention Amro Abdalla et.al. 2507.13598 null
2025-07-17 A Collaborative Framework Integrating Large Language Model and Chemical Fragment Space: Mutual Inspiration for Lead Design Hao Tuo et.al. 2507.13580 null
2025-07-17 Learning Pluralistic User Preferences through Reinforcement Learning Fine-tuned Summaries Hyunji Nam et.al. 2507.13579 null
2025-07-17 LLM-Based Community Surveys for Operational Decision Making in Interconnected Utility Infrastructures Adaeze Okeukwu-Ogbonnaya et.al. 2507.13577 null
2025-07-17 Apple Intelligence Foundation Language Models: Tech Report 2025 Hanzhi Zhou et.al. 2507.13575 null
2025-07-17 Temporal Adaptation of Pre-trained Foundation Models for Music Structure Analysis Yixiao Zhang et.al. 2507.13572 null
2025-07-17 A Data-Centric Framework for Addressing Phonetic and Prosodic Challenges in Russian Speech Generative Models Kirill Borodin et.al. 2507.13563 null
2025-07-17 Demystifying Feature Requests: Leveraging LLMs to Refine Feature Requests in Open-Source Software Pragyan K C et.al. 2507.13555 null
2025-07-17 GOFAI meets Generative AI: Development of Expert Systems by means of Large Language Models Eduardo C. Garrido-Merchán et.al. 2507.13550 null
2025-07-17 A Computational Approach to Modeling Conversational Systems: Analyzing Large-Scale Quasi-Patterned Dialogue Flows Mohamed Achref Ben Ammar et.al. 2507.13544 null
2025-07-17 Provable Low-Frequency Bias of In-Context Learning of Representations Yongyi Yang et.al. 2507.13540 null
2025-07-17 Revisiting Prompt Engineering: A Comprehensive Evaluation for LLM-based Personalized Recommendation Genki Kusano et.al. 2507.13525 null
2025-07-17 Humans learn to prefer trustworthy AI over human partners Yaomin Jiang et.al. 2507.13524 null
2025-07-17 GraphTrafficGPT: Enhancing Traffic Management Through Graph-Based AI Agent Coordination Nabil Abdelaziz Ferhat Taleb et.al. 2507.13511 null
2025-07-17 Fake or Real: The Impostor Hunt in Texts for Space Operations Agata Kaczmarek et.al. 2507.13508 null
2025-07-17 Revisiting LLM Value Probing Strategies: Are They Robust and Expressive? Siqi Shen et.al. 2507.13490 null
2025-07-17 Paper Summary Attack: Jailbreaking LLMs through LLM Safety Papers Liang Lin et.al. 2507.13474 null
2025-07-17 ERR@HRI 2.0 Challenge: Multimodal Detection of Errors and Failures in Human-Robot Conversations Shiye Cao et.al. 2507.13468 null
2025-07-17 "PhyWorldBench": A Comprehensive Evaluation of Physical Realism in Text-to-Video Models Jing Gu et.al. 2507.13428 null
2025-07-17 VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding Shihao Wang et.al. 2507.13353 null
2025-07-17 Hierarchical Rectified Flow Matching with Mini-Batch Couplings Yichi Zhang et.al. 2507.13350 null
2025-07-17 Imbalance in Balance: Online Concept Balancing in Generation Models Yukai Shi et.al. 2507.13345 null
2025-07-17 Comparing Apples to Oranges: A Dataset & Analysis of LLM Humour Understanding from Traditional Puns to Topical Jokes Tyler Loakman et.al. 2507.13335 null
2025-07-17 A Survey of Context Engineering for Large Language Models Lingrui Mei et.al. 2507.13334 null
2025-07-17 The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner Zhouqi Hua et.al. 2507.13332 null
2025-07-17 GeoReg: Weight-Constrained Few-Shot Regression for Socio-Economic Estimation using LLM Kyeongjin Ahn et.al. 2507.13323 null
2025-07-17 Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark Junsu Kim et.al. 2507.13314 null
2025-07-17 The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations Carlos Arriaga et.al. 2507.13302 null
2025-07-17 AbGen: Evaluating Large Language Models in Ablation Study Design and Evaluation for Scientific Research Yilun Zhao et.al. 2507.13300 null
2025-07-17 Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management Luis Gasco et.al. 2507.13275 null
2025-07-17 Automating Steering for Safe Multimodal Large Language Models Lyucheng Wu et.al. 2507.13255 null
2025-07-17 RemVerse: Supporting Reminiscence Activities for Older Adults through AI-Assisted Virtual Reality Ruohao Li et.al. 2507.13247 null
2025-07-17 HATS: Hindi Analogy Test Set for Evaluating Reasoning in Large Language Models Ashray Gupta et.al. 2507.13238 null
2025-07-17 Enhancing Cross-task Transfer of Large Language Models via Activation Steering Xinyu Tang et.al. 2507.13236 null
2025-07-17 VITA: Vision-to-Action Flow Matching Policy Dechen Gao et.al. 2507.13231 null
2025-07-18 MoTM: Towards a Foundation Model for Time Series Imputation based on Continuous Modeling Etienne Le Naour et.al. 2507.13207 null
2025-07-18 Automatically assessing oral narratives of Afrikaans and isiXhosa children Retief Louw et.al. 2507.13205 null
2025-07-17 Black Box Deployed -- Functional Criteria for Artificial Moral Agents in the LLM Era Matthew E. Brophy et.al. 2507.13175 null
2025-07-17 SHIELD: A Secure and Highly Enhanced Integrated Learning for Robust Deepfake Detection against Adversarial Attacks Kutub Uddin et.al. 2507.13170 null
2025-07-17 Online Rounding for Set Cover under Subset Arrivals Jarosław Byrka et.al. 2507.13159 null
2025-07-17 Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities Hao Sun et.al. 2507.13158 null
2025-07-17 Multi-population GAN Training: Analyzing Co-Evolutionary Algorithms Walter P. Casas et.al. 2507.13157 null
2025-07-17 SE-VLN: A Self-Evolving Vision-Language Navigation Framework Based on Multimodal Large Language Models Xiangyu Dong et.al. 2507.13152 null
2025-07-17 DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model Maulana Bisyir Azhari et.al. 2507.13145 null
2025-07-17 RIDAS: A Multi-Agent Framework for AI-RAN with Representation- and Intention-Driven Agents Kuiyuan Ding et.al. 2507.13140 null
2025-07-17 Detecting LLM-generated Code with Subtle Modification by Adversarial Training Xin Yin et.al. 2507.13123 null
2025-07-17 A Computational Framework to Identify Self-Aspects in Text Jaya Caporusso et.al. 2507.13115 null
2025-07-17 R^2MoE: Redundancy-Removal Mixture of Experts for Lifelong Concept Learning Xiaohan Guo et.al. 2507.13107 null
2025-07-17 Intelligent Virtual Sonographer (IVS): Enhancing Physician-Robot-Patient Communication Tianyu Song et.al. 2507.13052 null
2025-07-17 MAD-Spear: A Conformity-Driven Prompt Injection Attack on Multi-Agent Debate Systems Yu Cui et.al. 2507.13038 null
2025-07-17 Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities Liuyi Wang et.al. 2507.13019 null
2025-07-17 Teach Old SAEs New Domain Tricks with Boosting Nikita Koriagin et.al. 2507.12990 null
2025-07-17 A Distributed Generative AI Approach for Heterogeneous Multi-Domain Environments under Data Sharing constraints Youssef Tawfilis et.al. 2507.12979 null
2025-07-17 UniSLU: Unified Spoken Language Understanding from Heterogeneous Cross-Task Datasets Zhichao Sheng et.al. 2507.12951 null
2025-07-17 Insights into a radiology-specialised multimodal large language model with sparse autoencoders Kenza Bouzid et.al. 2507.12950 null
2025-07-17 Probabilistic Soundness Guarantees in LLM Reasoning Chains Weiqiu You et.al. 2507.12948 null
2025-07-17 Analysis of Image-and-Text Uncertainty Propagation in Multimodal Large Language Models with Cardiac MR-Based Applications Yucheng Tang et.al. 2507.12945 null
2025-07-17 Unleashing Vision Foundation Models for Coronary Artery Segmentation: Parallel ViT-CNN Encoding and Variational Fusion Caixia Dong et.al. 2507.12938 null
2025-07-17 Argus: Leveraging Multiview Images for Improved 3-D Scene Understanding With Large Language Models Yifan Xu et.al. 2507.12916 null
2025-07-17 Agentar-DeepFinance-300K: A Large-Scale Financial Dataset via Systematic Chain-of-Thought Synthesis Optimization Xiaoke Zhao et.al. 2507.12901 null
2025-07-17 Generalist Bimanual Manipulation via Foundation Video Diffusion Models Yao Feng et.al. 2507.12898 null
2025-07-17 DiffRhythm+: Controllable and Flexible Full-Length Song Generation with Preference Optimization Huakang Chen et.al. 2507.12890 null
2025-07-17 VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks Jian Yao et.al. 2507.12885 null
2025-07-17 Generative Multi-Target Cross-Domain Recommendation Jinqiu Jin et.al. 2507.12871 null
2025-07-17 Supervised Fine Tuning on Curated Data is Reinforcement Learning (and can be improved) Chongli Qin et.al. 2507.12856 null
2025-07-17 DEMONSTRATE: Zero-shot Language to Robotic Control via Multi-task Demonstration Learning Rahel Rickenbach et.al. 2507.12855 null
2025-07-17 AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning Yiming Ren et.al. 2507.12841 null
2025-07-17 Bridging the Gap: Leveraging Retrieval-Augmented Generation to Better Understand Public Concerns about Vaccines Muhammad Javed et.al. 2507.12840 null
2025-07-17 MCoT-RE: Multi-Faceted Chain-of-Thought and Re-Ranking for Training-Free Zero-Shot Composed Image Retrieval Jeong-Woo Park et.al. 2507.12819 null
2025-07-17 Large Language Models' Internal Perception of Symbolic Music Andrew Shin et.al. 2507.12808 null
2025-07-17 Semantic-guided Fine-tuning of Foundation Model for Long-tailed Visual Recognition Yufei Peng et.al. 2507.12807 null
2025-07-17 MCPEval: Automatic MCP-based Deep Evaluation for AI Agent Models Zhiwei Liu et.al. 2507.12806 null
2025-07-17 DeQA-Doc: Adapting DeQA-Score to Document Image Quality Assessment Junjie Gao et.al. 2507.12796 null
2025-07-17 Learning Robust Negation Text Representations Thinh Hung Truong et.al. 2507.12782 null
2025-07-17 A Comprehensive Survey of Electronic Health Record Modeling: From Deep Learning Approaches to Large Language Models Weijieying Ren et.al. 2507.12774 null
2025-07-17 Local Representative Token Guided Merging for Text-to-Image Generation Min-Jeong Lee et.al. 2507.12771 null
2025-07-17 Think-Before-Draw: Decomposing Emotion Semantics & Fine-Grained Controllable Expressive Talking Head Generation Hanlei Shi et.al. 2507.12761 null
2025-07-17 osmAG-LLM: Zero-Shot Open-Vocabulary Object Navigation via Semantic Maps and Large Language Models Reasoning Fujing Xie et.al. 2507.12753 null
2025-07-17 Multimodal-Guided Dynamic Dataset Pruning for Robust and Efficient Data-Centric Learning Suorong Yang et.al. 2507.12750 null
2025-07-17 Strategy Adaptation in Large Language Model Werewolf Agents Fuya Nakamori et.al. 2507.12732 null
2025-07-17 PinFM: Foundation Model for User Activity Sequences at a Billion-scale Visual Discovery Platform Xiangyi Chen et.al. 2507.12704 null
2025-07-17 Pixel Perfect MegaMed: A Megapixel-Scale Vision-Language Foundation Model for Generating High Resolution Medical Images Zahra TehraniNasab et.al. 2507.12698 null
2025-07-16 Improving Drug Identification in Overdose Death Surveillance using Large Language Models Arthur J. Funnell et.al. 2507.12679 null
2025-07-16 ParaStudent: Generating and Evaluating Realistic Student Code by Teaching LLMs to Struggle Mihran Miroyan et.al. 2507.12674 null
2025-07-16 The first open machine translation system for the Chechen language Abu-Viskhan A. Umishov et.al. 2507.12672 null
2025-07-16 Single Conversation Methodology: A Human-Centered Protocol for AI-Assisted Software Development Salvador D. Escobedo et.al. 2507.12665 null
2025-07-16 VLMgineer: Vision Language Models as Robotic Toolsmiths George Jiayuan Gao et.al. 2507.12644 null
2025-07-16 NLI4VolVis: Natural Language Interaction for Volume Visualization via LLM Multi-Agents and Editable 3D Gaussian Splatting Kuangshi Ai et.al. 2507.12621 null
2025-07-16 BootSeer: Analyzing and Mitigating Initialization Bottlenecks in Large-Scale LLM Training Rui Li et.al. 2507.12619 null
2025-07-16 Learning What Matters: Probabilistic Task Selection via Mutual Information for Model Finetuning Prateek Chanda et.al. 2507.12612 null
2025-07-16 Enhancing In-Domain and Out-Domain EmoFake Detection via Cooperative Multilingual Speech Foundation Models Orchid Chetia Phukan et.al. 2507.12595 null
2025-07-16 Assay2Mol: large language model-based drug design using BioAssay context Yifan Deng et.al. 2507.12574 null
2025-07-16 Mono-InternVL-1.5: Towards Cheaper and Faster Monolithic Multimodal Large Language Models Gen Luo et.al. 2507.12566 null
2025-07-17 PhysX: Physical-Grounded 3D Asset Generation Ziang Cao et.al. 2507.12465 null
2025-07-16 CytoSAE: Interpretable Cell Embeddings for Hematology Muhammed Furkan Dasdelen et.al. 2507.12464 null
2025-07-16 Mitigating Object Hallucinations via Sentence-Level Early Intervention Shangpin Peng et.al. 2507.12455 null
2025-07-16 Can We Predict Alignment Before Models Finish Thinking? Towards Monitoring Misaligned Reasoning Models Yik Siu Chan et.al. 2507.12428 null
2025-07-16 Advancing Retrieval-Augmented Generation for Structured Enterprise and Internal Data Chandana Cheerla et.al. 2507.12425 null
2025-07-16 SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories? Xinyi He et.al. 2507.12415 null
2025-07-16 Modeling Feasible Locomotion of Nanobots for Cancer Detection and Treatment Noble Harasha et.al. 2507.12400 null
2025-07-16 Assessing the Value of Visual Input: A Benchmark of Multimodal Large Language Models for Robotic Path Planning Jacinto Colan et.al. 2507.12391 null
2025-07-16 Web-Browsing LLMs Can Access Social Media Profiles and Infer User Demographics Meysam Alizadeh et.al. 2507.12372 null
2025-07-16 Beyond Single Models: Enhancing LLM Detection of Ambiguity in Requests through Debate Ana Davila et.al. 2507.12370 null
2025-07-16 GitChameleon: Evaluating AI Code Generation Against Python Library Version Incompatibilities Diganta Misra et.al. 2507.12367 null
2025-07-16 Thought Purity: Defense Paradigm For Chain-of-Thought Attack Zihao Xue et.al. 2507.12314 null
2025-07-16 Chain-of-Descriptions: Improving Code LLMs for VHDL Code Generation and Summarization Prashanth Vijayaraghavan et.al. 2507.12308 null
2025-07-16 Humans are more gullible than LLMs in believing common psychological myths Bevan Koopman et.al. 2507.12296 null
2025-07-16 Text-ADBench: Text Anomaly Detection Benchmark based on LLMs Embedding Feng Xiao et.al. 2507.12295 null
2025-07-16 SHACL Validation in the Presence of Ontologies: Semantics and Rewriting Techniques Anouk Oudshoorn et.al. 2507.12286 null
2025-07-16 FADE: Adversarial Concept Erasure in Flow Models Zixuan Fu et.al. 2507.12283 null
2025-07-17 Next-Gen Museum Guides: Autonomous Navigation and Visitor Interaction with an Agentic Robot Luca Garello et.al. 2507.12273 null
2025-07-16 Improving Contextual ASR via Multi-grained Fusion with Large Language Models Shilin Zhou et.al. 2507.12252 null
2025-07-16 Generate to Ground: Multimodal Text Conditioning Boosts Phrase Grounding in Medical Vision-Language Models Felix Nützel et.al. 2507.12236 null
2025-07-16 MGFFD-VLM: Multi-Granularity Prompt Learning for Face Forgery Detection with VLM Tao Chen et.al. 2507.12232 null
2025-07-16 Xiangqi-R1: Enhancing Spatial Strategic Reasoning in LLMs for Chinese Chess via Reinforcement Learning Yuhao Chen et.al. 2507.12215 null
2025-07-16 Draw an Ugly Person An Exploration of Generative AIs Perceptions of Ugliness Garyoung Kim et.al. 2507.12212 null
2025-07-16 BuildEvo: Designing Building Energy Consumption Forecasting Heuristics via LLM-driven Evolution Subin Lin et.al. 2507.12207 null
2025-07-16 Toward Efficient SpMV in Sparse LLMs via Block Extraction and Compressed Storage Junqing Lin et.al. 2507.12205 null
2025-07-16 RODS: Robust Optimization Inspired Diffusion Sampling for Detecting and Reducing Hallucination in Generative Models Yiqi Tian et.al. 2507.12201 null
2025-07-16 Multi-Component VAE with Gaussian Markov Random Field Fouad Oubari et.al. 2507.12165 null
2025-07-16 PRISM: Distributed Inference for Foundation Models at Edge Muhammad Azlan Qazi et.al. 2507.12145 null
2025-07-16 Overview of the Sensemaking Task at the ELOQUENT 2025 Lab: LLMs as Teachers, Students and Evaluators Pavel Šindelář et.al. 2507.12143 null
2025-07-16 RiemannLoRA: A Unified Riemannian Framework for Ambiguity-Free LoRA Optimization Vladimir Bogachev et.al. 2507.12142 null
2025-07-16 Room Impulse Response Generation Conditioned on Acoustic Parameters Silvia Arellano et.al. 2507.12136 null
2025-07-16 Iterative Augmentation with Summarization Refinement (IASR) Evaluation for Unstructured Survey data Modeling and Analysis Payal Bhattad et.al. 2507.12126 null
2025-07-16 Open-Vocabulary Indoor Object Grounding with 3D Hierarchical Scene Graph Sergey Linok et.al. 2507.12123 null
2025-07-16 DeepShade: Enable Shade Simulation by Text-conditioned Image Generation Longchao Da et.al. 2507.12103 null
2025-07-16 LLAMA: Multi-Feedback Smart Contract Fuzzing Framework with LLM-Guided Seed Generation Keke Gai et.al. 2507.12084 null
2025-07-16 Findings of MEGA: Maths Explanation with LLMs using the Socratic Method for Active Learning Tosin Adewumi et.al. 2507.12079 null
2025-07-16 Evaluating the Ability of Large Language Models to Reason about Cardinal Directions, Revisited Anthony G Cohn et.al. 2507.12059 null
2025-07-16 FloGAN: Scenario-Based Urban Mobility Flow Generation via Conditional GANs and Dynamic Region Decoupling Seanglidet Yean et.al. 2507.12053 null
2025-07-16 A Comparative Approach to Assessing Linguistic Creativity of Large Language Models and Humans Anca Dinu et.al. 2507.12039 null
2025-07-16 3D-MoRe: Unified Modal-Contextual Reasoning for Embodied Question Answering Rongtao Xu et.al. 2507.12026 null
2025-07-16 EME-TTS: Unlocking the Emphasis and Emotion Link in Speech Synthesis Haoxun Li et.al. 2507.12015 null
2025-07-16 DSSD: Efficient Edge-Device Deployment and Collaborative Inference via Distributed Split Speculative Decoding Jiahong Ning et.al. 2507.12000 null
2025-07-16 Can LLMs Find Fraudsters? Multi-level LLM Enhanced Graph Fraud Detection Tairan Huang et.al. 2507.11997 null
2025-07-16 Robust Planning for Autonomous Vehicles with Diffusion-Based Failure Samplers Juanran Wang et.al. 2507.11991 null
2025-07-16 Aime: Towards Fully-Autonomous Multi-Agent Framework Yexuan Shi et.al. 2507.11988 null
2025-07-16 Simplifications are Absolutists: How Simplified Language Reduces Word Sense Awareness in LLM-Generated Definitions Lukas Ellinger et.al. 2507.11981 null
2025-07-16 Value-Based Large Language Model Agent Simulation for Mutual Evaluation of Trust and Interpersonal Closeness Yuki Sakamoto et.al. 2507.11979 null
2025-07-16 Graph Representations for Reading Comprehension Analysis using Large Language Model and Eye-Tracking Biomarker Yuhong Zhang et.al. 2507.11972 null
2025-07-16 Watch, Listen, Understand, Mislead: Tri-modal Adversarial Attacks on Short Videos for Content Appropriateness Evaluation Sahid Hossain Mustakim et.al. 2507.11968 null
2025-07-16 Toxicity-Aware Few-Shot Prompting for Low-Resource Singlish Translation Ziyu Ge et.al. 2507.11966 null
2025-07-16 PoTPTQ: A Two-step Power-of-Two Post-training for LLMs Xinyu Wang et.al. 2507.11959 null
2025-07-16 The benefits of query-based KGQA systems for complex and temporal questions in LLM era Artem Alekseev et.al. 2507.11954 null
2025-07-16 BlockBPE: Parallel BPE Tokenization Amos You et.al. 2507.11941 null
2025-07-16 A Multi-Level Similarity Approach for Single-View Object Grasping: Matching, Planning, and Fine-Tuning Hao Chen et.al. 2507.11938 null
2025-07-16 A Survey of Deep Learning for Geometry Problem Solving Jianzhe Ma et.al. 2507.11936 null
2025-07-16 Hyperphantasia: A Benchmark for Evaluating the Mental Visualization Capabilities of Multimodal LLMs Mohammad Shahab Sepehri et.al. 2507.11932 null
2025-07-16 From Generative to Episodic: Sample-Efficient Replicable Reinforcement Learning Max Hopkins et.al. 2507.11926 null
2025-07-16 Marco-Bench-MIF: On Multilingual Instruction-Following Capability of Large Language Models Bo Zeng et.al. 2507.11882 null
2025-07-16 DualReward: A Dynamic Reinforcement Learning Framework for Cloze Tests Distractor Generation Tianyou Huang et.al. 2507.11875 null
2025-07-16 CosmoFlow: Scale-Aware Representation Learning for Cosmology with Flow Matching Sidharth Kannan et.al. 2507.11842 null
2025-07-16 The Evolving Role of Large Language Models in Scientific Innovation: Evaluator, Collaborator, and Scientist Haoxuan Zhang et.al. 2507.11810 null
2025-07-16 Tracing Facts or just Copies? A critical investigation of the Competitions of Mechanisms in Large Language Models Dante Campregher et.al. 2507.11809 null
2025-07-15 Enforcing Latent Euclidean Geometry in Single-Cell VAEs for Manifold Interpolation Alessandro Palma et.al. 2507.11789 null
2025-07-15 Foundation Models for Brain Signals: A Critical Review of Current Progress and Future Directions Gayal Kuruppu et.al. 2507.11783 null
2025-07-15 Large-scale distributed synchronization systems, using a cancel-on-completion redundancy mechanism Alexander Stolyar et.al. 2507.11779 null
2025-07-15 Scaling laws for activation steering with Llama 2 models and refusal mechanisms Sheikh Abdur Raheem Ali et.al. 2507.11771 null
2025-07-15 LLMs are Bayesian, in Expectation, not in Realization Leon Chlon et.al. 2507.11768 null
2025-07-15 Beyond Task-Specific Reasoning: A Unified Conditional Generative Framework for Abstract Visual Reasoning Fan Shi et.al. 2507.11761 null
2025-07-15 CRABS: A syntactic-semantic pincer strategy for bounding LLM interpretation of Python notebooks Meng Li et.al. 2507.11742 null
2025-07-15 Auto-Formulating Dynamic Programming Problems with Large Language Models Chenyu Zhou et.al. 2507.11737 null
2025-07-15 Subgraph Generation for Generalizing on Out-of-Distribution Links Jay Revolinsky et.al. 2507.11710 null
2025-07-15 MetaLint: Generalizable Idiomatic Code Quality Analysis through Instruction-Following and Easy-to-Hard Generalization Atharva Naik et.al. 2507.11687 null
2025-07-15 Let's Think in Two Steps: Mitigating Agreement Bias in MLLMs with Self-Grounded Verification Moises Andrade et.al. 2507.11662 null
2025-07-15 Deep Generative Methods and Tire Architecture Design Fouad Oubari et.al. 2507.11639 null
2025-07-15 Interpretable Prediction of Lymph Node Metastasis in Rectal Cancer MRI Using Variational Autoencoders Benjamin Keel et.al. 2507.11638 null
2025-07-15 MapIQ: Benchmarking Multimodal Large Language Models for Map Question Answering Varun Srivastava et.al. 2507.11625 null
2025-07-15 k-Contextuality as a Heuristic for Memory Separations in Learning Mariesa H. Teo et.al. 2507.11604 null
2025-07-15 SToFM: a Multi-scale Foundation Model for Spatial Transcriptomics Suyuan Zhao et.al. 2507.11588 null
2025-07-15 Towards Depth Foundation Model: Recent Trends in Vision-Based Depth Estimation Zhen Xu et.al. 2507.11540 null
2025-07-15 Streaming 4D Visual Geometry Transformer Dong Zhuo et.al. 2507.11539 null
2025-07-15 DrafterBench: Benchmarking Large Language Models for Tasks Automation in Civil Engineering Yinsheng Li et.al. 2507.11527 null
2025-07-15 LLM-based ambiguity detection in natural language instructions for collaborative surgical robots Ana Davila et.al. 2507.11525 null
2025-07-15 AirLLM: Diffusion Policy-based Adaptive LoRA for Remote Fine-Tuning of LLM over the Air Shiyi Yang et.al. 2507.11515 null
2025-07-15 HUG-VAS: A Hierarchical NURBS-Based Generative Model for Aortic Geometry Synthesis and Controllable Editing Pan Du et.al. 2507.11474 null
2025-07-15 LRMR: LLM-Driven Relational Multi-node Ranking for Lymph Node Metastasis Assessment in Rectal Cancer Yaoxian Dong et.al. 2507.11457 null
2025-07-15 Implementing Adaptations for Vision AutoRegressive Model Kaif Shaikh et.al. 2507.11441 null
2025-07-15 Towards Reliable Objective Evaluation Metrics for Generative Singing Voice Separation Models Paul A. Bereuter et.al. 2507.11427 null
2025-07-16 Reasoning Strategies in Large Language Models: Can They Follow, Prefer, and Optimize? Yanjian Zhang et.al. 2507.11423 null
2025-07-15 Quantifying the Energy Consumption and Carbon Emissions of LLM Inference via Simulations Miray Özcan et.al. 2507.11417 null
2025-07-15 Seq vs Seq: An Open Suite of Paired Encoders and Decoders Orion Weller et.al. 2507.11412 null
2025-07-15 KisMATH: Do LLMs Have Knowledge of Implicit Structures in Mathematical Reasoning? Soumadeep Saha et.al. 2507.11408 null
2025-07-15 EXAONE 4.0: Unified Large Language Models Integrating Non-reasoning and Reasoning Modes LG AI Research et.al. 2507.11407 null
2025-07-15 DCR: Quantifying Data Contamination in LLMs Evaluation Cheng Xu et.al. 2507.11405 null
2025-07-15 Step-wise Policy for Rare-tool Knowledge (SPaRK): Offline RL that Drives Diverse Tool Use in LLMs Gabriel Bo et.al. 2507.11371 null
2025-07-15 From Chaos to Automation: Enabling the Use of Unstructured Data for Robotic Process Automation Kelly Kurowski et.al. 2507.11364 null
2025-07-15 What is the Best Process Model Representation? A Comparative Analysis for Process Modeling with Large Language Models Alexis Brissard et.al. 2507.11356 null
2025-07-15 Foundation Models for Logistics: Toward Certifiable, Conversational Planning Interfaces Yunhao Yang et.al. 2507.11352 null
2025-07-15 RefModel: Detecting Refactorings using Foundation Models Pedro Simões et.al. 2507.11346 null
2025-07-15 Guiding LLM Decision-Making with Fairness Reward Models Zara Hall et.al. 2507.11344 null
2025-07-15 MonoMVSNet: Monocular Priors Guided Multi-View Stereo Network Jianfei Jiang et.al. 2507.11333 null
2025-07-15 Automated Novelty Evaluation of Academic Paper: A Collaborative Approach Integrating Human and Large Language Model Knowledge Wenqing Wu et.al. 2507.11330 null
2025-07-15 Internal Value Alignment in Large Language Models through Controlled Value Vector Activation Haoran Jin et.al. 2507.11316 null
2025-07-15 LRCTI: A Large Language Model-Based Framework for Multi-Step Evidence Retrieval and Reasoning in Cyber Threat Intelligence Credibility Verification Fengxiao Tang et.al. 2507.11310 null
2025-07-15 Dr.Copilot: A Multi-Agent Prompt Optimized Assistant for Improving Patient-Doctor Communication in Romanian Andrei Niculae et.al. 2507.11299 null
2025-07-15 Opus: A Prompt Intention Framework for Complex Workflow Generation Théo Fagnoni et.al. 2507.11288 null
2025-07-15 Taming Uncertainty via Automation: Observing, Analyzing, and Optimizing Agentic AI Systems Dany Moshkovich et.al. 2507.11277 null
2025-07-15 FMC: Formalization of Natural Language Mathematical Competition Problems Jiaxuan Xie et.al. 2507.11275 null
2025-07-15 KV-Latent: Dimensional-level KV Cache Reduction with Frequency-aware Rotary Positional Embedding Luohe Shi et.al. 2507.11273 null
2025-07-15 An Empirical Study of Multi-Agent RAG for Real-World University Admissions Counseling Anh Nguyen-Duc et.al. 2507.11272 null
2025-07-15 MFGDiffusion: Mask-Guided Smoke Synthesis for Enhanced Forest Fire Detection Guanghao Wu et.al. 2507.11252 null
2025-07-15 Generative Click-through Rate Prediction with Applications to Search Advertising Lingwei Kong et.al. 2507.11246 null
2025-07-15 NarrLV: Towards a Comprehensive Narrative-Centric Evaluation for Long Video Generation Models X. Feng et.al. 2507.11245 null
2025-07-15 Sparse Autoencoders Can Capture Language-Specific Concepts Across Diverse Languages Lyzander Marciano Andrylie et.al. 2507.11230 null
2025-07-15 An Agentic Flow for Finite State Machine Extraction using Prompt Chaining Fares Wael et.al. 2507.11222 null
2025-07-15 EsBBQ and CaBBQ: The Spanish and Catalan Bias Benchmarks for Question Answering Valle Ruiz-Fernández et.al. 2507.11216 null
2025-07-15 Role-Playing LLM-Based Multi-Agent Support Framework for Detecting and Addressing Family Communication Bias Rushia Harada et.al. 2507.11210 null
2025-07-15 Temperature and Persona Shape LLM Agent Consensus With Minimal Accuracy Gains in Qualitative Coding Conrad Borchers et.al. 2507.11198 null
2025-07-15 Mixture of Experts in Large Language Models Danyang Zhang et.al. 2507.11181 null
2025-07-15 Latent Space Consistency for Sparse-View CT Reconstruction Duoyou Chen et.al. 2507.11152 null
2025-07-15 What Should LLMs Forget? Quantifying Personal Data in LLMs for Right-to-Be-Forgotten Requests Dimitri Staufer et.al. 2507.11128 null
2025-07-15 MSA at ImageCLEF 2025 Multimodal Reasoning: Multilingual Multimodal Reasoning With Ensemble Vision Language Models Seif Ahmed et.al. 2507.11114 null
2025-07-15 Multi-Trigger Poisoning Amplifies Backdoor Vulnerabilities in LLMs Sanhanat Sivapiromrat et.al. 2507.11112 null
2025-07-15 KptLLM++: Towards Generic Keypoint Comprehension with Large Language Model Jie Yang et.al. 2507.11102 null
2025-07-15 The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs Zichen Wen et.al. 2507.11097 null
2025-07-15 EditGen: Harnessing Cross-Attention Control for Instruction-Based Auto-Regressive Audio Editing Vassilis Sioros et.al. 2507.11096 null
2025-07-15 Beyond Traditional Algorithms: Leveraging LLMs for Accurate Cross-Border Entity Identification Andres Azqueta-Gavaldón et.al. 2507.11086 null
2025-07-15 Function-to-Style Guidance of LLMs for Code Translation Longhui Zhang et.al. 2507.11083 null
2025-07-15 Tactical Decision for Multi-UGV Confrontation with a Vision-Language Model-Based Commander Li Wang et.al. 2507.11079 null
2025-07-15 LogTinyLLM: Tiny Large Language Models Based Contextual Log Anomaly Detection Isaiah Thompson Ocansey et.al. 2507.11071 null
2025-07-15 SWE-MERA: A Dynamic Benchmark for Agenticly Evaluating Large Language Models on Software Engineering Tasks Pavel Adamenko et.al. 2507.11059 null
2025-07-15 LLM-Augmented Symptom Analysis for Cardiovascular Disease Risk Prediction: A Clinical NLP Haowei Yang et.al. 2507.11052 null
2025-07-15 Aligned Query Expansion: Efficient Query Expansion for Information Retrieval through LLM Alignment Adam Yang et.al. 2507.11042 null
2025-07-15 Functional Emotion Modeling in Biomimetic Reinforcement Learning Louis Wang et.al. 2507.11027 null
2025-07-15 Incentivizing Knowledge Transfers Zhonghong Kuang et.al. 2507.11018 null
2025-07-15 First-Order Error Matters: Accurate Compensation for Quantized Large Language Models Xingyu Zheng et.al. 2507.11017 null
2025-07-15 SIMCODE: A Benchmark for Natural Language to ns-3 Network Simulation Code Generation Tasnim Ahmed et.al. 2507.11014 null
2025-07-15 Learning to Tune Like an Expert: Interpretable and Scene-Aware Navigation via MLLM Reasoning and CVAE-Based Adaptation Yanbo Wang et.al. 2507.11001 null
2025-07-15 Teach Me Sign: Stepwise Prompting LLM for Sign Language Production Zhaoyi An et.al. 2507.10972 null
2025-07-15 DS@GT at eRisk 2025: From prompts to predictions, benchmarking early depression detection with conversational agent based assessments and temporal attention models Anthony Miyaguchi et.al. 2507.10958 null
2025-07-15 Modeling Understanding of Story-Based Analogies Using Large Language Models Kalit Inani et.al. 2507.10957 null
2025-07-15 Towards Practical Benchmarking of Data Cleaning Techniques: On Generating Authentic Errors via Large Language Models Xinyuan Liu et.al. 2507.10934 null
2025-07-15 Artificial Finance: How AI Thinks About Money Orhan Erdem et.al. 2507.10933 null
2025-07-15 Enhancing Safe and Controllable Protein Generation via Knowledge Preference Optimization Yuhao Wang et.al. 2507.10923 null
2025-07-15 HanjaBridge: Resolving Semantic Ambiguity in Korean LLMs via Hanja-Augmented Pre-Training Seungho Choi et.al. 2507.10920 null
2025-07-15 LLM-Driven Dual-Level Multi-Interest Modeling for Recommendation Ziyan Wang et.al. 2507.10917 null
2025-07-15 Lessons Learned from Evaluation of LLM based Multi-agents in Safer Therapy Recommendation Yicong Wu et.al. 2507.10911 null
2025-07-15 Evaluating Generated Commit Messages with Large Language Models Qunhong Zeng et.al. 2507.10906 null
2025-07-15 LiLM-RDB-SFC: Lightweight Language Model with Relational Database-Guided DRL for Optimized SFC Provisioning Parisa Fard Moshiri et.al. 2507.10903 null
2025-07-15 Object-Centric Mobile Manipulation through SAM2-Guided Perception and Imitation Learning Wang Zhicheng et.al. 2507.10899 null
2025-07-15 LLMATCH: A Unified Schema Matching Framework with Large Language Models Sha Wang et.al. 2507.10897 null
2025-07-15 Learning from Imperfect Data: Robust Inference of Dynamic Systems using Simulation-based Generative Model Hyunwoo Cho et.al. 2507.10884 null
2025-07-15 From Alerts to Intelligence: A Novel LLM-Aided Framework for Host-based Intrusion Detection Danyu Sun et.al. 2507.10873 null
2025-07-14 WhisperKit: On-device Real-time ASR with Billion-Scale Transformers Atila Orhon et.al. 2507.10860 null
2025-07-14 MultiVox: Benchmarking Voice Assistants for Multimodal Interactions Ramaneswaran Selvakumar et.al. 2507.10859 null
2025-07-14 LLMs on Trial: Evaluating Judicial Fairness for Large Language Models Yiran Hu et.al. 2507.10852 null
2025-07-14 LLM-Guided Agentic Object Detection for Open-World Understanding Furkan Mumcu et.al. 2507.10844 null
2025-07-14 REAL-IoT: Characterizing GNN Intrusion Detection Robustness under Practical Adversarial Attack Zhonghao Zhan et.al. 2507.10836 null
2025-07-14 Supporting SENĆOTEN Language Documentation Efforts with Automatic Speech Recognition Mengzhe Geng et.al. 2507.10827 null
2025-07-14 Semantic Context for Tool Orchestration Robert Müller et.al. 2507.10820 null
2025-07-14 How Robust are LLM-Generated Library Imports? An Empirical Study using Stack Overflow Jasmine Latendresse et.al. 2507.10818 null
2025-07-14 Versatile and Generalizable Manipulation via Goal-Conditioned Reinforcement Learning with Grounded Object Detection Huiyi Wang et.al. 2507.10814 null
2025-07-14 Automated Thematic Analyses Using LLMs: Xylazine Wound Management Social Media Chatter Use Case JaMor Hairston et.al. 2507.10803 null
2025-07-14 Can Multimodal Foundation Models Understand Schematic Diagrams? An Empirical Study on Information-Seeking QA over Scientific Papers Yilun Zhao et.al. 2507.10787 null
2025-07-14 Warehouse Spatial Question Answering with LLM Agent Hsiang-Wei Huang et.al. 2507.10778 null
2025-07-14 rt-RISeg: Real-Time Model-Free Robot Interactive Segmentation for Active Instance-Level Object Understanding Howard H. Qian et.al. 2507.10776 null
2025-07-14 Spatial Reasoners for Continuous Variables in Any Domain Bart Pogodzinski et.al. 2507.10768 null
2025-07-14 Integrating Biological Knowledge for Robust Microscopy Image Profiling on De Novo Cell Lines Jiayuan Chen et.al. 2507.10737 null
2025-07-14 Bridging Brains and Machines: A Unified Frontier in Neuroscience, Artificial Intelligence, and Neuromorphic Systems Sohan Shankar et.al. 2507.10722 null
2025-07-14 Exploring User Security and Privacy Attitudes and Concerns Toward the Use of General-Purpose LLM Chatbots for Mental Health Jabari Kwesi et.al. 2507.10695 null
2025-07-14 Machine-learning inference of stellar properties using integrated photometric and spectroscopic data Ilay Kamai et.al. 2507.10666 null
2025-07-14 Emulating Dark Matter Halo Merger Trees with Graph Generative Models Tri Nguyen et.al. 2507.10652 null
2025-07-14 MP1: Mean Flow Tames Policy Learning in 1-step for Robotic Manipulation Juyi Sheng et.al. 2507.10543 null
2025-07-14 Fusing LLM Capabilities with Routing Data Tao Feng et.al. 2507.10540 null
2025-07-14 Graph World Model Tao Feng et.al. 2507.10539 null
2025-07-14 CodeJudgeBench: Benchmarking LLM-as-a-Judge for Coding Tasks Hongchao Jiang et.al. 2507.10535 null
2025-07-14 Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination Mingqi Wu et.al. 2507.10532 null
2025-07-14 Accurate generation of chemical reaction transition states by conditional flow matching Ping Tuo et.al. 2507.10530 null
2025-07-14 Chat with AI: The Surprising Turn of Real-time Video Communication from Human to AI Jiangkai Wu et.al. 2507.10510 null
2025-07-14 Scene-Aware Conversational ADAS with Generative AI for Real-Time Driver Assistance Kyungtae Han et.al. 2507.10500 null
2025-07-14 Can You Detect the Difference? İsmail Tarım et.al. 2507.10475 null
2025-07-14 MLAR: Multi-layer Large Language Model-based Robotic Process Automation Applicant Tracking Mohamed T. Younes et.al. 2507.10472 null
2025-07-14 An Empirical Evaluation of AI-Powered Non-Player Characters' Perceived Realism and Performance in Virtual Reality Environments Mikko Korkiakoski et.al. 2507.10469 null
2025-07-14 Logic layer Prompt Control Injection (LPCI): A Novel Security Vulnerability Class in Agentic Systems Hammad Atta et.al. 2507.10457 null
2025-07-14 Text-Visual Semantic Constrained AI-Generated Image Quality Assessment Qiang Li et.al. 2507.10432 null
2025-07-14 Towards Emotion Co-regulation with LLM-powered Socially Assistive Robots: Integrating LLM Prompts and Robotic Behaviors to Support Parent-Neurodivergent Child Dyads Jing Li et.al. 2507.10427 null
2025-07-14 Zorse: Optimizing LLM Training Efficiency on Heterogeneous GPU Clusters Runsheng Benson Guo et.al. 2507.10392 null
2025-07-14 Test-Time Canonicalization by Foundation Models for Robust Perception Utkarsh Singhal et.al. 2507.10375 null
2025-07-14 Using AI to replicate human experimental results: a motion study Rosa Illan Castillo et.al. 2507.10342 null
2025-07-14 Grammar-Guided Evolutionary Search for Discrete Prompt Optimisation Muzhaffar Hazman et.al. 2507.10326 null
2025-07-14 Mind the Gap: Aligning Vision Foundation Models to Image Feature Matching Yuhan Liu et.al. 2507.10318 null
2025-07-14 Recognizing Dementia from Neuropsychological Tests with State Space Models Liming Wang et.al. 2507.10311 null
2025-07-14 DisCo: Towards Distinct and Coherent Visual Encapsulation in Video MLLMs Jiahe Zhao et.al. 2507.10302 null
2025-07-14 FaceLLM: A Multimodal Large Language Model for Face Understanding Hatef Otroshi Shahreza et.al. 2507.10300 null
2025-07-14 Prompt Informed Reinforcement Learning for Visual Coverage Path Planning Venkat Margapuri et.al. 2507.10284 null
2025-07-14 Cross-Timeslot Optimization for Distributed GPU Inference Using Reinforcement Learning Chengze Du et.al. 2507.10259 null
2025-07-14 Synthesizing Near-Boundary OOD Samples for Out-of-Distribution Detection Jinglun Li et.al. 2507.10225 null
2025-07-14 Absher: A Benchmark for Evaluating Large Language Models Understanding of Saudi Dialects Renad Al-Monef et.al. 2507.10216 null
2025-07-14 A Training-Free, Task-Agnostic Framework for Enhancing MLLM Performance on High-Resolution Images Jaeseong Lee et.al. 2507.10202 null
2025-07-14 History Matching under Uncertainty of Geological Scenarios with Implicit Geological Realism Control with Generative Deep Learning and Graph Convolutions Gleb Shishaev et.al. 2507.10201 null
2025-07-14 Natural Language-based Assessment of L2 Oral Proficiency using LLMs Stefano Bannò et.al. 2507.10200 null
2025-07-14 Breaking the Myth: Can Small Models Infer Postconditions Too? Gehao Zhang et.al. 2507.10182 null
2025-07-14 Pimba: A Processing-in-Memory Acceleration for Post-Transformer Large Language Model Serving Wonung Kim et.al. 2507.10178 null
2025-07-14 Abusive text transformation using LLMs Rohitash Chandra et.al. 2507.10177 null
2025-07-14 Task-Based Flexible Feature Distillation for LLMs Khouloud Saadi et.al. 2507.10155 null
2025-07-14 Past-Future Scheduler for LLM Serving under SLA Guarantees Ruihao Gong et.al. 2507.10150 null
2025-07-14 DEARLi: Decoupled Enhancement of Recognition and Localization for Semi-supervised Panoptic Segmentation Ivan Martinović et.al. 2507.10118 null
2025-07-14 Accelerating Automatic Program Repair with Dual Retrieval-Augmented Fine-Tuning and Patch Generation on Large Language Models Hanyang Guo et.al. 2507.10103 null
2025-07-14 Fusing Large Language Models with Temporal Transformers for Time Series Forecasting Chen Su et.al. 2507.10098 null
2025-07-14 Towards High Supervised Learning Utility Training Data Generation: Data Pruning and Column Reordering Tung Sum Thomas Kwok et.al. 2507.10088 null
2025-07-14 Foundation Model Driven Robotics: A Comprehensive Review Muhammad Tayyab Khan et.al. 2507.10087 null
2025-07-14 Cultural Bias in Large Language Models: Evaluating AI Agents through Moral Questionnaires Simon Münker et.al. 2507.10073 null
2025-07-14 ElasticMM: Efficient Multimodal LLMs Serving with Elastic Multimodal Parallelism Zedong Liu et.al. 2507.10069 null
2025-07-14 LLMShot: Reducing snapshot testing maintenance via LLMs Ergün Batuhan Kaynak et.al. 2507.10062 null
2025-07-14 GeLaCo: An Evolutionary Approach to Layer Compression David Ponce et.al. 2507.10059 null
2025-07-14 Explicit Vulnerability Generation with LLMs: An Investigation Beyond Adversarial Attacks Emir Bosnak et.al. 2507.10054 null
2025-07-14 Automating SPARQL Query Translations between DBpedia and Wikidata Malte Christian Bartels et.al. 2507.10045 null
2025-07-14 Towards Applying Large Language Models to Complement Single-Cell Foundation Models Steven Palayew et.al. 2507.10039 null
2025-07-14 EAT: QoS-Aware Edge-Collaborative AIGC Task Scheduling via Attention-Guided Diffusion Reinforcement Learning Zhifei Xu et.al. 2507.10026 null
2025-07-14 Qualitative Study for LLM-assisted Design Study Process: Strategies, Challenges, and Roles Shaolun Ruan et.al. 2507.10024 null
2025-07-14 The Man Behind the Sound: Demystifying Audio Private Attribute Profiling via Multimodal Large Language Model Agents Lixu Wang et.al. 2507.10016 null
2025-07-14 (Almost) Free Modality Stitching of Foundation Models Jaisidh Singh et.al. 2507.10015 null
2025-07-14 Protective Factor-Aware Dynamic Influence Learning for Suicide Risk Prediction on Social Media Jun Li et.al. 2507.10008 null
2025-07-14 Deep Hidden Cognition Facilitates Reliable Chain-of-Thought Reasoning Zijun Chen et.al. 2507.10007 null
2025-07-14 Differentially Private Federated Low Rank Adaptation Beyond Fixed-Matrix Ming Wen et.al. 2507.09990 null
2025-07-14 Demonstrating the Octopi-1.5 Visual-Tactile-Language Model Samson Yu et.al. 2507.09985 null
2025-07-14 Tiny Reward Models Sarah Pan et.al. 2507.09973 null
2025-07-14 AnalogTester: A Large Language Model-Based Framework for Automatic Testbench Generation in Analog Circuit Design Weiyu Chen et.al. 2507.09965 null
2025-07-14 DeepSeek: Paradigm Shifts and Technical Evolution in Large AI Models Luolin Xiong et.al. 2507.09955 null
2025-07-14 Can GPT-4o mini and Gemini 2.0 Flash Predict Fine-Grained Fashion Product Attributes? A Zero-Shot Analysis Shubham Shukla et.al. 2507.09950 null
2025-07-14 Iceberg: Enhancing HLS Modeling with Synthetic Data Zijian Ding et.al. 2507.09948 null
2025-07-14 Green-LLM: Optimal Workload Allocation for Environmentally-Aware Distributed Inference Jiaming Cheng et.al. 2507.09942 null
2025-07-14 Memorization Sinks: Isolating Memorization during LLM Training Gaurav R. Ghosal et.al. 2507.09937 null
2025-07-14 Enhancing Retrieval Augmented Generation with Hierarchical Text Segmentation Chunking Hai Toan Nguyen et.al. 2507.09935 null
2025-07-14 Mechanistic Interpretability of LoRA-Adapted Language Models for Nuclear Reactor Safety Applications Yoon Pyo Lee et.al. 2507.09931 null
2025-07-14 Solving dynamic portfolio selection problems via score-based diffusion models Ahmad Aghapour et.al. 2507.09916 null
2025-07-14 Crucial-Diff: A Unified Diffusion Model for Crucial Image and Annotation Synthesis in Data-scarce Scenarios Siyue Yao et.al. 2507.09915 null
2025-07-14 TolerantECG: A Foundation Model for Imperfect Electrocardiogram Huynh Nguyen Dang et.al. 2507.09887 null
2025-07-14 VerifyBench: A Systematic Benchmark for Evaluating Reasoning Verifiers Across Domains Xuzhao Li et.al. 2507.09884 null
2025-07-14 AdaBrain-Bench: Benchmarking Brain Foundation Models for Brain-Computer Interface Applications Jiamin Wu et.al. 2507.09882 null
2025-07-14 Covering a Few Submodular Constraints and Applications Tanvi Bajpai et.al. 2507.09879 null
2025-07-14 ViTCoT: Video-Text Interleaved Chain-of-Thought for Boosting Video Understanding in Large Language Models Yongheng Zhang et.al. 2507.09876 null
2025-07-14 Function Induction and Task Generalization: An Interpretability Study with Off-by-One Addition Qinyuan Ye et.al. 2507.09875 null
2025-07-14 Turning the Tide: Repository-based Code Reflection Wei Zhang et.al. 2507.09866 null
2025-07-14 A Survey on MLLM-based Visually Rich Document Understanding: Methods, Challenges, and Emerging Trends Yihao Ding et.al. 2507.09861 null
2025-07-14 Model-Grounded Symbolic Artificial Intelligence Systems Learning and Reasoning with Model-Grounded Symbolic Artificial Intelligence Systems Aniruddha Chattopadhyay et.al. 2507.09854 null
2025-07-14 Rethinking Prompt Optimization: Reinforcement, Diversification, and Migration in Blackbox LLMs MohammadReza Davari et.al. 2507.09839 null
2025-07-14 Generative Audio Language Modeling with Continuous-valued Tokens and Masked Next-Token Prediction Shu-wen Yang et.al. 2507.09834 null
2025-07-13 Generative Cognitive Diagnosis Jiatong Li et.al. 2507.09831 null
2025-07-13 Measuring What Matters: A Framework for Evaluating Safety Risks in Real-World LLM Applications Jia Yi Goh et.al. 2507.09820 null
2025-07-13 VRU-Accident: A Vision-Language Benchmark for Video Question Answering and Dense Captioning for Accident Scene Understanding Younggun Kim et.al. 2507.09815 null
2025-07-13 A Scalable and Efficient Signal Integration System for Job Matching Ping Liu et.al. 2507.09797 null
2025-07-13 CADmium: Fine-Tuning Code Language Models for Text-Driven Sequential CAD Design Prashant Govindarajan et.al. 2507.09792 null
2025-07-13 Prompting for Performance: Exploring LLMs for Configuring Software Helge Spieker et.al. 2507.09790 null
2025-07-13 TinyTroupe: An LLM-powered Multiagent Persona Simulation Toolkit Paulo Salem et.al. 2507.09788 null
2025-07-13 Efficient Molecular Conformer Generation with SO(3)-Averaged Flow Matching and Reflow Zhonglin Cao et.al. 2507.09785 null
2025-07-13 Do we need equivariant models for molecule generation? Ewa M. Nowara et.al. 2507.09753 null
2025-07-13 Sound and Complete Neuro-symbolic Reasoning with LLM-Grounded Interpretations Bradley P. Allen et.al. 2507.09751 null
2025-07-13 BrainFLORA: Uncovering Brain Concept Representation via Multimodal Neural Embeddings Dongyang Li et.al. 2507.09747 null
2025-07-13 Enhancing Trading Performance Through Sentiment Analysis with Large Language Models: Evidence from the S&P 500 Haojie Liu et.al. 2507.09739 null
2025-07-13 Continental scale habitat modelling with artificial intelligence and multimodal earth observation Sara Si-Moussi et.al. 2507.09732 null
2025-07-13 Large Language Models Encode Semantics in Low-Dimensional Linear Subspaces Baturay Saglam et.al. 2507.09709 null
2025-07-13 MCEval: A Dynamic Framework for Fair Multilingual Cultural Evaluation of LLMs Shulin Huang et.al. 2507.09701 null
2025-07-13 ExpStar: Towards Automatic Commentary Generation for Multi-discipline Scientific Experiments Jiali Chen et.al. 2507.09693 null
2025-07-13 Prompt2DEM: High-Resolution DEMs for Urban and Open Environments from Global Prompts Using a Monocular Foundation Model Osher Rafaeli et.al. 2507.09681 null
2025-07-13 Can AI Rely on the Systematicity of Truth? The Challenge of Modelling Normative Domains Matthieu Queloz et.al. 2507.09676 null
2025-07-13 Is Quantization a Deal-breaker? Empirical Insights from Large Code Models Saima Afrin et.al. 2507.09665 null
2025-07-13 Towards Concise and Adaptive Thinking in Large Reasoning Models: A Survey Jason Zhu et.al. 2507.09662 null
2025-07-13 Negotiating Comfort: Simulating Personality-Driven LLM Agents in Shared Residential Social Networks Ann Nedime Nese Rende et.al. 2507.09657 null
2025-07-13 Cultivating Pluralism In Algorithmic Monoculture: The Community Alignment Dataset Lily Hong Zhang et.al. 2507.09650 null
2025-07-13 Can Group Relative Policy Optimization Improve Thai Legal Reasoning and Question Answering? Pawitsapak Akarajaradwong et.al. 2507.09638 null
2025-07-13 Demystifying Flux Architecture Or Greenberg et.al. 2507.09595 null
2025-07-11 Lumos-1: On Autoregressive Video Generation from a Unified Model Perspective Hangjie Yuan et.al. 2507.08801 null
2025-07-11 NeuralOS: Towards Simulating Operating Systems via Neural Generative Models Luke Rivard et.al. 2507.08800 null
2025-07-11 One Token to Fool LLM-as-a-Judge Yulai Zhao et.al. 2507.08794 null
2025-07-11 From One to More: Contextual Part Latents for 3D Generation Shaocong Dong et.al. 2507.08772 null
2025-07-11 BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity Chenyang Song et.al. 2507.08771 null
2025-07-11 Multilingual Multimodal Software Developer for Code Generation Linzheng Chai et.al. 2507.08719 null
2025-07-11 Unreal is all you need: Multimodal ISAC Data Simulation with Only One Engine Kongwu Huang et.al. 2507.08716 null
2025-07-11 KG-Attention: Knowledge Graph-Guided Attention at Test-Time via Bidirectional Information Aggregation Songlin Zhai et.al. 2507.08704 null
2025-07-11 ByDeWay: Boost Your multimodal LLM with DEpth prompting in a Training-Free Way Rajarshi Roy et.al. 2507.08679 null
2025-07-11 LLMCup: Ranking-Enhanced Comment Updating with LLMs Hua Ge et.al. 2507.08671 null
2025-07-11 KELPS: A Framework for Verified Multi-Language Autoformalization via Semantic-Syntactic Alignment Jiyao Zhang et.al. 2507.08665 null
2025-07-11 Introspection of Thought Helps AI Agents Haoran Sun et.al. 2507.08664 null
2025-07-11 Leanabell-Prover-V2: Verifier-integrated Reasoning for Formal Theorem Proving via Reinforcement Learning Xingguang Ji et.al. 2507.08649 null
2025-07-11 DatasetAgent: A Novel Multi-Agent System for Auto-Constructing Datasets from Real-World Images Haoran Sun et.al. 2507.08648 null
2025-07-11 NL in the Middle: Code Translation with LLMs and Intermediate Representations Chi-en Amy Tai et.al. 2507.08627 null
2025-07-11 A comprehensive study of LLM-based argument classification: from LLAMA through GPT-4o to Deepseek-R1 Marcin Pietroń et.al. 2507.08621 null
2025-07-11 Agentic Large Language Models for Conceptual Systems Engineering and Design Soheyl Massoudi et.al. 2507.08619 null
2025-07-11 AgentsNet: Coordination and Collaborative Reasoning in Multi-Agent LLMs Florian Grötschla et.al. 2507.08616 null
2025-07-11 Emergent Natural Language with Communication Games for Improving Image Captioning Capabilities without Additional Data Parag Dutta et.al. 2507.08610 null
2025-07-11 Unlocking Speech Instruction Data Potential with Query Rewriting Yonghua Hei et.al. 2507.08603 null
2025-07-11 Visual Semantic Description Generation with MLLMs for Image-Text Matching Junyu Chen et.al. 2507.08590 null
2025-07-11 To Trade or Not to Trade: An Agentic Approach to Estimating Market Risk Improves Trading Decisions Dimitrios Emmanoulopoulos et.al. 2507.08584 null
2025-07-11 Large Multi-modal Model Cartographic Map Comprehension for Textual Locality Georeferencing Kalana Wijegunarathna et.al. 2507.08575 null
2025-07-11 AbbIE: Autoregressive Block-Based Iterative Encoder for Efficient Sequence Modeling Preslav Aleksandrov et.al. 2507.08567 null
2025-07-11 FreeAudio: Training-Free Timing Planning for Controllable Long-Form Text-to-Audio Generation Yuxuan Jiang et.al. 2507.08557 null
2025-07-11 White-Basilisk: A Hybrid Model for Code Vulnerability Detection Ioannis Lamprou et.al. 2507.08540 null
2025-07-11 The AI Language Proficiency Monitor -- Tracking the Progress of LLMs on Multilingual Benchmarks David Pomerenke et.al. 2507.08538 null
2025-07-11 A Multi-granularity Concept Sparse Activation and Hierarchical Knowledge Graph Fusion Framework for Rare Disease Diagnosis Mingda Zhang et.al. 2507.08529 null
2025-07-11 InferLog: Accelerating LLM Inference for Online Log Parsing via ICL-oriented Prefix Caching Yilun Wang et.al. 2507.08523 null
2025-07-11 Advancing Multimodal LLMs by Large-Scale 3D Visual Instruction Dataset Generation Liu He et.al. 2507.08513 null
2025-07-11 From Language to Logic: A Bi-Level Framework for Structured Reasoning Keying Yang et.al. 2507.08501 null
2025-07-11 Semantic-Augmented Latent Topic Modeling with LLM-in-the-Loop Mengze Hong et.al. 2507.08498 null
2025-07-11 LLaPa: A Vision-Language Model Framework for Counterfactual-Aware Procedural Planning Shibo Sun et.al. 2507.08496 null
2025-07-11 A Third Paradigm for LLM Evaluation: Dialogue Game-Based Evaluation using clembench David Schlangen et.al. 2507.08491 null
2025-07-11 ILT-Iterative LoRA Training through Focus-Feedback-Fix for Multilingual Speech Recognition Qingliang Meng et.al. 2507.08477 null
2025-07-11 SynBridge: Bridging Reaction States via Discrete Flow for Bidirectional Reaction Prediction Haitao Lin et.al. 2507.08475 null
2025-07-11 Using Large Language Models for Legal Decision-Making in Austrian Value-Added Tax Law: An Experimental Study Marina Luketina et.al. 2507.08468 null
2025-07-11 F3-Net: Foundation Model for Full Abnormality Segmentation of Medical Images with Flexible Input Modality Requirement Seyedeh Sahar Taheri Otaghsara et.al. 2507.08460 null
2025-07-11 Diagnosing Failures in Large Language Models' Answers: Integrating Error Attribution into Evaluation Framework Zishan Xu et.al. 2507.08459 null
2025-07-11 A document is worth a structured record: Principled inductive bias design for document recognition Benjamin Meyer et.al. 2507.08458 null
2025-07-11 CUE-RAG: Towards Accurate and Cost-Efficient Graph-Based RAG via Multi-Partite Graph and Query-Driven Iterative Retrieval Yaodong Su et.al. 2507.08445 null
2025-07-11 Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation Anlin Zheng et.al. 2507.08441 null
2025-07-11 Finding Common Ground: Using Large Language Models to Detect Agreement in Multi-Agent Decision Conferences Selina Heller et.al. 2507.08440 null
2025-07-11 xpSHACL: Explainable SHACL Validation using Retrieval-Augmented Generation and Large Language Models Gustavo Correa Publio et.al. 2507.08432 null
2025-07-11 ChainEdit: Propagating Ripple Effects in LLM Knowledge Editing through Logical Rule-Guided Chains Zilu Dong et.al. 2507.08427 null
2025-07-11 Generative artificial intelligence and hybrid models to accelerate LES in reactive flows: Application to hydrogen/methane combustion Xiangrui Zou et.al. 2507.08426 null
2025-07-11 A Survey of Large Language Models in Discipline-specific Research: Challenges, Methods and Opportunities Lu Xiang et.al. 2507.08425 null
2025-07-11 InstaScene: Towards Complete 3D Instance Decomposition and Reconstruction from Cluttered Scenes Zesong Yang et.al. 2507.08416 null
2025-07-11 Multi-modal Mutual-Guidance Conditional Prompt Learning for Vision-Language Models Shijun Yang et.al. 2507.08410 null
2025-07-11 PanMatch: Unleashing the Potential of Large Vision Models for Unified Matching Models Yongjian Zhang et.al. 2507.08400 null
2025-07-11 Understanding Driving Risks using Large Language Models: Toward Elderly Driver Assessment Yuki Yoshihara et.al. 2507.08367 null
2025-07-11 Leveraging Machine Learning and Enhanced Parallelism Detection for BPMN Model Generation from Text Phuong Nam Lê et.al. 2507.08362 null
2025-07-11 Cycle Context Verification for In-Context Medical Image Segmentation Shishuai Hu et.al. 2507.08357 null
2025-07-11 Exploring Design of Multi-Agent LLM Dialogues for Research Ideation Keisuke Ueda et.al. 2507.08350 null
2025-07-11 What Factors Affect LLMs and RLLMs in Financial Question Answering? Peng Wang et.al. 2507.08339 null
2025-07-11 CoCo-Bot: Energy-based Composable Concept Bottlenecks for Interpretable Generative Models Sangwon Kim et.al. 2507.08334 null
2025-07-11 CRMAgent: A Multi-Agent LLM System for E-Commerce CRM Message Template Generation Yinzhu Quan et.al. 2507.08325 null
2025-07-11 Generative AI in Science: Applications, Challenges, and Emerging Questions Ryan Harries et.al. 2507.08310 null
2025-07-11 Improving MLLM's Document Image Machine Translation via Synchronously Self-reviewing Its OCR Proficiency Yupu Liang et.al. 2507.08309 null
2025-07-11 M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning Inclusion AI et.al. 2507.08306 null
2025-07-11 KAT-V1: Kwai-AutoThink Technical Report Zizheng Zhan et.al. 2507.08297 null
2025-07-11 Invariant-based Robust Weights Watermark for Large Language Models Qingxiao Guo et.al. 2507.08288 null
2025-07-11 Lightweight Safety Guardrails via Synthetic Data and RL-guided Adversarial Training Aleksei Ilin et.al. 2507.08284 null
2025-07-11 Agent Safety Alignment via Reinforcement Learning Zeyang Sha et.al. 2507.08270 null
2025-07-11 A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning Hiroshi Yoshihara et.al. 2507.08267 null
2025-07-11 CL3R: 3D Reconstruction and Contrastive Learning for Enhanced Robotic Manipulation Representations Wenbo Cui et.al. 2507.08262 null
2025-07-11 Quantum-Accelerated Neural Imputation with Large Language Models (LLMs) Hossein Jamali et.al. 2507.08255 null
2025-07-11 Raptor: Scalable Train-Free Embeddings for 3D Medical Volumes Leveraging Pretrained 2D Foundation Models Ulzee An et.al. 2507.08254 null
2025-07-11 Leveraging Large Language Models for Classifying App Users' Feedback Yasaman Abedini et.al. 2507.08250 null
2025-07-11 Time Variation in the TeV Cosmic Ray Anisotropy with IceCube and Energy Dependence of the Solar Dipole Perri Zilberman et.al. 2507.08242 null
2025-07-11 Data Generation without Function Estimation Hadi Daneshmand et.al. 2507.08239 null
2025-07-11 InsightBuild: LLM-Powered Causal Reasoning in Smart Building Systems Pinaki Prasad Guha Neogi et.al. 2507.08235 null
2025-07-11 Can LLMs Reliably Simulate Real Students' Abilities in Mathematics and Reading Comprehension? KV Aditya Srivatsa et.al. 2507.08232 null
2025-07-11 Making VLMs More Robot-Friendly: Self-Critical Distillation of Low-Level Procedural Reasoning Chan Young Park et.al. 2507.08224 null
2025-07-10 Effect of Static vs. Conversational AI-Generated Messages on Colorectal Cancer Screening Intent: a Randomized Controlled Trial Neil K. R. Sehgal et.al. 2507.08211 null
2025-07-10 Reasoning and Behavioral Equilibria in LLM-Nash Games: From Mindsets to Actions Quanyan Zhu et.al. 2507.08208 null
2025-07-10 A Dynamic Stackelberg Game Framework for Agentic AI Defense Against LLM Jailbreaking Zhengye Han et.al. 2507.08207 null
2025-07-10 TruthTorchLM: A Comprehensive Library for Predicting Truthfulness in LLM Outputs Duygu Nur Yaldiz et.al. 2507.08203 null
2025-07-10 Consciousness as a Jamming Phase Kaichen Ouyang et.al. 2507.08197 null
2025-07-10 CTRLS: Chain-of-Thought Reasoning via Latent State-Transition Junda Wu et.al. 2507.08182 null
2025-07-10 Analysis of Propaganda in Tweets From Politically Biased Sources Vivek Sharma et.al. 2507.08169 null
2025-07-10 KP-A: A Unified Network Knowledge Plane for Catalyzing Agentic Network Intelligence Yun Tang et.al. 2507.08164 null
2025-07-10 ALCo-FM: Adaptive Long-Context Foundation Model for Accident Prediction Pinaki Prasad Guha Neogi et.al. 2507.08153 null
2025-07-10 Distilling Empathy from Large Language Models Henry J. Xie et.al. 2507.08151 null
2025-07-10 Compactor: Calibrated Query-Agnostic KV Cache Compression with Approximate Leverage Scores Vivek Chari et.al. 2507.08143 null
2025-07-10 GRASP: Generic Reasoning And SPARQL Generation across Knowledge Graphs Sebastian Walter et.al. 2507.08107 null
2025-07-10 Low-rank Momentum Factorization for Memory Efficient Training Pouria Mahdavinia et.al. 2507.08091 null
2025-07-10 Quantile Reward Policy Optimization: Alignment with Pointwise Regression and Exact Partition Functions Simon Matrenok et.al. 2507.08068 null
2025-07-10 Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs Ziyue Li et.al. 2507.07996 null
2025-07-10 Multigranular Evaluation for Brain Visual Decoding Weihao Xia et.al. 2507.07993 null
2025-07-10 Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs Jeongseok Hyun et.al. 2507.07990 null
2025-07-10 Automating Expert-Level Medical Reasoning Evaluation of Large Language Models Shuang Zhou et.al. 2507.07988 null
2025-07-10 OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding JingLi Lin et.al. 2507.07984 null
2025-07-10 Performance and Practical Considerations of Large and Small Language Models in Clinical Decision Support in Rheumatology Sabine Felde et.al. 2507.07983 null
2025-07-10 Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling Haoyu Wu et.al. 2507.07982 null
2025-07-10 Defending Against Prompt Injection With a Few DefensiveTokens Sizhe Chen et.al. 2507.07974 null
2025-07-10 Scaling RL to Long Videos Yukang Chen et.al. 2507.07966 null
2025-07-10 Dynamic Chunking for End-to-End Hierarchical Sequence Modeling Sukjun Hwang et.al. 2507.07955 null
2025-07-10 Input Conditioned Layer Dropping in Speech Foundation Models Abdul Hannan et.al. 2507.07954 null
2025-07-10 Low Resource Reconstruction Attacks Through Benign Prompts Sol Yarkoni et.al. 2507.07947 null
2025-07-10 Can Large Language Models Improve Phishing Defense? A Large-Scale Controlled Experiment on Warning Dialogue Explanations Federico Maria Cau et.al. 2507.07916 null
2025-07-10 MIRA: A Novel Framework for Fusing Modalities in Medical RAG Jinhong Wang et.al. 2507.07902 null
2025-07-10 An Integrated Framework of Prompt Engineering and Multidimensional Knowledge Graphs for Legal Dispute Analysis Mingda Zhang et.al. 2507.07893 null
2025-07-10 Automating MD simulations for Proteins using Large language Models: NAMD-Agent Achuth Chandrasekhar et.al. 2507.07887 null
2025-07-10 Opting Out of Generative AI: a Behavioral Experiment on the Role of Education in Perplexity AI Avoidance Roberto Ulloa et.al. 2507.07881 null
2025-07-10 LISTEN: Lightweight Industrial Sound-representable Transformer for Edge Notification Changheon Han et.al. 2507.07879 null
2025-07-10 Mitigating Watermark Stealing Attacks in Generative Models via Multi-Key Watermarking Toluwani Aremu et.al. 2507.07871 null
2025-07-10 DocCHA: Towards LLM-Augmented Interactive Online diagnosis System Xinyi Liu et.al. 2507.07870 null
2025-07-10 THUNDER: Tile-level Histopathology image UNDERstanding benchmark Pierre Marza et.al. 2507.07860 null
2025-07-10 From Ambiguity to Accuracy: The Transformative Effect of Coreference Resolution on Retrieval-Augmented Generation systems Youngjoon Jang et.al. 2507.07847 null
2025-07-10 Towards Benchmarking Foundation Models for Tabular Data With Text Martin Mráz et.al. 2507.07829 null
2025-07-10 MoSE: Skill-by-Skill Mixture-of-Expert Learning for Autonomous Driving Lu Xu et.al. 2507.07818 null
2025-07-10 Understanding and Controlling Repetition Neurons and Induction Heads in In-Context Learning Nhi Hoai Doan et.al. 2507.07810 null
2025-07-10 SecureSpeech: Prompt-based Speaker and Content Protection Belinda Soh Hui Hui et.al. 2507.07799 null
2025-07-10 Measuring AI Alignment with Human Flourishing Elizabeth Hilliard et.al. 2507.07787 null
2025-07-10 Where are we with calibration under dataset shift in image classification? Mélanie Roschewitz et.al. 2507.07780 null
2025-07-10 A Unified Empirical Risk Minimization Framework for Flexible N-Tuples Weak Supervision Shuying Huang et.al. 2507.07771 null
2025-07-10 Structured Prompts, Better Outcomes? Exploring the Effects of a Structured Interface with ChatGPT in a Graduate Robotics Course Jerome Brender et.al. 2507.07767 null
2025-07-10 When Large Language Models Meet Law: Dual-Lens Taxonomy, Technical Advances, and Ethical Governance Peizhang Shao et.al. 2507.07748 null
2025-07-10 On the capabilities of LLMs for classifying and segmenting time series of fruit picking motions into primitive actions Eleni Konstantinidou et.al. 2507.07745 null
2025-07-10 GuardVal: Dynamic Large Language Model Jailbreak Evaluation for Comprehensive Safety Testing Peiyan Zhang et.al. 2507.07735 null
2025-07-10 Not All Preferences are What You Need for Post-Training: Selective Alignment Strategy for Preference Optimization Zhijin Dong et.al. 2507.07725 null
2025-07-10 KeyKnowledgeRAG (K^2RAG): An Enhanced RAG method for improved LLM question-answering capabilities Hruday Markondapatnaikuni et.al. 2507.07695 null
2025-07-10 From Domain Documents to Requirements: Retrieval-Augmented Generation in the Space Industry Chetan Arora et.al. 2507.07689 null
2025-07-10 Rationale-Enhanced Decoding for Multi-modal Chain-of-Thought Shin'ya Yamaguchi et.al. 2507.07685 null
2025-07-10 Accelerating Transposed Convolutions on FPGA-based Edge Devices Jude Haris et.al. 2507.07683 null
2025-07-10 Prompt Engineering for Requirements Engineering: A Literature Review and Roadmap Kaicheng Huang et.al. 2507.07682 null
2025-07-10 PlanQA: A Benchmark for Spatial Reasoning in LLMs using Structured Representations Fedor Rodionov et.al. 2507.07644 null
2025-07-10 FrugalRAG: Learning to retrieve and reason for multi-hop QA Abhinav Java et.al. 2507.07634 null
2025-07-10 T-GVC: Trajectory-Guided Generative Video Coding at Ultra-Low Bitrates Zhitao Wang et.al. 2507.07633 null
2025-07-10 Exploring the Limits of Model Compression in LLMs: A Knowledge Distillation Study on QA Tasks Joyeeta Datta et.al. 2507.07630 null
2025-07-10 SpatialViz-Bench: Automatically Generated Spatial Visualization Reasoning Tasks for MLLMs Siting Wang et.al. 2507.07610 null
2025-07-10 Enhancing Vaccine Safety Surveillance: Extracting Vaccine Mentions from Emergency Department Triage Notes Using Fine-Tuned Large Language Models Sedigh Khademi et.al. 2507.07599 null
2025-07-10 NexViTAD: Few-shot Unsupervised Cross-Domain Defect Detection via Vision Foundation Models and Multi-Task Learning Tianwei Mu et.al. 2507.07579 null
2025-07-10 Single-to-mix Modality Alignment with Multimodal Large Language Model for Document Image Machine Translation Yupu Liang et.al. 2507.07572 null
2025-07-10 CEA-LIST at CheckThat! 2025: Evaluating LLMs as Detectors of Bias and Opinion in Text Akram Elbouanani et.al. 2507.07539 null
2025-07-10 MAPEX: Modality-Aware Pruning of Experts for Remote Sensing Foundation Models Joelle Hanna et.al. 2507.07527 null
2025-07-10 Toward Real-World Chinese Psychological Support Dialogues: CPsDD Dataset and a Co-Evolving Multi-Agent System Yuanchen Shi et.al. 2507.07509 null
2025-07-10 PLAN-TUNING: Post-Training Language Models to Learn Step-by-Step Planning for Complex Problem Solving Mihir Parmar et.al. 2507.07495 null
2025-07-10 Sparse Autoencoders Reveal Interpretable Structure in Small Gene Language Models Haoxiang Guan et.al. 2507.07486 null
2025-07-10 Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models Kaiqu Liang et.al. 2507.07484 null
2025-07-10 General purpose models for the chemical sciences Nawaf Alampara et.al. 2507.07456 null
2025-07-10 RLEP: Reinforcement Learning with Experience Replay for LLM Reasoning Hongzhi Zhang et.al. 2507.07451 null
2025-07-10 StarDojo: Benchmarking Open-Ended Behaviors of Agentic Multimodal LLMs in Production-Living Simulations with Stardew Valley Weihao Tan et.al. 2507.07445 null
2025-07-10 SAND: Boosting LLM Agents with Self-Taught Action Deliberation Yu Xia et.al. 2507.07441 null
2025-07-10 Towards Interpretable Time Series Foundation Models Matthieu Boileau et.al. 2507.07439 null
2025-07-10 Neural networks leverage nominally quantum and post-quantum representations Paul M. Riechers et.al. 2507.07432 null
2025-07-10 DrugMCTS: a drug repurposing framework combining multi-agent, RAG and Monte Carlo Tree Search Zerui Yang et.al. 2507.07426 null
2025-07-10 Corvid: Improving Multimodal Large Language Models Towards Chain-of-Thought Reasoning Jingjing Jiang et.al. 2507.07424 null
2025-07-10 May I have your Attention? Breaking Fine-Tuning based Prompt Injection Defenses using Architecture-Aware Attacks Nishit V. Pandya et.al. 2507.07417 null
2025-07-10 EPIC: Efficient Prompt Interaction for Text-Image Classification Xinyao Yu et.al. 2507.07415 null
2025-07-10 GNN-CNN: An Efficient Hybrid Model of Convolutional and Graph Neural Networks for Text Representation Fardin Rastakhiz et.al. 2507.07414 null
2025-07-10 Hybrid LLM-Enhanced Intrusion Detection for Zero-Day Threats in IoT Networks Mohammad F. Al-Hammouri et.al. 2507.07413 null
2025-07-10 Phishing Detection in the Gen-AI Era: Quantized LLMs vs Classical Models Jikesh Thapa et.al. 2507.07406 null
2025-07-10 KVFlow: Efficient Prefix Caching for Accelerating LLM-Based Multi-Agent Workflows Zaifeng Pan et.al. 2507.07400 null
2025-07-10 Behave Your Motion: Habit-preserved Cross-category Animal Motion Transfer Zhimin Zhang et.al. 2507.07394 null
2025-07-10 Learning Collective Variables from Time-lagged Generation Seonghyun Park et.al. 2507.07390 null
2025-07-10 Bradley-Terry and Multi-Objective Reward Modeling Are Complementary Zhiwei Zhang et.al. 2507.07375 null
2025-07-10 PacGDC: Label-Efficient Generalizable Depth Completion with Projection Ambiguity and Consistency Haotian Wang et.al. 2507.07374 null
2025-07-09 On the Impossibility of Separating Intelligence from Judgment: The Computational Intractability of Filtering for AI Alignment Sarah Ball et.al. 2507.07341 null
2025-07-09 Bridging the Plausibility-Validity Gap by Fine-Tuning a Reasoning-Enhanced LLM for Chemical Synthesis and Discovery Malikussaid et.al. 2507.07328 null
2025-07-09 Frontier LLMs Still Struggle with Simple Reasoning Tasks Alan Malek et.al. 2507.07313 null
2025-07-09 Multi-Agent Retrieval-Augmented Framework for Evidence-Based Counterspeech Against Health Misinformation Anirban Saha Anik et.al. 2507.07307 null
2025-07-09 Application of LLMs to Multi-Robot Path Planning and Task Allocation Ashish Kumar et.al. 2507.07302 null
2025-07-09 Time Series Foundation Models for Multivariate Financial Time Series Forecasting Ben A. Marconi et.al. 2507.07296 null
2025-07-09 Thermodynamic Prediction Enabled by Automatic Dataset Building and Machine Learning Juejing Liu et.al. 2507.07293 null
2025-07-09 Open Source Planning & Control System with Language Agents for Autonomous Scientific Discovery Licong Xu et.al. 2507.07257 null
2025-07-09 A Language-Driven Framework for Improving Personalized Recommendations: Merging LLMs with Traditional Algorithms Aaron Goldstein et.al. 2507.07251 null
2025-07-09 Medical Red Teaming Protocol of Language Models: On the Importance of User Perspectives in Healthcare Settings Minseon Kim et.al. 2507.07248 null
2025-07-09 Attentions Under the Microscope: A Comparative Study of Resource Utilization for Variants of Self-Attention Zhengyu Tian et.al. 2507.07247 null
2025-07-09 An Information-Theoretic Perspective on Multi-LLM Uncertainty Estimation Maya Kruse et.al. 2507.07236 null
2025-07-09 SynthTextEval: Synthetic Text Data Generation and Evaluation for High-Stakes Domains Krithika Ramesh et.al. 2507.07229 null
2025-07-09 Compute Can't Handle the Truth: Why Communication Tax Prioritizes Memory and Interconnects in Modern AI Infrastructure Myoungsoo Jung et.al. 2507.07223 null
2025-07-09 Neurosymbolic Feature Extraction for Identifying Forced Labor in Supply Chains Zili Wang et.al. 2507.07217 null
2025-07-09 Scale leads to compositional generalization Florian Redhardt et.al. 2507.07207 null
2025-07-09 State-Inference-Based Prompting for Natural Language Trading with Game NPCs Minkyung Kim et.al. 2507.07203 null
2025-07-09 A Survey on Long-Video Storytelling Generation: Architectures, Consistency, and Cinematic Quality Mohamed Elmoghany et.al. 2507.07202 null
2025-07-09 Combining Pre-Trained Models for Enhanced Feature Representation in Reinforcement Learning Elia Piccoli et.al. 2507.07197 null
2025-07-09 Bridging the Last Mile of Prediction: Enhancing Time Series Forecasting with Conditional Guided Flow Matching Huibo Xu et.al. 2507.07192 null
2025-07-09 Prompt Perturbations Reveal Human-Like Biases in LLM Survey Responses Jens Rupprecht et.al. 2507.07188 null
2025-07-09 Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs Itay Itzhak et.al. 2507.07186 null
2025-07-09 Interpretable EEG-to-Image Generation with Semantic Prompts Arshak Rezvani et.al. 2507.07157 null
2025-07-09 Evaluating Retrieval-Augmented Generation Agents for Autonomous Scientific Discovery in Astrophysics Xueqing Xu et.al. 2507.07155 null
2025-07-09 Towards Multimodal Understanding via Stable Diffusion as a Task-Aware Feature Extractor Vatsal Agarwal et.al. 2507.07106 null
2025-07-09 Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation from Diffusion Models Tiezheng Zhang et.al. 2507.07104 null
2025-07-09 Evaluating Attribute Confusion in Fashion Text-to-Image Generation Ziyue Liu et.al. 2507.07079 null
2025-07-09 5C Prompt Contracts: A Minimalist, Creative-Friendly, Token-Efficient Design Framework for Individual and SME LLM Usage Ugur Ari et.al. 2507.07045 null
2025-07-09 UniConv: Unifying Retrieval and Response Generation for Large Language Models in Conversations Fengran Mo et.al. 2507.07030 null
2025-07-09 First Return, Entropy-Eliciting Explore Tianyu Zheng et.al. 2507.07017 null
2025-07-09 Integrating Pathology Foundation Models and Spatial Transcriptomics for Cellular Decomposition from Histology Images Yutong Sun et.al. 2507.07013 null
2025-07-09 GNN-ViTCap: GNN-Enhanced Multiple Instance Learning with Vision Transformers for Whole Slide Image Classification and Captioning S M Taslim Uddin Raju et.al. 2507.07006 null
2025-07-09 Learning Deliberately, Acting Intuitively: Unlocking Test-Time Reasoning in Multimodal LLMs Yahan Yu et.al. 2507.06999 null
2025-07-09 MCA-RG: Enhancing LLMs with Medical Concept Alignment for Radiology Report Generation Qilong Xing et.al. 2507.06992 null
2025-07-09 Are They All Good? Evaluating the Quality of CoTs in LLM-based Code Generation Binquan Zhang et.al. 2507.06980 null
2025-07-09 Hallucinating 360°: Panoramic Street-View Generation via Local Scenes Diffusion and Probabilistic Prompting Fei Teng et.al. 2507.06971 null
2025-07-09 Scaling Towards the Information Boundary of Instruction Set: InfinityInstruct-Subject Technical Report Li Du et.al. 2507.06968 null
2025-07-09 Investigating the Robustness of Retrieval-Augmented Generation at the Query Level Sezen Perçin et.al. 2507.06956 null
2025-07-09 What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models Keyon Vafa et.al. 2507.06952 null
2025-07-09 Rethinking Verification for LLM Code Generation: From Generation to Testing Zihan Ma et.al. 2507.06920 null
2025-07-09 Exploring LLMs for Predicting Tutor Strategy and Student Outcomes in Dialogues Fareya Ikram et.al. 2507.06910 null
2025-07-09 MultiJustice: A Chinese Dataset for Multi-Party, Multi-Charge Legal Prediction Xiao Wang et.al. 2507.06909 null
2025-07-09 SCoRE: Streamlined Corpus-based Relation Extraction using Multi-Label Contrastive Learning and Bayesian kNN Luca Mariotti et.al. 2507.06895 null
2025-07-09 Developing and Maintaining an Open-Source Repository of AI Evaluations: Challenges and Insights Alexandra Abbas et.al. 2507.06893 null
2025-07-09 Squeeze the Soaked Sponge: Efficient Off-policy Reinforcement Finetuning for Large Language Model Jing Liang et.al. 2507.06892 null
2025-07-09 DiffSpectra: Molecular Structure Elucidation from Spectra using Diffusion Models Liang Wang et.al. 2507.06853 null
2025-07-09 The Dark Side of LLMs Agent-based Attacks for Complete Computer Takeover Matteo Lupinacci et.al. 2507.06850 null
2025-07-09 Physics-Grounded Motion Forecasting via Equation Discovery for Trajectory-Guided Image-to-Video Generation Tao Feng et.al. 2507.06830 null
2025-07-09 Adaptive Termination for Multi-round Parallel Reasoning: An Universal Semantic Entropy-Guided Framework Zenan Xu et.al. 2507.06829 null
2025-07-09 Democratizing High-Fidelity Co-Speech Gesture Video Generation Xu Yang et.al. 2507.06812 null
2025-07-09 Text to model via SysML: Automated generation of dynamical system computational models from unstructured natural language text via enhanced System Modeling Language diagrams Matthew Anderson Hendricks et.al. 2507.06803 null
2025-07-09 Efficient Industrial sLLMs through Domain Adaptive Continual Pretraining: Method, Evaluation and Applications Seonwu Kim et.al. 2507.06795 null
2025-07-09 Checklist Engineering Empowers Multilingual LLM Judges Mohammad Ghiasvand Mohammadkhani et.al. 2507.06774 null
2025-07-09 Leveraging LLMs for Semantic Conflict Detection via Unit Test Generation Nathalia Barbosa et.al. 2507.06762 null
2025-07-09 LOVON: Legged Open-Vocabulary Object Navigator Daojie Peng et.al. 2507.06747 null
2025-07-09 PenTest2.0: Towards Autonomous Privilege Escalation Using GenAI Haitham S. Al-Sinani et.al. 2507.06742 null
2025-07-09 Hierarchical Feature Alignment for Gloss-Free Sign Language Translation Sobhan Asasi et.al. 2507.06732 null
2025-07-09 On the Effect of Uncertainty on Layer-wise Inference Dynamics Sunwoo Kim et.al. 2507.06722 null
2025-07-09 A Neural Representation Framework with LLM-Driven Spatial Reasoning for Open-Vocabulary 3D Visual Grounding Zhenyang Liu et.al. 2507.06719 null
2025-07-09 CLI-RAG: A Retrieval-Augmented Framework for Clinically Structured and Context Aware Text Generation with LLMs Garapati Keerthana et.al. 2507.06715 null
2025-07-09 Elite Polarization in European Parliamentary Speeches: a Novel Measurement Approach Using Large Language Models Gennadii Iakovlev et.al. 2507.06658 null
2025-07-09 Deep Disentangled Representation Network for Treatment Effect Estimation Hui Meng et.al. 2507.06650 null
2025-07-09 EXAONE Path 2.0: Pathology Foundation Model with End-to-End Supervision Myungjang Pyeon et.al. 2507.06639 null
2025-07-09 UniOD: A Universal Model for Outlier Detection across Diverse Domains Dazhi Fu et.al. 2507.06624 null
2025-07-09 Expediting data extraction using a large language model (LLM) and scoping review protocol: a methodological study within a complex scoping review James Stewart-Evans et.al. 2507.06623 null
2025-07-09 FuDoBa: Fusing Document and Knowledge Graph-based Representations with Bayesian Optimisation Boshko Koloski et.al. 2507.06622 null
2025-07-09 Denoising Multi-Beta VAE: Representation Learning for Disentanglement and Generation Anshuk Uppal et.al. 2507.06613 null
2025-07-09 From Data-Centric to Sample-Centric: Enhancing LLM Reasoning via Progressive Optimization Xinjie Chen et.al. 2507.06573 null
2025-07-09 SlimCaching: Edge Caching of Mixture-of-Experts for Distributed Inference Qian Chen et.al. 2507.06567 null
2025-07-09 The Flaws of Others: An LLM-driven Framework for Scientific Knowledge Production Juan B. Gutiérrez et.al. 2507.06565 null
2025-07-09 SkyVLN: Vision-and-Language Navigation and NMPC Control for UAVs in Urban Environments Tianshun Li et.al. 2507.06564 null
2025-07-09 SPEAR: Subset-sampled Performance Evaluation via Automated Ground Truth Generation for RAG Zou Yuheng et.al. 2507.06554 null
2025-07-09 Large Language Model for Extracting Complex Contract Information in Industrial Scenes Yunyang Cao et.al. 2507.06539 null
2025-07-09 InvestAlign: Overcoming Data Scarcity in Aligning Large Language Models with Investor Decision-Making Processes under Herd Behavior Huisheng Wang et.al. 2507.06528 null
2025-07-09 FIFA: Unified Faithfulness Evaluation Framework for Text-to-Video and Video-to-Text Generation Liqiang Jing et.al. 2507.06523 null
2025-07-09 SpindleKV: A Novel KV Cache Reduction Method Balancing Both Shallow and Deep Layers Zicong Tang et.al. 2507.06517 null
2025-07-09 QUEST: Query Optimization in Unstructured Document Analysis Zhaoze Sun et.al. 2507.06515 null
2025-07-09 Towards LLM-based Root Cause Analysis of Hardware Design Failures Siyu Qiu et.al. 2507.06512 null
2025-07-09 Bilateral Collaboration with Large Vision-Language Models for Open Vocabulary Human-Object Interaction Detection Yupeng Hu et.al. 2507.06510 null
2025-07-09 GR-LLMs: Recent Advances in Generative Recommendation Based on Large Language Models Zhen Yang et.al. 2507.06507 null
2025-07-09 Pun Intended: Multi-Agent Translation of Wordplay with Contrastive Learning and Phonetic-Semantic Embeddings Russell Taylor et.al. 2507.06506 null
2025-07-09 MoFE-Time: Mixture of Frequency Domain Experts for Time-Series Forecasting Models Yiwen Liu et.al. 2507.06502 null
2025-07-09 On the Robustness of Verbal Confidence of LLMs in Adversarial Attacks Stephen Obadinma et.al. 2507.06489 null
2025-07-09 Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning Ziyang Wang et.al. 2507.06485 null
2025-07-09 3D-Generalist: Self-Improving Vision-Language-Action Models for Crafting 3D Worlds Fan-Yun Sun et.al. 2507.06484 null
2025-07-09 Learning Japanese with Jouzu: Interaction Outcomes with Stylized Dialogue Fictional Agents Zackary Rackauckas et.al. 2507.06483 null
2025-07-09 IMPACT: Industrial Machine Perception via Acoustic Cognitive Transformer Changheon Han et.al. 2507.06481 null
2025-07-09 Generative Lagrangian data assimilation for ocean dynamics under extreme sparsity Niloofar Asefi et.al. 2507.06479 null
2025-07-09 Foundation Model Self-Play: Open-Ended Strategy Innovation via Foundation Models Aaron Dharna et.al. 2507.06466 null
2025-07-09 Evaluating Efficiency and Novelty of LLM-Generated Code for Graph Analysis Atieh Barati Nia et.al. 2507.06463 null
2025-07-08 A Semantic Parsing Framework for End-to-End Time Normalization Xin Su et.al. 2507.06450 null
2025-07-08 Perception-Aware Policy Optimization for Multimodal Reasoning Zhenhailong Wang et.al. 2507.06448 null
2025-07-08 Exploring Task Performance with Interpretable Models via Sparse Auto-Encoders Shun Wang et.al. 2507.06427 null
2025-07-08 Reward Models Can Improve Themselves: Reward-Guided Adversarial Failure Mode Discovery for Robust Reward Modeling Pankayaraj Pathmanathan et.al. 2507.06419 null
2025-07-08 PAST: A multimodal single-cell foundation model for histopathology and spatial transcriptomics in cancer Changchun Yang et.al. 2507.06418 null
2025-07-08 Voltage Regulation in Distribution Systems with Data Center Loads Yize Chen et.al. 2507.06416 null
2025-07-08 An AI-Driven Thermal-Fluid Testbed for Advanced Small Modular Reactors: Integration of Digital Twin and Large Language Models Doyeong Lim et.al. 2507.06399 null
2025-07-08 SLDB: An End-To-End Heterogeneous System-on-Chip Benchmark Suite for LLM-Aided Design Elisavet Lydia Alvanaki et.al. 2507.06376 null
2025-07-08 Bridging AI and Software Security: A Comparative Vulnerability Assessment of LLM Agent Deployment Paradigms Tarek Gasmi et.al. 2507.06323 null
2025-07-08 Too Human to Model: The Uncanny Valley of LLMs in Social Simulation -- When Generative Language Agents Misalign with Modelling Principles Yongchao Zeng et.al. 2507.06310 null
2025-07-08 Humans overrely on overconfident language models, across languages Neil Rathi et.al. 2507.06306 null
2025-07-08 RSRefSeg 2: Decoupling Referring Remote Sensing Image Segmentation with Foundation Models Keyan Chen et.al. 2507.06231 null
2025-07-08 Efficiency-Effectiveness Reranking FLOPs for LLM-based Rerankers Zhiyuan Peng et.al. 2507.06223 null
2025-07-08 Is Diversity All You Need for Scalable Robotic Manipulation? Modi Shi et.al. 2507.06219 null
2025-07-08 A Survey on Latent Reasoning Rui-Jie Zhu et.al. 2507.06203 null
2025-07-08 UQLM: A Python Package for Uncertainty Quantification in Large Language Models Dylan Bouchard et.al. 2507.06196 null
2025-07-08 SQLBarber: A System Leveraging Large Language Models to Generate Customized and Realistic SQL Workloads Jiale Lao et.al. 2507.06192 null
2025-07-08 Hidden Prompts in Manuscripts Exploit AI-Assisted Peer Review Zhicheng Lin et.al. 2507.06185 null
2025-07-08 Data-Semantics-Aware Recommendation of Diverse Pivot Tables Whanhee Cho et.al. 2507.06171 null
2025-07-09 Skywork-R1V3 Technical Report Wei Shen et.al. 2507.06167 null
2025-07-08 Evaluation of Habitat Robotics using Large Language Models William Li et.al. 2507.06157 null
2025-07-08 Large Language Models Predict Human Well-being -- But Not Equally Everywhere Pat Pataranutaporn et.al. 2507.06141 null
2025-07-08 Coding Triangle: How Does Large Language Model Understand Code? Taolin Zhang et.al. 2507.06138 null
2025-07-08 PrefixAgent: An LLM-Powered Design Framework for Efficient Prefix Adder Optimization Dongsheng Zuo et.al. 2507.06127 null
2025-07-09 Omni-Video: Democratizing Unified Video Understanding and Generation Zhiyu Tan et.al. 2507.06119 null
2025-07-08 Speech Quality Assessment Model Based on Mixture of Experts: System-Level Performance Enhancement and Utterance-Level Challenge Analysis Xintong Hu et.al. 2507.06116 null
2025-07-08 Reflections Unlock: Geometry-Aware Reflection Disentanglement in 3D Gaussian Splatting for Photorealistic Scenes Rendering Jiayi Song et.al. 2507.06103 null
2025-07-09 FEVO: Financial Knowledge Expansion and Reasoning Evolution for Large Language Models Bo Pang et.al. 2507.06057 null
2025-07-08 Entropy-Memorization Law: Evaluating Memorization Difficulty of Data in LLMs Yizhan Huang et.al. 2507.06056 null
2025-07-08 Kernel Trace Distance: Quantum Statistical Metric between Measures through RKHS Density Operators Arturo Castellanos et.al. 2507.06055 null
2025-07-08 Hierarchical Interaction Summarization and Contrastive Prompting for Explainable Recommendations Yibin Liu et.al. 2507.06044 null
2025-07-08 CAVGAN: Unifying Jailbreak and Defense of LLMs via Generative Adversarial Attacks on their Internal Representations Xiaohu Li et.al. 2507.06043 null
2025-07-08 CogniSQL-R1-Zero: Lightweight Reinforced Reasoning for Efficient SQL Generation Kushal Gajjar et.al. 2507.06013 null
2025-07-08 DocIE@XLLM25: In-Context Learning for Information Extraction using Fully Synthetic Demonstrations Nicholas Popovič et.al. 2507.05997 null
2025-07-08 Development and Evaluation of HopeBot: an LLM-based chatbot for structured and interactive PHQ-9 depression screening Zhijun Guo et.al. 2507.05984 null
2025-07-08 Multi-Agent Debate Strategies to Enhance Requirements Engineering with Large Language Models Marc Oriol et.al. 2507.05981 null
2025-07-08 RabakBench: Scaling Human Annotations to Construct Localized Multilingual Safety Benchmarks for Low-Resource Languages Gabriel Chua et.al. 2507.05980 null
2025-07-08 Automatic Synthesis of High-Quality Triplet Data for Composed Image Retrieval Haiwen Li et.al. 2507.05970 null
2025-07-08 OpenFActScore: Open-Source Atomic Evaluation of Factuality in Text Generation Lucas Fonseca Lage et.al. 2507.05965 null
2025-07-08 Evaluation of Large Language Model-Driven AutoML in Data and Model Management from Human-Centered Perspective Jiapeng Yao et.al. 2507.05962 null
2025-07-08 A Wireless Foundation Model for Multi-Task Prediction Yucheng Sheng et.al. 2507.05938 null
2025-07-08 BlueLM-2.5-3B Technical Report Baojiao Xiong et.al. 2507.05934 null
2025-07-08 Few-shot text-based emotion detection Teodor-George Marchitan et.al. 2507.05918 null
2025-07-08 Best-of-N through the Smoothing Lens: KL Divergence and Regret Analysis Gholamali Aminian et.al. 2507.05913 null
2025-07-08 AI-Reporter: A Path to a New Genre of Scientific Communication Gerd Graßhoff et.al. 2507.05903 null
2025-07-08 Psychometric Item Validation Using Virtual Respondents with Trait-Response Mediators Sungjib Lim et.al. 2507.05890 null
2025-07-08 Current Practices for Building LLM-Powered Reasoning Tools Are Ad Hoc -- and We Can Do Better Aaron Bembenek et.al. 2507.05886 null
2025-07-08 RecRankerEval: A Flexible and Extensible Framework for Top-k LLM-based Recommendation Zeyuan Meng et.al. 2507.05880 null
2025-07-08 KERAG_R: Knowledge-Enhanced Retrieval-Augmented Generation for Recommendation Zeyuan Meng et.al. 2507.05863 null
2025-07-08 USIGAN: Unbalanced Self-Information Feature Transport for Weakly Paired Image IHC Virtual Staining Yue Peng et.al. 2507.05843 null
2025-07-08 Video Event Reasoning and Prediction by Fusing World Knowledge from LLMs with Vision Foundation Models Léa Dubois et.al. 2507.05822 null
2025-07-08 2D Instance Editing in 3D Space Yuhuan Xie et.al. 2507.05819 null
2025-07-08 Affective-ROPTester: Capability and Bias Analysis of LLMs in Predicting Retinopathy of Prematurity Shuai Zhao et.al. 2507.05816 null
2025-07-08 Just Say Better or Worse: A Human-AI Collaborative Framework for Medical Image Segmentation Without Manual Annotations Yizhe Zhang et.al. 2507.05815 null
2025-07-08 Improving Robustness of Foundation Models in Domain Adaptation with Soup-Adapters Marco Roschkowski et.al. 2507.05807 null
2025-07-08 DREAM: Document Reconstruction via End-to-end Autoregressive Model Xin Li et.al. 2507.05805 null
2025-07-08 Creating a customisable freely-accessible Socratic AI physics tutor Eugenio Tufino et.al. 2507.05795 null
2025-07-08 TalkFashion: Intelligent Virtual Try-On Assistant Based on Multimodal Large Language Model Yujie Hu et.al. 2507.05790 null
2025-07-08 Flippi: End To End GenAI Assistant for E-Commerce Anand A. Rajasekar et.al. 2507.05788 null
2025-07-08 Text-Guided Token Communication for Wireless Image Transmission Bole Liu et.al. 2507.05781 null
2025-07-08 LeAD: The LLM Enhanced Planning System Converged with End-to-end Autonomous Driving Yuhang Zhang et.al. 2507.05754 null
2025-07-08 Jigsaw: Training Multi-Billion-Parameter AI Weather Models with Optimized Model Parallelism Deifilia Kieckhefen et.al. 2507.05753 null
2025-07-08 DocTalk: Scalable Graph-based Dialogue Synthesis for Enhancing LLM Conversational Capabilities Jing Yang Lee et.al. 2507.05750 null
2025-07-08 Tissue Concepts v2: a Supervised Foundation Model for whole slide images Till Nicke et.al. 2507.05742 null
2025-07-08 When Transformers Meet Recommenders: Integrating Self-Attentive Sequential Recommendation with Fine-Tuned LLMs Kechen Liu et.al. 2507.05733 null
2025-07-08 ContextASR-Bench: A Massive Contextual Speech Recognition Benchmark He Wang et.al. 2507.05727 null
2025-07-08 Large Language Models for Agent-Based Modelling: Current and possible uses across the modelling cycle Loïs Vanhée et.al. 2507.05723 null
2025-07-08 HIRAG: Hierarchical-Thought Instruction-Tuning Retrieval-Augmented Generation YiHan Jiao et.al. 2507.05714 null
2025-07-08 DRAGON: Dynamic RAG Benchmark On News Fedor Chernogorskii et.al. 2507.05713 null
2025-07-08 Smoothie-Qwen: Post-Hoc Smoothing to Reduce Language Bias in Multilingual LLMs SeungWon Ji et.al. 2507.05686 null
2025-07-08 MedGen: Unlocking Medical Video Generation by Scaling Granularly-annotated Medical Videos Rongsheng Wang et.al. 2507.05675 null
2025-07-08 Integrating Diffusion-based Multi-task Learning with Online Reinforcement Learning for Robust Quadruped Robot Control Xinyao Qin et.al. 2507.05674 null
2025-07-08 TuneShield: Mitigating Toxicity in Conversational AI while Fine-tuning on Untrusted Data Aravind Cheruvu et.al. 2507.05660 null
2025-07-08 LLMs are Introvert Litian Zhang et.al. 2507.05638 null
2025-07-08 SARA: Selective and Adaptive Retrieval-augmented Generation with Context Compression Yiqiao Jin et.al. 2507.05633 null
2025-07-08 Enhancing Student Learning with LLM-Generated Retrieval Practice Questions: An Empirical Study in Data Science Courses Yuan An et.al. 2507.05629 null
2025-07-08 DreamGrasp: Zero-Shot 3D Multi-Object Reconstruction from Partial-View Images for Robotic Manipulation Young Hun Kim et.al. 2507.05627 null
2025-07-08 Flipping Knowledge Distillation: Leveraging Small Models' Expertise to Enhance LLMs in Text Matching Mingzhe Li et.al. 2507.05617 null
2025-07-08 Domain adaptation of large language models for geotechnical applications Lei Fan et.al. 2507.05613 null
2025-07-08 MMW: Side Talk Rejection Multi-Microphone Whisper on Smart Glasses Yang Liu et.al. 2507.05609 null
2025-07-08 Structured Task Solving via Modular Embodied Intelligence: A Case Study on Rubik's Cube Chongshan Fan et.al. 2507.05607 null
2025-07-08 Self-Review Framework for Enhancing Instruction Following Capability of LLM Sihyun Park et.al. 2507.05598 null
2025-07-08 PaddleOCR 3.0 Technical Report Cheng Cui et.al. 2507.05595 null
2025-07-08 MLlm-DR: Towards Explainable Depression Recognition with MultiModal Large Language Models Wei Zhang et.al. 2507.05591 null
2025-07-08 The Landscape of Memorization in LLMs: Mechanisms, Measurement, and Mitigation Alexander Xiong et.al. 2507.05578 null
2025-07-08 Beyond Retrieval: Ensembling Cross-Encoders and GPT Rerankers with LLMs for Biomedical QA Shashank Verma et.al. 2507.05577 null
2025-07-08 Prompt Migration: Stabilizing GenAI Applications with Evolving Large Language Models Shivani Tripathi et.al. 2507.05573 null
2025-07-08 ReLayout: Integrating Relation Reasoning for Content-aware Layout Generation with Multi-modal Large Language Models Jiaxu Tian et.al. 2507.05568 null
2025-07-08 Search-based Selection of Metamorphic Relations for Optimized Robustness Testing of Large Language Models Sangwon Hyun et.al. 2507.05565 null
2025-07-08 Enhancing Test-Time Scaling of Large Language Models with Hierarchical Retrieval-Augmented MCTS Alex ZH Dou et.al. 2507.05557 null
2025-07-08 A Malliavin calculus approach to score functions in diffusion generative models Ehsan Mirafzali et.al. 2507.05550 null
2025-07-07 SenseCF: LLM-Prompted Counterfactuals for Intervention and Sensor Data Augmentation Shovito Barua Soumma et.al. 2507.05541 null
2025-07-07 Conversational Education at Scale: A Multi-LLM Agent Workflow for Procedural Learning and Pedagogic Quality Assessment Jiahuan Pei et.al. 2507.05528 null
2025-07-07 Empowering Healthcare Practitioners with Language Models: Structuring Speech Transcripts in Two Real-World Clinical Applications Jean-Philippe Corbeil et.al. 2507.05517 null
2025-07-07 Tool for Supporting Debugging and Understanding of Normative Requirements Using LLMs Alex Kleijwegt et.al. 2507.05504 null
2025-07-07 MolFORM: Multi-modal Flow Matching for Structure-Based Drug Design Jie Huang et.al. 2507.05503 null
2025-07-07 Deep Research Comparator: A Platform For Fine-grained Human Annotations of Deep Research Agents Prahaladh Chandrahasan et.al. 2507.05495 null
2025-07-07 MBFormer: A General Transformer-based Learning Paradigm for Many-body Interactions in Real Materials Bowen Hou et.al. 2507.05480 null
2025-07-07 Dense and comeager conjugacy classes in zero-dimensional dynamics Michal Doucha et.al. 2507.05474 null
2025-07-07 Inaugural MOASEI Competition at AAMAS'2025: A Technical Report Ceferino Patino et.al. 2507.05469 null
2025-07-07 Risk-Aware Aerocapture Guidance Through a Probabilistic Indicator Function Grace E. Calkins et.al. 2507.05454 null
2025-07-07 On the Semantics of Large Language Models Martin Schuele et.al. 2507.05448 null
2025-07-07 PhoniTale: Phonologically Grounded Mnemonic Generation for Typologically Distant Language Pairs Sana Kang et.al. 2507.05444 null
2025-07-07 Mastering Regional 3DGS: Locating, Initializing, and Editing with Diverse 2D Priors Lanqing Guo et.al. 2507.05426 null
2025-07-07 "Lost-in-the-Later": Framework for Quantifying Contextual Grounding in Large Language Models Yufei Tao et.al. 2507.05424 null
2025-07-07 Learn Globally, Speak Locally: Bridging the Gaps in Multilingual Reasoning Jaedong Hwang et.al. 2507.05418 null
2025-07-07 PBE Meets LLM: When Few Examples Aren't Few-Shot Enough Shuning Zhang et.al. 2507.05403 null
2025-07-07 Neural-Driven Image Editing Pengfei Zhou et.al. 2507.05397 null
2025-07-07 Controlling What You Share: Assessing Language Model Adherence to Privacy Preferences Guillem Ramírez et.al. 2507.05391 null
2025-07-07 From General to Specialized: The Need for Foundational Models in Agriculture Vishal Nedungadi et.al. 2507.05390 null
2025-07-07 Reinforcement Fine-Tuning Naturally Mitigates Forgetting in Continual Post-Training Song Lai et.al. 2507.05386 null
2025-07-07 Beyond Simple Edits: X-Planner for Complex Instruction-Based Image Editing Chun-Hsiao Yeh et.al. 2507.05259 null
2025-07-07 Spatio-Temporal LLM: Reasoning about Environments and Actions Haozhen Zheng et.al. 2507.05258 null
2025-07-07 Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions Yuanzhe Hu et.al. 2507.05257 null
2025-07-07 Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning Yana Wei et.al. 2507.05255 null
2025-07-07 Response Attack: Exploiting Contextual Priming to Jailbreak Large Language Models Ziqi Miao et.al. 2507.05248 null
2025-07-07 Modeling Latent Partner Strategies for Adaptive Zero-Shot Human-Agent Collaboration Benjamin Li et.al. 2507.05244 null
2025-07-07 StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling Meng Wei et.al. 2507.05240 null
2025-07-07 All in One: Visual-Description-Guided Unified Point Cloud Segmentation Zongyan Han et.al. 2507.05211 null
2025-07-07 MedGemma Technical Report Andrew Sellergren et.al. 2507.05201 null
2025-07-07 CREW-WILDFIRE: Benchmarking Agentic Multi-Agent Collaborations at Scale Jonathan Hyun et.al. 2507.05178 null
2025-07-07 OpenS2S: Advancing Open-Source End-to-End Empathetic Large Speech Language Model Chen Wang et.al. 2507.05177 null
2025-07-07 A Dynamical Systems Perspective on the Analysis of Neural Networks Dennis Chemnitz et.al. 2507.05164 null
2025-07-07 4DSloMo: 4D Reconstruction for High Speed Scene with Asynchronous Capture Yutian Chen et.al. 2507.05163 null
2025-07-07 AI Generated Text Detection Using Instruction Fine-tuned Large Language and Transformer-Based Models Chinnappa Guggilla et.al. 2507.05157 null
2025-07-07 Interpretable Mnemonic Generation for Kanji Learning via Expectation-Maximization Jaewook Lee et.al. 2507.05137 null
2025-07-07 LERa: Replanning with Visual Feedback in Instruction Following Svyatoslav Pchelintsev et.al. 2507.05135 null
2025-07-07 An Evaluation of Large Language Models on Text Summarization Tasks Using Prompt Engineering Techniques Walid Mohamed Aly et.al. 2507.05123 null
2025-07-07 LVM4CSI: Enabling Direct Application of Pre-Trained Large Vision Models for Wireless Channel Tasks Jiajia Guo et.al. 2507.05121 null
2025-07-07 VerifyLLM: LLM-Based Pre-Execution Task Plan Verification for Robots Danil S. Grigorev et.al. 2507.05118 null
2025-07-07 DICE: Discrete inverse continuity equation for learning population dynamics Tobias Blickhan et.al. 2507.05107 null
2025-07-07 The Hidden Threat in Plain Text: Attacking RAG Data Loaders Alberto Castagnaro et.al. 2507.05093 null
2025-07-07 Gaussian approximation for non-linearity parameter estimation in perturbed random fields on the sphere Claudio Durastanti et.al. 2507.05074 null
2025-07-07 ICAS: Detecting Training Data from Autoregressive Image Generative Models Hongyao Yu et.al. 2507.05068 null
2025-07-07 Replacing thinking with tool usage enables reasoning in small language models Corrado Rainone et.al. 2507.05065 null
2025-07-07 What Shapes User Trust in ChatGPT? A Mixed-Methods Study of User Attributes, Trust Dimensions, Task Context, and Societal Perceptions among University Students Kadija Bouyzourn et.al. 2507.05046 null
2025-07-07 MoLink: Distributed and Efficient Serving Framework for Large Models Lewei Jin et.al. 2507.05043 null
2025-07-07 Beyond Scaling Curves: Internal Dynamics of Neural Networks Through the NTK Lens Konstantin Nikolaou et.al. 2507.05035 null
2025-07-07 Estimating Object Physical Properties from RGB-D Vision and Depth Robot Sensors Using Deep Learning Ricardo Cardoso et.al. 2507.05029 null
2025-07-07 A Generative Diffusion Model for Amorphous Materials Kai Yang et.al. 2507.05024 null
2025-07-07 Co-DETECT: Collaborative Discovery of Edge Cases in Text Classification Chenfei Xiong et.al. 2507.05010 null
2025-07-07 Multi-modal Representations for Fine-grained Multi-label Critical View of Safety Recognition Britty Baby et.al. 2507.05007 null
2025-07-07 From Autonomy to Agency: Agentic Vehicles for Human-Centered Mobility Systems Jiangbo Yu et.al. 2507.04996 null
2025-07-07 Parameterized Diffusion Optimization enabled Autoregressive Ordinal Regression for Diabetic Retinopathy Grading Qinkai Yu et.al. 2507.04978 null
2025-07-07 Can Video LLMs Refuse to Answer? Alignment for Answerability in Video Large Language Models Eunseop Yoon et.al. 2507.04976 null
2025-07-07 The Case for Instance-Optimized LLMs in OLAP Databases Bardia Mohammadi et.al. 2507.04967 null
2025-07-07 EXPOTION: Facial Expression and Motion Control for Multimodal Music Generation Fathinah Izzati et.al. 2507.04955 null
2025-07-07 ArtifactsBench: Bridging the Visual-Interactive Gap in LLM Code Generation Evaluation Chenchen Zhang et.al. 2507.04952 null
2025-07-07 ReLoop: "Seeing Twice and Thinking Backwards" via Closed-loop Training to Mitigate Hallucinations in Multimodal understanding Jianjiang Yang et.al. 2507.04943 null
2025-07-07 Contextual Light-Particle Interference Brian Stout et.al. 2507.04935 null
2025-07-07 LIFT: Automating Symbolic Execution Optimization with Large Language Models for AI Networks Ruoxi Wang et.al. 2507.04931 null
2025-07-07 HV-MMBench: Benchmarking MLLMs for Human-Centric Video Understanding Yuxuan Cai et.al. 2507.04909 null
2025-07-07 Emergent Semantics Beyond Token Embeddings: Transformer LMs with Frozen Visual Unicode Representations A. Bochkov et.al. 2507.04886 null
2025-07-07 DoPI: Doctor-like Proactive Interrogation LLM for Traditional Chinese Medicine Zewen Sun et.al. 2507.04877 null
2025-07-07 Music Boomerang: Reusing Diffusion Models for Data Augmentation and Audio Manipulation Alexander Fichtinger et.al. 2507.04864 null
2025-07-07 Supporting Software Formal Verification with Large Language Models: An Experimental Study Weiqi Wang et.al. 2507.04857 null
2025-07-07 Semantically Consistent Discrete Diffusion for 3D Biological Graph Modeling Chinmay Prabhakar et.al. 2507.04856 null
2025-07-07 Grahak-Nyay: Consumer Grievance Redressal through Large Language Models Shrey Ganatra et.al. 2507.04854 null
2025-07-07 Dialogue-Based Multi-Dimensional Relationship Extraction from Novels Yuchen Yan et.al. 2507.04852 null
2025-07-07 Spec-TOD: A Specialized Instruction-Tuned LLM Framework for Efficient Task-Oriented Dialogue Systems Quang-Vinh Nguyen et.al. 2507.04841 null
2025-07-07 RIPE: Reinforcement Learning on Unlabeled Image Pairs for Robust Keypoint Extraction Johannes Künzel et.al. 2507.04839 null
2025-07-07 The Geopolitical Determinants of Economic Growth, 1960-2019 Tianyu Fan et.al. 2507.04833 null
2025-07-07 Harnessing Pairwise Ranking Prompting Through Sample-Efficient Ranking Distillation Junru Wu et.al. 2507.04820 null
2025-07-07 Application and Evaluation of Large Language Models for Forecasting the Impact of Traffic Incidents George Jagadeesh et.al. 2507.04803 null
2025-07-07 Generalization bounds for score-based generative models: a synthetic proof Arthur Stéphanovitch et.al. 2507.04794 null
2025-07-07 Reason to Rote: Rethinking Memorization in Reasoning Yupei Du et.al. 2507.04782 null
2025-07-07 From Imitation to Innovation: The Emergence of AI Unique Artistic Styles and the Challenge of Copyright Protection Zexi Jia et.al. 2507.04769 null
2025-07-07 ABench-Physics: Benchmarking Physical Reasoning in LLMs via High-Difficulty and Dynamic Physics Problems Yiming Zhang et.al. 2507.04766 null
2025-07-07 GraphBrep: Learning B-Rep in Graph Structure for Efficient CAD Generation Weilin Lai et.al. 2507.04765 null
2025-07-07 Intervening to learn and compose disentangled representations Alex Markham et.al. 2507.04754 null
2025-07-07 Large Language Models for Network Intrusion Detection Systems: Foundations, Implementations, and Future Directions Shuo Yang et.al. 2507.04752 null
2025-07-07 LLMs as Architects and Critics for Multi-Source Opinion Summarization Anuj Attri et.al. 2507.04751 null
2025-07-07 LLM-based Question-Answer Framework for Sensor-driven HVAC System Interaction Sungmin Lee et.al. 2507.04748 null
2025-07-07 Activation Steering for Chain-of-Thought Compression Seyedarmin Azizi et.al. 2507.04742 null
2025-07-07 ChipSeek-R1: Generating Human-Surpassing RTL with LLM via Hierarchical Reward-Driven Reinforcement Learning Zhirong Chen et.al. 2507.04736 null
2025-07-07 An analysis of vision-language models for fabric retrieval Francesco Giuliari et.al. 2507.04735 null
2025-07-07 "This Suits You the Best": Query Focused Comparative Explainable Summarization Arnav Attri et.al. 2507.04733 null
2025-07-07 Who's the Mole? Modeling and Detecting Intention-Hiding Malicious Agents in LLM-Based Multi-Agent Systems Yizhe Xie et.al. 2507.04724 null
2025-07-07 LOOM-Scope: a comprehensive and efficient LOng-cOntext Model evaluation framework Zecheng Tang et.al. 2507.04723 null
2025-07-07 Geometric-Guided Few-Shot Dental Landmark Detection with Human-Centric Foundation Model Anbang Wang et.al. 2507.04710 null
2025-07-07 Why We Feel What We Feel: Joint Detection of Emotions and Their Opinion Triggers in E-commerce Arnav Attri et.al. 2507.04708 null
2025-07-07 Tempo-R0: A Video-MLLM for Temporal Video Grounding through Efficient Temporal Sensing Reinforcement Learning Feng Yue et.al. 2507.04702 null
2025-07-07 XiYan-SQL: A Novel Multi-Generator Framework For Text-to-SQL Yifu Liu et.al. 2507.04701 null
2025-07-07 A Visual Leap in CLIP Compositionality Reasoning through Generation of Counterfactual Sets Zexi Jia et.al. 2507.04699 null
2025-07-07 Performance Evaluation of General Purpose Large Language Models for Basic Linear Algebra Subprograms Code Generation Daichi Mukunoki et.al. 2507.04697 null
2025-07-07 AKEGEN: A LLM-based Tabular Corpus Generator for Evaluating Dataset Discovery in Data Lakes Zhenwei Dai et.al. 2507.04687 null
2025-07-07 ChangeBridge: Spatiotemporal Image Generation with Multimodal Controls for Remote Sensing Zhenghui Zhao et.al. 2507.04678 null
2025-07-07 VectorLLM: Human-like Extraction of Structured Building Contours via Multimodal LLMs Tao Zhang et.al. 2507.04664 null
2025-07-07 MODA: MOdular Duplex Attention for Multimodal Perception, Cognition, and Emotion Understanding Zhicheng Zhang et.al. 2507.04635 null
2025-07-07 Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models? Yun Qu et.al. 2507.04632 null
2025-07-07 Learning Robust Stereo Matching in the Wild with Selective Mixture-of-Experts Yun Wang et.al. 2507.04631 null
2025-07-07 Heterogeneous User Modeling for LLM-based Recommendation Honghui Bao et.al. 2507.04626 null
2025-07-07 Knowledge-Aware Self-Correction in Language Models via Structured Memory Graphs Swayamjit Saha et.al. 2507.04625 null
2025-07-07 Hierarchical Intent-guided Optimization with Pluggable LLM-Driven Semantics for Session-based Recommendation Jinpeng Chen et.al. 2507.04623 null
2025-07-07 Multimodal LLM Integrated Semantic Communications for 6G Immersive Experiences Yusong Zhang et.al. 2507.04621 null
2025-07-07 any4: Learned 4-bit Numeric Representation for LLMs Mostafa Elhoushi et.al. 2507.04610 null
2025-07-07 PRIME: Large Language Model Personalization with Cognitive Memory and Thought Processes Xinliang Frederick Zhang et.al. 2507.04607 null
2025-07-07 QR-LoRA: Efficient and Disentangled Fine-tuning via QR Decomposition for Customized Generation Jiahui Yang et.al. 2507.04599 null
2025-07-06 Evaluating LLMs on Real-World Forecasting Against Human Superforecasters Janna Lu et.al. 2507.04562 null
2025-07-06 MambaVideo for Discrete Video Tokenization with Channel-Split Quantization Dawit Mureja Argaw et.al. 2507.04559 null
2025-07-06 Self-supervised learning of speech representations with Dutch archival data Nik Vaessen et.al. 2507.04554 null
2025-07-06 Greedy Dynamic Matching Nick Arnosti et.al. 2507.04551 null
2025-07-06 DP-Fusion: Token-Level Differentially Private Inference for Large Language Models Rushil Thareja et.al. 2507.04531 null
2025-07-06 DOTResize: Reducing LLM Width via Discrete Optimal Transport-based Neuron Merging Neha Verma et.al. 2507.04517 null
2025-07-06 Unveiling the Potential of Diffusion Large Language Model in Controllable Generation Zhen Xiong et.al. 2507.04504 null
2025-07-06 A validity-guided workflow for robust large language model research in psychology Zhicheng Lin et.al. 2507.04491 null
2025-07-06 Source Attribution in Retrieval-Augmented Generation Ikhtiyor Nematov et.al. 2507.04480 null
2025-07-06 Model Inversion Attacks on Llama 3: Extracting PII from Large Language Models Sathesh P. Sivashanmugam et.al. 2507.04478 null
2025-07-06 The role of large language models in UI/UX design: A systematic literature review Ammar Ahmed et.al. 2507.04469 null
2025-07-06 GradOT: Training-free Gradient-preserving Offsite-tuning for Large Language Models Kai Yao et.al. 2507.04455 null
2025-07-06 ESSA: Evolutionary Strategies for Scalable Alignment Daria Korotyshova et.al. 2507.04453 null
2025-07-03 MultiGen: Using Multimodal Generation in Simulation to Learn Multimodal Policies in Real Renhao Wang et.al. 2507.02864 null
2025-07-03 RefTok: Reference-Based Tokenization for Video Generation Xiang Fan et.al. 2507.02862 null
2025-07-03 Less is Enough: Training-Free Video Diffusion Acceleration via Runtime-Adaptive Caching Xin Zhou et.al. 2507.02860 null
2025-07-03 Bootstrapping Grounded Chain-of-Thought in Multimodal LLMs for Data-Efficient Model Adaptation Jiaer Xia et.al. 2507.02859 null
2025-07-03 Requirements Elicitation Follow-Up Question Generation Yuchen Shen et.al. 2507.02858 null
2025-07-03 MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs Purbesh Mitra et.al. 2507.02851 null
2025-07-03 Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection Ziqi Miao et.al. 2507.02844 null
2025-07-03 LLM-Driven Treatment Effect Estimation Under Inference Time Text Confounding Yuchen Ma et.al. 2507.02843 null
2025-07-03 StepHint: Multi-level Stepwise Hints Enhance Reinforcement Learning to Reason Kaiyi Zhang et.al. 2507.02841 null
2025-07-03 ExPO: Unlocking Hard Reasoning with Self-Explanation-Guided Reinforcement Learning Ruiyang Zhou et.al. 2507.02834 null
2025-07-03 SynapseRoute: An Auto-Route Switching Framework on Dual-State Large Language Model Wencheng Zhang et.al. 2507.02822 null
2025-07-03 Multimodal Mathematical Reasoning with Diverse Solving Perspective Wenhao Shi et.al. 2507.02804 null
2025-07-03 Is Reasoning All You Need? Probing Bias in the Age of Reasoning Language Models Riccardo Cantini et.al. 2507.02799 null
2025-07-03 No time to train! Training-Free Reference-Based Instance Segmentation Miguel Espinosa et.al. 2507.02798 null
2025-07-03 From Long Videos to Engaging Clips: A Human-Inspired Video Editing Framework with Multimodal Narrative Understanding Xiangfeng Wang et.al. 2507.02790 null
2025-07-03 Moral Responsibility or Obedience: What Do We Want from AI? Joseph Boland et.al. 2507.02788 null
2025-07-03 Self-Correction Bench: Revealing and Addressing the Self-Correction Blind Spot in LLMs Ken Tsui et.al. 2507.02778 null
2025-07-03 KERAP: A Knowledge-Enhanced Reasoning Approach for Accurate Zero-shot Diagnosis Prediction Using Multi-agent LLMs Yuzhang Xie et.al. 2507.02773 null
2025-07-03 Grounding Intelligence in Movement Melanie Segado et.al. 2507.02771 null
2025-07-03 DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment Ke-Han Lu et.al. 2507.02768 null
2025-07-03 Knowledge Protocol Engineering: A New Paradigm for AI in Domain-Specific Knowledge Work Guangwei Zhang et.al. 2507.02760 null
2025-07-03 Fast and Simplex: 2-Simplicial Attention in Triton Aurko Roy et.al. 2507.02754 null
2025-07-03 Who's Sorry Now: User Preferences Among Rote, Empathic, and Explanatory Apologies from LLM Chatbots Zahra Ashktorab et.al. 2507.02745 null
2025-07-03 Prompt learning with bounding box constraints for medical image segmentation Mélanie Gaillochet et.al. 2507.02743 null
2025-07-03 Early Signs of Steganographic Capabilities in Frontier LLMs Artur Zolkowski et.al. 2507.02737 null
2025-07-03 Bourbaki: Self-Generated and Goal-Conditioned MDPs for Theorem Proving Matthieu Zimmer et.al. 2507.02726 null
2025-07-03 On the Convergence of Large Language Model Optimizer for Black-Box Network Management Hoon Lee et.al. 2507.02689 null
2025-07-03 Embedding-Based Federated Data Sharing via Differentially Private Conditional VAEs Francesco Di Salvo et.al. 2507.02671 null
2025-07-03 AIGI-Holmes: Towards Explainable and Generalizable AI-Generated Image Detection via Multimodal Large Language Models Ziyin Zhou et.al. 2507.02664 null
2025-07-03 Hey AI, Generate Me a Hardware Code! Agentic AI-based Hardware Design & Verification Deepak Narayan Gadde et.al. 2507.02660 null
2025-07-03 Medical Data Pecking: A Context-Aware Approach for Automated Quality Evaluation of Structured Medical Data Irena Girshovitz et.al. 2507.02628 null
2025-07-03 VRAgent-R1: Boosting Video Recommendation with MLLM-based Agents via Reinforcement Learning Siran Chen et.al. 2507.02626 null
2025-07-03 FlowSpec: Continuous Pipelined Speculative Decoding for Efficient Distributed LLM Inference Xing Liu et.al. 2507.02620 null
2025-07-03 Strategic Intelligence in Large Language Models: Evidence from evolutionary Game Theory Kenneth Payne et.al. 2507.02618 null
2025-07-03 DynamiCare: A Dynamic Multi-Agent Framework for Interactive and Open-Ended Medical Decision-Making Tianqi Shang et.al. 2507.02616 null
2025-07-03 De-AntiFake: Rethinking the Protective Perturbations Against Voice Cloning Attacks Wei Fan et.al. 2507.02606 null
2025-07-03 MPF: Aligning and Debiasing Language Models post Deployment via Multi Perspective Fusion Xin Guan et.al. 2507.02595 null
2025-07-03 Revisiting Active Learning under (Human) Label Variation Cornelia Gruber et.al. 2507.02593 null
2025-07-03 Reconstructing Close Human Interaction with Appearance and Proxemics Reasoning Buzhen Huang et.al. 2507.02565 null
2025-07-03 LLMREI: Automating Requirements Elicitation Interviews with LLMs Alexander Korn et.al. 2507.02564 null
2025-07-03 Transformers Don't Need LayerNorm at Inference Time: Scaling LayerNorm Removal to GPT-2 XL and the Implications for Mechanistic Interpretability Luca Baroni et.al. 2507.02559 null
2025-07-03 Clarifying Before Reasoning: A Coq Prover with Structural Context Yanzhen Lu et.al. 2507.02541 null
2025-07-03 Are You Listening to Me? Fine-Tuning Chatbots for Empathetic Dialogue Paulo Ricardo Knob et.al. 2507.02537 null
2025-07-03 Meta-Fair: AI-Assisted Fairness Testing of Large Language Models Miguel Romero-Arjona et.al. 2507.02533 null
2025-07-03 Open-Source System for Multilingual Translation and Cloned Speech Synthesis Mateo Cámara et.al. 2507.02530 null
2025-07-03 RetrySQL: text-to-SQL training with retry data for self-correcting query generation Alicja Rączkowska et.al. 2507.02529 null
2025-07-03 Continual Gradient Low-Rank Projection Fine-Tuning for LLMs Chenxu Wang et.al. 2507.02503 null
2025-07-03 CrowdTrack: A Benchmark for Difficult Multiple Pedestrian Tracking in Real Scenarios Teng Fu et.al. 2507.02479 null
2025-07-03 System-performance and cost modeling of Large Language Model training and inference Wenzhe Guo et.al. 2507.02456 null
2025-07-03 Introducing a New Brexit-Related Uncertainty Index: Its Evolution and Economic Consequences Ismet Gocer et.al. 2507.02439 null
2025-07-03 Toward a Robust and Generalizable Metamaterial Foundation Model Namjung Kim et.al. 2507.02436 null
2025-07-03 Improving Consistency in Vehicle Trajectory Prediction Through Preference Optimization Caio Azevedo et.al. 2507.02406 null
2025-07-03 Evaluating Language Models For Threat Detection in IoT Security Logs Jorge J. Tejero-Fernández et.al. 2507.02390 null
2025-07-03 JoyTTS: LLM-based Spoken Chatbot With Voice Cloning Fangru Zhou et.al. 2507.02380 null
2025-07-03 Efficient Code LLM Training via Distribution-Consistent and Diversity-Aware Data Selection Weijie Lyu et.al. 2507.02378 null
2025-07-03 UVLM: Benchmarking Video Language Model for Underwater World Understanding Xizhe Xue et.al. 2507.02373 null
2025-07-03 Holistic Tokenizer for Autoregressive Image Generation Anlin Zheng et.al. 2507.02358 null
2025-07-03 Coling-UniA at SciVQA 2025: Few-Shot Example Retrieval and Confidence-Informed Ensembling for Multimodal Large Language Models Christian Jaumann et.al. 2507.02357 null
2025-07-03 DoMIX: An Efficient Framework for Exploiting Domain Knowledge in Fine-Tuning Dohoon Kim et.al. 2507.02302 null
2025-07-03 Prompt Disentanglement via Language Guidance and Representation Alignment for Domain Generalization De Cheng et.al. 2507.02288 null
2025-07-03 Misaligned from Within: Large Language Models Reproduce Our Double-Loop Learning Blindness Tim Rogers et.al. 2507.02283 null
2025-07-03 Content filtering methods for music recommendation: A review Terence Zeng et.al. 2507.02282 null
2025-07-03 LaCo: Efficient Layer-wise Compression of Visual Tokens for Multimodal Large Language Models Juntao Liu et.al. 2507.02279 null
2025-07-03 NLP4Neuro: Sequence-to-sequence learning for neural population decoding Jacob J. Morra et.al. 2507.02264 null
2025-07-03 Uncertainty-aware Reward Design Process Yang Yang et.al. 2507.02256 null
2025-07-03 Listwise Preference Alignment Optimization for Tail Item Recommendation Zihao Li et.al. 2507.02255 null
2025-07-03 Scaling LLM Planning: NL2FLOW for Parametric Problem Generation and Rigorous Evaluation Jungkoo Kang et.al. 2507.02253 null
2025-07-03 SurgVisAgent: Multimodal Agentic Model for Versatile Surgical Visual Enhancement Zeyu Lei et.al. 2507.02252 null
2025-07-03 VERBA: Verbalizing Model Differences Using Large Language Models Shravan Doda et.al. 2507.02241 null
2025-07-03 DecoRTL: A Run-time Decoding Framework for RTL Code Generation with LLMs Mohammad Akyash et.al. 2507.02226 null
2025-07-03 GDC Cohort Copilot: An AI Copilot for Curating Cohorts from the Genomic Data Commons Steven Song et.al. 2507.02221 null
2025-07-02 ESTR-CoT: Towards Explainable and Accurate Event Stream based Scene Text Recognition with Chain-of-Thought Reasoning Xiao Wang et.al. 2507.02200 null
2025-07-02 EvalAssist: A Human-Centered Tool for LLM-as-a-Judge Zahra Ashktorab et.al. 2507.02186 null
2025-07-02 Computer Science Education in the Age of Generative AI Russell Beale et.al. 2507.02183 null
2025-07-02 Enhancing COBOL Code Explanations: A Multi-Agents Approach Using Large Language Models Fangjian Lei et.al. 2507.02182 null
2025-07-02 The Revolution Has Arrived: What the Current State of Large Language Models in Education Implies for the Future Russell Beale et.al. 2507.02180 null
2025-07-02 Data Diversification Methods In Alignment Enhance Math Performance In LLMs Berkan Dokmeci et.al. 2507.02173 null
2025-07-02 Reasoning or Not? A Comprehensive Evaluation of Reasoning LLMs for Dialogue Summarization Keyan Jin et.al. 2507.02145 null
2025-07-02 When LLMs Disagree: Diagnosing Relevance Filtering Bias and Retrieval Divergence in SDG Search William A. Ingram et.al. 2507.02139 null
2025-07-02 Dissecting the Impact of Mobile DVFS Governors on LLM Inference Performance and Energy Efficiency Zongpu Zhang et.al. 2507.02135 null
2025-07-02 BACTA-GPT: An AI-Based Bayesian Adaptive Clinical Trial Architect Krishna Padmanabhan et.al. 2507.02130 null
2025-07-02 Generative Latent Diffusion for Efficient Spatiotemporal Data Reduction Xiao Li et.al. 2507.02129 null
2025-07-02 CROP: Circuit Retrieval and Optimization with Parameter Guidance using LLMs Jingyu Pan et.al. 2507.02128 null
2025-07-02 SAKURAONE: Empowering Transparent and Open AI Platforms through Private-Sector HPC Investment in Japan Fumikazu Konishi et.al. 2507.02124 null
2025-07-02 PAL: Designing Conversational Agents as Scalable, Cooperative Patient Simulators for Palliative-Care Training Neil K. R. Sehgal et.al. 2507.02122 null
2025-07-02 What Neuroscience Can Teach AI About Learning in Continuously Changing Environments Daniel Durstewitz et.al. 2507.02103 null
2025-07-02 The Future is Agentic: Definitions, Perspectives, and Open Challenges of Multi-Agent Recommender Systems Reza Yousefi Maragheh et.al. 2507.02097 null
2025-07-02 Sample Complexity Bounds for Linear Constrained MDPs with a Generative Model Xingtu Liu et.al. 2507.02089 null
2025-07-02 McBE: A Multi-task Chinese Bias Evaluation Benchmark for Large Language Models Tian Lan et.al. 2507.02088 null
2025-07-02 Evaluating the Promise and Pitfalls of LLMs in Hiring Decisions Eitan Anzenberg et.al. 2507.02087 null
2025-07-02 Measuring Scientific Capabilities of Language Models with a Systems Biology Dry Lab Haonan Duan et.al. 2507.02083 null
2025-07-02 Reasoning on a Budget: A Survey of Adaptive and Controllable Test-Time Compute in LLMs Mohammad Ali Alomrani et.al. 2507.02076 null
2025-07-02 Large Language Models for Crash Detection in Video: A Survey of Methods, Datasets, and Challenges Sanjeda Akter et.al. 2507.02074 null
2025-07-02 MGC: A Compiler Framework Exploiting Compositional Blindness in Aligned LLMs for Malware Generation Lu Yan et.al. 2507.02057 null
2025-07-02 How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks Rahul Ramachandran et.al. 2507.01955 null
2025-07-02 Test-Time Scaling with Reflective Generative Model Zixiao Wang et.al. 2507.01951 null
2025-07-02 Kwai Keye-VL Technical Report Kwai Keye Team et.al. 2507.01949 null
2025-07-02 LongAnimation: Long Animation Generation with Dynamic Global-Local Memory Nan Chen et.al. 2507.01945 null
2025-07-02 SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars Xiaosheng Zhao et.al. 2507.01939 null
2025-07-02 The Thin Line Between Comprehension and Persuasion in LLMs Adrian de Wynter et.al. 2507.01936 null
2025-07-02 Large Language Model-Driven Closed-Loop UAV Operation with Semantic Observations Wenhao Wang et.al. 2507.01930 null
2025-07-02 A Survey on Vision-Language-Action Models: An Action Tokenization Perspective Yifan Zhong et.al. 2507.01925 null
2025-07-02 Decision-oriented Text Evaluation Yu-Shiang Huang et.al. 2507.01923 null
2025-07-02 Gradient-Adaptive Policy Optimization: Towards Multi-Objective Alignment of Large Language Models Chengao Li et.al. 2507.01915 null
2025-07-02 Reasoning to Edit: Hypothetical Instruction-Based Image Editing with Visual Reasoning Qingdong He et.al. 2507.01908 null
2025-07-02 AI4Research: A Survey of Artificial Intelligence for Scientific Research Qiguang Chen et.al. 2507.01903 null
2025-07-02 High-Layer Attention Pruning with Rescaling Songtao Liu et.al. 2507.01900 null
2025-07-02 MiCoTA: Bridging the Learnability Gap with Intermediate CoT and Teacher Assistants Dongyi Ding et.al. 2507.01887 null
2025-07-02 Improving GANs by leveraging the quantum noise from real hardware Hongni Jin et.al. 2507.01886 null
2025-07-02 A computationally frugal open-source foundation model for thoracic disease detection in lung cancer screening programs Niccolò McConnell et.al. 2507.01881 null
2025-07-02 Towards Foundation Auto-Encoders for Time-Series Anomaly Detection Gastón García González et.al. 2507.01875 null
2025-07-02 DIY-MKG: An LLM-Based Polyglot Language Learning System Kenan Tang et.al. 2507.01872 null
2025-07-02 Bridging UI Design and chatbot Interactions: Applying Form-Based Principles to Conversational Agents Sanjay Krishna Anbalagan et.al. 2507.01862 null
2025-07-02 TypeTele: Releasing Dexterity in Teleoperation by Dexterous Manipulation Types Yuhao Lin et.al. 2507.01857 null
2025-07-02 Eka-Eval: A Comprehensive Evaluation Framework for Large Language Models in Indian Languages Samridhi Raj Sinha et.al. 2507.01853 null
2025-07-02 Low-Perplexity LLM-Generated Sequences and Where To Find Them Arthur Wuhrmann et.al. 2507.01844 null
2025-07-02 Out-of-Distribution Detection Methods Answer the Wrong Questions Yucen Lily Li et.al. 2507.01831 null
2025-07-02 APRMCTS: Improving LLM-based Automated Program Repair with Iterative Tree Search Haichuan Hu et.al. 2507.01827 null
2025-07-02 LoRA Fine-Tuning Without GPUs: A CPU-Efficient Meta-Generation Framework for LLMs Reza Arabpour et.al. 2507.01806 null
2025-07-02 Towards Decentralized and Sustainable Foundation Model Training with the Edge Leyang Xue et.al. 2507.01803 null
2025-07-02 HCNQA: Enhancing 3D VQA with Hierarchical Concentration Narrowing Supervision Shengli Zhou et.al. 2507.01800 null
2025-07-02 Robust brain age estimation from structural MRI with contrastive learning Carlo Alberto Barbano et.al. 2507.01794 null
2025-07-02 Machine learning prediction of a chemical reaction over 8 decades of energy Daniel Julian et.al. 2507.01793 null
2025-07-02 FreeLoRA: Enabling Training-Free LoRA Fusion for Autoregressive Multi-Subject Personalization Peng Zheng et.al. 2507.01792 null
2025-07-02 MuRating: A High Quality Data Selecting Approach to Multilingual Large Language Model Pretraining Zhixun Chen et.al. 2507.01785 null
2025-07-02 Frontiers of Generative AI for Network Optimization: Theories, Limits, and Visions Bo Yang et.al. 2507.01773 null
2025-07-02 Enhanced Generative Model Evaluation with Clipped Density and Coverage Nicolas Salvy et.al. 2507.01761 null
2025-07-02 Rethinking Discrete Tokens: Treating Them as Conditions for Continuous Autoregressive Image Synthesis Peng Zheng et.al. 2507.01756 null
2025-07-02 Tuning without Peeking: Provable Privacy and Generalization Bounds for LLM Post-Training Ismail Labiad et.al. 2507.01752 null
2025-07-02 LLMs for Legal Subsumption in German Employment Contracts Oliver Wardas et.al. 2507.01734 null
2025-07-02 Token Communication in the Era of Large Models: An Information Bottleneck-Based Approach Hao Wei et.al. 2507.01728 null
2025-07-02 Generative flow-based warm start of the variational quantum eigensolver Hang Zou et.al. 2507.01726 null
2025-07-02 Agent Ideate: A Framework for Product Idea Generation from Patents Using Agentic AI Gopichand Kanumolu et.al. 2507.01717 null
2025-07-02 Generative modeling of convergence maps based on predicted one-point statistics Vilasini Tinnaneri Sreekanth et.al. 2507.01707 null
2025-07-02 AdamMeme: Adaptively Probe the Reasoning Capacity of Multimodal Large Language Models on Harmfulness Zixin Chen et.al. 2507.01702 null
2025-07-02 Graph Representation-based Model Poisoning on Federated LLMs in CyberEdge Networks Hanlin Cai et.al. 2507.01694 null
2025-07-02 GPT, But Backwards: Exactly Inverting Language Model Outputs Adrians Skapars et.al. 2507.01693 null
2025-07-02 A generative modeling / Physics-Informed Neural Network approach to random differential equations Georgios Arampatzis et.al. 2507.01687 null
2025-07-02 Blending Supervised and Reinforcement Fine-Tuning with Prefix Sampling Zeyu Huang et.al. 2507.01679 null
2025-07-02 AsyncFlow: An Asynchronous Streaming RL Framework for Efficient LLM Post-Training Zhenyu Han et.al. 2507.01663 null
2025-07-02 SAILViT: Towards Robust and Generalizable Visual Backbones for MLLMs via Gradual Feature Refinement Weijie Yin et.al. 2507.01643 null
2025-07-02 DaiFu: In-Situ Crash Recovery for Deep Learning Systems Zilong He et.al. 2507.01628 null
2025-07-02 Chart Question Answering from Real-World Analytical Narratives Maeve Hutchinson et.al. 2507.01627 null
2025-07-02 Data Agent: A Holistic Architecture for Orchestrating Data+AI Ecosystems Zhaoyan Sun et.al. 2507.01599 null
2025-07-02 Emotionally Intelligent Task-oriented Dialogue Systems: Architecture, Representation, and Optimisation Shutong Feng et.al. 2507.01594 null
2025-07-02 A Gift from the Integration of Discriminative and Diffusion-based Generative Learning: Boundary Refinement Remote Sensing Semantic Segmentation Hao Wang et.al. 2507.01573 null
2025-07-02 Self-Guided Process Reward Optimization with Masked Step Advantage for Process Reinforcement Learning Wu Fei et.al. 2507.01551 null
2025-07-02 Crafting Hanzi as Narrative Bridges: An AI Co-Creation Workshop for Elderly Migrants Wen Zhan et.al. 2507.01548 null
2025-07-02 MARVIS: Modality Adaptive Reasoning over VISualizations Benjamin Feuer et.al. 2507.01544 null
2025-07-02 Is External Information Useful for Stance Detection with LLMs? Quang Minh Nguyen et.al. 2507.01543 null
2025-07-02 Efficient Out-of-Scope Detection in Dialogue Systems via Uncertainty-Driven LLM Routing Álvaro Zaera et.al. 2507.01541 null
2025-07-02 Loss Functions in Diffusion Models: A Comparative Study Dibyanshu Kumar et.al. 2507.01516 null
2025-07-02 SafePTR: Token-Level Jailbreak Defense in Multimodal LLMs via Prune-then-Restore Mechanism Beitao Chen et.al. 2507.01513 null
2025-07-02 AVC-DPO: Aligned Video Captioning via Direct Preference Optimization Jiyang Tang et.al. 2507.01492 null
2025-07-02 Agent-as-Tool: A Study on the Hierarchical Decision Making with Reinforcement Learning Yanfei Zhang et.al. 2507.01489 null
2025-07-02 BioMARS: A Multi-Agent Robotic System for Autonomous Biological Experiments Yibo Qiu et.al. 2507.01485 null
2025-07-02 Evaluating the Effectiveness of Direct Preference Optimization for Personalizing German Automatic Text Simplifications for Persons with Intellectual Disabilities Yingqiang Gao et.al. 2507.01479 null
2025-07-02 Representation Entanglement for Generation: Training Diffusion Transformers Is Much Easier Than You Think Ge Wu et.al. 2507.01467 null
2025-07-02 NOCTIS: Novel Object Cyclic Threshold based Instance Segmentation Max Gandyra et.al. 2507.01463 null
2025-07-02 Using multi-agent architecture to mitigate the risk of LLM hallucinations Abd Elrahman Amer et.al. 2507.01446 null
2025-07-02 A Large Language Model for Chemistry and Retrosynthesis Predictions Yueqing Zhang et.al. 2507.01444 null
2025-07-02 EdgeLoRA: An Efficient Multi-Tenant LLM Serving System on Edge Devices Zheyu Shen et.al. 2507.01438 null
2025-07-02 Challenges & Opportunities with LLM-Assisted Visualization Retargeting Luke S. Snyder et.al. 2507.01436 null
2025-07-02 Pensieve Grader: An AI-Powered, Ready-to-Use Platform for Effortless Handwritten STEM Grading Yoonseok Yang et.al. 2507.01431 null
2025-07-02 TriVLA: A Unified Triple-System-Based Unified Vision-Language-Action Model for General Robot Control Zhenyang Liu et.al. 2507.01424 null
2025-07-02 Evaluating LLM Agent Collusion in Double Auctions Kushal Agrawal et.al. 2507.01413 null
2025-07-02 BronchoGAN: Anatomically consistent and domain-agnostic image-to-image translation for video bronchoscopy Ahmad Soliman et.al. 2507.01387 null
2025-07-02 RALLY: Role-Adaptive LLM-Driven Yoked Navigation for Agentic UAV Swarms Ziyao Wang et.al. 2507.01378 null
2025-07-02 AI Agents and Agentic AI-Navigating a Plethora of Concepts for Future Manufacturing Yinwang Ren et.al. 2507.01376 null
2025-07-02 Activation Reward Models for Few-Shot Model Alignment Tianning Chai et.al. 2507.01368 null
2025-07-02 Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy Chris Yuhao Liu et.al. 2507.01352 null
2025-07-02 SpeechAccentLLM: A Unified Framework for Foreign Accent Conversion and Text to Speech Cheng Zhuangfei et.al. 2507.01348 null
2025-07-02 LEDOM: An Open and Fundamental Reverse Language Model Xunjian Yin et.al. 2507.01335 null
2025-07-02 Symbolic or Numerical? Understanding Physics Problem Solving in Reasoning LLMs Nifu Dan et.al. 2507.01334 null
2025-07-02 Reasoner for Real-World Event Detection: Scaling Reinforcement Learning via Adaptive Perplexity-Aware Sampling Strategy Xiaoyun Zhang et.al. 2507.01327 null
2025-07-02 ICLShield: Exploring and Mitigating In-Context Learning Backdoor Attacks Zhiyao Ren et.al. 2507.01321 null
2025-07-02 La RoSA: Enhancing LLM Efficiency via Layerwise Rotated Sparse Activation Kai Liu et.al. 2507.01299 null
2025-07-02 Beyond Black-Box AI: Interpretable Hybrid Systems for Dementia Care Matthew JY Kang et.al. 2507.01282 null
2025-07-02 Rethinking All Evidence: Enhancing Trustworthy Retrieval-Augmented Generation via Conflict-Driven Summarization Juan Chen et.al. 2507.01281 null
2025-07-02 Evaluating Large Language Models for Multimodal Simulated Ophthalmic Decision-Making in Diabetic Retinopathy and Glaucoma Screening Cindy Lie Tabuse et.al. 2507.01278 null
2025-07-02 AI Meets Maritime Training: Precision Analytics for Enhanced Safety and Performance Vishakha Lall et.al. 2507.01274 null
2025-07-02 PULSE: Practical Evaluation Scenarios for Large Multimodal Model Unlearning Tatsuki Kawakami et.al. 2507.01271 null
2025-07-02 LLM-based Realistic Safety-Critical Driving Video Generation Yongjie Fu et.al. 2507.01264 null
2025-07-02 GAIus: Combining Genai with Legal Clauses Retrieval for Knowledge-based Assistant Michał Matak et.al. 2507.01259 null
2025-07-01 Beyond First-Order: Training LLMs with Stochastic Conjugate Subgradients and AdamW Di Zhang et.al. 2507.01241 null
2025-07-01 PAE MobiLLM: Privacy-Aware and Efficient LLM Fine-Tuning on the Mobile Device via Additive Side-Tuning Xingke Yang et.al. 2507.01216 null
2025-07-01 2024 NASA SUITS Report: LLM-Driven Immersive Augmented Reality User Interface for Robotics and Space Exploration Kathy Zhuang et.al. 2507.01206 null
2025-07-01 Escaping Plato's Cave: JAM for Aligning Independently Trained Vision and Language Models Hyoseo et.al. 2507.01201 null
2025-07-01 Are Large Brainwave Foundation Models Capable Yet? Insights from Fine-tuning Na Lee et.al. 2507.01196 null
2025-07-01 FlashDP: Private Training Large Language Models with Efficient DP-SGD Liangyu Wang et.al. 2507.01154 null
2025-07-01 SonoGym: High Performance Simulation for Challenging Surgical Tasks with Robotic Ultrasound Yunke Ao et.al. 2507.01152 null
2025-07-01 Geometry-aware 4D Video Generation for Robot Manipulation Zeyi Liu et.al. 2507.01099 null
2025-07-01 A theoretical prediction for the dipole in nearby distances using cosmography Hayley J. Macpherson et.al. 2507.01095 null
2025-07-02 GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning GLM-V Team et.al. 2507.01006 null
2025-07-01 Teaching Time Series to See and Speak: Forecasting with Aligned Visual and Textual Perspectives Sixun Dong et.al. 2506.24124 null
2025-06-30 Calligrapher: Freestyle Text Image Customization Yue Ma et.al. 2506.24123 null
2025-06-30 TextMesh4D: High-Quality Text-to-4D Mesh Generation Sisi Dai et.al. 2506.24121 null
2025-06-30 Data Uniformity Improves Training Efficiency and More, with a Convergence Framework Beyond the NTK Regime Yuqing Wang et.al. 2506.24120 null
2025-06-30 DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World Xiangtai Li et.al. 2506.24102 null
2025-06-30 Imagine for Me: Creative Conceptual Blending of Real Images and Text via Blended Attention Wonwoong Cho et.al. 2506.24085 null
2025-06-30 Logit-Gap Steering: Efficient Short-Suffix Jailbreaks for Aligned Large Language Models Tung-Ling Li et.al. 2506.24056 null
2025-06-30 Agent.xpu: Efficient Scheduling of Agentic LLM Workloads on Heterogeneous SoC Xinming Wei et.al. 2506.24045 null
2025-06-30 A Survey on Vision-Language-Action Models for Autonomous Driving Sicong Jiang et.al. 2506.24044 null
2025-06-30 Foundation Models for Zero-Shot Segmentation of Scientific Images without AI-Ready Data Shubhabrata Mukherjee et.al. 2506.24039 null
2025-06-30 Minimally dissipative multi-bit logical operations Jérémie Klinger et.al. 2506.24021 null
2025-06-30 Ella: Embodied Social Agents with Lifelong Memory Hongxin Zhang et.al. 2506.24019 null
2025-06-30 EXPERT: An Explainable Image Captioning Evaluation Metric with Structured Explanations Hyunjong Kim et.al. 2506.24016 null
2025-06-30 Large Language Models Don't Make Sense of Word Problems. A Scoping Review from a Mathematics Education Perspective Anselm R. Strohmaier et.al. 2506.24006 null
2025-06-30 Auto-TA: Towards Scalable Automated Thematic Analysis (TA) via Multi-Agent Large Language Models with Reinforcement Learning Seungjun Yi et.al. 2506.23998 null
2025-06-30 TaP: A Taxonomy-Guided Framework for Automated and Scalable Preference Data Generation Renren Jin et.al. 2506.23979 null
2025-06-30 Visual and Memory Dual Adapter for Multi-Modal Object Tracking Boyue Xu et.al. 2506.23972 null
2025-06-30 UMA: A Family of Universal Models for Atoms Brandon M. Wood et.al. 2506.23971 null
2025-06-30 Unveiling Decision-Making in LLMs for Text Classification: Extraction of influential and interpretable concepts with Sparse Autoencoders Mathis Le Bail et.al. 2506.23951 null
2025-06-30 AI Risk-Management Standards Profile for General-Purpose AI (GPAI) and Foundation Models Anthony M. Barrett et.al. 2506.23949 null
2025-07-01 Graft: Integrating the Domain Knowledge via Efficient Parameter Synergy for MLLMs Yang Dai et.al. 2506.23940 null
2025-06-30 Leveraging the Potential of Prompt Engineering for Hate Speech Detection in Low-Resource Languages Ruhina Tabasshum Prome et.al. 2506.23930 null
2025-06-30 IMPACT: Inflectional Morphology Probes Across Complex Typologies Mohammed J. Saeed et.al. 2506.23929 null
2025-06-30 Performance of LLMs on Stochastic Modeling Operations Research Problems: From Theory to Practice Akshit Kumar et.al. 2506.23924 null
2025-06-30 The Trilemma of Truth in Large Language Models Germans Savcisens et.al. 2506.23921 null
2025-06-30 World4Omni: A Zero-Shot Framework from Image Generation World Model to Robotic Manipulation Haonan Chen et.al. 2506.23919 null
2025-06-30 Advancing Multi-Step Mathematical Reasoning in Large Language Models through Multi-Layered Self-Reflection with Auto-Prompting André de Souza Loureiro et.al. 2506.23888 null
2025-06-30 Scaling Self-Supervised Representation Learning for Symbolic Piano Performance Louis Bradshaw et.al. 2506.23869 null
2025-06-30 Large Language Models for Statistical Inference: Context Augmentation with Applications to the Two-Sample Problem and Regression Marc Ratkovic et.al. 2506.23862 null
2025-06-30 Email as the Interface to Generative AI Models: Seamless Administrative Automation Andres Navarro et.al. 2506.23850 null
2025-06-30 A Survey on Autonomy-Induced Security Risks in Large Model-Based Agents Hang Su et.al. 2506.23844 null
2025-06-30 Refine Any Object in Any Scene Ziwei Chen et.al. 2506.23835 null
2025-06-30 Towards the "Digital Me": A vision of authentic Conversational Agents powered by personal Human Digital Twins Lluís C. Coll et.al. 2506.23826 null
2025-06-30 Flash-VStream: Efficient Real-Time Understanding for Long Video Streams Haoji Zhang et.al. 2506.23825 null
2025-07-01 The Impact of AI on Educational Assessment: A Framework for Constructive Alignment Patrick Stokkink et.al. 2506.23815 null
2025-06-30 Leveraging a Multi-Agent LLM-Based System to Educate Teachers in Hate Incidents Management Ewelina Gajewska et.al. 2506.23774 null
2025-06-30 Software Engineering for Large Language Models: Research Status, Challenges and the Road Ahead Hongzhou Rao et.al. 2506.23762 null
2025-06-30 A Survey of LLM-based Automated Program Repair: Taxonomies, Design Paradigms, and Applications Boyang Yang et.al. 2506.23749 null
2025-07-01 Positional Bias in Binary Question Answering: How Uncertainty Shapes Model Preferences Tiziano Labruna et.al. 2506.23743 null
2025-06-30 AutoEvoEval: An Automated Framework for Evolving Close-Ended LLM Evaluation Data JiaRu Wu et.al. 2506.23735 null
2025-06-30 Radioactive Watermarks in Diffusion and Autoregressive Image Generative Models Michel Meintz et.al. 2506.23731 null
2025-06-30 System-Embedded Diffusion Bridge Models Bartlomiej Sobieski et.al. 2506.23726 null
2025-06-30 PAC Bench: Do Foundation Models Understand Prerequisites for Executing Manipulation Policies? Atharva Gundawar et.al. 2506.23725 null
2025-06-30 MDPG: Multi-domain Diffusion Prior Guidance for MRI Reconstruction Lingtong Zhang et.al. 2506.23701 null
2025-06-30 MedSAM-CA: A CNN-Augmented ViT with Attention-Enhanced Multi-Scale Fusion for Medical Image Segmentation Peiting Tian et.al. 2506.23700 null
2025-06-30 If You Had to Pitch Your Ideal Software -- Evaluating Large Language Models to Support User Scenario Writing for User Experience Experts and Laypersons Patrick Stadler et.al. 2506.23694 null
2025-06-30 Agent4S: The Transformation of Research Paradigms from the Perspective of Large Language Models Boyuan Zheng et.al. 2506.23692 null
2025-06-30 SynMotion: Semantic-Visual Adaptation for Motion Customized Video Generation Shuai Tan et.al. 2506.23690 null
2025-06-30 PokéAI: A Goal-Generating, Battle-Optimizing Multi-agent System for Pokemon Red Zihao Liu et.al. 2506.23689 null
2025-06-30 Interactive Reasoning: Visualizing and Controlling Chain-of-Thought Reasoning in Large Language Models Rock Yuren Pang et.al. 2506.23678 null
2025-06-30 Efficient Interleaved Speech Modeling through Knowledge Distillation Mohammadmahdi Nouriborji et.al. 2506.23670 null
2025-06-30 L0: Reinforcement Learning to Become General Agents Junjie Zhang et.al. 2506.23667 null
2025-06-30 On the Domain Robustness of Contrastive Vision-Language Models Mario Koddenbrock et.al. 2506.23663 null
2025-06-30 Multiscale Turbulence Synthesis: Validation in 2D Hydrodynamics Pierre Lesaffre et.al. 2506.23659 null
2025-06-30 Act-With-Think: Chunk Auto-Regressive Modeling for Generative Recommendation Yifan Wang et.al. 2506.23643 null
2025-06-30 VAP-Diffusion: Enriching Descriptions with MLLMs for Enhanced Medical Image Generation Peng Huang et.al. 2506.23641 null
2025-06-30 Unified Multimodal Understanding via Byte-Pair Visual Encoding Wanpeng Zhang et.al. 2506.23639 null
2025-06-30 Towards Building Private LLMs: Exploring Multi-Node Expert Parallelism on Apple Silicon for Mixture-of-Experts Large Language Model Mu-Chi Chen et.al. 2506.23635 null
2025-06-30 TurboVSR: Fantastic Video Upscalers and Where to Find Them Zhongdao Wang et.al. 2506.23618 null
2025-06-30 Evaluating the Simulation of Human Personality-Driven Susceptibility to Misinformation with LLMs Manuel Pratelli et.al. 2506.23610 null
2025-06-30 PGOV3D: Open-Vocabulary 3D Semantic Segmentation with Partial-to-Global Curriculum Shiqi Zhang et.al. 2506.23607 null
2025-06-30 SG-LDM: Semantic-Guided LiDAR Generation via Latent-Aligned Diffusion Zhengkang Xiang et.al. 2506.23606 null
2025-06-30 AI-Generated Lecture Slides for Improving Slide Element Detection and Retrieval Suyash Maniyar et.al. 2506.23605 null
2025-06-30 SoK: Semantic Privacy in Large Language Models Baihe Ma et.al. 2506.23603 null
2025-06-30 Semantic-guided Diverse Decoding for Large Language Model Weijie Shi et.al. 2506.23601 null
2025-06-30 Transition Matching: Scalable and Flexible Generative Modeling Neta Shaul et.al. 2506.23589 null
2025-06-30 Dataset Distillation via Vision-Language Category Prototype Yawen Zou et.al. 2506.23580 null
2025-06-30 Evaluating Multi-Agent Defences Against Jailbreaking Attacks on Large Language Models Maria Carolina Cornelia Wit et.al. 2506.23576 null
2025-06-30 MMReason: An Open-Ended Multi-Modal Multi-Step Reasoning Benchmark for MLLMs Toward AGI Huanjin Yao et.al. 2506.23563 null
2025-06-30 JAM-Flow: Joint Audio-Motion Synthesis with Flow Matching Mingi Kwon et.al. 2506.23552 null
2025-06-30 Neural Langevin Machine: a local asymmetric learning rule can be creative Zhendong Yu et.al. 2506.23546 null
2025-06-30 Comparative Analysis of the Code Generated by Popular Large Language Models (LLMs) for MISRA C++ Compliance Malik Muhammad Umer et.al. 2506.23535 null
2025-06-30 On Recipe Memorization and Creativity in Large Language Models: Is Your Model a Creative Cook, a Bad Cook, or Merely a Plagiator? Jan Kvapil et.al. 2506.23527 null
2025-06-30 NEU-ESC: A Comprehensive Vietnamese dataset for Educational Sentiment analysis and topic Classification toward multitask learning Phan Quoc Hung Mai et.al. 2506.23524 null
2025-07-01 ChemActor: Enhancing Automated Extraction of Chemical Synthesis Actions with LLM-Generated Data Yu Zhang et.al. 2506.23520 null
2025-06-30 Reinforcement Fine-Tuning Enables MLLMs Learning Novel Tasks Stably Zhihao Zhang et.al. 2506.23508 null
2025-06-30 LLM-enhanced Action-aware Multi-modal Prompt Tuning for Image-Text Matching Mengxiao Tian et.al. 2506.23502 null
2025-06-30 Thought-Augmented Planning for LLM-Powered Interactive Recommender Agent Haocheng Yu et.al. 2506.23485 null
2025-06-30 MTADiffusion: Mask Text Alignment Diffusion Model for Object Inpainting Jun Huang et.al. 2506.23482 null
2025-06-30 Evaluation of Geolocation Capabilities of Multimodal Large Language Models and Analysis of Associated Privacy Risks Xian Zhang et.al. 2506.23481 null
2025-06-30 What to Keep and What to Drop: Adaptive Table Filtering Framework Jang Won June et.al. 2506.23463 null
2025-06-30 Can We Predict the Unpredictable? Leveraging DisasterNet-LLM for Multimodal Disaster Classification Manaswi Kulahara et.al. 2506.23462 null
2025-06-30 General Signal Model and Capacity Limit for Rydberg Quantum Information System Jieao Zhu et.al. 2506.23455 null
2025-06-30 PathDiff: Histopathology Image Synthesis with Unpaired Text and Mask Conditions Mahesh Bhosale et.al. 2506.23440 null
2025-06-29 TuCo: Measuring the Contribution of Fine-Tuning to Individual Responses of LLMs Felipe Nuti et.al. 2506.23423 null
2025-06-29 Datasets for Fairness in Language Models: An In-Depth Survey Jiale Zhang et.al. 2506.23411 null
2025-06-29 Do LLMs Dream of Discrete Algorithms? Claudionor Coelho Jr et.al. 2506.23408 null
2025-06-29 Perspective Dial: Measuring Perspective of Text and Guiding LLM Outputs Taejin Kim et.al. 2506.23377 null
2025-06-29 Federated Timeline Synthesis: Scalable and Private Methodology For Model Training and Deployment Pawel Renc et.al. 2506.23358 null
2025-06-29 GeoProg3D: Compositional Visual Reasoning for City-Scale 3D Language Fields Shunsuke Yasuki et.al. 2506.23352 null
2025-06-29 ATGen: A Framework for Active Text Generation Akim Tsvigun et.al. 2506.23342 null
2025-06-29 Information Loss in LLMs' Multilingual Translation: The Role of Training Data, Language Proximity, and Language Family Yumeng Lin et.al. 2506.23340 null
2025-06-29 VALID-Mol: a Systematic Framework for Validated LLM-Assisted Molecular Design Malikussaid et.al. 2506.23339 null
2025-06-29 XY-Tokenizer: Mitigating the Semantic-Acoustic Conflict in Low-Bitrate Speech Codecs Yitian Gong et.al. 2506.23325 null
2025-06-29 GATSim: Urban Mobility Simulation with Generative Agents Qi Liu et.al. 2506.23306 null
2025-07-01 Exposing and Mitigating Calibration Biases and Demographic Unfairness in MLLM Few-Shot In-Context Learning for Medical Image Classification Xing Shen et.al. 2506.23298 null
2025-06-29 Two Spelling Normalization Approaches Based on Large Language Models Miguel Domingo et.al. 2506.23288 null
2025-06-29 MoMa: Modulating Mamba for Adapting Image Foundation Models to Video Recognition Yuhuan Yang et.al. 2506.23283 null
2025-06-29 Autoregressive Denoising Score Matching is a Good Video Anomaly Detector Hanwen Zhang et.al. 2506.23282 null
2025-06-29 Corrupted by Reasoning: Reasoning Language Models Become Free-Riders in Public Goods Games David Guzman Piedrahita et.al. 2506.23276 null
2025-06-27 Shape-for-Motion: Precise and Consistent Video Editing with 3D Proxy Yuhao Liu et.al. 2506.22432 null
2025-06-27 The Automated LLM Speedrunning Benchmark: Reproducing NanoGPT Improvements Bingchen Zhao et.al. 2506.22419 null
2025-06-27 HyperCLOVA X THINK Technical Report NAVER Cloud HyperCLOVA X Team et.al. 2506.22403 null
2025-06-27 Refining Czech GEC: Insights from a Multi-Experiment Approach Petr Pechman et.al. 2506.22402 null
2025-06-27 QuickSilver -- Speeding up LLM Inference through Dynamic Token Halting, KV Skipping, Contextual Token Fusion, and Adaptive Matryoshka Quantization Danush Khanna et.al. 2506.22396 null
2025-06-27 What Makes ChatGPT Effective for Software Issue Resolution? An Empirical Study of Developer-ChatGPT Conversations in GitHub Ramtin Ehsani et.al. 2506.22390 null
2025-06-27 Can Video Large Multimodal Models Think Like Doubters-or Double-Down: A Study on Defeasible Video Entailment Yue Zhang et.al. 2506.22385 null
2025-06-27 Probabilistic Optimality for Inference-time Scaling Youkang Wang et.al. 2506.22376 null
2025-06-27 Towards Fair Rankings: Leveraging LLMs for Gender Bias Detection and Measurement Maryam Mousavian et.al. 2506.22372 null
2025-06-27 Can Large Language Models Help Students Prove Software Correctness? An Experimental Study with Dafny Carolina Carreira et.al. 2506.22370 null
2025-06-27 Concept-Level AI for Telecom: Moving Beyond Large Language Models Viswanath Kumarskandpriya et.al. 2506.22359 null
2025-06-27 Optimal Estimation of Watermark Proportions in Hybrid AI-Human Texts Xiang Li et.al. 2506.22343 null
2025-06-27 Evaluating Scoring Bias in LLM-as-a-Judge Qingquan Li et.al. 2506.22316 null
2025-06-27 Detection of Personal Data in Structured Datasets Using a Large Language Model Albert Agisha Ntwali et.al. 2506.22305 null
2025-06-27 Unfolding Generative Flows with Koopman Operators: Fast and Interpretable Sampling Erkan Turan et.al. 2506.22304 null
2025-06-27 Rethinking Visual Token Reduction in LVLMs under Cross-modal Misalignment Rui Xu et.al. 2506.22283 null
2025-06-27 Public Service Algorithm: towards a transparent, explainable, and scalable content curation for news content based on editorial values Ahmad Mel et.al. 2506.22270 null
2025-06-27 Towards Operational Data Analytics Chatbots -- Virtual Knowledge Graph is All You Need Junaid Ahmed Khan et.al. 2506.22267 null
2025-06-27 Projected Compression: Trainable Projection for Efficient Transformer Compression Maciej Stefaniak et.al. 2506.22255 null
2025-06-27 Adapting University Policies for Generative AI: Opportunities, Challenges, and Policy Solutions in Higher Education Russell Beale et.al. 2506.22231 null
2025-06-27 Cardiovascular disease classification using radiomics and geometric features from cardiac CT Ajay Mittal et.al. 2506.22226 null
2025-06-27 Hybrid Generative Modeling for Incomplete Physics: Deep Grey-Box Meets Optimal Transport Gurjeet Sangra Singh et.al. 2506.22204 null
2025-06-27 EFRame: Deeper Reasoning via Exploration-Filtering-Replay Reinforcement Learning Framework Chen Wang et.al. 2506.22200 null
2025-06-27 Exploring Modularity of Agentic Systems for Drug Discovery Laura van Weesep et.al. 2506.22189 null
2025-06-27 A Different Approach to AI Safety: Proceedings from the Columbia Convening on Openness in Artificial Intelligence and AI Safety Camille François et.al. 2506.22183 null
2025-06-27 Training Language Model to Critique for Better Refinement Tianshu Yu et.al. 2506.22157 null
2025-06-27 RetFiner: A Vision-Language Refinement Scheme for Retinal Foundation Models Ronald Fecso et.al. 2506.22149 null
2025-06-27 SAGE: Spliced-Audio Generated Data for Enhancing Foundational Models in Low-Resource Arabic-English Code-Switched Speech Recognition Muhammad Umar Farooq et.al. 2506.22143 null
2025-06-27 Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs Shaojie Zhang et.al. 2506.22139 null
2025-06-27 Reasoning in machine vision: learning to think fast and slow Shaheer U. Saeed et.al. 2506.22075 null
2025-06-27 Query as Test: An Intelligent Driving Test and Data Storage Method for Integrated Cockpit-Vehicle-Road Scenarios Shengyue Yao et.al. 2506.22068 null
2025-06-27 Lost at the Beginning of Reasoning Baohao Liao et.al. 2506.22058 null
2025-06-27 Decoding Machine Translationese in English-Chinese News: LLMs vs. NMTs Delu Kong et.al. 2506.22050 null
2025-06-27 GPAS: Accelerating Convergence of LLM Pretraining via Gradient-Preserving Activation Scaling Tianhao Chen et.al. 2506.22049 null
2025-06-27 Few-Shot Identity Adaptation for 3D Talking Heads via Global Gaussian Field Hong Nie et.al. 2506.22044 null
2025-06-27 UniCA: Adapting Time Series Foundation Model to General Covariate-Aware Forecasting Lu Han et.al. 2506.22039 null
2025-06-27 Can Peter Pan Survive MT? A Stylometric Study of LLMs, NMTs, and HTs in Children's Literature Translation Delu Kong et.al. 2506.22038 null
2025-06-27 SiPipe: Bridging the CPU-GPU Utilization Gap for Efficient Pipeline-Parallel LLM Inference Yongchao He et.al. 2506.22033 null
2025-06-27 LMPVC and Policy Bank: Adaptive voice control for industrial robots with code generating LLMs and reusable Pythonic policies Ossi Parikka et.al. 2506.22028 null
2025-06-27 RoboEnvision: A Long-Horizon Video Generation Model for Multi-Task Robot Manipulation Liudi Yang et.al. 2506.22007 null
2025-06-27 LeanConjecturer: Automatic Generation of Mathematical Conjectures for Theorem Proving Naoto Onda et.al. 2506.22005 null
2025-06-27 R1-Track: Direct Application of MLLMs to Visual Object Tracking via Reinforcement Learning Biao Wang et.al. 2506.21980 null
2025-06-27 TASeg: Text-aware RGB-T Semantic Segmentation based on Fine-tuning Vision Foundation Models Meng Yu et.al. 2506.21975 null
2025-06-27 Don't Trust Generative Agents to Mimic Communication on Social Networks Unless You Benchmarked their Empirical Realism Simon Münker et.al. 2506.21974 null
2025-06-27 Advancing Jailbreak Strategies: A Hybrid Approach to Exploiting LLM Vulnerabilities and Bypassing Modern Defenses Mohamed Ahmed et.al. 2506.21972 null
2025-06-27 Using Large Language Models to Suggest Informative Prior Distributions in Bayesian Statistics Michael A. Riegler et.al. 2506.21964 null
2025-06-27 PapersPlease: A Benchmark for Evaluating Motivational Values of Large Language Models Based on ERG Theory Junho Myung et.al. 2506.21961 null
2025-06-27 Optimal Return-to-Go Guided Decision Transformer for Auto-Bidding in Advertisement Hao Jiang et.al. 2506.21956 null
2025-06-27 Universal Modelling of Autocovariance Functions via Spline Kernels Lachlan Astfalck et.al. 2506.21953 null
2025-06-27 CAL-RAG: Retrieval-Augmented Multi-Agent Generation for Content-Aware Layout Design Najmeh Forouzandehmehr et.al. 2506.21934 null
2025-06-27 ARAG: Agentic Retrieval Augmented Generation for Personalized Recommendation Reza Yousefi Maragheh et.al. 2506.21931 null
2025-06-27 A Survey of LLM Inference Systems James Pan et.al. 2506.21901 null
2025-06-27 Bias, Accuracy, and Trust: Gender-Diverse Perspectives on Large Language Models Aimen Gaba et.al. 2506.21898 null
2025-06-27 Exploring Task-Solving Paradigm for Generalized Cross-Domain Face Anti-Spoofing via Reinforcement Fine-Tuning Fangling Jiang et.al. 2506.21895 null
2025-06-27 Integrating Multi-Modal Sensors: A Review of Fusion Techniques for Intelligent Vehicles Chuheng Wei et.al. 2506.21885 null
2025-06-27 A Dual-Layered Evaluation of Geopolitical and Cultural Bias in LLMs Sean Kim et.al. 2506.21881 null
2025-06-27 WildSpeech-Bench: Benchmarking Audio LLMs in Natural Speech Conversation Jian Zhang et.al. 2506.21875 null
2025-06-27 On the Feasibility of Poisoning Text-to-Image AI Models via Adversarial Mislabeling Stanley Wu et.al. 2506.21874 null
2025-06-27 Grounding-Aware Token Pruning: Recovering from Drastic Performance Drops in Visual Grounding Caused by Pruning Tzu-Chun Chien et.al. 2506.21873 null
2025-06-27 RiverEcho: Real-Time Interactive Digital System for Ancient Yellow River Culture Haofeng Wang et.al. 2506.21865 null
2025-06-27 DeepTalk: Towards Seamless and Smart Speech Interaction with Adaptive Modality-Specific MoE Hang Shao et.al. 2506.21864 null
2025-06-27 LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs Boyuan Sun et.al. 2506.21862 null
2025-06-27 SPADE: Spatial Transcriptomics and Pathology Alignment Using a Mixture of Data Experts for an Expressive Latent Space Ekaterina Redekop et.al. 2506.21857 null
2025-06-27 Skill-Nav: Enhanced Navigation with Versatile Quadrupedal Locomotion via Waypoint Interface Dewei Wang et.al. 2506.21853 null
2025-06-27 The Consistency Hypothesis in Uncertainty Quantification for Large Language Models Quan Xiao et.al. 2506.21849 null
2025-06-27 Adversarial Threats in Quantum Machine Learning: A Survey of Attacks and Defenses Archisman Ghosh et.al. 2506.21842 null
2025-06-27 PARSI: Persian Authorship Recognition via Stylometric Integration Kourosh Shahnazari et.al. 2506.21840 null
2025-06-27 ProSAM: Enhancing the Robustness of SAM-based Visual Reference Segmentation with Probabilistic Prompts Xiaoqi Wang et.al. 2506.21835 null
2025-06-27 TaleForge: Interactive Multimodal System for Personalized Story Creation Minh-Loi Nguyen et.al. 2506.21832 null
2025-06-27 Few-Shot Segmentation of Historical Maps via Linear Probing of Vision Foundation Models Rafael Sterzinger et.al. 2506.21826 null
2025-06-26 Exploring the change in scientific readability following the release of ChatGPT Abdulkareem Alsudais et.al. 2506.21825 null
2025-06-26 Exploring the Structure of AI-Induced Language Change in Scientific English Riley Galpin et.al. 2506.21817 null
2025-06-26 CAT-SG: A Large Dynamic Scene Graph Dataset for Fine-Grained Understanding of Cataract Surgery Felix Holm et.al. 2506.21813 null
2025-06-26 Towards Transparent AI: A Survey on Explainable Large Language Models Avash Palikhe et.al. 2506.21812 null
2025-06-26 CitySim: Modeling Urban Behaviors and City Dynamics with Large-Scale LLM-Driven Agent Simulation Nicolas Bougie et.al. 2506.21805 null
2025-06-26 Multi-task parallelism for robust pre-training of graph foundation models on multi-source, multi-fidelity atomistic modeling data Massimiliano Lupo Pasini et.al. 2506.21788 null
2025-06-26 MobiVerse: Scaling Urban Mobility Simulation with Hybrid Lightweight Domain-Specific Generator and Large Language Models Yifan Liu et.al. 2506.21784 null
2025-06-26 Evaluating List Construction and Temporal Understanding capabilities of Large Language Models Alexandru Dumitru et.al. 2506.21783 null
2025-06-26 M3PO: Massively Multi-Task Model-Based Policy Optimization Aditya Narendra et.al. 2506.21782 null
2025-06-26 THE-Tree: Can Tracing Historical Evolution Enhance Scientific Verification and Reasoning? Xin Wang et.al. 2506.21763 null
2025-06-26 (Fact) Check Your Bias Eivind Morris Bakke et.al. 2506.21745 null
2025-06-26 Hierarchical Reasoning Model Guan Wang et.al. 2506.21734 null
2025-06-26 Exploring Image Generation via Mutually Exclusive Probability Spaces and Local Correlation Hypothesis Chenqiu Zhao et.al. 2506.21731 null
2025-06-26 FOCUS: Internal MLLM Representations for Efficient Fine-Grained Visual Question Answering Liangyu Zhong et.al. 2506.21710 null
2025-06-26 TanDiT: Tangent-Plane Diffusion Transformer for High-Quality 360° Panorama Generation Hakan Çapuk et.al. 2506.21681 null
2025-06-26 Infrared foundations for quantum geometry I: Catalogue of totally symmetric rank-three field theories Will Barker et.al. 2506.21662 null
2025-06-26 APO: Enhancing Reasoning Ability of MLLMs via Asymmetric Policy Optimization Minjie Hong et.al. 2506.21655 null
2025-06-26 Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test Ziyue Li et.al. 2506.21551 null
2025-06-26 mTSBench: Benchmarking Multivariate Time Series Anomaly Detection and Model Selection at Scale Xiaona Zhou et.al. 2506.21550 null
2025-06-26 SAM4D: Segment Anything in Camera and LiDAR Streams Jianyun Xu et.al. 2506.21547 null
2025-06-26 PsyLite Technical Report Fangjun Ding et.al. 2506.21536 null
2025-06-26 Exploring the Design Space of 3D MLLMs for CT Report Generation Mohammed Baharoon et.al. 2506.21535 null
2025-06-26 "What's Up, Doc?": Analyzing How Users Seek Health Information in Large-Scale Conversational AI Datasets Akshay Paruchuri et.al. 2506.21532 null
2025-06-26 Potemkin Understanding in Large Language Models Marina Mancoridis et.al. 2506.21521 null
2025-06-26 Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge Boyu Gou et.al. 2506.21506 null
2025-06-26 Bridging Offline and Online Reinforcement Learning for LLMs Jack Lanchantin et.al. 2506.21495 null
2025-06-26 Global and Local Entailment Learning for Natural World Imagery Srikumar Sastry et.al. 2506.21476 null
2025-06-26 Efficient and Reuseable Cloud Configuration Search Using Discovery Spaces Michael Johnston et.al. 2506.21467 null
2025-06-26 ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing Huadai Liu et.al. 2506.21448 null
2025-06-26 Controllable 3D Placement of Objects with Scene-Aware Diffusion Models Mohamed Omran et.al. 2506.21446 null
2025-06-26 Text2Cypher Across Languages: Evaluating Foundational Models Beyond English Makbule Gulcin Ozsoy et.al. 2506.21445 null
2025-06-26 Benchmarking Deep Learning and Vision Foundation Models for Atypical vs. Normal Mitosis Classification with Cross-Dataset Evaluation Sweta Banerjee et.al. 2506.21444 null
2025-06-26 Domain Knowledge-Enhanced LLMs for Fraud and Concept Drift Detection Ali Şenol et.al. 2506.21443 null
2025-06-26 Flow-Based Single-Step Completion for Efficient and Expressive Policy Learning Prajwal Koirala et.al. 2506.21427 null
2025-06-26 XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation Bowen Chen et.al. 2506.21416 null
2025-06-26 Distributed Cross-Channel Hierarchical Aggregation for Foundation Models Aristeidis Tsaris et.al. 2506.21411 null
2025-06-26 Scalable Bayesian Low-Rank Adaptation of Large Language Models via Stochastic Variational Subspace Inference Colin Samplawski et.al. 2506.21408 null
2025-06-26 TableMoE: Neuro-Symbolic Routing for Structured Expert Reasoning in Multimodal Table Understanding Junwen Zhang et.al. 2506.21393 null
2025-06-26 Early Stopping Tabular In-Context Learning Jaris Küken et.al. 2506.21387 null
2025-06-26 Leveraging LLM-Assisted Query Understanding for Live Retrieval-Augmented Generation Guanting Dong et.al. 2506.21384 null
2025-06-26 Canonical Quantization of a Memristive Leaky Integrate-and-Fire Neuron Circuit Dean Brand et.al. 2506.21363 null
2025-06-26 Structuralist Approach to AI Literary Criticism: Leveraging Greimas Semiotic Square for Large Language Models Fangzhou Dong et.al. 2506.21360 null
2025-06-26 CoPa-SG: Dense Scene Graphs with Parametric and Proto-Relations Julian Lorenz et.al. 2506.21357 null
2025-06-26 SMMILE: An Expert-Driven Benchmark for Multimodal Medical In-Context Learning Melanie Rieff et.al. 2506.21355 null
2025-06-26 DynamicBench: Evaluating Real-Time Report Generation in Large Language Models Jingyao Li et.al. 2506.21343 null
2025-06-26 Active Inference AI Systems for Scientific Discovery Karthik Duraisamy et.al. 2506.21329 null
2025-06-26 Latent Prototype Routing: Achieving Near-Perfect Load Balancing in Mixture-of-Experts Jiajie Yang et.al. 2506.21328 null
2025-06-26 DrishtiKon: Multi-Granular Visual Grounding for Text-Rich Document Images Badri Vishal Kasuba et.al. 2506.21316 null
2025-06-26 Exploring Adapter Design Tradeoffs for Low Resource Music Generation Atharva Mehta et.al. 2506.21298 null
2025-06-26 Detecting Referring Expressions in Visually Grounded Dialogue with Autoregressive Language Models Bram Willemsen et.al. 2506.21294 null
2025-06-26 Small Encoders Can Rival Large Decoders in Detecting Groundedness Istabrak Abbes et.al. 2506.21288 null
2025-06-26 Double-Checker: Enhancing Reasoning of Slow-Thinking LLMs via Self-Critical Fine-Tuning Xin Xu et.al. 2506.21285 null
2025-06-26 Hyperspherical Variational Autoencoders Using Efficient Spherical Cauchy Distribution Lukas Sablica et.al. 2506.21278 null
2025-06-26 HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context Qize Yang et.al. 2506.21277 null
2025-06-26 Cat and Mouse -- Can Fake Text Generation Outpace Detector Systems? Andrea McGlinchey et.al. 2506.21274 null
2025-06-26 DiLoCoX: A Low-Communication Large-Scale Training Framework for Decentralized Cluster Ji Qi et.al. 2506.21263 null
2025-06-26 Agent-RewardBench: Towards a Unified Benchmark for Reward Modeling across Perception, Planning, and Safety in Real-World Multimodal Agents Tianyi Men et.al. 2506.21252 null
2025-06-26 ACTLLM: Action Consistency Tuned Large Language Model Jing Bi et.al. 2506.21250 null
2025-06-26 GANet-Seg: Adversarial Learning for Brain Tumor Segmentation with Hybrid Generative Models Qifei Cui et.al. 2506.21245 null
2025-06-26 Zero-Shot Learning for Obsolescence Risk Forecasting Elie Saad et.al. 2506.21240 null
2025-06-26 Enhancing Automatic Term Extraction with Large Language Models via Syntactic Retrieval Yongchan Chun et.al. 2506.21222 null
2025-06-26 Complexity-aware fine-tuning Andrey Goncharov et.al. 2506.21220 null
2025-06-26 Unveiling Causal Reasoning in Large Language Models: Reality or Mirage? Haoang Chi et.al. 2506.21215 null
2025-06-26 $T^3$ : Multi-level Tree-based Automatic Program Repair with Large Language Models Quanming Liu et.al. 2506.21211 null
2025-06-26 BitMark for Infinity: Watermarking Bitwise Autoregressive Image Generative Models Louis Kerner et.al. 2506.21209 null
2025-06-26 MedPrompt: LLM-CNN Fusion with Weight Routing for Medical Image Segmentation and Classification Shadman Sobhan et.al. 2506.21199 null
2025-06-26 Prompt-Guided Turn-Taking Prediction Koji Inoue et.al. 2506.21191 null
2025-06-26 GroundFlow: A Plug-in Module for Temporal Reasoning on 3D Point Cloud Sequential Grounding Zijun Lin et.al. 2506.21188 null
2025-06-26 Task-Aware KV Compression For Cost-Effective Long Video Understanding Minghao Qin et.al. 2506.21184 null
2025-06-26 Generative Adversarial Evasion and Out-of-Distribution Detection for UAV Cyber-Attacks Deepak Kumar Panda et.al. 2506.21142 null
2025-06-26 How Good Are Synthetic Requirements ? Evaluating LLM-Generated Datasets for AI4RE Abdelkarim El-Hajjami et.al. 2506.21138 null
2025-06-26 IPFormer-VideoLLM: Enhancing Multi-modal Video Understanding for Multi-shot Scenes Yujia Liang et.al. 2506.21116 null
2025-06-26 OracleFusion: Assisting the Decipherment of Oracle Bone Script with Structurally Constrained Semantic Typography Caoshuo Li et.al. 2506.21101 null
2025-06-26 Enhancing LLM Tool Use with High-quality Instruction Data from Knowledge Graph Jingwei Wang et.al. 2506.21071 null
2025-06-26 MT2-CSD: A New Dataset and Multi-Semantic Knowledge Fusion Method for Conversational Stance Detection Fuqiang Niu et.al. 2506.21053 null
2025-06-26 V2X-REALM: Vision-Language Model-Based Robust End-to-End Cooperative Autonomous Driving with Adaptive Long-Tail Modeling Junwei You et.al. 2506.21041 null
2025-06-26 Little By Little: Continual Learning via Self-Activated Sparse Mixture-of-Rank Adaptive Learning Haodong Lu et.al. 2506.21035 null
2025-06-26 BLOCKS: Blockchain-supported Cross-Silo Knowledge Sharing for Efficient LLM Services Zhaojiacheng Zhou et.al. 2506.21033 null
2025-06-26 Large Language Models Acing Chartered Accountancy Jatin Gupta et.al. 2506.21031 null
2025-06-26 STEP Planner: Constructing cross-hierarchical subgoal tree as an embodied long-horizon task planner Zhou Tianxing et.al. 2506.21030 null
2025-06-26 Instella-T2I: Pushing the Limits of 1D Discrete Latent Space Image Generation Ze Wang et.al. 2506.21022 null
2025-06-26 Multimodal Prompt Alignment for Facial Expression Recognition Fuyan Ma et.al. 2506.21017 null
2025-06-26 HybridQ: Hybrid Classical-Quantum Generative Adversarial Network for Skin Disease Image Generation Qingyue Jiao et.al. 2506.21015 null
2025-06-26 Distilling Normalizing Flows Steven Walton et.al. 2506.21003 null
2025-06-26 SAC: A Framework for Measuring and Inducing Personality Traits in LLMs with Dynamic Intensity Control Adithya Chittem et.al. 2506.20993 null
2025-06-26 Segment Anything in Pathology Images with Natural Language Zhixuan Chen et.al. 2506.20988 null
2025-06-26 Our Coding Adventure: Using LLMs to Personalise the Narrative of a Tangible Programming Robot for Preschoolers Martin Ruskov et.al. 2506.20982 null
2025-06-26 Response Quality Assessment for Retrieval-Augmented Generation via Conditional Conformal Factuality Naihe Feng et.al. 2506.20978 null
2025-06-26 Where is AIED Headed? Key Topics and Emerging Frontiers (2020-2024) Shihui Feng et.al. 2506.20971 null
2025-06-26 Parallels Between VLA Model Post-Training and Human Motor Learning: Progress, Challenges, and Trends Tian-Yu Xiang et.al. 2506.20966 null
2025-06-26 Evidence-based diagnostic reasoning with multi-agent copilot for human pathology Chengkuan Chen et.al. 2506.20964 null
2025-06-26 EraRAG: Efficient and Incremental Retrieval Augmented Generation for Growing Corpora Fangyuan Zhang et.al. 2506.20963 null
2025-06-26 Hierarchical Sub-action Tree for Continuous Sign Language Recognition Dejie Yang et.al. 2506.20947 null
2025-06-26 Consistent Zero-shot 3D Texture Synthesis Using Geometry-aware Diffusion and Temporal Video Models Donggoo Kang et.al. 2506.20946 null
2025-06-26 E-FreeM2: Efficient Training-Free Multi-Scale and Cross-Modal News Verification via MLLMs Van-Hoang Phan et.al. 2506.20944 null
2025-06-26 Model State Arithmetic for Machine Unlearning Keivan Rezaei et.al. 2506.20941 null
2025-06-26 ParEval-Repo: A Benchmark Suite for Evaluating LLMs with Repository-level HPC Translation Tasks Joshua H. Davis et.al. 2506.20938 null
2025-06-26 LLM-guided Chemical Process Optimization with a Multi-Agent Approach Tong Zeng et.al. 2506.20921 null
2025-06-26 FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language Guilherme Penedo et.al. 2506.20920 null
2025-06-26 Metadata Enrichment of Long Text Documents using Large Language Models Manika Lamba et.al. 2506.20918 null
2025-06-26 ZKPROV: A Zero-Knowledge Approach to Dataset Provenance for Large Language Models Mina Namazi et.al. 2506.20915 null
2025-06-26 FaSTA $^*$ : Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing Advait Gupta et.al. 2506.20911 null
2025-06-25 Omniwise: Predicting GPU Kernels Performance with LLMs Zixian Wang et.al. 2506.20886 null
2025-06-25 MultiHuman-Testbench: Benchmarking Image Generation for Multiple Humans Shubhankar Borse et.al. 2506.20879 null
2025-06-25 3DGH: 3D Head Generation with Composable Hair and Face Chengan He et.al. 2506.20875 null
2025-06-25 Engineering RAG Systems for Real-World Applications: Design, Development, and Evaluation Md Toufique Hasan et.al. 2506.20869 null
2025-06-25 Leaner Training, Lower Leakage: Revisiting Memorization in LLM Fine-Tuning with LoRA Fei Wang et.al. 2506.20856 null
2025-06-25 Vector Contrastive Learning For Pixel-Wise Pretraining In Medical Vision Yuting He et.al. 2506.20850 null
2025-06-25 Uncovering Hidden Violent Tendencies in LLMs: A Demographic Analysis via Behavioral Vignettes Quintin Myers et.al. 2506.20822 null
2025-06-25 MultiFinRAG: An Optimized Multimodal Retrieval-Augmented Generation (RAG) Framework for Financial Question Answering Chinmay Gondhalekar et.al. 2506.20821 null
2025-06-25 GPU Kernel Scientist: An LLM-Driven Framework for Iterative Kernel Optimization Martin Andrews et.al. [2506.20807](http://arxiv.org/abs
