| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2025-07-23 | InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation | Shuai Yang et.al. | 2507.17520 | null |
| 2025-07-23 | MultiNRC: A Challenging and Native Multilingual Reasoning Evaluation Benchmark for LLMs | Alexander R. Fabbri et.al. | 2507.17476 | null |
| 2025-07-23 | HiProbe-VAD: Video Anomaly Detection via Hidden States Probing in Tuning-Free Multimodal LLMs | Zhaolin Cai et.al. | 2507.17394 | null |
| 2025-07-23 | Leveraging Knowledge Graphs and LLM Reasoning to Identify Operational Bottlenecks for Warehouse Planning Assistance | Rishi Parekh et.al. | 2507.17273 | null |
| 2025-07-22 | Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning | Junhao Shen et.al. | 2507.16814 | null |
| 2025-07-22 | Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning | Ang Li et.al. | 2507.16746 | null |
| 2025-07-23 | WAKENLLM: Evaluating Reasoning Potential and Stability in LLMs via Fine-Grained Benchmarking | Zipeng Ling et.al. | 2507.16199 | null |
| 2025-07-21 | Expert-Guided LLM Reasoning for Battery Discovery: From AI-Driven Hypothesis to Synthesis and Characterization | Shengchao Liu et.al. | 2507.16110 | null |
| 2025-07-21 | The Impact of Language Mixing on Bilingual LLM Reasoning | Yihao Li et.al. | 2507.15849 | null |
| 2025-07-21 | EgoPrune: Efficient Token Pruning for Egomotion Video Reasoning in Embodied Agent | Jiaao Li et.al. | 2507.15428 | null |
| 2025-07-20 | LEKIA: A Framework for Architectural Alignment via Expert Knowledge Injection | Boning Zhao et.al. | 2507.14944 | null |
| 2025-07-18 | A Simple "Try Again" Can Elicit Multi-Turn LLM Reasoning | Licheng Liu et.al. | 2507.14295 | null |
| 2025-07-18 | Team of One: Cracking Complex Video QA with Model Synergy | Jun Xie et.al. | 2507.13820 | null |
| 2025-07-17 | The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner | Zhouqi Hua et.al. | 2507.13332 | null |
| 2025-07-17 | Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark | Junsu Kim et.al. | 2507.13314 | null |
| 2025-07-17 | HATS: Hindi Analogy Test Set for Evaluating Reasoning in Large Language Models | Ashray Gupta et.al. | 2507.13238 | null |
| 2025-07-17 | Probabilistic Soundness Guarantees in LLM Reasoning Chains | Weiqiu You et.al. | 2507.12948 | null |
| 2025-07-16 | Reasoning Strategies in Large Language Models: Can They Follow, Prefer, and Optimize? | Yanjian Zhang et.al. | 2507.11423 | null |
| 2025-07-15 | KisMATH: Do LLMs Have Knowledge of Implicit Structures in Mathematical Reasoning? | Soumadeep Saha et.al. | 2507.11408 | null |
| 2025-07-15 | Guiding LLM Decision-Making with Fairness Reward Models | Zara Hall et.al. | 2507.11344 | null |
| 2025-07-15 | MSA at ImageCLEF 2025 Multimodal Reasoning: Multilingual Multimodal Reasoning With Ensemble Vision Language Models | Seif Ahmed et.al. | 2507.11114 | null |
| 2025-07-15 | Learning to Tune Like an Expert: Interpretable and Scene-Aware Navigation via MLLM Reasoning and CVAE-Based Adaptation | Yanbo Wang et.al. | 2507.11001 | null |
| 2025-07-15 | Modeling Understanding of Story-Based Analogies Using Large Language Models | Kalit Inani et.al. | 2507.10957 | null |
| 2025-07-14 | Foundation Model Driven Robotics: A Comprehensive Review | Muhammad Tayyab Khan et.al. | 2507.10087 | null |
| 2025-07-13 | Reframing SAR Target Recognition as Visual Reasoning: A Chain-of-Thought Dataset with Multimodal LLMs | Chaoran Li et.al. | 2507.09535 | null |
| 2025-07-11 | GraphRunner: A Multi-Stage Framework for Efficient and Accurate Graph-Based Retrieval | Savini Kashmira et.al. | 2507.08945 | null |
| 2025-07-11 | Leanabell-Prover-V2: Verifier-integrated Reasoning for Formal Theorem Proving via Reinforcement Learning | Xingguang Ji et.al. | 2507.08649 | null |
| 2025-07-11 | ChainEdit: Propagating Ripple Effects in LLM Knowledge Editing through Logical Rule-Guided Chains | Zilu Dong et.al. | 2507.08427 | null |
| 2025-07-10 | ALCo-FM: Adaptive Long-Context Foundation Model for Accident Prediction | Pinaki Prasad Guha Neogi et.al. | 2507.08153 | null |
| 2025-07-10 | MIRA: A Novel Framework for Fusing Modalities in Medical RAG | Jinhong Wang et.al. | 2507.07902 | null |
| 2025-07-10 | The Synergy Dilemma of Long-CoT SFT and RL: Investigating Post-Training Techniques for Reasoning VLMs | Jierun Chen et.al. | 2507.07562 | null |
| 2025-07-10 | RLEP: Reinforcement Learning with Experience Replay for LLM Reasoning | Hongzhi Zhang et.al. | 2507.07451 | null |
| 2025-07-11 | StarDojo: Benchmarking Open-Ended Behaviors of Agentic Multimodal LLMs in Production-Living Simulations with Stardew Valley | Weihao Tan et.al. | 2507.07445 | null |
| 2025-07-09 | MagiC: Evaluating Multimodal Cognition Toward Grounded Visual Reasoning | Chengfei Wu et.al. | 2507.07297 | null |
| 2025-07-07 | DeepRetro: Retrosynthetic Pathway Discovery using Iterative LLM Reasoning | Shreyas Vinaya Sathyanarayana et.al. | 2507.07060 | null |
| 2025-07-09 | First Return, Entropy-Eliciting Explore | Tianyu Zheng et.al. | 2507.07017 | null |
| 2025-07-09 | Learning Deliberately, Acting Intuitively: Unlocking Test-Time Reasoning in Multimodal LLMs | Yahan Yu et.al. | 2507.06999 | null |
| 2025-07-09 | Are They All Good? Evaluating the Quality of CoTs in LLM-based Code Generation | Binquan Zhang et.al. | 2507.06980 | null |
| 2025-07-10 | Rethinking Verification for LLM Code Generation: From Generation to Testing | Zihan Ma et.al. | 2507.06920 | null |
| 2025-07-09 | From Data-Centric to Sample-Centric: Enhancing LLM Reasoning via Progressive Optimization | Xinjie Chen et.al. | 2507.06573 | null |
| 2025-07-13 | Perception-Aware Policy Optimization for Multimodal Reasoning | Zhenhailong Wang et.al. | 2507.06448 | null |
| 2025-07-08 | Enhancing Scientific Visual Question Answering through Multimodal Reasoning and Ensemble Modeling | Prahitha Movva et.al. | 2507.06183 | null |
| 2025-07-10 | Skywork-R1V3 Technical Report | Wei Shen et.al. | 2507.06167 | null |
| 2025-07-08 | KERAG_R: Knowledge-Enhanced Retrieval-Augmented Generation for Recommendation | Zeyuan Meng et.al. | 2507.05863 | null |
| 2025-07-09 | Measuring how changes in code readability attributes affect code quality evaluation by Large Language Models | Igor Regis da Silva Simoes et.al. | 2507.05289 | null |
| 2025-07-07 | Spatio-Temporal LLM: Reasoning about Environments and Actions | Haozhen Zheng et.al. | 2507.05258 | null |
| 2025-07-07 | Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning | Yana Wei et.al. | 2507.05255 | null |
| 2025-07-07 | MARBLE: A Multi-Agent Rule-Based LLM Reasoning Engine for Accident Severity Prediction | Kaleem Ullah Qasim et.al. | 2507.04893 | null |
| 2025-07-17 | DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge | Wenyao Zhang et.al. | 2507.04447 | null |
| 2025-07-05 | CoT-Segmenter: Enhancing OOD Detection in Dense Road Scenes via Chain-of-Thought Reasoning | Jeonghyo Song et.al. | 2507.03984 | null |
| 2025-07-04 | Effects of structure on reasoning in instance-level Self-Discover | Sachith Gunasekara et.al. | 2507.03347 | null |
| 2025-07-03 | RCA Copilot: Transforming Network Data into Actionable Insights via Large Language Models | Alexander Shan et.al. | 2507.03224 | null |
| 2025-07-03 | Improving LLM Reasoning for Vulnerability Detection via Group Relative Policy Optimization | Marco Simoni et.al. | 2507.03051 | null |
| 2025-07-02 | Look-Back: Implicit Visual Re-focusing in MLLM Reasoning | Shuo Yang et.al. | 2507.03019 | null |
| 2025-07-01 | From Answers to Rationales: Self-Aligning Multimodal Reasoning with Answer-Oriented Chain-of-Thought | Wentao Tan et.al. | 2507.02984 | null |
| 2025-06-26 | Large Language Model Agent for Modular Task Execution in Drug Discovery | Janghoon Ock et.al. | 2507.02925 | null |
| 2025-07-03 | MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs | Purbesh Mitra et.al. | 2507.02851 | null |
| 2025-07-03 | Scaling LLM Planning: NL2FLOW for Parametric Problem Generation and Rigorous Evaluation | Jungkoo Kang et.al. | 2507.02253 | null |
| 2025-07-02 | Reasoning on a Budget: A Survey of Adaptive and Controllable Test-Time Compute in LLMs | Mohammad Ali Alomrani et.al. | 2507.02076 | null |
| 2025-07-02 | GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning | GLM-V Team et.al. | 2507.01006 | null |
| 2025-07-01 | HumanoidGen: Data Generation for Bimanual Dexterous Manipulation via LLM Reasoning | Zhi Jing et.al. | 2507.00833 | null |
| 2025-07-01 | Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning | Maggie Huan et.al. | 2507.00432 | null |
| 2025-07-01 | Causal Prompting for Implicit Sentiment Analysis with Large Language Models | Jing Ren et.al. | 2507.00389 | null |
| 2025-06-22 | TalentMine: LLM-Based Extraction and Question-Answering from Multimodal Talent Tables | Varun Mannam et.al. | 2507.00041 | null |
| 2025-07-03 | Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers | Zhaochen Su et.al. | 2506.23918 | null |
| 2025-06-30 | Interactive Reasoning: Visualizing and Controlling Chain-of-Thought Reasoning in Large Language Models | Rock Yuren Pang et.al. | 2506.23678 | null |
| 2025-06-30 | MMReason: An Open-Ended Multi-Modal Multi-Step Reasoning Benchmark for MLLMs Toward AGI | Huanjin Yao et.al. | 2506.23563 | null |
| 2025-06-29 | Are Large Language Models Capable of Deep Relational Reasoning? Insights from DeepSeek-R1 and Benchmark Comparisons | Chi Chiu So et.al. | 2506.23128 | null |
| 2025-06-29 | Decoding Memes: Benchmarking Narrative Role Classification across Multilingual and Multimodal Models | Shivam Sharma et.al. | 2506.23122 | null |
| 2025-06-28 | MARBLE: A Hard Benchmark for Multimodal Spatial Reasoning and Planning | Yulun Jiang et.al. | 2506.22992 | null |
| 2025-06-26 | APO: Enhancing Reasoning Ability of MLLMs via Asymmetric Policy Optimization | Minjie Hong et.al. | 2506.21655 | null |
| 2025-06-24 | FrankenBot: Brain-Morphic Modular Orchestration for Robotic Manipulation with Vision-Language Models | Shiyi Wang et.al. | 2506.21627 | null |
| 2025-06-30 | FinEval-KR: A Financial Domain Evaluation Framework for Large Language Models' Knowledge and Reasoning | Shaoyu Dou et.al. | 2506.21591 | null |
| 2025-06-11 | Debunk and Infer: Multimodal Fake News Detection via Diffusion-Generated Evidence and LLM Reasoning | Kaiying Yan et.al. | 2506.21557 | null |
| 2025-06-26 | HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context | Qize Yang et.al. | 2506.21277 | null |
| 2025-06-26 | Unveiling Causal Reasoning in Large Language Models: Reality or Mirage? | Haoang Chi et.al. | 2506.21215 | null |
| 2025-06-25 | MultiFinRAG: An Optimized Multimodal Retrieval-Augmented Generation (RAG) Framework for Financial Question Answering | Chinmay Gondhalekar et.al. | 2506.20821 | null |
| 2025-06-25 | Generative AI for Vulnerability Detection in 6G Wireless Networks: Advances, Case Study, and Future Directions | Shuo Yang et.al. | 2506.20488 | null |
| 2025-06-24 | KnowMap: Efficient Knowledge-Driven Task Adaptation for LLMs | Kelin Fu et.al. | 2506.19527 | null |
| 2025-06-24 | MSR-Align: Policy-Grounded Multimodal Alignment for Safety-Aware Reasoning in Vision-Language Models | Yinan Xia et.al. | 2506.19257 | null |
| 2025-06-25 | Thought Anchors: Which LLM Reasoning Steps Matter? | Paul C. Bogdan et.al. | 2506.19143 | null |
| 2025-06-23 | Finding Clustering Algorithms in the Transformer Architecture | Kenneth L. Clarkson et.al. | 2506.19125 | null |
| 2025-06-23 | Human-Aligned Faithfulness in Toxicity Explanations of LLMs | Ramaravind K. Mothilal et.al. | 2506.19113 | null |
| 2025-06-23 | Baba is LLM: Reasoning in a Game with Dynamic Rules | Fien van Wetten et.al. | 2506.19095 | null |
| 2025-06-23 | OMEGA: Can LLMs Reason Outside the Box in Math? Evaluating Exploratory, Compositional, and Transformative Generalization | Yiyou Sun et.al. | 2506.18880 | null |
| 2025-06-24 | ReDit: Reward Dithering for Improved LLM Policy Optimization | Chenxing Wei et.al. | 2506.18631 | null |
| 2025-06-22 | Adapting Vision-Language Models for Evaluating World Models | Mariya Hendriksen et.al. | 2506.17967 | null |
| 2025-06-20 | Aha Moment Revisited: Are VLMs Truly Capable of Self Verification in Inference-time Scaling? | Mingyuan Wu et.al. | 2506.17417 | null |
| 2025-06-14 | CORONA: A Coarse-to-Fine Framework for Graph-based Recommendation with Large Language Models | Junze Chen et.al. | 2506.17281 | null |
| 2025-06-25 | No Free Lunch: Rethinking Internal Feedback for LLM Reasoning | Yanzhi Zhang et.al. | 2506.17219 | null |
| 2025-06-20 | Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens | Zeyuan Yang et.al. | 2506.17218 | link |
| 2025-06-20 | MEXA: Towards General Multimodal Reasoning with Dynamic Multi-Expert Aggregation | Shoubin Yu et.al. | 2506.17113 | link |
| 2025-06-20 | MUCAR: Benchmarking Multilingual Cross-Modal Ambiguity Resolution for Multimodal Large Language Models | Xiaolong Wang et.al. | 2506.17046 | null |
| 2025-06-20 | LaVi: Efficient Large Vision-Language Models via Internal Feature Modulation | Tongtian Yue et.al. | 2506.16691 | null |
| 2025-06-19 | GeoGuess: Multimodal Reasoning based on Hierarchy of Visual Information in Street View | Fenghua Cheng et.al. | 2506.16633 | null |
| 2025-06-19 | History-Augmented Vision-Language Models for Frontier-Based Zero-Shot Object Navigation | Mobin Habibpour et.al. | 2506.16623 | null |
| 2025-06-19 | How Far Can Off-the-Shelf Multimodal Large Language Models Go in Online Episodic Memory Question Answering? | Giuseppe Lando et.al. | 2506.16450 | null |
| 2025-06-19 | TrajSceneLLM: A Multimodal Perspective on Semantic GPS Trajectory Analysis | Chunhou Ji et.al. | 2506.16401 | link |
| 2025-07-17 | SHREC: A Framework for Advancing Next-Generation Computational Phenotyping with Large Language Models | Sarah Pungitore et.al. | 2506.16359 | null |
| 2025-06-19 | GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning | Yi Chen et.al. | 2506.16141 | link |
| 2025-06-23 | SLR: An Automated Synthesis Framework for Scalable Logical Reasoning | Lukas Helff et.al. | 2506.15787 | null |
| 2025-06-18 | CC-LEARN: Cohort-based Consistency Learning | Xiao Ye et.al. | 2506.15662 | null |
| 2025-06-18 | MEGC2025: Micro-Expression Grand Challenge on Spot Then Recognize and Visual Question Answering | Xinqi Fan et.al. | 2506.15298 | null |
| 2025-06-17 | Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective | Zhoujun Cheng et.al. | 2506.14965 | link |
| 2025-06-17 | Structured Moral Reasoning in Language Models: A Value-Grounded Evaluation Framework | Mohna Chakraborty et.al. | 2506.14948 | null |
| 2025-06-17 | PeRL: Permutation-Enhanced Reinforcement Learning for Interleaved Vision-Language Reasoning | Yizhen Zhang et.al. | 2506.14907 | link |
| 2025-06-12 | FedNano: Toward Lightweight Federated Tuning for Pretrained Multimodal Large Language Models | Yao Zhang et.al. | 2506.14824 | null |
| 2025-06-17 | RadFabric: Agentic AI System with Reasoning Capability for Radiology | Wenting Chen et.al. | 2506.14142 | null |
| 2025-06-17 | A Hierarchical Test Platform for Vision Language Model (VLM)-Integrated Real-World Autonomous Driving | Yupeng Zhou et.al. | 2506.14100 | null |
| 2025-06-16 | How Does LLM Reasoning Work for Code? A Survey and a Call to Action | Ira Ceka et.al. | 2506.13932 | null |
| 2025-06-16 | VL-GenRM: Enhancing Vision-Language Verification via Vision Experts and Iterative Training | Jipeng Zhang et.al. | 2506.13888 | null |
| 2025-06-16 | LocationReasoner: Evaluating LLMs on Real-World Site Selection Reasoning | Miho Koda et.al. | 2506.13841 | link |
| 2025-06-16 | Steering LLM Thinking with Budget Guidance | Junyan Li et.al. | 2506.13752 | link |
| 2025-06-16 | Decompositional Reasoning for Graph Retrieval with Large Language Models | Valentin Six et.al. | 2506.13380 | null |
| 2025-07-10 | Thought Crime: Backdoors and Emergent Misalignment in Reasoning Models | James Chua et.al. | 2506.13206 | null |
| 2025-06-16 | FinLMM-R1: Enhancing Financial Reasoning in LMM through Scalable Data and Reward Design | Kai Lan et.al. | 2506.13066 | null |
| 2025-06-26 | Metis-RISE: RL Incentivizes and SFT Enhances Multimodal Reasoning Model Learning | Haibo Qiu et.al. | 2506.13056 | null |
| 2025-06-20 | Domain Specific Benchmarks for Evaluating Multimodal Large Language Models | Khizar Anjum et.al. | 2506.12958 | null |
| 2025-06-15 | SciDA: Scientific Dynamic Assessor of LLMs | Junting Zhou et.al. | 2506.12909 | null |
| 2025-06-14 | Graph of Verification: Structured Verification of LLM Reasoning with Directed Acyclic Graphs | Jiwei Fang et.al. | 2506.12509 | null |
| 2025-06-14 | Advances in LLMs with Focus on Reasoning, Adaptability, Efficiency and Ethics | Asifullah Khan et.al. | 2506.12365 | null |
| 2025-06-22 | MM-R5: MultiModal Reasoning-Enhanced ReRanker via Reinforcement Learning for Document Retrieval | Mingjun Xu et.al. | 2506.12364 | null |
| 2025-06-13 | Tracing LLM Reasoning Processes with Strategic Games: A Framework for Planning, Revision, and Resource-Constrained Decision Making | Xiaopeng Yuan et.al. | 2506.12012 | null |
| 2025-06-22 | How Visual Representations Map to Language Feature Space in Multimodal LLMs | Constantin Venhoff et.al. | 2506.11976 | null |
| 2025-06-13 | LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming? | Zihan Zheng et.al. | 2506.11928 | null |
| 2025-06-13 | EasyARC: Evaluating Vision Language Models on True Visual Reasoning | Mert Unsal et.al. | 2506.11595 | null |
| 2025-06-13 | VFaith: Do Large Multimodal Models Really Reason on Seen Images Rather than Previous Memories? | Jiachen Yu et.al. | 2506.11571 | null |
| 2025-07-04 | LearnAlign: Reasoning Data Selection for Reinforcement Learning in Large Language Models Based on Improved Gradient Alignment | Shipeng Li et.al. | 2506.11480 | null |
| 2025-06-09 | KokushiMD-10: Benchmark for Evaluating Large Language Models on Ten Japanese National Healthcare Licensing Examinations | Junyu Liu et.al. | 2506.11114 | null |
| 2025-06-13 | MMMG: A Massive, Multidisciplinary, Multi-Tier Generation Benchmark for Text-to-Image Reasoning | Yuxuan Luo et.al. | 2506.10963 | null |
| 2025-06-12 | Improving Named Entity Transcription with Contextual LLM-based Revision | Viet Anh Trinh et.al. | 2506.10779 | null |
| 2025-06-12 | NeuralNexus at BEA 2025 Shared Task: Retrieval-Augmented Prompting for Mistake Identification in AI Tutors | Numaan Naeem et.al. | 2506.10627 | link |
| 2025-06-25 | Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning | Yuhao Zhou et.al. | 2506.10521 | null |
| 2025-06-12 | Reliable Reasoning Path: Distilling Effective Guidance for LLM Reasoning with Knowledge Graphs | Yilin Xiao et.al. | 2506.10508 | null |
| 2025-06-16 | Specification and Evaluation of Multi-Agent LLM Systems -- Prototype and Cybersecurity Applications | Felix Härer et.al. | 2506.10467 | link |
| 2025-06-12 | Fast on the Easy, Deep on the Hard: Efficient Reasoning via Powered Length Penalty | Zehui Ling et.al. | 2506.10446 | null |
| 2025-06-12 | Optimus-3: Towards Generalist Multimodal Minecraft Agents with Scalable Task Experts | Zaijing Li et.al. | 2506.10357 | null |
| 2025-06-12 | Code Execution as Grounded Supervision for LLM Reasoning | Dongwon Jung et.al. | 2506.10343 | link |
| 2025-06-11 | ChartReasoner: Code-Driven Modality Bridging for Long-Chain Reasoning in Chart Question Answering | Caijun Jia et.al. | 2506.10116 | null |
| 2025-06-19 | Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing | Junfei Wu et.al. | 2506.09965 | link |
| 2025-06-11 | Causal Sufficiency and Necessity Improves Chain-of-Thought Reasoning | Xiangning Yu et.al. | 2506.09853 | null |
| 2025-06-11 | AD^2-Bench: A Hierarchical CoT Benchmark for MLLM in Autonomous Driving under Adverse Conditions | Zhaoyang Wei et.al. | 2506.09557 | null |
| 2025-06-11 | Athena: Enhancing Multimodal Reasoning with Data-efficient Process Reward Models | Shuai Wang et.al. | 2506.09532 | null |
| 2025-06-13 | e3: Learning to Explore Enables Extrapolation of Test-Time Compute for LLMs | Amrith Setlur et.al. | 2506.09026 | null |
| 2025-06-10 | Learning to Reason Across Parallel Samples for LLM Reasoning | Jianing Qi et.al. | 2506.09014 | null |
| 2025-06-10 | SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning | Xiao Liang et.al. | 2506.08989 | link |
| 2025-06-10 | Consistent Paths Lead to Truth: Self-Rewarding Reinforcement Learning for LLM Reasoning | Kongcheng Zhang et.al. | 2506.08745 | link |
| 2025-06-10 | Safe and Economical UAV Trajectory Planning in Low-Altitude Airspace: A Hybrid DRL-LLM Approach with Compliance Awareness | Yanwei Gong et.al. | 2506.08532 | null |
| 2025-06-10 | Reinforce LLM Reasoning through Multi-Agent Reflection | Yurun Yuan et.al. | 2506.08379 | null |
| 2025-06-18 | Wait, We Don't Need to "Wait"! Removing Thinking Tokens Improves Reasoning Efficiency | Chenlong Wang et.al. | 2506.08343 | null |
| 2025-06-09 | From Debate to Equilibrium: Belief-Driven Multi-Agent LLM Reasoning via Bayesian Nash Equilibrium | Xie Yi et.al. | 2506.08292 | link |
| 2025-06-09 | Automatic Generation of Inference Making Questions for Reading Comprehension Assessments | Wanjing Anya Ma et.al. | 2506.08260 | link |
| 2025-06-12 | Play to Generalize: Learning to Reason Through Game Play | Yunfei Xie et.al. | 2506.08011 | link |
| 2025-06-11 | Decoupling the Image Perception and Multimodal Reasoning for Reasoning Segmentation with Digital Twin Representations | Yizhen Li et.al. | 2506.07943 | null |
| 2025-06-09 | WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning | Jie Yang et.al. | 2506.07905 | link |
| 2025-06-10 | Guideline Forest: Experience-Induced Multi-Guideline Reasoning with Stepwise Aggregation | Jiaxiang Chen et.al. | 2506.07820 | null |
| 2025-06-11 | AbstRaL: Augmenting LLMs' Reasoning by Reinforcing Abstract Thinking | Silin Gao et.al. | 2506.07751 | null |
| 2025-06-10 | Synthesis by Design: Controlled Data Generation via Structural Guidance | Lei Xu et.al. | 2506.07664 | null |
| 2025-06-11 | SAFEFLOW: A Principled Protocol for Trustworthy and Transactional Autonomous Agent Systems | Peiran Li et.al. | 2506.07564 | null |
| 2025-06-09 | SELT: Self-Evaluation Tree Search for LLMs with Task Decomposition | Mengsong Wu et.al. | 2506.07557 | null |
| 2025-06-09 | Learning What Reinforcement Learning Can't: Interleaved Online Fine-Tuning for Hardest Questions | Lu Ma et.al. | 2506.07527 | link |
| 2025-06-11 | MedChat: A Multi-Agent Framework for Multimodal Diagnosis with Large Language Models | Philip R. Liu et.al. | 2506.07400 | link |
| 2025-06-09 | Improving LLM Reasoning through Interpretable Role-Playing Steering | Anyi Wang et.al. | 2506.07335 | null |
| 2025-06-08 | Overclocking LLM Reasoning: Monitoring and Controlling Thinking Path Lengths in LLMs | Roy Eisenstadt et.al. | 2506.07240 | null |
| 2025-06-08 | Advancing Multimodal Reasoning Capabilities of Multimodal Large Language Models via Visual Perception Reward | Tong Xiao et.al. | 2506.07218 | null |
| 2025-06-08 | Flattery in Motion: Benchmarking and Analyzing Sycophancy in Video-LLMs | Wenrui Zhou et.al. | 2506.07180 | null |
| 2025-06-08 | Learning Compact Vision Tokens for Efficient Large Multimodal Models | Hao Tang et.al. | 2506.07138 | link |
| 2025-06-08 | Theorem-of-Thought: A Multi-Agent Framework for Abductive, Deductive, and Inductive Reasoning in Language Models | Samir Abdaljalil et.al. | 2506.07106 | null |
| 2025-06-12 | Chain-of-Code Collapse: Reasoning Failures in LLMs via Adversarial Prompting in Code Generation | Jaechul Roh et.al. | 2506.06971 | link |
| 2025-06-07 | Boosting LLM Reasoning via Spontaneous Self-Correction | Xutong Zhao et.al. | 2506.06923 | null |
| 2025-06-07 | Harnessing Vision-Language Models for Time Series Anomaly Detection | Zelin He et.al. | 2506.06836 | null |
| 2025-06-07 | VisioMath: Benchmarking Figure-based Mathematical Reasoning in LMMs | Can Li et.al. | 2506.06727 | null |
| 2025-06-07 | Curriculum Reinforcement Learning from Easy to Hard Tasks Improves LLM Reasoning | Shubham Parashar et.al. | 2506.06632 | null |
| 2025-06-14 | RARL: Improving Medical VLM Reasoning and Generalization with Reinforcement Learning and LoRA under Data and Hardware Constraints | Tan-Hanh Pham et.al. | 2506.06600 | null |
| 2025-06-06 | SIGMA: Refining Large Language Model Reasoning via Sibling-Guided Monte Carlo Augmentation | Yanwei Ren et.al. | 2506.06470 | null |
| 2025-06-06 | Saffron-1: Towards an Inference Scaling Paradigm for LLM Safety Assurance | Ruizhong Qiu et.al. | 2506.06444 | link |
| 2025-06-06 | PuzzleWorld: A Benchmark for Multimodal, Open-Ended Reasoning in Puzzlehunts | Hengzhi Li et.al. | 2506.06211 | null |
| 2025-06-06 | Route-and-Reason: Scaling Large Language Model Reasoning with Reinforced Model Router | Chenyang Shao et.al. | 2506.05901 | null |
| 2025-06-06 | BioMol-MQA: A Multi-Modal Question Answering Dataset For LLM Reasoning Over Bio-Molecular Interactions | Saptarshi Sengupta et.al. | 2506.05766 | null |
| 2025-06-05 | MORSE-500: A Programmatically Controllable Video Benchmark to Stress-Test Multimodal Reasoning | Zikui Cai et.al. | 2506.05523 | null |
| 2025-06-05 | DiCoRe: Enhancing Zero-shot Event Detection via Divergent-Convergent LLM Reasoning | Tanmay Parekh et.al. | 2506.05128 | null |
| 2025-06-09 | Reason-to-Recommend: Using Interaction-of-Thought Reasoning to Enhance LLM Recommendation | Keyu Zhao et.al. | 2506.05069 | null |
| 2025-06-12 | Context Is Not Comprehension | Alex Pan et.al. | 2506.04907 | null |
| 2025-06-05 | ICPC-Eval: Probing the Frontiers of LLM Reasoning with Competitive Programming Contests | Shiyi Xu et.al. | 2506.04894 | link |
| 2025-06-10 | Evaluation is All You Need: Strategic Overclaiming of LLM Reasoning Capabilities Through Evaluation Design | Lin Sun et.al. | 2506.04734 | null |
| 2025-06-05 | Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation | Yuyang Wanyan et.al. | 2506.04614 | null |
| 2025-06-05 | MuSciClaims: Multimodal Scientific Claim Verification | Yash Kumar Lal et.al. | 2506.04585 | null |
| 2025-06-04 | Matching Markets Meet LLMs: Algorithmic Reasoning with Ranked Preferences | Hadi Hosseini et.al. | 2506.04478 | null |
| 2025-06-04 | RSVP: Reasoning Segmentation via Visual Prompting and Multi-modal Chain-of-Thought | Yi Lu et.al. | 2506.04277 | null |
| 2025-06-04 | Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning | Shuang Chen et.al. | 2506.04207 | null |
| 2025-06-04 | R-Search: Empowering LLM Reasoning with Search via Multi-Reward Reinforcement Learning | Qingfei Zhao et.al. | 2506.04185 | link |
| 2025-06-04 | MMR-V: What's Left Unsaid? A Benchmark for Multimodal Deep Reasoning in Videos | Kejian Zhu et.al. | 2506.04141 | null |
| 2025-06-04 | Graph Counselor: Adaptive Graph Exploration via Multi-Agent Synergy to Enhance LLM Reasoning | Junqi Gao et.al. | 2506.03939 | link |
| 2025-06-04 | Reason from Future: Reverse Thought Chain Enhances LLM Reasoning | Yinlong Xu et.al. | 2506.03673 | null |
| 2025-06-16 | Zero-Shot Temporal Interaction Localization for Egocentric Videos | Erhang Zhang et.al. | 2506.03662 | link |
| 2025-06-04 | MiMo-VL Technical Report | Xiaomi LLM-Core Team et.al. | 2506.03569 | link |
| 2025-06-04 | Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback | Xiaoying Zhang et.al. | 2506.03106 | null |
| 2025-06-04 | Demystifying Reasoning Dynamics with Mutual Information: Thinking Tokens are Information Peaks in LLM Reasoning | Chen Qian et.al. | 2506.02867 | link |
| 2025-06-14 | TL;DR: Too Long, Do Re-weighting for Efficient LLM Reasoning Compression | Zhong-Zhi Li et.al. | 2506.02678 | link |
| 2025-06-03 | A Smart Multimodal Healthcare Copilot with Powerful LLM Reasoning | Xuejiao Zhao et.al. | 2506.02470 | link |
| 2025-06-02 | Act Only When It Pays: Efficient Reinforcement Learning for LLM Reasoning via Selective Rollouts | Haizhong Zheng et.al. | 2506.02177 | null |
| 2025-06-02 | Knowledge or Reasoning? A Close Look at How LLMs Think Across Domains | Juncheng Wu et.al. | 2506.02126 | null |
| 2025-06-02 | Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning | Shenzhi Wang et.al. | 2506.01939 | null |
| 2025-06-02 | Read it in Two Steps: Translating Extremely Low-Resource Languages with Code-Augmented Grammar Books | Chen Zhang et.al. | 2506.01796 | null |
| 2025-06-02 | R2SM: Referring and Reasoning for Selective Masks | Yu-Lin Shih et.al. | 2506.01795 | null |
| 2025-06-02 | SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning | Zhongwei Wan et.al. | 2506.01713 | null |
| 2025-06-02 | K12Vista: Exploring the Boundaries of MLLMs in K-12 Education | Chong Li et.al. | 2506.01676 | null |
| 2025-06-02 | EvolveNav: Self-Improving Embodied Reasoning for LLM-Based Vision-Language Navigation | Bingqian Lin et.al. | 2506.01551 | null |
| 2025-06-02 | Compiler Optimization via LLM Reasoning for Efficient Model Serving | Sujun Tang et.al. | 2506.01374 | null |
| 2025-06-02 | The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning | Xinyu Zhu et.al. | 2506.01347 | link |
| 2025-06-01 | GThinker: Towards General Multimodal Reasoning via Cue-Guided Rethinking | Yufei Zhan et.al. | 2506.01078 | link |
| 2025-06-01 | Enhancing LLM Reasoning for Time Series Classification by Tailored Thinking and Fused Decision | Jiahui Zhou et.al. | 2506.00807 | null |
| 2025-05-31 | Beyond Context to Cognitive Appraisal: Emotion Reasoning as a Theory of Mind Benchmark for Large Language Models | Gerard Christopher Yeo et.al. | 2506.00334 | null |
| 2025-05-30 | Tournament of Prompts: Evolving LLM Instructions Through Structured Debates and Elo Ratings | Anirudh Nair et.al. | 2506.00178 | null |
| 2025-05-30 | Werewolf: A Straightforward Game Framework with TTS for Improved User Engagement | Qihui Fan et.al. | 2506.00160 | null |
| 2025-05-28 | Rethinking Hybrid Retrieval: When Small Embeddings and LLM Re-ranking Beat Bigger Models | Arjun Rao et.al. | 2506.00049 | null |
| 2025-05-30 | Open CaptchaWorld: A Comprehensive Web-based Platform for Testing and Benchmarking Multimodal LLM Agents | Yaxin Luo et.al. | 2505.24878 | link |
| 2025-05-30 | Agent-X: Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks | Tajamul Ashraf et.al. | 2505.24876 | link |
| 2025-05-30 | Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning | Shuyao Xu et.al. | 2505.24850 | link |
| 2025-05-30 | Random Rule Forest (RRF): Interpretable Ensembles of LLM-Generated Questions for Predicting Startup Success | Ben Griffin et.al. | 2505.24622 | null |
| 2025-06-10 | Can Slow-thinking LLMs Reason Over Time? Empirical Studies in Time Series Forecasting | Jiahao Wang et.al. | 2505.24511 | link |
| 2025-05-30 | Reason-SVG: Hybrid Reward RL for Aha-Moments in Vector Graphics Generation | Ximing Xing et.al. | 2505.24499 | null |
| 2025-05-30 | How Much Backtracking is Enough? Exploring the Interplay of SFT and RL in Enhancing LLM Reasoning | Hongyi James Cai et.al. | 2505.24273 | null |
| 2025-06-02 | MIRAGE: Assessing Hallucination in Multimodal Reasoning Chains of MLLM | Bowen Dong et.al. | 2505.24238 | null |
| 2025-05-30 | Semi-structured LLM Reasoners Can Be Rigorously Audited | Jixuan Leng et.al. | 2505.24217 | null |
| 2025-05-30 | HardTests: Synthesizing High-Quality Test Cases for LLM Coding | Zhongmou He et.al. | 2505.24098 | null |
| 2025-05-29 | Preemptive Hallucination Reduction: An Input-Level Approach for Multimodal Language Model | Nokimul Hasan Arif et.al. | 2505.24007 | null |
| 2025-05-29 | VisualSphinx: Large-Scale Synthetic Vision Logic Puzzles for RL | Yichen Feng et.al. | 2505.23977 | null |
| 2025-05-29 | Infi-Med: Low-Resource Medical MLLMs with Robust Reasoning Evaluation | Zeyu Liu et.al. | 2505.23867 | null |
| 2025-05-29 | Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought | Yunze Man et.al. | 2505.23766 | null |
| 2025-06-03 | DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning | Ziyin Zhang et.al. | 2505.23754 | link |
| 2025-05-29 | Don't Take the Premise for Granted: Evaluating the Premise Critique Ability of Large Language Models | Jinzhe Li et.al. | 2505.23715 | link |
| 2025-05-29 | Can LLMs Reason Abstractly Over Math Word Problems Without CoT? Disentangling Abstract Formulation From Arithmetic Computation | Ziling Cheng et.al. | 2505.23701 | null |
| 2025-05-29 | Probability-Consistent Preference Optimization for Enhanced LLM Reasoning | Yunqiao Yang et.al. | 2505.23540 | link |
| 2025-05-29 | Diversity-Aware Policy Optimization for Large Language Model Reasoning | Jian Yao et.al. | 2505.23433 | null |
| 2025-05-29 | GAM-Agent: Game-Theoretic and Uncertainty-Aware Collaboration for Complex Visual Reasoning | Jusheng Zhang et.al. | 2505.23399 | null |
| 2025-06-05 | MMBoundary: Advancing MLLM Knowledge Boundary Awareness through Reasoning Step Confidence Calibration | Zhitao He et.al. | 2505.23224 | link |
| 2025-05-29 | Elicit and Enhance: Advancing Multimodal Reasoning in Medical Scenarios | Linjie Mu et.al. | 2505.23118 | null |
| 2025-06-06 | Infi-MMR: Curriculum-based Unlocking Multimodal Reasoning via Phased Reinforcement Learning in Multimodal Small Language Models | Zeyu Liu et.al. | 2505.23091 | null |
| 2025-05-29 | Case-Based Reasoning Enhances the Predictive Power of LLMs in Drug-Drug Interaction | Guangyi Liu et.al. | 2505.23034 | null |
| 2025-05-29 | StrucSum: Graph-Structured Reasoning for Long Document Extractive Summarization with LLMs | Haohan Yuan et.al. | 2505.22950 | null |
| 2025-05-28 | VidText: Towards Comprehensive Evaluation for Video Text Understanding | Zhoufaran Yang et.al. | 2505.22810 | link |
| 2025-05-28 | Decomposing Elements of Problem Solving: What "Math" Does RL Teach? | Tian Qin et.al. | 2505.22756 | link |
| 2025-05-28 | AutoL2S: Auto Long-Short Reasoning for Efficient Large Language Models | Feng Luo et.al. | 2505.22662 | null |
| 2025-05-28 | SAM-R1: Leveraging SAM for Reward Feedback in Multimodal Segmentation via Reinforcement Learning | Jiaqi Huang et.al. | 2505.22596 | null |
| 2025-05-28 | ClaimPKG: Enhancing Claim Verification via Pseudo-Subgraph Generation with Lightweight Specialized LLM | Hoang Pham et.al. | 2505.22552 | null |
| 2025-05-28 | Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO | Lai Wei et.al. | 2505.22453 | link |
| 2025-05-29 | Pangu Embedded: An Efficient Dual-system LLM Reasoner with Metacognition | Hanting Chen et.al. | 2505.22375 | null |
| 2025-05-28 | Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start | Lai Wei et.al. | 2505.22334 | link |
| 2025-05-28 | If Pigs Could Fly... Can LLMs Logically Reason Through Counterfactuals? | Ishwar B Balappanawar et.al. | 2505.22318 | null |
| 2025-05-28 | Rethinking the Unsolvable: When In-Context Search Meets Test-Time Scaling | Fanzeng Xia et.al. | 2505.22290 | null |
| 2025-05-28 | What Makes a Good Reasoning Chain? Uncovering Structural Patterns in Long Chain-of-Thought Reasoning | Gangwei Jiang et.al. | 2505.22148 | null |
| 2025-05-28 | OmniAD: Detect and Understand Industrial Anomaly via Multimodal Reasoning | Shifang Zhao et.al. | 2505.22039 | null |
| 2025-05-27 | Towards Safety Reasoning in LLMs: AI-agentic Deliberation for Policy-embedded CoT Data Creation | Tharindu Kumarage et.al. | 2505.21784 | null |
| 2025-05-27 | Don't Think Longer, Think Wisely: Optimizing Thinking Dynamics for Large Reasoning Models | Sohyun An et.al. | 2505.21765 | null |
| 2025-05-27 | R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing | Tianyu Fu et.al. | 2505.21600 | link |
| 2025-05-31 | More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models | Chengzhi Liu et.al. | 2505.21523 | null |
| 2025-05-27 | Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? | Junhao Cheng et.al. | 2505.21374 | link |
| 2025-05-27 | MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs | Jiakang Yuan et.al. | 2505.21327 | null |
| 2025-05-27 | Walk Before You Run! Concise LLM Reasoning via Reinforcement Learning | Mingyang Song et.al. | 2505.21178 | null |
| 2025-05-27 | DisasterM3: A Remote Sensing Vision-Language Dataset for Disaster Damage Assessment and Response | Junjue Wang et.al. | 2505.21089 | null |
| 2025-06-04 | LLMs Think, But Not In Your Flow: Reasoning-Level Personalization for Black-Box Large Language Models | Jieyong Kim et.al. | 2505.21082 | null |
| 2025-05-27 | Def-DTS: Deductive Reasoning for Open-domain Dialogue Topic Segmentation | Seungmin Lee et.al. | 2505.21033 | null |
| 2025-05-27 | Reason-Align-Respond: Aligning LLM Reasoning with Knowledge Graphs for KGQA | Xiangqing Shen et.al. | 2505.20971 | null |
| 2025-05-28 | VLM Can Be a Good Assistant: Enhancing Embodied Visual Tracking with Self-Improving Vision-Language Models | Kui Wu et.al. | 2505.20718 | null |
| 2025-05-27 | Accelerating RL for LLM Reasoning with Optimal Advantage Regression | Kianté Brantley et.al. | 2505.20686 | null |
| 2025-05-27 | Can Past Experience Accelerate LLM Reasoning? | Bo Pan et.al. | 2505.20643 | null |
| 2025-05-26 | Beyond Markovian: Reflective Exploration via Bayes-Adaptive RL for LLM Reasoning | Shenao Zhang et.al. | 2505.20561 | null |
| 2025-05-26 | Enhancing Logical Reasoning in Language Models via Symbolically-Guided Monte Carlo Process Supervision | Xingwei Tan et.al. | 2505.20415 | null |
| 2025-05-23 | Guided by Gut: Efficient Test-Time Scaling with Reinforced Intrinsic Confidence | Amirhosein Ghasemabadi et.al. | 2505.20325 | null |
| 2025-05-26 | KnowTrace: Bootstrapping Iterative Retrieval-Augmented Generation with Structured Knowledge Tracing | Rui Li et.al. | 2505.20245 | link |
| 2025-06-04 | DreamPRM: Domain-Reweighted Process Reward Model for Multimodal Reasoning | Qi Cao et.al. | 2505.20241 | null |
| 2025-05-26 | THiNK: Can Large Language Models Think-aloud? | Yongan Yu et.al. | 2505.20184 | link |
| 2025-05-26 | Visual Abstract Thinking Empowers Multimodal Reasoning | Dairu Liu et.al. | 2505.20164 | link |
| 2025-05-26 | Prismatic Synthesis: Gradient-based Data Diversification Boosts Generalization in LLM Reasoning | Jaehun Jung et.al. | 2505.20161 | null |
| 2025-05-26 | Agentic 3D Scene Generation with Spatially Contextualized VLMs | Xinhang Liu et.al. | 2505.20129 | null |
| 2025-05-26 | REARANK: Reasoning Re-ranking Agent via Reinforcement Learning | Le Zhang et.al. | 2505.20046 | link |
| 2025-05-26 | An Explainable Diagnostic Framework for Neurodegenerative Dementias via Reinforcement-Optimized LLM Reasoning | Andrew Zamai et.al. | 2505.19954 | null |
| 2025-05-26 | Multimodal Reasoning Agent for Zero-Shot Composed Image Retrieval | Rong-Cheng Tu et.al. | 2505.19952 | null |
| 2025-05-26 | Which Data Attributes Stimulate Math and Code Reasoning? An Investigation via Influence Functions | Siqi Kou et.al. | 2505.19949 | null |
| 2025-05-26 | HS-STAR: Hierarchical Sampling for Self-Taught Reasoners via Difficulty Estimation and Budget Reallocation | Feng Xiong et.al. | 2505.19866 | null |
| 2025-05-26 | Deciphering Trajectory-Aided LLM Reasoning: An Optimization Perspective | Junnan Liu et.al. | 2505.19815 | link |
| 2025-05-26 | MLLM-Guided VLM Fine-Tuning with Joint Inference for Zero-Shot Composed Image Retrieval | Rong-Cheng Tu et.al. | 2505.19707 | null |
| 2025-05-26 | Point-RFT: Improving Multimodal Reasoning with Visually Grounded Reinforcement Finetuning | Minheng Ni et.al. | 2505.19702 | null |
| 2025-05-26 | Large Language Models' Reasoning Stalls: An Investigation into the Capabilities of Frontier Models | Lachlan McGinness et.al. | 2505.19676 | null |
| 2025-05-26 | Interleaved Reasoning for Large Language Models via Reinforcement Learning | Roy Xie et.al. | 2505.19640 | null |
| 2025-05-26 | Self-Reflective Planning with Knowledge Graphs: Enhancing LLM Reasoning Reliability for Question Answering | Jiajun Zhu et.al. | 2505.19410 | null |
| 2025-05-25 | SituatedThinker: Grounding LLM Reasoning with Real-World through Situated Thinking | Junnan Liu et.al. | 2505.19300 | link |
| 2025-05-28 | VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use | Mingyuan Wu et.al. | 2505.19255 | null |
| 2025-05-25 | ASPO: Adaptive Sentence-Level Preference Optimization for Fine-Grained Multimodal Reasoning | Yeyuan Wang et.al. | 2505.19100 | null |
| 2025-05-30 | SeePhys: Does Seeing Help Thinking? -- Benchmarking Vision-Based Physics Reasoning | Kun Xiang et.al. | 2505.19099 | link |
| 2025-05-25 | SATORI-R1: Incentivizing Multimodal Reasoning with Spatial Grounding and Verifiable Rewards | Chuming Shen et.al. | 2505.19094 | link |
| 2025-05-25 | ReFineVLA: Reasoning-Aware Teacher-Guided Transfer Fine-Tuning | Tuan Van Vo et.al. | 2505.19080 | null |
| 2025-05-25 | Can Large Language Models Infer Causal Relationships from Real-World Text? | Ryan Saklad et.al. | 2505.18931 | null |
| 2025-05-24 | Don't Look Only Once: Towards Multimodal Interactive Reasoning with Selective Visual Revisitation | Jiwan Chung et.al. | 2505.18842 | null |
| 2025-05-24 | Enhancing LLMs' Reasoning-Intensive Multimedia Search Capabilities through Fine-Tuning and Reinforcement Learning | Jinzheng Li et.al. | 2505.18831 | null |
| 2025-05-24 | How Is LLM Reasoning Distracted by Irrelevant Context? An Analysis Using a Controlled Benchmark | Minglai Yang et.al. | 2505.18761 | link |
| 2025-05-24 | GainRAG: Preference Alignment in Retrieval-Augmented Generation through Gain Signal Synthesis | Yi Jiang et.al. | 2505.18710 | link |
| 2025-05-24 | Steering LLM Reasoning Through Bias-Only Adaptation | Viacheslav Sinii et.al. | 2505.18706 | null |
| 2025-05-31 | ChartGalaxy: A Dataset for Infographic Chart Understanding and Generation | Zhen Li et.al. | 2505.18668 | link |
| 2025-05-24 | Unraveling Misinformation Propagation in LLM Reasoning | Yiyang Feng et.al. | 2505.18555 | link |
| 2025-05-23 | One Demo Is All It Takes: Planning Domain Derivation with LLMs from A Single Demonstration | Jinbang Huang et.al. | 2505.18382 | null |
| 2025-05-23 | Seeing Beyond Words: MatVQA for Challenging Visual-Scientific Reasoning in Materials Science | Sifan Wu et.al. | 2505.18319 | null |
| 2025-05-23 | Beyond Distillation: Pushing the Limits of Medical LLM Reasoning with Minimalist Rule-Based RL | Che Liu et.al. | 2505.17952 | null |
| 2025-05-23 | Stepwise Reasoning Checkpoint Analysis: A Test Time Scaling Method to Enhance LLMs' Reasoning | Zezhong Wang et.al. | 2505.17829 | null |
| 2025-05-23 | Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning | Michael Hassid et.al. | 2505.17813 | null |
| 2025-05-23 | Towards General Continuous Memory for Vision-Language Models | Wenyi Wu et.al. | 2505.17670 | null |
| 2025-05-23 | EVADE: Multimodal Benchmark for Evasive Content Detection in E-Commerce Applications | Ancheng Xu et.al. | 2505.17654 | null |
| 2025-05-29 | Rethinking the Sampling Criteria in Reinforcement Learning for LLM Reasoning: A Competence-Difficulty Alignment Perspective | Deyang Kong et.al. | 2505.17652 | null |
| 2025-05-27 | Navigate the Unknown: Enhancing LLM Reasoning with Intrinsic Motivation Guided Exploration | Jingtong Gao et.al. | 2505.17621 | null |
| 2025-05-23 | MMMG: a Comprehensive and Reliable Evaluation Suite for Multitask Multimodal Generation | Jihan Yao et.al. | 2505.17613 | null |
| 2025-05-23 | On the Design of KL-Regularized Policy Gradient Algorithms for LLM Reasoning | Yifan Zhang et.al. | 2505.17508 | null |
| 2025-05-23 | From Reasoning to Generalization: Knowledge-Augmented LLMs for ARC Benchmark | Chao Lei et.al. | 2505.17482 | null |
| 2025-05-23 | Hydra: Structured Cross-Source Enhanced Large Language Model Reasoning | Xingyu Tan et.al. | 2505.17464 | null |
| 2025-05-23 | LeTS: Learning to Think-and-Search via Process-and-Outcome Reward Hybridization | Qi Zhang et.al. | 2505.17447 | null |
| 2025-05-23 | Misaligning Reasoning with Answers -- A Framework for Assessing LLM CoT Robustness | Enyi Jiang et.al. | 2505.17406 | null |
| 2025-05-22 | LiloDriver: A Lifelong Learning Framework for Closed-loop Motion Planning in Long-tail Autonomous Driving Scenarios | Huaiyuan Yao et.al. | 2505.17209 | link |
| 2025-05-21 | NeSyGeo: A Neuro-Symbolic Framework for Multimodal Geometric Reasoning Data Generation | Weiming Wu et.al. | 2505.17121 | null |
| 2025-05-21 | Systematic Evaluation of Machine-Generated Reasoning and PHQ-9 Labeling for Depression Detection Using Large Language Models | Zongru Shao et.al. | 2505.17119 | null |
| 2025-05-21 | Swarm Intelligence Enhanced Reasoning: A Density-Driven Framework for LLM-Based Multi-Agent Optimization | Ying Zhu et.al. | 2505.17115 | null |
| 2025-05-21 | CAMA: Enhancing Multimodal In-Context Learning with Context-Aware Modulated Attention | Yanshu Li et.al. | 2505.17097 | null |
| 2025-05-22 | ARB: A Comprehensive Arabic Multimodal Reasoning Benchmark | Sara Ghaboura et.al. | 2505.17021 | link |
| 2025-05-22 | SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward | Kaixuan Fan et.al. | 2505.17018 | link |
| 2025-05-22 | | Runyang You et.al. | 2505.16994 | link |
| 2025-05-22 | Don't "Overthink" Passage Reranking: Is Reasoning Truly Necessary? | Nour Jedidi et.al. | 2505.16886 | null |
| 2025-05-26 | DeepRec: Towards a Deep Dive Into the Item Space with Large Language Model Based Recommendation | Bowen Zheng et.al. | 2505.16810 | null |
| 2025-05-22 | Two-way Evidence self-Alignment based Dual-Gated Reasoning Enhancement | Kexin Zhang et.al. | 2505.16806 | null |
| 2025-05-22 | Reasoning Beyond Language: A Comprehensive Survey on Latent Chain-of-Thought Reasoning | Xinghao Chen et.al. | 2505.16782 | link |
| 2025-05-22 | Collaboration among Multiple Large Language Models for Medical Question Answering | Kexin Shang et.al. | 2505.16648 | null |
| 2025-05-27 | Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains | Wenhui Tan et.al. | 2505.16552 | null |
| 2025-05-22 | SATURN: SAT-based Reinforcement Learning to Unleash Language Model Reasoning | Huanyu Liu et.al. | 2505.16368 | link |
| 2025-05-22 | EquivPruner: Boosting Efficiency and Quality in LLM-Based Search via Action Pruning | Jiawei Liu et.al. | 2505.16312 | link |
| 2025-05-22 | Augmenting LLM Reasoning with Dynamic Notes Writing for Complex QA | Rishabh Maheshwary et.al. | 2505.16293 | null |
| 2025-05-22 | Training-Free Reasoning and Reflection in MLLMs | Hongchen Wei et.al. | 2505.16151 | null |
| 2025-05-22 | Distilling the Implicit Multi-Branch Structure in LLMs' Reasoning via Reinforcement Learning | Shicheng Xu et.al. | 2505.16142 | null |
| 2025-05-26 | Abstractions-of-Thought: Intermediate Representations for LLM Reasoning in Hardware Design | Matthew DeLorenzo et.al. | 2505.15873 | null |
| 2025-05-21 | LENS: Multi-level Evaluation of Multimodal Reasoning with Large Language Models | Ruilin Yao et.al. | 2505.15616 | null |
| 2025-05-21 | Chain-of-Focus: Adaptive Visual Search and Zooming for Multimodal Reasoning via RL | Xintong Zhang et.al. | 2505.15436 | null |
| 2025-05-21 | Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning | Yurun Yuan et.al. | 2505.15311 | null |
| 2025-05-21 | Deliberation on Priors: Trustworthy Reasoning of Large Language Models on Knowledge Graphs | Jie Ma et.al. | 2505.15210 | link |
| 2025-05-21 | Prolonged Reasoning Is Not All You Need: Certainty-Based Adaptive Routing for Efficient LLM/MLLM Reasoning | Jinghui Lu et.al. | 2505.15154 | null |
| 2025-05-21 | The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning | Shivam Agarwal et.al. | 2505.15134 | link |
| 2025-05-21 | Learning to Rank Chain-of-Thought: An Energy-Based Approach with Outcome Supervision | Eric Hanchen Jiang et.al. | 2505.14999 | null |
| 2025-05-20 | Self-Evolving Curriculum for LLM Reasoning | Xiaoyin Chen et.al. | 2505.14970 | null |
| 2025-05-20 | MORALISE: A Structured Benchmark for Moral Alignment in Visual Language Models | Xiao Lin et.al. | 2505.14728 | null |
| 2025-05-18 | KGAlign: Joint Semantic-Structural Knowledge Encoding for Multimodal Fake News Detection | Tuan-Vinh La et.al. | 2505.14714 | link |
| 2025-05-23 | Emerging Properties in Unified Multimodal Pretraining | Chaorui Deng et.al. | 2505.14683 | null |
| 2025-05-27 | General-Reasoner: Advancing LLM Reasoning Across All Domains | Xueguang Ma et.al. | 2505.14652 | null |
| 2025-05-22 | TinyV: Reducing False Negatives in Verification Improves RL for LLM Reasoning | Zhangchen Xu et.al. | 2505.14625 | link |
| 2025-05-20 | SATBench: Benchmarking LLMs' Logical Reasoning via Automated Puzzle Generation from SAT Formulas | Anjiang Wei et.al. | 2505.14615 | null |
| 2025-05-21 | KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation | Jiajun Shi et.al. | 2505.14552 | link |
| 2025-05-23 | Unearthing Gems from Stones: Policy Optimization with Negative Sample Augmentation for LLM Reasoning | Zhaohui Yang et.al. | 2505.14403 | null |
| 2025-05-26 | DeepEyes: Incentivizing "Thinking with Images" via Reinforcement Learning | Ziwei Zheng et.al. | 2505.14362 | link |
| 2025-05-20 | Reinforcement Learning vs. Distillation: Understanding Accuracy and Capability in LLM Reasoning | Minwu Kim et.al. | 2505.14216 | link |
| 2025-05-20 | RL of Thoughts: Navigating LLM Reasoning with Inference-time Reinforcement Learning | Qianyue Hao et.al. | 2505.14140 | null |
| 2025-05-20 | Code2Logic: Game-Code-Driven Data Synthesis for Enhancing VLMs General Reasoning | Jingqi Tong et.al. | 2505.13886 | link |
| 2025-05-20 | Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning | Jiwon Song et.al. | 2505.13866 | link |
| 2025-05-18 | RAGXplain: From Explainable Evaluation to Actionable Guidance of RAG Pipelines | Dvir Cohen et.al. | 2505.13538 | null |
| 2025-05-16 | IRLBench: A Multi-modal, Culturally Grounded, Parallel Irish-English Benchmark for Open-Ended LLM Reasoning Evaluation | Khanh-Tung Tran et.al. | 2505.13498 | link |
| 2025-05-19 | MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision | Lingxiao Du et.al. | 2505.13427 | link |
| 2025-05-19 | MR. Judge: Multimodal Reasoner as a Judge | Renjie Pi et.al. | 2505.13403 | null |
| 2025-05-20 | Sense and Sensitivity: Examining the Influence of Semantic Recall on Long Context Code Reasoning | Adam Štorek et.al. | 2505.13353 | null |
| 2025-05-19 | Thinking Short and Right Over Thinking Long: Serving LLM Reasoning Efficiently and Accurately | Yuhang Wang et.al. | 2505.13326 | null |
| 2025-05-19 | Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space | Hengli Li et.al. | 2505.13308 | link |
| 2025-05-19 | RBF++: Quantifying and Optimizing Reasoning Boundaries across Measurable and Unmeasurable Capabilities for Chain-of-Thought Reasoning | Qiguang Chen et.al. | 2505.13307 | link |
| 2025-05-19 | Unlocking the Potential of Difficulty Prior in RL-based Multimodal Reasoning | Mingrui Chen et.al. | 2505.13261 | null |
| 2025-05-23 | SAKURA: On the Multi-hop Reasoning of Large Audio-Language Models Based on Speech and Audio Information | Chih-Kai Yang et.al. | 2505.13237 | link |
| 2025-05-21 | Hearing from Silence: Reasoning Audio Descriptions from Silent Videos via Vision-Language Model | Yong Ren et.al. | 2505.13062 | null |
| 2025-05-25 | Fractured Chain-of-Thought Reasoning | Baohao Liao et.al. | 2505.12992 | null |
| 2025-05-19 | DGRO: Enhancing LLM Reasoning via Exploration-Exploitation Control and Reward Variance Management | Xuerui Su et.al. | 2505.12951 | null |
| 2025-05-19 | The Traitors: Deception and Trust in Multi-Agent Language Model Simulations | Pedro M. P. Curvo et.al. | 2505.12923 | link |
| 2025-05-19 | AdaToken-3D: Dynamic Spatial Gating for Efficient 3D Large Multimodal-Models Reasoning | Kai Zhang et.al. | 2505.12782 | null |
| 2025-05-19 | Incentivizing Multimodal Reasoning in Large Models for Direct Robot Manipulation | Weiliang Tang et.al. | 2505.12744 | null |
| 2025-05-18 | Reasoning-CV: Fine-tuning Powerful Reasoning LLMs for Knowledge-Assisted Claim Verification | Zhi Zheng et.al. | 2505.12348 | link |
| 2025-05-18 | LogicOCR: Do Your Large Multimodal Models Excel at Logical Reasoning on Text-Rich Images? | Maoyuan Ye et.al. | 2505.12307 | link |
| 2025-05-18 | MMS-VPR: Multimodal Street-Level Visual Place Recognition Dataset and Benchmark | Yiwei Ou et.al. | 2505.12254 | null |
| 2025-05-17 | Do Code LLMs Do Static Analysis? | Chia-Yi Su et.al. | 2505.12118 | link |
| 2025-05-17 | Solve-Detect-Verify: Inference-Time Scaling with Flexible Generative Verifier | Jianyuan Zhong et.al. | 2505.11966 | null |
| 2025-05-22 | PRS-Med: Position Reasoning Segmentation with Vision-Language Model in Medical Imaging | Quoc-Huy Trinh et.al. | 2505.11872 | null |
| 2025-05-17 | Not All Thoughts are Generated Equal: Efficient LLM Reasoning via Multi-Turn Reinforcement Learning | Yansong Ning et.al. | 2505.11827 | link |
| 2025-05-16 | REMOR: Automated Peer Review Generation with LLM Reasoning and Multi-Objective Reinforcement Learning | Pawin Taechoyotin et.al. | 2505.11718 | null |
| 2025-05-16 | Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner | Wenchuan Zhang et.al. | 2505.11404 | link |
| 2025-05-23 | SelfBudgeter: Adaptive Token Allocation for Efficient LLM Reasoning | Zheng Li et.al. | 2505.11274 | null |
| 2025-05-24 | Human-Aligned Bench: Fine-Grained Assessment of Reasoning Ability in MLLMs vs. Humans | Yansheng Qiu et.al. | 2505.11141 | null |
| 2025-05-16 | Scaling Reasoning can Improve Factuality in Large Language Models | Mike Zhang et.al. | 2505.11140 | link |
| 2025-05-16 | Humans expect rationality and cooperation from LLM opponents in strategic games | Darija Barak et.al. | 2505.11011 | null |
| 2025-05-16 | Vaiage: A Multi-Agent Solution to Personalized Travel Planning | Binwen Liu et.al. | 2505.10922 | null |
| 2025-05-15 | Mining Hidden Thoughts from Texts: Evaluating Continual Pretraining with Synthetic Data for LLM Reasoning | Yoichi Ishibashi et.al. | 2505.10182 | null |
| 2025-05-15 | XRAG: Cross-lingual Retrieval-Augmented Generation | Wei Liu et.al. | 2505.10089 | null |
| 2025-05-13 | The Truth Becomes Clearer Through Debate! Multi-Agent Systems with Large Language Models Unmask Fake News | Yuhan Liu et.al. | 2505.08532 | null |
| 2025-05-13 | Learning Like Humans: Advancing LLM Reasoning Capabilities via Adaptive Difficulty Curriculum Learning and Expert-Guided Self-Reformulation | Enci Zhang et.al. | 2505.08364 | null |
| 2025-05-12 | KAQG: A Knowledge-Graph-Enhanced RAG for Difficulty-Controlled Question Generation | Ching Han Chen et.al. | 2505.07618 | null |
| 2025-05-12 | How well do LLMs reason over tabular data, really? | Cornelius Wolff et.al. | 2505.07453 | null |
| 2025-05-12 | Skywork-VL Reward: An Effective Reward Model for Multimodal Understanding and Reasoning | Xiaokun Wang et.al. | 2505.07263 | null |
| 2025-05-12 | Critique Before Thinking: Mitigating Hallucination through Rationale-Augmented Instruction Tuning | Zexian Yang et.al. | 2505.07172 | null |
| 2025-05-11 | Seed1.5-VL Technical Report | Dong Guo et.al. | 2505.07062 | null |
| 2025-05-17 | Bridging AI and Carbon Capture: A Dataset for LLMs in Ionic Liquids and CBE Research | Gaurab Sarkar et.al. | 2505.06964 | link |
| 2025-05-11 | UniDiffGrasp: A Unified Framework Integrating VLM Reasoning and VLM-Guided Part Diffusion for Open-Vocabulary Constrained Grasping with Dual Arms | Xueyang Guo et.al. | 2505.06832 | null |
| 2025-05-11 | Overview of the NLPCC 2025 Shared Task 4: Multi-modal, Multilingual, and Multi-hop Medical Instructional Video Question Answering Challenge | Bin Li et.al. | 2505.06814 | null |
| 2025-05-10 | STRIVE: Structured Representation Integrating VLM Reasoning for Efficient Object Navigation | Haokun Zhu et.al. | 2505.06729 | null |
| 2025-05-17 | Learn to Think: Bootstrapping LLM Reasoning Capability Through Graph Representation Learning | Hang Gao et.al. | 2505.06321 | link |
| 2025-05-07 | Q-Heart: ECG Question Answering via Knowledge-Informed Multimodal LLMs | Hung Manh Pham et.al. | 2505.06296 | null |
| 2025-05-09 | From Millions of Tweets to Actionable Insights: Leveraging LLMs for User Profiling | Vahid Rahimzadeh et.al. | 2505.06184 | null |
| 2025-05-12 | APOLLO: Automated LLM and Lean Collaboration for Advanced Formal Reasoning | Azim Ospanov et.al. | 2505.05758 | null |
| 2025-05-09 | Evolutionary thoughts: integration of large language models and evolutionary algorithms | Antonio Jimeno Yepes et.al. | 2505.05756 | link |
| 2025-05-08 | Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models | Yunxin Li et.al. | 2505.04921 | link |
| 2025-05-07 | Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers | Kusha Sareen et.al. | 2505.04842 | null |
| 2025-05-06 | Advancing Conversational Diagnostic AI with Multimodal Reasoning | Khaled Saab et.al. | 2505.04653 | null |
| 2025-05-07 | SToLa: Self-Adaptive Touch-Language Framework with Tactile Commonsense Reasoning in Open-Ended Scenarios | Ning Cheng et.al. | 2505.04201 | null |
| 2025-05-20 | On-Device LLM for Context-Aware Wi-Fi Roaming | Ju-Hyung Lee et.al. | 2505.04174 | link |
| 2025-05-06 | X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains | Qianchu Liu et.al. | 2505.03981 | null |
| 2025-04-30 | When Reasoning Beats Scale: A 1.5B Reasoning Model Outranks 13B LLMs as Discriminator | Md Fahim Anjum et.al. | 2505.03786 | link |
| 2025-05-06 | The Steganographic Potentials of Language Models | Artem Karpov et.al. | 2505.03439 | null |
| 2025-05-12 | Geospatial Mechanistic Interpretability of Large Language Models | Stef De Sabbata et.al. | 2505.03368 | link |
| 2025-05-03 | Accelerating Large Language Model Reasoning via Speculative Search | Zhihai Wang et.al. | 2505.02865 | null |
| 2025-05-05 | HyperTree Planning: Enhancing LLM Reasoning via Hierarchical Thinking | Runquan Gui et.al. | 2505.02322 | null |
| 2025-05-04 | DriveAgent: Multi-Agent Structured Reasoning with LLM and Multimodal Sensor Fusion for Autonomous Driving | Xinmeng Hou et.al. | 2505.02123 | link |
| 2025-05-04 | R-Bench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation | Meng-Hao Guo et.al. | 2505.02018 | null |
| 2025-05-02 | VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations for Synthetic Videos | Zongxia Li et.al. | 2505.01481 | link |
| 2025-05-01 | Reasoning Capabilities and Invariability of Large Language Models | Alessandro Raganato et.al. | 2505.00776 | link |
| 2025-04-30 | Audo-Sight: Enabling Ambient Interaction For Blind And Visually Impaired Individuals | Bhanuja Ainary et.al. | 2505.00153 | null |
| 2025-05-02 | Rosetta-PL: Propositional Logic as a Benchmark for Large Language Model Reasoning | Shaun Baek et.al. | 2505.00001 | null |
| 2025-05-21 | Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models | Guanghao Zhou et.al. | 2504.21277 | null |
| 2025-05-09 | Token-Efficient RL for LLM Reasoning | Alan Lee et.al. | 2504.20834 | null |
| 2025-04-29 | Token-Efficient Prompt Injection Attack: Provoking Cessation in LLM Reasoning via Adaptive Token Compression | Yu Cui et.al. | 2504.20493 | null |
| 2025-04-30 | VideoMultiAgents: A Multi-Agent Framework for Video Question Answering | Noriyuki Kugo et.al. | 2504.20091 | link |
| 2025-04-28 | From LLM Reasoning to Autonomous AI Agents: A Comprehensive Review | Mohamed Amine Ferrag et.al. | 2504.19678 | null |
| 2025-05-17 | SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning | Jiaqi Chen et.al. | 2504.19162 | null |
| 2025-04-27 | CipherBank: Exploring the Boundary of LLM Reasoning Capabilities through Cryptography Challenges | Yu Li et.al. | 2504.19093 | null |
| 2025-04-24 | Training Large Language Models to Reason via EM Policy Gradient | Tianbing Xu et.al. | 2504.18587 | null |
| 2025-05-08 | MultiMind: Enhancing Werewolf Agents with Multimodal Reasoning and Theory of Mind | Zheng Zhang et.al. | 2504.18039 | null |
| 2025-05-13 | DeepDistill: Enhancing LLM Reasoning Capabilities via Large-Scale Difficulty-Graded Data Training | Xiaoyu Tian et.al. | 2504.17565 | null |
| 2025-04-25 | Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning | Chris et.al. | 2504.16656 | link |
| 2025-04-27 | Think2SQL: Reinforce LLM Reasoning Capabilities for Text2SQL | Simone Papicchio et.al. | 2504.15077 | null |
| 2025-04-20 | a1: Steep Test-time Scaling Law via Environment Augmented Generation | Lingrui Mei et.al. | 2504.14597 | null |
| 2025-04-20 | CoLoTa: A Dataset for Entity-based Commonsense Reasoning over Long-Tail Knowledge | Armin Toroghi et.al. | 2504.14462 | null |
| 2025-04-19 | Improving RL Exploration for LLM Reasoning through Retrospective Replay | Shihan Dou et.al. | 2504.14363 | null |
| 2025-05-21 | An Empirical Study of LLM Reasoning Ability Under Strict Output Length Constraint | Yi Sun et.al. | 2504.14350 | null |
| 2025-04-22 | SRPO: A Cross-Domain Implementation of Large-Scale Reinforcement Learning on LLM | Xiaojiang Zhang et.al. | 2504.14286 | null |
| 2025-04-19 | CODECRASH: Stress Testing LLM Reasoning under Structural and Semantic Perturbations | Man Ho Lam et.al. | 2504.14119 | null |
| 2025-04-18 | Think Deep, Think Fast: Investigating Efficiency of Verifier-free Inference-time-scaling Methods | Junlin Wang et.al. | 2504.14047 | null |
| 2025-03-26 | 3MDBench: Medical Multimodal Multi-agent Dialogue Benchmark | Ivan Sviridov et.al. | 2504.13861 | link |
| 2025-05-16 | Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? | Yang Yue et.al. | 2504.13837 | null |
| 2025-04-18 | Prejudge-Before-Think: Enhancing Large Language Models at Test-Time by Process Prejudge Reasoning | Jianing Wang et.al. | 2504.13500 | link |
| 2025-04-17 | Can LLMs reason over extended multilingual contexts? Towards long-context evaluation beyond retrieval and haystacks | Amey Hengle et.al. | 2504.12845 | null |
| 2025-05-19 | GraphOmni: A Comprehensive and Extendable Benchmark Framework for Large Language Models on Graph-theoretic Tasks | Hao Xu et.al. | 2504.12764 | link |
| 2025-04-17 | Embodied-R: Collaborative Framework for Activating Embodied Spatial Reasoning in Foundation Models via Reinforcement Learning | Baining Zhao et.al. | 2504.12680 | link |
| 2025-04-17 | VLMGuard-R1: Proactive Safety Alignment for VLMs via Reasoning-Driven Prompt Optimization | Menglan Chen et.al. | 2504.12661 | null |
| 2025-04-24 | GeoSense: Evaluating Identification and Application of Geometric Principles in Multimodal Reasoning | Liangyu Xu et.al. | 2504.12597 | null |
| 2025-04-13 | HM-RAG: Hierarchical Multi-Agent Multimodal Retrieval Augmented Generation | Pei Liu et.al. | 2504.12330 | link |
| 2025-04-16 | d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning | Siyan Zhao et.al. | 2504.12216 | null |
| 2025-04-16 | Could Thinking Multilingually Empower LLM Reasoning? | Changjiang Gao et.al. | 2504.11833 | link |
| 2025-04-15 | A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce | Wei Xiong et.al. | 2504.11343 | link |
| 2025-04-15 | MMC: Iterative Refinement of VLM Reasoning via MCTS-based Multimodal Critique | Shuhang Liu et.al. | 2504.11009 | null |
| 2025-05-14 | CLASH: Evaluating Language Models on Judging High-Stakes Dilemmas from Multiple Perspectives | Ayoung Lee et.al. | 2504.10823 | null |
| 2025-04-14 | Weight-of-Thought Reasoning: Exploring Neural Network Weights for Enhanced LLM Reasoning | Saif Punjwani et.al. | 2504.10646 | link |
| 2025-04-30 | VisualPuzzles: Decoupling Multimodal Reasoning Evaluation from Domain Knowledge | Yueqi Song et.al. | 2504.10342 | null |
| 2025-04-14 | SlowFastVAD: Video Anomaly Detection via Integrating Simple Detector and RAG-Enhanced Vision-Language Model | Zongcan Ding et.al. | 2504.10320 | null |
| 2025-04-14 | PRM-BAS: Enhancing Multimodal Reasoning through PRM-guided Beam Annealing Search | Pengfei Hu et.al. | 2504.10222 | null |
| 2025-04-15 | Breaking the Data Barrier -- Building GUI Agents Through Task Generalization | Junlei Zhang et.al. | 2504.10127 | link |
| 2025-04-14 | CodeRAG: Supportive Code Retrieval on Bigraph for Real-World Code Generation | Jia Li et.al. | 2504.10046 | null |
| 2025-04-13 | Short-Path Prompting in LLMs: Analyzing Reasoning Instability and Solutions for Robust Performance | Zuoli Tang et.al. | 2504.09586 | null |
| 2025-04-13 | Draw with Thought: Unleashing Multimodal Reasoning for Scientific Diagram Generation | Zhiqing Cui et.al. | 2504.09479 | null |
| 2025-04-12 | NoTeS-Bank: Benchmarking Neural Transcription and Search for Scientific Notes Understanding | Aniket Pal et.al. | 2504.09249 | null |
| 2025-04-12 | A Survey of Frontiers in LLM Reasoning: Inference Scaling, Learning to Reason, and Agentic Systems | Zixuan Ke et.al. | 2504.09037 | null |
| 2025-04-11 | Mixed Signals: Decoding VLMs' Reasoning and Underlying Bias in Vision-Language Conflict | Pouya Pezeshkpour et.al. | 2504.08974 | null |
| 2025-05-08 | VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning | Haozhe Wang et.al. | 2504.08837 | null |
| 2025-04-06 | AdaptRec: A Self-Adaptive Framework for Sequential Recommendations with Large Language Models | Tong Zhang et.al. | 2504.08786 | null |
| 2025-04-01 | Accelerating Causal Network Discovery of Alzheimer Disease Biomarkers via Scientific Literature-based Retrieval Augmented Generation | Xiaofan Zhou et.al. | 2504.08768 | null |
| 2025-04-11 | Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning | Fangzhi Xu et.al. | 2504.08672 | link |
| 2025-04-11 | VLMT: Vision-Language Multimodal Transformer for Multimodal Multi-hop Question Answering | Qi Zhi Lim et.al. | 2504.08269 | null |
| 2025-04-15 | Kimi-VL Technical Report | Kimi Team et.al. | 2504.07491 | link |
| 2025-04-02 | DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning | Sara Vera Marjanović et.al. | 2504.07128 | null |
| 2025-04-09 | KG-LLM-Bench: A Scalable Benchmark for Evaluating LLM Reasoning on Textualized Knowledge Graphs | Elan Markowitz et.al. | 2504.07087 | null |
| 2025-04-09 | DeduCE: Deductive Consistency as a Framework to Evaluate LLM Reasoning | Atharva Pandey et.al. | 2504.07080 | null |
| 2025-04-09 | To Backtrack or Not to Backtrack: When Sequential Search Limits Model Reasoning | Tian Qin et.al. | 2504.07052 | null |
| 2025-04-09 | SCI-Reason: A Dataset with Chain-of-Thought Rationales for Complex Multimodal Reasoning in Academic Areas | Chenghao Ma et.al. | 2504.06637 | null |
| 2025-04-08 | FEABench: Evaluating Language Models on Multiphysics Reasoning Ability | Nayantara Mudur et.al. | 2504.06260 | link |
| 2025-04-23 | Right Question is Already Half the Answer: Fully Unsupervised LLM Reasoning Incentivization | Qingyang Zhang et.al. | 2504.05812 | link |
| 2025-04-08 | MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models | Pengfei Zhou et.al. | 2504.05782 | link |
| 2025-04-08 | Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought | Yi Peng et.al. | 2504.05599 | null |
| 2025-04-06 | ZeroED: Hybrid Zero-shot Error Detection through Large Language Model Reasoning | Wei Ni et.al. | 2504.05345 | null |
| 2025-04-07 | Debate Only When Necessary: Adaptive Multiagent Collaboration for Efficient LLM Reasoning | Sugyeong Eo et.al. | 2504.05047 | null |
| 2025-04-07 | LEO-MINI: An Efficient Multimodal Large Language Model using Conditional Token Reduction and Mixture of Multi-Modal Experts | Yimu Wang et.al. | 2504.04653 | null |
| 2025-04-06 | Hierarchical Planning for Complex Tasks with Knowledge Graph-RAG and Symbolic Verification | Cristina Cornelio et.al. | 2504.04578 | null |
| 2025-04-06 | Planning Safety Trajectories with Dual-Phase, Physics-Informed, and Transportation Knowledge-Driven Large Language Models | Rui Gan et.al. | 2504.04562 | link |
| 2025-04-06 | Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning | Xuerui Su et.al. | 2504.04524 | link |
| 2025-04-06 | Geo-OLM: Enabling Sustainable Earth Observation Studies with Cost-Efficient Open Language Models & State-Driven Workflows | Dimitrios Stamoulis et.al. | 2504.04319 | null |
| 2025-04-04 | Language Models Are Implicitly Continuous | Samuele Marro et.al. | 2504.03933 | link |
| 2025-04-04 | Have Large Language Models Learned to Reason? A Characterization via 3-SAT Phase Transition | Rishi Hazra et.al. | 2504.03930 | null |
| 2025-04-07 | MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models | Wulin Xie et.al. | 2504.03641 | null |
| 2025-04-04 | Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning (v1) | Jing Bi et.al. | 2504.03151 | null |
| 2025-04-04 | LightPROF: A Lightweight Reasoning Framework for Large Language Model on Knowledge Graph | Tu Ao et.al. | 2504.03137 | null |
| 2025-04-25 | Generative Evaluation of Complex Reasoning in Large Language Models | Haowei Lin et.al. | 2504.02810 | link |
| 2025-04-10 | Affordable AI Assistants with Knowledge Graph of Thoughts | Maciej Besta et.al. | 2504.02670 | null |
| 2025-04-03 | LexPam: Legal Procedure Awareness-Guided Mathematical Reasoning | Kepu Zhang et.al. | 2504.02590 | null |
| 2025-04-03 | AnesBench: Multi-Dimensional Evaluation of LLM Reasoning in Anesthesiology | Xiang Feng et.al. | 2504.02404 | link |
| 2025-04-02 | A Survey of Scaling in Large Language Model Reasoning | Zihan Chen et.al. | 2504.02181 | null |
| 2025-04-02 | Exploring LLM Reasoning Through Controlled Prompt Variations | Giannis Chatziveroglou et.al. | 2504.02111 | link |
| 2025-04-02 | Advancing AI-Scientist Understanding: Making LLM Think Like a Physicist with Interpretable Reasoning | Yinggan Xu et.al. | 2504.01911 | null |
| 2025-04-02 | TransientTables: Evaluating LLMs' Reasoning on Temporally Evolving Semi-structured Tables | Abhilash Shankarampeta et.al. | 2504.01879 | null |
| 2025-04-02 | Cross-Lingual Consistency: A Novel Inference Framework for Advancing Reasoning in Large Language Models | Zhiwei Yu et.al. | 2504.01857 | null |
| 2025-04-03 | GTR: Graph-Table-RAG for Cross-Table Question Answering | Jiaru Zou et.al. | 2504.01346 | null |
| 2025-04-01 | When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning | Nishad Singhi et.al. | 2504.01005 | null |
| 2025-04-01 | How Difficulty-Aware Staged Reinforcement Learning Enhances LLMs' Reasoning Capabilities: A Preliminary Experimental Study | Yunjie Ji et.al. | 2504.00829 | null |
| 2025-04-02 | FortisAVQA and MAVEN: a Benchmark Dataset and Debiasing Framework for Robust Multimodal Reasoning | Jie Ma et.al. | 2504.00487 | link |
| 2025-04-01 | Agentic Multimodal AI for Hyperpersonalized B2B and B2C Advertising in Competitive Markets: An AI-Driven Competitive Advertising Framework | Sakhinana Sagar Srinivas et.al. | 2504.00338 | null |
| 2025-03-31 | Do Large Language Models Exhibit Spontaneous Rational Deception? | Samuel M. Taylor et.al. | 2504.00285 | null |
| 2025-03-31 | SVLA: A Unified Speech-Vision-Language Assistant with Multimodal Reasoning and Speech Generation | Ngoc Dung Huynh et.al. | 2503.24164 | null |
| 2025-03-31 | Boosting MLLM Reasoning with Text-Debiased Hint-GRPO | Qihan Huang et.al. | 2503.23905 | null |
| 2025-03-31 | WinoWhat: A Parallel Corpus of Paraphrased WinoGrande Sentences with Common Sense Categorization | Ine Gevers et.al. | 2503.23779 | null |
| 2025-03-30 | Evolutionary Prompt Optimization Discovers Emergent Multimodal Reasoning Strategies in Vision-Language Models | Sid Bharthulwar et.al. | 2503.23503 | null |
| 2025-03-29 | The Reasoning-Memorization Interplay in Language Models Is Mediated by a Single Direction | Yihuai Hong et.al. | 2503.23084 | null |
| 2025-04-03 | Cognitive Prompts Using Guilford's Structure of Intellect Model | Oliver Kramer et.al. | 2503.22036 | null |
| 2025-03-27 | SWI: Speaking with Intent in Large Language Models | Yuwei Yin et.al. | 2503.21544 | link |
| 2025-03-27 | Cultivating Game Sense for Yourself: Making VLMs Gaming Experts | Wenxuan Lu et.al. | 2503.21263 | null |
| 2025-03-27 | Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning | Huajie Tan et.al. | 2503.20752 | null |
| 2025-03-26 | Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging | Han Wu et.al. | 2503.20641 | link |
| 2025-03-25 | Gemini Robotics: Bringing AI into the Physical World | Gemini Robotics Team et.al. | 2503.20020 | null |
| 2025-03-25 | VisualQuest: A Diverse Image Dataset for Evaluating Visual Recognition in LLMs | Kelaiti Xiao et.al. | 2503.19936 | null |
| 2025-04-06 | A Multi-Agent Framework Integrating Large Language Models and Generative AI for Accelerated Metamaterial Design | Jie Tian et.al. | 2503.19889 | null |
| 2025-03-25 | Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking | Xiaoyu Tian et.al. | 2503.19855 | null |
| 2025-03-24 | Training-Free Personalization via Retrieval and Reasoning on Fingerprints | Deepayan Das et.al. | 2503.18623 | null |
| 2025-03-23 | Mind with Eyes: from Language Reasoning to Multimodal Reasoning | Zhiyu Lin et.al. | 2503.18071 | null |
| 2025-04-19 | Reason2Attack: Jailbreaking Text-to-Image Models via LLM Reasoning | Chenyu Zhang et.al. | 2503.17987 | null |
| 2025-03-23 | MedPlan: A Two-Stage RAG-Based System for Personalized Medical Plan Generation | Hsin-Ling Hsu et.al. | 2503.17900 | null |
| 2025-03-22 | A Modular Dataset to Demonstrate LLM Abstraction Capability | Adam Atanas et.al. | 2503.17645 | null |
| 2025-03-22 | ConSol: Sequential Probability Ratio Testing to Find Consistent LLM Reasoning Paths Efficiently | Jaeyeon Lee et.al. | 2503.17587 | link |
| 2025-03-21 | LEMMA: Learning from Errors for MatheMatical Advancement in LLMs | Zhuoshi Pan et.al. | 2503.17439 | link |
| 2025-03-21 | V-Seek: Accelerating LLM Reasoning on Open-hardware Server-class RISC-V Platforms | Javier J. Poveda Rodrigo et.al. | 2503.17422 | null |
| 2025-03-21 | Dancing with Critiques: Enhancing LLM Reasoning with Stepwise Natural Language Self-Critique | Yansi Li et.al. | 2503.17363 | null |
| 2025-03-21 | OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement | Yihe Deng et.al. | 2503.17352 | link |
| 2025-03-21 | LLM+MAP: Bimanual Robot Task Planning using Large Language Models and Planning Domain Definition Language | Kun Chu et.al. | 2503.17309 | link |
| 2025-03-21 | Does Chain-of-Thought Reasoning Help Mobile GUI Agent? An Empirical Study | Li Zhang et.al. | 2503.16788 | link |
| 2025-03-20 | Towards Agentic Recommender Systems in the Era of Multimodal Large Language Models | Chengkai Huang et.al. | 2503.16734 | null |
| 2025-03-21 | MKG-Rank: Enhancing Large Language Models with Knowledge Graph for Multilingual Medical Question Answering | Feiyang Li et.al. | 2503.16131 | null |
| 2025-03-20 | Entropy-based Exploration Conduction for Multi-step Reasoning | Jinghan Zhang et.al. | 2503.15848 | null |
| 2025-03-19 | LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning | Federico Cocchi et.al. | 2503.15621 | link |
| 2025-03-19 | EfficientLLaVA: Generalizable Auto-Pruning for Large Vision-language Models | Yinan Liang et.al. | 2503.15369 | null |
| 2025-04-01 | Envisioning an AI-Enhanced Mental Health Ecosystem | Kellie Yu Hui Sim et.al. | 2503.14883 | null |
| 2025-03-19 | Think Like Human Developers: Harnessing Community Knowledge for Structured Code Reasoning | Chengran Yang et.al. | 2503.14838 | null |
| 2025-03-18 | Temporal Consistency for LLM Reasoning Process Error Identification | Jiacheng Guo et.al. | 2503.14495 | link |
| 2025-03-21 | Bridging Social Psychology and LLM Reasoning: Conflict-Aware Meta-Review Generation via Cognitive Alignment | Wei Chen et.al. | 2503.13879 | null |
| 2025-03-18 | Empowering GraphRAG with Knowledge Filtering and Integration | Kai Guo et.al. | 2503.13804 | null |
| 2025-03-15 | Cognitive Activation and Chaotic Dynamics in Large Language Models: A Quasi-Lyapunov Analysis of Reasoning Mechanisms | Xiaojian Li et.al. | 2503.13530 | null |
| 2025-03-14 | RAG-KG-IL: A Multi-Agent Hybrid Framework for Reducing Hallucinations and Enhancing LLM Reasoning through RAG and Incremental Knowledge Graph Learning Integration | Hong Qing Yu et.al. | 2503.13514 | null |
| 2025-03-17 | A Comprehensive Survey on Multi-Agent Cooperative Decision-Making: Scenarios, Approaches, Challenges and Perspectives | Weiqiang Jin et.al. | 2503.13415 | null |
| 2025-03-17 | MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research | James Burgess et.al. | 2503.13399 | link |
| 2025-03-17 | Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning | Hai-Long Sun et.al. | 2503.13360 | null |
| 2025-03-17 | Aligning Vision to Language: Text-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning | Junming Liu et.al. | 2503.12972 | null |
| 2025-03-17 | R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization | Jingyi Zhang et.al. | 2503.12937 | link |
| 2025-03-28 | Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation | Songjun Tu et.al. | 2503.12854 | link |
| 2025-03-18 | DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding | Xinyu Ma et.al. | 2503.12797 | link |
| 2025-03-16 | MPBench: A Comprehensive Multimodal Reasoning Benchmark for Process Errors Identification | Zhaopan Xu et.al. | 2503.12505 | null |
| 2025-03-31 | Will Pre-Training Ever End? A First Step Toward Next-Generation Foundation MLLMs via Self-Improving Systematic Cognition | Xiaoying Zhang et.al. | 2503.12303 | link |
| 2025-03-20 | Applications of Large Language Model Reasoning in Feature Generation | Dharani Chandra et.al. | 2503.11989 | null |
| 2025-03-14 | Neutralizing Bias in LLM Reasoning using Entailment Graphs | Liang Cheng et.al. | 2503.11614 | link |
| 2025-03-14 | VERIFY: A Benchmark of Visual Explanation and Reasoning for Investigating Multimodal Reasoning Fidelity | Jing Bi et.al. | 2503.11557 | null |
| 2025-03-14 | RESPONSE: Benchmarking the Ability of Language Models to Undertake Commonsense Reasoning in Crisis Situation | Aissatou Diallo et.al. | 2503.11348 | null |
| 2025-03-13 | Chat-TS: Enhancing Multi-Modal Reasoning Over Time-Series and Natural Language Data | Paul Quinlan et.al. | 2503.10883 | null |
| 2025-03-18 | R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization | Yi Yang et.al. | 2503.10615 | link |
| 2025-03-15 | VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search | Yiming Jia et.al. | 2503.10582 | null |
| 2025-03-13 | VisualPRM: An Effective Process Reward Model for Multimodal Reasoning | Weiyun Wang et.al. | 2503.10291 | null |
| 2025-03-18 | "Well, Keep Thinking": Enhancing LLM Reasoning with Adaptive Injection Decoding | Hyunbin Jin et.al. | 2503.10167 | null |
| 2025-03-13 | How Do Multimodal Large Language Models Handle Complex Multimodal Reasoning? Placing Them in An Extensible Escape Game | Ziyue Wang et.al. | 2503.10042 | link |
| 2025-04-08 | Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning | Bowen Jin et.al. | 2503.09516 | link |
| 2025-03-12 | MindGYM: Enhancing Vision-Language Models via Synthetic Self-Challenging Questions | Zhe Xu et.al. | 2503.09499 | link |
| 2025-03-12 | A Survey on Enhancing Causal Reasoning Ability of Large Language Models | Xin Li et.al. | 2503.09326 | null |
| 2025-03-11 | Seeing and Reasoning with Confidence: Supercharging Multimodal LLMs with an Uncertainty-Aware Agentic Framework | Zhuo Zhi et.al. | 2503.08308 | null |
| 2025-03-11 | FASIONAD++ : Integrating High-Level Instruction and Information Bottleneck in FAt-Slow fusION Systems for Enhanced Safety in Autonomous Driving with Adaptive Feedback | Kangan Qian et.al. | 2503.08162 | null |
| 2025-03-05 | An Optimization Algorithm for Multimodal Data Alignment | Wei Zhang et.al. | 2503.07636 | null |
| 2025-03-11 | LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL | Yingzhe Peng et.al. | 2503.07536 | null |
| 2025-03-10 | MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning | Fanqing Meng et.al. | 2503.07365 | link |
| 2025-03-10 | Dynamic Path Navigation for Motion Agents with LLM Reasoning | Yubo Zhao et.al. | 2503.07323 | null |
| 2025-03-11 | Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models | Wenxuan Huang et.al. | 2503.06749 | link |
| 2025-03-09 | Graph Retrieval-Augmented LLM for Conversational Recommendation Systems | Zhangchi Qiu et.al. | 2503.06430 | null |
| 2025-03-08 | Can Atomic Step Decomposition Enhance the Self-structured Reasoning of Multimodal Large Models? | Kun Xiang et.al. | 2503.06252 | link |
| 2025-03-15 | Integrating Chain-of-Thought for Multimodal Alignment: A Study on 3D Vision-Language Learning | Yanjun Chen et.al. | 2503.06232 | null |
| 2025-03-08 | KnowLogic: A Benchmark for Commonsense Reasoning via Knowledge-Driven Data Synthesis | Weidong Zhan et.al. | 2503.06218 | link |
| 2025-03-07 | Extracting and Emulsifying Cultural Explanation to Improve Multilingual Capability of LLMs | Hamin Koo et.al. | 2503.05846 | null |
| 2025-03-07 | Memory-augmented Query Reconstruction for LLM-based Knowledge Graph Reasoning | Mufan Xu et.al. | 2503.05193 | null |
| 2025-03-07 | Rewarding Curse: Analyze and Mitigate Reward Modeling Issues for LLM Reasoning | Jiachun Li et.al. | 2503.05188 | null |
| 2025-03-07 | Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching | Simon A. Aytes et.al. | 2503.05179 | link |
| 2025-03-10 | R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model | Hengguang Zhou et.al. | 2503.05132 | link |
| 2025-03-04 | Learning from Failures in Multi-Attempt Reinforcement Learning | Stephen Chung et.al. | 2503.04808 | null |
| 2025-03-15 | Can LLMs Reason About Program Semantics? A Comprehensive Evaluation of LLMs on Formal Specification Inference | Thanh Le-Cong et.al. | 2503.04779 | null |
| 2025-03-06 | Better Process Supervision with Bi-directional Rewarding Signals | Wenxiang Chen et.al. | 2503.04618 | null |
| 2025-04-02 | SOLAR: Scalable Optimization of Large-scale Architecture for Reasoning | Chen Li et.al. | 2503.04530 | null |
| 2025-03-07 | Question-Aware Gaussian Experts for Audio-Visual Question Answering | Hongyeob Kim et.al. | 2503.04459 | link |
| 2025-03-06 | Disparities in LLM Reasoning Accuracy and Explanations: A Case Study on African American English | Runtao Zhou et.al. | 2503.04099 | null |
| 2025-03-06 | ReasonGraph: Visualisation of Reasoning Paths | Zongqian Li et.al. | 2503.03979 | link |
| 2025-03-05 | Process-based Self-Rewarding Language Models | Shimao Zhang et.al. | 2503.03746 | link |
| 2025-03-05 | COSINT-Agent: A Knowledge-Driven Multimodal Agent for Chinese Open Source Intelligence | Wentao Li et.al. | 2503.03215 | null |
| 2025-03-04 | The First Few Tokens Are All You Need: An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reasoning Models | Ke Ji et.al. | 2503.02875 | null |
| 2025-03-04 | Audio-Reasoner: Improving Reasoning Capability in Large Audio Language Models | Zhifei Xie et.al. | 2503.02318 | null |
| 2025-03-04 | LLM-TabFlow: Synthetic Tabular Data Generation with Inter-column Logical Relationship Preservation | Yunbo Long et.al. | 2503.02161 | null |
| 2025-03-03 | CorrA: Leveraging Large Language Models for Dynamic Obstacle Avoidance of Autonomous Vehicles | Shanting Wang et.al. | 2503.02076 | null |
| 2025-03-03 | Graph-Augmented Reasoning: Evolving Step-by-Step Knowledge Graph Retrieval for LLM Reasoning | Wenjie Wu et.al. | 2503.01642 | null |
| 2025-03-03 | Pragmatic Inference Chain (PIC) Improving LLMs' Reasoning of Authentic Implicit Toxic Language | Xi Chen et.al. | 2503.01539 | null |
| 2025-03-03 | CognitiveDrone: A VLA Model and Evaluation Benchmark for Real-Time Cognitive Task Solving and Reasoning in UAVs | Artem Lykov et.al. | 2503.01378 | null |
| 2025-03-06 | SRAG: Structured Retrieval-Augmented Generation for Multi-Entity Question Answering over Wikipedia Graph | Teng Lin et.al. | 2503.01346 | null |
| 2025-03-03 | MINT: Multi-modal Chain of Thought in Unified Generative Models for Enhanced Image Generation | Yi Wang et.al. | 2503.01298 | null |
| 2025-02-28 | Personalized Causal Graph Reasoning for LLMs: A Case Study on Dietary Recommendations | Zhongqi Yang et.al. | 2503.00134 | null |
| 2025-02-28 | Contextualizing biological perturbation experiments through language | Menghua Wu et.al. | 2502.21290 | link |
| 2025-02-28 | Rectifying Belief Space via Unlearning to Harness LLMs' Reasoning | Ayana Niwa et.al. | 2502.20620 | null |
| 2025-02-27 | FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving | Guizhen Chen et.al. | 2502.20238 | link |
| 2025-02-27 | Collaborative Stance Detection via Small-Large Language Model Consistency Verification | Yu Yan et.al. | 2502.19954 | link |
| 2025-02-27 | Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models | Yuan Sui et.al. | 2502.19918 | null |
| 2025-02-27 | Order Doesn't Matter, But Reasoning Does: Training LLMs with Order-Centric Augmentation | Qianxi He et.al. | 2502.19907 | null |
| 2025-03-21 | Towards Multimodal Large-Language Models for Parent-Child Interaction: A Focus on Joint Attention | Weiyan Shi et.al. | 2502.19877 | null |
| 2025-03-05 | Weaker LLMs' Opinions Also Matter: Mixture of Opinions Enhances LLM's Mathematical Reasoning | Yanan Chen et.al. | 2502.19622 | null |
| 2025-02-26 | General Reasoning Requires Learning to Reason from the Get-go | Seungwook Han et.al. | 2502.19402 | null |
| 2025-02-26 | BIG-Bench Extra Hard | Mehran Kazemi et.al. | 2502.19187 | link |
| 2025-02-25 | Scalable Best-of-N Selection for Large Language Models via Self-Certainty | Zhewei Kang et.al. | 2502.18581 | link |
| 2025-02-25 | SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution | Yuxiang Wei et.al. | 2502.18449 | null |
| 2025-02-25 | Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning | Wenkai Yang et.al. | 2502.18080 | null |
| 2025-02-21 | Improving Value-based Process Verifier via Structural Prior Injection | Zetian Sun et.al. | 2502.17498 | null |
| 2025-02-24 | Making LLMs Reason? The Intermediate Language Problem in Neurosymbolic Approaches | Alexander Beiser et.al. | 2502.17216 | null |
| 2025-02-24 | Shakti-VLMs: Scalable Vision-Language Models for Enterprise AI | Syed Abdul Gaffar Shakhadri et.al. | 2502.17092 | null |
| 2025-02-24 | Understanding the Uncertainty of LLM Explanations: A Perspective Based on Reasoning Topology | Longchao Da et.al. | 2502.17026 | null |
| 2025-02-24 | All-in-one: Understanding and Generation in Multimodal Reasoning with the MAIA Benchmark | Davide Testa et.al. | 2502.16989 | null |
| 2025-02-24 | AutoLogi: Automated Generation of Logic Puzzles for Evaluating Reasoning Abilities of Large Language Models | Qin Zhu et.al. | 2502.16906 | link |
| 2025-02-24 | The Blessing of Reasoning: LLM-Based Contrastive Explanations in Black-Box Recommender Systems | Yuyan Wang et.al. | 2502.16759 | null |
| 2025-02-23 | Reasoning about Affordances: Causal and Compositional Reasoning in LLMs | Magnus F. Gjerde et.al. | 2502.16606 | null |
| 2025-02-22 | ThinkBench: Dynamic Out-of-Distribution Evaluation for Robust LLM Reasoning | Shulin Huang et.al. | 2502.16268 | null |
| 2025-02-27 | Dynamic Parallel Tree Search for Efficient LLM Reasoning | Yifu Ding et.al. | 2502.16235 | null |
| 2025-02-22 | Patterns Over Principles: The Fragility of Inductive Reasoning in LLMs under Noisy Observations | Chunyang Li et.al. | 2502.16169 | link |
| 2025-03-04 | Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models | Qianqi Yan et.al. | 2502.16033 | null |
| 2025-02-21 | MutaGReP: Execution-Free Repository-Grounded Plan Search for Code-Use | Zaid Khan et.al. | 2502.15872 | null |
| 2025-02-21 | Do Multilingual LLMs Think In English? | Lisa Schut et.al. | 2502.15603 | null |
| 2025-02-21 | Evaluating Social Biases in LLM Reasoning | Xuyang Wu et.al. | 2502.15361 | null |
| 2025-02-21 | Stepwise Informativeness Search for Improving LLM Reasoning | Siyuan Wang et.al. | 2502.15335 | null |
| 2025-02-21 | Latent Factor Models Meets Instructions: Goal-conditioned Latent Factor Discovery without Task Supervision | Zhouhang Xie et.al. | 2502.15147 | null |
| 2025-02-19 | SIFT: Grounding LLM Reasoning in Contexts via Stickers | Zihao Zeng et.al. | 2502.14922 | link |
| 2025-02-18 | Think Inside the JSON: Reinforcement Strategy for Strict LLM Schema Adherence | Bhavik Agarwal et.al. | 2502.14905 | null |
| 2025-03-04 | Exploring Advanced Techniques for Visual Question Answering: A Comprehensive Comparison | Aiswarya Baby et.al. | 2502.14827 | null |
| 2025-02-20 | Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning | Tian Xie et.al. | 2502.14768 | link |
| 2025-02-19 | Enhancing LLM-Based Recommendations Through Personalized Reasoning | Jiahao Liu et.al. | 2502.13845 | link |
| 2025-02-19 | MCTS-KBQA: Monte Carlo Tree Search for Knowledge Base Question Answering | Guanming Xiong et.al. | 2502.13428 | null |
| 2025-02-19 | MM-Verify: Enhancing Multimodal Reasoning with Chain-of-Thought Verification | Linzhuang Sun et.al. | 2502.13383 | link |
| 2025-02-22 | Grounding LLM Reasoning with Knowledge Graphs | Alfonso Amayuelas et.al. | 2502.13247 | null |
| 2025-02-18 | Theorem Prover as a Judge for Synthetic Data Generation | Joshua Ong Jun Leang et.al. | 2502.13137 | null |
| 2025-02-18 | Flow-of-Options: Diversified and Improved LLM Reasoning by Thinking Through Options | Lakshmi Nair et.al. | 2502.12929 | link |
| 2025-02-18 | S | Ruotian Ma et.al. | 2502.12853 | link |
| 2025-02-18 | CutPaste&Find: Efficient Multimodal Hallucination Detector with Visual-aid Knowledge Base | Cong-Duy Nguyen et.al. | 2502.12591 | null |
| 2025-02-18 | Inference-Time Computations for LLM Reasoning and Planning: A Benchmark and Insights | Shubham Parashar et.al. | 2502.12521 | null |
| 2025-02-18 | HopRAG: Multi-Hop Reasoning for Logic-Aware Retrieval-Augmented Generation | Hao Liu et.al. | 2502.12442 | null |
| 2025-02-17 | Evaluating Step-by-step Reasoning Traces: A Survey | Jinu Lee et.al. | 2502.12289 | null |
| 2025-02-17 | SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs | Yige Xu et.al. | 2502.12134 | link |
| 2025-02-17 | TokenSkip: Controllable Chain-of-Thought Compression in LLMs | Heming Xia et.al. | 2502.12067 | link |
| 2025-02-17 | Hypothesis-Driven Theory-of-Mind Reasoning for Large Language Models | Hyunwoo Kim et.al. | 2502.11881 | null |
| 2025-02-17 | Code-Vision: Evaluating Multimodal LLMs Logic Understanding and Code Generation Capabilities | Hanbin Wang et.al. | 2502.11829 | link |
| 2025-02-17 | Language Models Can See Better: Visual Contrastive Decoding For LLM Multimodal Reasoning | Yuqi Pang et.al. | 2502.11751 | link |
| 2025-02-17 | DeFiScope: Detecting Various DeFi Price Manipulations with LLM Reasoning | Juantao Zhong et.al. | 2502.11521 | null |
| 2025-02-16 | Don't Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls | Ante Wang et.al. | 2502.11183 | link |
| 2025-02-16 | LogiDynamics: Unraveling the Dynamics of Logical Inference in Large Language Model Reasoning | Tianshi Zheng et.al. | 2502.11176 | null |
| 2025-02-15 | A Tutorial on LLM Reasoning: Relevant Methods behind ChatGPT o1 | Jun Wang et.al. | 2502.10867 | null |
| 2025-02-28 | USER-VLM 360: Personalized Vision Language Models with User-aware Tuning for Social Human-Robot Interactions | Hamed Rahimi et.al. | 2502.10636 | null |
| 2025-02-14 | Do Large Language Models Reason Causally Like Us? Even Better? | Hanna M. Dettki et.al. | 2502.10215 | null |
| 2025-02-14 | MathConstruct: Challenging LLM Reasoning with Constructive Proofs | Mislav Balunović et.al. | 2502.10197 | null |
| 2025-02-13 | MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency | Dongzhi Jiang et.al. | 2502.09621 | null |
| 2025-02-14 | EnigmaEval: A Benchmark of Long Multimodal Reasoning Challenges | Clinton J. Wang et.al. | 2502.08859 | null |
| 2025-02-11 | CIRCUIT: A Benchmark for Circuit Interpretation and Reasoning Capabilities of LLMs | Lejla Skelic et.al. | 2502.07980 | null |
| 2025-02-05 | Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment | Cheryl Li et.al. | 2502.07803 | null |
| 2025-02-17 | Bag of Tricks for Inference-time Computation of LLM Reasoning | Fan Liu et.al. | 2502.07191 | link |
| 2025-02-15 | Self-Supervised Prompt Optimization | Jinyu Xiang et.al. | 2502.06855 | link |
| 2025-02-06 | Vision-Integrated LLMs for Autonomous Driving Assistance : Human Performance Comparison and Trust Evaluation | Namhee Kim et.al. | 2502.06843 | null |
| 2025-02-04 | Policy Guided Tree Search for Enhanced LLM Reasoning | Yang Li et.al. | 2502.06813 | null |
| 2025-03-11 | ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates | Ling Yang et.al. | 2502.06772 | link |
| 2025-02-10 | Resurrecting saturated LLM benchmarks with adversarial encoding | Igor Ivanov et.al. | 2502.06738 | null |
| 2025-02-13 | LawGPT: Knowledge-Guided Data Generation and Its Application to Legal LLM | Zhi Zhou et.al. | 2502.06572 | link |
| 2025-02-09 | A Generative Framework for Bidirectional Image-Report Understanding in Chest Radiography | Nicholas Evans et.al. | 2502.05926 | null |
| 2025-02-08 | Evaluating Vision-Language Models for Emotion Recognition | Sree Bhattacharyya et.al. | 2502.05660 | null |
| 2025-02-07 | GSM-Infinite: How Do Your LLMs Behave over Infinitely Increasing Context Length and Reasoning Complexity? | Yang Zhou et.al. | 2502.05252 | link |
| 2025-02-07 | Adaptive Graph of Thoughts: Test-Time Adaptive Reasoning Unifying Chain, Tree, and Graph Structures | Tushar Pandey et.al. | 2502.05078 | link |
| 2025-02-07 | Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research | Junde Wu et.al. | 2502.04644 | link |
| 2025-02-05 | Enhancing Reasoning to Adapt Large Language Models for Domain-Specific Applications | Bo Wen et.al. | 2502.04384 | link |
| 2025-02-05 | Limitations of Large Language Models in Clinical Problem-Solving Arising from Inflexible Reasoning | Jonathan Kim et.al. | 2502.04381 | null |
| 2025-02-04 | Investigating the Robustness of Deductive Reasoning with Large Language Models | Fabian Hoppe et.al. | 2502.04352 | null |
| 2025-02-04 | Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search | Maohao Shen et.al. | 2502.02508 | null |
| 2025-02-04 | CoAT: Chain-of-Associated-Thoughts Framework for Enhancing Large Language Models Reasoning | Jianfeng Pan et.al. | 2502.02390 | null |
| 2025-02-08 | Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking | Jinyang Wu et.al. | 2502.02339 | null |
| 2025-02-04 | Mitigating Object Hallucinations in Large Vision-Language Models via Attention Calibration | Younan Zhu et.al. | 2502.01969 | null |
| 2025-01-31 | Improving Rule-based Reasoning in LLMs via Neurosymbolic Representations | Varun Dhanraj et.al. | 2502.01657 | null |
| 2025-02-03 | Position: Empowering Time Series Reasoning with Multimodal LLMs | Yaxuan Kong et.al. | 2502.01477 | null |
| 2025-02-03 | ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning | Bill Yuchen Lin et.al. | 2502.01100 | null |
| 2025-02-16 | Learning Autonomous Code Integration for Math Language Models | Haozhe Wang et.al. | 2502.00691 | null |
| 2025-02-13 | Bridging Internal Probability and Self-Consistency for Effective and Efficient LLM Reasoning | Zhi Zhou et.al. | 2502.00511 | null |
| 2025-02-14 | Reward-Guided Speculative Decoding for Efficient LLM Reasoning | Baohao Liao et.al. | 2501.19324 | null |
| 2025-01-31 | Efficient Reasoning with Hidden Thinking | Xuan Shen et.al. | 2501.19201 | link |
| 2025-01-31 | BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning | Han Zhong et.al. | 2501.18858 | null |
| 2025-01-28 | A Stochastic Dynamical Theory of LLM Self-Adversariality: Modeling Severity Drift as a Critical Process | Jack David Carson et.al. | 2501.16783 | null |
| 2025-01-27 | Explaining GitHub Actions Failures with Large Language Models: Challenges, Insights, and Limitations | Pablo Valenzuela-Toledo et.al. | 2501.16495 | null |
| 2025-01-27 | Large Models in Dialogue for Active Perception and Anomaly Detection | Tzoulio Chamiti et.al. | 2501.16300 | link |
| 2025-01-26 | TensorLLM: Tensorising Multi-Head Attention for Enhanced Reasoning and Compression in LLMs | Yuxuan Gu et.al. | 2501.15674 | link |
| 2025-01-28 | Rethinking External Slow-Thinking: From Snowball Errors to Probability of Correct Reasoning | Zeyu Gan et.al. | 2501.15602 | link |
| 2025-01-26 | Error Classification of Large Language Models on Math Word Problems: A Dynamically Adaptive Framework | Yuhong Sun et.al. | 2501.15581 | null |
| 2025-02-15 | Option-ID Based Elimination For Multiple Choice Questions | Zhenhao Zhu et.al. | 2501.15175 | link |
| 2025-01-24 | Domaino1s: Guiding LLM Reasoning for Explainable Answers in High-Stakes Domains | Xu Chu et.al. | 2501.14431 | null |
| 2025-02-12 | GraphSOS: Graph Sampling and Order Selection to Help LLMs Understand Graphs Better | Xu Chu et.al. | 2501.14427 | null |
| 2025-01-23 | Pseudocode-Injection Magic: Enabling LLMs to Tackle Graph Computational Tasks | Chang Gong et.al. | 2501.13731 | null |
| 2025-02-10 | Cognitive Paradigms for Evaluating VLMs on Visual Reasoning Task | Mohit Vaishnav et.al. | 2501.13620 | null |
| 2025-01-22 | EvidenceMap: Unleashing the Power of Small Language Models with Evidence Analysis for Biomedical Question Answering | Chang Zong et.al. | 2501.12746 | null |
| 2025-01-17 | LLM Reasoner and Automated Planner: A new NPC approach | Israel Puerta-Merino et.al. | 2501.10106 | null |
| 2025-01-22 | FRAG: A Flexible Modular Framework for Retrieval-Augmented Generation based on Knowledge Graphs | Zengyi Gao et.al. | 2501.09957 | null |
| 2025-01-17 | Evolving Deeper LLM Thinking | Kuang-Huei Lee et.al. | 2501.09891 | null |
| 2025-01-23 | Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models | Fengli Xu et.al. | 2501.09686 | null |
| 2025-01-15 | Multimodal LLMs Can Reason about Aesthetics in Zero-Shot | Ruixiang Jiang et.al. | 2501.09012 | link |
| 2025-02-10 | Ensemble of Large Language Models for Curated Labeling and Rating of Free-text Data | Jiaxing Qiu et.al. | 2501.08413 | link |
| 2025-01-14 | Reasoning with Graphs: Structuring Implicit Knowledge to Enhance LLMs Reasoning | Haoyu Han et.al. | 2501.07845 | null |
| 2025-01-09 | Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark | Yunzhuo Hao et.al. | 2501.05444 | link |
| 2025-01-08 | Enhancing Financial VQA in Vision Language Models using Intermediate Structured Representations | Archita Srivastava et.al. | 2501.04675 | null |
| 2025-01-08 | DRIVINGVQA: Analyzing Visual Chain-of-Thought Reasoning of Vision Language Models in Real-World Scenarios with Driving Theory Tests | Charles Corbière et.al. | 2501.04671 | null |
| 2025-01-08 | Understanding Before Reasoning: Enhancing Chain-of-Thought with Iterative Summarization Pre-Prompting | Dong-Hai Zhu et.al. | 2501.04341 | link |
| 2025-01-07 | Reasoning-Enhanced Self-Training for Long-Form Personalized Text Generation | Alireza Salemi et.al. | 2501.04167 | null |
| 2025-01-07 | Socratic Questioning: Learn to Self-guide Multimodal Reasoning in the Wild | Wanpeng Hu et.al. | 2501.02964 | link |
| 2025-01-06 | KG-CF: Knowledge Graph Completion with Context Filtering under the Guidance of Large Language Models | Zaiyi Zheng et.al. | 2501.02711 | null |
| 2025-01-04 | Table as Thought: Exploring Structured Thoughts in LLM Reasoning | Zhenjie Sun et.al. | 2501.02152 | null |
| 2025-01-03 | Recursive Decomposition of Logical Thoughts: Framework for Superior Reasoning and Knowledge Propagation in Large Language Models | Kaleem Ullah Qasim et.al. | 2501.02026 | null |
| 2025-01-02 | Enhancing Reasoning through Process Supervision with Monte Carlo Tree Search | Shuangtao Li et.al. | 2501.01478 | null |
| 2025-01-02 | HetGCoT-Rec: Heterogeneous Graph-Enhanced Chain-of-Thought LLM Reasoning for Journal Recommendation | Runsong Jia et.al. | 2501.01203 | null |
| 2025-01-03 | Enhancing LLM Reasoning with Multi-Path Collaborative Reactive and Reflection agents | Chengbo He et.al. | 2501.00430 | null |
| 2024-12-31 | EQUATOR: A Deterministic Framework for Evaluating LLM Reasoning with Open-Ended Questions. # v1.0.0-beta | Raymond Bernard et.al. | 2501.00257 | null |
| 2024-12-30 | Efficiently Serving LLM Reasoning Programs with Certaindex | Yichao Fu et.al. | 2412.20993 | null |
| 2024-12-28 | LLM Reasoning Engine: Specialized Training for Enhanced Mathematical Reasoning | Shuguang Chen et.al. | 2412.20227 | null |
| 2025-02-17 | Token-Budget-Aware LLM Reasoning | Tingxu Han et.al. | 2412.18547 | link |
| 2024-12-23 | StructTest: Benchmarking LLMs' Reasoning through Compositional Structured Outputs | Hailin Chen et.al. | 2412.18011 | null |
| 2025-02-09 | Evaluating LLM Reasoning in the Operations Research Domain with ORQA | Mahdi Mostajabdaveh et.al. | 2412.17874 | link |
| 2024-12-23 | Diving into Self-Evolving Training for Multimodal Reasoning | Wei Liu et.al. | 2412.17451 | null |
| 2024-12-21 | SilVar: Speech Driven Multimodal Model for Reasoning Visual Question Answering and Object Localization | Tan-Hanh Pham et.al. | 2412.16771 | null |
| 2024-12-20 | PruneVid: Visual Token Pruning for Efficient Video Large Language Models | Xiaohu Huang et.al. | 2412.16117 | link |
| 2024-12-19 | Eliciting Causal Abilities in Large Language Models for Reasoning Tasks | Yajing Wang et.al. | 2412.15314 | link |
| 2024-12-19 | Critical-Questions-of-Thought: Steering LLM reasoning with Argumentative Querying | Federico Castagna et.al. | 2412.15177 | link |
| 2024-12-19 | Progressive Multimodal Reasoning via Active Retrieval | Guanting Dong et.al. | 2412.14835 | null |
| 2024-12-19 | FiVL: A Framework for Improved Vision-Language Alignment | Estelle Aflalo et.al. | 2412.14672 | null |
| 2024-12-19 | FaultExplainer: Leveraging Large Language Models for Interpretable Fault Detection and Diagnosis | Abdullah Khan et.al. | 2412.14492 | link |
| 2024-12-18 | Cognition Chain for Explainable Psychological Stress Detection on Social Media | Xin Wang et.al. | 2412.14009 | link |
| 2024-12-27 | Cracking the Code of Hallucination in LVLMs with Vision-aware Head Divergence | Jinghan He et.al. | 2412.13949 | null |
| 2025-02-16 | Do Language Models Understand Time? | Xi Ding et.al. | 2412.13845 | link |
| 2024-12-18 | Beyond Outcomes: Transparent Assessment of LLM Reasoning in Games | Wenye Lin et.al. | 2412.13602 | link |
| 2024-12-17 | ClarityEthic: Explainable Moral Judgment Utilizing Contrastive Ethical Insights from Large Language Models | Yuxi Sun et.al. | 2412.12848 | null |
| 2024-12-12 | A NotSo Simple Way to Beat Simple Bench | Soham Sane et.al. | 2412.12173 | null |
| 2024-12-11 | What Makes In-context Learning Effective for Mathematical Reasoning: A Theoretical Analysis | Jiayu Liu et.al. | 2412.12157 | null |
| 2025-02-18 | A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges | Yibo Yan et.al. | 2412.11936 | null |
| 2024-12-24 | Stepwise Reasoning Error Disruption Attack of LLMs | Jingyu Peng et.al. | 2412.11934 | null |
| 2024-12-16 | Leveraging Retrieval-Augmented Tags for Large Vision-Language Understanding in Complex Scenes | Antonio Carlos Rivera et.al. | 2412.11396 | null |
| 2024-12-15 | SceneLLM: Implicit Language Reasoning in LLM for Dynamic Scene Graph Generation | Hang Zhang et.al. | 2412.11026 | null |
| 2024-12-15 | Entropy-Regularized Process Reward Model | Hanning Zhang et.al. | 2412.11006 | link |
| 2024-12-14 | Optimizing Vision-Language Interactions Through Decoder-Only Models | Kaito Tanaka et.al. | 2412.10758 | null |
| 2024-12-14 | Chasing Progress, Not Perfection: Revisiting Strategies for End-to-End LLM Plan Generation | Sukai Huang et.al. | 2412.10675 | null |
| 2024-12-14 | Thinking with Knowledge Graphs: Enhancing LLM Reasoning Through Structured Data | Xue Wu et.al. | 2412.10654 | null |
| 2024-12-13 | EVLM: Self-Reflective Multimodal Reasoning for Cross-Dimensional Visual Editing | Umar Khalid et.al. | 2412.10566 | null |
| 2024-12-13 | Atomic Learning Objectives Labeling: A High-Resolution Approach for Physics Education | Naiming Liu et.al. | 2412.09914 | null |
| 2025-01-18 | Neptune: The Long Orbit to Benchmarking Long Video Understanding | Arsha Nagrani et.al. | 2412.09582 | link |
| 2025-02-14 | Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning | Zhenni Bi et.al. | 2412.09078 | link |
| 2024-12-11 | Training Large Language Models to Reason in a Continuous Latent Space | Shibo Hao et.al. | 2412.06769 | link |
| 2025-01-23 | GameArena: Evaluating LLM Reasoning through Live Computer Games | Lanxiang Hu et.al. | 2412.06394 | null |
| 2024-12-08 | Language hooks: a modular framework for augmenting LLM reasoning that decouples tool usage from the model and its prompt | Damien de Mijolla et.al. | 2412.05967 | null |
| 2024-12-06 | MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale | Jarvis Guo et.al. | 2412.05237 | null |
| 2024-12-05 | Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction | Yiheng Xu et.al. | 2412.04454 | null |
| 2024-12-05 | SocialMind: LLM-based Proactive AR Social Assistive System with Human-like Perception for In-situ Live Interactions | Bufang Yang et.al. | 2412.04036 | null |
| 2024-12-04 | DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation | Qingdong He et.al. | 2412.03255 | null |
| 2024-12-03 | Explainable CTR Prediction via LLM Reasoning | Xiaohan Yu et.al. | 2412.02588 | null |
| 2025-02-12 | NYT-Connections: A Deceptively Simple Text Classification Task that Stumps System-1 Thinkers | Angel Yahir Loredo Lopez et.al. | 2412.01621 | null |
| 2025-01-13 | Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability | Zicheng Lin et.al. | 2411.19943 | link |
| 2024-11-29 | TQA-Bench: Evaluating LLMs for Multi-Table Question Answering with Scalable Context and Symbolic Extension | Zipeng Qiu et.al. | 2411.19504 | link |
| 2024-11-29 | COLD: Causal reasOning in cLosed Daily activities | Abhinav Joshi et.al. | 2411.19500 | link |
| 2024-12-16 | Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning | Di Zhang et.al. | 2411.18203 | null |
| 2024-11-26 | NEMO: Can Multimodal LLMs Identify Attribute-Modified Objects? | Jiaxuan Li et.al. | 2411.17794 | null |
| 2024-11-25 | Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision | Zhiheng Xi et.al. | 2411.16579 | null |
| 2024-11-22 | On the Impact of Fine-Tuning on Chain-of-Thought Reasoning | Elita Lobo et.al. | 2411.15382 | null |
| 2024-11-21 | Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models | Yuhao Dong et.al. | 2411.14432 | link |
| 2024-11-20 | BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games | Davide Paglieri et.al. | 2411.13543 | null |
| 2024-11-20 | Hints of Prompt: Enhancing Visual Representation for Multimodal LLMs in Autonomous Driving | Hao Zhou et.al. | 2411.13076 | null |
| 2024-11-15 | Thinking Before Looking: Improving Multimodal LLM Reasoning via Mitigating Visual Hallucination | Haojie Zheng et.al. | 2411.12591 | link |
| 2024-12-23 | Enhancing Reasoning Capabilities of LLMs via Principled Synthetic Logic Corpus | Terufumi Morishita et.al. | 2411.12498 | link |
| 2024-11-18 | Semantic-Geometric-Physical-Driven Robot Manipulation Skill Transfer via Skill Library and Tactile Representation | Mingchao Qi et.al. | 2411.11714 | link |
| 2024-12-31 | Enhancing LLM Reasoning with Reward-guided Tree Search | Jinhao Jiang et.al. | 2411.11694 | null |
| 2024-12-15 | A dataset of questions on decision-theoretic reasoning in Newcomb-like problems | Caspar Oesterheld et.al. | 2411.10588 | link |
| 2024-11-15 | Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization | Weiyun Wang et.al. | 2411.10442 | null |
| 2025-01-09 | LLaVA-CoT: Let Vision Language Models Reason Step-by-Step | Guowei Xu et.al. | 2411.10440 | link |
| 2024-11-15 | Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level | Andong Deng et.al. | 2411.09921 | null |
| 2024-11-14 | Comprehensive and Practical Evaluation of Retrieval-Augmented Generation Systems for Medical Question Answering | Nghia Trung Ngo et.al. | 2411.09213 | null |
| 2024-11-13 | Tree-of-Table: Unleashing the Power of LLMs for Enhanced Large-Scale Table Understanding | Deyi Ji et.al. | 2411.08516 | null |
| 2024-11-18 | What Do Learning Dynamics Reveal About Generalization in LLM Reasoning? | Katie Kang et.al. | 2411.07681 | link |
| 2024-11-27 | Self-Training Meets Consistency: Improving LLMs' Reasoning With Consistency-Driven Rationale Evaluation | Jaehyeok Lee et.al. | 2411.06387 | link |
| 2024-11-09 | A Picture is Worth A Thousand Numbers: Enabling LLMs Reason about Time Series via Visualization | Haoxin Liu et.al. | 2411.06018 | null |
| 2024-11-11 | LLMs as Method Actors: A Model for Prompt Engineering and Architecture | Colin Doyle et.al. | 2411.05778 | link |
| 2024-11-12 | Kwai-STaR: Transform LLMs into State-Transition Reasoners | Xingyu Lu et.al. | 2411.04799 | null |
| 2024-11-21 | Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding | Haolin Chen et.al. | 2411.04282 | link |
| 2024-11-05 | CrowdGenUI: Enhancing LLM-Based UI Widget Generation with a Crowdsourced Preference Library | Yimeng Liu et.al. | 2411.03477 | null |
| 2025-01-27 | MetRex: A Benchmark for Verilog Code Metric Reasoning Using LLMs | Manar Abdelatty et.al. | 2411.03471 | link |
| 2024-11-04 | RuAG: Learned-rule-augmented Generation for Large Language Models | Yudi Zhang et.al. | 2411.03349 | null |
| 2024-10-30 | Vision-Language Models Can Self-Improve Reasoning via Reflection | Kanzhi Cheng et.al. | 2411.00855 | null |
| 2024-11-01 | Mitigating Tail Narrowing in LLM Self-Improvement via Socratic-Guided Sampling | Yiwen Ding et.al. | 2411.00750 | link |
| 2024-11-01 | STEM-POM: Evaluating Language Models Math-Symbol Reasoning in Document Parsing | Jiaru Zou et.al. | 2411.00387 | null |
| 2024-11-08 | GRS-QA -- Graph Reasoning-Structured Question Answering Dataset | Anish Pahilajani et.al. | 2411.00369 | null |
| 2024-10-31 | Thought Space Explorer: Navigating and Expanding Thought Space for Large Language Model Reasoning | Jinghan Zhang et.al. | 2410.24155 | null |
| 2024-10-31 | RL-STaR: Theoretical Analysis of Reinforcement Learning Frameworks for Self-Taught Reasoner | Fu-Chieh Chang et.al. | 2410.23912 | null |
| 2024-10-31 | OCEAN: Offline Chain-of-thought Evaluation and Alignment in Large Language Models | Junda Wu et.al. | 2410.23703 | null |
| 2024-10-30 | ReasoningRec: Bridging Personalized Recommendations and Human-Interpretable Explanations through LLM Reasoning | Millennium Bismay et.al. | 2410.23180 | link |
| 2024-10-30 | On Memorization of Large Language Models in Logical Reasoning | Chulin Xie et.al. | 2410.23123 | null |
| 2024-10-28 | Causal Interventions on Causal Paths: Mapping GPT-2's Reasoning From Syntax to Semantics | Isabelle Lee et.al. | 2410.21353 | null |
| 2024-10-28 | Guide-LLM: An Embodied LLM Agent and Text-Based Topological Map for Robotic Guidance of People with Visual Impairments | Sangmim Song et.al. | 2410.20666 | null |
| 2024-10-25 | Cooperative Strategic Planning Enhances Reasoning Capabilities in Large Language Models | Danqing Wang et.al. | 2410.20007 | null |
| 2024-10-25 | Can Stories Help LLMs Reason? Curating Information Space Through Narrative | Vahid Sadiri Javadi et.al. | 2410.19221 | null |
| 2024-10-18 | Make LLMs better zero-shot reasoners: Structure-orientated autonomous reasoning | Pengfei He et.al. | 2410.19000 | link |
| 2024-10-25 | CLR-Bench: Evaluating Large Language Models in College-level Reasoning | Junnan Dong et.al. | 2410.17558 | null |
| 2024-10-28 | Non-myopic Generation of Language Models for Reasoning and Planning | Chang Ma et.al. | 2410.17195 | link |
| 2024-11-06 | Improving Causal Reasoning in Large Language Models: A Survey | Longxuan Yu et.al. | 2410.16676 | link |
| 2024-10-22 | A Statistical Analysis of LLMs' Self-Evaluation Using Proverbs | Ryosuke Sonoda et.al. | 2410.16640 | null |
| 2024-10-21 | Rulebreakers Challenge: Revealing a Blind Spot in Large Language Models' Reasoning with Formal Logic | Jason Chan et.al. | 2410.16502 | null |
| 2024-11-27 | On Designing Effective RL Reward at Training Time for LLM Reasoning | Jiaxuan Gao et.al. | 2410.15115 | null |
| 2025-01-28 | Paths-over-Graph: Knowledge Graph Empowered Large Language Model Reasoning | Xingyu Tan et.al. | 2410.14211 | null |
| 2024-10-21 | Unconstrained Model Merging for Enhanced LLM Reasoning | Yiming Zhang et.al. | 2410.13699 | null |
| 2024-10-16 | Graph-constrained Reasoning: Faithful Reasoning on Knowledge Graphs with Large Language Models | Linhao Luo et.al. | 2410.13080 | link |
| 2024-10-16 | KcMF: A Knowledge-compliant Framework for Schema and Entity Matching with Fine-tuning-free LLMs | Yongqin Xu et.al. | 2410.12480 | null |
| 2024-10-17 | Enhancing LLM Trading Performance with Fact-Subjectivity Aware Reasoning | Qian Wang et.al. | 2410.12464 | link |
| 2024-10-16 | Reversal of Thought: Enhancing Large Language Models with Preference-Guided Reverse Reasoning Warm-up | Jiahao Yuan et.al. | 2410.12323 | link |
| 2024-10-16 | Exploiting LLMs' Reasoning Capability to Infer Implicit Concepts in Legal Information Retrieval | Hai-Long Nguyen et.al. | 2410.12154 | null |
| 2024-10-15 | Planning Anything with Rigor: General-Purpose Zero-Shot Planning with LLM-based Formalized Programming | Yilun Hao et.al. | 2410.12112 | null |
| 2024-10-12 | OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models | Jun Wang et.al. | 2410.09671 | null |
| 2024-10-11 | P-FOLIO: Evaluating and Improving Logical Reasoning with Abundant Human-Written Reasoning Chains | Simeng Han et.al. | 2410.09207 | null |
| 2024-10-11 | Aerial Vision-and-Language Navigation via Semantic-Topo-Metric Representation Guided LLM Reasoning | Yunpeng Gao et.al. | 2410.08500 | null |
| 2024-10-10 | SG-Nav: Online 3D Scene Graph Prompting for LLM-based Zero-shot Object Navigation | Hang Yin et.al. | 2410.08189 | null |
| 2024-10-10 | Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning | Amrith Setlur et.al. | 2410.08146 | null |
| 2024-10-10 | Automatic Curriculum Expert Iteration for Reliable LLM Reasoning | Zirui Zhao et.al. | 2410.07627 | link |
| 2024-10-09 | Boosting Few-Shot Detection with Large Language Models and Layout-to-Image Synthesis | Ahmed Abdullah et.al. | 2410.06841 | null |
| 2024-10-09 | Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning | Xiyao Wang et.al. | 2410.06508 | null |
| 2025-01-02 | Filtering Discomforting Recommendations with Large Language Models | Jiahao Liu et.al. | 2410.05411 | null |
| 2024-10-05 | Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification | Zhenwen Liang et.al. | 2410.05318 | null |
| 2024-10-06 | Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval | Pengcheng Jiang et.al. | 2410.04585 | link |
| 2024-10-03 | The Role of Deductive and Inductive Reasoning in Large Language Models | Chengkun Cai et.al. | 2410.02892 | null |
| 2024-10-02 | Not All LLM Reasoners Are Created Equal | Arian Hosseini et.al. | 2410.01748 | null |
| 2024-12-25 | Interpretable Contrastive Monte Carlo Tree Search Reasoning | Zitian Gao et.al. | 2410.01707 | link |
| 2024-10-02 | VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment | Amirhossein Kazemnejad et.al. | 2410.01679 | link |
| 2024-10-02 | AHP-Powered LLM Reasoning for Multi-Criteria Evaluation of Open-Ended Responses | Xiaotian Lu et.al. | 2410.01246 | null |
| 2024-10-01 | Self-controller: Controlling LLMs with Multi-round Step-by-step Self-awareness | Xiao Peng et.al. | 2410.00359 | null |
| 2024-10-01 | Insight: A Multi-Modal Diagnostic Pipeline using LLMs for Ocular Surface Disease Diagnosis | Chun-Hsiao Yeh et.al. | 2410.00292 | null |
| 2024-10-08 | GUNDAM: Aligning Large Language Models with Graph Understanding | Sheng Ouyang et.al. | 2409.20053 | null |
| 2024-09-27 | Open-Nav: Exploring Zero-Shot Vision-and-Language Navigation in Continuous Environment with Open-Source LLMs | Yanyuan Qiao et.al. | 2409.18794 | null |
| 2024-10-23 | Proof of Thought : Neurosymbolic Program Synthesis allows Robust and Interpretable Reasoning | Debargha Ganguly et.al. | 2409.17270 | null |
| 2024-09-20 | CSCE: Boosting LLM Reasoning by Simultaneous Enhancing of Casual Significance and Consistency | Kangsheng Wang et.al. | 2409.17174 | null |
| 2024-09-20 | Mufu: Multilingual Fused Learning for Low-Resource Translation with LLM | Zheng Wei Lim et.al. | 2409.13949 | null |
| 2024-09-19 | SituationAdapt: Contextual UI Optimization in Mixed Reality with Situation Awareness via LLM Reasoning | Zhipeng Li et.al. | 2409.12836 | null |
| 2024-10-04 | Unlocking Reasoning Potential in Large Language Models by Scaling Code-form Planning | Jiaxin Wen et.al. | 2409.12452 | link |
| 2024-12-16 | Enhancing Logical Reasoning in Large Language Models through Graph-based Synthetic Data | Jiaming Zhou et.al. | 2409.12437 | link |
| 2024-09-18 | MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning | Justin Chih-Yao Chen et.al. | 2409.12147 | link |
| 2024-11-05 | Improving LLM Reasoning with Multi-Agent Tree-of-Thought Validator Agent | Fatemeh Haji et.al. | 2409.11527 | link |
| 2024-09-16 | Enhancing RL Safety with Counterfactual LLM Reasoning | Dennis Gross et.al. | 2409.10188 | link |
| 2024-09-11 | Think Together and Work Better: Combining Humans' and LLMs' Think-Aloud Outcomes for Effective Text Evaluation | SeongYeub Chu et.al. | 2409.07355 | link |

| Publish Date | Title | Authors | Code | |
|---|---|---|---|---|
| 2025-07-22 | Mind the Gap: Evaluating the Representativeness of Quantitative Medical Language Reasoning LLM Benchmarks for African Disease Burdens | Fred Mutisya et.al. | 2507.16322 | null |
| 2025-07-18 | Document Haystack: A Long Context Multimodal Image/Document Understanding Vision LLM Benchmark | Goeric Huybrechts et.al. | 2507.15882 | null |
| 2025-07-21 | Left Leaning Models: AI Assumptions on Economic Policy | Maxim Chupilkin et.al. | 2507.15771 | null |
| 2025-07-21 | From Queries to Criteria: Understanding How Astronomers Evaluate LLMs | Alina Hyk et.al. | 2507.15715 | null |
| 2025-07-21 | Evaluating Text Style Transfer: A Nine-Language Benchmark for Text Detoxification | Vitaly Protasov et.al. | 2507.15557 | null |
| 2025-07-15 | LLM-based ambiguity detection in natural language instructions for collaborative surgical robots | Ana Davila et.al. | 2507.11525 | null |
| 2025-07-15 | DCR: Quantifying Data Contamination in LLMs Evaluation | Cheng Xu et.al. | 2507.11405 | null |
| 2025-07-17 | SWE-MERA: A Dynamic Benchmark for Agenticly Evaluating Large Language Models on Software Engineering Tasks | Pavel Adamenko et.al. | 2507.11059 | null |
| 2025-07-11 | OpenCodeReasoning-II: A Simple Test Time Scaling Approach via Self-Critique | Wasi Uddin Ahmad et.al. | 2507.09075 | null |
| 2025-07-18 | From KMMLU-Redux to KMMLU-Pro: A Professional Korean Benchmark Suite for LLM Evaluation | Seokhee Hong et.al. | 2507.08924 | null |
| 2025-07-11 | A Third Paradigm for LLM Evaluation: Dialogue Game-Based Evaluation using clembench | David Schlangen et.al. | 2507.08491 | null |
| 2025-07-07 | Train-before-Test Harmonizes Language Model Rankings | Guanhua Zhang et.al. | 2507.05195 | null |
| 2025-07-13 | SymbolicThought: Integrating Language Models and Symbolic Reasoning for Consistent and Interpretable Human Relationship Understanding | Runcong Zhao et.al. | 2507.04189 | null |
| 2025-07-09 | Skewed Score: A statistical framework to assess autograders | Magda Dubois et.al. | 2507.03772 | null |
| 2025-07-12 | Eka-Eval : A Comprehensive Evaluation Framework for Large Language Models in Indian Languages | Samridhi Raj Sinha et.al. | 2507.01853 | null |
| 2025-07-01 | Pitfalls of Evaluating Language Models with Open Benchmarks | Md. Najib Hasan et.al. | 2507.00460 | null |
| 2025-06-30 | AutoEvoEval: An Automated Framework for Evolving Close-Ended LLM Evaluation Data | JiaRu Wu et.al. | 2506.23735 | null |
| 2025-06-27 | WildSpeech-Bench: Benchmarking Audio LLMs in Natural Speech Conversation | Jian Zhang et.al. | 2506.21875 | null |
| 2025-06-25 | DuoGPT: Training-free Dual Sparsity through Activation-aware Pruning in LLMs | Ruokai Yin et.al. | 2506.20194 | null |
| 2025-06-23 | Smart-LLaMA-DPO: Reinforced Large Language Model for Explainable Smart Contract Vulnerability Detection | Lei Yu et.al. | 2506.18245 | null |
| 2025-06-22 | The Democratic Paradox in Large Language Models' Underestimation of Press Freedom | I. Loaiza et.al. | 2506.18045 | null |
| 2025-06-21 | CodeMorph: Mitigating Data Leakage in Large Language Model Assessment | Hongzhou Rao et.al. | 2506.17627 | null |
| 2025-06-20 | Re-Evaluating Code LLM Benchmarks Under Semantic Mutation | Zhiyuan Pan et.al. | 2506.17369 | null |
| 2025-06-19 | LMR-BENCH: Evaluating LLM Agent's Ability on Reproducing Language Modeling Research | Shuo Yan et.al. | 2506.17335 | null |
| 2025-06-20 | Do We Need Large VLMs for Spotting Soccer Actions? | Ritabrata Chakraborty et.al. | 2506.17144 | null |
| 2025-06-17 | SFT-GO: Supervised Fine-Tuning with Group Optimization for Large Language Models | Gyuhak Kim et.al. | 2506.15021 | null |
| 2025-06-19 | MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation | Xueqing Peng et.al. | 2506.14028 | null |
| 2025-06-18 | The NordDRG AI Benchmark for Large Language Models | Tapio Pitkäranta et.al. | 2506.13790 | link |
| 2025-06-20 | Domain Specific Benchmarks for Evaluating Multimodal Large Language Models | Khizar Anjum et.al. | 2506.12958 | null |
| 2025-06-06 | The Scales of Justitia: A Comprehensive Survey on Safety Evaluation of LLMs | Songyang Liu et.al. | 2506.11094 | null |
| 2025-05-22 | NSW-EPNews: A News-Augmented Benchmark for Electricity Price Forecasting with LLMs | Zhaoge Bi et.al. | 2506.11050 | null |
| 2025-04-23 | Impact of Comments on LLM Comprehension of Legacy Code | Rock Sabetto et.al. | 2506.11007 | null |
| 2025-06-12 | LLM-Driven Personalized Answer Generation and Evaluation | Mohammadreza Molavi et.al. | 2506.10829 | null |
| 2025-06-11 | Textual Bayes: Quantifying Uncertainty in LLM-Based Systems | Brendan Leigh Ross et.al. | 2506.10060 | null |
| 2025-06-16 | Metritocracy: Representative Metrics for Lite Benchmarks | Ariel Procaccia et.al. | 2506.09813 | null |
| 2025-06-10 | Breaking the ICE: Exploring promises and challenges of benchmarks for Inference Carbon & Energy estimation for LLMs | Samarth Sikand et.al. | 2506.08727 | null |
| 2025-06-10 | Sample Efficient Demonstration Selection for In-Context Learning | Kiran Purohit et.al. | 2506.08607 | link |
| 2025-06-09 | How Benchmark Prediction from Fewer Data Misses the Mark | Guanhua Zhang et.al. | 2506.07673 | link |
| 2025-06-09 | Beyond Benchmarks: A Novel Framework for Domain-Specific LLM Evaluation and Knowledge Mapping | Nitin Sharma et.al. | 2506.07658 | null |
| 2025-06-09 | Vuyko Mistral: Adapting LLMs for Low-Resource Dialectal Translation | Roman Kyslyi et.al. | 2506.07617 | null |
| 2025-06-05 | LLM-First Search: Self-Guided Exploration of the Solution Space | Nathan Herr et.al. | 2506.05213 | link |
| 2025-06-05 | Debatable Intelligence: Benchmarking LLM Judges via Debate Speech Evaluation | Noy Sternlicht et.al. | 2506.05062 | link |
| 2025-06-04 | BEAR: BGP Event Analysis and Reporting | Hanqing Li et.al. | 2506.04514 | link |
| 2025-06-04 | N | Caleb Chin et.al. | 2506.04166 | link |
| 2025-06-04 | Establishing Trustworthy LLM Evaluation via Shortcut Neuron Analysis | Kejian Zhu et.al. | 2506.04142 | null |
| 2025-06-03 | NetPress: Dynamically Generated LLM Benchmarks for Network Applications | Yajie Zhou et.al. | 2506.03231 | link |
| 2025-06-04 | PC-MoE: Memory-Efficient and Privacy-Preserving Collaborative Training for Mixture-of-Experts LLMs | Ze Yu Zhang et.al. | 2506.02965 | null |
| 2025-06-02 | Multilingual Definition Modeling | Edison Marrese-Taylor et.al. | 2506.01489 | null |
| 2025-06-01 | Taming LLMs by Scaling Learning Rates with Gradient Grouping | Siyuan Li et.al. | 2506.01049 | null |
| 2025-06-06 | Data Swarms: Optimizable Generation of Synthetic Evaluation Data | Shangbin Feng et.al. | 2506.00741 | null |
| 2025-05-31 | AgentAuditor: Human-Level Safety and Security Evaluation for LLM Agents | Hanjun Luo et.al. | 2506.00641 | null |
| 2025-05-31 | BenchHub: A Unified Benchmark Suite for Holistic and Customizable LLM Evaluation | Eunsu Kim et.al. | 2506.00482 | null |
| 2025-05-30 | MetaFaith: Faithful Natural Language Uncertainty Expression in LLMs | Gabrielle Kaili-May Liu et.al. | 2505.24858 | link |
| 2025-05-30 | Benchmarking Large Language Models for Cryptanalysis and Mismatched-Generalization | Utsav Maskey et.al. | 2505.24621 | null |
| 2025-05-30 | Simulating Training Data Leakage in Multiple-Choice Benchmarks for LLM Evaluation | Naila Shafirni Hidayat et.al. | 2505.24263 | link |
| 2025-05-29 | Is Your Model Fairly Certain? Uncertainty-Aware Fairness Evaluation for LLMs | Yinong Oliver Wang et.al. | 2505.23996 | null |
| 2025-05-29 | Revisiting Uncertainty Estimation and Calibration of Large Language Models | Linwei Tao et.al. | 2505.23854 | null |
| 2025-05-28 | Benchmarking Abstract and Reasoning Abilities Through A Theoretical Perspective | Qingchuan Ma et.al. | 2505.23833 | link |
| 2025-06-24 | MemAscend: System Memory Optimization for SSD-Offloaded LLM Fine-Tuning | Yong-Cheng Liaw et.al. | 2505.23254 | null |
| 2025-07-03 | Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding | Chengyue Wu et.al. | 2505.22618 | null |
| 2025-05-29 | Pangu Embedded: An Efficient Dual-system LLM Reasoner with Metacognition | Hanting Chen et.al. | 2505.22375 | null |
| 2025-05-28 | ReliableEval: A Recipe for Stochastic LLM Evaluation via Method of Moments | Gili Lior et.al. | 2505.22169 | null |
| 2025-05-28 | Found in Translation: Measuring Multilingual LLM Consistency as Simple as Translate then Evaluate | Ashim Gupta et.al. | 2505.21999 | null |
| 2025-05-21 | SIMCOPILOT: Evaluating Large Language Models for Copilot-Style Code Generation | Mingchao Jiang et.al. | 2505.21514 | null |
| 2025-05-26 | Dynamically Learned Test-Time Model Routing in Language Model Zoos with Service Level Guarantees | Herbert Woisetschläger et.al. | 2505.19947 | null |
| 2025-05-26 | BizFinBench: A Business-Driven Real-World Financial Benchmark for Evaluating LLMs | Guilong Lu et.al. | 2505.19457 | link |
| 2025-05-25 | Likert or Not: LLM Absolute Relevance Judgments on Fine-Grained Ordinal Scales | Charles Godfrey et.al. | 2505.19334 | null |
| 2025-05-25 | Can Large Language Models Infer Causal Relationships from Real-World Text? | Ryan Saklad et.al. | 2505.18931 | null |
| 2025-05-24 | MedScore: Factuality Evaluation of Free-Form Medical Answers | Heyuan Huang et.al. | 2505.18452 | link |
| 2025-05-23 | How Can I Publish My LLM Benchmark Without Giving the True Answers Away? | Takashi Ishida et.al. | 2505.18102 | null |
| 2025-05-23 | ELSPR: Evaluator LLM Training Data Self-Purification on Non-Transitive Preferences via Tournament Graph Reconstruction | Yan Yu et.al. | 2505.17691 | null |
| 2025-05-22 | CaseReportBench: An LLM Benchmark Dataset for Dense Information Extraction in Clinical Case Reports | Xiao Yu Cindy Zhang et.al. | 2505.17265 | null |
| 2025-05-21 | NEXT-EVAL: Next Evaluation of Traditional and LLM Web Data Record Extraction | Soyeon Kim et.al. | 2505.17125 | null |
| 2025-05-21 | Any Large Language Model Can Be a Reliable Judge: Debiasing with a Reasoning-based Bias Detector | Haoyan Yang et.al. | 2505.17100 | null |
| 2025-05-22 | AGENTIF: Benchmarking Instruction Following of Large Language Models in Agentic Scenarios | Yunjia Qi et.al. | 2505.16944 | link |
| 2025-05-22 | CASTILLO: Characterizing Response Length Distributions of Large Language Models | Daniel F. Perez-Ramirez et.al. | 2505.16881 | link |
| 2025-05-21 | Reverse Engineering Human Preferences with Reinforcement Learning | Lisa Alazraki et.al. | 2505.15795 | null |
| 2025-05-21 | An Empirical Study of the Anchoring Effect in LLMs: Existence, Mechanism, and Potential Mitigations | Yiming Huang et.al. | 2505.15392 | null |
| 2025-05-21 | Lost in Benchmarks? Rethinking Large Language Model Benchmarking with Item Response Theory | Hongli Zhou et.al. | 2505.15055 | link |
| 2025-05-20 | FisherSFT: Data-Efficient Supervised Fine-Tuning of Language Models Using Information Gain | Rohan Deb et.al. | 2505.14826 | null |
| 2025-05-20 | Breaking Down Video LLM Benchmarks: Knowledge, Spatial Perception, or True Temporal Understanding? | Bo Feng et.al. | 2505.14321 | null |
| 2025-05-29 | YESciEval: Robust LLM-as-a-Judge for Scientific Question Answering | Jennifer D'Souza et.al. | 2505.14279 | null |
| 2025-05-20 | Think-J: Learning to Think for Generative LLM-as-a-Judge | Hui Huang et.al. | 2505.14268 | link |
| 2025-05-19 | 4Hammer: a board-game reinforcement learning environment for the hour long time frame | Massimo Fioravanti et.al. | 2505.13638 | link |
| 2025-05-18 | KG-QAGen: A Knowledge-Graph-Based Framework for Systematic Question Generation and Long-Context LLM Evaluation | Nikita Tatarinov et.al. | 2505.12495 | link |
| 2025-05-17 | Tiny QA Benchmark++: Ultra-Lightweight, Synthetic Multilingual Dataset Generation & Smoke-Tests for Continuous LLM Evaluation | Vincent Koc et.al. | 2505.12058 | link |
| 2025-05-21 | Model Performance-Guided Evaluation Data Selection for Effective Prompt Optimization | Ximing Dong et.al. | 2505.10736 | null |
| 2025-05-13 | A suite of LMs comprehend puzzle statements as well as humans | Adele E Goldberg et.al. | 2505.08996 | null |
| 2025-05-13 | Towards Contamination Resistant Benchmarks | Rahmatullah Musawi et.al. | 2505.08389 | null |
| 2025-05-12 | A Case Study Investigating the Role of Generative AI in Quality Evaluations of Epics in Agile Software Development | Werner Geyer et.al. | 2505.07664 | null |
| 2025-05-09 | LLMs Get Lost In Multi-Turn Conversation | Philippe Laban et.al. | 2505.06120 | link |
| 2025-05-15 | Healthy LLMs? Benchmarking LLM Knowledge of UK Government Public Health Information | Joshua Harris et.al. | 2505.06046 | null |
| 2025-05-02 | Cer-Eval: Certifiable and Cost-Efficient Evaluation Framework for LLMs | Ganghua Wang et.al. | 2505.03814 | null |
| 2025-05-29 | am-ELO: A Stable Framework for Arena-based LLM Evaluation | Zirui Liu et.al. | 2505.03475 | null |
| 2025-05-05 | Developing A Framework to Support Human Evaluation of Bias in Generated Free Response Text | Jennifer Healey et.al. | 2505.03053 | null |
| 2025-05-01 | Unlearning Sensitive Information in Multimodal LLMs: Benchmark and Attack-Defense Evaluation | Vaidehi Patil et.al. | 2505.01456 | link |
| 2025-04-30 | A Report on the llms evaluating the high school questions | Zhu Jiawei et.al. | 2505.00057 | null |
| 2025-04-30 | RDF-Based Structured Quality Assessment Representation of Multilingual LLM Evaluations | Jonas Gwozdz et.al. | 2504.21605 | null |
| 2025-04-30 | Confidence in Large Language Model Evaluation: A Bayesian Approach to Limited-Sample Challenges | Xiao Xiao et.al. | 2504.21303 | null |
| 2025-04-27 | LLM-Evaluation Tropes: Perspectives on the Validity of LLM-Evaluations | Laura Dietz et.al. | 2504.19076 | null |
| 2025-04-23 | Agree to Disagree? A Meta-Evaluation of LLM Misgendering | Arjun Subramonian et.al. | 2504.17075 | link |
| 2025-04-23 | IberBench: LLM Evaluation on Iberian Languages | José Ángel González et.al. | 2504.16921 | null |
| 2025-04-23 | Private Federated Learning using Preference-Optimized Synthetic Data | Charlie Hou et.al. | 2504.16438 | link |
| 2025-04-29 | Virology Capabilities Test (VCT): A Multimodal Virology Q&A Benchmark | Jasper Götting et.al. | 2504.16137 | null |
| 2025-05-16 | DMind Benchmark: Toward a Holistic Assessment of LLM Capabilities across the Web3 Domain | Enhao Huang et.al. | 2504.16116 | null |
| 2025-04-22 | Automated Creativity Evaluation for Large Language Models: A Reference-Based Approach | Ruizhe Li et.al. | 2504.15784 | null |
| 2025-04-20 | Meta-Thinking in LLMs via Multi-Agent Reinforcement Learning: A Survey | Ahsan Bilal et.al. | 2504.14520 | null |
| 2025-04-20 | Information Diffusion and Preferential Attachment in a Network of Large Language Models | Adit Jain et.al. | 2504.14438 | null |
| 2025-04-18 | MEQA: A Meta-Evaluation Framework for Question & Answer LLM Benchmarks | Jaime Raldua Veuthey et.al. | 2504.14039 | null |
| 2025-04-17 | ZeroSumEval: Scaling LLM Evaluation with Inter-Model Competition | Haidar Khan et.al. | 2504.12562 | link |
| 2025-04-17 | ELAB: Extensive LLM Alignment Benchmark in Persian Language | Zahra Pourbahman et.al. | 2504.12553 | null |
| 2025-04-16 | MOS: Towards Effective Smart Contract Vulnerability Detection through Mixture-of-Experts Tuning of Large Language Models | Hang Yuan et.al. | 2504.12234 | null |
| 2025-04-17 | Déjà Vu: Multilingual LLM Evaluation through the Lens of Machine Translation Evaluation | Julia Kreutzer et.al. | 2504.11829 | null |
| 2025-04-14 | HELIOS: Adaptive Model And Early-Exit Selection for Efficient LLM Inference Serving | Avinash Kumar et.al. | 2504.10724 | null |
| 2025-05-19 | Large Language Models Could Be Rote Learners | Yuyang Xu et.al. | 2504.08300 | null |
| 2025-05-30 | DeepSeek-R1 vs. o3-mini: How Well can Reasoning LLMs Evaluate MT and Summarization? | Daniil Larionov et.al. | 2504.08120 | null |
| 2025-05-15 | Model Utility Law: Evaluating LLMs beyond Performance through Mechanism Interpretable Metric | Yixin Cao et.al. | 2504.07440 | link |
| 2025-06-20 | TALE: A Tool-Augmented Framework for Reference-Free Evaluation of Large Language Models | Sher Badshah et.al. | 2504.07385 | null |
| 2025-04-08 | NativQA Framework: Enabling LLMs with Native, Local, and Everyday Knowledge | Firoj Alam et.al. | 2504.05995 | null |
| 2025-04-09 | How Accurately Do Large Language Models Understand Code? | Sabaat Haroon et.al. | 2504.04372 | null |
| 2025-04-04 | Do LLM Evaluators Prefer Themselves for a Reason? | Wei-Lin Chen et.al. | 2504.03846 | link |
| 2025-04-15 | Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning | Kai Ye et.al. | 2504.03784 | null |
| 2025-04-04 | Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency | Erik Johannes Husom et.al. | 2504.03360 | null |
| 2025-04-02 | YourBench: Easy Custom Evaluation Sets for Everyone | Sumuk Shashidhar et.al. | 2504.01833 | link |
| 2025-04-08 | Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems? | Kai Yan et.al. | 2504.00509 | null |
| 2025-04-01 | HRET: A Self-Evolving LLM Evaluation Toolkit for Korean | Hanwool Lee et.al. | 2503.22968 | null |
| 2025-03-27 | CLAIMCHECK: How Grounded are LLM Critiques of Scientific Papers? | Jiefu Ou et.al. | 2503.21717 | link |
| 2025-03-27 | Evaluating book summaries from internal knowledge in Large Language Models: a cross-model and semantic consistency approach | Javier Coronado-Blázquez et.al. | 2503.21613 | null |
| 2025-05-19 | Challenging the Boundaries of Reasoning: An Olympiad-Level Math Benchmark for Large Language Models | Haoxiang Sun et.al. | 2503.21380 | link |
| 2025-03-25 | FLEX: A Benchmark for Evaluating Robustness of Fairness in Large Language Models | Dahyun Jung et.al. | 2503.19540 | link |
| 2025-05-30 | LLM Benchmarking with LLaMA2: Evaluating Code Development Performance Across Multiple Programming Languages | Patrick Diehl et.al. | 2503.19217 | null |
| 2025-03-28 | Overtrained Language Models Are Harder to Fine-Tune | Jacob Mitchell Springer et.al. | 2503.19206 | null |
| 2025-03-25 | Decorum: A Language-Based Approach For Style-Conditioned Synthesis of Indoor 3D Scenes | Kelly O. Marshall et.al. | 2503.18155 | null |
| 2025-05-14 | Evaluating Clinical Competencies of Large Language Models with a General Practice Benchmark | Zheqing Li et.al. | 2503.17599 | null |
| 2025-03-20 | The Emperor's New Clothes in Benchmarking? A Rigorous Examination of Mitigation Strategies for LLM Benchmark Data Contamination | Yifan Sun et.al. | 2503.16402 | link |
| 2025-03-20 | Fùxì: A Benchmark for Evaluating Language Models on Ancient Chinese Text Understanding and Generation | Shangqing Zhao et.al. | 2503.15837 | link |
| 2025-06-08 | Right Answer, Wrong Score: Uncovering the Inconsistencies of LLM Evaluation in Multiple-Choice Question Answering | Francesco Maria Molfese et.al. | 2503.14996 | null |
| 2025-03-13 | It is Too Many Options: Pitfalls of Multiple-Choice Questions in Generative AI and Medical Education | Shrutika Singh et.al. | 2503.13508 | null |
| 2025-03-17 | REPA: Russian Error Types Annotation for Evaluating Text Generation and Judgment Capabilities | Alexander Pugachev et.al. | 2503.13102 | null |
| 2025-03-14 | V-STaR: Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning | Zixu Cheng et.al. | 2503.11495 | null |
| 2025-06-03 | OASST-ETC Dataset: Alignment Signals from Eye-tracking Analysis of LLM Responses | Angela Lopez-Cardona et.al. | 2503.10927 | link |
| 2025-03-13 | Chat-TS: Enhancing Multi-Modal Reasoning Over Time-Series and Natural Language Data | Paul Quinlan et.al. | 2503.10883 | null |
| 2025-03-13 | Commenting Higher-level Code Unit: Full Code, Reduced Code, or Hierarchical Code Summarization | Weisong Sun et.al. | 2503.10737 | null |
| 2025-03-12 | Medical Large Language Model Benchmarks Should Prioritize Construct Validity | Ahmed Alaa et.al. | 2503.10694 | null |
| 2025-04-17 | ZeroSumEval: An Extensible Framework For Scaling LLM Evaluation with Inter-Model Competition | Hisham A. Alyahya et.al. | 2503.10673 | link |
| 2025-05-20 | RouterEval: A Comprehensive Benchmark for Routing LLMs to Explore Model-level Scaling Up in LLMs | Zhongzhan Huang et.al. | 2503.10657 | link |
| 2025-05-26 | MMLU-ProX: A Multilingual Benchmark for Advanced Large Language Model Evaluation | Weihao Xuan et.al. | 2503.10497 | null |
| 2025-03-12 | Safer or Luckier? LLMs as Safety Evaluators Are Not Robust to Artifacts | Hongyu Chen et.al. | 2503.09347 | null |
| 2025-03-08 | SmartBench: Is Your LLM Truly a Good Chinese Smartphone Assistant? | Xudong Lu et.al. | 2503.06029 | null |
| 2025-03-07 | SINdex: Semantic INconsistency Index for Hallucination Detection in LLMs | Samir Abdaljalil et.al. | 2503.05980 | null |
| 2025-03-07 | RocketEval: Efficient Automated LLM Evaluation via Grading Checklist | Tianjun Wei et.al. | 2503.05142 | link |
| 2025-02-09 | Peeking Behind Closed Doors: Risks of LLM Evaluation by Private Data Curators | Hritik Bansal et.al. | 2503.04756 | null |
| 2025-03-07 | Human Implicit Preference-Based Policy Fine-tuning for Multi-Agent Reinforcement Learning in USV Swarm | Hyeonjun Kim et.al. | 2503.03796 | null |
| 2025-03-04 | SAGE: Steering and Refining Dialog Generation with State-Action Augmentation | Yizhe Zhang et.al. | 2503.03040 | link |
| 2025-05-28 | Position: Don't Use the CLT in LLM Evals With Fewer Than a Few Hundred Datapoints | Sam Bowyer et.al. | 2503.01747 | null |
| 2025-03-04 | DOVE: A Large-Scale Multi-Dimensional Predictions Dataset Towards Meaningful LLM Evaluation | Eliya Habba et.al. | 2503.01622 | null |
| 2025-03-03 | None of the Above, Less of the Right: Parallel Patterns between Humans and LLMs on Multi-Choice Questions Answering | Zhi Rui Tam et.al. | 2503.01550 | null |
| 2025-03-03 | SwiLTra-Bench: The Swiss Legal Translation Benchmark | Joel Niklaus et.al. | 2503.01372 | null |
| 2025-03-03 | LLM-Advisor: An LLM Benchmark for Cost-efficient Path Planning across Multiple Terrains | Ling Xiao et.al. | 2503.01236 | null |
| 2025-03-02 | FunBench: Benchmarking Fundus Reading Skills of MLLMs | Qijie Wei et.al. | 2503.00901 | null |
| 2025-03-02 | Towards Efficient Educational Chatbots: Benchmarking RAG Frameworks | Umar Ali Khan et.al. | 2503.00781 | null |
| 2025-04-12 | Evaluating Personalized Tool-Augmented LLMs from the Perspectives of Personalization and Proactivity | Yupu Hao et.al. | 2503.00771 | link |
| 2025-03-01 | U-NIAH: Unified RAG and LLM Evaluation for Long Context Needle-In-A-Haystack | Yunfan Gao et.al. | 2503.00353 | link |
| 2025-02-28 | Jawaher: A Multidialectal Dataset of Arabic Proverbs for LLM Benchmarking | Samar M. Magdy et.al. | 2503.00231 | null |
| 2025-02-28 | Consistency Evaluation of News Article Summaries Generated by Large (and Small) Language Models | Colleen Gilhuly et.al. | 2502.20647 | null |
| 2025-05-23 | Is Your Paper Being Reviewed by an LLM? Benchmarking AI Text Detection in Peer Review | Sungduk Yu et.al. | 2502.19614 | null |
| 2025-02-26 | Exploring Graph Tasks with Pure LLMs: A Comprehensive Benchmark and Investigation | Yuxiang Wang et.al. | 2502.18771 | link |
| 2025-02-23 | Recent Advances in Large Langauge Model Benchmarks against Data Contamination: From Static to Dynamic Evaluation | Simin Chen et.al. | 2502.17521 | link |
| 2025-05-23 | Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective | Chengyin Xu et.al. | 2502.17262 | null |
| 2025-02-24 | Detecting Benchmark Contamination Through Watermarking | Tom Sander et.al. | 2502.17259 | null |
| 2025-02-24 | Predicting Liquidity-Aware Bond Yields using Causal GANs and Deep Reinforcement Learning with LLM Evaluation | Jaskaran Singh Walia et.al. | 2502.17011 | null |
| 2025-02-24 | AlphaAgent: LLM-Driven Alpha Mining with Regularized Exploration to Counteract Alpha Decay | Ziyi Tang et.al. | 2502.16789 | link |
| 2025-01-30 | Retrieval Augmented Generation Based LLM Evaluation For Protocol State Machine Inference With Chain-of-Thought Reasoning | Youssef Maklad et.al. | 2502.15727 | null |
| 2025-03-10 | Prompt-to-Leaderboard | Evan Frick et.al. | 2502.14855 | link |
| 2025-03-28 | SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines | M-A-P Team et.al. | 2502.14739 | null |
| 2025-02-20 | SEA-HELM: Southeast Asian Holistic Evaluation of Language Models | Yosephine Susanto et.al. | 2502.14301 | null |
| 2025-02-20 | Transfer-Prompting: Enhancing Cross-Task Adaptation in Large Language Models via Dual-Stage Prompts Optimization | Yupeng Chang et.al. | 2502.14211 | link |
| 2025-02-19 | Which of These Best Describes Multiple Choice Evaluation with LLMs? A) Forced B) Flawed C) Fixable D) All of the Above | Nishant Balepur et.al. | 2502.14127 | null |
| 2025-02-19 | STEER-ME: Assessing the Microeconomic Reasoning of Large Language Models | Narun Raman et.al. | 2502.13119 | null |
| 2025-02-18 | HPSS: Heuristic Prompting Strategy Search for LLM Evaluators | Bosi Wen et.al. | 2502.13031 | null |
| 2025-05-23 | None of the Others: a General Technique to Distinguish Reasoning from Memorization in Multiple-Choice LLM Evaluation Benchmarks | Eva Sánchez Salido et.al. | 2502.12896 | null |
| 2025-04-08 | Safe at the Margins: A General Approach to Safety Alignment in Low-Resource English Languages -- A Singlish Case Study | Isaac Lim et.al. | 2502.12485 | null |
| 2025-02-17 | Deviation Ratings: A General, Clone-Invariant Rating Method | Luke Marris et.al. | 2502.11645 | null |
| 2025-02-21 | TituLLMs: A Family of Bangla LLMs with Comprehensive Benchmarking | Shahriar Kabir Nahin et.al. | 2502.11187 | null |
| 2025-02-15 | Rule-Bottleneck Reinforcement Learning: Joint Explanation and Decision Optimization for Resource Allocation with Language Agents | Mauricio Tec et.al. | 2502.10732 | null |
| 2025-03-02 | An Empirical Analysis of Uncertainty in Large Language Model Evaluations | Qiujie Xie et.al. | 2502.10709 | link |
| 2025-02-25 | Accelerating Unbiased LLM Evaluation via Synthetic Feedback | Zhaoyi Zhou et.al. | 2502.10563 | link |
| 2025-02-14 | MathConstruct: Challenging LLM Reasoning with Constructive Proofs | Mislav Balunović et.al. | 2502.10197 | null |
| 2025-02-13 | Enhancing Jailbreak Attacks via Compliance-Refusal-Based Initialization | Amit Levi et.al. | 2502.09755 | null |
| 2025-02-13 | NestQuant: Nested Lattice Quantization for Matrix Products and LLMs | Semyon Savkin et.al. | 2502.09720 | null |
| 2025-02-12 | The Science of Evaluating Foundation Models | Jiayi Yuan et.al. | 2502.09670 | null |
| 2025-02-13 | Copilot Arena: A Platform for Code LLM Evaluation in the Wild | Wayne Chi et.al. | 2502.09328 | null |
| 2025-02-12 | Revisiting 3D LLM Benchmarks: Are We Really Testing 3D Capabilities? | Jiahe Jin et.al. | 2502.08503 | link |
| 2025-02-11 | Forget What You Know about LLMs Evaluations -- LLMs are Like a Chameleon | Nurit Cohen-Inger et.al. | 2502.07445 | link |
| 2025-02-10 | Evaluating the Systematic Reasoning Abilities of Large Language Models through Graph Coloring | Alex Heyman et.al. | 2502.07087 | link |
| 2025-02-10 | Multi-turn Evaluation of Anthropomorphic Behaviours in Large Language Models | Lujain Ibrahim et.al. | 2502.07077 | null |
| 2025-02-07 | LLM-Supported Natural Language to Bash Translation | Finnian Westenfelder et.al. | 2502.06858 | link |
| 2025-02-15 | Self-Supervised Prompt Optimization | Jinyu Xiang et.al. | 2502.06855 | link |
| 2025-02-10 | Resurrecting saturated LLM benchmarks with adversarial encoding | Igor Ivanov et.al. | 2502.06738 | null |
| 2025-02-10 | Automatic Evaluation of Healthcare LLMs Beyond Question-Answering | Anna Arias-Duart et.al. | 2502.06666 | null |
| 2025-02-10 | Unbiased Evaluation of Large Language Models from a Causal Perspective | Meilin Chen et.al. | 2502.06655 | null |
| 2025-02-10 | LessLeak-Bench: A First Investigation of Data Leakage in LLMs Across 83 Software Engineering Benchmarks | Xin Zhou et.al. | 2502.06215 | null |
| 2025-02-05 | Aero-LLM: A Distributed Framework for Secure UAV Communication and Intelligent Decision-Making | Balakrishnan Dharmalingam et.al. | 2502.05220 | null |
| 2025-02-06 | TruthFlow: Truthful LLM Generation via Representation Flow Correction | Hanyu Wang et.al. | 2502.04556 | null |
| 2025-02-05 | How do Humans and Language Models Reason About Creativity? A Comparative Analysis | Antonio Laverghetta Jr. et.al. | 2502.03253 | null |
| 2025-03-22 | On Zero-Initialized Attention: Optimal Prompt and Gating Factor Estimation | Nghiem T. Diep et.al. | 2502.03029 | null |
| 2025-02-02 | LLM-Powered Benchmark Factory: Reliable, Generic, and Efficient | Peiwen Yuan et.al. | 2502.01683 | link |
| 2025-02-02 | HASSLE-free: A unified Framework for Sparse plus Low-Rank Matrix Decomposition for LLMs | Mehdi Makni et.al. | 2502.00899 | null |
| 2025-02-01 | DUET: Optimizing Training Data Mixtures via Feedback from Unseen Evaluation Tasks | Zhiliang Chen et.al. | 2502.00270 | link |
| 2025-01-30 | Overestimation in LLM Evaluation: A Controlled Large-Scale Study on Data Contamination's Impact on Machine Translation | Muhammed Yusuf Kocyigit et.al. | 2501.18771 | null |
| 2025-01-31 | ExeCoder: Empowering Large Language Models with Executability Representation for Code Translation | Minghua He et.al. | 2501.18460 | null |
| 2025-02-01 | LLM Evaluation Based on Aerospace Manufacturing Expertise: Automated Generation and Multi-Model Question Answering | Beiming Liu et.al. | 2501.17183 | null |
| 2025-03-18 | An LLM Benchmark for Addressee Recognition in Multi-modal Multi-party Dialogue | Koji Inoue et.al. | 2501.16643 | null |
| 2025-01-26 | HardML: A Benchmark For Evaluating Data Science And Machine Learning knowledge and reasoning in AI | Tidor-Vlad Pricope et.al. | 2501.15627 | null |
| 2025-01-23 | Question Answering on Patient Medical Records with Private Fine-Tuned LLMs | Sara Kothari et.al. | 2501.13687 | null |
| 2025-01-10 | CodEv: An Automated Grading Framework Leveraging Large Language Models for Consistent and Constructive Feedback | En-Qi Tseng et.al. | 2501.10421 | null |
| 2025-01-15 | Towards Multilingual LLM Evaluation for Baltic and Nordic languages: A study on Lithuanian History | Yevhen Kostiuk et.al. | 2501.09154 | null |
| 2025-01-13 | Benchmarking Abstractive Summarisation: A Dataset of Human-authored Summaries of Norwegian News Articles | Samia Touileb et.al. | 2501.07718 | null |
| 2025-01-03 | FLAME: Financial Large-Language Model Assessment and Metrics Evaluation | Jiayu Guo et.al. | 2501.06211 | link |
| 2025-01-07 | MTRAG: A Multi-Turn Conversational Benchmark for Evaluating Retrieval-Augmented Generation Systems | Yannis Katsis et.al. | 2501.03468 | link |
| 2025-01-05 | Evaluating Large Language Models Against Human Annotators in Latent Content Analysis: Sentiment, Political Leaning, Emotional Intensity, and Sarcasm | Ljubisa Bojic et.al. | 2501.02532 | null |
| 2025-01-04 | LLMzSzŁ: a comprehensive LLM benchmark for Polish | Krzysztof Jassem et.al. | 2501.02266 | null |
| 2025-03-25 | VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM | Yuqian Yuan et.al. | 2501.00599 | link |
| 2025-01-04 | Setting Standards in Turkish NLP: TR-MMLU for Large Language Model Evaluation | M. Ali Bayram et.al. | 2501.00593 | null |
| 2024-12-31 | Echoes in AI: Quantifying Lack of Plot Diversity in LLM Outputs | Weijia Xu et.al. | 2501.00273 | null |
| 2024-12-30 | EVOLVE: Emotion and Visual Output Learning via LLM Evaluation | Jordan Sinclair et.al. | 2412.20632 | null |
| 2024-12-24 | Muse: A Multimodal Conversational Recommendation Dataset with Scenario-Grounded User Profiles | Zihan Wang et.al. | 2412.18416 | null |
| 2024-12-24 | A Statistical Framework for Ranking LLM-Based Chatbots | Siavash Ameli et.al. | 2412.18407 | link |
| 2025-01-25 | DeepCRCEval: Revisiting the Evaluation of Code Review Comment Generation | Junyi Lu et.al. | 2412.18291 | null |
| 2024-12-23 | CARL-GT: Evaluating Causal Reasoning Capabilities of Large Language Models | Ruibo Tu et.al. | 2412.17970 | link |
| 2025-01-02 | Baichuan4-Finance Technical Report | Hanyu Zhang et.al. | 2412.15270 | null |
| 2024-12-19 | ObjVariantEnsemble: Advancing Point Cloud LLM Evaluation in Challenging Scenes with Subtly Distinguished Objects | Qihang Cao et.al. | 2412.14837 | null |
| 2024-12-18 | AntiLeak-Bench: Preventing Data Contamination by Automatically Constructing Benchmarks with Updated Real-World Knowledge | Xiaobao Wu et.al. | 2412.13670 | link |
| 2025-02-16 | Mind Your Theory: Theory of Mind Goes Deeper Than Reasoning | Eitan Wagner et.al. | 2412.13631 | null |
| 2025-02-17 | OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain | Shuting Wang et.al. | 2412.13018 | link |
| 2024-12-10 | How to Choose a Threshold for an Evaluation Metric for Large Language Models | Bhaskarjit Sarmah et.al. | 2412.12148 | null |
| 2024-12-15 | Dual Traits in Probabilistic Reasoning of Large Language Models | Shenxiong Li et.al. | 2412.11009 | link |
| 2024-12-30 | LLM-as-an-Interviewer: Beyond Static Testing Through Dynamic LLM Evaluation | Eunsu Kim et.al. | 2412.10424 | null |
| 2024-12-13 | Cultural Evolution of Cooperation among LLM Agents | Aron Vallinder et.al. | 2412.10270 | null |
| 2024-12-12 | Towards Understanding the Robustness of LLM-based Evaluations under Perturbations | Manav Chaudhary et.al. | 2412.09269 | null |
| 2024-12-10 | BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities | Sahal Shaji Mullappilly et.al. | 2412.07769 | link |
| 2025-02-28 | PediaBench: A Comprehensive Chinese Pediatric Dataset for Benchmarking Large Language Models | Qian Zhang et.al. | 2412.06287 | link |
| 2024-12-02 | AI Benchmarks and Datasets for LLM Evaluation | Todor Ivanov et.al. | 2412.01020 | null |
| 2024-11-30 | Evaluating the Consistency of LLM Evaluators | Noah Lee et.al. | 2412.00543 | null |
| 2024-11-29 | MIMDE: Exploring the Use of Synthetic vs Human Data for Evaluating Multi-Insight Multi-Document Extraction Tasks | John Francis et.al. | 2411.19689 | null |
| 2024-11-29 | Beyond Surface Structure: A Causal Assessment of LLMs' Comprehension Ability | Yujin Han et.al. | 2411.19456 | link |
| 2024-11-27 | Is my Meeting Summary Good? Estimating Quality with a Multi-LLM Evaluator | Frederic Kirstein et.al. | 2411.18444 | null |
| 2025-01-17 | CS-Eval: A Comprehensive Large Language Model Benchmark for CyberSecurity | Zhengmin Yu et.al. | 2411.16239 | link |
| 2024-11-25 | SAGEval: The frontiers of Satisfactory Agent based NLG Evaluation for reference-free open-ended text | Reshmi Ghosh et.al. | 2411.16077 | null |
| 2024-11-26 | Do LLMs Agree on the Creativity Evaluation of Alternative Uses? | Abdullah Al Rabeyah et.al. | 2411.15560 | null |
| 2025-02-17 | Ranking Unraveled: Recipes for LLM Rankings in Head-to-Head AI Combat | Roland Daynauth et.al. | 2411.14483 | link |
| 2024-11-21 | Lost in Inference: Rediscovering the Role of Natural Language Inference for Large Language Models | Lovish Madaan et.al. | 2411.14103 | null |
| 2024-11-21 | An Evaluation-Driven Approach to Designing LLM Agents: Process and Architecture | Boming Xia et.al. | 2411.13768 | null |
| 2024-11-21 | A Framework for Evaluating LLMs Under Task Indeterminacy | Luke Guerdan et.al. | 2411.13760 | null |
| 2024-11-12 | Large Language Models as Neurolinguistic Subjects: Identifying Internal Representations for Form and Meaning | Linyang He et.al. | 2411.07533 | null |
| 2024-11-13 | Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models | Yancheng He et.al. | 2411.07140 | null |
| 2024-11-09 | Golden Touchstone: A Comprehensive Bilingual Benchmark for Evaluating Financial Large Language Models | Xiaojun Wu et.al. | 2411.06272 | link |
| 2025-02-09 | ProverbEval: Exploring LLM Evaluation Challenges for Low-resource Language Understanding | Israel Abebe Azime et.al. | 2411.05049 | null |
| 2024-11-07 | Bayesian Calibration of Win Rate Estimation with LLM Evaluators | Yicheng Gao et.al. | 2411.04424 | link |
| 2024-11-05 | Enhancing LLM Evaluations: The Garbling Trick | William F. Bradley et.al. | 2411.01533 | null |
| 2025-02-19 | Varco Arena: A Tournament Approach to Reference-Free Benchmarking Large Language Models | Seonil Son et.al. | 2411.01281 | null |
| 2025-02-07 | Mastering the Craft of Data Synthesis for CodeLLMs | Meng Chen et.al. | 2411.00005 | link |
| 2024-10-28 | Project MPG: towards a generalized performance benchmark for LLM capabilities | Lucas Spangher et.al. | 2410.22368 | null |
| 2024-10-29 | Self-Preference Bias in LLM-as-a-Judge | Koki Wataoka et.al. | 2410.21819 | null |
| 2024-10-28 | Unveiling Context-Aware Criteria in Self-Assessing LLMs | Taneesh Gupta et.al. | 2410.21545 | null |
| 2024-10-27 | LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization | Jui-Nan Yen et.al. | 2410.20625 | link |
| 2024-10-26 | Limitations of the LLM-as-a-Judge Approach for Evaluating LLM Outputs in Expert Knowledge Tasks | Annalisa Szymanski et.al. | 2410.20266 | null |
| 2024-10-23 | MiLoRA: Efficient Mixture of Low-Rank Adaptation for Large Language Models Fine-tuning | Jingfan Zhang et.al. | 2410.18035 | null |
| 2025-02-21 | Towards Automated Penetration Testing: Introducing LLM Benchmark, Analysis, and Improvements | Isamu Isozaki et.al. | 2410.17141 | link |
| 2024-10-21 | CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution | Maosong Cao et.al. | 2410.16256 | link |
| 2025-01-26 | mHumanEval -- A Multilingual Benchmark to Evaluate Large Language Models for Code Generation | Nishat Raihan et.al. | 2410.15037 | link |
| 2024-10-19 | CAP: Data Contamination Detection via Consistency Amplification | Yi Zhao et.al. | 2410.15005 | null |
| 2024-10-18 | Enabling Scalable Evaluation of Bias Patterns in Medical LLMs | Hamed Fayyaz et.al. | 2410.14763 | link |
| 2024-11-06 | Diverging Preferences: When do Annotators Disagree and do Models Know? | Michael JQ Zhang et.al. | 2410.14632 | null |
| 2024-10-18 | Combining Entropy and Matrix Nuclear Norm for Enhanced Evaluation of Language Models | James Vo et.al. | 2410.14480 | null |
| 2024-10-21 | BenTo: Benchmark Task Reduction with In-Context Transferability | Hongyu Zhao et.al. | 2410.13804 | link |
| 2024-10-16 | BenchmarkCards: Large Language Model and Risk Reporting | Anna Sokol et.al. | 2410.12974 | null |
| 2025-02-01 | Language Model Preference Evaluation with Multiple Weak Evaluators | Zhengyu Hu et.al. | 2410.12869 | link |
| 2024-10-11 | Enterprise Benchmarks for Large Language Model Evaluation | Bing Zhang et.al. | 2410.12857 | link |
| 2024-10-16 | An Automatic and Cost-Efficient Peer-Review Framework for Language Generation Evaluation | Junjie Chen et.al. | 2410.12265 | null |
| 2024-10-15 | Leaving the barn door open for Clever Hans: Simple features predict LLM benchmark answers | Lorenzo Pacchiardi et.al. | 2410.11672 | link |
| 2024-10-15 | Black-box Uncertainty Quantification Method for LLM-as-a-Judge | Nico Wagner et.al. | 2410.11594 | null |
| 2024-10-14 | Jailbreak Instruction-Tuned LLMs via end-of-sentence MLP Re-weighting | Yifan Luo et.al. | 2410.10150 | null |
| 2024-12-13 | HARDMath: A Benchmark Dataset for Challenging Problems in Applied Mathematics | Jingxuan Fan et.al. | 2410.09988 | link |
| 2024-10-15 | LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models | Han Qiu et.al. | 2410.09962 | link |
| 2024-10-17 | Towards Multilingual LLM Evaluation for European Languages | Klaudia Thellmann et.al. | 2410.08928 | null |
| 2024-10-11 | Test-driven Software Experimentation with LASSO: an LLM Benchmarking Example | Marcus Kessel et.al. | 2410.08911 | null |
| 2024-10-10 | Assessing Episodic Memory in LLMs with Sequence Order Recall Tasks | Mathis Pink et.al. | 2410.08133 | null |
| 2025-02-03 | COMPL-AI Framework: A Technical Interpretation and LLM Benchmarking Suite for the EU Artificial Intelligence Act | Philipp Guldimann et.al. | 2410.07959 | link |
| 2024-11-06 | News Reporter: A Multi-lingual LLM Framework for Broadcast T.V News | Tarun Jain et.al. | 2410.07520 | null |
| 2024-10-09 | Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates | Xiaosen Zheng et.al. | 2410.07137 | link |
| 2024-10-09 | ReIFE: Re-evaluating Instruction-Following Evaluation | Yixin Liu et.al. | 2410.07069 | link |
| 2024-10-08 | Active Evaluation Acquisition for Efficient LLM Benchmarking | Yang Li et.al. | 2410.05952 | null |
| 2024-10-07 | TurtleBench: Evaluating Top Language Models via Real-World Yes/No Puzzles | Qingchen Yu et.al. | 2410.05262 | link |
| 2024-10-01 | Language Enhanced Model for Eye (LEME): An Open-Source Ophthalmology-Specific Large Language Model | Aidan Gilson et.al. | 2410.03740 | null |
| 2024-10-04 | TICKing All the Boxes: Generated Checklists Improve LLM Evaluation and Generation | Jonathan Cook et.al. | 2410.03608 | null |
| 2024-10-04 | Towards Reproducible LLM Evaluation: Quantifying Uncertainty in LLM Benchmark Scores | Robert E. Blackwell et.al. | 2410.03492 | null |
| 2024-10-29 | AIME: AI System Optimization via Multiple LLM Evaluators | Bhrij Patel et.al. | 2410.03131 | null |
| 2024-10-02 | Comparing Criteria Development Across Domain Experts, Lay Users, and Models in Large Language Model Evaluation | Annalisa Szymanski et.al. | 2410.02054 | null |
| 2024-10-02 | Knowledge-Driven Feature Selection and Engineering for Genotype Data with Large Language Models | Joseph Lee et.al. | 2410.01795 | link |
| 2024-10-03 | Extending Context Window of Large Language Models from a Distributional Perspective | Yingsheng Wu et.al. | 2410.01490 | link |
| 2024-10-02 | ConServe: Harvesting GPUs for Low-Latency and High-Throughput Large Language Model Serving | Yifan Qiao et.al. | 2410.01228 | null |
| 2024-10-01 | ViDAS: Vision-based Danger Assessment and Scoring | Pranav Gupta et.al. | 2410.00477 | null |
| 2024-10-01 | PclGPT: A Large Language Model for Patronizing and Condescending Language Detection | Hongbo Wang et.al. | 2410.00361 | link |
| 2024-11-26 | LexEval: A Comprehensive Chinese Legal Benchmark for Evaluating Large Language Models | Haitao Li et.al. | 2409.20288 | link |
| 2024-09-29 | Does RAG Introduce Unfairness in LLMs? Evaluating Fairness in Retrieval-Augmented Generation Systems | Xuyang Wu et.al. | 2409.19804 | link |
| 2024-10-19 | Can Large Language Models Analyze Graphs like Professionals? A Benchmark, Datasets and Models | Xin Li et.al. | 2409.19667 | link |
| 2024-10-05 | IDGen: Item Discrimination Induced Prompt Generation for LLM Evaluation | Fan Lin et.al. | 2409.18892 | link |
| 2024-12-13 | A Character-Centric Creative Story Generation via Imagination | Kyeongman Park et.al. | 2409.16667 | null |
| 2024-09-25 | Judgment of Thoughts: Courtroom of the Binary Logical Reasoning in Large Language Models | Sungjune Park et.al. | 2409.16635 | null |
| 2024-12-18 | Kalahi: A handcrafted, grassroots cultural LLM evaluation suite for Filipino | Jann Railey Montalan et.al. | 2409.15380 | link |
| 2024-12-16 | MQM-APE: Toward High-Quality Error Annotation Predictors with Automatic Post-Editing in LLM Translation Evaluators | Qingyu Lu et.al. | 2409.14335 | link |
| 2024-09-21 | ChemEval: A Comprehensive Multi-Level Chemical Evaluation for Large Language Models | Yuqing Huang et.al. | 2409.13989 | link |
| 2024-12-17 | AraDiCE: Benchmarks for Dialectal and Cultural Capabilities in LLMs | Basel Mousi et.al. | 2409.11404 | null |
| 2024-10-02 | LLM-as-a-Judge & Reward Model: What They Can and Cannot Do | Guijin Son et.al. | 2409.11239 | null |
| 2024-12-08 | Towards Data Contamination Detection for Modern Large Language Models: Limitations, Inconsistencies, and Oracle Challenges | Vinay Samuel et.al. | 2409.09927 | link |
| 2024-09-13 | Cracking the Code: Multi-domain LLM Evaluation on Real-World Professional Exams in Indonesia | Fajri Koto et.al. | 2409.08564 | null |
| 2024-09-09 | Assessing SPARQL capabilities of Large Language Models | Lars-Peter Meyer et.al. | 2409.05925 | link |
| 2024-10-08 | LongGenBench: Benchmarking Long-Form Generation in Long Context LLMs | Yuhao Wu et.al. | 2409.02076 | link |
| 2024-10-14 | Polyrating: A Cost-Effective and Bias-Aware Rating System for LLM Evaluation | Jasper Dekoninck et.al. | 2409.00696 | null |
| 2024-08-26 | Evaluating ChatGPT on Nuclear Domain-Specific Data | Muhammad Anwar et.al. | 2409.00090 | null |
| 2024-08-28 | LLMSecCode: Evaluating Large Language Models for Secure Coding | Anton Rydén et.al. | 2408.16100 | link |
| 2024-08-26 | LLM-3D Print: Large Language Models To Monitor and Control 3D Printing | Yayati Jadhav et.al. | 2408.14307 | null |
| 2024-08-26 | Epidemic Information Extraction for Event-Based Surveillance using Large Language Models | Sergio Consoli et.al. | 2408.14277 | null |
| 2024-10-04 | MobileQuant: Mobile-friendly Quantization for On-device Language Models | Fuwen Tan et.al. | 2408.13933 | link |
| 2024-08-23 | LalaEval: A Holistic Human Evaluation Framework for Domain-Specific Large Language Models | Chongyan Sun et.al. | 2408.13338 | null |
| 2024-08-23 | Open Llama2 Model for the Lithuanian Language | Artūras Nakvosas et.al. | 2408.12963 | null |
| 2024-08-23 | LIMP: Large Language Model Enhanced Intent-aware Mobility Prediction | Songwei Li et.al. | 2408.12832 | link |
| 2024-12-20 | Recording for Eyes, Not Echoing to Ears: Contextualized Spoken-to-Written Conversion of ASR Transcripts | Jiaqing Liu et.al. | 2408.09688 | null |
| 2024-08-20 | Constructing Domain-Specific Evaluation Sets for LLM-as-a-judge | Ravi Raju et.al. | 2408.08808 | null |
| 2024-10-16 | The Fellowship of the LLMs: Multi-Agent Workflows for Synthetic Preference Optimization Dataset Generation | Samee Arif et.al. | 2408.08688 | link |
| 2024-10-19 | Persona is a Double-edged Sword: Mitigating the Negative Impact of Role-playing Prompts in Zero-shot Reasoning Tasks | Junseok Kim et.al. | 2408.08631 | null |

| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2025-07-23 | Yume: An Interactive World Generation Model | Xiaofeng Mao et.al. | 2507.17744 | null |
| 2025-07-23 | Flow Matching Meets Biology and Life Science: A Survey | Zihao Li et.al. | 2507.17731 | null |
| 2025-07-23 | BetterCheck: Towards Safeguarding VLMs for Automotive Perception Systems | Malsha Ashani Mahawatta Dona et.al. | 2507.17722 | null |
| 2025-07-23 | AI Telephone Surveying: Automating Quantitative Data Collection with an AI Interviewer | Danny D. Leybzon et.al. | 2507.17718 | null |
| 2025-07-23 | HydraOpt: Navigating the Efficiency-Performance Trade-off of Adapter Merging | Taha Ceritli et.al. | 2507.17706 | null |
| 2025-07-23 | Towards Greater Leverage: Scaling Laws for Efficient Mixture-of-Experts Language Models | Changxin Tian et.al. | 2507.17702 | null |
| 2025-07-23 | Thinking Isn't an Illusion: Overcoming the Limitations of Reasoning Models via Tool Augmentations | Zhao Song et.al. | 2507.17699 | null |
| 2025-07-23 | Symbiotic Agents: A Novel Paradigm for Trustworthy AGI-driven Networks | Ilias Chatzistefanidis et.al. | 2507.17695 | null |
| 2025-07-23 | Simulating multiple human perspectives in socio-ecological systems using large language models | Yongchao Zeng et.al. | 2507.17680 | null |
| 2025-07-23 | See the Forest and the Trees: A Synergistic Reasoning Framework for Knowledge-Based Visual Question Answering | Junjie Wang et.al. | 2507.17659 | null |
| 2025-07-23 | CNS-Bench: Benchmarking Image Classifier Robustness Under Continuous Nuisance Shifts | Olaf Dünkel et.al. | 2507.17651 | null |
| 2025-07-23 | Who Attacks, and Why? Using LLMs to Identify Negative Campaigning in 18M Tweets across 19 Countries | Victor Hartman et.al. | 2507.17636 | null |
| 2025-07-23 | A Hybrid Early-Exit Algorithm for Large Language Models Based on Space Alignment Decoding (SPADE) | Bowen Zheng et.al. | 2507.17618 | null |
| 2025-07-23 | CodeReasoner: Enhancing the Code Reasoning Ability with Reinforcement Learning | Lingxiao Tang et.al. | 2507.17548 | null |
| 2025-07-23 | Anticipate, Simulate, Reason (ASR): A Comprehensive Generative AI Framework for Combating Messaging Scams | Xue Wen Tan et.al. | 2507.17543 | null |
| 2025-07-23 | AssertFlip: Reproducing Bugs via Inversion of LLM-Generated Passing Tests | Lara Khatib et.al. | 2507.17542 | null |
| 2025-07-23 | Constructing Ophthalmic MLLM for Positioning-diagnosis Collaboration Through Clinical Cognitive Chain Reasoning | Xinyao Liu et.al. | 2507.17539 | null |
| 2025-07-23 | Enabling Cyber Security Education through Digital Twins and Generative AI | Vita Santa Barletta et.al. | 2507.17518 | null |
| 2025-07-23 | URPO: A Unified Reward & Policy Optimization Framework for Large Language Models | Songshuo Lu et.al. | 2507.17515 | null |
| 2025-07-23 | HOTA: Hamiltonian framework for Optimal Transport Advection | Nazar Buzun et.al. | 2507.17513 | null |
| 2025-07-23 | Unsupervised anomaly detection using Bayesian flow networks: application to brain FDG PET in the context of Alzheimer's disease | Hugues Roy et.al. | 2507.17486 | null |
| 2025-07-23 | An Uncertainty-Driven Adaptive Self-Alignment Framework for Large Language Models | Haoran Sun et.al. | 2507.17477 | null |
| 2025-07-23 | MultiNRC: A Challenging and Native Multilingual Reasoning Evaluation Benchmark for LLMs | Alexander R. Fabbri et.al. | 2507.17476 | null |
| 2025-07-23 | BGM-HAN: A Hierarchical Attention Network for Accurate and Fair Decision Assessment on Semi-Structured Profiles | Junhua Liu et.al. | 2507.17472 | null |
| 2025-07-23 | ERMV: Editing 4D Robotic Multi-view images to enhance embodied agents | Chang Nie et.al. | 2507.17462 | null |
| 2025-07-23 | Reasoning-Driven Retrosynthesis Prediction with Large Language Models via Reinforcement Learning | Situo Zhang et.al. | 2507.17448 | null |
| 2025-07-23 | Each to Their Own: Exploring the Optimal Embedding in RAG | Shiting Chen et.al. | 2507.17442 | null |
| 2025-07-23 | A Comprehensive Evaluation on Quantization Techniques for Large Language Models | Yutong Liu et.al. | 2507.17417 | null |
| 2025-07-23 | HiProbe-VAD: Video Anomaly Detection via Hidden States Probing in Tuning-Free Multimodal LLMs | Zhaolin Cai et.al. | 2507.17394 | null |
| 2025-07-23 | Investigating Training Data Detection in AI Coders | Tianlin Li et.al. | 2507.17389 | null |
| 2025-07-23 | Confidence Calibration in Vision-Language-Action Models | Thomas P Zollo et.al. | 2507.17383 | null |
| 2025-07-23 | Language-Conditioned Open-Vocabulary Mobile Manipulation with Pretrained Models | Shen Tan et.al. | 2507.17379 | null |
| 2025-07-23 | DynaSearcher: Dynamic Knowledge Graph Augmented Search Agent via Multi-Reward Reinforcement Learning | Chuzhan Hao et.al. | 2507.17365 | null |
| 2025-07-23 | RoadBench: A Vision-Language Foundation Model and Benchmark for Road Damage Understanding | Xi Xiao et.al. | 2507.17353 | null |
| 2025-07-23 | CartoonAlive: Towards Expressive Live2D Modeling from Single Portraits | Chao He et.al. | 2507.17327 | null |
| 2025-07-23 | Application of Whisper in Clinical Practice: the Post-Stroke Speech Assessment during a Naming Task | Milena Davudova et.al. | 2507.17326 | null |
| 2025-07-23 | R-Stitch: Dynamic Trajectory Stitching for Efficient Reasoning | Zhuokun Chen et.al. | 2507.17307 | null |
| 2025-07-23 | A Versatile Pathology Co-pilot via Reasoning Enhanced Multimodal Large Language Model | Zhe Xu et.al. | 2507.17303 | null |
| 2025-07-23 | Exploring the Potential of LLMs for Serendipity Evaluation in Recommender Systems | Li Kang et.al. | 2507.17290 | null |
| 2025-07-23 | Triple X: A LLM-Based Multilingual Speech Recognition System for the INTERSPEECH2025 MLC-SLM Challenge | Miaomiao Gao et.al. | 2507.17288 | null |
| 2025-07-23 | Fully Automated SAM for Single-source Domain Generalization in Medical Image Segmentation | Huanli Zhuo et.al. | 2507.17281 | null |
| 2025-07-23 | Leveraging Knowledge Graphs and LLM Reasoning to Identify Operational Bottlenecks for Warehouse Planning Assistance | Rishi Parekh et.al. | 2507.17273 | null |
| 2025-07-23 | Seed&Steer: Guiding Large Language Models with Compilable Prefix and Branch Signals for Unit Test Generation | Shuaiyu Zhou et.al. | 2507.17271 | null |
| 2025-07-23 | Understanding Prompt Programming Tasks and Questions | Jenny T. Liang et.al. | 2507.17264 | null |
| 2025-07-23 | Tab-MIA: A Benchmark Dataset for Membership Inference Attacks on Tabular Data in LLMs | Eyal German et.al. | 2507.17259 | null |
| 2025-07-23 | Agent Identity Evals: Measuring Agentic Identity | Elija Perrier et.al. | 2507.17257 | null |
| 2025-07-23 | Rethinking VAE: From Continuous to Discrete Representations Without Probabilistic Assumptions | Songxuan Shi et.al. | 2507.17255 | null |
| 2025-07-23 | R4ec: A Reasoning, Reflection, and Refinement Framework for Recommendation Systems | Hao Gu et.al. | 2507.17249 | null |
| 2025-07-23 | Perceptual Classifiers: Detecting Generative Images using Perceptual Features | Krishna Srikar Durbha et.al. | 2507.17240 | null |
| 2025-07-23 | MaskedCLIP: Bridging the Masked and CLIP Space for Semi-Supervised Medical Vision-Language Pre-training | Lei Zhu et.al. | 2507.17239 | null |
| 2025-07-23 | A Highly Clean Recipe Dataset with Ingredient States Annotation for State Probing Task | Mashiro Toyooka et.al. | 2507.17232 | null |
| 2025-07-23 | PIG-Nav: Key Insights for Pretrained Image Goal Navigation Models | Jiansong Wan et.al. | 2507.17220 | null |
| 2025-07-23 | The Pluralistic Moral Gap: Understanding Judgment and Value Differences between Humans and Large Language Models | Giuseppe Russo et.al. | 2507.17216 | null |
| 2025-07-23 | EFS: Evolutionary Factor Searching for Sparse Portfolio Optimization Using Large Language Models | Haochen Luo et.al. | 2507.17211 | null |
| 2025-07-23 | HypoChainer: A Collaborative System Combining LLMs and Knowledge Graphs for Hypothesis-Driven Scientific Discovery | Haoran Jiang et.al. | 2507.17209 | null |
| 2025-07-23 | Filter-And-Refine: A MLLM Based Cascade System for Industrial-Scale Video Content Moderation | Zixuan Wang et.al. | 2507.17204 | null |
| 2025-07-23 | DesignLab: Designing Slides Through Iterative Detection and Correction | Jooyeol Yun et.al. | 2507.17202 | null |
| 2025-07-23 | Vec2Face+ for Face Dataset Generation | Haiyu Wu et.al. | 2507.17192 | null |
| 2025-07-23 | LLM Meets the Sky: Heuristic Multi-Agent Reinforcement Learning for Secure Heterogeneous UAV Networks | Lijie Zheng et.al. | 2507.17188 | null |
| 2025-07-23 | SKA-Bench: A Fine-Grained Benchmark for Evaluating Structured Knowledge Understanding of LLMs | Zhiqiang Liu et.al. | 2507.17178 | null |
| 2025-07-23 | Improving LLMs' Generalized Reasoning Abilities by Graph Problems | Qifan Zhang et.al. | 2507.17168 | null |
| 2025-07-23 | Can LLMs Write CI? A Study on Automatic Generation of GitHub Actions Configurations | Taher A. Ghaleb et.al. | 2507.17165 | null |
| 2025-07-23 | DOOMGAN: High-Fidelity Dynamic Identity Obfuscation Ocular Generative Morphing | Bharath Krishnamurthy et.al. | 2507.17158 | null |
| 2025-07-23 | UNICE: Training A Universal Image Contrast Enhancer | Ruodai Cui et.al. | 2507.17157 | null |
| 2025-07-23 | CogDual: Enhancing Dual Cognition of LLMs via Reinforcement Learning with Implicit Rule-Based Rewards | Cheng Liu et.al. | 2507.17147 | null |
| 2025-07-23 | SADA: Stability-guided Adaptive Diffusion Acceleration | Ting Jiang et.al. | 2507.17135 | null |
| 2025-07-23 | Resilient Multi-Agent Negotiation for Medical Supply Chains: Integrating LLMs and Blockchain for Transparent Coordination | Mariam ALMutairi et.al. | 2507.17134 | null |
| 2025-07-23 | BrownoutServe: SLO-Aware Inference Serving under Bursty Workloads for MoE-based LLMs | Jianmin Hu et.al. | 2507.17133 | null |
| 2025-07-23 | Enabling Self-Improving Agents to Learn at Test Time With Human-In-The-Loop Guidance | Yufei He et.al. | 2507.17131 | null |
| 2025-07-23 | BucketServe: Bucket-Based Dynamic Batching for Smart and Efficient LLM Inference Serving | Wanyi Zheng et.al. | 2507.17120 | null |
| 2025-07-23 | HySafe-AI: Hybrid Safety Architectural Analysis Framework for AI Systems: A Case Study | Mandar Pitale et.al. | 2507.17118 | null |
| 2025-07-23 | Probabilistic Graphical Models: A Concise Tutorial | Jacqueline Maasch et.al. | 2507.17116 | null |
| 2025-07-23 | Enhancing Transferability and Consistency in Cross-Domain Recommendations via Supervised Disentanglement | Yuhan Wang et.al. | 2507.17112 | null |
| 2025-07-23 | Reinforcement Learning Fine-Tunes a Sparse Subnetwork in Large Language Models | Andrii Balashov et.al. | 2507.17107 | null |
| 2025-07-22 | Risk In Context: Benchmarking Privacy Leakage of Foundation Models in Synthetic Tabular Data Generation | Jessup Byun et.al. | 2507.17066 | null |
| 2025-07-22 | Parallelism Meets Adaptiveness: Scalable Documents Understanding in Multi-Agent LLM Systems | Chengxuan Xia et.al. | 2507.17061 | null |
| 2025-07-22 | Toward Scalable Video Narration: A Training-free Approach Using Multimodal Large Language Models | Tz-Ying Wu et.al. | 2507.17050 | null |
| 2025-07-22 | Controllable Hybrid Captioner for Improved Long-form Video Understanding | Kuleen Sasse et.al. | 2507.17047 | null |
| 2025-07-22 | Write, Rank, or Rate: Comparing Methods for Studying Visualization Affordances | Chase Stokes et.al. | 2507.17024 | null |
| 2025-07-22 | Causal Graph Fuzzy LLMs: A First Introduction and Applications in Time Series Forecasting | Omid Orang et.al. | 2507.17016 | null |
| 2025-07-22 | Can External Validation Tools Improve Annotation Quality for LLM-as-a-Judge? | Arduin Findeis et.al. | 2507.17015 | null |
| 2025-07-22 | Multi-Label Classification with Generative AI Models in Healthcare: A Case Study of Suicidality and Risk Factors | Ming Huang et.al. | 2507.17009 | null |
| 2025-07-22 | Bringing Balance to Hand Shape Classification: Mitigating Data Imbalance Through Generative Models | Gaston Gustavo Rios et.al. | 2507.17008 | null |
| 2025-07-22 | PyG 2.0: Scalable Learning on Real World Graphs | Matthias Fey et.al. | 2507.16991 | null |
| 2025-07-22 | Obscured but Not Erased: Evaluating Nationality Bias in LLMs via Name-Based Bias Benchmarks | Giulio Pelosio et.al. | 2507.16989 | null |
| 2025-07-22 | Leveraging Synthetic Data for Question Answering with Multilingual LLMs in the Agricultural Domain | Rishemjit Kaur et.al. | 2507.16974 | null |
| 2025-07-22 | LLM4MEA: Data-free Model Extraction Attacks on Sequential Recommenders via Large Language Models | Shilong Zhao et.al. | 2507.16969 | null |
| 2025-07-22 | Harnessing RLHF for Robust Unanswerability Recognition and Trustworthy Response Generation in LLMs | Shuyuan Lin et.al. | 2507.16951 | null |
| 2025-07-22 | AI-based Clinical Decision Support for Primary Care: A Real-World Study | Robert Korom et.al. | 2507.16947 | null |
| 2025-07-22 | AURA: A Multi-Modal Medical Agent for Understanding, Reasoning & Annotation | Nima Fathi et.al. | 2507.16940 | null |
| 2025-07-22 | SiLQ: Simple Large Language Model Quantization-Aware Training | Steven K. Esser et.al. | 2507.16933 | null |
| 2025-07-22 | Stellar Mass-Dispersion Measure Correlations Constrain Baryonic Feedback in Fast Radio Burst Host Galaxies | Calvin Leung et.al. | 2507.16816 | null |
| 2025-07-22 | LingBench++: A Linguistically-Informed Benchmark and Reasoning Framework for Multi-Step and Cross-Cultural Inference with LLMs | Da-Chen Lian et.al. | 2507.16809 | null |
| 2025-07-22 | Rethinking LLM-Based RTL Code Optimization Via Timing Logic Metamorphosis | Zhihao Xu et.al. | 2507.16808 | null |
| 2025-07-23 | Agentar-Fin-R1: Enhancing Financial Intelligence through Domain Expertise, Training Efficiency, and Advanced Reasoning | Yanjun Zheng et.al. | 2507.16802 | null |
| 2025-07-23 | Test-Time-Matching: Decouple Personality, Memory, and Linguistic Style in LLM-based Role-Playing Language Agent | Xiaoyu Zhan et.al. | 2507.16799 | null |
| 2025-07-22 | Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning | Helena Casademunt et.al. | 2507.16795 | null |
| 2025-07-22 | ChatChecker: A Framework for Dialogue System Testing and Evaluation Through Non-cooperative User Simulation | Roman Mayr et.al. | 2507.16792 | null |
| 2025-07-22 | Enhancing Domain Diversity in Synthetic Data Face Recognition with Dataset Fusion | Anjith George et.al. | 2507.16790 | null |
| 2025-07-22 | Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning | Hongyin Luo et.al. | 2507.16784 | null |
| 2025-07-22 | Cooling Matters: Benchmarking Large Language Models and Vision-Language Models on Liquid-Cooled Versus Air-Cooled H100 GPU Systems | Imran Latif et.al. | 2507.16781 | null |
| 2025-07-22 | When LLMs Copy to Think: Uncovering Copy-Guided Attacks in Reasoning LLMs | Yue Li et.al. | 2507.16773 | null |
| 2025-07-22 | WGRAMMAR: Leverage Prior Knowledge to Accelerate Structured Decoding | Ran Wang et.al. | 2507.16768 | null |
| 2025-07-22 | Never Come Up Empty: Adaptive HyDE Retrieval for Improving LLM Developer Support | Fangjian Lei et.al. | 2507.16754 | null |
| 2025-07-22 | CMP: A Composable Meta Prompt for SAM-Based Cross-Domain Few-Shot Segmentation | Shuai Chen et.al. | 2507.16753 | null |
| 2025-07-22 | Collaborative Inference and Learning between Edge SLMs and Cloud LLMs: A Survey of Algorithms, Execution, and Open Challenges | Senyao Li et.al. | 2507.16731 | null |
| 2025-07-22 | Deliberative Searcher: Improving LLM Reliability via Reinforcement Learning with constraints | Zhenyun Yin et.al. | 2507.16727 | null |
| 2025-07-22 | Enhancing Remote Sensing Vision-Language Models Through MLLM and LLM-Based High-Quality Image-Text Dataset Generation | Yiguo He et.al. | 2507.16716 | null |
| 2025-07-22 | Advancing Risk and Quality Assurance: A RAG Chatbot for Improved Regulatory Compliance | Lars Hillebrand et.al. | 2507.16711 | null |
| 2025-07-22 | Biases in LLM-Generated Musical Taste Profiles for Recommendation | Bruno Sguerra et.al. | 2507.16708 | null |
| 2025-07-22 | FISHER: A Foundation Model for Multi-Modal Industrial Signal Comprehensive Representation | Pingyi Fan et.al. | 2507.16696 | null |
| 2025-07-22 | Generating Search Explanations using Large Language Models | Arif Laksito et.al. | 2507.16692 | null |
| 2025-07-22 | PICACO: Pluralistic In-Context Value Alignment of LLMs via Total Correlation Optimization | Han Jiang et.al. | 2507.16679 | null |
| 2025-07-22 | Custom Algorithm-based Fault Tolerance for Attention Layers in Transformers | Vasileios Titopoulos et.al. | 2507.16676 | null |
| 2025-07-22 | Meta-Learning for Cold-Start Personalization in Prompt-Tuned LLMs | Yushang Zhao et.al. | 2507.16672 | null |
| 2025-07-22 | VulCoCo: A Simple Yet Effective Method for Detecting Vulnerable Code Clones | Tan Bui et.al. | 2507.16661 | null |
| 2025-07-22 | P-CoT: A Pedagogically-motivated Participatory Chain-of-Thought Prompting for Phonological Reasoning in LLMs | Dongjun Jang et.al. | 2507.16656 | null |
| 2025-07-22 | Towards Automated Regulatory Compliance Verification in Financial Auditing with Large Language Models | Armin Berger et.al. | 2507.16642 | null |
| 2025-07-22 | Step-Audio 2 Technical Report | Boyong Wu et.al. | 2507.16632 | null |
| 2025-07-22 | Automatic Fine-grained Segmentation-assisted Report Generation | Frederic Jonske et.al. | 2507.16623 | null |
| 2025-07-22 | On the Effectiveness of LLM-as-a-judge for Code Generation and Summarization | Giuseppe Crupi et.al. | 2507.16587 | null |
| 2025-07-22 | LLMxCPG: Context-Aware Vulnerability Detection Through Code Property Graph-Guided Large Language Models | Ahmed Lekssays et.al. | 2507.16585 | null |
| 2025-07-22 | From Text to Actionable Intelligence: Automating STIX Entity and Relationship Extraction | Ahmed Lekssays et.al. | 2507.16576 | null |
| 2025-07-22 | Pixels to Principles: Probing Intuitive Physics Understanding in Multimodal Language Models | Mohamad Ballout et.al. | 2507.16572 | null |
| 2025-07-22 | TTMBA: Towards Text To Multiple Sources Binaural Audio Generation | Yuxuan He et.al. | 2507.16564 | null |
| 2025-07-22 | Exploring Gender Bias in Large Language Models: An In-depth Dive into the German Language | Kristin Gnadt et.al. | 2507.16557 | null |
| 2025-07-22 | Alternative Loss Function in Evaluation of Transformer Models | Jakub Michańków et.al. | 2507.16548 | null |
| 2025-07-22 | Learning Text Styles: A Study on Transfer, Attribution, and Verification | Zhiqiang Hu et.al. | 2507.16530 | null |
| 2025-07-22 | Spatial 3D-LLM: Exploring Spatial Awareness in 3D Vision-Language Models | Xiaoyan Wang et.al. | 2507.16524 | null |
| 2025-07-22 | C2-Evo: Co-Evolving Multimodal Data and Model for Self-Improving Reasoning | Xiuwei Chen et.al. | 2507.16518 | null |
| 2025-07-22 | The Ever-Evolving Science Exam | Junying Wang et.al. | 2507.16514 | null |
| 2025-07-22 | Agentic RAG with Knowledge Graphs for Complex Multi-Hop Reasoning in Real-World Applications | Jean Lelong et.al. | 2507.16507 | null |
| 2025-07-22 | ICR Probe: Tracking Hidden State Dynamics for Reliable Hallucination Detection in LLMs | Zhenliang Zhang et.al. | 2507.16488 | null |
| 2025-07-22 | ACT: Bridging the Gap in Code Translation through Synthetic Data Generation & Adaptive Training | Shreya Saxena et.al. | 2507.16478 | null |
| 2025-07-22 | Learning Temporal Abstractions via Variational Homomorphisms in Option-Induced Abstract MDPs | Chang Li et.al. | 2507.16473 | null |
| 2025-07-22 | Towards Enforcing Company Policy Adherence in Agentic Workflows | Naama Zwerdling et.al. | 2507.16459 | null |
| 2025-07-22 | An approach to measuring the performance of Automatic Speech Recognition (ASR) models in the context of Large Language Model (LLM) powered applications | Sujith Pulikodan et.al. | 2507.16456 | null |
| 2025-07-22 | VGGT-Long: Chunk it, Loop it, Align it -- Pushing VGGT's Limits on Kilometer-scale Long RGB Sequences | Kai Deng et.al. | 2507.16443 | null |
| 2025-07-22 | Exploring Large Language Models for Analyzing and Improving Method Names in Scientific Code | Gunnar Larsen et.al. | 2507.16439 | null |
| 2025-07-22 | Identifying Pre-training Data in LLMs: A Neuron Activation-Based Detection Framework | Hongyi Tang et.al. | 2507.16414 | null |
| 2025-07-22 | GG-BBQ: German Gender Bias Benchmark for Question Answering | Shalaka Satheesh et.al. | 2507.16410 | null |
| 2025-07-22 | Improving Code LLM Robustness to Prompt Perturbations via Layer-Aware Model Editing | Shuhan Liu et.al. | 2507.16407 | null |
| 2025-07-22 | Sparse-View 3D Reconstruction: Recent Advances and Open Challenges | Tanveer Younis et.al. | 2507.16406 | null |
| 2025-07-22 | LLM-Driven Collaborative Model for Untangling Commits via Explicit and Implicit Dependency Reasoning | Bo Hou et.al. | 2507.16395 | null |
| 2025-07-22 | Are Foundation Models All You Need for Zero-shot Face Presentation Attack Detection? | Lazaro Janier Gonzalez-Sole et.al. | 2507.16393 | null |
| 2025-07-22 | A general model for frictional contacts in colloidal systems | Kay Hofmann et.al. | 2507.16388 | null |
| 2025-07-22 | Application of LLM Guided Reinforcement Learning in Formation Control with Collision Avoidance | Chenhao Yao et.al. | 2507.16382 | null |
| 2025-07-22 | Depth Gives a False Sense of Privacy: LLM Internal States Inversion | Tian Dong et.al. | 2507.16372 | null |
| 2025-07-22 | One Polyp Identifies All: One-Shot Polyp Segmentation with SAM via Cascaded Priors and Iterative Prompt Evolution | Xinyu Mao et.al. | 2507.16337 | null |
| 2025-07-22 | Re:Form -- Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny | Chuanhao Yan et.al. | 2507.16331 | null |
| 2025-07-22 | DREAM: Scalable Red Teaming for Text-to-Image Generative Systems via Distribution Modeling | Boheng Li et.al. | 2507.16329 | null |
| 2025-07-22 | M-SpecGene: Generalized Foundation Model for RGBT Multispectral Vision | Kailai Zhou et.al. | 2507.16318 | null |
| 2025-07-22 | Perovskite-R1: A Domain-Specialized LLM for Intelligent Discovery of Precursor Additives and Experimental Design | Xin-De Wang et.al. | 2507.16307 | null |
| 2025-07-22 | Talking Like a Phisher: LLM-Based Attacks on Voice Phishing Classifiers | Wenhao Li et.al. | 2507.16291 | null |
| 2025-07-22 | Dens3R: A Foundation Model for 3D Geometry Prediction | Xianze Fang et.al. | 2507.16290 | null |
| 2025-07-22 | Time to Split: Exploring Data Splitting Strategies for Offline Evaluation of Sequential Recommenders | Danil Gusak et.al. | 2507.16289 | null |
| 2025-07-22 | Beyond Label Semantics: Language-Guided Action Anatomy for Few-shot Action Recognition | Zefeng Qian et.al. | 2507.16287 | null |
| 2025-07-22 | Reducing GPU Memory Fragmentation via Spatio-Temporal Planning for Efficient Large-Scale Model Training | Zixiao Huang et.al. | 2507.16274 | null |
| 2025-07-22 | Beyond Isolated Dots: Benchmarking Structured Table Construction as Deep Knowledge Extraction | Tianyun Zhong et.al. | 2507.16271 | null |
| 2025-07-22 | iShumei-Chinchunmei at SemEval-2025 Task 4: A balanced forgetting and retention multi-task framework using effective unlearning loss | Yujian Sun et.al. | 2507.16263 | null |
| 2025-07-22 | Edge-case Synthesis for Fisheye Object Detection: A Data-centric Perspective | Seunghyeon Kim et.al. | 2507.16254 | null |
| 2025-07-22 | Efficient RL for optimizing conversation level outcomes with an LLM-based tutor | Hyunji Nam et.al. | 2507.16252 | null |
| 2025-07-22 | eX-NIDS: A Framework for Explainable Network Intrusion Detection Leveraging Large Language Models | Paul R. B. Houssel et.al. | 2507.16241 | null |
| 2025-07-22 | Scale Your Instructions: Enhance the Instruction-Following Fidelity of Unified Image Generation Model by Self-Adaptive Attention Scaling | Chao Zhou et.al. | 2507.16240 | null |
| 2025-07-22 | LLM-Enhanced Reranking for Complementary Product Recommendation | Zekun Xu et.al. | 2507.16237 | null |
| 2025-07-22 | Voice-based AI Agents: Filling the Economic Gaps in Digital Health Delivery | Bo Wen et.al. | 2507.16229 | null |
| 2025-07-22 | Distilled Large Language Model in Confidential Computing Environment for System-on-Chip Design | Dong Ben et.al. | 2507.16226 | null |
| 2025-07-22 | Towards Compute-Optimal Many-Shot In-Context Learning | Shahriar Golchin et.al. | 2507.16217 | null |
| 2025-07-22 | Advancing Visual Large Language Model for Multi-granular Versatile Perception | Wentao Xiang et.al. | 2507.16213 | null |
| 2025-07-22 | LOCOFY Large Design Models -- Design to code conversion solution | Sohaib Muhammad et.al. | 2507.16208 | null |
| 2025-07-22 | A Human-Centered Approach to Identifying Promises, Risks, & Challenges of Text-to-Image Generative AI in Radiology | Katelyn Morrison et.al. | 2507.16207 | null |
| 2025-07-22 | RealBench: Benchmarking Verilog Generation Models with Real-World IP Designs | Pengwei Jin et.al. | 2507.16200 | null |
| 2025-07-22 | WakenLLM: A Fine-Grained Benchmark for Evaluating LLM Reasoning Potential and Reasoning Process Stability | Zipeng Ling et.al. | 2507.16199 | null |
| 2025-07-22 | Do Large Language Models Have a Planning Theory of Mind? Evidence from MindGames: a Multi-Step Persuasion Task | Jared Moore et.al. | 2507.16196 | null |
| 2025-07-22 | Emergent Cognitive Convergence via Implementation: A Structured Loop Reflecting Four Theories of Mind (A Position Paper) | Myung Ho Kim et.al. | 2507.16184 | null |
| 2025-07-22 | LLM Data Selection and Utilization via Dynamic Bi-level Optimization | Yang Yu et.al. | 2507.16178 | null |
| 2025-07-22 | SpiroLLM: Finetuning Pretrained LLMs to Understand Spirogram Time Series with Clinical Validation in COPD Reporting | Shuhao Mei et.al. | 2507.16145 | null |
| 2025-07-22 | Disability Across Cultures: A Human-Centered Audit of Ableism in Western and Indic LLMs | Mahika Phutane et.al. | 2507.16130 | null |
| 2025-07-22 | Benchmarking LLM Privacy Recognition for Social Robot Decision Making | Dakota Sullivan et.al. | 2507.16124 | null |
| 2025-07-22 | PUSA V1.0: Surpassing Wan-I2V with $500 Training Cost by Vectorized Timestep Adaptation | Yaofang Liu et.al. | 2507.16116 | null |
| 2025-07-21 | Expert-Guided LLM Reasoning for Battery Discovery: From AI-Driven Hypothesis to Synthesis and Characterization | Shengchao Liu et.al. | 2507.16110 | null |
| 2025-07-21 | Efficient Compositional Multi-tasking for On-device Large Language Models | Ondrej Bohdal et.al. | 2507.16083 | null |
| 2025-07-21 | The Prompt Makes the Person(a): A Systematic Evaluation of Sociodemographic Persona Prompting for Large Language Models | Marlene Lutz et.al. | 2507.16076 | null |
| 2025-07-21 | Deep Researcher with Test-Time Diffusion | Rujun Han et.al. | 2507.16075 | null |
| 2025-07-21 | Compositional Coordination for Multi-Robot Teams with Large Language Models | Zhehui Huang et.al. | 2507.16068 | null |
| 2025-07-21 | AI-Powered Commit Explorer (APCE) | Yousab Grees et.al. | 2507.16063 | null |
| 2025-07-21 | AutoMeet: a proof-of-concept study of genAI to automate meetings in automotive engineering | Simon Baeuerle et.al. | 2507.16054 | null |
| 2025-07-21 | Making REST APIs Agent-Ready: From OpenAPI to Model Context Protocol Servers for Tool-Augmented LLMs | Meriem Mastouri et.al. | 2507.16044 | null |
| 2025-07-21 | A Pilot Study on LLM-Based Agentic Translation from Android to iOS: Pitfalls and Insights | Zhili Zeng et.al. | 2507.16037 | null |
| 2025-07-21 | From Logic to Language: A Trust Index for Problem Solving with LLMs | Tehseen Rug et.al. | 2507.16028 | null |
| 2025-07-21 | AI, Expert or Peer? -- Examining the Impact of Perceived Feedback Source on Pre-Service Teachers Feedback Perception and Uptake | Lucas Jasper Jacobsen et.al. | 2507.16013 | null |
| 2025-07-21 | Diffusion Beats Autoregressive in Data-Constrained Settings | Mihir Prabhudesai et.al. | 2507.15857 | null |
| 2025-07-21 | Latent Denoising Makes Good Visual Tokenizers | Jiawei Yang et.al. | 2507.15856 | null |
| 2025-07-21 | Gemini 2.5 Pro Capable of Winning Gold at IMO 2025 | Yichen Huang et.al. | 2507.15855 | null |
| 2025-07-21 | The Other Mind: How Language Models Exhibit Human Temporal Cognition | Lingyu Li et.al. | 2507.15851 | null |
| 2025-07-21 | 3LM: Bridging Arabic, STEM, and Code through Benchmarking | Basma El Amel Boussaha et.al. | 2507.15850 | null |
| 2025-07-21 | The Impact of Language Mixing on Bilingual LLM Reasoning | Yihao Li et.al. | 2507.15849 | null |
| 2025-07-21 | FASTGEN: Fast and Cost-Effective Synthetic Tabular Data Generation with LLMs | Anh Nguyen et.al. | 2507.15839 | null |
| 2025-07-21 | Just Ask for Music (JAM): Multimodal and Personalized Natural Language Music Recommendation | Alessandro B. Melchiorre et.al. | 2507.15826 | null |
| 2025-07-21 | ACS: An interactive framework for conformal selection | Yu Gui et.al. | 2507.15825 | null |
| 2025-07-21 | Can Your Model Separate Yolks with a Water Bottle? Benchmarking Physical Commonsense Understanding in Video Generation Models | Enes Sanli et.al. | 2507.15824 | null |
| 2025-07-21 | Do AI models help produce verified bug fixes? | Li Huang et.al. | 2507.15822 | null |
| 2025-07-21 | LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra | Seth Karten et.al. | 2507.15815 | null |
| 2025-07-21 | Diffusion models for multivariate subsurface generation and efficient probabilistic inversion | Roberto Miele et.al. | 2507.15809 | null |
| 2025-07-21 | True Multimodal In-Context Learning Needs Attention to the Visual Context | Shuo Chen et.al. | 2507.15807 | null |
| 2025-07-21 | ConformalSAM: Unlocking the Potential of Foundational Segmentation Models in Semi-Supervised Semantic Segmentation with Conformal Prediction | Danhui Chen et.al. | 2507.15803 | null |
| 2025-07-21 | Regularized Low-Rank Adaptation for Few-Shot Organ Segmentation | Ghassen Baklouti et.al. | 2507.15793 | null |
| 2025-07-21 | Small LLMs Do Not Learn a Generalizable Theory of Mind via Reinforcement Learning | Sneheel Sarangi et.al. | 2507.15788 | null |
| 2025-07-21 | Reservoir Computing as a Language Model | Felix Köster et.al. | 2507.15779 | null |
| 2025-07-21 | Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR | Jiakang Wang et.al. | 2507.15778 | null |
| 2025-07-21 | Left Leaning Models: AI Assumptions on Economic Policy | Maxim Chupilkin et.al. | 2507.15771 | null |
| 2025-07-21 | A Framework for Analyzing Abnormal Emergence in Service Ecosystems Through LLM-based Agent Intention Mining | Yifan Shen et.al. | 2507.15770 | null |
| 2025-07-21 | GasAgent: A Multi-Agent Framework for Automated Gas Optimization in Smart Contracts | Jingyi Zheng et.al. | 2507.15761 | null |
| 2025-07-21 | Understanding Large Language Models' Ability on Interdisciplinary Research | Yuanhao Shen et.al. | 2507.15736 | null |
| 2025-07-21 | Gaze-supported Large Language Model Framework for Bi-directional Human-Robot Interaction | Jens V. Rüppel et.al. | 2507.15729 | null |
| 2025-07-21 | TokensGen: Harnessing Condensed Tokens for Long Video Generation | Wenqi Ouyang et.al. | 2507.15728 | null |
| 2025-07-21 | A Practical Investigation of Spatially-Controlled Image Generation with Transformers | Guoxuan Xia et.al. | 2507.15724 | null |
| 2025-07-21 | BEnchmarking LLMs for Ophthalmology (BELO) for Ophthalmological Knowledge and Reasoning | Sahana Srinivasan et.al. | 2507.15717 | null |
| 2025-07-21 | Chinchunmei at SemEval-2025 Task 11: Boosting the Large Language Model's Capability of Emotion Perception using Contrastive Learning | Tian Li et.al. | 2507.15714 | null |
| 2025-07-21 | Is Large Language Model Performance on Reasoning Tasks Impacted by Different Ways Questions Are Asked? | Seok Hwan Song et.al. | 2507.15707 | null |
| 2025-07-21 | Estimating Rate-Distortion Functions Using the Energy-Based Model | Shitong Wu et.al. | 2507.15700 | null |
| 2025-07-21 | CoLD: Counterfactually-Guided Length Debiasing for Process Reward Models | Congmin Zheng et.al. | 2507.15698 | null |
| 2025-07-21 | Surfacing Variations to Calibrate Perceived Reliability of MLLM-generated Image Descriptions | Meng Chen et.al. | 2507.15692 | null |
| 2025-07-21 | P3: Prompts Promote Prompting | Xinyu Zhang et.al. | 2507.15675 | null |
| 2025-07-21 | BugScope: Learn to Find Bugs Like Human | Jinyao Guo et.al. | 2507.15671 | null |
| 2025-07-21 | VeriRAG: A Retrieval-Augmented Framework for Automated RTL Testability Repair | Haomin Qi et.al. | 2507.15664 | null |
| 2025-07-21 | SustainDiffusion: Optimising the Social and Environmental Sustainability of Stable Diffusion Models | Giordano d'Aloisio et.al. | 2507.15663 | null |
| 2025-07-21 | HW-MLVQA: Elucidating Multilingual Handwritten Document Understanding with a Comprehensive VQA Benchmark | Aniket Pal et.al. | 2507.15655 | null |
| 2025-07-21 | Extracting Visual Facts from Intermediate Layers for Mitigating Hallucinations in Multimodal Large Language Models | Haoran Zhou et.al. | 2507.15652 | null |
| 2025-07-21 | Data Mixing Agent: Learning to Re-weight Domains for Continual Pre-training | Kailai Yang et.al. | 2507.15640 | null |
| 2025-07-21 | DHEvo: Data-Algorithm Based Heuristic Evolution for Generalizable MILP Solving | Zhihao Zhang et.al. | 2507.15615 | null |
| 2025-07-21 | Multi-Stage Prompt Inference Attacks on Enterprise LLM Systems | Andrii Balashov et.al. | 2507.15613 | null |
| 2025-07-21 | CylinderPlane: Nested Cylinder Representation for 3D-aware Image Generation | Ru Jia et.al. | 2507.15606 | null |
| 2025-07-21 | Applying the Chinese Wall Reverse Engineering Technique to Large Language Model Code Editing | Manatsawin Hanmongkolchai et.al. | 2507.15599 | null |
| 2025-07-21 | Learning to Extract Rational Evidence via Reinforcement Learning for Retrieval-Augmented Generation | Xinping Zhao et.al. | 2507.15586 | null |
| 2025-07-21 | DynImg: Key Frames with Visual Prompts are Good Representation for Multi-Modal Video Understanding | Xiaoyi Bao et.al. | 2507.15569 | null |
| 2025-07-21 | Evaluating Text Style Transfer: A Nine-Language Benchmark for Text Detoxification | Vitaly Protasov et.al. | 2507.15557 | null |
| 2025-07-21 | Efficient Routing of Inference Requests across LLM Instances in Cloud-Edge Computing | Shibo Yu et.al. | 2507.15553 | null |
| 2025-07-21 | RankMixer: Scaling Up Ranking Models in Industrial Recommenders | Jie Zhu et.al. | 2507.15551 | null |
| 2025-07-21 | PhysGym: Benchmarking LLMs in Interactive Physics Discovery with Controlled Priors | Yimeng Chen et.al. | 2507.15550 | null |
| 2025-07-21 | LLM world models are mental: Output layer evidence of brittle world model use in LLM mechanical reasoning | Cole Robertson et.al. | 2507.15521 | null |
| 2025-07-21 | HAMLET: Hyperadaptive Agent-based Modeling for Live Embodied Theatrics | Sizhou Chen et.al. | 2507.15518 | null |
| 2025-07-21 | Step-level Verifier-guided Hybrid Test-Time Scaling for Large Language Models | Kaiyan Chang et.al. | 2507.15512 | null |
| 2025-07-21 | ASPERA: A Simulated Environment to Evaluate Planning for Complex Action Execution | Alexandru Coca et.al. | 2507.15501 | null |
| 2025-07-21 | PhishIntentionLLM: Uncovering Phishing Website Intentions through Multi-Agent Retrieval-Augmented Generation | Wenhao Li et.al. | 2507.15419 | null |
| 2025-07-21 | PDEformer-2: A Versatile Foundation Model for Two-Dimensional Partial Differential Equations | Zhanhong Ye et.al. | 2507.15409 | null |
| 2025-07-21 | PiMRef: Detecting and Explaining Ever-evolving Spear Phishing Emails with Knowledge Base Invariants | Ruofan Liu et.al. | 2507.15393 | null |
| 2025-07-21 | DAViD: Data-efficient and Accurate Vision Models from Synthetic Data | Fatemeh Saleh et.al. | 2507.15365 | null |
| 2025-07-21 | Revisiting the Effect of Grid-Following Converter on Frequency Dynamics -- Part I: Center of Inertia | Jiahao Liu et.al. | 2507.15358 | null |
| 2025-07-21 | Metaphor and Large Language Models: When Surface Features Matter More than Deep Understanding | Elisa Sanchez-Bayona et.al. | 2507.15357 | null |
| 2025-07-21 | RAD: Retrieval High-quality Demonstrations to Enhance Decision-making | Lu Guo et.al. | 2507.15356 | null |
| 2025-07-21 | Scaling Decentralized Learning with FLock | Zehua Cheng et.al. | 2507.15349 | null |
| 2025-07-21 | Probing Information Distribution in Transformer Architectures through Entropy Analysis | Amedeo Buonanno et.al. | 2507.15347 | null |
| 2025-07-21 | StackTrans: From Large Language Model to Large Pushdown Automata Model | Kechi Zhang et.al. | 2507.15343 | null |
| 2025-07-21 | Reasoning Models are Test Exploiters: Rethinking Multiple-Choice | Narun Raman et.al. | 2507.15337 | null |
| 2025-07-21 | On the Inevitability of Left-Leaning Political Bias in Aligned Language Models | Thilo Hagendorff et.al. | 2507.15328 | null |
| 2025-07-21 | BenchDepth: Are We on the Right Way to Evaluate Depth Foundation Models? | Zhenyu Li et.al. | 2507.15321 | null |
| 2025-07-21 | Butterfly Effects in Toolchains: A Comprehensive Analysis of Failed Parameter Filling in LLM Tool-Agent Systems | Qian Xiong et.al. | 2507.15296 | null |
| 2025-07-21 | A Novel Self-Evolution Framework for Large Language Models | Haoran Sun et.al. | 2507.15281 | null |
| 2025-07-21 | ChiMed 2.0: Advancing Chinese Medical Dataset in Facilitating Large Language Modeling | Yuanhe Tian et.al. | 2507.15275 | null |
| 2025-07-21 | Conditional Video Generation for High-Efficiency Video Compression | Fangqiu Yi et.al. | 2507.15269 | null |
| 2025-07-21 | IM-Chat: A Multi-agent LLM-based Framework for Knowledge Transfer in Injection Molding Industry | Junhyeong Lee et.al. | 2507.15268 | null |
| 2025-07-21 | VLM-UDMC: VLM-Enhanced Unified Decision-Making and Motion Control for Urban Autonomous Driving | Haichao Liu et.al. | 2507.15266 | null |
| 2025-07-21 | CHORDS: Diffusion Sampling Accelerator with Multi-core Hierarchical ODE Solvers | Jiaqi Han et.al. | 2507.15260 | null |
| 2025-07-21 | MEETI: A Multimodal ECG Dataset from MIMIC-IV-ECG with Signals, Images, Features and Interpretations | Deyun Zhang et.al. | 2507.15255 | null |
| 2025-07-21 | Input Reduction Enhanced LLM-based Program Repair | Boyang Yang et.al. | 2507.15251 | null |
| 2025-07-21 | FreeCus: Free Lunch Subject-driven Customization in Diffusion Transformers | Yanbing Zhang et.al. | 2507.15249 | null |
| 2025-07-21 | SPAR: Scholar Paper Retrieval with LLM-based Agents for Enhanced Academic Search | Xiaofeng Shi et.al. | 2507.15245 | null |
| 2025-07-21 | Mammo-SAE: Interpreting Breast Cancer Concept Learning with Sparse Autoencoders | Krishna Kanth Nakka et.al. | 2507.15227 | null |
| 2025-07-21 | Solving Formal Math Problems by Decomposition and Iterative Reflection | Yichi Zhou et.al. | 2507.15225 | null |
| 2025-07-21 | SimdBench: Benchmarking Large Language Models for SIMD-Intrinsic Code Generation | Yibo He et.al. | 2507.15224 | null |
| 2025-07-21 | Hierarchical Part-based Generative Model for Realistic 3D Blood Vessel | Siqi Chen et.al. | 2507.15223 | null |
| 2025-07-21 | Improving Joint Embedding Predictive Architecture with Diffusion Noise | Yuping Qiu et.al. | 2507.15216 | null |
| 2025-07-21 | Collaborative Distillation Strategies for Parameter-Efficient Language Model Deployment | Xiandong Meng et.al. | 2507.15198 | null |
| 2025-07-21 | Better Models and Algorithms for Learning Ising Models from Dynamics | Jason Gaitonde et.al. | 2507.15173 | null |
| 2025-07-20 | What Level of Automation is "Good Enough"? A Benchmark of Large Language Models for Meta-Analysis Data Extraction | Lingbo Li et.al. | 2507.15152 | null |
| 2025-07-20 | Enhancing Visual Planning with Auxiliary Tasks and Multi-token Prediction | Ce Zhang et.al. | 2507.15130 | null |
| 2025-07-20 | AnalogFed: Federated Discovery of Analog Circuit Topologies with Generative AI | Qiufeng Li et.al. | 2507.15104 | null |
| 2025-07-20 | Filling the Gap: Is Commonsense Knowledge Generation useful for Natural Language Inference? | Chathuri Jayaweera et.al. | 2507.15100 | null |
| 2025-07-20 | BleedOrigin: Dynamic Bleeding Source Localization in Endoscopic Submucosal Dissection via Dual-Stage Detection and Tracking | Mengya Xu et.al. | 2507.15094 | null |
| 2025-07-20 | A Penalty Goes a Long Way: Measuring Lexical Diversity in Synthetic Texts Under Prompt-Influenced Length Variations | Vijeta Deshpande et.al. | 2507.15092 | null |
| 2025-07-20 | Aesthetics is Cheap, Show me the Text: An Empirical Evaluation of State-of-the-Art Generative Models for OCR | Peirong Zhang et.al. | 2507.15085 | null |
| 2025-07-20 | Time-RA: Towards Time Series Reasoning for Anomaly with LLM Feedback | Yiyuan Yang et.al. | 2507.15066 | null |
| 2025-07-20 | WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization | Zhengwei Tao et.al. | 2507.15061 | null |
| 2025-07-20 | LibLMFuzz: LLM-Augmented Fuzz Target Generation for Black-box Libraries | Ian Hardgrove et.al. | 2507.15058 | null |
| 2025-07-20 | Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding | Yuanhan Zhang et.al. | 2507.15028 | null |
| 2025-07-20 | Deep Generative Models in Condition and Structural Health Monitoring: Opportunities, Limitations and Future Outlook | Xin Yang et.al. | 2507.15026 | null |
| 2025-07-20 | Survey of GenAI for Automotive Software Development: From Requirements to Executable Code | Nenad Petrovic et.al. | 2507.15025 | null |
| 2025-07-20 | RefCritic: Training Long Chain-of-Thought Critic Models with Refinement Feedback | Qiaoyu Tang et.al. | 2507.15024 | null |
| 2025-07-20 | EduThink4AI: Translating Educational Critical Thinking into Multi-Agent LLM Systems | Xinmeng Hou et.al. | 2507.15015 | null |
| 2025-07-20 | Language Integration in Fine-Tuning Multimodal Large Language Models for Image-Based Regression | Roy H. Jennings et.al. | 2507.14997 | null |
| 2025-07-18 | Franca: Nested Matryoshka Clustering for Scalable Visual Representation Learning | Shashanka Venkataramanan et.al. | 2507.14137 | null |
| 2025-07-18 | NoHumansRequired: Autonomous High-Quality Image Editing Triplet Mining | Maksim Kuprashevich et.al. | 2507.14119 | null |
| 2025-07-18 | CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning | Xiaoya Li et.al. | 2507.14111 | null |
| 2025-07-18 | Automated Interpretation of Non-Destructive Evaluation Contour Maps Using Large Language Models for Bridge Condition Assessment | Viraj Nishesh Darji et.al. | 2507.14107 | null |
| 2025-07-18 | Generative AI-Driven High-Fidelity Human Motion Simulation | Hari Iyer et.al. | 2507.14097 | null |
| 2025-07-18 | Lessons from the TREC Plain Language Adaptation of Biomedical Abstracts (PLABA) track | Brian Ondov et.al. | 2507.14096 | null |
| 2025-07-18 | DPMT: Dual Process Multi-scale Theory of Mind Framework for Real-time Human-AI Collaboration | Xiyun Li et.al. | 2507.14088 | null |
| 2025-07-18 | DENSE: Longitudinal Progress Note Generation with Temporal Modeling of Heterogeneous Clinical Notes Across Hospital Visits | Garapati Keerthana et.al. | 2507.14079 | null |
| 2025-07-18 | Foundation Models as Class-Incremental Learners for Dermatological Image Classification | Mohamed Elkhayat et.al. | 2507.14050 | null |
| 2025-07-18 | Evaluating the Effectiveness of Cost-Efficient Large Language Models in Benchmark Biomedical Tasks | Israt Jahan et.al. | 2507.14045 | null |
| 2025-07-18 | TGIF: Talker Group-Informed Familiarization of Target Speaker Extraction | Tsun-An Hsieh et.al. | 2507.14044 | null |
| 2025-07-18 | Architecting Human-AI Cocreation for Technical Services -- Interaction Modes and Contingency Factors | Jochen Wulf et.al. | 2507.14034 | null |
| 2025-07-18 | KROMA: Ontology Matching with Knowledge Retrieval and Large Language Models | Lam Nguyen et.al. | 2507.14032 | null |
| 2025-07-18 | Moodifier: MLLM-Enhanced Emotion-Driven Image Editing | Jiarong Ye et.al. | 2507.14024 | null |
| 2025-07-18 | Efficient Temporal Tokenization for Mobility Prediction with Large Language Models | Haoyu He et.al. | 2507.14017 | null |
| 2025-07-18 | Leveraging Pathology Foundation Models for Panoptic Segmentation of Melanoma in H&E Images | Jiaqi Lv et.al. | 2507.13974 | null |
| 2025-07-18 | DUALRec: A Hybrid Sequential and Language Model Framework for Context-Aware Movie Recommendation | Yitong Li et.al. | 2507.13957 | null |
| 2025-07-18 | Cross-modal Causal Intervention for Alzheimer's Disease Prediction | Yutao Jin et.al. | 2507.13956 | null |
| 2025-07-18 | Exploiting Primacy Effect To Improve Large Language Models | Bianca Raimondi et.al. | 2507.13949 | null |
| 2025-07-18 | Generalist Forecasting with Frozen Video Models via Latent Diffusion | Jacob C Walker et.al. | 2507.13942 | null |
| 2025-07-18 | Preprint: Did I Just Browse A Website Written by LLMs? | Sichang "Steven" He et.al. | 2507.13933 | null |
| 2025-07-18 | Enhancing LiDAR Point Features with Foundation Model Priors for 3D Object Detection | Yujian Mo et.al. | 2507.13899 | null |
| 2025-07-18 | Using LLMs to identify features of personal and professional skills in an open-response situational judgment test | Cole Walsh et.al. | 2507.13881 | null |
| 2025-07-18 | Large Language Models as Innovators: A Framework to Leverage Latent Space Exploration for Novelty Discovery | Mateusz Bystroński et.al. | 2507.13874 | null |
| 2025-07-18 | SPARQL Query Generation with LLMs: Measuring the Impact of Training Data Memorization and Knowledge Injection | Aleksandr Gashkov et.al. | 2507.13859 | null |
| 2025-07-18 | InTraVisTo: Inside Transformer Visualisation Tool | Nicolò Brunello et.al. | 2507.13858 | null |
| 2025-07-18 | DistFlow: A Fully Distributed RL Framework for Scalable and Efficient LLM Post-Training | Zhixin Wang et.al. | 2507.13833 | null |
| 2025-07-18 | Question-Answer Extraction from Scientific Articles Using Knowledge Graphs and Large Language Models | Hosein Azarbonyad et.al. | 2507.13827 | null |
| 2025-07-18 | RAG-based Architectures for Drug Side Effect Retrieval in LLMs | Shad Nygren et.al. | 2507.13822 | null |
| 2025-07-18 | Team of One: Cracking Complex Video QA with Model Synergy | Jun Xie et.al. | 2507.13820 | null |
| 2025-07-18 | CodeEdu: A Multi-Agent Collaborative Platform for Personalized Coding Education | Jianing Zhao et.al. | 2507.13814 | null |
| 2025-07-18 | SkySense V2: A Unified Foundation Model for Multi-modal Remote Sensing | Yingying Zhang et.al. | 2507.13812 | null |
| 2025-07-18 | On-the-Fly Fine-Tuning of Foundational Neural Network Potentials: A Bayesian Neural Network Approach | Tim Rensmeyer et.al. | 2507.13805 | null |
| 2025-07-18 | MolPIF: A Parameter Interpolation Flow Model for Molecule Generation | Yaowei Jin et.al. | 2507.13762 | null |
| 2025-07-18 | PRIDE -- Parameter-Efficient Reduction of Identity Discrimination for Equality in LLMs | Maluna Menke et.al. | 2507.13743 | null |
| 2025-07-18 | Can Synthetic Images Conquer Forgetting? Beyond Unexplored Doubts in Few-Shot Class-Incremental Learning | Junsu Kim et.al. | 2507.13739 | null |
| 2025-07-18 | DailyLLM: Context-Aware Activity Log Generation Using Multi-Modal Sensors and LLMs | Ye Tian et.al. | 2507.13737 | null |
| 2025-07-18 | The Judge Variable: Challenging Judge-Agnostic Legal Judgment Prediction | Guillaume Zambrano et.al. | 2507.13732 | null |
| 2025-07-18 | LLaPipe: LLM-Guided Reinforcement Learning for Automated Data Preparation Pipeline Construction | Jing Chang et.al. | 2507.13712 | null |
| 2025-07-18 | CogniQ-H: A Soft Hierarchical Reinforcement Learning Paradigm for Automated Data Preparation | Jing Chang et.al. | 2507.13710 | null |
| 2025-07-18 | Consistent Explainers or Unreliable Narrators? Understanding LLM-generated Group Recommendations | Cedric Waterschoot et.al. | 2507.13705 | null |
| 2025-07-18 | TopicAttack: An Indirect Prompt Injection Attack via Topic Transition | Yulin Chen et.al. | 2507.13686 | null |
| 2025-07-18 | LoopServe: An Adaptive Dual-phase LLM Inference Acceleration System for Multi-Turn Dialogues | Haoyang Li et.al. | 2507.13681 | null |
| 2025-07-18 | KiC: Keyword-inspired Cascade for Cost-Efficient Text Generation with LLMs | Woo-Chan Kim et.al. | 2507.13666 | null |
| 2025-07-18 | CU-ICU: Customizing Unsupervised Instruction-Finetuned Language Models for ICU Datasets via Text-to-Text Transfer Transformer | Teerapong Panboonyuen et.al. | 2507.13655 | null |
| 2025-07-18 | Towards channel foundation models (CFMs): Motivations, methodologies and opportunities | Jun Jiang et.al. | 2507.13637 | null |
| 2025-07-18 | Large Language Models in Cybersecurity: Applications, Vulnerabilities, and Defense Techniques | Niveen O. Jaffal et.al. | 2507.13629 | null |
| 2025-07-18 | BifrostRAG: Bridging Dual Knowledge Graphs for Multi-Hop Question Answering in Construction Safety | Yuxin Zhang et.al. | 2507.13625 | null |
| 2025-07-18 | Seed-X: Building Strong Multilingual Translation LLM with 7B Parameters | Shanbo Cheng et.al. | 2507.13618 | null |
| 2025-07-18 | Linguistic and Embedding-Based Profiling of Texts generated by Humans and Large Language Models | Sergio E. Zanotto et.al. | 2507.13614 | null |
| 2025-07-18 | CoTasks: Chain-of-Thought based Video Instruction Tuning Tasks | Yanan Wang et.al. | 2507.13609 | null |
| 2025-07-18 | GIFT: Gradient-aware Immunization of diffusion models against malicious Fine-Tuning with safe concepts retention | Amro Abdalla et.al. | 2507.13598 | null |
| 2025-07-17 | A Collaborative Framework Integrating Large Language Model and Chemical Fragment Space: Mutual Inspiration for Lead Design | Hao Tuo et.al. | 2507.13580 | null |
| 2025-07-17 | Learning Pluralistic User Preferences through Reinforcement Learning Fine-tuned Summaries | Hyunji Nam et.al. | 2507.13579 | null |
| 2025-07-17 | LLM-Based Community Surveys for Operational Decision Making in Interconnected Utility Infrastructures | Adaeze Okeukwu-Ogbonnaya et.al. | 2507.13577 | null |
| 2025-07-17 | Apple Intelligence Foundation Language Models: Tech Report 2025 | Hanzhi Zhou et.al. | 2507.13575 | null |
| 2025-07-17 | Temporal Adaptation of Pre-trained Foundation Models for Music Structure Analysis | Yixiao Zhang et.al. | 2507.13572 | null |
| 2025-07-17 | A Data-Centric Framework for Addressing Phonetic and Prosodic Challenges in Russian Speech Generative Models | Kirill Borodin et.al. | 2507.13563 | null |
| 2025-07-17 | Demystifying Feature Requests: Leveraging LLMs to Refine Feature Requests in Open-Source Software | Pragyan K C et.al. | 2507.13555 | null |
| 2025-07-17 | GOFAI meets Generative AI: Development of Expert Systems by means of Large Language Models | Eduardo C. Garrido-Merchán et.al. | 2507.13550 | null |
| 2025-07-17 | A Computational Approach to Modeling Conversational Systems: Analyzing Large-Scale Quasi-Patterned Dialogue Flows | Mohamed Achref Ben Ammar et.al. | 2507.13544 | null |
| 2025-07-17 | Provable Low-Frequency Bias of In-Context Learning of Representations | Yongyi Yang et.al. | 2507.13540 | null |
| 2025-07-17 | Revisiting Prompt Engineering: A Comprehensive Evaluation for LLM-based Personalized Recommendation | Genki Kusano et.al. | 2507.13525 | null |
| 2025-07-17 | Humans learn to prefer trustworthy AI over human partners | Yaomin Jiang et.al. | 2507.13524 | null |
| 2025-07-17 | GraphTrafficGPT: Enhancing Traffic Management Through Graph-Based AI Agent Coordination | Nabil Abdelaziz Ferhat Taleb et.al. | 2507.13511 | null |
| 2025-07-17 | Fake or Real: The Impostor Hunt in Texts for Space Operations | Agata Kaczmarek et.al. | 2507.13508 | null |
| 2025-07-17 | Revisiting LLM Value Probing Strategies: Are They Robust and Expressive? | Siqi Shen et.al. | 2507.13490 | null |
| 2025-07-17 | Paper Summary Attack: Jailbreaking LLMs through LLM Safety Papers | Liang Lin et.al. | 2507.13474 | null |
| 2025-07-17 | ERR@HRI 2.0 Challenge: Multimodal Detection of Errors and Failures in Human-Robot Conversations | Shiye Cao et.al. | 2507.13468 | null |
| 2025-07-17 | "PhyWorldBench": A Comprehensive Evaluation of Physical Realism in Text-to-Video Models | Jing Gu et.al. | 2507.13428 | null |
| 2025-07-17 | VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding | Shihao Wang et.al. | 2507.13353 | null |
| 2025-07-17 | Hierarchical Rectified Flow Matching with Mini-Batch Couplings | Yichi Zhang et.al. | 2507.13350 | null |
| 2025-07-17 | Imbalance in Balance: Online Concept Balancing in Generation Models | Yukai Shi et.al. | 2507.13345 | null |
| 2025-07-17 | Comparing Apples to Oranges: A Dataset & Analysis of LLM Humour Understanding from Traditional Puns to Topical Jokes | Tyler Loakman et.al. | 2507.13335 | null |
| 2025-07-17 | A Survey of Context Engineering for Large Language Models | Lingrui Mei et.al. | 2507.13334 | null |
| 2025-07-17 | The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner | Zhouqi Hua et.al. | 2507.13332 | null |
| 2025-07-17 | GeoReg: Weight-Constrained Few-Shot Regression for Socio-Economic Estimation using LLM | Kyeongjin Ahn et.al. | 2507.13323 | null |
| 2025-07-17 | Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark | Junsu Kim et.al. | 2507.13314 | null |
| 2025-07-17 | The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations | Carlos Arriaga et.al. | 2507.13302 | null |
| 2025-07-17 | AbGen: Evaluating Large Language Models in Ablation Study Design and Evaluation for Scientific Research | Yilun Zhao et.al. | 2507.13300 | null |
| 2025-07-17 | Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management | Luis Gasco et.al. | 2507.13275 | null |
| 2025-07-17 | Automating Steering for Safe Multimodal Large Language Models | Lyucheng Wu et.al. | 2507.13255 | null |
| 2025-07-17 | RemVerse: Supporting Reminiscence Activities for Older Adults through AI-Assisted Virtual Reality | Ruohao Li et.al. | 2507.13247 | null |
| 2025-07-17 | HATS: Hindi Analogy Test Set for Evaluating Reasoning in Large Language Models | Ashray Gupta et.al. | 2507.13238 | null |
| 2025-07-17 | Enhancing Cross-task Transfer of Large Language Models via Activation Steering | Xinyu Tang et.al. | 2507.13236 | null |
| 2025-07-17 | VITA: Vision-to-Action Flow Matching Policy | Dechen Gao et.al. | 2507.13231 | null |
| 2025-07-18 | MoTM: Towards a Foundation Model for Time Series Imputation based on Continuous Modeling | Etienne Le Naour et.al. | 2507.13207 | null |
| 2025-07-18 | Automatically assessing oral narratives of Afrikaans and isiXhosa children | Retief Louw et.al. | 2507.13205 | null |
| 2025-07-17 | Black Box Deployed -- Functional Criteria for Artificial Moral Agents in the LLM Era | Matthew E. Brophy et.al. | 2507.13175 | null |
| 2025-07-17 | SHIELD: A Secure and Highly Enhanced Integrated Learning for Robust Deepfake Detection against Adversarial Attacks | Kutub Uddin et.al. | 2507.13170 | null |
| 2025-07-17 | Online Rounding for Set Cover under Subset Arrivals | Jarosław Byrka et.al. | 2507.13159 | null |
| 2025-07-17 | Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities | Hao Sun et.al. | 2507.13158 | null |
| 2025-07-17 | Multi-population GAN Training: Analyzing Co-Evolutionary Algorithms | Walter P. Casas et.al. | 2507.13157 | null |
| 2025-07-17 | SE-VLN: A Self-Evolving Vision-Language Navigation Framework Based on Multimodal Large Language Models | Xiangyu Dong et.al. | 2507.13152 | null |
| 2025-07-17 | DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model | Maulana Bisyir Azhari et.al. | 2507.13145 | null |
| 2025-07-17 | RIDAS: A Multi-Agent Framework for AI-RAN with Representation- and Intention-Driven Agents | Kuiyuan Ding et.al. | 2507.13140 | null |
| 2025-07-17 | Detecting LLM-generated Code with Subtle Modification by Adversarial Training | Xin Yin et.al. | 2507.13123 | null |
| 2025-07-17 | A Computational Framework to Identify Self-Aspects in Text | Jaya Caporusso et.al. | 2507.13115 | null |
| 2025-07-17 | R^2MoE: Redundancy-Removal Mixture of Experts for Lifelong Concept Learning | Xiaohan Guo et.al. | 2507.13107 | null |
| 2025-07-17 | Intelligent Virtual Sonographer (IVS): Enhancing Physician-Robot-Patient Communication | Tianyu Song et.al. | 2507.13052 | null |
| 2025-07-17 | MAD-Spear: A Conformity-Driven Prompt Injection Attack on Multi-Agent Debate Systems | Yu Cui et.al. | 2507.13038 | null |
| 2025-07-17 | Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities | Liuyi Wang et.al. | 2507.13019 | null |
| 2025-07-17 | Teach Old SAEs New Domain Tricks with Boosting | Nikita Koriagin et.al. | 2507.12990 | null |
| 2025-07-17 | A Distributed Generative AI Approach for Heterogeneous Multi-Domain Environments under Data Sharing constraints | Youssef Tawfilis et.al. | 2507.12979 | null |
| 2025-07-17 | UniSLU: Unified Spoken Language Understanding from Heterogeneous Cross-Task Datasets | Zhichao Sheng et.al. | 2507.12951 | null |
| 2025-07-17 | Insights into a radiology-specialised multimodal large language model with sparse autoencoders | Kenza Bouzid et.al. | 2507.12950 | null |
| 2025-07-17 | Probabilistic Soundness Guarantees in LLM Reasoning Chains | Weiqiu You et.al. | 2507.12948 | null |
| 2025-07-17 | Analysis of Image-and-Text Uncertainty Propagation in Multimodal Large Language Models with Cardiac MR-Based Applications | Yucheng Tang et.al. | 2507.12945 | null |
| 2025-07-17 | Unleashing Vision Foundation Models for Coronary Artery Segmentation: Parallel ViT-CNN Encoding and Variational Fusion | Caixia Dong et.al. | 2507.12938 | null |
| 2025-07-17 | Argus: Leveraging Multiview Images for Improved 3-D Scene Understanding With Large Language Models | Yifan Xu et.al. | 2507.12916 | null |
| 2025-07-17 | Agentar-DeepFinance-300K: A Large-Scale Financial Dataset via Systematic Chain-of-Thought Synthesis Optimization | Xiaoke Zhao et.al. | 2507.12901 | null |
| 2025-07-17 | Generalist Bimanual Manipulation via Foundation Video Diffusion Models | Yao Feng et.al. | 2507.12898 | null |
| 2025-07-17 | DiffRhythm+: Controllable and Flexible Full-Length Song Generation with Preference Optimization | Huakang Chen et.al. | 2507.12890 | null |
| 2025-07-17 | VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks | Jian Yao et.al. | 2507.12885 | null |
| 2025-07-17 | Generative Multi-Target Cross-Domain Recommendation | Jinqiu Jin et.al. | 2507.12871 | null |
| 2025-07-17 | Supervised Fine Tuning on Curated Data is Reinforcement Learning (and can be improved) | Chongli Qin et.al. | 2507.12856 | null |
| 2025-07-17 | DEMONSTRATE: Zero-shot Language to Robotic Control via Multi-task Demonstration Learning | Rahel Rickenbach et.al. | 2507.12855 | null |
| 2025-07-17 | AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning | Yiming Ren et.al. | 2507.12841 | null |
| 2025-07-17 | Bridging the Gap: Leveraging Retrieval-Augmented Generation to Better Understand Public Concerns about Vaccines | Muhammad Javed et.al. | 2507.12840 | null |
| 2025-07-17 | MCoT-RE: Multi-Faceted Chain-of-Thought and Re-Ranking for Training-Free Zero-Shot Composed Image Retrieval | Jeong-Woo Park et.al. | 2507.12819 | null |
| 2025-07-17 | Large Language Models' Internal Perception of Symbolic Music | Andrew Shin et.al. | 2507.12808 | null |
| 2025-07-17 | Semantic-guided Fine-tuning of Foundation Model for Long-tailed Visual Recognition | Yufei Peng et.al. | 2507.12807 | null |
| 2025-07-17 | MCPEval: Automatic MCP-based Deep Evaluation for AI Agent Models | Zhiwei Liu et.al. | 2507.12806 | null |
| 2025-07-17 | DeQA-Doc: Adapting DeQA-Score to Document Image Quality Assessment | Junjie Gao et.al. | 2507.12796 | null |
| 2025-07-17 | Learning Robust Negation Text Representations | Thinh Hung Truong et.al. | 2507.12782 | null |
| 2025-07-17 | A Comprehensive Survey of Electronic Health Record Modeling: From Deep Learning Approaches to Large Language Models | Weijieying Ren et.al. | 2507.12774 | null |
| 2025-07-17 | Local Representative Token Guided Merging for Text-to-Image Generation | Min-Jeong Lee et.al. | 2507.12771 | null |
| 2025-07-17 | Think-Before-Draw: Decomposing Emotion Semantics & Fine-Grained Controllable Expressive Talking Head Generation | Hanlei Shi et.al. | 2507.12761 | null |
| 2025-07-17 | osmAG-LLM: Zero-Shot Open-Vocabulary Object Navigation via Semantic Maps and Large Language Models Reasoning | Fujing Xie et.al. | 2507.12753 | null |
| 2025-07-17 | Multimodal-Guided Dynamic Dataset Pruning for Robust and Efficient Data-Centric Learning | Suorong Yang et.al. | 2507.12750 | null |
| 2025-07-17 | Strategy Adaptation in Large Language Model Werewolf Agents | Fuya Nakamori et.al. | 2507.12732 | null |
| 2025-07-17 | PinFM: Foundation Model for User Activity Sequences at a Billion-scale Visual Discovery Platform | Xiangyi Chen et.al. | 2507.12704 | null |
| 2025-07-17 | Pixel Perfect MegaMed: A Megapixel-Scale Vision-Language Foundation Model for Generating High Resolution Medical Images | Zahra TehraniNasab et.al. | 2507.12698 | null |
| 2025-07-16 | Improving Drug Identification in Overdose Death Surveillance using Large Language Models | Arthur J. Funnell et.al. | 2507.12679 | null |
| 2025-07-16 | ParaStudent: Generating and Evaluating Realistic Student Code by Teaching LLMs to Struggle | Mihran Miroyan et.al. | 2507.12674 | null |
| 2025-07-16 | The first open machine translation system for the Chechen language | Abu-Viskhan A. Umishov et.al. | 2507.12672 | null |
| 2025-07-16 | Single Conversation Methodology: A Human-Centered Protocol for AI-Assisted Software Development | Salvador D. Escobedo et.al. | 2507.12665 | null |
| 2025-07-16 | VLMgineer: Vision Language Models as Robotic Toolsmiths | George Jiayuan Gao et.al. | 2507.12644 | null |
| 2025-07-16 | NLI4VolVis: Natural Language Interaction for Volume Visualization via LLM Multi-Agents and Editable 3D Gaussian Splatting | Kuangshi Ai et.al. | 2507.12621 | null |
| 2025-07-16 | BootSeer: Analyzing and Mitigating Initialization Bottlenecks in Large-Scale LLM Training | Rui Li et.al. | 2507.12619 | null |
| 2025-07-16 | Learning What Matters: Probabilistic Task Selection via Mutual Information for Model Finetuning | Prateek Chanda et.al. | 2507.12612 | null |
| 2025-07-16 | Enhancing In-Domain and Out-Domain EmoFake Detection via Cooperative Multilingual Speech Foundation Models | Orchid Chetia Phukan et.al. | 2507.12595 | null |
| 2025-07-16 | Assay2Mol: large language model-based drug design using BioAssay context | Yifan Deng et.al. | 2507.12574 | null |
| 2025-07-16 | Mono-InternVL-1.5: Towards Cheaper and Faster Monolithic Multimodal Large Language Models | Gen Luo et.al. | 2507.12566 | null |
| 2025-07-17 | PhysX: Physical-Grounded 3D Asset Generation | Ziang Cao et.al. | 2507.12465 | null |
| 2025-07-16 | CytoSAE: Interpretable Cell Embeddings for Hematology | Muhammed Furkan Dasdelen et.al. | 2507.12464 | null |
| 2025-07-16 | Mitigating Object Hallucinations via Sentence-Level Early Intervention | Shangpin Peng et.al. | 2507.12455 | null |
| 2025-07-16 | Can We Predict Alignment Before Models Finish Thinking? Towards Monitoring Misaligned Reasoning Models | Yik Siu Chan et.al. | 2507.12428 | null |
| 2025-07-16 | Advancing Retrieval-Augmented Generation for Structured Enterprise and Internal Data | Chandana Cheerla et.al. | 2507.12425 | null |
| 2025-07-16 | SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories? | Xinyi He et.al. | 2507.12415 | null |
| 2025-07-16 | Modeling Feasible Locomotion of Nanobots for Cancer Detection and Treatment | Noble Harasha et.al. | 2507.12400 | null |
| 2025-07-16 | Assessing the Value of Visual Input: A Benchmark of Multimodal Large Language Models for Robotic Path Planning | Jacinto Colan et.al. | 2507.12391 | null |
| 2025-07-16 | Web-Browsing LLMs Can Access Social Media Profiles and Infer User Demographics | Meysam Alizadeh et.al. | 2507.12372 | null |
| 2025-07-16 | Beyond Single Models: Enhancing LLM Detection of Ambiguity in Requests through Debate | Ana Davila et.al. | 2507.12370 | null |
| 2025-07-16 | GitChameleon: Evaluating AI Code Generation Against Python Library Version Incompatibilities | Diganta Misra et.al. | 2507.12367 | null |
| 2025-07-16 | Thought Purity: Defense Paradigm For Chain-of-Thought Attack | Zihao Xue et.al. | 2507.12314 | null |
| 2025-07-16 | Chain-of-Descriptions: Improving Code LLMs for VHDL Code Generation and Summarization | Prashanth Vijayaraghavan et.al. | 2507.12308 | null |
| 2025-07-16 | Humans are more gullible than LLMs in believing common psychological myths | Bevan Koopman et.al. | 2507.12296 | null |
| 2025-07-16 | Text-ADBench: Text Anomaly Detection Benchmark based on LLMs Embedding | Feng Xiao et.al. | 2507.12295 | null |
| 2025-07-16 | SHACL Validation in the Presence of Ontologies: Semantics and Rewriting Techniques | Anouk Oudshoorn et.al. | 2507.12286 | null |
| 2025-07-16 | FADE: Adversarial Concept Erasure in Flow Models | Zixuan Fu et.al. | 2507.12283 | null |
| 2025-07-17 | Next-Gen Museum Guides: Autonomous Navigation and Visitor Interaction with an Agentic Robot | Luca Garello et.al. | 2507.12273 | null |
| 2025-07-16 | Improving Contextual ASR via Multi-grained Fusion with Large Language Models | Shilin Zhou et.al. | 2507.12252 | null |
| 2025-07-16 | Generate to Ground: Multimodal Text Conditioning Boosts Phrase Grounding in Medical Vision-Language Models | Felix Nützel et.al. | 2507.12236 | null |
| 2025-07-16 | MGFFD-VLM: Multi-Granularity Prompt Learning for Face Forgery Detection with VLM | Tao Chen et.al. | 2507.12232 | null |
| 2025-07-16 | Xiangqi-R1: Enhancing Spatial Strategic Reasoning in LLMs for Chinese Chess via Reinforcement Learning | Yuhao Chen et.al. | 2507.12215 | null |
| 2025-07-16 | Draw an Ugly Person An Exploration of Generative AIs Perceptions of Ugliness | Garyoung Kim et.al. | 2507.12212 | null |
| 2025-07-16 | BuildEvo: Designing Building Energy Consumption Forecasting Heuristics via LLM-driven Evolution | Subin Lin et.al. | 2507.12207 | null |
| 2025-07-16 | Toward Efficient SpMV in Sparse LLMs via Block Extraction and Compressed Storage | Junqing Lin et.al. | 2507.12205 | null |
| 2025-07-16 | RODS: Robust Optimization Inspired Diffusion Sampling for Detecting and Reducing Hallucination in Generative Models | Yiqi Tian et.al. | 2507.12201 | null |
| 2025-07-16 | Multi-Component VAE with Gaussian Markov Random Field | Fouad Oubari et.al. | 2507.12165 | null |
| 2025-07-16 | PRISM: Distributed Inference for Foundation Models at Edge | Muhammad Azlan Qazi et.al. | 2507.12145 | null |
| 2025-07-16 | Overview of the Sensemaking Task at the ELOQUENT 2025 Lab: LLMs as Teachers, Students and Evaluators | Pavel Šindelář et.al. | 2507.12143 | null |
| 2025-07-16 | RiemannLoRA: A Unified Riemannian Framework for Ambiguity-Free LoRA Optimization | Vladimir Bogachev et.al. | 2507.12142 | null |
| 2025-07-16 | Room Impulse Response Generation Conditioned on Acoustic Parameters | Silvia Arellano et.al. | 2507.12136 | null |
| 2025-07-16 | Iterative Augmentation with Summarization Refinement (IASR) Evaluation for Unstructured Survey data Modeling and Analysis | Payal Bhattad et.al. | 2507.12126 | null |
| 2025-07-16 | Open-Vocabulary Indoor Object Grounding with 3D Hierarchical Scene Graph | Sergey Linok et.al. | 2507.12123 | null |
| 2025-07-16 | DeepShade: Enable Shade Simulation by Text-conditioned Image Generation | Longchao Da et.al. | 2507.12103 | null |
| 2025-07-16 | LLAMA: Multi-Feedback Smart Contract Fuzzing Framework with LLM-Guided Seed Generation | Keke Gai et.al. | 2507.12084 | null |
| 2025-07-16 | Findings of MEGA: Maths Explanation with LLMs using the Socratic Method for Active Learning | Tosin Adewumi et.al. | 2507.12079 | null |
| 2025-07-16 | Evaluating the Ability of Large Language Models to Reason about Cardinal Directions, Revisited | Anthony G Cohn et.al. | 2507.12059 | null |
| 2025-07-16 | FloGAN: Scenario-Based Urban Mobility Flow Generation via Conditional GANs and Dynamic Region Decoupling | Seanglidet Yean et.al. | 2507.12053 | null |
| 2025-07-16 | A Comparative Approach to Assessing Linguistic Creativity of Large Language Models and Humans | Anca Dinu et.al. | 2507.12039 | null |
| 2025-07-16 | 3D-MoRe: Unified Modal-Contextual Reasoning for Embodied Question Answering | Rongtao Xu et.al. | 2507.12026 | null |
| 2025-07-16 | EME-TTS: Unlocking the Emphasis and Emotion Link in Speech Synthesis | Haoxun Li et.al. | 2507.12015 | null |
| 2025-07-16 | DSSD: Efficient Edge-Device Deployment and Collaborative Inference via Distributed Split Speculative Decoding | Jiahong Ning et.al. | 2507.12000 | null |
| 2025-07-16 | Can LLMs Find Fraudsters? Multi-level LLM Enhanced Graph Fraud Detection | Tairan Huang et.al. | 2507.11997 | null |
| 2025-07-16 | Robust Planning for Autonomous Vehicles with Diffusion-Based Failure Samplers | Juanran Wang et.al. | 2507.11991 | null |
| 2025-07-16 | Aime: Towards Fully-Autonomous Multi-Agent Framework | Yexuan Shi et.al. | 2507.11988 | null |
| 2025-07-16 | Simplifications are Absolutists: How Simplified Language Reduces Word Sense Awareness in LLM-Generated Definitions | Lukas Ellinger et.al. | 2507.11981 | null |
| 2025-07-16 | Value-Based Large Language Model Agent Simulation for Mutual Evaluation of Trust and Interpersonal Closeness | Yuki Sakamoto et.al. | 2507.11979 | null |
| 2025-07-16 | Graph Representations for Reading Comprehension Analysis using Large Language Model and Eye-Tracking Biomarker | Yuhong Zhang et.al. | 2507.11972 | null |
| 2025-07-16 | Watch, Listen, Understand, Mislead: Tri-modal Adversarial Attacks on Short Videos for Content Appropriateness Evaluation | Sahid Hossain Mustakim et.al. | 2507.11968 | null |
| 2025-07-16 | Toxicity-Aware Few-Shot Prompting for Low-Resource Singlish Translation | Ziyu Ge et.al. | 2507.11966 | null |
| 2025-07-16 | PoTPTQ: A Two-step Power-of-Two Post-training for LLMs | Xinyu Wang et.al. | 2507.11959 | null |
| 2025-07-16 | The benefits of query-based KGQA systems for complex and temporal questions in LLM era | Artem Alekseev et.al. | 2507.11954 | null |
| 2025-07-16 | BlockBPE: Parallel BPE Tokenization | Amos You et.al. | 2507.11941 | null |
| 2025-07-16 | A Multi-Level Similarity Approach for Single-View Object Grasping: Matching, Planning, and Fine-Tuning | Hao Chen et.al. | 2507.11938 | null |
| 2025-07-16 | A Survey of Deep Learning for Geometry Problem Solving | Jianzhe Ma et.al. | 2507.11936 | null |
| 2025-07-16 | Hyperphantasia: A Benchmark for Evaluating the Mental Visualization Capabilities of Multimodal LLMs | Mohammad Shahab Sepehri et.al. | 2507.11932 | null |
| 2025-07-16 | From Generative to Episodic: Sample-Efficient Replicable Reinforcement Learning | Max Hopkins et.al. | 2507.11926 | null |
| 2025-07-16 | Marco-Bench-MIF: On Multilingual Instruction-Following Capability of Large Language Models | Bo Zeng et.al. | 2507.11882 | null |
| 2025-07-16 | DualReward: A Dynamic Reinforcement Learning Framework for Cloze Tests Distractor Generation | Tianyou Huang et.al. | 2507.11875 | null |
| 2025-07-16 | CosmoFlow: Scale-Aware Representation Learning for Cosmology with Flow Matching | Sidharth Kannan et.al. | 2507.11842 | null |
| 2025-07-16 | The Evolving Role of Large Language Models in Scientific Innovation: Evaluator, Collaborator, and Scientist | Haoxuan Zhang et.al. | 2507.11810 | null |
| 2025-07-16 | Tracing Facts or just Copies? A critical investigation of the Competitions of Mechanisms in Large Language Models | Dante Campregher et.al. | 2507.11809 | null |
| 2025-07-15 | Enforcing Latent Euclidean Geometry in Single-Cell VAEs for Manifold Interpolation | Alessandro Palma et.al. | 2507.11789 | null |
| 2025-07-15 | Foundation Models for Brain Signals: A Critical Review of Current Progress and Future Directions | Gayal Kuruppu et.al. | 2507.11783 | null |
| 2025-07-15 | Large-scale distributed synchronization systems, using a cancel-on-completion redundancy mechanism | Alexander Stolyar et.al. | 2507.11779 | null |
| 2025-07-15 | Scaling laws for activation steering with Llama 2 models and refusal mechanisms | Sheikh Abdur Raheem Ali et.al. | 2507.11771 | null |
| 2025-07-15 | LLMs are Bayesian, in Expectation, not in Realization | Leon Chlon et.al. | 2507.11768 | null |
| 2025-07-15 | Beyond Task-Specific Reasoning: A Unified Conditional Generative Framework for Abstract Visual Reasoning | Fan Shi et.al. | 2507.11761 | null |
| 2025-07-15 | CRABS: A syntactic-semantic pincer strategy for bounding LLM interpretation of Python notebooks | Meng Li et.al. | 2507.11742 | null |
| 2025-07-15 | Auto-Formulating Dynamic Programming Problems with Large Language Models | Chenyu Zhou et.al. | 2507.11737 | null |
| 2025-07-15 | Subgraph Generation for Generalizing on Out-of-Distribution Links | Jay Revolinsky et.al. | 2507.11710 | null |
| 2025-07-15 | MetaLint: Generalizable Idiomatic Code Quality Analysis through Instruction-Following and Easy-to-Hard Generalization | Atharva Naik et.al. | 2507.11687 | null |
| 2025-07-15 | Let's Think in Two Steps: Mitigating Agreement Bias in MLLMs with Self-Grounded Verification | Moises Andrade et.al. | 2507.11662 | null |
| 2025-07-15 | Deep Generative Methods and Tire Architecture Design | Fouad Oubari et.al. | 2507.11639 | null |
| 2025-07-15 | Interpretable Prediction of Lymph Node Metastasis in Rectal Cancer MRI Using Variational Autoencoders | Benjamin Keel et.al. | 2507.11638 | null |
| 2025-07-15 | MapIQ: Benchmarking Multimodal Large Language Models for Map Question Answering | Varun Srivastava et.al. | 2507.11625 | null |
| 2025-07-15 | k-Contextuality as a Heuristic for Memory Separations in Learning | Mariesa H. Teo et.al. | 2507.11604 | null |
| 2025-07-15 | SToFM: a Multi-scale Foundation Model for Spatial Transcriptomics | Suyuan Zhao et.al. | 2507.11588 | null |
| 2025-07-15 | Towards Depth Foundation Model: Recent Trends in Vision-Based Depth Estimation | Zhen Xu et.al. | 2507.11540 | null |
| 2025-07-15 | Streaming 4D Visual Geometry Transformer | Dong Zhuo et.al. | 2507.11539 | null |
| 2025-07-15 | DrafterBench: Benchmarking Large Language Models for Tasks Automation in Civil Engineering | Yinsheng Li et.al. | 2507.11527 | null |
| 2025-07-15 | LLM-based ambiguity detection in natural language instructions for collaborative surgical robots | Ana Davila et.al. | 2507.11525 | null |
| 2025-07-15 | AirLLM: Diffusion Policy-based Adaptive LoRA for Remote Fine-Tuning of LLM over the Air | Shiyi Yang et.al. | 2507.11515 | null |
| 2025-07-15 | HUG-VAS: A Hierarchical NURBS-Based Generative Model for Aortic Geometry Synthesis and Controllable Editing | Pan Du et.al. | 2507.11474 | null |
| 2025-07-15 | LRMR: LLM-Driven Relational Multi-node Ranking for Lymph Node Metastasis Assessment in Rectal Cancer | Yaoxian Dong et.al. | 2507.11457 | null |
| 2025-07-15 | Implementing Adaptations for Vision AutoRegressive Model | Kaif Shaikh et.al. | 2507.11441 | null |
| 2025-07-15 | Towards Reliable Objective Evaluation Metrics for Generative Singing Voice Separation Models | Paul A. Bereuter et.al. | 2507.11427 | null |
| 2025-07-16 | Reasoning Strategies in Large Language Models: Can They Follow, Prefer, and Optimize? | Yanjian Zhang et.al. | 2507.11423 | null |
| 2025-07-15 | Quantifying the Energy Consumption and Carbon Emissions of LLM Inference via Simulations | Miray Özcan et.al. | 2507.11417 | null |
| 2025-07-15 | Seq vs Seq: An Open Suite of Paired Encoders and Decoders | Orion Weller et.al. | 2507.11412 | null |
| 2025-07-15 | KisMATH: Do LLMs Have Knowledge of Implicit Structures in Mathematical Reasoning? | Soumadeep Saha et.al. | 2507.11408 | null |
| 2025-07-15 | EXAONE 4.0: Unified Large Language Models Integrating Non-reasoning and Reasoning Modes | LG AI Research et.al. | 2507.11407 | null |
| 2025-07-15 | DCR: Quantifying Data Contamination in LLMs Evaluation | Cheng Xu et.al. | 2507.11405 | null |
| 2025-07-15 | Step-wise Policy for Rare-tool Knowledge (SPaRK): Offline RL that Drives Diverse Tool Use in LLMs | Gabriel Bo et.al. | 2507.11371 | null |
| 2025-07-15 | From Chaos to Automation: Enabling the Use of Unstructured Data for Robotic Process Automation | Kelly Kurowski et.al. | 2507.11364 | null |
| 2025-07-15 | What is the Best Process Model Representation? A Comparative Analysis for Process Modeling with Large Language Models | Alexis Brissard et.al. | 2507.11356 | null |
| 2025-07-15 | Foundation Models for Logistics: Toward Certifiable, Conversational Planning Interfaces | Yunhao Yang et.al. | 2507.11352 | null |
| 2025-07-15 | RefModel: Detecting Refactorings using Foundation Models | Pedro Simões et.al. | 2507.11346 | null |
| 2025-07-15 | Guiding LLM Decision-Making with Fairness Reward Models | Zara Hall et.al. | 2507.11344 | null |
| 2025-07-15 | MonoMVSNet: Monocular Priors Guided Multi-View Stereo Network | Jianfei Jiang et.al. | 2507.11333 | null |
| 2025-07-15 | Automated Novelty Evaluation of Academic Paper: A Collaborative Approach Integrating Human and Large Language Model Knowledge | Wenqing Wu et.al. | 2507.11330 | null |
| 2025-07-15 | Internal Value Alignment in Large Language Models through Controlled Value Vector Activation | Haoran Jin et.al. | 2507.11316 | null |
| 2025-07-15 | LRCTI: A Large Language Model-Based Framework for Multi-Step Evidence Retrieval and Reasoning in Cyber Threat Intelligence Credibility Verification | Fengxiao Tang et.al. | 2507.11310 | null |
| 2025-07-15 | Dr.Copilot: A Multi-Agent Prompt Optimized Assistant for Improving Patient-Doctor Communication in Romanian | Andrei Niculae et.al. | 2507.11299 | null |
| 2025-07-15 | Opus: A Prompt Intention Framework for Complex Workflow Generation | Théo Fagnoni et.al. | 2507.11288 | null |
| 2025-07-15 | Taming Uncertainty via Automation: Observing, Analyzing, and Optimizing Agentic AI Systems | Dany Moshkovich et.al. | 2507.11277 | null |
| 2025-07-15 | FMC: Formalization of Natural Language Mathematical Competition Problems | Jiaxuan Xie et.al. | 2507.11275 | null |
| 2025-07-15 | KV-Latent: Dimensional-level KV Cache Reduction with Frequency-aware Rotary Positional Embedding | Luohe Shi et.al. | 2507.11273 | null |
| 2025-07-15 | An Empirical Study of Multi-Agent RAG for Real-World University Admissions Counseling | Anh Nguyen-Duc et.al. | 2507.11272 | null |
| 2025-07-15 | MFGDiffusion: Mask-Guided Smoke Synthesis for Enhanced Forest Fire Detection | Guanghao Wu et.al. | 2507.11252 | null |
| 2025-07-15 | Generative Click-through Rate Prediction with Applications to Search Advertising | Lingwei Kong et.al. | 2507.11246 | null |
| 2025-07-15 | NarrLV: Towards a Comprehensive Narrative-Centric Evaluation for Long Video Generation Models | X. Feng et.al. | 2507.11245 | null |
| 2025-07-15 | Sparse Autoencoders Can Capture Language-Specific Concepts Across Diverse Languages | Lyzander Marciano Andrylie et.al. | 2507.11230 | null |
| 2025-07-15 | An Agentic Flow for Finite State Machine Extraction using Prompt Chaining | Fares Wael et.al. | 2507.11222 | null |
| 2025-07-15 | EsBBQ and CaBBQ: The Spanish and Catalan Bias Benchmarks for Question Answering | Valle Ruiz-Fernández et.al. | 2507.11216 | null |
| 2025-07-15 | Role-Playing LLM-Based Multi-Agent Support Framework for Detecting and Addressing Family Communication Bias | Rushia Harada et.al. | 2507.11210 | null |
| 2025-07-15 | Temperature and Persona Shape LLM Agent Consensus With Minimal Accuracy Gains in Qualitative Coding | Conrad Borchers et.al. | 2507.11198 | null |
| 2025-07-15 | Mixture of Experts in Large Language Models | Danyang Zhang et.al. | 2507.11181 | null |
| 2025-07-15 | Latent Space Consistency for Sparse-View CT Reconstruction | Duoyou Chen et.al. | 2507.11152 | null |
| 2025-07-15 | What Should LLMs Forget? Quantifying Personal Data in LLMs for Right-to-Be-Forgotten Requests | Dimitri Staufer et.al. | 2507.11128 | null |
| 2025-07-15 | MSA at ImageCLEF 2025 Multimodal Reasoning: Multilingual Multimodal Reasoning With Ensemble Vision Language Models | Seif Ahmed et.al. | 2507.11114 | null |
| 2025-07-15 | Multi-Trigger Poisoning Amplifies Backdoor Vulnerabilities in LLMs | Sanhanat Sivapiromrat et.al. | 2507.11112 | null |
| 2025-07-15 | KptLLM++: Towards Generic Keypoint Comprehension with Large Language Model | Jie Yang et.al. | 2507.11102 | null |
| 2025-07-15 | The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs | Zichen Wen et.al. | 2507.11097 | null |
| 2025-07-15 | EditGen: Harnessing Cross-Attention Control for Instruction-Based Auto-Regressive Audio Editing | Vassilis Sioros et.al. | 2507.11096 | null |
| 2025-07-15 | Beyond Traditional Algorithms: Leveraging LLMs for Accurate Cross-Border Entity Identification | Andres Azqueta-Gavaldón et.al. | 2507.11086 | null |
| 2025-07-15 | Function-to-Style Guidance of LLMs for Code Translation | Longhui Zhang et.al. | 2507.11083 | null |
| 2025-07-15 | Tactical Decision for Multi-UGV Confrontation with a Vision-Language Model-Based Commander | Li Wang et.al. | 2507.11079 | null |
| 2025-07-15 | LogTinyLLM: Tiny Large Language Models Based Contextual Log Anomaly Detection | Isaiah Thompson Ocansey et.al. | 2507.11071 | null |
| 2025-07-15 | SWE-MERA: A Dynamic Benchmark for Agenticly Evaluating Large Language Models on Software Engineering Tasks | Pavel Adamenko et.al. | 2507.11059 | null |
| 2025-07-15 | LLM-Augmented Symptom Analysis for Cardiovascular Disease Risk Prediction: A Clinical NLP | Haowei Yang et.al. | 2507.11052 | null |
| 2025-07-15 | Aligned Query Expansion: Efficient Query Expansion for Information Retrieval through LLM Alignment | Adam Yang et.al. | 2507.11042 | null |
| 2025-07-15 | Functional Emotion Modeling in Biomimetic Reinforcement Learning | Louis Wang et.al. | 2507.11027 | null |
| 2025-07-15 | Incentivizing Knowledge Transfers | Zhonghong Kuang et.al. | 2507.11018 | null |
| 2025-07-15 | First-Order Error Matters: Accurate Compensation for Quantized Large Language Models | Xingyu Zheng et.al. | 2507.11017 | null |
| 2025-07-15 | SIMCODE: A Benchmark for Natural Language to ns-3 Network Simulation Code Generation | Tasnim Ahmed et.al. | 2507.11014 | null |
| 2025-07-15 | Learning to Tune Like an Expert: Interpretable and Scene-Aware Navigation via MLLM Reasoning and CVAE-Based Adaptation | Yanbo Wang et.al. | 2507.11001 | null |
| 2025-07-15 | Teach Me Sign: Stepwise Prompting LLM for Sign Language Production | Zhaoyi An et.al. | 2507.10972 | null |
| 2025-07-15 | DS@GT at eRisk 2025: From prompts to predictions, benchmarking early depression detection with conversational agent based assessments and temporal attention models | Anthony Miyaguchi et.al. | 2507.10958 | null |
| 2025-07-15 | Modeling Understanding of Story-Based Analogies Using Large Language Models | Kalit Inani et.al. | 2507.10957 | null |
| 2025-07-15 | Towards Practical Benchmarking of Data Cleaning Techniques: On Generating Authentic Errors via Large Language Models | Xinyuan Liu et.al. | 2507.10934 | null |
| 2025-07-15 | Artificial Finance: How AI Thinks About Money | Orhan Erdem et.al. | 2507.10933 | null |
| 2025-07-15 | Enhancing Safe and Controllable Protein Generation via Knowledge Preference Optimization | Yuhao Wang et.al. | 2507.10923 | null |
| 2025-07-15 | HanjaBridge: Resolving Semantic Ambiguity in Korean LLMs via Hanja-Augmented Pre-Training | Seungho Choi et.al. | 2507.10920 | null |
| 2025-07-15 | LLM-Driven Dual-Level Multi-Interest Modeling for Recommendation | Ziyan Wang et.al. | 2507.10917 | null |
| 2025-07-15 | Lessons Learned from Evaluation of LLM based Multi-agents in Safer Therapy Recommendation | Yicong Wu et.al. | 2507.10911 | null |
| 2025-07-15 | Evaluating Generated Commit Messages with Large Language Models | Qunhong Zeng et.al. | 2507.10906 | null |
| 2025-07-15 | LiLM-RDB-SFC: Lightweight Language Model with Relational Database-Guided DRL for Optimized SFC Provisioning | Parisa Fard Moshiri et.al. | 2507.10903 | null |
| 2025-07-15 | Object-Centric Mobile Manipulation through SAM2-Guided Perception and Imitation Learning | Wang Zhicheng et.al. | 2507.10899 | null |
| 2025-07-15 | LLMATCH: A Unified Schema Matching Framework with Large Language Models | Sha Wang et.al. | 2507.10897 | null |
| 2025-07-15 | Learning from Imperfect Data: Robust Inference of Dynamic Systems using Simulation-based Generative Model | Hyunwoo Cho et.al. | 2507.10884 | null |
| 2025-07-15 | From Alerts to Intelligence: A Novel LLM-Aided Framework for Host-based Intrusion Detection | Danyu Sun et.al. | 2507.10873 | null |
| 2025-07-14 | WhisperKit: On-device Real-time ASR with Billion-Scale Transformers | Atila Orhon et.al. | 2507.10860 | null |
| 2025-07-14 | MultiVox: Benchmarking Voice Assistants for Multimodal Interactions | Ramaneswaran Selvakumar et.al. | 2507.10859 | null |
| 2025-07-14 | LLMs on Trial: Evaluating Judicial Fairness for Large Language Models | Yiran Hu et.al. | 2507.10852 | null |
| 2025-07-14 | LLM-Guided Agentic Object Detection for Open-World Understanding | Furkan Mumcu et.al. | 2507.10844 | null |
| 2025-07-14 | REAL-IoT: Characterizing GNN Intrusion Detection Robustness under Practical Adversarial Attack | Zhonghao Zhan et.al. | 2507.10836 | null |
| 2025-07-14 | Supporting SENĆOTEN Language Documentation Efforts with Automatic Speech Recognition | Mengzhe Geng et.al. | 2507.10827 | null |
| 2025-07-14 | Semantic Context for Tool Orchestration | Robert Müller et.al. | 2507.10820 | null |
| 2025-07-14 | How Robust are LLM-Generated Library Imports? An Empirical Study using Stack Overflow | Jasmine Latendresse et.al. | 2507.10818 | null |
| 2025-07-14 | Versatile and Generalizable Manipulation via Goal-Conditioned Reinforcement Learning with Grounded Object Detection | Huiyi Wang et.al. | 2507.10814 | null |
| 2025-07-14 | Automated Thematic Analyses Using LLMs: Xylazine Wound Management Social Media Chatter Use Case | JaMor Hairston et.al. | 2507.10803 | null |
| 2025-07-14 | Can Multimodal Foundation Models Understand Schematic Diagrams? An Empirical Study on Information-Seeking QA over Scientific Papers | Yilun Zhao et.al. | 2507.10787 | null |
| 2025-07-14 | Warehouse Spatial Question Answering with LLM Agent | Hsiang-Wei Huang et.al. | 2507.10778 | null |
| 2025-07-14 | rt-RISeg: Real-Time Model-Free Robot Interactive Segmentation for Active Instance-Level Object Understanding | Howard H. Qian et.al. | 2507.10776 | null |
| 2025-07-14 | Spatial Reasoners for Continuous Variables in Any Domain | Bart Pogodzinski et.al. | 2507.10768 | null |
| 2025-07-14 | Integrating Biological Knowledge for Robust Microscopy Image Profiling on De Novo Cell Lines | Jiayuan Chen et.al. | 2507.10737 | null |
| 2025-07-14 | Bridging Brains and Machines: A Unified Frontier in Neuroscience, Artificial Intelligence, and Neuromorphic Systems | Sohan Shankar et.al. | 2507.10722 | null |
| 2025-07-14 | Exploring User Security and Privacy Attitudes and Concerns Toward the Use of General-Purpose LLM Chatbots for Mental Health | Jabari Kwesi et.al. | 2507.10695 | null |
| 2025-07-14 | Machine-learning inference of stellar properties using integrated photometric and spectroscopic data | Ilay Kamai et.al. | 2507.10666 | null |
| 2025-07-14 | Emulating Dark Matter Halo Merger Trees with Graph Generative Models | Tri Nguyen et.al. | 2507.10652 | null |
| 2025-07-14 | MP1: Mean Flow Tames Policy Learning in 1-step for Robotic Manipulation | Juyi Sheng et.al. | 2507.10543 | null |
| 2025-07-14 | Fusing LLM Capabilities with Routing Data | Tao Feng et.al. | 2507.10540 | null |
| 2025-07-14 | Graph World Model | Tao Feng et.al. | 2507.10539 | null |
| 2025-07-14 | CodeJudgeBench: Benchmarking LLM-as-a-Judge for Coding Tasks | Hongchao Jiang et.al. | 2507.10535 | null |
| 2025-07-14 | Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination | Mingqi Wu et.al. | 2507.10532 | null |
| 2025-07-14 | Accurate generation of chemical reaction transition states by conditional flow matching | Ping Tuo et.al. | 2507.10530 | null |
| 2025-07-14 | Chat with AI: The Surprising Turn of Real-time Video Communication from Human to AI | Jiangkai Wu et.al. | 2507.10510 | null |
| 2025-07-14 | Scene-Aware Conversational ADAS with Generative AI for Real-Time Driver Assistance | Kyungtae Han et.al. | 2507.10500 | null |
| 2025-07-14 | Can You Detect the Difference? | İsmail Tarım et.al. | 2507.10475 | null |
| 2025-07-14 | MLAR: Multi-layer Large Language Model-based Robotic Process Automation Applicant Tracking | Mohamed T. Younes et.al. | 2507.10472 | null |
| 2025-07-14 | An Empirical Evaluation of AI-Powered Non-Player Characters' Perceived Realism and Performance in Virtual Reality Environments | Mikko Korkiakoski et.al. | 2507.10469 | null |
| 2025-07-14 | Logic layer Prompt Control Injection (LPCI): A Novel Security Vulnerability Class in Agentic Systems | Hammad Atta et.al. | 2507.10457 | null |
| 2025-07-14 | Text-Visual Semantic Constrained AI-Generated Image Quality Assessment | Qiang Li et.al. | 2507.10432 | null |
| 2025-07-14 | Towards Emotion Co-regulation with LLM-powered Socially Assistive Robots: Integrating LLM Prompts and Robotic Behaviors to Support Parent-Neurodivergent Child Dyads | Jing Li et.al. | 2507.10427 | null |
| 2025-07-14 | Zorse: Optimizing LLM Training Efficiency on Heterogeneous GPU Clusters | Runsheng Benson Guo et.al. | 2507.10392 | null |
| 2025-07-14 | Test-Time Canonicalization by Foundation Models for Robust Perception | Utkarsh Singhal et.al. | 2507.10375 | null |
| 2025-07-14 | Using AI to replicate human experimental results: a motion study | Rosa Illan Castillo et.al. | 2507.10342 | null |
| 2025-07-14 | Grammar-Guided Evolutionary Search for Discrete Prompt Optimisation | Muzhaffar Hazman et.al. | 2507.10326 | null |
| 2025-07-14 | Mind the Gap: Aligning Vision Foundation Models to Image Feature Matching | Yuhan Liu et.al. | 2507.10318 | null |
| 2025-07-14 | Recognizing Dementia from Neuropsychological Tests with State Space Models | Liming Wang et.al. | 2507.10311 | null |
| 2025-07-14 | DisCo: Towards Distinct and Coherent Visual Encapsulation in Video MLLMs | Jiahe Zhao et.al. | 2507.10302 | null |
| 2025-07-14 | FaceLLM: A Multimodal Large Language Model for Face Understanding | Hatef Otroshi Shahreza et.al. | 2507.10300 | null |
| 2025-07-14 | Prompt Informed Reinforcement Learning for Visual Coverage Path Planning | Venkat Margapuri et.al. | 2507.10284 | null |
| 2025-07-14 | Cross-Timeslot Optimization for Distributed GPU Inference Using Reinforcement Learning | Chengze Du et.al. | 2507.10259 | null |
| 2025-07-14 | Synthesizing Near-Boundary OOD Samples for Out-of-Distribution Detection | Jinglun Li et.al. | 2507.10225 | null |
| 2025-07-14 | Absher: A Benchmark for Evaluating Large Language Models Understanding of Saudi Dialects | Renad Al-Monef et.al. | 2507.10216 | null |
| 2025-07-14 | A Training-Free, Task-Agnostic Framework for Enhancing MLLM Performance on High-Resolution Images | Jaeseong Lee et.al. | 2507.10202 | null |
| 2025-07-14 | History Matching under Uncertainty of Geological Scenarios with Implicit Geological Realism Control with Generative Deep Learning and Graph Convolutions | Gleb Shishaev et.al. | 2507.10201 | null |
| 2025-07-14 | Natural Language-based Assessment of L2 Oral Proficiency using LLMs | Stefano Bannò et.al. | 2507.10200 | null |
| 2025-07-14 | Breaking the Myth: Can Small Models Infer Postconditions Too? | Gehao Zhang et.al. | 2507.10182 | null |
| 2025-07-14 | Pimba: A Processing-in-Memory Acceleration for Post-Transformer Large Language Model Serving | Wonung Kim et.al. | 2507.10178 | null |
| 2025-07-14 | Abusive text transformation using LLMs | Rohitash Chandra et.al. | 2507.10177 | null |
| 2025-07-14 | Task-Based Flexible Feature Distillation for LLMs | Khouloud Saadi et.al. | 2507.10155 | null |
| 2025-07-14 | Past-Future Scheduler for LLM Serving under SLA Guarantees | Ruihao Gong et.al. | 2507.10150 | null |
| 2025-07-14 | DEARLi: Decoupled Enhancement of Recognition and Localization for Semi-supervised Panoptic Segmentation | Ivan Martinović et.al. | 2507.10118 | null |
| 2025-07-14 | Accelerating Automatic Program Repair with Dual Retrieval-Augmented Fine-Tuning and Patch Generation on Large Language Models | Hanyang Guo et.al. | 2507.10103 | null |
| 2025-07-14 | Fusing Large Language Models with Temporal Transformers for Time Series Forecasting | Chen Su et.al. | 2507.10098 | null |
| 2025-07-14 | Towards High Supervised Learning Utility Training Data Generation: Data Pruning and Column Reordering | Tung Sum Thomas Kwok et.al. | 2507.10088 | null |
| 2025-07-14 | Foundation Model Driven Robotics: A Comprehensive Review | Muhammad Tayyab Khan et.al. | 2507.10087 | null |
| 2025-07-14 | Cultural Bias in Large Language Models: Evaluating AI Agents through Moral Questionnaires | Simon Münker et.al. | 2507.10073 | null |
| 2025-07-14 | ElasticMM: Efficient Multimodal LLMs Serving with Elastic Multimodal Parallelism | Zedong Liu et.al. | 2507.10069 | null |
| 2025-07-14 | LLMShot: Reducing snapshot testing maintenance via LLMs | Ergün Batuhan Kaynak et.al. | 2507.10062 | null |
| 2025-07-14 | GeLaCo: An Evolutionary Approach to Layer Compression | David Ponce et.al. | 2507.10059 | null |
| 2025-07-14 | Explicit Vulnerability Generation with LLMs: An Investigation Beyond Adversarial Attacks | Emir Bosnak et.al. | 2507.10054 | null |
| 2025-07-14 | Automating SPARQL Query Translations between DBpedia and Wikidata | Malte Christian Bartels et.al. | 2507.10045 | null |
| 2025-07-14 | Towards Applying Large Language Models to Complement Single-Cell Foundation Models | Steven Palayew et.al. | 2507.10039 | null |
| 2025-07-14 | EAT: QoS-Aware Edge-Collaborative AIGC Task Scheduling via Attention-Guided Diffusion Reinforcement Learning | Zhifei Xu et.al. | 2507.10026 | null |
| 2025-07-14 | Qualitative Study for LLM-assisted Design Study Process: Strategies, Challenges, and Roles | Shaolun Ruan et.al. | 2507.10024 | null |
| 2025-07-14 | The Man Behind the Sound: Demystifying Audio Private Attribute Profiling via Multimodal Large Language Model Agents | Lixu Wang et.al. | 2507.10016 | null |
| 2025-07-14 | (Almost) Free Modality Stitching of Foundation Models | Jaisidh Singh et.al. | 2507.10015 | null |
| 2025-07-14 | Protective Factor-Aware Dynamic Influence Learning for Suicide Risk Prediction on Social Media | Jun Li et.al. | 2507.10008 | null |
| 2025-07-14 | Deep Hidden Cognition Facilitates Reliable Chain-of-Thought Reasoning | Zijun Chen et.al. | 2507.10007 | null |
| 2025-07-14 | Differentially Private Federated Low Rank Adaptation Beyond Fixed-Matrix | Ming Wen et.al. | 2507.09990 | null |
| 2025-07-14 | Demonstrating the Octopi-1.5 Visual-Tactile-Language Model | Samson Yu et.al. | 2507.09985 | null |
| 2025-07-14 | Tiny Reward Models | Sarah Pan et.al. | 2507.09973 | null |
| 2025-07-14 | AnalogTester: A Large Language Model-Based Framework for Automatic Testbench Generation in Analog Circuit Design | Weiyu Chen et.al. | 2507.09965 | null |
| 2025-07-14 | DeepSeek: Paradigm Shifts and Technical Evolution in Large AI Models | Luolin Xiong et.al. | 2507.09955 | null |
| 2025-07-14 | Can GPT-4o mini and Gemini 2.0 Flash Predict Fine-Grained Fashion Product Attributes? A Zero-Shot Analysis | Shubham Shukla et.al. | 2507.09950 | null |
| 2025-07-14 | Iceberg: Enhancing HLS Modeling with Synthetic Data | Zijian Ding et.al. | 2507.09948 | null |
| 2025-07-14 | Green-LLM: Optimal Workload Allocation for Environmentally-Aware Distributed Inference | Jiaming Cheng et.al. | 2507.09942 | null |
| 2025-07-14 | Memorization Sinks: Isolating Memorization during LLM Training | Gaurav R. Ghosal et.al. | 2507.09937 | null |
| 2025-07-14 | Enhancing Retrieval Augmented Generation with Hierarchical Text Segmentation Chunking | Hai Toan Nguyen et.al. | 2507.09935 | null |
| 2025-07-14 | Mechanistic Interpretability of LoRA-Adapted Language Models for Nuclear Reactor Safety Applications | Yoon Pyo Lee et.al. | 2507.09931 | null |
| 2025-07-14 | Solving dynamic portfolio selection problems via score-based diffusion models | Ahmad Aghapour et.al. | 2507.09916 | null |
| 2025-07-14 | Crucial-Diff: A Unified Diffusion Model for Crucial Image and Annotation Synthesis in Data-scarce Scenarios | Siyue Yao et.al. | 2507.09915 | null |
| 2025-07-14 | TolerantECG: A Foundation Model for Imperfect Electrocardiogram | Huynh Nguyen Dang et.al. | 2507.09887 | null |
| 2025-07-14 | VerifyBench: A Systematic Benchmark for Evaluating Reasoning Verifiers Across Domains | Xuzhao Li et.al. | 2507.09884 | null |
| 2025-07-14 | AdaBrain-Bench: Benchmarking Brain Foundation Models for Brain-Computer Interface Applications | Jiamin Wu et.al. | 2507.09882 | null |
| 2025-07-14 | Covering a Few Submodular Constraints and Applications | Tanvi Bajpai et.al. | 2507.09879 | null |
| 2025-07-14 | ViTCoT: Video-Text Interleaved Chain-of-Thought for Boosting Video Understanding in Large Language Models | Yongheng Zhang et.al. | 2507.09876 | null |
| 2025-07-14 | Function Induction and Task Generalization: An Interpretability Study with Off-by-One Addition | Qinyuan Ye et.al. | 2507.09875 | null |
| 2025-07-14 | Turning the Tide: Repository-based Code Reflection | Wei Zhang et.al. | 2507.09866 | null |
| 2025-07-14 | A Survey on MLLM-based Visually Rich Document Understanding: Methods, Challenges, and Emerging Trends | Yihao Ding et.al. | 2507.09861 | null |
| 2025-07-14 | Model-Grounded Symbolic Artificial Intelligence Systems Learning and Reasoning with Model-Grounded Symbolic Artificial Intelligence Systems | Aniruddha Chattopadhyay et.al. | 2507.09854 | null |
| 2025-07-14 | Rethinking Prompt Optimization: Reinforcement, Diversification, and Migration in Blackbox LLMs | MohammadReza Davari et.al. | 2507.09839 | null |
| 2025-07-14 | Generative Audio Language Modeling with Continuous-valued Tokens and Masked Next-Token Prediction | Shu-wen Yang et.al. | 2507.09834 | null |
| 2025-07-13 | Generative Cognitive Diagnosis | Jiatong Li et.al. | 2507.09831 | null |
| 2025-07-13 | Measuring What Matters: A Framework for Evaluating Safety Risks in Real-World LLM Applications | Jia Yi Goh et.al. | 2507.09820 | null |
| 2025-07-13 | VRU-Accident: A Vision-Language Benchmark for Video Question Answering and Dense Captioning for Accident Scene Understanding | Younggun Kim et.al. | 2507.09815 | null |
| 2025-07-13 | A Scalable and Efficient Signal Integration System for Job Matching | Ping Liu et.al. | 2507.09797 | null |
| 2025-07-13 | CADmium: Fine-Tuning Code Language Models for Text-Driven Sequential CAD Design | Prashant Govindarajan et.al. | 2507.09792 | null |
| 2025-07-13 | Prompting for Performance: Exploring LLMs for Configuring Software | Helge Spieker et.al. | 2507.09790 | null |
| 2025-07-13 | TinyTroupe: An LLM-powered Multiagent Persona Simulation Toolkit | Paulo Salem et.al. | 2507.09788 | null |
| 2025-07-13 | Efficient Molecular Conformer Generation with SO(3)-Averaged Flow Matching and Reflow | Zhonglin Cao et.al. | 2507.09785 | null |
| 2025-07-13 | Do we need equivariant models for molecule generation? | Ewa M. Nowara et.al. | 2507.09753 | null |
| 2025-07-13 | Sound and Complete Neuro-symbolic Reasoning with LLM-Grounded Interpretations | Bradley P. Allen et.al. | 2507.09751 | null |
| 2025-07-13 | BrainFLORA: Uncovering Brain Concept Representation via Multimodal Neural Embeddings | Dongyang Li et.al. | 2507.09747 | null |
| 2025-07-13 | Enhancing Trading Performance Through Sentiment Analysis with Large Language Models: Evidence from the S&P 500 | Haojie Liu et.al. | 2507.09739 | null |
| 2025-07-13 | Continental scale habitat modelling with artificial intelligence and multimodal earth observation | Sara Si-Moussi et.al. | 2507.09732 | null |
| 2025-07-13 | Large Language Models Encode Semantics in Low-Dimensional Linear Subspaces | Baturay Saglam et.al. | 2507.09709 | null |
| 2025-07-13 | MCEval: A Dynamic Framework for Fair Multilingual Cultural Evaluation of LLMs | Shulin Huang et.al. | 2507.09701 | null |
| 2025-07-13 | ExpStar: Towards Automatic Commentary Generation for Multi-discipline Scientific Experiments | Jiali Chen et.al. | 2507.09693 | null |
| 2025-07-13 | Prompt2DEM: High-Resolution DEMs for Urban and Open Environments from Global Prompts Using a Monocular Foundation Model | Osher Rafaeli et.al. | 2507.09681 | null |
| 2025-07-13 | Can AI Rely on the Systematicity of Truth? The Challenge of Modelling Normative Domains | Matthieu Queloz et.al. | 2507.09676 | null |
| 2025-07-13 | Is Quantization a Deal-breaker? Empirical Insights from Large Code Models | Saima Afrin et.al. | 2507.09665 | null |
| 2025-07-13 | Towards Concise and Adaptive Thinking in Large Reasoning Models: A Survey | Jason Zhu et.al. | 2507.09662 | null |
| 2025-07-13 | Negotiating Comfort: Simulating Personality-Driven LLM Agents in Shared Residential Social Networks | Ann Nedime Nese Rende et.al. | 2507.09657 | null |
| 2025-07-13 | Cultivating Pluralism In Algorithmic Monoculture: The Community Alignment Dataset | Lily Hong Zhang et.al. | 2507.09650 | null |
| 2025-07-13 | Can Group Relative Policy Optimization Improve Thai Legal Reasoning and Question Answering? | Pawitsapak Akarajaradwong et.al. | 2507.09638 | null |
| 2025-07-13 | Demystifying Flux Architecture | Or Greenberg et.al. | 2507.09595 | null |
| 2025-07-11 | Lumos-1: On Autoregressive Video Generation from a Unified Model Perspective | Hangjie Yuan et.al. | 2507.08801 | null |
| 2025-07-11 | NeuralOS: Towards Simulating Operating Systems via Neural Generative Models | Luke Rivard et.al. | 2507.08800 | null |
| 2025-07-11 | One Token to Fool LLM-as-a-Judge | Yulai Zhao et.al. | 2507.08794 | null |
| 2025-07-11 | From One to More: Contextual Part Latents for 3D Generation | Shaocong Dong et.al. | 2507.08772 | null |
| 2025-07-11 | BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity | Chenyang Song et.al. | 2507.08771 | null |
| 2025-07-11 | Multilingual Multimodal Software Developer for Code Generation | Linzheng Chai et.al. | 2507.08719 | null |
| 2025-07-11 | Unreal is all you need: Multimodal ISAC Data Simulation with Only One Engine | Kongwu Huang et.al. | 2507.08716 | null |
| 2025-07-11 | KG-Attention: Knowledge Graph-Guided Attention at Test-Time via Bidirectional Information Aggregation | Songlin Zhai et.al. | 2507.08704 | null |
| 2025-07-11 | ByDeWay: Boost Your multimodal LLM with DEpth prompting in a Training-Free Way | Rajarshi Roy et.al. | 2507.08679 | null |
| 2025-07-11 | LLMCup: Ranking-Enhanced Comment Updating with LLMs | Hua Ge et.al. | 2507.08671 | null |
| 2025-07-11 | KELPS: A Framework for Verified Multi-Language Autoformalization via Semantic-Syntactic Alignment | Jiyao Zhang et.al. | 2507.08665 | null |
| 2025-07-11 | Introspection of Thought Helps AI Agents | Haoran Sun et.al. | 2507.08664 | null |
| 2025-07-11 | Leanabell-Prover-V2: Verifier-integrated Reasoning for Formal Theorem Proving via Reinforcement Learning | Xingguang Ji et.al. | 2507.08649 | null |
| 2025-07-11 | DatasetAgent: A Novel Multi-Agent System for Auto-Constructing Datasets from Real-World Images | Haoran Sun et.al. | 2507.08648 | null |
| 2025-07-11 | NL in the Middle: Code Translation with LLMs and Intermediate Representations | Chi-en Amy Tai et.al. | 2507.08627 | null |
| 2025-07-11 | A comprehensive study of LLM-based argument classification: from LLAMA through GPT-4o to Deepseek-R1 | Marcin Pietroń et.al. | 2507.08621 | null |
| 2025-07-11 | Agentic Large Language Models for Conceptual Systems Engineering and Design | Soheyl Massoudi et.al. | 2507.08619 | null |
| 2025-07-11 | AgentsNet: Coordination and Collaborative Reasoning in Multi-Agent LLMs | Florian Grötschla et.al. | 2507.08616 | null |
| 2025-07-11 | Emergent Natural Language with Communication Games for Improving Image Captioning Capabilities without Additional Data | Parag Dutta et.al. | 2507.08610 | null |
| 2025-07-11 | Unlocking Speech Instruction Data Potential with Query Rewriting | Yonghua Hei et.al. | 2507.08603 | null |
| 2025-07-11 | Visual Semantic Description Generation with MLLMs for Image-Text Matching | Junyu Chen et.al. | 2507.08590 | null |
| 2025-07-11 | To Trade or Not to Trade: An Agentic Approach to Estimating Market Risk Improves Trading Decisions | Dimitrios Emmanoulopoulos et.al. | 2507.08584 | null |
| 2025-07-11 | Large Multi-modal Model Cartographic Map Comprehension for Textual Locality Georeferencing | Kalana Wijegunarathna et.al. | 2507.08575 | null |
| 2025-07-11 | AbbIE: Autoregressive Block-Based Iterative Encoder for Efficient Sequence Modeling | Preslav Aleksandrov et.al. | 2507.08567 | null |
| 2025-07-11 | FreeAudio: Training-Free Timing Planning for Controllable Long-Form Text-to-Audio Generation | Yuxuan Jiang et.al. | 2507.08557 | null |
| 2025-07-11 | White-Basilisk: A Hybrid Model for Code Vulnerability Detection | Ioannis Lamprou et.al. | 2507.08540 | null |
| 2025-07-11 | The AI Language Proficiency Monitor -- Tracking the Progress of LLMs on Multilingual Benchmarks | David Pomerenke et.al. | 2507.08538 | null |
| 2025-07-11 | A Multi-granularity Concept Sparse Activation and Hierarchical Knowledge Graph Fusion Framework for Rare Disease Diagnosis | Mingda Zhang et.al. | 2507.08529 | null |
| 2025-07-11 | InferLog: Accelerating LLM Inference for Online Log Parsing via ICL-oriented Prefix Caching | Yilun Wang et.al. | 2507.08523 | null |
| 2025-07-11 | Advancing Multimodal LLMs by Large-Scale 3D Visual Instruction Dataset Generation | Liu He et.al. | 2507.08513 | null |
| 2025-07-11 | From Language to Logic: A Bi-Level Framework for Structured Reasoning | Keying Yang et.al. | 2507.08501 | null |
| 2025-07-11 | Semantic-Augmented Latent Topic Modeling with LLM-in-the-Loop | Mengze Hong et.al. | 2507.08498 | null |
| 2025-07-11 | LLaPa: A Vision-Language Model Framework for Counterfactual-Aware Procedural Planning | Shibo Sun et.al. | 2507.08496 | null |
| 2025-07-11 | A Third Paradigm for LLM Evaluation: Dialogue Game-Based Evaluation using clembench | David Schlangen et.al. | 2507.08491 | null |
| 2025-07-11 | ILT-Iterative LoRA Training through Focus-Feedback-Fix for Multilingual Speech Recognition | Qingliang Meng et.al. | 2507.08477 | null |
| 2025-07-11 | SynBridge: Bridging Reaction States via Discrete Flow for Bidirectional Reaction Prediction | Haitao Lin et.al. | 2507.08475 | null |
| 2025-07-11 | Using Large Language Models for Legal Decision-Making in Austrian Value-Added Tax Law: An Experimental Study | Marina Luketina et.al. | 2507.08468 | null |
| 2025-07-11 | F3-Net: Foundation Model for Full Abnormality Segmentation of Medical Images with Flexible Input Modality Requirement | Seyedeh Sahar Taheri Otaghsara et.al. | 2507.08460 | null |
| 2025-07-11 | Diagnosing Failures in Large Language Models' Answers: Integrating Error Attribution into Evaluation Framework | Zishan Xu et.al. | 2507.08459 | null |
| 2025-07-11 | A document is worth a structured record: Principled inductive bias design for document recognition | Benjamin Meyer et.al. | 2507.08458 | null |
| 2025-07-11 | CUE-RAG: Towards Accurate and Cost-Efficient Graph-Based RAG via Multi-Partite Graph and Query-Driven Iterative Retrieval | Yaodong Su et.al. | 2507.08445 | null |
| 2025-07-11 | Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation | Anlin Zheng et.al. | 2507.08441 | null |
| 2025-07-11 | Finding Common Ground: Using Large Language Models to Detect Agreement in Multi-Agent Decision Conferences | Selina Heller et.al. | 2507.08440 | null |
| 2025-07-11 | xpSHACL: Explainable SHACL Validation using Retrieval-Augmented Generation and Large Language Models | Gustavo Correa Publio et.al. | 2507.08432 | null |
| 2025-07-11 | ChainEdit: Propagating Ripple Effects in LLM Knowledge Editing through Logical Rule-Guided Chains | Zilu Dong et.al. | 2507.08427 | null |
| 2025-07-11 | Generative artificial intelligence and hybrid models to accelerate LES in reactive flows: Application to hydrogen/methane combustion | Xiangrui Zou et.al. | 2507.08426 | null |
| 2025-07-11 | A Survey of Large Language Models in Discipline-specific Research: Challenges, Methods and Opportunities | Lu Xiang et.al. | 2507.08425 | null |
| 2025-07-11 | InstaScene: Towards Complete 3D Instance Decomposition and Reconstruction from Cluttered Scenes | Zesong Yang et.al. | 2507.08416 | null |
| 2025-07-11 | Multi-modal Mutual-Guidance Conditional Prompt Learning for Vision-Language Models | Shijun Yang et.al. | 2507.08410 | null |
| 2025-07-11 | PanMatch: Unleashing the Potential of Large Vision Models for Unified Matching Models | Yongjian Zhang et.al. | 2507.08400 | null |
| 2025-07-11 | Understanding Driving Risks using Large Language Models: Toward Elderly Driver Assessment | Yuki Yoshihara et.al. | 2507.08367 | null |
| 2025-07-11 | Leveraging Machine Learning and Enhanced Parallelism Detection for BPMN Model Generation from Text | Phuong Nam Lê et.al. | 2507.08362 | null |
| 2025-07-11 | Cycle Context Verification for In-Context Medical Image Segmentation | Shishuai Hu et.al. | 2507.08357 | null |
| 2025-07-11 | Exploring Design of Multi-Agent LLM Dialogues for Research Ideation | Keisuke Ueda et.al. | 2507.08350 | null |
| 2025-07-11 | What Factors Affect LLMs and RLLMs in Financial Question Answering? | Peng Wang et.al. | 2507.08339 | null |
| 2025-07-11 | CoCo-Bot: Energy-based Composable Concept Bottlenecks for Interpretable Generative Models | Sangwon Kim et.al. | 2507.08334 | null |
| 2025-07-11 | CRMAgent: A Multi-Agent LLM System for E-Commerce CRM Message Template Generation | Yinzhu Quan et.al. | 2507.08325 | null |
| 2025-07-11 | Generative AI in Science: Applications, Challenges, and Emerging Questions | Ryan Harries et.al. | 2507.08310 | null |
| 2025-07-11 | Improving MLLM's Document Image Machine Translation via Synchronously Self-reviewing Its OCR Proficiency | Yupu Liang et.al. | 2507.08309 | null |
| 2025-07-11 | M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning | Inclusion AI et.al. | 2507.08306 | null |
| 2025-07-11 | KAT-V1: Kwai-AutoThink Technical Report | Zizheng Zhan et.al. | 2507.08297 | null |
| 2025-07-11 | Invariant-based Robust Weights Watermark for Large Language Models | Qingxiao Guo et.al. | 2507.08288 | null |
| 2025-07-11 | Lightweight Safety Guardrails via Synthetic Data and RL-guided Adversarial Training | Aleksei Ilin et.al. | 2507.08284 | null |
| 2025-07-11 | Agent Safety Alignment via Reinforcement Learning | Zeyang Sha et.al. | 2507.08270 | null |
| 2025-07-11 | A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning | Hiroshi Yoshihara et.al. | 2507.08267 | null |
| 2025-07-11 | CL3R: 3D Reconstruction and Contrastive Learning for Enhanced Robotic Manipulation Representations | Wenbo Cui et.al. | 2507.08262 | null |
| 2025-07-11 | Quantum-Accelerated Neural Imputation with Large Language Models (LLMs) | Hossein Jamali et.al. | 2507.08255 | null |
| 2025-07-11 | Raptor: Scalable Train-Free Embeddings for 3D Medical Volumes Leveraging Pretrained 2D Foundation Models | Ulzee An et.al. | 2507.08254 | null |
| 2025-07-11 | Leveraging Large Language Models for Classifying App Users' Feedback | Yasaman Abedini et.al. | 2507.08250 | null |
| 2025-07-11 | Time Variation in the TeV Cosmic Ray Anisotropy with IceCube and Energy Dependence of the Solar Dipole | Perri Zilberman et.al. | 2507.08242 | null |
| 2025-07-11 | Data Generation without Function Estimation | Hadi Daneshmand et.al. | 2507.08239 | null |
| 2025-07-11 | InsightBuild: LLM-Powered Causal Reasoning in Smart Building Systems | Pinaki Prasad Guha Neogi et.al. | 2507.08235 | null |
| 2025-07-11 | Can LLMs Reliably Simulate Real Students' Abilities in Mathematics and Reading Comprehension? | KV Aditya Srivatsa et.al. | 2507.08232 | null |
| 2025-07-11 | Making VLMs More Robot-Friendly: Self-Critical Distillation of Low-Level Procedural Reasoning | Chan Young Park et.al. | 2507.08224 | null |
| 2025-07-10 | Effect of Static vs. Conversational AI-Generated Messages on Colorectal Cancer Screening Intent: a Randomized Controlled Trial | Neil K. R. Sehgal et.al. | 2507.08211 | null |
| 2025-07-10 | Reasoning and Behavioral Equilibria in LLM-Nash Games: From Mindsets to Actions | Quanyan Zhu et.al. | 2507.08208 | null |
| 2025-07-10 | A Dynamic Stackelberg Game Framework for Agentic AI Defense Against LLM Jailbreaking | Zhengye Han et.al. | 2507.08207 | null |
| 2025-07-10 | TruthTorchLM: A Comprehensive Library for Predicting Truthfulness in LLM Outputs | Duygu Nur Yaldiz et.al. | 2507.08203 | null |
| 2025-07-10 | Consciousness as a Jamming Phase | Kaichen Ouyang et.al. | 2507.08197 | null |
| 2025-07-10 | CTRLS: Chain-of-Thought Reasoning via Latent State-Transition | Junda Wu et.al. | 2507.08182 | null |
| 2025-07-10 | Analysis of Propaganda in Tweets From Politically Biased Sources | Vivek Sharma et.al. | 2507.08169 | null |
| 2025-07-10 | KP-A: A Unified Network Knowledge Plane for Catalyzing Agentic Network Intelligence | Yun Tang et.al. | 2507.08164 | null |
| 2025-07-10 | ALCo-FM: Adaptive Long-Context Foundation Model for Accident Prediction | Pinaki Prasad Guha Neogi et.al. | 2507.08153 | null |
| 2025-07-10 | Distilling Empathy from Large Language Models | Henry J. Xie et.al. | 2507.08151 | null |
| 2025-07-10 | Compactor: Calibrated Query-Agnostic KV Cache Compression with Approximate Leverage Scores | Vivek Chari et.al. | 2507.08143 | null |
| 2025-07-10 | GRASP: Generic Reasoning And SPARQL Generation across Knowledge Graphs | Sebastian Walter et.al. | 2507.08107 | null |
| 2025-07-10 | Low-rank Momentum Factorization for Memory Efficient Training | Pouria Mahdavinia et.al. | 2507.08091 | null |
| 2025-07-10 | Quantile Reward Policy Optimization: Alignment with Pointwise Regression and Exact Partition Functions | Simon Matrenok et.al. | 2507.08068 | null |
| 2025-07-10 | Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs | Ziyue Li et.al. | 2507.07996 | null |
| 2025-07-10 | Multigranular Evaluation for Brain Visual Decoding | Weihao Xia et.al. | 2507.07993 | null |
| 2025-07-10 | Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs | Jeongseok Hyun et.al. | 2507.07990 | null |
| 2025-07-10 | Automating Expert-Level Medical Reasoning Evaluation of Large Language Models | Shuang Zhou et.al. | 2507.07988 | null |
| 2025-07-10 | OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding | JingLi Lin et.al. | 2507.07984 | null |
| 2025-07-10 | Performance and Practical Considerations of Large and Small Language Models in Clinical Decision Support in Rheumatology | Sabine Felde et.al. | 2507.07983 | null |
| 2025-07-10 | Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling | Haoyu Wu et.al. | 2507.07982 | null |
| 2025-07-10 | Defending Against Prompt Injection With a Few DefensiveTokens | Sizhe Chen et.al. | 2507.07974 | null |
| 2025-07-10 | Scaling RL to Long Videos | Yukang Chen et.al. | 2507.07966 | null |
| 2025-07-10 | Dynamic Chunking for End-to-End Hierarchical Sequence Modeling | Sukjun Hwang et.al. | 2507.07955 | null |
| 2025-07-10 | Input Conditioned Layer Dropping in Speech Foundation Models | Abdul Hannan et.al. | 2507.07954 | null |
| 2025-07-10 | Low Resource Reconstruction Attacks Through Benign Prompts | Sol Yarkoni et.al. | 2507.07947 | null |
| 2025-07-10 | Can Large Language Models Improve Phishing Defense? A Large-Scale Controlled Experiment on Warning Dialogue Explanations | Federico Maria Cau et.al. | 2507.07916 | null |
| 2025-07-10 | MIRA: A Novel Framework for Fusing Modalities in Medical RAG | Jinhong Wang et.al. | 2507.07902 | null |
| 2025-07-10 | An Integrated Framework of Prompt Engineering and Multidimensional Knowledge Graphs for Legal Dispute Analysis | Mingda Zhang et.al. | 2507.07893 | null |
| 2025-07-10 | Automating MD simulations for Proteins using Large language Models: NAMD-Agent | Achuth Chandrasekhar et.al. | 2507.07887 | null |
| 2025-07-10 | Opting Out of Generative AI: a Behavioral Experiment on the Role of Education in Perplexity AI Avoidance | Roberto Ulloa et.al. | 2507.07881 | null |
| 2025-07-10 | LISTEN: Lightweight Industrial Sound-representable Transformer for Edge Notification | Changheon Han et.al. | 2507.07879 | null |
| 2025-07-10 | Mitigating Watermark Stealing Attacks in Generative Models via Multi-Key Watermarking | Toluwani Aremu et.al. | 2507.07871 | null |
| 2025-07-10 | DocCHA: Towards LLM-Augmented Interactive Online diagnosis System | Xinyi Liu et.al. | 2507.07870 | null |
| 2025-07-10 | THUNDER: Tile-level Histopathology image UNDERstanding benchmark | Pierre Marza et.al. | 2507.07860 | null |
| 2025-07-10 | From Ambiguity to Accuracy: The Transformative Effect of Coreference Resolution on Retrieval-Augmented Generation systems | Youngjoon Jang et.al. | 2507.07847 | null |
| 2025-07-10 | Towards Benchmarking Foundation Models for Tabular Data With Text | Martin Mráz et.al. | 2507.07829 | null |
| 2025-07-10 | MoSE: Skill-by-Skill Mixture-of-Expert Learning for Autonomous Driving | Lu Xu et.al. | 2507.07818 | null |
| 2025-07-10 | Understanding and Controlling Repetition Neurons and Induction Heads in In-Context Learning | Nhi Hoai Doan et.al. | 2507.07810 | null |
| 2025-07-10 | SecureSpeech: Prompt-based Speaker and Content Protection | Belinda Soh Hui Hui et.al. | 2507.07799 | null |
| 2025-07-10 | Measuring AI Alignment with Human Flourishing | Elizabeth Hilliard et.al. | 2507.07787 | null |
| 2025-07-10 | Where are we with calibration under dataset shift in image classification? | Mélanie Roschewitz et.al. | 2507.07780 | null |
| 2025-07-10 | A Unified Empirical Risk Minimization Framework for Flexible N-Tuples Weak Supervision | Shuying Huang et.al. | 2507.07771 | null |
| 2025-07-10 | Structured Prompts, Better Outcomes? Exploring the Effects of a Structured Interface with ChatGPT in a Graduate Robotics Course | Jerome Brender et.al. | 2507.07767 | null |
| 2025-07-10 | When Large Language Models Meet Law: Dual-Lens Taxonomy, Technical Advances, and Ethical Governance | Peizhang Shao et.al. | 2507.07748 | null |
| 2025-07-10 | On the capabilities of LLMs for classifying and segmenting time series of fruit picking motions into primitive actions | Eleni Konstantinidou et.al. | 2507.07745 | null |
| 2025-07-10 | GuardVal: Dynamic Large Language Model Jailbreak Evaluation for Comprehensive Safety Testing | Peiyan Zhang et.al. | 2507.07735 | null |
| 2025-07-10 | Not All Preferences are What You Need for Post-Training: Selective Alignment Strategy for Preference Optimization | Zhijin Dong et.al. | 2507.07725 | null |
| 2025-07-10 | KeyKnowledgeRAG (K^2RAG): An Enhanced RAG method for improved LLM question-answering capabilities | Hruday Markondapatnaikuni et.al. | 2507.07695 | null |
| 2025-07-10 | From Domain Documents to Requirements: Retrieval-Augmented Generation in the Space Industry | Chetan Arora et.al. | 2507.07689 | null |
| 2025-07-10 | Rationale-Enhanced Decoding for Multi-modal Chain-of-Thought | Shin'ya Yamaguchi et.al. | 2507.07685 | null |
| 2025-07-10 | Accelerating Transposed Convolutions on FPGA-based Edge Devices | Jude Haris et.al. | 2507.07683 | null |
| 2025-07-10 | Prompt Engineering for Requirements Engineering: A Literature Review and Roadmap | Kaicheng Huang et.al. | 2507.07682 | null |
| 2025-07-10 | PlanQA: A Benchmark for Spatial Reasoning in LLMs using Structured Representations | Fedor Rodionov et.al. | 2507.07644 | null |
| 2025-07-10 | FrugalRAG: Learning to retrieve and reason for multi-hop QA | Abhinav Java et.al. | 2507.07634 | null |
| 2025-07-10 | T-GVC: Trajectory-Guided Generative Video Coding at Ultra-Low Bitrates | Zhitao Wang et.al. | 2507.07633 | null |
| 2025-07-10 | Exploring the Limits of Model Compression in LLMs: A Knowledge Distillation Study on QA Tasks | Joyeeta Datta et.al. | 2507.07630 | null |
| 2025-07-10 | SpatialViz-Bench: Automatically Generated Spatial Visualization Reasoning Tasks for MLLMs | Siting Wang et.al. | 2507.07610 | null |
| 2025-07-10 | Enhancing Vaccine Safety Surveillance: Extracting Vaccine Mentions from Emergency Department Triage Notes Using Fine-Tuned Large Language Models | Sedigh Khademi et.al. | 2507.07599 | null |
| 2025-07-10 | NexViTAD: Few-shot Unsupervised Cross-Domain Defect Detection via Vision Foundation Models and Multi-Task Learning | Tianwei Mu et.al. | 2507.07579 | null |
| 2025-07-10 | Single-to-mix Modality Alignment with Multimodal Large Language Model for Document Image Machine Translation | Yupu Liang et.al. | 2507.07572 | null |
| 2025-07-10 | CEA-LIST at CheckThat! 2025: Evaluating LLMs as Detectors of Bias and Opinion in Text | Akram Elbouanani et.al. | 2507.07539 | null |
| 2025-07-10 | MAPEX: Modality-Aware Pruning of Experts for Remote Sensing Foundation Models | Joelle Hanna et.al. | 2507.07527 | null |
| 2025-07-10 | Toward Real-World Chinese Psychological Support Dialogues: CPsDD Dataset and a Co-Evolving Multi-Agent System | Yuanchen Shi et.al. | 2507.07509 | null |
| 2025-07-10 | PLAN-TUNING: Post-Training Language Models to Learn Step-by-Step Planning for Complex Problem Solving | Mihir Parmar et.al. | 2507.07495 | null |
| 2025-07-10 | Sparse Autoencoders Reveal Interpretable Structure in Small Gene Language Models | Haoxiang Guan et.al. | 2507.07486 | null |
| 2025-07-10 | Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models | Kaiqu Liang et.al. | 2507.07484 | null |
| 2025-07-10 | General purpose models for the chemical sciences | Nawaf Alampara et.al. | 2507.07456 | null |
| 2025-07-10 | RLEP: Reinforcement Learning with Experience Replay for LLM Reasoning | Hongzhi Zhang et.al. | 2507.07451 | null |
| 2025-07-10 | StarDojo: Benchmarking Open-Ended Behaviors of Agentic Multimodal LLMs in Production-Living Simulations with Stardew Valley | Weihao Tan et.al. | 2507.07445 | null |
| 2025-07-10 | SAND: Boosting LLM Agents with Self-Taught Action Deliberation | Yu Xia et.al. | 2507.07441 | null |
| 2025-07-10 | Towards Interpretable Time Series Foundation Models | Matthieu Boileau et.al. | 2507.07439 | null |
| 2025-07-10 | Neural networks leverage nominally quantum and post-quantum representations | Paul M. Riechers et.al. | 2507.07432 | null |
| 2025-07-10 | DrugMCTS: a drug repurposing framework combining multi-agent, RAG and Monte Carlo Tree Search | Zerui Yang et.al. | 2507.07426 | null |
| 2025-07-10 | Corvid: Improving Multimodal Large Language Models Towards Chain-of-Thought Reasoning | Jingjing Jiang et.al. | 2507.07424 | null |
| 2025-07-10 | May I have your Attention? Breaking Fine-Tuning based Prompt Injection Defenses using Architecture-Aware Attacks | Nishit V. Pandya et.al. | 2507.07417 | null |
| 2025-07-10 | EPIC: Efficient Prompt Interaction for Text-Image Classification | Xinyao Yu et.al. | 2507.07415 | null |
| 2025-07-10 | GNN-CNN: An Efficient Hybrid Model of Convolutional and Graph Neural Networks for Text Representation | Fardin Rastakhiz et.al. | 2507.07414 | null |
| 2025-07-10 | Hybrid LLM-Enhanced Intrusion Detection for Zero-Day Threats in IoT Networks | Mohammad F. Al-Hammouri et.al. | 2507.07413 | null |
| 2025-07-10 | Phishing Detection in the Gen-AI Era: Quantized LLMs vs Classical Models | Jikesh Thapa et.al. | 2507.07406 | null |
| 2025-07-10 | KVFlow: Efficient Prefix Caching for Accelerating LLM-Based Multi-Agent Workflows | Zaifeng Pan et.al. | 2507.07400 | null |
| 2025-07-10 | Behave Your Motion: Habit-preserved Cross-category Animal Motion Transfer | Zhimin Zhang et.al. | 2507.07394 | null |
| 2025-07-10 | Learning Collective Variables from Time-lagged Generation | Seonghyun Park et.al. | 2507.07390 | null |
| 2025-07-10 | Bradley-Terry and Multi-Objective Reward Modeling Are Complementary | Zhiwei Zhang et.al. | 2507.07375 | null |
| 2025-07-10 | PacGDC: Label-Efficient Generalizable Depth Completion with Projection Ambiguity and Consistency | Haotian Wang et.al. | 2507.07374 | null |
| 2025-07-09 | On the Impossibility of Separating Intelligence from Judgment: The Computational Intractability of Filtering for AI Alignment | Sarah Ball et.al. | 2507.07341 | null |
| 2025-07-09 | Bridging the Plausibility-Validity Gap by Fine-Tuning a Reasoning-Enhanced LLM for Chemical Synthesis and Discovery | Malikussaid et.al. | 2507.07328 | null |
| 2025-07-09 | Frontier LLMs Still Struggle with Simple Reasoning Tasks | Alan Malek et.al. | 2507.07313 | null |
| 2025-07-09 | Multi-Agent Retrieval-Augmented Framework for Evidence-Based Counterspeech Against Health Misinformation | Anirban Saha Anik et.al. | 2507.07307 | null |
| 2025-07-09 | Application of LLMs to Multi-Robot Path Planning and Task Allocation | Ashish Kumar et.al. | 2507.07302 | null |
| 2025-07-09 | Time Series Foundation Models for Multivariate Financial Time Series Forecasting | Ben A. Marconi et.al. | 2507.07296 | null |
| 2025-07-09 | Thermodynamic Prediction Enabled by Automatic Dataset Building and Machine Learning | Juejing Liu et.al. | 2507.07293 | null |
| 2025-07-09 | Open Source Planning & Control System with Language Agents for Autonomous Scientific Discovery | Licong Xu et.al. | 2507.07257 | null |
| 2025-07-09 | A Language-Driven Framework for Improving Personalized Recommendations: Merging LLMs with Traditional Algorithms | Aaron Goldstein et.al. | 2507.07251 | null |
| 2025-07-09 | Medical Red Teaming Protocol of Language Models: On the Importance of User Perspectives in Healthcare Settings | Minseon Kim et.al. | 2507.07248 | null |
| 2025-07-09 | Attentions Under the Microscope: A Comparative Study of Resource Utilization for Variants of Self-Attention | Zhengyu Tian et.al. | 2507.07247 | null |
| 2025-07-09 | An Information-Theoretic Perspective on Multi-LLM Uncertainty Estimation | Maya Kruse et.al. | 2507.07236 | null |
| 2025-07-09 | SynthTextEval: Synthetic Text Data Generation and Evaluation for High-Stakes Domains | Krithika Ramesh et.al. | 2507.07229 | null |
| 2025-07-09 | Compute Can't Handle the Truth: Why Communication Tax Prioritizes Memory and Interconnects in Modern AI Infrastructure | Myoungsoo Jung et.al. | 2507.07223 | null |
| 2025-07-09 | Neurosymbolic Feature Extraction for Identifying Forced Labor in Supply Chains | Zili Wang et.al. | 2507.07217 | null |
| 2025-07-09 | Scale leads to compositional generalization | Florian Redhardt et.al. | 2507.07207 | null |
| 2025-07-09 | State-Inference-Based Prompting for Natural Language Trading with Game NPCs | Minkyung Kim et.al. | 2507.07203 | null |
| 2025-07-09 | A Survey on Long-Video Storytelling Generation: Architectures, Consistency, and Cinematic Quality | Mohamed Elmoghany et.al. | 2507.07202 | null |
| 2025-07-09 | Combining Pre-Trained Models for Enhanced Feature Representation in Reinforcement Learning | Elia Piccoli et.al. | 2507.07197 | null |
| 2025-07-09 | Bridging the Last Mile of Prediction: Enhancing Time Series Forecasting with Conditional Guided Flow Matching | Huibo Xu et.al. | 2507.07192 | null |
| 2025-07-09 | Prompt Perturbations Reveal Human-Like Biases in LLM Survey Responses | Jens Rupprecht et.al. | 2507.07188 | null |
| 2025-07-09 | Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs | Itay Itzhak et.al. | 2507.07186 | null |
| 2025-07-09 | Interpretable EEG-to-Image Generation with Semantic Prompts | Arshak Rezvani et.al. | 2507.07157 | null |
| 2025-07-09 | Evaluating Retrieval-Augmented Generation Agents for Autonomous Scientific Discovery in Astrophysics | Xueqing Xu et.al. | 2507.07155 | null |
| 2025-07-09 | Towards Multimodal Understanding via Stable Diffusion as a Task-Aware Feature Extractor | Vatsal Agarwal et.al. | 2507.07106 | null |
| 2025-07-09 | Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation from Diffusion Models | Tiezheng Zhang et.al. | 2507.07104 | null |
| 2025-07-09 | Evaluating Attribute Confusion in Fashion Text-to-Image Generation | Ziyue Liu et.al. | 2507.07079 | null |
| 2025-07-09 | 5C Prompt Contracts: A Minimalist, Creative-Friendly, Token-Efficient Design Framework for Individual and SME LLM Usage | Ugur Ari et.al. | 2507.07045 | null |
| 2025-07-09 | UniConv: Unifying Retrieval and Response Generation for Large Language Models in Conversations | Fengran Mo et.al. | 2507.07030 | null |
| 2025-07-09 | First Return, Entropy-Eliciting Explore | Tianyu Zheng et.al. | 2507.07017 | null |
| 2025-07-09 | Integrating Pathology Foundation Models and Spatial Transcriptomics for Cellular Decomposition from Histology Images | Yutong Sun et.al. | 2507.07013 | null |
| 2025-07-09 | GNN-ViTCap: GNN-Enhanced Multiple Instance Learning with Vision Transformers for Whole Slide Image Classification and Captioning | S M Taslim Uddin Raju et.al. | 2507.07006 | null |
| 2025-07-09 | Learning Deliberately, Acting Intuitively: Unlocking Test-Time Reasoning in Multimodal LLMs | Yahan Yu et.al. | 2507.06999 | null |
| 2025-07-09 | MCA-RG: Enhancing LLMs with Medical Concept Alignment for Radiology Report Generation | Qilong Xing et.al. | 2507.06992 | null |
| 2025-07-09 | Are They All Good? Evaluating the Quality of CoTs in LLM-based Code Generation | Binquan Zhang et.al. | 2507.06980 | null |
| 2025-07-09 | Hallucinating 360°: Panoramic Street-View Generation via Local Scenes Diffusion and Probabilistic Prompting | Fei Teng et.al. | 2507.06971 | null |
| 2025-07-09 | Scaling Towards the Information Boundary of Instruction Set: InfinityInstruct-Subject Technical Report | Li Du et.al. | 2507.06968 | null |
| 2025-07-09 | Investigating the Robustness of Retrieval-Augmented Generation at the Query Level | Sezen Perçin et.al. | 2507.06956 | null |
| 2025-07-09 | What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models | Keyon Vafa et.al. | 2507.06952 | null |
| 2025-07-09 | Rethinking Verification for LLM Code Generation: From Generation to Testing | Zihan Ma et.al. | 2507.06920 | null |
| 2025-07-09 | Exploring LLMs for Predicting Tutor Strategy and Student Outcomes in Dialogues | Fareya Ikram et.al. | 2507.06910 | null |
| 2025-07-09 | MultiJustice: A Chinese Dataset for Multi-Party, Multi-Charge Legal Prediction | Xiao Wang et.al. | 2507.06909 | null |
| 2025-07-09 | SCoRE: Streamlined Corpus-based Relation Extraction using Multi-Label Contrastive Learning and Bayesian kNN | Luca Mariotti et.al. | 2507.06895 | null |
| 2025-07-09 | Developing and Maintaining an Open-Source Repository of AI Evaluations: Challenges and Insights | Alexandra Abbas et.al. | 2507.06893 | null |
| 2025-07-09 | Squeeze the Soaked Sponge: Efficient Off-policy Reinforcement Finetuning for Large Language Model | Jing Liang et.al. | 2507.06892 | null |
| 2025-07-09 | DiffSpectra: Molecular Structure Elucidation from Spectra using Diffusion Models | Liang Wang et.al. | 2507.06853 | null |
| 2025-07-09 | The Dark Side of LLMs: Agent-based Attacks for Complete Computer Takeover | Matteo Lupinacci et.al. | 2507.06850 | null |
| 2025-07-09 | Physics-Grounded Motion Forecasting via Equation Discovery for Trajectory-Guided Image-to-Video Generation | Tao Feng et.al. | 2507.06830 | null |
| 2025-07-09 | Adaptive Termination for Multi-round Parallel Reasoning: An Universal Semantic Entropy-Guided Framework | Zenan Xu et.al. | 2507.06829 | null |
| 2025-07-09 | Democratizing High-Fidelity Co-Speech Gesture Video Generation | Xu Yang et.al. | 2507.06812 | null |
| 2025-07-09 | Text to model via SysML: Automated generation of dynamical system computational models from unstructured natural language text via enhanced System Modeling Language diagrams | Matthew Anderson Hendricks et.al. | 2507.06803 | null |
| 2025-07-09 | Efficient Industrial sLLMs through Domain Adaptive Continual Pretraining: Method, Evaluation and Applications | Seonwu Kim et.al. | 2507.06795 | null |
| 2025-07-09 | Checklist Engineering Empowers Multilingual LLM Judges | Mohammad Ghiasvand Mohammadkhani et.al. | 2507.06774 | null |
| 2025-07-09 | Leveraging LLMs for Semantic Conflict Detection via Unit Test Generation | Nathalia Barbosa et.al. | 2507.06762 | null |
| 2025-07-09 | LOVON: Legged Open-Vocabulary Object Navigator | Daojie Peng et.al. | 2507.06747 | null |
| 2025-07-09 | PenTest2.0: Towards Autonomous Privilege Escalation Using GenAI | Haitham S. Al-Sinani et.al. | 2507.06742 | null |
| 2025-07-09 | Hierarchical Feature Alignment for Gloss-Free Sign Language Translation | Sobhan Asasi et.al. | 2507.06732 | null |
| 2025-07-09 | On the Effect of Uncertainty on Layer-wise Inference Dynamics | Sunwoo Kim et.al. | 2507.06722 | null |
| 2025-07-09 | A Neural Representation Framework with LLM-Driven Spatial Reasoning for Open-Vocabulary 3D Visual Grounding | Zhenyang Liu et.al. | 2507.06719 | null |
| 2025-07-09 | CLI-RAG: A Retrieval-Augmented Framework for Clinically Structured and Context Aware Text Generation with LLMs | Garapati Keerthana et.al. | 2507.06715 | null |
| 2025-07-09 | Elite Polarization in European Parliamentary Speeches: a Novel Measurement Approach Using Large Language Models | Gennadii Iakovlev et.al. | 2507.06658 | null |
| 2025-07-09 | Deep Disentangled Representation Network for Treatment Effect Estimation | Hui Meng et.al. | 2507.06650 | null |
| 2025-07-09 | EXAONE Path 2.0: Pathology Foundation Model with End-to-End Supervision | Myungjang Pyeon et.al. | 2507.06639 | null |
| 2025-07-09 | UniOD: A Universal Model for Outlier Detection across Diverse Domains | Dazhi Fu et.al. | 2507.06624 | null |
| 2025-07-09 | Expediting data extraction using a large language model (LLM) and scoping review protocol: a methodological study within a complex scoping review | James Stewart-Evans et.al. | 2507.06623 | null |
| 2025-07-09 | FuDoBa: Fusing Document and Knowledge Graph-based Representations with Bayesian Optimisation | Boshko Koloski et.al. | 2507.06622 | null |
| 2025-07-09 | Denoising Multi-Beta VAE: Representation Learning for Disentanglement and Generation | Anshuk Uppal et.al. | 2507.06613 | null |
| 2025-07-09 | From Data-Centric to Sample-Centric: Enhancing LLM Reasoning via Progressive Optimization | Xinjie Chen et.al. | 2507.06573 | null |
| 2025-07-09 | SlimCaching: Edge Caching of Mixture-of-Experts for Distributed Inference | Qian Chen et.al. | 2507.06567 | null |
| 2025-07-09 | The Flaws of Others: An LLM-driven Framework for Scientific Knowledge Production | Juan B. Gutiérrez et.al. | 2507.06565 | null |
| 2025-07-09 | SkyVLN: Vision-and-Language Navigation and NMPC Control for UAVs in Urban Environments | Tianshun Li et.al. | 2507.06564 | null |
| 2025-07-09 | SPEAR: Subset-sampled Performance Evaluation via Automated Ground Truth Generation for RAG | Zou Yuheng et.al. | 2507.06554 | null |
| 2025-07-09 | Large Language Model for Extracting Complex Contract Information in Industrial Scenes | Yunyang Cao et.al. | 2507.06539 | null |
| 2025-07-09 | InvestAlign: Overcoming Data Scarcity in Aligning Large Language Models with Investor Decision-Making Processes under Herd Behavior | Huisheng Wang et.al. | 2507.06528 | null |
| 2025-07-09 | FIFA: Unified Faithfulness Evaluation Framework for Text-to-Video and Video-to-Text Generation | Liqiang Jing et.al. | 2507.06523 | null |
| 2025-07-09 | SpindleKV: A Novel KV Cache Reduction Method Balancing Both Shallow and Deep Layers | Zicong Tang et.al. | 2507.06517 | null |
| 2025-07-09 | QUEST: Query Optimization in Unstructured Document Analysis | Zhaoze Sun et.al. | 2507.06515 | null |
| 2025-07-09 | Towards LLM-based Root Cause Analysis of Hardware Design Failures | Siyu Qiu et.al. | 2507.06512 | null |
| 2025-07-09 | Bilateral Collaboration with Large Vision-Language Models for Open Vocabulary Human-Object Interaction Detection | Yupeng Hu et.al. | 2507.06510 | null |
| 2025-07-09 | GR-LLMs: Recent Advances in Generative Recommendation Based on Large Language Models | Zhen Yang et.al. | 2507.06507 | null |
| 2025-07-09 | Pun Intended: Multi-Agent Translation of Wordplay with Contrastive Learning and Phonetic-Semantic Embeddings | Russell Taylor et.al. | 2507.06506 | null |
| 2025-07-09 | MoFE-Time: Mixture of Frequency Domain Experts for Time-Series Forecasting Models | Yiwen Liu et.al. | 2507.06502 | null |
| 2025-07-09 | On the Robustness of Verbal Confidence of LLMs in Adversarial Attacks | Stephen Obadinma et.al. | 2507.06489 | null |
| 2025-07-09 | Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning | Ziyang Wang et.al. | 2507.06485 | null |
| 2025-07-09 | 3D-Generalist: Self-Improving Vision-Language-Action Models for Crafting 3D Worlds | Fan-Yun Sun et.al. | 2507.06484 | null |
| 2025-07-09 | Learning Japanese with Jouzu: Interaction Outcomes with Stylized Dialogue Fictional Agents | Zackary Rackauckas et.al. | 2507.06483 | null |
| 2025-07-09 | IMPACT: Industrial Machine Perception via Acoustic Cognitive Transformer | Changheon Han et.al. | 2507.06481 | null |
| 2025-07-09 | Generative Lagrangian data assimilation for ocean dynamics under extreme sparsity | Niloofar Asefi et.al. | 2507.06479 | null |
| 2025-07-09 | Foundation Model Self-Play: Open-Ended Strategy Innovation via Foundation Models | Aaron Dharna et.al. | 2507.06466 | null |
| 2025-07-09 | Evaluating Efficiency and Novelty of LLM-Generated Code for Graph Analysis | Atieh Barati Nia et.al. | 2507.06463 | null |
| 2025-07-08 | A Semantic Parsing Framework for End-to-End Time Normalization | Xin Su et.al. | 2507.06450 | null |
| 2025-07-08 | Perception-Aware Policy Optimization for Multimodal Reasoning | Zhenhailong Wang et.al. | 2507.06448 | null |
| 2025-07-08 | Exploring Task Performance with Interpretable Models via Sparse Auto-Encoders | Shun Wang et.al. | 2507.06427 | null |
| 2025-07-08 | Reward Models Can Improve Themselves: Reward-Guided Adversarial Failure Mode Discovery for Robust Reward Modeling | Pankayaraj Pathmanathan et.al. | 2507.06419 | null |
| 2025-07-08 | PAST: A multimodal single-cell foundation model for histopathology and spatial transcriptomics in cancer | Changchun Yang et.al. | 2507.06418 | null |
| 2025-07-08 | Voltage Regulation in Distribution Systems with Data Center Loads | Yize Chen et.al. | 2507.06416 | null |
| 2025-07-08 | An AI-Driven Thermal-Fluid Testbed for Advanced Small Modular Reactors: Integration of Digital Twin and Large Language Models | Doyeong Lim et.al. | 2507.06399 | null |
| 2025-07-08 | SLDB: An End-To-End Heterogeneous System-on-Chip Benchmark Suite for LLM-Aided Design | Elisavet Lydia Alvanaki et.al. | 2507.06376 | null |
| 2025-07-08 | Bridging AI and Software Security: A Comparative Vulnerability Assessment of LLM Agent Deployment Paradigms | Tarek Gasmi et.al. | 2507.06323 | null |
| 2025-07-08 | Too Human to Model: The Uncanny Valley of LLMs in Social Simulation -- When Generative Language Agents Misalign with Modelling Principles | Yongchao Zeng et.al. | 2507.06310 | null |
| 2025-07-08 | Humans overrely on overconfident language models, across languages | Neil Rathi et.al. | 2507.06306 | null |
| 2025-07-08 | RSRefSeg 2: Decoupling Referring Remote Sensing Image Segmentation with Foundation Models | Keyan Chen et.al. | 2507.06231 | null |
| 2025-07-08 | Efficiency-Effectiveness Reranking FLOPs for LLM-based Rerankers | Zhiyuan Peng et.al. | 2507.06223 | null |
| 2025-07-08 | Is Diversity All You Need for Scalable Robotic Manipulation? | Modi Shi et.al. | 2507.06219 | null |
| 2025-07-08 | A Survey on Latent Reasoning | Rui-Jie Zhu et.al. | 2507.06203 | null |
| 2025-07-08 | UQLM: A Python Package for Uncertainty Quantification in Large Language Models | Dylan Bouchard et.al. | 2507.06196 | null |
| 2025-07-08 | SQLBarber: A System Leveraging Large Language Models to Generate Customized and Realistic SQL Workloads | Jiale Lao et.al. | 2507.06192 | null |
| 2025-07-08 | Hidden Prompts in Manuscripts Exploit AI-Assisted Peer Review | Zhicheng Lin et.al. | 2507.06185 | null |
| 2025-07-08 | Data-Semantics-Aware Recommendation of Diverse Pivot Tables | Whanhee Cho et.al. | 2507.06171 | null |
| 2025-07-09 | Skywork-R1V3 Technical Report | Wei Shen et.al. | 2507.06167 | null |
| 2025-07-08 | Evaluation of Habitat Robotics using Large Language Models | William Li et.al. | 2507.06157 | null |
| 2025-07-08 | Large Language Models Predict Human Well-being -- But Not Equally Everywhere | Pat Pataranutaporn et.al. | 2507.06141 | null |
| 2025-07-08 | Coding Triangle: How Does Large Language Model Understand Code? | Taolin Zhang et.al. | 2507.06138 | null |
| 2025-07-08 | PrefixAgent: An LLM-Powered Design Framework for Efficient Prefix Adder Optimization | Dongsheng Zuo et.al. | 2507.06127 | null |
| 2025-07-09 | Omni-Video: Democratizing Unified Video Understanding and Generation | Zhiyu Tan et.al. | 2507.06119 | null |
| 2025-07-08 | Speech Quality Assessment Model Based on Mixture of Experts: System-Level Performance Enhancement and Utterance-Level Challenge Analysis | Xintong Hu et.al. | 2507.06116 | null |
| 2025-07-08 | Reflections Unlock: Geometry-Aware Reflection Disentanglement in 3D Gaussian Splatting for Photorealistic Scenes Rendering | Jiayi Song et.al. | 2507.06103 | null |
| 2025-07-09 | FEVO: Financial Knowledge Expansion and Reasoning Evolution for Large Language Models | Bo Pang et.al. | 2507.06057 | null |
| 2025-07-08 | Entropy-Memorization Law: Evaluating Memorization Difficulty of Data in LLMs | Yizhan Huang et.al. | 2507.06056 | null |
| 2025-07-08 | Kernel Trace Distance: Quantum Statistical Metric between Measures through RKHS Density Operators | Arturo Castellanos et.al. | 2507.06055 | null |
| 2025-07-08 | Hierarchical Interaction Summarization and Contrastive Prompting for Explainable Recommendations | Yibin Liu et.al. | 2507.06044 | null |
| 2025-07-08 | CAVGAN: Unifying Jailbreak and Defense of LLMs via Generative Adversarial Attacks on their Internal Representations | Xiaohu Li et.al. | 2507.06043 | null |
| 2025-07-08 | CogniSQL-R1-Zero: Lightweight Reinforced Reasoning for Efficient SQL Generation | Kushal Gajjar et.al. | 2507.06013 | null |
| 2025-07-08 | DocIE@XLLM25: In-Context Learning for Information Extraction using Fully Synthetic Demonstrations | Nicholas Popovič et.al. | 2507.05997 | null |
| 2025-07-08 | Development and Evaluation of HopeBot: an LLM-based chatbot for structured and interactive PHQ-9 depression screening | Zhijun Guo et.al. | 2507.05984 | null |
| 2025-07-08 | Multi-Agent Debate Strategies to Enhance Requirements Engineering with Large Language Models | Marc Oriol et.al. | 2507.05981 | null |
| 2025-07-08 | RabakBench: Scaling Human Annotations to Construct Localized Multilingual Safety Benchmarks for Low-Resource Languages | Gabriel Chua et.al. | 2507.05980 | null |
| 2025-07-08 | Automatic Synthesis of High-Quality Triplet Data for Composed Image Retrieval | Haiwen Li et.al. | 2507.05970 | null |
| 2025-07-08 | OpenFActScore: Open-Source Atomic Evaluation of Factuality in Text Generation | Lucas Fonseca Lage et.al. | 2507.05965 | null |
| 2025-07-08 | Evaluation of Large Language Model-Driven AutoML in Data and Model Management from Human-Centered Perspective | Jiapeng Yao et.al. | 2507.05962 | null |
| 2025-07-08 | A Wireless Foundation Model for Multi-Task Prediction | Yucheng Sheng et.al. | 2507.05938 | null |
| 2025-07-08 | BlueLM-2.5-3B Technical Report | Baojiao Xiong et.al. | 2507.05934 | null |
| 2025-07-08 | Few-shot text-based emotion detection | Teodor-George Marchitan et.al. | 2507.05918 | null |
| 2025-07-08 | Best-of-N through the Smoothing Lens: KL Divergence and Regret Analysis | Gholamali Aminian et.al. | 2507.05913 | null |
| 2025-07-08 | AI-Reporter: A Path to a New Genre of Scientific Communication | Gerd Graßhoff et.al. | 2507.05903 | null |
| 2025-07-08 | Psychometric Item Validation Using Virtual Respondents with Trait-Response Mediators | Sungjib Lim et.al. | 2507.05890 | null |
| 2025-07-08 | Current Practices for Building LLM-Powered Reasoning Tools Are Ad Hoc -- and We Can Do Better | Aaron Bembenek et.al. | 2507.05886 | null |
| 2025-07-08 | RecRankerEval: A Flexible and Extensible Framework for Top-k LLM-based Recommendation | Zeyuan Meng et.al. | 2507.05880 | null |
| 2025-07-08 | KERAG_R: Knowledge-Enhanced Retrieval-Augmented Generation for Recommendation | Zeyuan Meng et.al. | 2507.05863 | null |
| 2025-07-08 | USIGAN: Unbalanced Self-Information Feature Transport for Weakly Paired Image IHC Virtual Staining | Yue Peng et.al. | 2507.05843 | null |
| 2025-07-08 | Video Event Reasoning and Prediction by Fusing World Knowledge from LLMs with Vision Foundation Models | Léa Dubois et.al. | 2507.05822 | null |
| 2025-07-08 | 2D Instance Editing in 3D Space | Yuhuan Xie et.al. | 2507.05819 | null |
| 2025-07-08 | Affective-ROPTester: Capability and Bias Analysis of LLMs in Predicting Retinopathy of Prematurity | Shuai Zhao et.al. | 2507.05816 | null |
| 2025-07-08 | Just Say Better or Worse: A Human-AI Collaborative Framework for Medical Image Segmentation Without Manual Annotations | Yizhe Zhang et.al. | 2507.05815 | null |
| 2025-07-08 | Improving Robustness of Foundation Models in Domain Adaptation with Soup-Adapters | Marco Roschkowski et.al. | 2507.05807 | null |
| 2025-07-08 | DREAM: Document Reconstruction via End-to-end Autoregressive Model | Xin Li et.al. | 2507.05805 | null |
| 2025-07-08 | Creating a customisable freely-accessible Socratic AI physics tutor | Eugenio Tufino et.al. | 2507.05795 | null |
| 2025-07-08 | TalkFashion: Intelligent Virtual Try-On Assistant Based on Multimodal Large Language Model | Yujie Hu et.al. | 2507.05790 | null |
| 2025-07-08 | Flippi: End To End GenAI Assistant for E-Commerce | Anand A. Rajasekar et.al. | 2507.05788 | null |
| 2025-07-08 | Text-Guided Token Communication for Wireless Image Transmission | Bole Liu et.al. | 2507.05781 | null |
| 2025-07-08 | LeAD: The LLM Enhanced Planning System Converged with End-to-end Autonomous Driving | Yuhang Zhang et.al. | 2507.05754 | null |
| 2025-07-08 | Jigsaw: Training Multi-Billion-Parameter AI Weather Models with Optimized Model Parallelism | Deifilia Kieckhefen et.al. | 2507.05753 | null |
| 2025-07-08 | DocTalk: Scalable Graph-based Dialogue Synthesis for Enhancing LLM Conversational Capabilities | Jing Yang Lee et.al. | 2507.05750 | null |
| 2025-07-08 | Tissue Concepts v2: a Supervised Foundation Model for whole slide images | Till Nicke et.al. | 2507.05742 | null |
| 2025-07-08 | When Transformers Meet Recommenders: Integrating Self-Attentive Sequential Recommendation with Fine-Tuned LLMs | Kechen Liu et.al. | 2507.05733 | null |
| 2025-07-08 | ContextASR-Bench: A Massive Contextual Speech Recognition Benchmark | He Wang et.al. | 2507.05727 | null |
| 2025-07-08 | Large Language Models for Agent-Based Modelling: Current and possible uses across the modelling cycle | Loïs Vanhée et.al. | 2507.05723 | null |
| 2025-07-08 | HIRAG: Hierarchical-Thought Instruction-Tuning Retrieval-Augmented Generation | YiHan Jiao et.al. | 2507.05714 | null |
| 2025-07-08 | DRAGON: Dynamic RAG Benchmark On News | Fedor Chernogorskii et.al. | 2507.05713 | null |
| 2025-07-08 | Smoothie-Qwen: Post-Hoc Smoothing to Reduce Language Bias in Multilingual LLMs | SeungWon Ji et.al. | 2507.05686 | null |
| 2025-07-08 | MedGen: Unlocking Medical Video Generation by Scaling Granularly-annotated Medical Videos | Rongsheng Wang et.al. | 2507.05675 | null |
| 2025-07-08 | Integrating Diffusion-based Multi-task Learning with Online Reinforcement Learning for Robust Quadruped Robot Control | Xinyao Qin et.al. | 2507.05674 | null |
| 2025-07-08 | TuneShield: Mitigating Toxicity in Conversational AI while Fine-tuning on Untrusted Data | Aravind Cheruvu et.al. | 2507.05660 | null |
| 2025-07-08 | LLMs are Introvert | Litian Zhang et.al. | 2507.05638 | null |
| 2025-07-08 | SARA: Selective and Adaptive Retrieval-augmented Generation with Context Compression | Yiqiao Jin et.al. | 2507.05633 | null |
| 2025-07-08 | Enhancing Student Learning with LLM-Generated Retrieval Practice Questions: An Empirical Study in Data Science Courses | Yuan An et.al. | 2507.05629 | null |
| 2025-07-08 | DreamGrasp: Zero-Shot 3D Multi-Object Reconstruction from Partial-View Images for Robotic Manipulation | Young Hun Kim et.al. | 2507.05627 | null |
| 2025-07-08 | Flipping Knowledge Distillation: Leveraging Small Models' Expertise to Enhance LLMs in Text Matching | Mingzhe Li et.al. | 2507.05617 | null |
| 2025-07-08 | Domain adaptation of large language models for geotechnical applications | Lei Fan et.al. | 2507.05613 | null |
| 2025-07-08 | MMW: Side Talk Rejection Multi-Microphone Whisper on Smart Glasses | Yang Liu et.al. | 2507.05609 | null |
| 2025-07-08 | Structured Task Solving via Modular Embodied Intelligence: A Case Study on Rubik's Cube | Chongshan Fan et.al. | 2507.05607 | null |
| 2025-07-08 | Self-Review Framework for Enhancing Instruction Following Capability of LLM | Sihyun Park et.al. | 2507.05598 | null |
| 2025-07-08 | PaddleOCR 3.0 Technical Report | Cheng Cui et.al. | 2507.05595 | null |
| 2025-07-08 | MLlm-DR: Towards Explainable Depression Recognition with MultiModal Large Language Models | Wei Zhang et.al. | 2507.05591 | null |
| 2025-07-08 | The Landscape of Memorization in LLMs: Mechanisms, Measurement, and Mitigation | Alexander Xiong et.al. | 2507.05578 | null |
| 2025-07-08 | Beyond Retrieval: Ensembling Cross-Encoders and GPT Rerankers with LLMs for Biomedical QA | Shashank Verma et.al. | 2507.05577 | null |
| 2025-07-08 | Prompt Migration: Stabilizing GenAI Applications with Evolving Large Language Models | Shivani Tripathi et.al. | 2507.05573 | null |
| 2025-07-08 | ReLayout: Integrating Relation Reasoning for Content-aware Layout Generation with Multi-modal Large Language Models | Jiaxu Tian et.al. | 2507.05568 | null |
| 2025-07-08 | Search-based Selection of Metamorphic Relations for Optimized Robustness Testing of Large Language Models | Sangwon Hyun et.al. | 2507.05565 | null |
| 2025-07-08 | Enhancing Test-Time Scaling of Large Language Models with Hierarchical Retrieval-Augmented MCTS | Alex ZH Dou et.al. | 2507.05557 | null |
| 2025-07-08 | A Malliavin calculus approach to score functions in diffusion generative models | Ehsan Mirafzali et.al. | 2507.05550 | null |
| 2025-07-07 | SenseCF: LLM-Prompted Counterfactuals for Intervention and Sensor Data Augmentation | Shovito Barua Soumma et.al. | 2507.05541 | null |
| 2025-07-07 | Conversational Education at Scale: A Multi-LLM Agent Workflow for Procedural Learning and Pedagogic Quality Assessment | Jiahuan Pei et.al. | 2507.05528 | null |
| 2025-07-07 | Empowering Healthcare Practitioners with Language Models: Structuring Speech Transcripts in Two Real-World Clinical Applications | Jean-Philippe Corbeil et.al. | 2507.05517 | null |
| 2025-07-07 | Tool for Supporting Debugging and Understanding of Normative Requirements Using LLMs | Alex Kleijwegt et.al. | 2507.05504 | null |
| 2025-07-07 | MolFORM: Multi-modal Flow Matching for Structure-Based Drug Design | Jie Huang et.al. | 2507.05503 | null |
| 2025-07-07 | Deep Research Comparator: A Platform For Fine-grained Human Annotations of Deep Research Agents | Prahaladh Chandrahasan et.al. | 2507.05495 | null |
| 2025-07-07 | MBFormer: A General Transformer-based Learning Paradigm for Many-body Interactions in Real Materials | Bowen Hou et.al. | 2507.05480 | null |
| 2025-07-07 | Dense and comeager conjugacy classes in zero-dimensional dynamics | Michal Doucha et.al. | 2507.05474 | null |
| 2025-07-07 | Inaugural MOASEI Competition at AAMAS'2025: A Technical Report | Ceferino Patino et.al. | 2507.05469 | null |
| 2025-07-07 | Risk-Aware Aerocapture Guidance Through a Probabilistic Indicator Function | Grace E. Calkins et.al. | 2507.05454 | null |
| 2025-07-07 | On the Semantics of Large Language Models | Martin Schuele et.al. | 2507.05448 | null |
| 2025-07-07 | PhoniTale: Phonologically Grounded Mnemonic Generation for Typologically Distant Language Pairs | Sana Kang et.al. | 2507.05444 | null |
| 2025-07-07 | Mastering Regional 3DGS: Locating, Initializing, and Editing with Diverse 2D Priors | Lanqing Guo et.al. | 2507.05426 | null |
| 2025-07-07 | "Lost-in-the-Later": Framework for Quantifying Contextual Grounding in Large Language Models | Yufei Tao et.al. | 2507.05424 | null |
| 2025-07-07 | Learn Globally, Speak Locally: Bridging the Gaps in Multilingual Reasoning | Jaedong Hwang et.al. | 2507.05418 | null |
| 2025-07-07 | PBE Meets LLM: When Few Examples Aren't Few-Shot Enough | Shuning Zhang et.al. | 2507.05403 | null |
| 2025-07-07 | Neural-Driven Image Editing | Pengfei Zhou et.al. | 2507.05397 | null |
| 2025-07-07 | Controlling What You Share: Assessing Language Model Adherence to Privacy Preferences | Guillem Ramírez et.al. | 2507.05391 | null |
| 2025-07-07 | From General to Specialized: The Need for Foundational Models in Agriculture | Vishal Nedungadi et.al. | 2507.05390 | null |
| 2025-07-07 | Reinforcement Fine-Tuning Naturally Mitigates Forgetting in Continual Post-Training | Song Lai et.al. | 2507.05386 | null |
| 2025-07-07 | Beyond Simple Edits: X-Planner for Complex Instruction-Based Image Editing | Chun-Hsiao Yeh et.al. | 2507.05259 | null |
| 2025-07-07 | Spatio-Temporal LLM: Reasoning about Environments and Actions | Haozhen Zheng et.al. | 2507.05258 | null |
| 2025-07-07 | Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions | Yuanzhe Hu et.al. | 2507.05257 | null |
| 2025-07-07 | Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning | Yana Wei et.al. | 2507.05255 | null |
| 2025-07-07 | Response Attack: Exploiting Contextual Priming to Jailbreak Large Language Models | Ziqi Miao et.al. | 2507.05248 | null |
| 2025-07-07 | Modeling Latent Partner Strategies for Adaptive Zero-Shot Human-Agent Collaboration | Benjamin Li et.al. | 2507.05244 | null |
| 2025-07-07 | StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling | Meng Wei et.al. | 2507.05240 | null |
| 2025-07-07 | All in One: Visual-Description-Guided Unified Point Cloud Segmentation | Zongyan Han et.al. | 2507.05211 | null |
| 2025-07-07 | MedGemma Technical Report | Andrew Sellergren et.al. | 2507.05201 | null |
| 2025-07-07 | CREW-WILDFIRE: Benchmarking Agentic Multi-Agent Collaborations at Scale | Jonathan Hyun et.al. | 2507.05178 | null |
| 2025-07-07 | OpenS2S: Advancing Open-Source End-to-End Empathetic Large Speech Language Model | Chen Wang et.al. | 2507.05177 | null |
| 2025-07-07 | A Dynamical Systems Perspective on the Analysis of Neural Networks | Dennis Chemnitz et.al. | 2507.05164 | null |
| 2025-07-07 | 4DSloMo: 4D Reconstruction for High Speed Scene with Asynchronous Capture | Yutian Chen et.al. | 2507.05163 | null |
| 2025-07-07 | AI Generated Text Detection Using Instruction Fine-tuned Large Language and Transformer-Based Models | Chinnappa Guggilla et.al. | 2507.05157 | null |
| 2025-07-07 | Interpretable Mnemonic Generation for Kanji Learning via Expectation-Maximization | Jaewook Lee et.al. | 2507.05137 | null |
| 2025-07-07 | LERa: Replanning with Visual Feedback in Instruction Following | Svyatoslav Pchelintsev et.al. | 2507.05135 | null |
| 2025-07-07 | An Evaluation of Large Language Models on Text Summarization Tasks Using Prompt Engineering Techniques | Walid Mohamed Aly et.al. | 2507.05123 | null |
| 2025-07-07 | LVM4CSI: Enabling Direct Application of Pre-Trained Large Vision Models for Wireless Channel Tasks | Jiajia Guo et.al. | 2507.05121 | null |
| 2025-07-07 | VerifyLLM: LLM-Based Pre-Execution Task Plan Verification for Robots | Danil S. Grigorev et.al. | 2507.05118 | null |
| 2025-07-07 | DICE: Discrete inverse continuity equation for learning population dynamics | Tobias Blickhan et.al. | 2507.05107 | null |
| 2025-07-07 | The Hidden Threat in Plain Text: Attacking RAG Data Loaders | Alberto Castagnaro et.al. | 2507.05093 | null |
| 2025-07-07 | Gaussian approximation for non-linearity parameter estimation in perturbed random fields on the sphere | Claudio Durastanti et.al. | 2507.05074 | null |
| 2025-07-07 | ICAS: Detecting Training Data from Autoregressive Image Generative Models | Hongyao Yu et.al. | 2507.05068 | null |
| 2025-07-07 | Replacing thinking with tool usage enables reasoning in small language models | Corrado Rainone et.al. | 2507.05065 | null |
| 2025-07-07 | What Shapes User Trust in ChatGPT? A Mixed-Methods Study of User Attributes, Trust Dimensions, Task Context, and Societal Perceptions among University Students | Kadija Bouyzourn et.al. | 2507.05046 | null |
| 2025-07-07 | MoLink: Distributed and Efficient Serving Framework for Large Models | Lewei Jin et.al. | 2507.05043 | null |
| 2025-07-07 | Beyond Scaling Curves: Internal Dynamics of Neural Networks Through the NTK Lens | Konstantin Nikolaou et.al. | 2507.05035 | null |
| 2025-07-07 | Estimating Object Physical Properties from RGB-D Vision and Depth Robot Sensors Using Deep Learning | Ricardo Cardoso et.al. | 2507.05029 | null |
| 2025-07-07 | A Generative Diffusion Model for Amorphous Materials | Kai Yang et.al. | 2507.05024 | null |
| 2025-07-07 | Co-DETECT: Collaborative Discovery of Edge Cases in Text Classification | Chenfei Xiong et.al. | 2507.05010 | null |
| 2025-07-07 | Multi-modal Representations for Fine-grained Multi-label Critical View of Safety Recognition | Britty Baby et.al. | 2507.05007 | null |
| 2025-07-07 | From Autonomy to Agency: Agentic Vehicles for Human-Centered Mobility Systems | Jiangbo Yu et.al. | 2507.04996 | null |
| 2025-07-07 | Parameterized Diffusion Optimization enabled Autoregressive Ordinal Regression for Diabetic Retinopathy Grading | Qinkai Yu et.al. | 2507.04978 | null |
| 2025-07-07 | Can Video LLMs Refuse to Answer? Alignment for Answerability in Video Large Language Models | Eunseop Yoon et.al. | 2507.04976 | null |
| 2025-07-07 | The Case for Instance-Optimized LLMs in OLAP Databases | Bardia Mohammadi et.al. | 2507.04967 | null |
| 2025-07-07 | EXPOTION: Facial Expression and Motion Control for Multimodal Music Generation | Fathinah Izzati et.al. | 2507.04955 | null |
| 2025-07-07 | ArtifactsBench: Bridging the Visual-Interactive Gap in LLM Code Generation Evaluation | Chenchen Zhang et.al. | 2507.04952 | null |
| 2025-07-07 | ReLoop: "Seeing Twice and Thinking Backwards" via Closed-loop Training to Mitigate Hallucinations in Multimodal understanding | Jianjiang Yang et.al. | 2507.04943 | null |
| 2025-07-07 | Contextual Light-Particle Interference | Brian Stout et.al. | 2507.04935 | null |
| 2025-07-07 | LIFT: Automating Symbolic Execution Optimization with Large Language Models for AI Networks | Ruoxi Wang et.al. | 2507.04931 | null |
| 2025-07-07 | HV-MMBench: Benchmarking MLLMs for Human-Centric Video Understanding | Yuxuan Cai et.al. | 2507.04909 | null |
| 2025-07-07 | Emergent Semantics Beyond Token Embeddings: Transformer LMs with Frozen Visual Unicode Representations | A. Bochkov et.al. | 2507.04886 | null |
| 2025-07-07 | DoPI: Doctor-like Proactive Interrogation LLM for Traditional Chinese Medicine | Zewen Sun et.al. | 2507.04877 | null |
| 2025-07-07 | Music Boomerang: Reusing Diffusion Models for Data Augmentation and Audio Manipulation | Alexander Fichtinger et.al. | 2507.04864 | null |
| 2025-07-07 | Supporting Software Formal Verification with Large Language Models: An Experimental Study | Weiqi Wang et.al. | 2507.04857 | null |
| 2025-07-07 | Semantically Consistent Discrete Diffusion for 3D Biological Graph Modeling | Chinmay Prabhakar et.al. | 2507.04856 | null |
| 2025-07-07 | | Shrey Ganatra et.al. | 2507.04854 | null |
| 2025-07-07 | Dialogue-Based Multi-Dimensional Relationship Extraction from Novels | Yuchen Yan et.al. | 2507.04852 | null |
| 2025-07-07 | Spec-TOD: A Specialized Instruction-Tuned LLM Framework for Efficient Task-Oriented Dialogue Systems | Quang-Vinh Nguyen et.al. | 2507.04841 | null |
| 2025-07-07 | RIPE: Reinforcement Learning on Unlabeled Image Pairs for Robust Keypoint Extraction | Johannes Künzel et.al. | 2507.04839 | null |
| 2025-07-07 | The Geopolitical Determinants of Economic Growth, 1960-2019 | Tianyu Fan et.al. | 2507.04833 | null |
| 2025-07-07 | Harnessing Pairwise Ranking Prompting Through Sample-Efficient Ranking Distillation | Junru Wu et.al. | 2507.04820 | null |
| 2025-07-07 | Application and Evaluation of Large Language Models for Forecasting the Impact of Traffic Incidents | George Jagadeesh et.al. | 2507.04803 | null |
| 2025-07-07 | Generalization bounds for score-based generative models: a synthetic proof | Arthur Stéphanovitch et.al. | 2507.04794 | null |
| 2025-07-07 | Reason to Rote: Rethinking Memorization in Reasoning | Yupei Du et.al. | 2507.04782 | null |
| 2025-07-07 | From Imitation to Innovation: The Emergence of AI Unique Artistic Styles and the Challenge of Copyright Protection | Zexi Jia et.al. | 2507.04769 | null |
| 2025-07-07 | ABench-Physics: Benchmarking Physical Reasoning in LLMs via High-Difficulty and Dynamic Physics Problems | Yiming Zhang et.al. | 2507.04766 | null |
| 2025-07-07 | GraphBrep: Learning B-Rep in Graph Structure for Efficient CAD Generation | Weilin Lai et.al. | 2507.04765 | null |
| 2025-07-07 | Intervening to learn and compose disentangled representations | Alex Markham et.al. | 2507.04754 | null |
| 2025-07-07 | Large Language Models for Network Intrusion Detection Systems: Foundations, Implementations, and Future Directions | Shuo Yang et.al. | 2507.04752 | null |
| 2025-07-07 | LLMs as Architects and Critics for Multi-Source Opinion Summarization | Anuj Attri et.al. | 2507.04751 | null |
| 2025-07-07 | LLM-based Question-Answer Framework for Sensor-driven HVAC System Interaction | Sungmin Lee et.al. | 2507.04748 | null |
| 2025-07-07 | Activation Steering for Chain-of-Thought Compression | Seyedarmin Azizi et.al. | 2507.04742 | null |
| 2025-07-07 | ChipSeek-R1: Generating Human-Surpassing RTL with LLM via Hierarchical Reward-Driven Reinforcement Learning | Zhirong Chen et.al. | 2507.04736 | null |
| 2025-07-07 | An analysis of vision-language models for fabric retrieval | Francesco Giuliari et.al. | 2507.04735 | null |
| 2025-07-07 | "This Suits You the Best": Query Focused Comparative Explainable Summarization | Arnav Attri et.al. | 2507.04733 | null |
| 2025-07-07 | Who's the Mole? Modeling and Detecting Intention-Hiding Malicious Agents in LLM-Based Multi-Agent Systems | Yizhe Xie et.al. | 2507.04724 | null |
| 2025-07-07 | LOOM-Scope: a comprehensive and efficient LOng-cOntext Model evaluation framework | Zecheng Tang et.al. | 2507.04723 | null |
| 2025-07-07 | Geometric-Guided Few-Shot Dental Landmark Detection with Human-Centric Foundation Model | Anbang Wang et.al. | 2507.04710 | null |
| 2025-07-07 | Why We Feel What We Feel: Joint Detection of Emotions and Their Opinion Triggers in E-commerce | Arnav Attri et.al. | 2507.04708 | null |
| 2025-07-07 | Tempo-R0: A Video-MLLM for Temporal Video Grounding through Efficient Temporal Sensing Reinforcement Learning | Feng Yue et.al. | 2507.04702 | null |
| 2025-07-07 | XiYan-SQL: A Novel Multi-Generator Framework For Text-to-SQL | Yifu Liu et.al. | 2507.04701 | null |
| 2025-07-07 | A Visual Leap in CLIP Compositionality Reasoning through Generation of Counterfactual Sets | Zexi Jia et.al. | 2507.04699 | null |
| 2025-07-07 | Performance Evaluation of General Purpose Large Language Models for Basic Linear Algebra Subprograms Code Generation | Daichi Mukunoki et.al. | 2507.04697 | null |
| 2025-07-07 | AKEGEN: A LLM-based Tabular Corpus Generator for Evaluating Dataset Discovery in Data Lakes | Zhenwei Dai et.al. | 2507.04687 | null |
| 2025-07-07 | ChangeBridge: Spatiotemporal Image Generation with Multimodal Controls for Remote Sensing | Zhenghui Zhao et.al. | 2507.04678 | null |
| 2025-07-07 | VectorLLM: Human-like Extraction of Structured Building Contours vis Multimodal LLMs | Tao Zhang et.al. | 2507.04664 | null |
| 2025-07-07 | MODA: MOdular Duplex Attention for Multimodal Perception, Cognition, and Emotion Understanding | Zhicheng Zhang et.al. | 2507.04635 | null |
| 2025-07-07 | Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models? | Yun Qu et.al. | 2507.04632 | null |
| 2025-07-07 | Learning Robust Stereo Matching in the Wild with Selective Mixture-of-Experts | Yun Wang et.al. | 2507.04631 | null |
| 2025-07-07 | Heterogeneous User Modeling for LLM-based Recommendation | Honghui Bao et.al. | 2507.04626 | null |
| 2025-07-07 | Knowledge-Aware Self-Correction in Language Models via Structured Memory Graphs | Swayamjit Saha et.al. | 2507.04625 | null |
| 2025-07-07 | Hierarchical Intent-guided Optimization with Pluggable LLM-Driven Semantics for Session-based Recommendation | Jinpeng Chen et.al. | 2507.04623 | null |
| 2025-07-07 | Multimodal LLM Integrated Semantic Communications for 6G Immersive Experiences | Yusong Zhang et.al. | 2507.04621 | null |
| 2025-07-07 | any4: Learned 4-bit Numeric Representation for LLMs | Mostafa Elhoushi et.al. | 2507.04610 | null |
| 2025-07-07 | PRIME: Large Language Model Personalization with Cognitive Memory and Thought Processes | Xinliang Frederick Zhang et.al. | 2507.04607 | null |
| 2025-07-07 | QR-LoRA: Efficient and Disentangled Fine-tuning via QR Decomposition for Customized Generation | Jiahui Yang et.al. | 2507.04599 | null |
| 2025-07-06 | Evaluating LLMs on Real-World Forecasting Against Human Superforecasters | Janna Lu et.al. | 2507.04562 | null |
| 2025-07-06 | MambaVideo for Discrete Video Tokenization with Channel-Split Quantization | Dawit Mureja Argaw et.al. | 2507.04559 | null |
| 2025-07-06 | Self-supervised learning of speech representations with Dutch archival data | Nik Vaessen et.al. | 2507.04554 | null |
| 2025-07-06 | Greedy Dynamic Matching | Nick Arnosti et.al. | 2507.04551 | null |
| 2025-07-06 | DP-Fusion: Token-Level Differentially Private Inference for Large Language Models | Rushil Thareja et.al. | 2507.04531 | null |
| 2025-07-06 | DOTResize: Reducing LLM Width via Discrete Optimal Transport-based Neuron Merging | Neha Verma et.al. | 2507.04517 | null |
| 2025-07-06 | Unveiling the Potential of Diffusion Large Language Model in Controllable Generation | Zhen Xiong et.al. | 2507.04504 | null |
| 2025-07-06 | A validity-guided workflow for robust large language model research in psychology | Zhicheng Lin et.al. | 2507.04491 | null |
| 2025-07-06 | Source Attribution in Retrieval-Augmented Generation | Ikhtiyor Nematov et.al. | 2507.04480 | null |
| 2025-07-06 | Model Inversion Attacks on Llama 3: Extracting PII from Large Language Models | Sathesh P. Sivashanmugam et.al. | 2507.04478 | null |
| 2025-07-06 | The role of large language models in UI/UX design: A systematic literature review | Ammar Ahmed et.al. | 2507.04469 | null |
| 2025-07-06 | GradOT: Training-free Gradient-preserving Offsite-tuning for Large Language Models | Kai Yao et.al. | 2507.04455 | null |
| 2025-07-06 | ESSA: Evolutionary Strategies for Scalable Alignment | Daria Korotyshova et.al. | 2507.04453 | null |
| 2025-07-03 | MultiGen: Using Multimodal Generation in Simulation to Learn Multimodal Policies in Real | Renhao Wang et.al. | 2507.02864 | null |
| 2025-07-03 | RefTok: Reference-Based Tokenization for Video Generation | Xiang Fan et.al. | 2507.02862 | null |
| 2025-07-03 | Less is Enough: Training-Free Video Diffusion Acceleration via Runtime-Adaptive Caching | Xin Zhou et.al. | 2507.02860 | null |
| 2025-07-03 | Bootstrapping Grounded Chain-of-Thought in Multimodal LLMs for Data-Efficient Model Adaptation | Jiaer Xia et.al. | 2507.02859 | null |
| 2025-07-03 | Requirements Elicitation Follow-Up Question Generation | Yuchen Shen et.al. | 2507.02858 | null |
| 2025-07-03 | MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs | Purbesh Mitra et.al. | 2507.02851 | null |
| 2025-07-03 | Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection | Ziqi Miao et.al. | 2507.02844 | null |
| 2025-07-03 | LLM-Driven Treatment Effect Estimation Under Inference Time Text Confounding | Yuchen Ma et.al. | 2507.02843 | null |
| 2025-07-03 | StepHint: Multi-level Stepwise Hints Enhance Reinforcement Learning to Reason | Kaiyi Zhang et.al. | 2507.02841 | null |
| 2025-07-03 | ExPO: Unlocking Hard Reasoning with Self-Explanation-Guided Reinforcement Learning | Ruiyang Zhou et.al. | 2507.02834 | null |
| 2025-07-03 | SynapseRoute: An Auto-Route Switching Framework on Dual-State Large Language Model | Wencheng Zhang et.al. | 2507.02822 | null |
| 2025-07-03 | Multimodal Mathematical Reasoning with Diverse Solving Perspective | Wenhao Shi et.al. | 2507.02804 | null |
| 2025-07-03 | Is Reasoning All You Need? Probing Bias in the Age of Reasoning Language Models | Riccardo Cantini et.al. | 2507.02799 | null |
| 2025-07-03 | No time to train! Training-Free Reference-Based Instance Segmentation | Miguel Espinosa et.al. | 2507.02798 | null |
| 2025-07-03 | From Long Videos to Engaging Clips: A Human-Inspired Video Editing Framework with Multimodal Narrative Understanding | Xiangfeng Wang et.al. | 2507.02790 | null |
| 2025-07-03 | Moral Responsibility or Obedience: What Do We Want from AI? | Joseph Boland et.al. | 2507.02788 | null |
| 2025-07-03 | Self-Correction Bench: Revealing and Addressing the Self-Correction Blind Spot in LLMs | Ken Tsui et.al. | 2507.02778 | null |
| 2025-07-03 | KERAP: A Knowledge-Enhanced Reasoning Approach for Accurate Zero-shot Diagnosis Prediction Using Multi-agent LLMs | Yuzhang Xie et.al. | 2507.02773 | null |
| 2025-07-03 | Grounding Intelligence in Movement | Melanie Segado et.al. | 2507.02771 | null |
| 2025-07-03 | DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment | Ke-Han Lu et.al. | 2507.02768 | null |
| 2025-07-03 | Knowledge Protocol Engineering: A New Paradigm for AI in Domain-Specific Knowledge Work | Guangwei Zhang et.al. | 2507.02760 | null |
| 2025-07-03 | Fast and Simplex: 2-Simplicial Attention in Triton | Aurko Roy et.al. | 2507.02754 | null |
| 2025-07-03 | Who's Sorry Now: User Preferences Among Rote, Empathic, and Explanatory Apologies from LLM Chatbots | Zahra Ashktorab et.al. | 2507.02745 | null |
| 2025-07-03 | Prompt learning with bounding box constraints for medical image segmentation | Mélanie Gaillochet et.al. | 2507.02743 | null |
| 2025-07-03 | Early Signs of Steganographic Capabilities in Frontier LLMs | Artur Zolkowski et.al. | 2507.02737 | null |
| 2025-07-03 | Bourbaki: Self-Generated and Goal-Conditioned MDPs for Theorem Proving | Matthieu Zimmer et.al. | 2507.02726 | null |
| 2025-07-03 | On the Convergence of Large Language Model Optimizer for Black-Box Network Management | Hoon Lee et.al. | 2507.02689 | null |
| 2025-07-03 | Embedding-Based Federated Data Sharing via Differentially Private Conditional VAEs | Francesco Di Salvo et.al. | 2507.02671 | null |
| 2025-07-03 | AIGI-Holmes: Towards Explainable and Generalizable AI-Generated Image Detection via Multimodal Large Language Models | Ziyin Zhou et.al. | 2507.02664 | null |
| 2025-07-03 | Hey AI, Generate Me a Hardware Code! Agentic AI-based Hardware Design & Verification | Deepak Narayan Gadde et.al. | 2507.02660 | null |
| 2025-07-03 | Medical Data Pecking: A Context-Aware Approach for Automated Quality Evaluation of Structured Medical Data | Irena Girshovitz et.al. | 2507.02628 | null |
| 2025-07-03 | VRAgent-R1: Boosting Video Recommendation with MLLM-based Agents via Reinforcement Learning | Siran Chen et.al. | 2507.02626 | null |
| 2025-07-03 | FlowSpec: Continuous Pipelined Speculative Decoding for Efficient Distributed LLM Inference | Xing Liu et.al. | 2507.02620 | null |
| 2025-07-03 | Strategic Intelligence in Large Language Models: Evidence from evolutionary Game Theory | Kenneth Payne et.al. | 2507.02618 | null |
| 2025-07-03 | DynamiCare: A Dynamic Multi-Agent Framework for Interactive and Open-Ended Medical Decision-Making | Tianqi Shang et.al. | 2507.02616 | null |
| 2025-07-03 | De-AntiFake: Rethinking the Protective Perturbations Against Voice Cloning Attacks | Wei Fan et.al. | 2507.02606 | null |
| 2025-07-03 | MPF: Aligning and Debiasing Language Models post Deployment via Multi Perspective Fusion | Xin Guan et.al. | 2507.02595 | null |
| 2025-07-03 | Revisiting Active Learning under (Human) Label Variation | Cornelia Gruber et.al. | 2507.02593 | null |
| 2025-07-03 | Reconstructing Close Human Interaction with Appearance and Proxemics Reasoning | Buzhen Huang et.al. | 2507.02565 | null |
| 2025-07-03 | LLMREI: Automating Requirements Elicitation Interviews with LLMs | Alexander Korn et.al. | 2507.02564 | null |
| 2025-07-03 | Transformers Don't Need LayerNorm at Inference Time: Scaling LayerNorm Removal to GPT-2 XL and the Implications for Mechanistic Interpretability | Luca Baroni et.al. | 2507.02559 | null |
| 2025-07-03 | Clarifying Before Reasoning: A Coq Prover with Structural Context | Yanzhen Lu et.al. | 2507.02541 | null |
| 2025-07-03 | Are You Listening to Me? Fine-Tuning Chatbots for Empathetic Dialogue | Paulo Ricardo Knob et.al. | 2507.02537 | null |
| 2025-07-03 | Meta-Fair: AI-Assisted Fairness Testing of Large Language Models | Miguel Romero-Arjona et.al. | 2507.02533 | null |
| 2025-07-03 | Open-Source System for Multilingual Translation and Cloned Speech Synthesis | Mateo Cámara et.al. | 2507.02530 | null |
| 2025-07-03 | RetrySQL: text-to-SQL training with retry data for self-correcting query generation | Alicja Rączkowska et.al. | 2507.02529 | null |
| 2025-07-03 | Continual Gradient Low-Rank Projection Fine-Tuning for LLMs | Chenxu Wang et.al. | 2507.02503 | null |
| 2025-07-03 | CrowdTrack: A Benchmark for Difficult Multiple Pedestrian Tracking in Real Scenarios | Teng Fu et.al. | 2507.02479 | null |
| 2025-07-03 | System-performance and cost modeling of Large Language Model training and inference | Wenzhe Guo et.al. | 2507.02456 | null |
| 2025-07-03 | Introducing a New Brexit-Related Uncertainty Index: Its Evolution and Economic Consequences | Ismet Gocer et.al. | 2507.02439 | null |
| 2025-07-03 | Toward a Robust and Generalizable Metamaterial Foundation Model | Namjung Kim et.al. | 2507.02436 | null |
| 2025-07-03 | Improving Consistency in Vehicle Trajectory Prediction Through Preference Optimization | Caio Azevedo et.al. | 2507.02406 | null |
| 2025-07-03 | Evaluating Language Models For Threat Detection in IoT Security Logs | Jorge J. Tejero-Fernández et.al. | 2507.02390 | null |
| 2025-07-03 | JoyTTS: LLM-based Spoken Chatbot With Voice Cloning | Fangru Zhou et.al. | 2507.02380 | null |
| 2025-07-03 | Efficient Code LLM Training via Distribution-Consistent and Diversity-Aware Data Selection | Weijie Lyu et.al. | 2507.02378 | null |
| 2025-07-03 | UVLM: Benchmarking Video Language Model for Underwater World Understanding | Xizhe Xue et.al. | 2507.02373 | null |
| 2025-07-03 | Holistic Tokenizer for Autoregressive Image Generation | Anlin Zheng et.al. | 2507.02358 | null |
| 2025-07-03 | Coling-UniA at SciVQA 2025: Few-Shot Example Retrieval and Confidence-Informed Ensembling for Multimodal Large Language Models | Christian Jaumann et.al. | 2507.02357 | null |
| 2025-07-03 | DoMIX: An Efficient Framework for Exploiting Domain Knowledge in Fine-Tuning | Dohoon Kim et.al. | 2507.02302 | null |
| 2025-07-03 | Prompt Disentanglement via Language Guidance and Representation Alignment for Domain Generalization | De Cheng et.al. | 2507.02288 | null |
| 2025-07-03 | Misaligned from Within: Large Language Models Reproduce Our Double-Loop Learning Blindness | Tim Rogers et.al. | 2507.02283 | null |
| 2025-07-03 | Content filtering methods for music recommendation: A review | Terence Zeng et.al. | 2507.02282 | null |
| 2025-07-03 | LaCo: Efficient Layer-wise Compression of Visual Tokens for Multimodal Large Language Models | Juntao Liu et.al. | 2507.02279 | null |
| 2025-07-03 | NLP4Neuro: Sequence-to-sequence learning for neural population decoding | Jacob J. Morra et.al. | 2507.02264 | null |
| 2025-07-03 | Uncertainty-aware Reward Design Process | Yang Yang et.al. | 2507.02256 | null |
| 2025-07-03 | Listwise Preference Alignment Optimization for Tail Item Recommendation | Zihao Li et.al. | 2507.02255 | null |
| 2025-07-03 | Scaling LLM Planning: NL2FLOW for Parametric Problem Generation and Rigorous Evaluation | Jungkoo Kang et.al. | 2507.02253 | null |
| 2025-07-03 | SurgVisAgent: Multimodal Agentic Model for Versatile Surgical Visual Enhancement | Zeyu Lei et.al. | 2507.02252 | null |
| 2025-07-03 | VERBA: Verbalizing Model Differences Using Large Language Models | Shravan Doda et.al. | 2507.02241 | null |
| 2025-07-03 | DecoRTL: A Run-time Decoding Framework for RTL Code Generation with LLMs | Mohammad Akyash et.al. | 2507.02226 | null |
| 2025-07-03 | GDC Cohort Copilot: An AI Copilot for Curating Cohorts from the Genomic Data Commons | Steven Song et.al. | 2507.02221 | null |
| 2025-07-02 | ESTR-CoT: Towards Explainable and Accurate Event Stream based Scene Text Recognition with Chain-of-Thought Reasoning | Xiao Wang et.al. | 2507.02200 | null |
| 2025-07-02 | EvalAssist: A Human-Centered Tool for LLM-as-a-Judge | Zahra Ashktorab et.al. | 2507.02186 | null |
| 2025-07-02 | Computer Science Education in the Age of Generative AI | Russell Beale et.al. | 2507.02183 | null |
| 2025-07-02 | Enhancing COBOL Code Explanations: A Multi-Agents Approach Using Large Language Models | Fangjian Lei et.al. | 2507.02182 | null |
| 2025-07-02 | The Revolution Has Arrived: What the Current State of Large Language Models in Education Implies for the Future | Russell Beale et.al. | 2507.02180 | null |
| 2025-07-02 | Data Diversification Methods In Alignment Enhance Math Performance In LLMs | Berkan Dokmeci et.al. | 2507.02173 | null |
| 2025-07-02 | Reasoning or Not? A Comprehensive Evaluation of Reasoning LLMs for Dialogue Summarization | Keyan Jin et.al. | 2507.02145 | null |
| 2025-07-02 | When LLMs Disagree: Diagnosing Relevance Filtering Bias and Retrieval Divergence in SDG Search | William A. Ingram et.al. | 2507.02139 | null |
| 2025-07-02 | Dissecting the Impact of Mobile DVFS Governors on LLM Inference Performance and Energy Efficiency | Zongpu Zhang et.al. | 2507.02135 | null |
| 2025-07-02 | BACTA-GPT: An AI-Based Bayesian Adaptive Clinical Trial Architect | Krishna Padmanabhan et.al. | 2507.02130 | null |
| 2025-07-02 | Generative Latent Diffusion for Efficient Spatiotemporal Data Reduction | Xiao Li et.al. | 2507.02129 | null |
| 2025-07-02 | CROP: Circuit Retrieval and Optimization with Parameter Guidance using LLMs | Jingyu Pan et.al. | 2507.02128 | null |
| 2025-07-02 | SAKURAONE: Empowering Transparent and Open AI Platforms through Private-Sector HPC Investment in Japan | Fumikazu Konishi et.al. | 2507.02124 | null |
| 2025-07-02 | PAL: Designing Conversational Agents as Scalable, Cooperative Patient Simulators for Palliative-Care Training | Neil K. R. Sehgal et.al. | 2507.02122 | null |
| 2025-07-02 | What Neuroscience Can Teach AI About Learning in Continuously Changing Environments | Daniel Durstewitz et.al. | 2507.02103 | null |
| 2025-07-02 | The Future is Agentic: Definitions, Perspectives, and Open Challenges of Multi-Agent Recommender Systems | Reza Yousefi Maragheh et.al. | 2507.02097 | null |
| 2025-07-02 | Sample Complexity Bounds for Linear Constrained MDPs with a Generative Model | Xingtu Liu et.al. | 2507.02089 | null |
| 2025-07-02 | McBE: A Multi-task Chinese Bias Evaluation Benchmark for Large Language Models | Tian Lan et.al. | 2507.02088 | null |
| 2025-07-02 | Evaluating the Promise and Pitfalls of LLMs in Hiring Decisions | Eitan Anzenberg et.al. | 2507.02087 | null |
| 2025-07-02 | Measuring Scientific Capabilities of Language Models with a Systems Biology Dry Lab | Haonan Duan et.al. | 2507.02083 | null |
| 2025-07-02 | Reasoning on a Budget: A Survey of Adaptive and Controllable Test-Time Compute in LLMs | Mohammad Ali Alomrani et.al. | 2507.02076 | null |
| 2025-07-02 | Large Language Models for Crash Detection in Video: A Survey of Methods, Datasets, and Challenges | Sanjeda Akter et.al. | 2507.02074 | null |
| 2025-07-02 | MGC: A Compiler Framework Exploiting Compositional Blindness in Aligned LLMs for Malware Generation | Lu Yan et.al. | 2507.02057 | null |
| 2025-07-02 | How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks | Rahul Ramachandran et.al. | 2507.01955 | null |
| 2025-07-02 | Test-Time Scaling with Reflective Generative Model | Zixiao Wang et.al. | 2507.01951 | null |
| 2025-07-02 | Kwai Keye-VL Technical Report | Kwai Keye Team et.al. | 2507.01949 | null |
| 2025-07-02 | LongAnimation: Long Animation Generation with Dynamic Global-Local Memory | Nan Chen et.al. | 2507.01945 | null |
| 2025-07-02 | SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars | Xiaosheng Zhao et.al. | 2507.01939 | null |
| 2025-07-02 | The Thin Line Between Comprehension and Persuasion in LLMs | Adrian de Wynter et.al. | 2507.01936 | null |
| 2025-07-02 | Large Language Model-Driven Closed-Loop UAV Operation with Semantic Observations | Wenhao Wang et.al. | 2507.01930 | null |
| 2025-07-02 | A Survey on Vision-Language-Action Models: An Action Tokenization Perspective | Yifan Zhong et.al. | 2507.01925 | null |
| 2025-07-02 | Decision-oriented Text Evaluation | Yu-Shiang Huang et.al. | 2507.01923 | null |
| 2025-07-02 | Gradient-Adaptive Policy Optimization: Towards Multi-Objective Alignment of Large Language Models | Chengao Li et.al. | 2507.01915 | null |
| 2025-07-02 | Reasoning to Edit: Hypothetical Instruction-Based Image Editing with Visual Reasoning | Qingdong He et.al. | 2507.01908 | null |
| 2025-07-02 | AI4Research: A Survey of Artificial Intelligence for Scientific Research | Qiguang Chen et.al. | 2507.01903 | null |
| 2025-07-02 | High-Layer Attention Pruning with Rescaling | Songtao Liu et.al. | 2507.01900 | null |
| 2025-07-02 | MiCoTA: Bridging the Learnability Gap with Intermediate CoT and Teacher Assistants | Dongyi Ding et.al. | 2507.01887 | null |
| 2025-07-02 | Improving GANs by leveraging the quantum noise from real hardware | Hongni Jin et.al. | 2507.01886 | null |
| 2025-07-02 | A computationally frugal open-source foundation model for thoracic disease detection in lung cancer screening programs | Niccolò McConnell et.al. | 2507.01881 | null |
| 2025-07-02 | Towards Foundation Auto-Encoders for Time-Series Anomaly Detection | Gastón García González et.al. | 2507.01875 | null |
| 2025-07-02 | DIY-MKG: An LLM-Based Polyglot Language Learning System | Kenan Tang et.al. | 2507.01872 | null |
| 2025-07-02 | Bridging UI Design and chatbot Interactions: Applying Form-Based Principles to Conversational Agents | Sanjay Krishna Anbalagan et.al. | 2507.01862 | null |
| 2025-07-02 | TypeTele: Releasing Dexterity in Teleoperation by Dexterous Manipulation Types | Yuhao Lin et.al. | 2507.01857 | null |
| 2025-07-02 | Eka-Eval : A Comprehensive Evaluation Framework for Large Language Models in Indian Languages | Samridhi Raj Sinha et.al. | 2507.01853 | null |
| 2025-07-02 | Low-Perplexity LLM-Generated Sequences and Where To Find Them | Arthur Wuhrmann et.al. | 2507.01844 | null |
| 2025-07-02 | Out-of-Distribution Detection Methods Answer the Wrong Questions | Yucen Lily Li et.al. | 2507.01831 | null |
| 2025-07-02 | APRMCTS: Improving LLM-based Automated Program Repair with Iterative Tree Search | Haichuan Hu et.al. | 2507.01827 | null |
| 2025-07-02 | LoRA Fine-Tuning Without GPUs: A CPU-Efficient Meta-Generation Framework for LLMs | Reza Arabpour et.al. | 2507.01806 | null |
| 2025-07-02 | Towards Decentralized and Sustainable Foundation Model Training with the Edge | Leyang Xue et.al. | 2507.01803 | null |
| 2025-07-02 | HCNQA: Enhancing 3D VQA with Hierarchical Concentration Narrowing Supervision | Shengli Zhou et.al. | 2507.01800 | null |
| 2025-07-02 | Robust brain age estimation from structural MRI with contrastive learning | Carlo Alberto Barbano et.al. | 2507.01794 | null |
| 2025-07-02 | Machine learning prediction of a chemical reaction over 8 decades of energy | Daniel Julian et.al. | 2507.01793 | null |
| 2025-07-02 | FreeLoRA: Enabling Training-Free LoRA Fusion for Autoregressive Multi-Subject Personalization | Peng Zheng et.al. | 2507.01792 | null |
| 2025-07-02 | MuRating: A High Quality Data Selecting Approach to Multilingual Large Language Model Pretraining | Zhixun Chen et.al. | 2507.01785 | null |
| 2025-07-02 | Frontiers of Generative AI for Network Optimization: Theories, Limits, and Visions | Bo Yang et.al. | 2507.01773 | null |
| 2025-07-02 | Enhanced Generative Model Evaluation with Clipped Density and Coverage | Nicolas Salvy et.al. | 2507.01761 | null |
| 2025-07-02 | Rethinking Discrete Tokens: Treating Them as Conditions for Continuous Autoregressive Image Synthesis | Peng Zheng et.al. | 2507.01756 | null |
| 2025-07-02 | Tuning without Peeking: Provable Privacy and Generalization Bounds for LLM Post-Training | Ismail Labiad et.al. | 2507.01752 | null |
| 2025-07-02 | LLMs for Legal Subsumption in German Employment Contracts | Oliver Wardas et.al. | 2507.01734 | null |
| 2025-07-02 | Token Communication in the Era of Large Models: An Information Bottleneck-Based Approach | Hao Wei et.al. | 2507.01728 | null |
| 2025-07-02 | Generative flow-based warm start of the variational quantum eigensolver | Hang Zou et.al. | 2507.01726 | null |
| 2025-07-02 | Agent Ideate: A Framework for Product Idea Generation from Patents Using Agentic AI | Gopichand Kanumolu et.al. | 2507.01717 | null |
| 2025-07-02 | Generative modeling of convergence maps based on predicted one-point statistics | Vilasini Tinnaneri Sreekanth et.al. | 2507.01707 | null |
| 2025-07-02 | AdamMeme: Adaptively Probe the Reasoning Capacity of Multimodal Large Language Models on Harmfulness | Zixin Chen et.al. | 2507.01702 | null |
| 2025-07-02 | Graph Representation-based Model Poisoning on Federated LLMs in CyberEdge Networks | Hanlin Cai et.al. | 2507.01694 | null |
| 2025-07-02 | GPT, But Backwards: Exactly Inverting Language Model Outputs | Adrians Skapars et.al. | 2507.01693 | null |
| 2025-07-02 | A generative modeling / Physics-Informed Neural Network approach to random differential equations | Georgios Arampatzis et.al. | 2507.01687 | null |
| 2025-07-02 | Blending Supervised and Reinforcement Fine-Tuning with Prefix Sampling | Zeyu Huang et.al. | 2507.01679 | null |
| 2025-07-02 | AsyncFlow: An Asynchronous Streaming RL Framework for Efficient LLM Post-Training | Zhenyu Han et.al. | 2507.01663 | null |
| 2025-07-02 | SAILViT: Towards Robust and Generalizable Visual Backbones for MLLMs via Gradual Feature Refinement | Weijie Yin et.al. | 2507.01643 | null |
| 2025-07-02 | DaiFu: In-Situ Crash Recovery for Deep Learning Systems | Zilong He et.al. | 2507.01628 | null |
| 2025-07-02 | Chart Question Answering from Real-World Analytical Narratives | Maeve Hutchinson et.al. | 2507.01627 | null |
| 2025-07-02 | Data Agent: A Holistic Architecture for Orchestrating Data+AI Ecosystems | Zhaoyan Sun et.al. | 2507.01599 | null |
| 2025-07-02 | Emotionally Intelligent Task-oriented Dialogue Systems: Architecture, Representation, and Optimisation | Shutong Feng et.al. | 2507.01594 | null |
| 2025-07-02 | A Gift from the Integration of Discriminative and Diffusion-based Generative Learning: Boundary Refinement Remote Sensing Semantic Segmentation | Hao Wang et.al. | 2507.01573 | null |
| 2025-07-02 | Self-Guided Process Reward Optimization with Masked Step Advantage for Process Reinforcement Learning | Wu Fei et.al. | 2507.01551 | null |
| 2025-07-02 | Crafting Hanzi as Narrative Bridges: An AI Co-Creation Workshop for Elderly Migrants | Wen Zhan et.al. | 2507.01548 | null |
| 2025-07-02 | MARVIS: Modality Adaptive Reasoning over VISualizations | Benjamin Feuer et.al. | 2507.01544 | null |
| 2025-07-02 | Is External Information Useful for Stance Detection with LLMs? | Quang Minh Nguyen et.al. | 2507.01543 | null |
| 2025-07-02 | Efficient Out-of-Scope Detection in Dialogue Systems via Uncertainty-Driven LLM Routing | Álvaro Zaera et.al. | 2507.01541 | null |
| 2025-07-02 | Loss Functions in Diffusion Models: A Comparative Study | Dibyanshu Kumar et.al. | 2507.01516 | null |
| 2025-07-02 | SafePTR: Token-Level Jailbreak Defense in Multimodal LLMs via Prune-then-Restore Mechanism | Beitao Chen et.al. | 2507.01513 | null |
| 2025-07-02 | AVC-DPO: Aligned Video Captioning via Direct Preference Optimization | Jiyang Tang et.al. | 2507.01492 | null |
| 2025-07-02 | Agent-as-Tool: A Study on the Hierarchical Decision Making with Reinforcement Learning | Yanfei Zhang et.al. | 2507.01489 | null |
| 2025-07-02 | BioMARS: A Multi-Agent Robotic System for Autonomous Biological Experiments | Yibo Qiu et.al. | 2507.01485 | null |
| 2025-07-02 | Evaluating the Effectiveness of Direct Preference Optimization for Personalizing German Automatic Text Simplifications for Persons with Intellectual Disabilities | Yingqiang Gao et.al. | 2507.01479 | null |
| 2025-07-02 | Representation Entanglement for Generation: Training Diffusion Transformers Is Much Easier Than You Think | Ge Wu et.al. | 2507.01467 | null |
| 2025-07-02 | NOCTIS: Novel Object Cyclic Threshold based Instance Segmentation | Max Gandyra et.al. | 2507.01463 | null |
| 2025-07-02 | Using multi-agent architecture to mitigate the risk of LLM hallucinations | Abd Elrahman Amer et.al. | 2507.01446 | null |
| 2025-07-02 | A Large Language Model for Chemistry and Retrosynthesis Predictions | Yueqing Zhang et.al. | 2507.01444 | null |
| 2025-07-02 | EdgeLoRA: An Efficient Multi-Tenant LLM Serving System on Edge Devices | Zheyu Shen et.al. | 2507.01438 | null |
| 2025-07-02 | Challenges & Opportunities with LLM-Assisted Visualization Retargeting | Luke S. Snyder et.al. | 2507.01436 | null |
| 2025-07-02 | Pensieve Grader: An AI-Powered, Ready-to-Use Platform for Effortless Handwritten STEM Grading | Yoonseok Yang et.al. | 2507.01431 | null |
| 2025-07-02 | TriVLA: A Unified Triple-System-Based Unified Vision-Language-Action Model for General Robot Control | Zhenyang Liu et.al. | 2507.01424 | null |
| 2025-07-02 | Evaluating LLM Agent Collusion in Double Auctions | Kushal Agrawal et.al. | 2507.01413 | null |
| 2025-07-02 | BronchoGAN: Anatomically consistent and domain-agnostic image-to-image translation for video bronchoscopy | Ahmad Soliman et.al. | 2507.01387 | null |
| 2025-07-02 | RALLY: Role-Adaptive LLM-Driven Yoked Navigation for Agentic UAV Swarms | Ziyao Wang et.al. | 2507.01378 | null |
| 2025-07-02 | AI Agents and Agentic AI-Navigating a Plethora of Concepts for Future Manufacturing | Yinwang Ren et.al. | 2507.01376 | null |
| 2025-07-02 | Activation Reward Models for Few-Shot Model Alignment | Tianning Chai et.al. | 2507.01368 | null |
| 2025-07-02 | Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy | Chris Yuhao Liu et.al. | 2507.01352 | null |
| 2025-07-02 | SpeechAccentLLM: A Unified Framework for Foreign Accent Conversion and Text to Speech | Cheng Zhuangfei et.al. | 2507.01348 | null |
| 2025-07-02 | LEDOM: An Open and Fundamental Reverse Language Model | Xunjian Yin et.al. | 2507.01335 | null |
| 2025-07-02 | Symbolic or Numerical? Understanding Physics Problem Solving in Reasoning LLMs | Nifu Dan et.al. | 2507.01334 | null |
| 2025-07-02 | Reasoner for Real-World Event Detection: Scaling Reinforcement Learning via Adaptive Perplexity-Aware Sampling Strategy | Xiaoyun Zhang et.al. | 2507.01327 | null |
| 2025-07-02 | ICLShield: Exploring and Mitigating In-Context Learning Backdoor Attacks | Zhiyao Ren et.al. | 2507.01321 | null |
| 2025-07-02 | La RoSA: Enhancing LLM Efficiency via Layerwise Rotated Sparse Activation | Kai Liu et.al. | 2507.01299 | null |
| 2025-07-02 | Beyond Black-Box AI: Interpretable Hybrid Systems for Dementia Care | Matthew JY Kang et.al. | 2507.01282 | null |
| 2025-07-02 | Rethinking All Evidence: Enhancing Trustworthy Retrieval-Augmented Generation via Conflict-Driven Summarization | Juan Chen et.al. | 2507.01281 | null |
| 2025-07-02 | Evaluating Large Language Models for Multimodal Simulated Ophthalmic Decision-Making in Diabetic Retinopathy and Glaucoma Screening | Cindy Lie Tabuse et.al. | 2507.01278 | null |
| 2025-07-02 | AI Meets Maritime Training: Precision Analytics for Enhanced Safety and Performance | Vishakha Lall et.al. | 2507.01274 | null |
| 2025-07-02 | PULSE: Practical Evaluation Scenarios for Large Multimodal Model Unlearning | Tatsuki Kawakami et.al. | 2507.01271 | null |
| 2025-07-02 | LLM-based Realistic Safety-Critical Driving Video Generation | Yongjie Fu et.al. | 2507.01264 | null |
| 2025-07-02 | GAIus: Combining Genai with Legal Clauses Retrieval for Knowledge-based Assistant | Michał Matak et.al. | 2507.01259 | null |
| 2025-07-01 | Beyond First-Order: Training LLMs with Stochastic Conjugate Subgradients and AdamW | Di Zhang et.al. | 2507.01241 | null |
| 2025-07-01 | PAE MobiLLM: Privacy-Aware and Efficient LLM Fine-Tuning on the Mobile Device via Additive Side-Tuning | Xingke Yang et.al. | 2507.01216 | null |
| 2025-07-01 | 2024 NASA SUITS Report: LLM-Driven Immersive Augmented Reality User Interface for Robotics and Space Exploration | Kathy Zhuang et.al. | 2507.01206 | null |
| 2025-07-01 | Escaping Plato's Cave: JAM for Aligning Independently Trained Vision and Language Models | Hyoseo et.al. | 2507.01201 | null |
| 2025-07-01 | Are Large Brainwave Foundation Models Capable Yet? Insights from Fine-tuning | Na Lee et.al. | 2507.01196 | null |
| 2025-07-01 | FlashDP: Private Training Large Language Models with Efficient DP-SGD | Liangyu Wang et.al. | 2507.01154 | null |
| 2025-07-01 | SonoGym: High Performance Simulation for Challenging Surgical Tasks with Robotic Ultrasound | Yunke Ao et.al. | 2507.01152 | null |
| 2025-07-01 | Geometry-aware 4D Video Generation for Robot Manipulation | Zeyi Liu et.al. | 2507.01099 | null |
| 2025-07-01 | A theoretical prediction for the dipole in nearby distances using cosmography | Hayley J. Macpherson et.al. | 2507.01095 | null |
| 2025-07-02 | GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning | GLM-V Team et.al. | 2507.01006 | null |
| 2025-07-01 | Teaching Time Series to See and Speak: Forecasting with Aligned Visual and Textual Perspectives | Sixun Dong et.al. | 2506.24124 | null |
| 2025-06-30 | Calligrapher: Freestyle Text Image Customization | Yue Ma et.al. | 2506.24123 | null |
| 2025-06-30 | TextMesh4D: High-Quality Text-to-4D Mesh Generation | Sisi Dai et.al. | 2506.24121 | null |
| 2025-06-30 | Data Uniformity Improves Training Efficiency and More, with a Convergence Framework Beyond the NTK Regime | Yuqing Wang et.al. | 2506.24120 | null |
| 2025-06-30 | DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World | Xiangtai Li et.al. | 2506.24102 | null |
| 2025-06-30 | Imagine for Me: Creative Conceptual Blending of Real Images and Text via Blended Attention | Wonwoong Cho et.al. | 2506.24085 | null |
| 2025-06-30 | Logit-Gap Steering: Efficient Short-Suffix Jailbreaks for Aligned Large Language Models | Tung-Ling Li et.al. | 2506.24056 | null |
| 2025-06-30 | Agent.xpu: Efficient Scheduling of Agentic LLM Workloads on Heterogeneous SoC | Xinming Wei et.al. | 2506.24045 | null |
| 2025-06-30 | A Survey on Vision-Language-Action Models for Autonomous Driving | Sicong Jiang et.al. | 2506.24044 | null |
| 2025-06-30 | Foundation Models for Zero-Shot Segmentation of Scientific Images without AI-Ready Data | Shubhabrata Mukherjee et.al. | 2506.24039 | null |
| 2025-06-30 | Minimally dissipative multi-bit logical operations | Jérémie Klinger et.al. | 2506.24021 | null |
| 2025-06-30 | Ella: Embodied Social Agents with Lifelong Memory | Hongxin Zhang et.al. | 2506.24019 | null |
| 2025-06-30 | EXPERT: An Explainable Image Captioning Evaluation Metric with Structured Explanations | Hyunjong Kim et.al. | 2506.24016 | null |
| 2025-06-30 | Large Language Models Don't Make Sense of Word Problems. A Scoping Review from a Mathematics Education Perspective | Anselm R. Strohmaier et.al. | 2506.24006 | null |
| 2025-06-30 | Auto-TA: Towards Scalable Automated Thematic Analysis (TA) via Multi-Agent Large Language Models with Reinforcement Learning | Seungjun Yi et.al. | 2506.23998 | null |
| 2025-06-30 | TaP: A Taxonomy-Guided Framework for Automated and Scalable Preference Data Generation | Renren Jin et.al. | 2506.23979 | null |
| 2025-06-30 | Visual and Memory Dual Adapter for Multi-Modal Object Tracking | Boyue Xu et.al. | 2506.23972 | null |
| 2025-06-30 | UMA: A Family of Universal Models for Atoms | Brandon M. Wood et.al. | 2506.23971 | null |
| 2025-06-30 | Unveiling Decision-Making in LLMs for Text Classification: Extraction of influential and interpretable concepts with Sparse Autoencoders | Mathis Le Bail et.al. | 2506.23951 | null |
| 2025-06-30 | AI Risk-Management Standards Profile for General-Purpose AI (GPAI) and Foundation Models | Anthony M. Barrett et.al. | 2506.23949 | null |
| 2025-07-01 | Graft: Integrating the Domain Knowledge via Efficient Parameter Synergy for MLLMs | Yang Dai et.al. | 2506.23940 | null |
| 2025-06-30 | Leveraging the Potential of Prompt Engineering for Hate Speech Detection in Low-Resource Languages | Ruhina Tabasshum Prome et.al. | 2506.23930 | null |
| 2025-06-30 | IMPACT: Inflectional Morphology Probes Across Complex Typologies | Mohammed J. Saeed et.al. | 2506.23929 | null |
| 2025-06-30 | Performance of LLMs on Stochastic Modeling Operations Research Problems: From Theory to Practice | Akshit Kumar et.al. | 2506.23924 | null |
| 2025-06-30 | The Trilemma of Truth in Large Language Models | Germans Savcisens et.al. | 2506.23921 | null |
| 2025-06-30 | World4Omni: A Zero-Shot Framework from Image Generation World Model to Robotic Manipulation | Haonan Chen et.al. | 2506.23919 | null |
| 2025-06-30 | Advancing Multi-Step Mathematical Reasoning in Large Language Models through Multi-Layered Self-Reflection with Auto-Prompting | André de Souza Loureiro et.al. | 2506.23888 | null |
| 2025-06-30 | Scaling Self-Supervised Representation Learning for Symbolic Piano Performance | Louis Bradshaw et.al. | 2506.23869 | null |
| 2025-06-30 | Large Language Models for Statistical Inference: Context Augmentation with Applications to the Two-Sample Problem and Regression | Marc Ratkovic et.al. | 2506.23862 | null |
| 2025-06-30 | Email as the Interface to Generative AI Models: Seamless Administrative Automation | Andres Navarro et.al. | 2506.23850 | null |
| 2025-06-30 | A Survey on Autonomy-Induced Security Risks in Large Model-Based Agents | Hang Su et.al. | 2506.23844 | null |
| 2025-06-30 | Refine Any Object in Any Scene | Ziwei Chen et.al. | 2506.23835 | null |
| 2025-06-30 | Towards the "Digital Me": A vision of authentic Conversational Agents powered by personal Human Digital Twins | Lluís C. Coll et.al. | 2506.23826 | null |
| 2025-06-30 | Flash-VStream: Efficient Real-Time Understanding for Long Video Streams | Haoji Zhang et.al. | 2506.23825 | null |
| 2025-07-01 | The Impact of AI on Educational Assessment: A Framework for Constructive Alignment | Patrick Stokkink et.al. | 2506.23815 | null |
| 2025-06-30 | Leveraging a Multi-Agent LLM-Based System to Educate Teachers in Hate Incidents Management | Ewelina Gajewska et.al. | 2506.23774 | null |
| 2025-06-30 | Software Engineering for Large Language Models: Research Status, Challenges and the Road Ahead | Hongzhou Rao et.al. | 2506.23762 | null |
| 2025-06-30 | A Survey of LLM-based Automated Program Repair: Taxonomies, Design Paradigms, and Applications | Boyang Yang et.al. | 2506.23749 | null |
| 2025-07-01 | Positional Bias in Binary Question Answering: How Uncertainty Shapes Model Preferences | Tiziano Labruna et.al. | 2506.23743 | null |
| 2025-06-30 | AutoEvoEval: An Automated Framework for Evolving Close-Ended LLM Evaluation Data | JiaRu Wu et.al. | 2506.23735 | null |
| 2025-06-30 | Radioactive Watermarks in Diffusion and Autoregressive Image Generative Models | Michel Meintz et.al. | 2506.23731 | null |
| 2025-06-30 | System-Embedded Diffusion Bridge Models | Bartlomiej Sobieski et.al. | 2506.23726 | null |
| 2025-06-30 | PAC Bench: Do Foundation Models Understand Prerequisites for Executing Manipulation Policies? | Atharva Gundawar et.al. | 2506.23725 | null |
| 2025-06-30 | MDPG: Multi-domain Diffusion Prior Guidance for MRI Reconstruction | Lingtong Zhang et.al. | 2506.23701 | null |
| 2025-06-30 | MedSAM-CA: A CNN-Augmented ViT with Attention-Enhanced Multi-Scale Fusion for Medical Image Segmentation | Peiting Tian et.al. | 2506.23700 | null |
| 2025-06-30 | If You Had to Pitch Your Ideal Software -- Evaluating Large Language Models to Support User Scenario Writing for User Experience Experts and Laypersons | Patrick Stadler et.al. | 2506.23694 | null |
| 2025-06-30 | Agent4S: The Transformation of Research Paradigms from the Perspective of Large Language Models | Boyuan Zheng et.al. | 2506.23692 | null |
| 2025-06-30 | SynMotion: Semantic-Visual Adaptation for Motion Customized Video Generation | Shuai Tan et.al. | 2506.23690 | null |
| 2025-06-30 | PokéAI: A Goal-Generating, Battle-Optimizing Multi-agent System for Pokemon Red | Zihao Liu et.al. | 2506.23689 | null |
| 2025-06-30 | Interactive Reasoning: Visualizing and Controlling Chain-of-Thought Reasoning in Large Language Models | Rock Yuren Pang et.al. | 2506.23678 | null |
| 2025-06-30 | Efficient Interleaved Speech Modeling through Knowledge Distillation | Mohammadmahdi Nouriborji et.al. | 2506.23670 | null |
| 2025-06-30 | L0: Reinforcement Learning to Become General Agents | Junjie Zhang et.al. | 2506.23667 | null |
| 2025-06-30 | On the Domain Robustness of Contrastive Vision-Language Models | Mario Koddenbrock et.al. | 2506.23663 | null |
| 2025-06-30 | Multiscale Turbulence Synthesis: Validation in 2D Hydrodynamics | Pierre Lesaffre et.al. | 2506.23659 | null |
| 2025-06-30 | Act-With-Think: Chunk Auto-Regressive Modeling for Generative Recommendation | Yifan Wang et.al. | 2506.23643 | null |
| 2025-06-30 | VAP-Diffusion: Enriching Descriptions with MLLMs for Enhanced Medical Image Generation | Peng Huang et.al. | 2506.23641 | null |
| 2025-06-30 | Unified Multimodal Understanding via Byte-Pair Visual Encoding | Wanpeng Zhang et.al. | 2506.23639 | null |
| 2025-06-30 | Towards Building Private LLMs: Exploring Multi-Node Expert Parallelism on Apple Silicon for Mixture-of-Experts Large Language Model | Mu-Chi Chen et.al. | 2506.23635 | null |
| 2025-06-30 | TurboVSR: Fantastic Video Upscalers and Where to Find Them | Zhongdao Wang et.al. | 2506.23618 | null |
| 2025-06-30 | Evaluating the Simulation of Human Personality-Driven Susceptibility to Misinformation with LLMs | Manuel Pratelli et.al. | 2506.23610 | null |
| 2025-06-30 | PGOV3D: Open-Vocabulary 3D Semantic Segmentation with Partial-to-Global Curriculum | Shiqi Zhang et.al. | 2506.23607 | null |
| 2025-06-30 | SG-LDM: Semantic-Guided LiDAR Generation via Latent-Aligned Diffusion | Zhengkang Xiang et.al. | 2506.23606 | null |
| 2025-06-30 | AI-Generated Lecture Slides for Improving Slide Element Detection and Retrieval | Suyash Maniyar et.al. | 2506.23605 | null |
| 2025-06-30 | SoK: Semantic Privacy in Large Language Models | Baihe Ma et.al. | 2506.23603 | null |
| 2025-06-30 | Semantic-guided Diverse Decoding for Large Language Model | Weijie Shi et.al. | 2506.23601 | null |
| 2025-06-30 | Transition Matching: Scalable and Flexible Generative Modeling | Neta Shaul et.al. | 2506.23589 | null |
| 2025-06-30 | Dataset Distillation via Vision-Language Category Prototype | Yawen Zou et.al. | 2506.23580 | null |
| 2025-06-30 | Evaluating Multi-Agent Defences Against Jailbreaking Attacks on Large Language Models | Maria Carolina Cornelia Wit et.al. | 2506.23576 | null |
| 2025-06-30 | MMReason: An Open-Ended Multi-Modal Multi-Step Reasoning Benchmark for MLLMs Toward AGI | Huanjin Yao et.al. | 2506.23563 | null |
| 2025-06-30 | JAM-Flow: Joint Audio-Motion Synthesis with Flow Matching | Mingi Kwon et.al. | 2506.23552 | null |
| 2025-06-30 | Neural Langevin Machine: a local asymmetric learning rule can be creative | Zhendong Yu et.al. | 2506.23546 | null |
| 2025-06-30 | Comparative Analysis of the Code Generated by Popular Large Language Models (LLMs) for MISRA C++ Compliance | Malik Muhammad Umer et.al. | 2506.23535 | null |
| 2025-06-30 | On Recipe Memorization and Creativity in Large Language Models: Is Your Model a Creative Cook, a Bad Cook, or Merely a Plagiator? | Jan Kvapil et.al. | 2506.23527 | null |
| 2025-06-30 | NEU-ESC: A Comprehensive Vietnamese dataset for Educational Sentiment analysis and topic Classification toward multitask learning | Phan Quoc Hung Mai et.al. | 2506.23524 | null |
| 2025-07-01 | ChemActor: Enhancing Automated Extraction of Chemical Synthesis Actions with LLM-Generated Data | Yu Zhang et.al. | 2506.23520 | null |
| 2025-06-30 | Reinforcement Fine-Tuning Enables MLLMs Learning Novel Tasks Stably | Zhihao Zhang et.al. | 2506.23508 | null |
| 2025-06-30 | LLM-enhanced Action-aware Multi-modal Prompt Tuning for Image-Text Matching | Mengxiao Tian et.al. | 2506.23502 | null |
| 2025-06-30 | Thought-Augmented Planning for LLM-Powered Interactive Recommender Agent | Haocheng Yu et.al. | 2506.23485 | null |
| 2025-06-30 | MTADiffusion: Mask Text Alignment Diffusion Model for Object Inpainting | Jun Huang et.al. | 2506.23482 | null |
| 2025-06-30 | Evaluation of Geolocation Capabilities of Multimodal Large Language Models and Analysis of Associated Privacy Risks | Xian Zhang et.al. | 2506.23481 | null |
| 2025-06-30 | What to Keep and What to Drop: Adaptive Table Filtering Framework | Jang Won June et.al. | 2506.23463 | null |
| 2025-06-30 | Can We Predict the Unpredictable? Leveraging DisasterNet-LLM for Multimodal Disaster Classification | Manaswi Kulahara et.al. | 2506.23462 | null |
| 2025-06-30 | General Signal Model and Capacity Limit for Rydberg Quantum Information System | Jieao Zhu et.al. | 2506.23455 | null |
| 2025-06-30 | PathDiff: Histopathology Image Synthesis with Unpaired Text and Mask Conditions | Mahesh Bhosale et.al. | 2506.23440 | null |
| 2025-06-29 | TuCo: Measuring the Contribution of Fine-Tuning to Individual Responses of LLMs | Felipe Nuti et.al. | 2506.23423 | null |
| 2025-06-29 | Datasets for Fairness in Language Models: An In-Depth Survey | Jiale Zhang et.al. | 2506.23411 | null |
| 2025-06-29 | Do LLMs Dream of Discrete Algorithms? | Claudionor Coelho Jr et.al. | 2506.23408 | null |
| 2025-06-29 | Perspective Dial: Measuring Perspective of Text and Guiding LLM Outputs | Taejin Kim et.al. | 2506.23377 | null |
| 2025-06-29 | Federated Timeline Synthesis: Scalable and Private Methodology For Model Training and Deployment | Pawel Renc et.al. | 2506.23358 | null |
| 2025-06-29 | GeoProg3D: Compositional Visual Reasoning for City-Scale 3D Language Fields | Shunsuke Yasuki et.al. | 2506.23352 | null |
| 2025-06-29 | ATGen: A Framework for Active Text Generation | Akim Tsvigun et.al. | 2506.23342 | null |
| 2025-06-29 | Information Loss in LLMs' Multilingual Translation: The Role of Training Data, Language Proximity, and Language Family | Yumeng Lin et.al. | 2506.23340 | null |
| 2025-06-29 | VALID-Mol: a Systematic Framework for Validated LLM-Assisted Molecular Design | Malikussaid et.al. | 2506.23339 | null |
| 2025-06-29 | XY-Tokenizer: Mitigating the Semantic-Acoustic Conflict in Low-Bitrate Speech Codecs | Yitian Gong et.al. | 2506.23325 | null |
| 2025-06-29 | GATSim: Urban Mobility Simulation with Generative Agents | Qi Liu et.al. | 2506.23306 | null |
| 2025-07-01 | Exposing and Mitigating Calibration Biases and Demographic Unfairness in MLLM Few-Shot In-Context Learning for Medical Image Classification | Xing Shen et.al. | 2506.23298 | null |
| 2025-06-29 | Two Spelling Normalization Approaches Based on Large Language Models | Miguel Domingo et.al. | 2506.23288 | null |
| 2025-06-29 | MoMa: Modulating Mamba for Adapting Image Foundation Models to Video Recognition | Yuhuan Yang et.al. | 2506.23283 | null |
| 2025-06-29 | Autoregressive Denoising Score Matching is a Good Video Anomaly Detector | Hanwen Zhang et.al. | 2506.23282 | null |
| 2025-06-29 | Corrupted by Reasoning: Reasoning Language Models Become Free-Riders in Public Goods Games | David Guzman Piedrahita et.al. | 2506.23276 | null |
| 2025-06-27 | Shape-for-Motion: Precise and Consistent Video Editing with 3D Proxy | Yuhao Liu et.al. | 2506.22432 | null |
| 2025-06-27 | The Automated LLM Speedrunning Benchmark: Reproducing NanoGPT Improvements | Bingchen Zhao et.al. | 2506.22419 | null |
| 2025-06-27 | HyperCLOVA X THINK Technical Report | NAVER Cloud HyperCLOVA X Team et.al. | 2506.22403 | null |
| 2025-06-27 | Refining Czech GEC: Insights from a Multi-Experiment Approach | Petr Pechman et.al. | 2506.22402 | null |
| 2025-06-27 | QuickSilver -- Speeding up LLM Inference through Dynamic Token Halting, KV Skipping, Contextual Token Fusion, and Adaptive Matryoshka Quantization | Danush Khanna et.al. | 2506.22396 | null |
| 2025-06-27 | What Makes ChatGPT Effective for Software Issue Resolution? An Empirical Study of Developer-ChatGPT Conversations in GitHub | Ramtin Ehsani et.al. | 2506.22390 | null |
| 2025-06-27 | Can Video Large Multimodal Models Think Like Doubters-or Double-Down: A Study on Defeasible Video Entailment | Yue Zhang et.al. | 2506.22385 | null |
| 2025-06-27 | Probabilistic Optimality for Inference-time Scaling | Youkang Wang et.al. | 2506.22376 | null |
| 2025-06-27 | Towards Fair Rankings: Leveraging LLMs for Gender Bias Detection and Measurement | Maryam Mousavian et.al. | 2506.22372 | null |
| 2025-06-27 | Can Large Language Models Help Students Prove Software Correctness? An Experimental Study with Dafny | Carolina Carreira et.al. | 2506.22370 | null |
| 2025-06-27 | Concept-Level AI for Telecom: Moving Beyond Large Language Models | Viswanath Kumarskandpriya et.al. | 2506.22359 | null |
| 2025-06-27 | Optimal Estimation of Watermark Proportions in Hybrid AI-Human Texts | Xiang Li et.al. | 2506.22343 | null |
| 2025-06-27 | Evaluating Scoring Bias in LLM-as-a-Judge | Qingquan Li et.al. | 2506.22316 | null |
| 2025-06-27 | Detection of Personal Data in Structured Datasets Using a Large Language Model | Albert Agisha Ntwali et.al. | 2506.22305 | null |
| 2025-06-27 | Unfolding Generative Flows with Koopman Operators: Fast and Interpretable Sampling | Erkan Turan et.al. | 2506.22304 | null |
| 2025-06-27 | Rethinking Visual Token Reduction in LVLMs under Cross-modal Misalignment | Rui Xu et.al. | 2506.22283 | null |
| 2025-06-27 | Public Service Algorithm: towards a transparent, explainable, and scalable content curation for news content based on editorial values | Ahmad Mel et.al. | 2506.22270 | null |
| 2025-06-27 | Towards Operational Data Analytics Chatbots -- Virtual Knowledge Graph is All You Need | Junaid Ahmed Khan et.al. | 2506.22267 | null |
| 2025-06-27 | Projected Compression: Trainable Projection for Efficient Transformer Compression | Maciej Stefaniak et.al. | 2506.22255 | null |
| 2025-06-27 | Adapting University Policies for Generative AI: Opportunities, Challenges, and Policy Solutions in Higher Education | Russell Beale et.al. | 2506.22231 | null |
| 2025-06-27 | Cardiovascular disease classification using radiomics and geometric features from cardiac CT | Ajay Mittal et.al. | 2506.22226 | null |
| 2025-06-27 | Hybrid Generative Modeling for Incomplete Physics: Deep Grey-Box Meets Optimal Transport | Gurjeet Sangra Singh et.al. | 2506.22204 | null |
| 2025-06-27 | EFRame: Deeper Reasoning via Exploration-Filtering-Replay Reinforcement Learning Framework | Chen Wang et.al. | 2506.22200 | null |
| 2025-06-27 | Exploring Modularity of Agentic Systems for Drug Discovery | Laura van Weesep et.al. | 2506.22189 | null |
| 2025-06-27 | A Different Approach to AI Safety: Proceedings from the Columbia Convening on Openness in Artificial Intelligence and AI Safety | Camille François et.al. | 2506.22183 | null |
| 2025-06-27 | Training Language Model to Critique for Better Refinement | Tianshu Yu et.al. | 2506.22157 | null |
| 2025-06-27 | RetFiner: A Vision-Language Refinement Scheme for Retinal Foundation Models | Ronald Fecso et.al. | 2506.22149 | null |
| 2025-06-27 | SAGE: Spliced-Audio Generated Data for Enhancing Foundational Models in Low-Resource Arabic-English Code-Switched Speech Recognition | Muhammad Umar Farooq et.al. | 2506.22143 | null |
| 2025-06-27 | Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs | Shaojie Zhang et.al. | 2506.22139 | null |
| 2025-06-27 | Reasoning in machine vision: learning to think fast and slow | Shaheer U. Saeed et.al. | 2506.22075 | null |
| 2025-06-27 | Query as Test: An Intelligent Driving Test and Data Storage Method for Integrated Cockpit-Vehicle-Road Scenarios | Shengyue Yao et.al. | 2506.22068 | null |
| 2025-06-27 | Lost at the Beginning of Reasoning | Baohao Liao et.al. | 2506.22058 | null |
| 2025-06-27 | Decoding Machine Translationese in English-Chinese News: LLMs vs. NMTs | Delu Kong et.al. | 2506.22050 | null |
| 2025-06-27 | GPAS: Accelerating Convergence of LLM Pretraining via Gradient-Preserving Activation Scaling | Tianhao Chen et.al. | 2506.22049 | null |
| 2025-06-27 | Few-Shot Identity Adaptation for 3D Talking Heads via Global Gaussian Field | Hong Nie et.al. | 2506.22044 | null |
| 2025-06-27 | UniCA: Adapting Time Series Foundation Model to General Covariate-Aware Forecasting | Lu Han et.al. | 2506.22039 | null |
| 2025-06-27 | Can Peter Pan Survive MT? A Stylometric Study of LLMs, NMTs, and HTs in Children's Literature Translation | Delu Kong et.al. | 2506.22038 | null |
| 2025-06-27 | SiPipe: Bridging the CPU-GPU Utilization Gap for Efficient Pipeline-Parallel LLM Inference | Yongchao He et.al. | 2506.22033 | null |
| 2025-06-27 | LMPVC and Policy Bank: Adaptive voice control for industrial robots with code generating LLMs and reusable Pythonic policies | Ossi Parikka et.al. | 2506.22028 | null |
| 2025-06-27 | RoboEnvision: A Long-Horizon Video Generation Model for Multi-Task Robot Manipulation | Liudi Yang et.al. | 2506.22007 | null |
| 2025-06-27 | LeanConjecturer: Automatic Generation of Mathematical Conjectures for Theorem Proving | Naoto Onda et.al. | 2506.22005 | null |
| 2025-06-27 | R1-Track: Direct Application of MLLMs to Visual Object Tracking via Reinforcement Learning | Biao Wang et.al. | 2506.21980 | null |
| 2025-06-27 | TASeg: Text-aware RGB-T Semantic Segmentation based on Fine-tuning Vision Foundation Models | Meng Yu et.al. | 2506.21975 | null |
| 2025-06-27 | Don't Trust Generative Agents to Mimic Communication on Social Networks Unless You Benchmarked their Empirical Realism | Simon Münker et.al. | 2506.21974 | null |
| 2025-06-27 | Advancing Jailbreak Strategies: A Hybrid Approach to Exploiting LLM Vulnerabilities and Bypassing Modern Defenses | Mohamed Ahmed et.al. | 2506.21972 | null |
| 2025-06-27 | Using Large Language Models to Suggest Informative Prior Distributions in Bayesian Statistics | Michael A. Riegler et.al. | 2506.21964 | null |
| 2025-06-27 | PapersPlease: A Benchmark for Evaluating Motivational Values of Large Language Models Based on ERG Theory | Junho Myung et.al. | 2506.21961 | null |
| 2025-06-27 | Optimal Return-to-Go Guided Decision Transformer for Auto-Bidding in Advertisement | Hao Jiang et.al. | 2506.21956 | null |
| 2025-06-27 | Universal Modelling of Autocovariance Functions via Spline Kernels | Lachlan Astfalck et.al. | 2506.21953 | null |
| 2025-06-27 | CAL-RAG: Retrieval-Augmented Multi-Agent Generation for Content-Aware Layout Design | Najmeh Forouzandehmehr et.al. | 2506.21934 | null |
| 2025-06-27 | ARAG: Agentic Retrieval Augmented Generation for Personalized Recommendation | Reza Yousefi Maragheh et.al. | 2506.21931 | null |
| 2025-06-27 | A Survey of LLM Inference Systems | James Pan et.al. | 2506.21901 | null |
| 2025-06-27 | Bias, Accuracy, and Trust: Gender-Diverse Perspectives on Large Language Models | Aimen Gaba et.al. | 2506.21898 | null |
| 2025-06-27 | Exploring Task-Solving Paradigm for Generalized Cross-Domain Face Anti-Spoofing via Reinforcement Fine-Tuning | Fangling Jiang et.al. | 2506.21895 | null |
| 2025-06-27 | Integrating Multi-Modal Sensors: A Review of Fusion Techniques for Intelligent Vehicles | Chuheng Wei et.al. | 2506.21885 | null |
| 2025-06-27 | A Dual-Layered Evaluation of Geopolitical and Cultural Bias in LLMs | Sean Kim et.al. | 2506.21881 | null |
| 2025-06-27 | WildSpeech-Bench: Benchmarking Audio LLMs in Natural Speech Conversation | Jian Zhang et.al. | 2506.21875 | null |
| 2025-06-27 | On the Feasibility of Poisoning Text-to-Image AI Models via Adversarial Mislabeling | Stanley Wu et.al. | 2506.21874 | null |
| 2025-06-27 | Grounding-Aware Token Pruning: Recovering from Drastic Performance Drops in Visual Grounding Caused by Pruning | Tzu-Chun Chien et.al. | 2506.21873 | null |
| 2025-06-27 | RiverEcho: Real-Time Interactive Digital System for Ancient Yellow River Culture | Haofeng Wang et.al. | 2506.21865 | null |
| 2025-06-27 | DeepTalk: Towards Seamless and Smart Speech Interaction with Adaptive Modality-Specific MoE | Hang Shao et.al. | 2506.21864 | null |
| 2025-06-27 | LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs | Boyuan Sun et.al. | 2506.21862 | null |
| 2025-06-27 | SPADE: Spatial Transcriptomics and Pathology Alignment Using a Mixture of Data Experts for an Expressive Latent Space | Ekaterina Redekop et.al. | 2506.21857 | null |
| 2025-06-27 | Skill-Nav: Enhanced Navigation with Versatile Quadrupedal Locomotion via Waypoint Interface | Dewei Wang et.al. | 2506.21853 | null |
| 2025-06-27 | The Consistency Hypothesis in Uncertainty Quantification for Large Language Models | Quan Xiao et.al. | 2506.21849 | null |
| 2025-06-27 | Adversarial Threats in Quantum Machine Learning: A Survey of Attacks and Defenses | Archisman Ghosh et.al. | 2506.21842 | null |
| 2025-06-27 | PARSI: Persian Authorship Recognition via Stylometric Integration | Kourosh Shahnazari et.al. | 2506.21840 | null |
| 2025-06-27 | ProSAM: Enhancing the Robustness of SAM-based Visual Reference Segmentation with Probabilistic Prompts | Xiaoqi Wang et.al. | 2506.21835 | null |
| 2025-06-27 | TaleForge: Interactive Multimodal System for Personalized Story Creation | Minh-Loi Nguyen et.al. | 2506.21832 | null |
| 2025-06-27 | Few-Shot Segmentation of Historical Maps via Linear Probing of Vision Foundation Models | Rafael Sterzinger et.al. | 2506.21826 | null |
| 2025-06-26 | Exploring the change in scientific readability following the release of ChatGPT | Abdulkareem Alsudais et.al. | 2506.21825 | null |
| 2025-06-26 | Exploring the Structure of AI-Induced Language Change in Scientific English | Riley Galpin et.al. | 2506.21817 | null |
| 2025-06-26 | CAT-SG: A Large Dynamic Scene Graph Dataset for Fine-Grained Understanding of Cataract Surgery | Felix Holm et.al. | 2506.21813 | null |
| 2025-06-26 | Towards Transparent AI: A Survey on Explainable Large Language Models | Avash Palikhe et.al. | 2506.21812 | null |
| 2025-06-26 | CitySim: Modeling Urban Behaviors and City Dynamics with Large-Scale LLM-Driven Agent Simulation | Nicolas Bougie et.al. | 2506.21805 | null |
| 2025-06-26 | Multi-task parallelism for robust pre-training of graph foundation models on multi-source, multi-fidelity atomistic modeling data | Massimiliano Lupo Pasini et.al. | 2506.21788 | null |
| 2025-06-26 | MobiVerse: Scaling Urban Mobility Simulation with Hybrid Lightweight Domain-Specific Generator and Large Language Models | Yifan Liu et.al. | 2506.21784 | null |
| 2025-06-26 | Evaluating List Construction and Temporal Understanding capabilities of Large Language Models | Alexandru Dumitru et.al. | 2506.21783 | null |
| 2025-06-26 | M3PO: Massively Multi-Task Model-Based Policy Optimization | Aditya Narendra et.al. | 2506.21782 | null |
| 2025-06-26 | THE-Tree: Can Tracing Historical Evolution Enhance Scientific Verification and Reasoning? | Xin Wang et.al. | 2506.21763 | null |
| 2025-06-26 | (Fact) Check Your Bias | Eivind Morris Bakke et.al. | 2506.21745 | null |
| 2025-06-26 | Hierarchical Reasoning Model | Guan Wang et.al. | 2506.21734 | null |
| 2025-06-26 | Exploring Image Generation via Mutually Exclusive Probability Spaces and Local Correlation Hypothesis | Chenqiu Zhao et.al. | 2506.21731 | null |
| 2025-06-26 | FOCUS: Internal MLLM Representations for Efficient Fine-Grained Visual Question Answering | Liangyu Zhong et.al. | 2506.21710 | null |
| 2025-06-26 | TanDiT: Tangent-Plane Diffusion Transformer for High-Quality 360° Panorama Generation | Hakan Çapuk et.al. | 2506.21681 | null |
| 2025-06-26 | Infrared foundations for quantum geometry I: Catalogue of totally symmetric rank-three field theories | Will Barker et.al. | 2506.21662 | null |
| 2025-06-26 | APO: Enhancing Reasoning Ability of MLLMs via Asymmetric Policy Optimization | Minjie Hong et.al. | 2506.21655 | null |
| 2025-06-26 | Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test | Ziyue Li et.al. | 2506.21551 | null |
| 2025-06-26 | mTSBench: Benchmarking Multivariate Time Series Anomaly Detection and Model Selection at Scale | Xiaona Zhou et.al. | 2506.21550 | null |
| 2025-06-26 | SAM4D: Segment Anything in Camera and LiDAR Streams | Jianyun Xu et.al. | 2506.21547 | null |
| 2025-06-26 | PsyLite Technical Report | Fangjun Ding et.al. | 2506.21536 | null |
| 2025-06-26 | Exploring the Design Space of 3D MLLMs for CT Report Generation | Mohammed Baharoon et.al. | 2506.21535 | null |
| 2025-06-26 | "What's Up, Doc?": Analyzing How Users Seek Health Information in Large-Scale Conversational AI Datasets | Akshay Paruchuri et.al. | 2506.21532 | null |
| 2025-06-26 | Potemkin Understanding in Large Language Models | Marina Mancoridis et.al. | 2506.21521 | null |
| 2025-06-26 | Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge | Boyu Gou et.al. | 2506.21506 | null |
| 2025-06-26 | Bridging Offline and Online Reinforcement Learning for LLMs | Jack Lanchantin et.al. | 2506.21495 | null |
| 2025-06-26 | Global and Local Entailment Learning for Natural World Imagery | Srikumar Sastry et.al. | 2506.21476 | null |
| 2025-06-26 | Efficient and Reuseable Cloud Configuration Search Using Discovery Spaces | Michael Johnston et.al. | 2506.21467 | null |
| 2025-06-26 | ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing | Huadai Liu et.al. | 2506.21448 | null |
| 2025-06-26 | Controllable 3D Placement of Objects with Scene-Aware Diffusion Models | Mohamed Omran et.al. | 2506.21446 | null |
| 2025-06-26 | Text2Cypher Across Languages: Evaluating Foundational Models Beyond English | Makbule Gulcin Ozsoy et.al. | 2506.21445 | null |
| 2025-06-26 | Benchmarking Deep Learning and Vision Foundation Models for Atypical vs. Normal Mitosis Classification with Cross-Dataset Evaluation | Sweta Banerjee et.al. | 2506.21444 | null |
| 2025-06-26 | Domain Knowledge-Enhanced LLMs for Fraud and Concept Drift Detection | Ali Şenol et.al. | 2506.21443 | null |
| 2025-06-26 | Flow-Based Single-Step Completion for Efficient and Expressive Policy Learning | Prajwal Koirala et.al. | 2506.21427 | null |
| 2025-06-26 | XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation | Bowen Chen et.al. | 2506.21416 | null |
| 2025-06-26 | Distributed Cross-Channel Hierarchical Aggregation for Foundation Models | Aristeidis Tsaris et.al. | 2506.21411 | null |
| 2025-06-26 | Scalable Bayesian Low-Rank Adaptation of Large Language Models via Stochastic Variational Subspace Inference | Colin Samplawski et.al. | 2506.21408 | null |
| 2025-06-26 | TableMoE: Neuro-Symbolic Routing for Structured Expert Reasoning in Multimodal Table Understanding | Junwen Zhang et.al. | 2506.21393 | null |
| 2025-06-26 | Early Stopping Tabular In-Context Learning | Jaris Küken et.al. | 2506.21387 | null |
| 2025-06-26 | Leveraging LLM-Assisted Query Understanding for Live Retrieval-Augmented Generation | Guanting Dong et.al. | 2506.21384 | null |
| 2025-06-26 | Canonical Quantization of a Memristive Leaky Integrate-and-Fire Neuron Circuit | Dean Brand et.al. | 2506.21363 | null |
| 2025-06-26 | Structuralist Approach to AI Literary Criticism: Leveraging Greimas Semiotic Square for Large Language Models | Fangzhou Dong et.al. | 2506.21360 | null |
| 2025-06-26 | CoPa-SG: Dense Scene Graphs with Parametric and Proto-Relations | Julian Lorenz et.al. | 2506.21357 | null |
| 2025-06-26 | SMMILE: An Expert-Driven Benchmark for Multimodal Medical In-Context Learning | Melanie Rieff et.al. | 2506.21355 | null |
| 2025-06-26 | DynamicBench: Evaluating Real-Time Report Generation in Large Language Models | Jingyao Li et.al. | 2506.21343 | null |
| 2025-06-26 | Active Inference AI Systems for Scientific Discovery | Karthik Duraisamy et.al. | 2506.21329 | null |
| 2025-06-26 | Latent Prototype Routing: Achieving Near-Perfect Load Balancing in Mixture-of-Experts | Jiajie Yang et.al. | 2506.21328 | null |
| 2025-06-26 | DrishtiKon: Multi-Granular Visual Grounding for Text-Rich Document Images | Badri Vishal Kasuba et.al. | 2506.21316 | null |
| 2025-06-26 | Exploring Adapter Design Tradeoffs for Low Resource Music Generation | Atharva Mehta et.al. | 2506.21298 | null |
| 2025-06-26 | Detecting Referring Expressions in Visually Grounded Dialogue with Autoregressive Language Models | Bram Willemsen et.al. | 2506.21294 | null |
| 2025-06-26 | Small Encoders Can Rival Large Decoders in Detecting Groundedness | Istabrak Abbes et.al. | 2506.21288 | null |
| 2025-06-26 | Double-Checker: Enhancing Reasoning of Slow-Thinking LLMs via Self-Critical Fine-Tuning | Xin Xu et.al. | 2506.21285 | null |
| 2025-06-26 | Hyperspherical Variational Autoencoders Using Efficient Spherical Cauchy Distribution | Lukas Sablica et.al. | 2506.21278 | null |
| 2025-06-26 | HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context | Qize Yang et.al. | 2506.21277 | null |
| 2025-06-26 | Cat and Mouse -- Can Fake Text Generation Outpace Detector Systems? | Andrea McGlinchey et.al. | 2506.21274 | null |
| 2025-06-26 | DiLoCoX: A Low-Communication Large-Scale Training Framework for Decentralized Cluster | Ji Qi et.al. | 2506.21263 | null |
| 2025-06-26 | Agent-RewardBench: Towards a Unified Benchmark for Reward Modeling across Perception, Planning, and Safety in Real-World Multimodal Agents | Tianyi Men et.al. | 2506.21252 | null |
| 2025-06-26 | ACTLLM: Action Consistency Tuned Large Language Model | Jing Bi et.al. | 2506.21250 | null |
| 2025-06-26 | GANet-Seg: Adversarial Learning for Brain Tumor Segmentation with Hybrid Generative Models | Qifei Cui et.al. | 2506.21245 | null |
| 2025-06-26 | Zero-Shot Learning for Obsolescence Risk Forecasting | Elie Saad et.al. | 2506.21240 | null |
| 2025-06-26 | Enhancing Automatic Term Extraction with Large Language Models via Syntactic Retrieval | Yongchan Chun et.al. | 2506.21222 | null |
| 2025-06-26 | Complexity-aware fine-tuning | Andrey Goncharov et.al. | 2506.21220 | null |
| 2025-06-26 | Unveiling Causal Reasoning in Large Language Models: Reality or Mirage? | Haoang Chi et.al. | 2506.21215 | null |
| 2025-06-26 | | Quanming Liu et.al. | 2506.21211 | null |
| 2025-06-26 | BitMark for Infinity: Watermarking Bitwise Autoregressive Image Generative Models | Louis Kerner et.al. | 2506.21209 | null |
| 2025-06-26 | MedPrompt: LLM-CNN Fusion with Weight Routing for Medical Image Segmentation and Classification | Shadman Sobhan et.al. | 2506.21199 | null |
| 2025-06-26 | Prompt-Guided Turn-Taking Prediction | Koji Inoue et.al. | 2506.21191 | null |
| 2025-06-26 | GroundFlow: A Plug-in Module for Temporal Reasoning on 3D Point Cloud Sequential Grounding | Zijun Lin et.al. | 2506.21188 | null |
| 2025-06-26 | Task-Aware KV Compression For Cost-Effective Long Video Understanding | Minghao Qin et.al. | 2506.21184 | null |
| 2025-06-26 | Generative Adversarial Evasion and Out-of-Distribution Detection for UAV Cyber-Attacks | Deepak Kumar Panda et.al. | 2506.21142 | null |
| 2025-06-26 | How Good Are Synthetic Requirements ? Evaluating LLM-Generated Datasets for AI4RE | Abdelkarim El-Hajjami et.al. | 2506.21138 | null |
| 2025-06-26 | IPFormer-VideoLLM: Enhancing Multi-modal Video Understanding for Multi-shot Scenes | Yujia Liang et.al. | 2506.21116 | null |
| 2025-06-26 | OracleFusion: Assisting the Decipherment of Oracle Bone Script with Structurally Constrained Semantic Typography | Caoshuo Li et.al. | 2506.21101 | null |
| 2025-06-26 | Enhancing LLM Tool Use with High-quality Instruction Data from Knowledge Graph | Jingwei Wang et.al. | 2506.21071 | null |
| 2025-06-26 | MT2-CSD: A New Dataset and Multi-Semantic Knowledge Fusion Method for Conversational Stance Detection | Fuqiang Niu et.al. | 2506.21053 | null |
| 2025-06-26 | V2X-REALM: Vision-Language Model-Based Robust End-to-End Cooperative Autonomous Driving with Adaptive Long-Tail Modeling | Junwei You et.al. | 2506.21041 | null |
| 2025-06-26 | Little By Little: Continual Learning via Self-Activated Sparse Mixture-of-Rank Adaptive Learning | Haodong Lu et.al. | 2506.21035 | null |
| 2025-06-26 | BLOCKS: Blockchain-supported Cross-Silo Knowledge Sharing for Efficient LLM Services | Zhaojiacheng Zhou et.al. | 2506.21033 | null |
| 2025-06-26 | Large Language Models Acing Chartered Accountancy | Jatin Gupta et.al. | 2506.21031 | null |
| 2025-06-26 | STEP Planner: Constructing cross-hierarchical subgoal tree as an embodied long-horizon task planner | Zhou Tianxing et.al. | 2506.21030 | null |
| 2025-06-26 | Instella-T2I: Pushing the Limits of 1D Discrete Latent Space Image Generation | Ze Wang et.al. | 2506.21022 | null |
| 2025-06-26 | Multimodal Prompt Alignment for Facial Expression Recognition | Fuyan Ma et.al. | 2506.21017 | null |
| 2025-06-26 | HybridQ: Hybrid Classical-Quantum Generative Adversarial Network for Skin Disease Image Generation | Qingyue Jiao et.al. | 2506.21015 | null |
| 2025-06-26 | Distilling Normalizing Flows | Steven Walton et.al. | 2506.21003 | null |
| 2025-06-26 | SAC: A Framework for Measuring and Inducing Personality Traits in LLMs with Dynamic Intensity Control | Adithya Chittem et.al. | 2506.20993 | null |
| 2025-06-26 | Segment Anything in Pathology Images with Natural Language | Zhixuan Chen et.al. | 2506.20988 | null |
| 2025-06-26 | Our Coding Adventure: Using LLMs to Personalise the Narrative of a Tangible Programming Robot for Preschoolers | Martin Ruskov et.al. | 2506.20982 | null |
| 2025-06-26 | Response Quality Assessment for Retrieval-Augmented Generation via Conditional Conformal Factuality | Naihe Feng et.al. | 2506.20978 | null |
| 2025-06-26 | Where is AIED Headed? Key Topics and Emerging Frontiers (2020-2024) | Shihui Feng et.al. | 2506.20971 | null |
| 2025-06-26 | Parallels Between VLA Model Post-Training and Human Motor Learning: Progress, Challenges, and Trends | Tian-Yu Xiang et.al. | 2506.20966 | null |
| 2025-06-26 | Evidence-based diagnostic reasoning with multi-agent copilot for human pathology | Chengkuan Chen et.al. | 2506.20964 | null |
| 2025-06-26 | EraRAG: Efficient and Incremental Retrieval Augmented Generation for Growing Corpora | Fangyuan Zhang et.al. | 2506.20963 | null |
| 2025-06-26 | Hierarchical Sub-action Tree for Continuous Sign Language Recognition | Dejie Yang et.al. | 2506.20947 | null |
| 2025-06-26 | Consistent Zero-shot 3D Texture Synthesis Using Geometry-aware Diffusion and Temporal Video Models | Donggoo Kang et.al. | 2506.20946 | null |
| 2025-06-26 | E-FreeM2: Efficient Training-Free Multi-Scale and Cross-Modal News Verification via MLLMs | Van-Hoang Phan et.al. | 2506.20944 | null |
| 2025-06-26 | Model State Arithmetic for Machine Unlearning | Keivan Rezaei et.al. | 2506.20941 | null |
| 2025-06-26 | ParEval-Repo: A Benchmark Suite for Evaluating LLMs with Repository-level HPC Translation Tasks | Joshua H. Davis et.al. | 2506.20938 | null |
| 2025-06-26 | LLM-guided Chemical Process Optimization with a Multi-Agent Approach | Tong Zeng et.al. | 2506.20921 | null |
| 2025-06-26 | FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language | Guilherme Penedo et.al. | 2506.20920 | null |
| 2025-06-26 | Metadata Enrichment of Long Text Documents using Large Language Models | Manika Lamba et.al. | 2506.20918 | null |
| 2025-06-26 | ZKPROV: A Zero-Knowledge Approach to Dataset Provenance for Large Language Models | Mina Namazi et.al. | 2506.20915 | null |
| 2025-06-26 | FaSTA | Advait Gupta et.al. | 2506.20911 | null |
| 2025-06-25 | Omniwise: Predicting GPU Kernels Performance with LLMs | Zixian Wang et.al. | 2506.20886 | null |
| 2025-06-25 | MultiHuman-Testbench: Benchmarking Image Generation for Multiple Humans | Shubhankar Borse et.al. | 2506.20879 | null |
| 2025-06-25 | 3DGH: 3D Head Generation with Composable Hair and Face | Chengan He et.al. | 2506.20875 | null |
| 2025-06-25 | Engineering RAG Systems for Real-World Applications: Design, Development, and Evaluation | Md Toufique Hasan et.al. | 2506.20869 | null |
| 2025-06-25 | Leaner Training, Lower Leakage: Revisiting Memorization in LLM Fine-Tuning with LoRA | Fei Wang et.al. | 2506.20856 | null |
| 2025-06-25 | Vector Contrastive Learning For Pixel-Wise Pretraining In Medical Vision | Yuting He et.al. | 2506.20850 | null |
| 2025-06-25 | Uncovering Hidden Violent Tendencies in LLMs: A Demographic Analysis via Behavioral Vignettes | Quintin Myers et.al. | 2506.20822 | null |
| 2025-06-25 | MultiFinRAG: An Optimized Multimodal Retrieval-Augmented Generation (RAG) Framework for Financial Question Answering | Chinmay Gondhalekar et.al. | 2506.20821 | null |
| 2025-06-25 | GPU Kernel Scientist: An LLM-Driven Framework for Iterative Kernel Optimization | Martin Andrews et.al. | [2506.20807](http://arxiv.org/abs |