> [!IMPORTANT]
> If you need to look at other conferences such as NeurIPS, ICLR, ICML, EMNLP, or ACL, check out Awesome-artist! 🤩
> [!NOTE]
> This repository contains the long papers from ICLR 2025. Each paper's framework diagrams, experimental figures, and other visuals are extracted to study their presentation techniques. Because the content is extensive and a single Markdown file cannot render everything reliably, it is split into 100 separate Markdown files, each covering 37 papers (the final part has 24). The table below indexes where each paper is located. 😁 Hope we can make progress together!
> [!WARNING]
> The README shown on the repository homepage may be automatically truncated. To view the full version, open the README file directly. 🤨
> [!IMPORTANT]
> - Papers 1 to 211 are Oral.
> - Papers 212 to 586 are Spotlight.
> - The rest are Poster.
> - Total papers: 3687, split into 100 parts for easier browsing.
|  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Part 1: 37 papers | Part 2: 37 papers | Part 3: 37 papers | Part 4: 37 papers | Part 5: 37 papers | Part 6: 37 papers | Part 7: 37 papers | Part 8: 37 papers |
| Part 9: 37 papers | Part 10: 37 papers | Part 11: 37 papers | Part 12: 37 papers | Part 13: 37 papers | Part 14: 37 papers | Part 15: 37 papers | Part 16: 37 papers |
| Part 17: 37 papers | Part 18: 37 papers | Part 19: 37 papers | Part 20: 37 papers | Part 21: 37 papers | Part 22: 37 papers | Part 23: 37 papers | Part 24: 37 papers |
| Part 25: 37 papers | Part 26: 37 papers | Part 27: 37 papers | Part 28: 37 papers | Part 29: 37 papers | Part 30: 37 papers | Part 31: 37 papers | Part 32: 37 papers |
| Part 33: 37 papers | Part 34: 37 papers | Part 35: 37 papers | Part 36: 37 papers | Part 37: 37 papers | Part 38: 37 papers | Part 39: 37 papers | Part 40: 37 papers |
| Part 41: 37 papers | Part 42: 37 papers | Part 43: 37 papers | Part 44: 37 papers | Part 45: 37 papers | Part 46: 37 papers | Part 47: 37 papers | Part 48: 37 papers |
| Part 49: 37 papers | Part 50: 37 papers | Part 51: 37 papers | Part 52: 37 papers | Part 53: 37 papers | Part 54: 37 papers | Part 55: 37 papers | Part 56: 37 papers |
| Part 57: 37 papers | Part 58: 37 papers | Part 59: 37 papers | Part 60: 37 papers | Part 61: 37 papers | Part 62: 37 papers | Part 63: 37 papers | Part 64: 37 papers |
| Part 65: 37 papers | Part 66: 37 papers | Part 67: 37 papers | Part 68: 37 papers | Part 69: 37 papers | Part 70: 37 papers | Part 71: 37 papers | Part 72: 37 papers |
| Part 73: 37 papers | Part 74: 37 papers | Part 75: 37 papers | Part 76: 37 papers | Part 77: 37 papers | Part 78: 37 papers | Part 79: 37 papers | Part 80: 37 papers |
| Part 81: 37 papers | Part 82: 37 papers | Part 83: 37 papers | Part 84: 37 papers | Part 85: 37 papers | Part 86: 37 papers | Part 87: 37 papers | Part 88: 37 papers |
| Part 89: 37 papers | Part 90: 37 papers | Part 91: 37 papers | Part 92: 37 papers | Part 93: 37 papers | Part 94: 37 papers | Part 95: 37 papers | Part 96: 37 papers |
| Part 97: 37 papers | Part 98: 37 papers | Part 99: 37 papers | Part 100: 24 papers |  |  |  |  |
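As a minimal sketch of the split described above (a hypothetical helper, not part of this repository), chunking 3687 titles into consecutive groups of 37 yields exactly 100 parts, with 24 left over for Part 100:

```python
# Hypothetical sketch: split 3687 paper titles into parts of 37,
# leaving the remainder (24) in the final, 100th part.
def split_into_parts(papers, part_size=37):
    """Chunk a list of paper titles into consecutive fixed-size parts."""
    return [papers[i:i + part_size] for i in range(0, len(papers), part_size)]

papers = [f"Paper {n}" for n in range(1, 3688)]  # 3687 placeholder titles
parts = split_into_parts(papers)

assert len(parts) == 100                        # 100 Markdown files
assert all(len(p) == 37 for p in parts[:99])    # Parts 1-99: 37 papers each
assert len(parts[-1]) == 24                     # Part 100: 3687 - 99*37 = 24
```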
- Kinetix: Investigating the Training of General Agents through Open-Ended Physics-Based Control Tasks
- Joint Graph Rewiring and Feature Denoising via Spectral Resonance
- DSPO: Direct Score Preference Optimization for Diffusion Model Alignment
- Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo
- Can a MISL Fly? Analysis and Ingredients for Mutual Information Skill Learning
- When Selection Meets Intervention: Additional Complexities in Causal Discovery
- Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency
- Diffusion-Based Planning for Autonomous Driving with Flexible Guidance
- Open-World Reinforcement Learning over Long Short-Term Imagination
- CyberHost: A One-stage Diffusion Framework for Audio-driven Talking Body Generation
- Learning Distributions of Complex Fluid Simulations with Diffusion Graph Networks
- Capturing the Temporal Dependence of Training Data Influence
- Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
- Flow Matching with General Discrete Paths: A Kinetic-Optimal Perspective
- Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risks of Language Models
- Booster: Tackling Harmful Fine-tuning for Large Language Models via Attenuating Harmful Perturbation
- MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts
- Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates
- Tractable Multi-Agent Reinforcement Learning through Behavioral Economics
- Do as We Do, Not as You Think: the Conformity of Large Language Models
- Linear Representations of Political Perspective Emerge in Large Language Models
- Rethinking Reward Modeling in Preference-based Large Language Model Alignment
- Homomorphism Expressivity of Spectral Invariant Graph Neural Networks
- PathGen-1.6M: 1.6 Million Pathology Image-text Pairs Generation through Multi-agent Collaboration
- Data Scaling Laws in Imitation Learning for Robotic Manipulation
- MaestroMotif: Skill Design from Artificial Intelligence Feedback
- DarkBench: Benchmarking Dark Patterns in Large Language Models
- ChartMoE: Mixture of Diversely Aligned Expert Connector for Chart Understanding
- Transformers Provably Solve Parity Efficiently with Chain of Thought
- Judge Decoding: Faster Speculative Sampling Requires Going Beyond Model Alignment
- Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models
- Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation
- WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
- LLM-SR: Scientific Equation Discovery via Programming with Large Language Models
- Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents
- Learning and aligning single-neuron invariance manifolds in visual cortex
- Standard Gaussian Process is All You Need for High-Dimensional Bayesian Optimization
- Copyright-Protected Language Generation via Adaptive Model Fusion
- Toward Guidance-Free AR Visual Generation via Condition Contrastive Alignment
- Root Cause Analysis of Anomalies in Multivariate Time Series through Granger Causal Discovery
- Instant Policy: In-Context Imitation Learning via Graph Diffusion
- GeSubNet: Gene Interaction Inference for Disease Subtype Network Generation
- Training on the Test Task Confounds Evaluation and Emergence
- Computationally Efficient RL under Linear Bellman Completeness for Deterministic Dynamics
- Cross-Entropy Is All You Need To Invert the Data Generating Process
- On the Role of Attention Heads in Large Language Model Safety
- Learning stochastic dynamics from snapshots through regularized unbalanced optimal transport
- Composing Unbalanced Flows for Flexible Docking and Relaxation
- A Computational Framework for Modeling Emergence of Color Vision in the Human Brain
- Improving Probabilistic Diffusion Models With Optimal Diagonal Covariance Matching
- BIRD: A Trustworthy Bayesian Inference Framework for Large Language Models
- Combatting Dimensional Collapse in LLM Pre-Training Data via Submodular File Selection
- Influence Functions for Scalable Data Attribution in Diffusion Models
- Language Representations Can be What Recommenders Need: Findings and Potentials
- Knowledge Entropy Decay during Language Model Pretraining Hinders New Knowledge Acquisition
- Your Mixture-of-Experts LLM Is Secretly an Embedding Model for Free
- Emergence of meta-stable clustering in mean-field transformer models
- Comparing noisy neural population dynamics using optimal transport distances
- RB-Modulation: Training-Free Stylization using Reference-Based Modulation
- The Geometry of Categorical and Hierarchical Concepts in Large Language Models
- TopoLM: brain-like spatio-functional organization in a topographic language model
- Latent Bayesian Optimization via Autoregressive Normalizing Flows
- BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
- ReGenesis: LLMs can Grow into Reasoning Generalists via Self-Improvement
- LaMPlace: Learning to Optimize Cross-Stage Metrics in Macro Placement
- MOS: Model Synergy for Test-Time Adaptation on LiDAR-Based 3D Object Detection
- On Conformal Isometry of Grid Cells: Learning Distance-Preserving Position Embedding
- Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows
- Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping
- LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior
- Knowing Your Target: Target-Aware Transformer Makes Better Spatio-Temporal Video Grounding
- Self-Improvement in Language Models: The Sharpening Mechanism
- Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models
- LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization
- Reasoning Elicitation in Language Models via Counterfactual Feedback
- Unlocking State-Tracking in Linear RNNs Through Negative Eigenvalues
- Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement
- HiRA: Parameter-Efficient Hadamard High-Rank Adaptation for Large Language Models
- Proteina: Scaling Flow-based Protein Structure Generative Models
- REEF: Representation Encoding Fingerprints for Large Language Models
- Context-Parametric Inversion: Why Instruction Finetuning May Not Actually Improve Context Reliance
- Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
- Generator Matching: Generative modeling with arbitrary Markov processes
- Brain Bandit: A Biologically Grounded Neural Network for Efficient Control of Exploration
- Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following in LLMs
- LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias
- From Exploration to Mastery: Enabling LLMs to Master Tools via Self-Driven Interactions
- RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style
- PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding
- Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning
- Steering Protein Family Design through Profile Bayesian Flow
- On the Hölder Stability of Multiset and Graph Neural Networks
- No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images
- FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
- A Theoretically-Principled Sparse, Connected, and Rigid Graph Representation of Molecules
- REGENT: A Retrieval-Augmented Generalist Agent That Can Act In-Context in New Environments
- Limits to scalable evaluation at the frontier: LLM as judge won't beat twice the data
- SANA: Efficient High-Resolution Text-to-Image Synthesis with Linear Diffusion Transformers
- Learning to Discover Regulatory Elements for Gene Expression Prediction
- Simplifying, Stabilizing and Scaling Continuous-time Consistency Models
- ShEPhERD: Diffusing shape, electrostatics, and pharmacophores for bioisosteric drug design
- Accelerated training through iterative gradient propagation along the residual path
- Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
- AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models
- MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models
- MMQA: Evaluating LLMs with Multi-Table Multi-Hop Complex Questions
- GridMix: Exploring Spatial Modulation for Neural Fields in PDE Modeling
- More RLHF, More Trust? On The Impact of Preference Alignment On Trustworthiness
- Population Transformer: Learning Population-level Representations of Neural Activity
- Inference Scaling for Long-Context Retrieval Augmented Generation
- Turning Up the Heat: Min-p Sampling for Creative and Coherent LLM Outputs
- Retrieval Head Mechanistically Explains Long-Context Factuality
- How much of my dataset did you use? Quantitative Data Usage Inference in Machine Learning
- SymmetricDiffusers: Learning Discrete Diffusion on Finite Symmetric Groups
- Interpreting Emergent Planning in Model-Free Reinforcement Learning
- Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
- Progressive Compression with Universally Quantized Diffusion Models
- Training Language Models to Self-Correct via Reinforcement Learning
- Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation
- What should a neuron aim for? Designing local objective functions based on information theory
- Spread Preference Annotation: Direct Preference Judgment for Efficient LLM Alignment
- Global Convergence in Neural ODEs: Impact of Activation Functions
- Geometry of Neural Reinforcement Learning in Continuous State and Action Spaces
- DeepLTL: Learning to Efficiently Satisfy Complex LTL Specifications for Multi-Task RL
- The Complexity of Two-Team Polymatrix Games with Independent Adversaries
- Amortized Control of Continuous State Space Feynman-Kac Model for Irregular Time Series
- TetSphere Splatting: Representing High-Quality Geometry with Lagrangian Volumetric Meshes
- MoDeGPT: Modular Decomposition for Large Language Model Compression
- Geometry-aware RL for Manipulation of Varying Shapes and Deformable Objects
- MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering
- Safety Alignment Should be Made More Than Just a Few Tokens Deep
- Variational Diffusion Posterior Sampling with Midpoint Guidance
- NeuralPlane: Structured 3D Reconstruction in Planar Primitives with Neural Fields
- SD-LoRA: Scalable Decoupled Low-Rank Adaptation for Class Incremental Learning
- Energy-based Backdoor Defense Against Federated Graph Learning
- A Probabilistic Perspective on Unlearning and Alignment for Large Language Models
- Exploring The Loss Landscape Of Regularized Neural Networks Via Convex Duality
- Flat Reward in Policy Parameter Space Implies Robust Reinforcement Learning
- Scaling LLM Test-Time Compute Optimally Can be More Effective than Scaling Parameters for Reasoning
- Compositional Entailment Learning for Hyperbolic Vision-Language Models
- On the Identification of Temporal Causal Representation with Instantaneous Dependence
- Towards Understanding Why FixMatch Generalizes Better Than Supervised Learning
- Open-Vocabulary Customization from CLIP via Data-Free Knowledge Distillation
- Wide Neural Networks Trained with Weight Decay Provably Exhibit Neural Collapse
- TimeMixer++: A General Time Series Pattern Machine for Universal Predictive Analysis
- ProtComposer: Compositional Protein Structure Generation with 3D Ellipsoids
- ReDeEP: Detecting Hallucination in Retrieval-Augmented Generation via Mechanistic Interpretability
- LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models
- BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval
- Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI
- MaRS: A Fast Sampler for Mean Reverting Diffusion based on ODE and SDE Solvers
- Robust Function-Calling for On-Device Language Model via Function Masking
- GOLD: Graph Out-of-Distribution Detection via Implicit Adversarial Latent Generation
- Advantage-Guided Distillation for Preference Alignment in Small Language Models
- JudgeLM: Fine-tuned Large Language Models are Scalable Judges
- Better autoregressive regression with LLMs via regression-aware fine-tuning
- The Power of LLM-Generated Synthetic Data for Stance Detection in Online Political Discussions
- CausalRivers - Scaling up benchmarking of causal discovery for real-world time-series
- Provably Accurate Shapley Value Estimation via Leverage Score Sampling
- GrabS: Generative Embodied Agent for 3D Object Segmentation without Scene Supervision
- Effective Interplay between Sparsity and Quantization: From Theory to Practice
- VLMaterial: Procedural Material Generation with Large Vision-Language Models
- Analyzing Neural Scaling Laws in Two-Layer Networks with Power-Law Data Spectra
- Test-time Alignment of Diffusion Models without Reward Over-optimization
- SVDQuant: Absorbing Outliers by Low-Rank Component for 4-Bit Diffusion Models
- Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models
- UniMatch: Universal Matching from Atom to Task for Few-Shot Drug Discovery
- SynFlowNet: Design of Diverse and Novel Molecules with Synthesis Constraints
- How Much is Unseen Depends Chiefly on Information About the Seen
- Can Watermarked LLMs be Identified by Users via Crafted Prompts?
- IGL-Bench: Establishing the Comprehensive Benchmark for Imbalanced Graph Learning
- Weighted Point Set Embedding for Multimodal Contrastive Learning Toward Optimal Similarity Metric
- On the Expressiveness of Rational ReLU Neural Networks With Bounded Depth
- Enhancing the Scalability and Applicability of Kohn-Sham Hamiltonians for Molecular Systems
- Temporal Heterogeneous Graph Generation with Privacy, Utility, and Efficiency
- Knowledge Localization: Mission Not Accomplished? Enter Query Localization!
- Boosting Ray Search Procedure of Hard-label Attacks with Transfer-based Priors
- Effective post-training embedding compression via temperature control in contrastive training
- LiveBench: A Challenging, Contamination-Limited LLM Benchmark
- Emergent Orientation Maps —— Mechanisms, Coding Efficiency and Robustness
- PianoMotion10M: Dataset and Benchmark for Hand Motion Generation in Piano Performance
- Joint Gradient Balancing for Data Ordering in Finite-Sum Multi-Objective Optimization
- Interleaved Scene Graphs for Interleaved Text-and-Image Generation Assessment
- Transformers Learn to Implement Multi-step Gradient Descent with Chain of Thought
- Improving Unsupervised Constituency Parsing via Maximizing Semantic Information
- DEEM: Diffusion models serve as the eyes of large language models for image perception
- Demystifying the Token Dynamics of Deep Selective State Space Models
- Mitigating Information Loss in Tree-Based Reinforcement Learning via Direct Optimization
- GETS: Ensemble Temperature Scaling for Calibration in Graph Neural Networks
- Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation
- Iterative Label Refinement Matters More than Preference Optimization under Weak Supervision
- CLoSD: Closing the Loop between Simulation and Diffusion for multi-task character control
- Scalable Decision-Making in Stochastic Environments through Learned Temporal Abstraction
- Scalable and Certifiable Graph Unlearning: Overcoming the Approximation Error Barrier
- Exploring Local Memorization in Diffusion Models via Bright Ending Attention
- Provably Reliable Conformal Prediction Sets in the Presence of Data Poisoning
- LLaVA-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models
- TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
- Not All LLM-Generated Data Are Equal: Rethinking Data Weighting in Text Classification
- ZAPBench: A Benchmark for Whole-Brain Activity Prediction in Zebrafish
- Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence
- Large-scale and Fine-grained Vision-language Pre-training for Enhanced CT Image Understanding
- Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
- Correlated Proxies: A New Definition and Improved Mitigation for Reward Hacking
- Probabilistic Geometric Principal Component Analysis with application to neural data
- CBGBench: Fill in the Blank of Protein-Molecule Complex Binding Graph
- MQuAKE-Remastered: Multi-Hop Knowledge Editing Can Only Be Advanced with Reliable Evaluations
- Overcoming False Illusions in Real-World Face Restoration with Multi-Modal Guided Diffusion Model
- RESuM: A Rare Event Surrogate Model for Physics Detector Design
- NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models
- Token Statistics Transformer: Linear-Time Attention via Variational Rate Reduction
- MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion
- Online Reinforcement Learning in Non-Stationary Context-Driven Environments
- Controlling Language and Diffusion Models by Transporting Activations
- Reti-Diff: Illumination Degradation Image Restoration with Retinex-based Latent Diffusion Model
- OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
- Diffusion Attribution Score: Evaluating Training Data Influence in Diffusion Models
- Learning local equivariant representations for quantum operators
- LayerDAG: A Layerwise Autoregressive Diffusion Model for Directed Acyclic Graph Generation
- MetaUrban: An Embodied AI Simulation Platform for Urban Micromobility
- INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge
- Identifiable Exchangeable Mechanisms for Causal Structure and Representation Learning
- Multimodality Helps Few-shot 3D Point Cloud Semantic Segmentation
- SimBa: Simplicity Bias for Scaling Up Parameters in Deep Reinforcement Learning
- AnalogGenie: A Generative Engine for Automatic Discovery of Analog Circuit Topologies
- PerturboLLaVA: Reducing Multimodal Hallucinations with Perturbative Visual Training
- SePer: Measure Retrieval Utility Through The Lens Of Semantic Perplexity Reduction
- Recognize Any Surgical Object: Unleashing the Power of Weakly-Supervised Data
- Modeling Complex System Dynamics with Flow Matching Across Time and Conditions
- Understanding Factual Recall in Transformers via Associative Memories
- MixEval-X: Any-to-any Evaluations from Real-world Data Mixture
- MamKO: Mamba-based Koopman operator for modeling and predictive control
- Probabilistic Neural Pruning via Sparsity Evolutionary Fokker-Planck-Kolmogorov Equation
- Diffusion Bridge AutoEncoders for Unsupervised Representation Learning
- Mixture-of-Agents Enhances Large Language Model Capabilities
- Uncovering Gaps in How Humans and LLMs Interpret Subjective Language
- ImpScore: A Learnable Metric For Quantifying The Implicitness Level of Sentences
- Representative Guidance: Diffusion Model Sampling with Coherence
- Bilinear MLPs enable weight-based mechanistic interpretability
- Linear Spherical Sliced Optimal Transport: A Fast Metric for Comparing Spherical Data
- PhyMPGN: Physics-encoded Message Passing Graph Network for spatiotemporal PDE systems
- Generalization Guarantees for Representation Learning via Data-Dependent Gaussian Mixture Priors
- Weak-to-Strong Preference Optimization: Stealing Reward from Weak Aligned Model
- Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts
- Reinforcement Learning for Control of Non-Markovian Cellular Population Dynamics
- BirdSet: A Large-Scale Dataset for Audio Classification in Avian Bioacoustics
- Budgeted Online Continual Learning by Adaptive Layer Freezing and Frequency-based Sampling
- Beyond Next Token Prediction: Patch-Level Training for Large Language Models
- Exact Certification of (Graph) Neural Networks Against Label Poisoning
- Credal Wrapper of Model Averaging for Uncertainty Estimation in Classification
- X-ALMA: Plug & Play Modules and Adaptive Rejection for Quality Translation at Scale
- Geometric Inductive Biases of Deep Networks: The Role of Data and Architecture
- Online Preference Alignment for Language Models via Count-based Exploration
- Knowledge Distillation with Multi-granularity Mixture of Priors for Image Super-Resolution
- One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt
- Linear SCM Identification in the Presence of Confounders and Gaussian Noise
- AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs
- MotionAura: Generating High-Quality and Motion Consistent Videos using Discrete Diffusion
- Harnessing Diversity for Important Data Selection in Pretraining Large Language Models
- DARE the Extreme: Revisiting Delta-Parameter Pruning For Fine-Tuned Models
- Targeted Attack Improves Protection against Unauthorized Diffusion Customization
- A Geometric Framework for Understanding Memorization in Generative Models
- Recovering Manifold Structure Using Ollivier Ricci Curvature
- Can LLMs Really Learn to Translate a Low-Resource Language from One Grammar Book?
- Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late In Training
- Min-K%++: Improved Baseline for Pre-Training Data Detection from Large Language Models
- Can Large Language Models Understand Symbolic Graphics Programs?
- Learning Transformer-based World Models with Contrastive Predictive Coding
- Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
- Competition Dynamics Shape Algorithmic Phases of In-Context Learning
- DartControl: A Diffusion-Based Autoregressive Motion Model for Real-Time Text-Driven Motion Control
- Perm: A Parametric Representation for Multi-Style 3D Hair Modeling
- Learning to Solve Differential Equation Constrained Optimization Problems
- Joint Reward and Policy Learning with Demonstrations and Human Feedback Improves Alignment
- AIR-BENCH 2024: A Safety Benchmark based on Regulation and Policies Specified Risk Categories
- SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation
- Nonlinear Sequence Embedding by Monotone Variational Inequality
- On Disentangled Training for Nonlinear Transform in Learned Image Compression
- InverseBench: Benchmarking Plug-and-Play Diffusion Priors for Inverse Problems in Physical Sciences
- Approaching Rate-Distortion Limits in Neural Compression with Lattice Transform Coding
- Regularization by Texts for Latent Diffusion Inverse Solvers
- MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark
- Uncertainty Modeling in Graph Neural Networks via Stochastic Differential Equations
- Surprising Effectiveness of pretraining Ternary Language Model at Scale
- Provable Uncertainty Decomposition via Higher-Order Calibration
- TopoNets: High performing vision and language models with brain-like topography
- Deep Learning Alternatives Of The Kolmogorov Superposition Theorem
- Attention with Markov: A Curious Case of Single-layer Transformers
- Higher-Order Graphon Neural Networks: Approximation and Cut Distance
- Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient
- Meta-Dynamical State Space Models for Integrative Neural Data Analysis
- Progressive Compositionality in Text-to-Image Generative Models
- ODE-based Smoothing Neural Network for Reinforcement Learning Tasks
- How to Find the Exact Pareto Front for Multi-Objective MDPs?
- Easing Training Process of Rectified Flow Models Via Lengthening Inter-Path Distance
- FairMT-Bench: Benchmarking Fairness for Multi-turn Dialogue in Conversational LLMs
- SRSA: Skill Retrieval and Adaptation for Robotic Assembly Tasks
- The Computational Complexity of Circuit Discovery for Inner Interpretability
- Learning from End User Data with Shuffled Differential Privacy over Kernel Densities
- VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning
- Spectral Compressive Imaging via Unmixing-driven Subspace Diffusion Refinement
- Topograph: An Efficient Graph-Based Framework for Strictly Topology Preserving Image Segmentation
- MorphoDiff: Cellular Morphology Painting with Diffusion Models
- LoCoDL: Communication-Efficient Distributed Learning with Local Training and Compression
- Let SSMs be ConvNets: State-space Modeling with Optimal Tensor Contractions
- SoftCVI: Contrastive variational inference with self-generated soft labels
- Adam Exploits $\ell_\infty$-geometry of Loss Landscape via Coordinate-wise Adaptivity
- DailyDilemmas: Revealing Value Preferences of LLMs with Quandaries of Daily Life
- Samba: Synchronized Set-of-Sequences Modeling for Multiple Object Tracking
- Moner: Motion Correction in Undersampled Radial MRI with Unsupervised Neural Representation
- SPA-BENCH: A COMPREHENSIVE BENCHMARK FOR SMARTPHONE AGENT EVALUATION
- A Second-Order Perspective on Model Compositionality and Incremental Learning
- Signature Kernel Conditional Independence Tests in Causal Discovery for Stochastic Processes
- RAG-SR: Retrieval-Augmented Generation for Neural Symbolic Regression
- Towards Automated Knowledge Integration From Human-Interpretable Representations
- TOP-ERL: Transformer-based Off-Policy Episodic Reinforcement Learning
- Multi-Draft Speculative Sampling: Canonical Decomposition and Theoretical Limits
- Continuous Exposure Learning for Low-light Image Enhancement using Neural ODEs
- 3DIS: Depth-Driven Decoupled Image Synthesis for Universal Multi-Instance Generation
- WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild
- D-FINE: Redefine Regression Task of DETRs as Fine-grained Distribution Refinement
- DynamicCity: Large-Scale 4D Occupancy Generation from Dynamic Scenes
- CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation
- Efficient and Accurate Explanation Estimation with Distribution Compression
- Nonlinear multiregion neural dynamics with parametric impulse response communication channels
- POTEC: Off-Policy Contextual Bandits for Large Action Spaces via Policy Decomposition
- LoRA3D: Low-Rank Self-Calibration of 3D Geometric Foundation models
- Reducing Hallucinations in Large Vision-Language Models via Latent Space Steering
- OASIS Uncovers: High-Quality T2I Models, Same Old Stereotypes
- TabReD: Analyzing Pitfalls and Filling the Gaps in Tabular Deep Learning Benchmarks
- Streaming Algorithms For $\ell_p$ Flows and $\ell_p$ Regression
- PaRa: Personalizing Text-to-Image Diffusion via Parameter Rank Reduction
- Systems with Switching Causal Relations: A Meta-Causal Perspective
- Enhancing Learning with Label Differential Privacy by Vector Approximation
- Multi-session, multi-task neural decoding from distinct cell-types and brain regions
- Sparse components distinguish visual pathways & their alignment to neural networks
- Revisiting text-to-image evaluation with Gecko: on metrics, prompts, and human rating
- Tell me about yourself: LLMs are aware of their learned behaviors
- CEB: Compositional Evaluation Benchmark for Fairness in Large Language Models
- Implicit Bias of Mirror Flow for Shallow Neural Networks in Univariate Regression
- Streamlining Redundant Layers to Compress Large Language Models
- SVBench: A Benchmark with Temporal Multi-Turn Dialogues for Streaming Video Understanding
- Union-over-Intersections: Object Detection beyond Winner-Takes-All
- Enhancing Pre-trained Representation Classifiability can Boost its Interpretability
- Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models
- Presto! Distilling Steps and Layers for Accelerating Music Generation
- Grounding Video Models to Actions through Goal Conditioned Exploration
- Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws
- Realistic Evaluation of Deep Partial-Label Learning Algorithms
- EmbedLLM: Learning Compact Representations of Large Language Models
- Determine-Then-Ensemble: Necessity of Top-k Union for Large Language Model Ensembling
- Poison-splat: Computation Cost Attack on 3D Gaussian Splatting
- Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation
- Improved Approximation Algorithms for $k$-Submodular Maximization via Multilinear Extension
- AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials
- Bayesian Optimization of Antibodies Informed by a Generative Model of Evolving Sequences
- Stabilizing Reinforcement Learning in Differentiable Multiphysics Simulation
- Estimating the Probabilities of Rare Outputs in Language Models
- Rethinking Reward Model Evaluation: Are We Barking up the Wrong Tree?
- $R^2$-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning
- Test-time Adaptation for Cross-modal Retrieval with Query Shift
- DeLLMa: Decision Making Under Uncertainty with Large Language Models
- ConFIG: Towards Conflict-free Training of Physics Informed Neural Networks
- Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning
- AutoCGP: Closed-Loop Concept-Guided Policies from Unlabeled Demonstrations
- A CLIP-Powered Framework for Robust and Generalizable Data Selection
- SplatFormer: Point Transformer for Robust 3D Gaussian Splatting
- On the Optimization and Generalization of Two-layer Transformers with Sign Gradient Descent
- Towards a Unified and Verified Understanding of Group-Operation Networks
- DenseMatcher: Learning 3D Semantic Correspondence for Category-Level Manipulation from a Single Demo
- Don't flatten, tokenize! Unlocking the key to SoftMoE's efficacy in deep RL
- LiFT: Learning to Fine-Tune via Bayesian Parameter Efficient Meta Fine-Tuning
- When Attention Sink Emerges in Language Models: An Empirical View
- Gap-Dependent Bounds for Q-Learning using Reference-Advantage Decomposition
- MAD-TD: Model-Augmented Data stabilizes High Update Ratio RL
- Answer, Assemble, Ace: Understanding How LMs Answer Multiple Choice Questions
- Scaling up the Banded Matrix Factorization Mechanism for Large Scale Differentially Private ML
- BlendRL: A Framework for Merging Symbolic and Neural Policy Learning
- COPER: Correlation-based Permutations for Multi-View Clustering
- MAGNet: Motif-Agnostic Generation of Molecules from Scaffolds
- RegMix: Data Mixture as Regression for Language Model Pre-training
- Enhancing Compositional Text-to-Image Generation with Reliable Random Seeds
- Walk the Talk? Measuring the Faithfulness of Large Language Model Explanations
- Accelerating Goal-Conditioned Reinforcement Learning Algorithms and Research
- Beyond Squared Error: Exploring Loss Design for Enhanced Training of Generative Flow Networks
-
ND-SDF: Learning Normal Deflection Fields for High-Fidelity Indoor Reconstruction
-
Learning from negative feedback, or positive feedback or both
-
Planning in Natural Language Improves LLM Search for Code Generation
-
On Quantizing Neural Representation for Variable-Rate Video Coding
-
DiffPuter: Empowering Diffusion Models for Missing Data Imputation
-
What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis
-
Broaden your SCOPE! Efficient Multi-turn Conversation Planning for LLMs with Semantic Space
-
LeFusion: Controllable Pathology Synthesis via Lesion-Focused Diffusion Models
-
RelitLRM: Generative Relightable Radiance for Large Reconstruction Models
-
Learning Spatiotemporal Dynamical Systems from Point Process Observations
-
uniINF: Best-of-Both-Worlds Algorithm for Parameter-Free Heavy-Tailed MABs
-
The Superposition of Diffusion Models Using the Itô Density Estimator
-
Towards hyperparameter-free optimization with differential privacy
-
Discovering Temporally Compositional Neural Manifolds with Switching Infinite GPFA
-
DeepRTL: Bridging Verilog Understanding and Generation with a Unified Representation Model
-
DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference
-
CREIMBO: Cross-Regional Ensemble Interactions in Multi-view Brain Observations
-
High-dimensional Analysis of Knowledge Distillation: Weak-to-Strong Generalization and Scaling Laws
-
NetMoE: Accelerating MoE Training through Dynamic Sample Placement
-
Bayesian Optimization via Continual Variational Last Layer Training
-
Fast Uncovering of Protein Sequence Diversity from Structure
-
Century: A Framework and Dataset for Evaluating Historical Contextualisation of Sensitive Images
-
MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code
-
In vivo cell-type and brain region classification via multimodal contrastive learning
-
Monitoring Latent World States in Language Models with Propositional Probes
-
Universal generalization guarantees for Wasserstein distributionally robust models
-
PETRA: Parallel End-to-end Training with Reversible Architectures
-
ThunderKittens: Simple, Fast, and $\textit{Adorable}$ Kernels
-
Multi-modal Agent Tuning: Building a VLM-Driven Agent for Efficient Tool Usage
-
Holistically Evaluating the Environmental Impact of Creating Language Models
-
DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback
-
Re-Imagining Multimodal Instruction Tuning: A Representation View
-
Inverse decision-making using neural amortized Bayesian actors
-
MVTokenFlow: High-quality 4D Content Generation using Multiview Token Flow
-
Fewer May Be Better: Enhancing Offline Reinforcement Learning with Reduced Dataset
-
Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems
-
Understanding the Stability-based Generalization of Personalized Federated Learning
-
IgGM: A Generative Model for Functional Antibody and Nanobody Design
-
Exact Community Recovery under Side Information: Optimality of Spectral Algorithms
-
Context Clues: Evaluating Long Context Models for Clinical Prediction Tasks on EHR Data
-
Deconstructing What Makes a Good Optimizer for Autoregressive Language Models
-
Dataset Ownership Verification in Contrastive Pre-trained Models
-
System 1.x: Learning to Balance Fast and Slow Planning with Language Models
-
Semialgebraic Neural Networks: From roots to representations
-
h4rm3l: A Language for Composable Jailbreak Attack Synthesis
-
To Code or Not To Code? Exploring Impact of Code in Pre-training
-
Feature Averaging: An Implicit Bias of Gradient Descent Leading to Non-Robustness in Neural Networks
-
Enhancing Clustered Federated Learning: Integration of Strategies and Improved Methodologies
-
FreSh: Frequency Shifting for Accelerated Neural Representation Learning
-
Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching
-
RocketEval: Efficient automated LLM evaluation via grading checklist
-
A Skewness-Based Criterion for Addressing Heteroscedastic Noise in Causal Discovery
-
Exact Byte-Level Probabilities from Tokenized Language Models for FIM-Tasks and Model Ensembles
-
Intervening Anchor Token: Decoding Strategy in Alleviating Hallucinations for MLLMs
-
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents
-
NVS-Solver: Video Diffusion Model as Zero-Shot Novel View Synthesizer
-
Do Stochastic, Feel Noiseless: Stable Stochastic Optimization via a Double Momentum Mechanism
-
SSOLE: Rethinking Orthogonal Low-rank Embedding for Self-Supervised Learning
-
EgoSim: Egocentric Exploration in Virtual Worlds with Multi-modal Conditioning
-
Deep Kernel Relative Test for Machine-generated Text Detection
-
Integral Performance Approximation for Continuous-Time Reinforcement Learning Control
-
Once-for-All: Controllable Generative Image Compression with Dynamic Granularity Adaptation
-
Attention in Large Language Models Yields Efficient Zero-Shot Re-Rankers
-
LICO: Large Language Models for In-Context Molecular Optimization
-
SiReRAG: Indexing Similar and Related Information for Multihop Reasoning
-
ELFS: Label-Free Coreset Selection with Proxy Training Dynamics
-
Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation
-
UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation
-
Forewarned is Forearmed: Harnessing LLMs for Data Synthesis via Failure-induced Exploration
-
Effective and Efficient Time-Varying Counterfactual Prediction with State-Space Models
-
AIMS.au: A Dataset for the Analysis of Modern Slavery Countermeasures in Corporate Statements
-
Omni-MATH: A Universal Olympiad Level Mathematic Benchmark for Large Language Models
-
NoVo: Norm Voting off Hallucinations with Attention Heads in Large Language Models
-
Accelerating Diffusion Transformers with Token-wise Feature Caching
-
Point-SAM: Promptable 3D Segmentation Model for Point Clouds
-
Towards Understanding the Universality of Transformers for Next-Token Prediction
-
APE: Faster and Longer Context-Augmented Generation via Adaptive Parallel Encoding
-
Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Datasets
-
Causally Motivated Sycophancy Mitigation for Large Language Models
-
Understanding and Enhancing Safety Mechanisms of LLMs via Safety-Specific Neuron
-
Pacmann: Efficient Private Approximate Nearest Neighbor Search
-
Understanding the Generalization of In-Context Learning in Transformers: An Empirical Study
-
ReCogLab: a framework testing relational reasoning & cognitive hypotheses on LLMs
-
MeteoRA: Multiple-tasks Embedded LoRA for Large Language Models
-
HERO: Human-Feedback Efficient Reinforcement Learning for Online Diffusion Model Finetuning
-
On the Price of Differential Privacy for Hierarchical Clustering
-
Contextual Self-paced Learning for Weakly Supervised Spatio-Temporal Video Grounding
-
Image and Video Tokenization with Binary Spherical Quantization
-
Mitigating Object Hallucination in MLLMs via Data-augmented Phrase-level Alignment
-
Simple, Good, Fast: Self-Supervised World Models Free of Baggage
-
UniCO: On Unified Combinatorial Optimization via Problem Reduction to Matrix-Encoded General TSP
-
GPromptShield: Elevating Resilience in Graph Prompt Tuning Against Adversarial Attacks
-
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
-
RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation
-
Quality over Quantity in Attention Layers: When Adding More Heads Hurts
-
Language Agents Meet Causality -- Bridging LLMs and Causal World Models
-
Agent-to-Sim: Learning Interactive Behavior Models from Casual Longitudinal Videos
-
PIG: Physics-Informed Gaussians as Adaptive Parametric Mesh Representations
-
HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models
-
PALMBENCH: A COMPREHENSIVE BENCHMARK OF COMPRESSED LARGE LANGUAGE MODELS ON MOBILE PLATFORMS
-
Tighter Privacy Auditing of DP-SGD in the Hidden State Threat Model
-
InstantSplamp: Fast and Generalizable Stenography Framework for Generative Gaussian Splatting
-
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions
-
EDiT: A Local-SGD-Based Efficient Distributed Training Method for Large Language Models
-
SymmCD: Symmetry-Preserving Crystal Generation with Diffusion Models
-
MaxCutPool: differentiable feature-aware Maxcut for pooling in graph neural networks
-
Correlating instruction-tuning (in multimodal models) with vision-language processing (in the brain)
-
SafeWatch: An Efficient Safety-Policy Following Video Guardrail Model with Transparent Explanations
-
Revisiting In-context Learning Inference Circuit in Large Language Models
-
Optimistic Games for Combinatorial Bayesian Optimization with Application to Protein Design
-
Vector-ICL: In-context Learning with Continuous Vector Representations
-
Mixture of Experts Made Personalized: Federated Prompt Learning for Vision-Language Models
-
Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting
-
Deep Signature: Characterization of Large-Scale Molecular Dynamics
-
Structural-Entropy-Based Sample Selection for Efficient and Effective Learning
-
Context Steering: Controllable Personalization at Inference Time
-
UniRestore3D: A Scalable Framework For General Shape Restoration
-
Learning General-purpose Biomedical Volume Representations using Randomized Synthesis
-
EIA: ENVIRONMENTAL INJECTION ATTACK ON GENERALIST WEB AGENTS FOR PRIVACY LEAKAGE
-
SeCom: On Memory Construction and Retrieval for Personalized Conversational Agents
-
Enhancing Uncertainty Estimation and Interpretability with Bayesian Non-negative Decision Layer
-
MolSpectra: Pre-training 3D Molecular Representation with Multi-modal Energy Spectra
-
Building, Reusing, and Generalizing Abstract Representations from Concrete Sequences
-
(Mis)Fitting Scaling Laws: A Survey of Scaling Law Fitting Techniques in Deep Learning
-
Beware of Calibration Data for Pruning Large Language Models
-
SANER: Annotation-free Societal Attribute Neutralizer for Debiasing CLIP
-
SPORTU: A Comprehensive Sports Understanding Benchmark for Multimodal Large Language Models
-
MACPO: Weak-to-Strong Alignment via Multi-Agent Contrastive Preference Optimization
-
Scalable Universal T-Cell Receptor Embeddings from Adaptive Immune Repertoires
-
ZeroDiff: Solidified Visual-semantic Correlation in Zero-Shot Learning
-
Self-Supervised Diffusion MRI Denoising via Iterative and Stable Refinement
-
BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks
-
Adaptive Methods through the Lens of SDEs: Theoretical Insights on the Role of Noise
-
Failures to Find Transferable Image Jailbreaks Between Vision-Language Models
-
Feature Responsiveness Scores: Model-Agnostic Explanations for Recourse
-
GaussianBlock: Building Part-Aware Compositional and Editable 3D Scene by Primitives and Gaussians
-
Improving Instruction-Following in Language Models through Activation Steering
-
Scaling Autonomous Agents via Automatic Reward Modeling And Planning
-
Explaining Modern Gated-Linear RNNs via a Unified Implicit Attention Formulation
-
PFDiff: Training-Free Acceleration of Diffusion Models Combining Past and Future Scores
-
Learning to Communicate Through Implicit Communication Channels
-
Video In-context Learning: Autoregressive Transformers are Zero-Shot Video Imitators
-
ChroKnowledge: Unveiling Chronological Knowledge of Language Models in Multiple Domains
-
Aligning Visual Contrastive learning models via Preference Optimization
-
ST-GCond: Self-supervised and Transferable Graph Dataset Condensation
-
Learning Successor Features with Distributed Hebbian Temporal Memory
-
Inspection and Control of Self-Generated-Text Recognition Ability in Llama3-8b-Instruct
-
When GNNs meet symmetry in ILPs: an orbit-based feature augmentation approach
-
SINGER: Stochastic Network Graph Evolving Operator for High Dimensional PDEs
-
FlashMask: Efficient and Rich Mask Extension of FlashAttention
-
Towards Effective Evaluations and Comparisons for LLM Unlearning Methods
-
On Calibration of LLM-based Guard Models for Reliable Content Moderation
-
SBSC: Step-by-Step Coding for Improving Mathematical Olympiad Performance
-
LLM-based Typed Hyperresolution for Commonsense Reasoning with Knowledge Bases
-
Improved Algorithms for Kernel Matrix-Vector Multiplication Under Sparsity Assumptions
-
LancBiO: Dynamic Lanczos-aided Bilevel Optimization via Krylov Subspace
-
Needle Threading: Can LLMs Follow Threads Through Near-Million-Scale Haystacks?
-
Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models
-
SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation
-
Tree of Attributes Prompt Learning for Vision-Language Models
-
Intrinsic User-Centric Interpretability through Global Mixture of Experts
-
LongVILA: Scaling Long-Context Visual Language Models for Long Videos
-
Is Large-scale Pretraining the Secret to Good Domain Generalization?
-
Modeling dynamic social vision highlights gaps between deep learning and humans
-
Efficient Low-Bit Quantization with Adaptive Scales for Multi-Task Co-Training
-
Hierarchical Uncertainty Estimation for Learning-based Registration in Neuroimaging
-
PIED: Physics-Informed Experimental Design for Inverse Problems
-
In-Context Editing: Learning Knowledge from Self-Induced Distributions
-
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
-
Field-DiT: Diffusion Transformer on Unified Video, 3D, and Game Field Generation
-
Growth Inhibitors for Suppressing Inappropriate Image Concepts in Diffusion Models
-
DistillHGNN: A Knowledge Distillation Approach for High-Speed Hypergraph Neural Networks
-
DiscoveryBench: Towards Data-Driven Discovery with Large Language Models
-
Image-level Memorization Detection via Inversion-based Inference Perturbation
-
Youku Dense Caption: A Large-scale Chinese Video Dense Caption Dataset and Benchmarks
-
MAGE: Model-Level Graph Neural Networks Explanations via Motif-based Graph Generation
-
Routing Experts: Learning to Route Dynamic Experts in Existing Multi-modal Large Language Models
-
Param$\Delta$ for Direct Mixing: Post-Train Large Language Model At Zero Cost
-
On the Modeling Capabilities of Large Language Models for Sequential Decision Making
-
Training-free LLM-generated Text Detection by Mining Token Probability Sequences
-
Revolutionizing EMCCD Denoising through a Novel Physics-Based Learning Framework for Noise Modeling
-
$F^3Set$: Towards Analyzing Fast, Frequent, and Fine-grained Events from Videos
-
IMDPrompter: Adapting SAM to Image Manipulation Detection by Cross-View Automated Prompt Learning
-
Linear Multistep Solver Distillation for Fast Sampling of Diffusion Models
-
Scalable Extraction of Training Data from Aligned, Production Language Models
-
High-Precision Dichotomous Image Segmentation via Probing Diffusion Capacity
-
MindSimulator: Exploring Brain Concept Localization via Synthetic fMRI
-
Neural Approximate Mirror Maps for Constrained Diffusion Models
-
Towards Empowerment Gain through Causal Structure Learning in Model-Based Reinforcement Learning
-
Approximating Full Conformal Prediction for Neural Network Regression with Gauss-Newton Influence
-
VoxDialogue: Can Spoken Dialogue Systems Understand Information Beyond Words?
-
Learning to Explore and Exploit with GNNs for Unsupervised Combinatorial Optimization
-
Differentiable Optimization of Similarity Scores Between Models and Brains
-
Tracing Representation Progression: Analyzing and Enhancing Layer-Wise Similarity
-
The Pitfalls of Memorization: When Memorization Hurts Generalization
-
DisEnvisioner: Disentangled and Enriched Visual Prompt for Customized Image Generation
-
ImagineNav: Prompting Vision-Language Models as Embodied Navigator through Scene Imagination
-
Scaling Laws for Downstream Task Performance in Machine Translation
-
Deep Weight Factorization: Sparse Learning Through the Lens of Artificial Symmetries
-
KiVA: Kid-inspired Visual Analogies for Testing Large Multimodal Models
-
Learning View-invariant World Models for Visual Robotic Manipulation
-
The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs
-
Neuron Platonic Intrinsic Representation From Dynamics Using Contrastive Learning
-
Beyond Canonicalization: How Tensorial Messages Improve Equivariant Message Passing
-
PINP: Physics-Informed Neural Predictor with latent estimation of fluid flows
-
MMKE-Bench: A Multimodal Editing Benchmark for Diverse Visual Knowledge
-
Multimodal Quantitative Language for Generative Recommendation
-
Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning
-
Schur's Positive-Definite Network: Deep Learning in the SPD cone with structure
-
VideoShield: Regulating Diffusion-based Video Generation Models via Watermarking
-
Reconciling Model Multiplicity for Downstream Decision Making
-
Unbounded: A Generative Infinite Game of Character Life Simulation
-
Flow Matching with Gaussian Process Priors for Probabilistic Time Series Forecasting
-
FlowDec: A flow-based full-band general audio codec with high perceptual quality
-
IDArb: Intrinsic Decomposition for Arbitrary Number of Input Views and Illuminations
-
Towards Certification of Uncertainty Calibration under Adversarial Attacks
-
VAE-Var: Variational Autoencoder-Enhanced Variational Methods for Data Assimilation in Meteorology
-
Deep Kernel Posterior Learning under Infinite Variance Prior Weights
-
EG4D: Explicit Generation of 4D Object without Score Distillation
-
The impact of allocation strategies in subset learning on the expressive power of neural networks
-
Learning on One Mode: Addressing Multi-modality in Offline Reinforcement Learning
-
VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks
-
OmniKV: Dynamic Context Selection for Efficient Long-Context LLMs
-
ScImage: How good are multimodal large language models at scientific text-to-image generation?
-
Discriminating image representations with principal distortions
-
The Journey Matters: Average Parameter Count over Pre-training Unifies Sparse and Dense Scaling Laws
-
Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization
-
Data Distillation for extrapolative protein design through exact preference optimization
-
Seeing Eye to AI: Human Alignment via Gaze-Based Response Rewards for Large Language Models
-
Zero-shot Model-based Reinforcement Learning using Large Language Models
-
Beyond Autoregression: Fast LLMs via Self-Distillation Through Time
-
Learning Splitting Heuristics in Divide-and-Conquer SAT Solvers with Reinforcement Learning
-
ReNovo: Retrieval-Based \emph{De Novo} Mass Spectrometry Peptide Sequencing
-
SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation
-
CryoGEN: Generative Energy-based Models for Cryogenic Electron Tomography Reconstruction
-
Lift Your Molecules: Molecular Graph Generation in Latent Euclidean Space
-
NovelQA: Benchmarking Question Answering on Documents Exceeding 200K Tokens
-
UniCon: Unidirectional Information Flow for Effective Control of Large-Scale Diffusion Models
-
SeRA: Self-Reviewing and Alignment of LLMs using Implicit Reward Margins
-
Learning Partial Graph Matching via Optimal Partial Transport
-
Towards Neural Scaling Laws for Time Series Foundation Models
-
Adaptive Rank Allocation: Speeding Up Modern Transformers with RaNA Adapters
-
Exploring Prosocial Irrationality for LLM Agents: A Social Cognition View
-
Deriving Causal Order from Single-Variable Interventions: Guarantees & Algorithm
-
SeedLM: Compressing LLM Weights into Seeds of Pseudo-Random Generators
-
On the Performance Analysis of Momentum Method: A Frequency Domain Perspective
-
RAPID: Retrieval Augmented Training of Differentially Private Diffusion Models
-
Cross-Embodiment Dexterous Grasping with Reinforcement Learning
-
On Generalization Across Environments In Multi-Objective Reinforcement Learning
-
Bounds on $L_p$ Errors in Density Ratio Estimation via $f$-Divergence Loss Functions
-
Accelerating Neural ODEs: A Variational Formulation-based Approach
-
RainbowPO: A Unified Framework for Combining Improvements in Preference Optimization
-
ProAdvPrompter: A Two-Stage Journey to Effective Adversarial Prompting for LLMs
-
Improving Large Language Model Planning with Action Sequence Similarity
-
Hydra-SGG: Hybrid Relation Assignment for One-stage Scene Graph Generation
-
Rethinking Multiple-Instance Learning From Feature Space to Probability Space
-
KGARevion: An AI Agent for Knowledge-Intensive Biomedical QA
-
DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search
-
RazorAttention: Efficient KV Cache Compression Through Retrieval Heads
-
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels
-
BAMDP Shaping: a Unified Framework for Intrinsic Motivation and Reward Shaping
-
Training-Free Diffusion Model Alignment with Sampling Demons
-
Scaling Long Context Training Data by Long-Distance Referrals
-
A Theory for Token-Level Harmonization in Retrieval-Augmented Generation
-
Backdooring Vision-Language Models with Out-Of-Distribution Data
-
Fantastic Targets for Concept Erasure in Diffusion Models and Where To Find Them
-
MIRAGE: Evaluating and Explaining Inductive Reasoning Process in Language Models
-
COFlowNet: Conservative Constraints on Flows Enable High-Quality Candidate Generation
-
Precedence-Constrained Winter Value for Effective Graph Data Valuation
-
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
-
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos
-
Discrete Diffusion Schrödinger Bridge Matching for Graph Transformation
-
PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation
-
GLOMA: Global Video Text Spotting with Morphological Association
-
SplineGS: Learning Smooth Trajectories in Gaussian Splatting for Dynamic Scene Reconstruction
-
SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency
-
MiniPLM: Knowledge Distillation for Pre-training Language Models
-
Holographic Node Representations: Pre-training Task-Agnostic Node Embeddings
-
ThinkBot: Embodied Instruction Following with Thought Chain Reasoning
-
Studying the Interplay Between the Actor and Critic Representations in Reinforcement Learning
-
Adversarial Generative Flow Network for Solving Vehicle Routing Problems
-
A Simple Approach to Unifying Diffusion-based Conditional Generation
-
Node Identifiers: Compact, Discrete Representations for Efficient Graph Learning
-
Lightning-Fast Image Inversion and Editing for Text-to-Image Diffusion Models
-
Adversarially Robust Anomaly Detection through Spurious Negative Pair Mitigation
-
Breaking Mental Set to Improve Reasoning through Diverse Multi-Agent Debate
-
Enhance Multi-View Classification Through Multi-Scale Alignment and Expanded Boundary
-
Context-Alignment: Activating and Enhancing LLMs Capabilities in Time Series
-
SVG: 3D Stereoscopic Video Generation via Denoising Frame Matrix
-
TabDiff: a Mixed-type Diffusion Model for Tabular Data Generation
-
Adding Conditional Control to Diffusion Models with Reinforcement Learning
-
How efficient is LLM-generated code? A rigorous & high-standard benchmark
-
Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction
-
Better than Your Teacher: LLM Agents that learn from Privileged AI Feedback
-
Reward Dimension Reduction for Scalable Multi-Objective Reinforcement Learning
-
Provable Robust Overfitting Mitigation in Wasserstein Distributionally Robust Optimization
-
Flow-based Variational Mutual Information: Fast and Flexible Approximations
-
Towards Learning High-Precision Least Squares Algorithms with Sequence Models
-
MetaMetrics: Calibrating Metrics for Generation Tasks Using Human Preferences
-
Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy
-
BTBS-LNS: Binarized-Tightening, Branch and Search on Learning LNS Policies for MIP
-
Semantix: An Energy-guided Sampler for Semantic Style Transfer
-
Towards Understanding the Robustness of Diffusion-Based Purification: A Stochastic Perspective
-
Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix
-
JetFormer: An autoregressive generative model of raw images and text
-
FreeCG: Free the Design Space of Clebsch-Gordan Transform for Machine Learning Force Fields
-
Density estimation with LLMs: a geometric investigation of in-context learning trajectories
-
nGPT: Normalized Transformer with Representation Learning on the Hypersphere
-
Collaborative Discrete-Continuous Black-Box Prompt Learning for Language Models
-
pMoE: Prompting Diverse Experts Together Wins More in Visual Adaptation
-
CapeX: Category-Agnostic Pose Estimation from Textual Point Explanation
-
C-CLIP: Multimodal Continual Learning for Vision-Language Model
-
Autocorrelation Matters: Understanding the Role of Initialization Schemes for State Space Models
-
Aioli: A Unified Optimization Framework for Language Model Data Mixing
-
Emerging Safety Attack and Defense in Federated Instruction Tuning of Large Language Models
-
Implicit Neural Surface Deformation with Explicit Velocity Fields
-
Efficient Masked AutoEncoder for Video Object Counting and A Large-Scale Benchmark
-
Data-adaptive Differentially Private Prompt Synthesis for In-Context Learning
-
6DGS: Enhanced Direction-Aware Gaussian Splatting for Volumetric Rendering
-
One Model Transfer to All: On Robust Jailbreak Prompts Generation against LLMs
-
Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance
-
Your Absorbing Discrete Diffusion Secretly Models the Conditional Distributions of Clean Data
-
Understanding Optimization in Deep Learning with Central Flows
-
ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation
-
Adaptive Energy Alignment for Accelerating Test-Time Adaptation
-
Quest: Query-centric Data Synthesis Approach for Long-context Scaling of Large Language Model
-
Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models
-
Robust LLM safeguarding via refusal feature adversarial training
-
MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models
-
Deep Incomplete Multi-view Learning via Cyclic Permutation of VAEs
-
Swing-by Dynamics in Concept Learning and Compositional Generalization
-
On Discriminative Probabilistic Modeling for Self-Supervised Representation Learning
-
VCR: A Task for Pixel-Level Complex Reasoning in Vision Language Models via Restoring Occluded Text
-
TIGER: Time-frequency Interleaved Gain Extraction and Reconstruction for Efficient Speech Separation
-
Minimal Impact ControlNet: Advancing Multi-ControlNet Integration
-
Efficient Alternating Minimization with Applications to Weighted Low Rank Approximation
-
Robust Transfer of Safety-Constrained Reinforcement Learning Agents
-
Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models
-
Faster Inference of Flow-Based Generative Models via Improved Data-Noise Coupling
-
ConvCodeWorld: Benchmarking Conversational Code Generation in Reproducible Feedback Environments
-
Bayesian Image Regression with Soft-thresholded Conditional Autoregressive Prior
-
OCEAN: Offline Chain-of-thought Evaluation and Alignment in Large Language Models
-
Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation
-
SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains?
-
Optimal Protocols for Continual Learning via Statistical Physics and Control Theory
-
DICE: End-to-end Deformation Capture of Hand-Face Interactions from a Single Image
-
High-quality Text-to-3D Character Generation with SparseCubes and Sparse Transformers.
-
Broadening Target Distributions for Accelerated Diffusion Models via a Novel Analysis Approach
-
Warm Diffusion: Recipe for Blur-Noise Mixture Diffusion Models
-
CircuitFusion: Multimodal Circuit Representation Learning for Agile Chip Design
-
MoLEx: Mixture of Layer Experts for Fine-tuning with Sparse Upcycling
-
PaLD: Detection of Text Partially Written by Large Language Models
-
Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology
-
Mechanism and Emergence of Stacked Attention Heads in Multi-Layer Transformers
-
Reasoning of Large Language Models over Knowledge Graphs with Super-Relations
-
Multimodal Large Language Models for Inverse Molecular Design with Retrosynthetic Planning
-
Smoothing the Shift: Towards Stable Test-Time Adaptation under Complex Multimodal Noises
-
N-ForGOT: Towards Not-forgetting and Generalization of Open Temporal Graph Learning
-
SEMDICE: Off-policy State Entropy Maximization via Stationary Distribution Correction Estimation
-
Generating Likely Counterfactuals Using Sum-Product Networks
-
Greener GRASS: Enhancing GNNs with Encoding, Rewiring, and Attention
-
TS-LIF: A Temporal Segment Spiking Neuron Network for Time Series Forecasting
-
Exposure Bracketing Is All You Need For A High-Quality Image
-
Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving
-
Enhanced Diffusion Sampling via Extrapolation with Multiple ODE Solutions
-
Scaling Stick-Breaking Attention: An Efficient Implementation and In-depth Study
-
Neuralized Markov Random Field for Interaction-Aware Stochastic Human Trajectory Prediction
-
Monte Carlo Planning with Large Language Model for Text-Based Game Agents
-
What's the Move? Hybrid Imitation Learning via Salient Points
-
Breaking the Reclustering Barrier in Centroid-based Deep Clustering
-
Balancing Bias in Two-sided Markets for Fair Stable Matchings
-
ILLUSION: Unveiling Truth with a Comprehensive Multi-Modal, Multi-Lingual Deepfake Dataset
-
Do LLM Agents Have Regret? A Case Study in Online Learning and Games
-
Bridging the Semantic Gap Between Text and Table: A Case Study on NL2SQL
-
ACTIVE: Offline Reinforcement Learning via Adaptive Imitation and In-sample $V$-Ensemble
-
Causal Effect Estimation with Mixed Latent Confounders and Post-treatment Variables
-
How Much is a Noisy Image Worth? Data Scaling Laws for Ambient Diffusion.
-
Bidirectional Decoding: Improving Action Chunking via Guided Test-Time Sampling
-
Decoupled Finetuning for Domain Generalizable Semantic Segmentation
-
econSG: Efficient and Multi-view Consistent Open-Vocabulary 3D Semantic Gaussians
-
GravMAD: Grounded Spatial Value Maps Guided Action Diffusion for Generalized 3D Manipulation
-
Laplace Sample Information: Data Informativeness Through a Bayesian Lens
-
Systematic Relational Reasoning With Epistemic Graph Neural Networks
-
Generalization, Expressivity, and Universality of Graph Neural Networks on Attributed Graphs
-
MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models
-
Multi-Perspective Data Augmentation for Few-shot Object Detection
-
Homomorphism Counts as Structural Encodings for Graph Learning
-
A new framework for evaluating model out-of-distribution generalisation for the biochemical domain
-
Generalization Bounds and Model Complexity for Kolmogorov–Arnold Networks
-
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer
-
VOILA: Evaluation of MLLMs For Perceptual Understanding and Analogical Reasoning
-
$\gamma$-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models
-
Forgetting Transformer: Softmax Attention with a Forget Gate
-
On the Optimization Landscape of Low Rank Adaptation Methods for Large Language Models
-
G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model
-
AdaFisher: Adaptive Second Order Optimization via Fisher Information
-
Learning from Imperfect Human Feedback: A Tale from Corruption-Robust Dueling
-
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models
-
UV-Attack: Physical-World Adversarial Attacks on Person Detection via Dynamic-NeRF-based UV Mapping
-
Convergence of Score-Based Discrete Diffusion Models: A Discrete-Time Analysis
-
Logicbreaks: A Framework for Understanding Subversion of Rule-based Inference
-
Tackling Data Corruption in Offline Reinforcement Learning via Sequence Modeling
-
Adaptive Shrinkage Estimation for Personalized Deep Kernel Regression in Modeling Brain Trajectories
-
Scalable Discrete Diffusion Samplers: Combinatorial Optimization and Statistical Physics
-
SOREL: A Stochastic Algorithm for Spectral Risks Minimization
-
TEOChat: A Large Vision-Language Assistant for Temporal Earth Observation Data
-
LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory
-
Explanations of GNN on Evolving Graphs via Axiomatic Layer edges
-
Think Thrice Before You Act: Progressive Thought Refinement in Large Language Models
-
TEASER: Token Enhanced Spatial Modeling for Expressions Reconstruction
-
Leveraging Flatness to Improve Information-Theoretic Generalization Bounds for SGD
-
CONTRA: Conformal Prediction Region via Normalizing Flow Transformation
-
Bayesian WeakS-to-Strong from Text Classification to Generation
-
Multimodal Lego: Model Merging and Fine-Tuning Across Topologies and Modalities in Biomedicine
-
Distribution-Free Data Uncertainty for Neural Network Regression
-
Jump Your Steps: Optimizing Sampling Schedule of Discrete Diffusion Models
-
Rethinking Fair Representation Learning for Performance-Sensitive Tasks
-
Revisit Micro-batch Clipping: Adaptive Data Pruning via Gradient Manipulation
-
XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning
-
Varying Shades of Wrong: Aligning LLMs with Wrong Answers Only
-
Task-Adaptive Pretrained Language Models via Clustered-Importance Sampling
-
NExT-Mol: 3D Diffusion Meets 1D Language Modeling for 3D Molecule Generation
-
Content-Style Learning from Unaligned Domains: Identifiability under Unknown Latent Dimensions
-
A Benchmark for Semantic Sensitive Information in LLMs Outputs
-
Facilitating Multi-turn Function Calling for LLMs via Compositional Instruction Tuning
-
GlycanML: A Multi-Task and Multi-Structure Benchmark for Glycan Machine Learning
-
Hypothetical Minds: Scaffolding Theory of Mind for Multi-Agent Tasks with Large Language Models
-
Equivariant Denoisers Cannot Copy Graphs: Align Your Graph Diffusion Models
-
Achieving Dimension-Free Communication in Federated Learning via Zeroth-Order Optimization
-
Large (Vision) Language Models are Unsupervised In-Context Learners
-
ColPali: Efficient Document Retrieval with Vision Language Models
-
Conflict-Averse Gradient Aggregation for Constrained Multi-Objective Reinforcement Learning
-
On the Byzantine-Resilience of Distillation-Based Federated Learning
-
OMG: Opacity Matters in Material Modeling with Gaussian Splatting
-
Counterfactual Generative Modeling with Variational Causal Inference
-
Towards Scalable Exact Machine Unlearning Using Parameter-Efficient Fine-Tuning
-
Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models
-
SSLAM: Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes
-
Minimax Optimal Two-Stage Algorithm For Moment Estimation Under Covariate Shift
-
Dynamic Low-Rank Sparse Adaptation for Large Language Models
-
AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents
-
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
-
Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG
-
Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs)
-
Track-On: Transformer-based Online Point Tracking with Memory
-
DreamDistribution: Learning Prompt Distribution for Diverse In-distribution Generation
-
Kernel-based Optimally Weighted Conformal Time-Series Prediction
-
Knowledge Graph Finetuning Enhances Knowledge Manipulation in Large Language Models
-
TIS-DPO: Token-level Importance Sampling for Direct Preference Optimization With Estimated Weights
-
Physics of Language Models: Part 3.2, Knowledge Manipulation
-
SimpleTM: A Simple Baseline for Multivariate Time Series Forecasting
-
Looking into User’s Long-term Interests through the Lens of Conservative Evidential Learning
-
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
-
EC-Diffuser: Multi-Object Manipulation via Entity-Centric Behavior Generation
-
ANaGRAM: A Natural Gradient Relative to Adapted Model for efficient PINNs learning
-
An Effective Manifold-based Optimization Method for Distributionally Robust Classification
-
Revisiting Source-Free Domain Adaptation: a New Perspective via Uncertainty Control
-
Stochastic Polyak Step-sizes and Momentum: Convergence Guarantees and Practical Performance
-
Neuron-based Multifractal Analysis of Neuron Interaction Dynamics in Large Models
-
DynFrs: An Efficient Framework for Machine Unlearning in Random Forest
-
Constraint-Conditioned Actor-Critic for Offline Safe Reinforcement Learning
-
MeToken: Uniform Micro-environment Token Boosts Post-Translational Modification Prediction
-
Neural Causal Graph for Interpretable and Intervenable Classification
-
KinFormer: Generalizable Dynamical Symbolic Regression for Catalytic Organic Reaction Kinetics
-
ACC-Collab: An Actor-Critic Approach to Multi-Agent LLM Collaboration
-
Tree-Wasserstein Distance for High Dimensional Data with a Latent Feature Hierarchy
-
InversionGNN: A Dual Path Network for Multi-Property Molecular Optimization
-
Graph Neural Networks Are More Than Filters: Revisiting and Benchmarking from A Spectral Perspective
-
Theory on Score-Mismatched Diffusion Models and Zero-Shot Conditional Samplers
-
ViBiDSampler: Enhancing Video Interpolation Using Bidirectional Diffusion Sampler
-
YouTube-SL-25: A Large-Scale, Open-Domain Multilingual Sign Language Parallel Corpus
-
Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow
-
Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist
-
Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning
-
Causal Information Prioritization for Efficient Reinforcement Learning
-
HARDMath: A Benchmark Dataset for Challenging Problems in Applied Mathematics
-
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning
-
Balanced Neural ODEs: nonlinear model order reduction and Koopman operator approximations
-
cryoSPHERE: Single-Particle HEterogeneous REconstruction from cryo EM
-
Towards Semantic Equivalence of Tokenization in Multimodal LLM
-
Linear combinations of latents in generative models: subspaces and beyond
-
Shedding Light on Time Series Classification using Interpretability Gated Networks
-
Generalization Bounds for Canonicalization: A Comparative Study with Group Averaging
-
MMR: A Large-scale Benchmark Dataset for Multi-target and Multi-granularity Reasoning Segmentation
-
Layout-your-3D: Controllable and Precise 3D Generation with 2D Blueprint
-
Zeroth-Order Fine-Tuning of LLMs with Transferable Static Sparsity
-
From Layers to States: A State Space Model Perspective to Deep Neural Network Layer Dynamics
-
TIGeR: Unifying Text-to-Image Generation and Retrieval with Large Multimodal Models
-
Towards Federated RLHF with Aggregated Client Preference for LLMs
-
Subtask-Aware Visual Reward Learning from Segmented Demonstrations
-
From Isolated Conversations to Hierarchical Schemas: Dynamic Tree Memory Representation for LLMs
-
Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation
-
Quamba: A Post-Training Quantization Recipe for Selective State Space Models
-
Active Learning for Continual Learning: Keeping the Past Alive in the Present
-
Ready-to-React: Online Reaction Policy for Two-Character Interaction Generation
-
SimXRD-4M: Big Simulated X-ray Diffraction Data and Crystal Symmetry Classification Benchmark
-
Preble: Efficient Distributed Prompt Scheduling for LLM Serving
-
From Tokens to Lattices: Emergent Lattice Structures in Language Models
-
HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing
-
Topological Zigzag Spaghetti for Diffusion-based Generation and Prediction on Graphs
-
E(3)-equivariant models cannot learn chirality: Field-based molecular generation
-
A General Framework for Off-Policy Learning with Partially-Observed Reward
-
Pursuing Feature Separation based on Neural Collapse for Out-of-Distribution Detection
-
CipherPrune: Efficient and Scalable Private Transformer Inference
-
OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data
-
Last Iterate Convergence of Incremental Methods as a Model of Forgetting
-
Visually Guided Decoding: Gradient-Free Hard Prompt Inversion with Language Models
-
AgentSquare: Automatic LLM Agent Search in Modular Design Space
-
GS-CPR: Efficient Camera Pose Refinement via 3D Gaussian Splatting
-
DRESSing Up LLM: Efficient Stylized Question-Answering via Style Subspace Editing
-
GOFA: A Generative One-For-All Model for Joint Graph Language Modeling
-
Image Watermarks are Removable using Controllable Regeneration from Clean Noise
-
Efficient Distribution Matching of Representations via Noise-Injected Deep InfoMax
-
SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints
-
AdvPaint: Protecting Images from Inpainting Manipulation via Adversarial Attention Disruption
-
Bridging the Gap Between f-divergences and Bayes Hilbert Spaces
-
Understanding Warmup-Stable-Decay Learning Rates: A River Valley Loss Landscape View
-
CAT-3DGS: A Context-Adaptive Triplane Approach to Rate-Distortion-Optimized 3DGS Compression
-
Tracking objects that change in appearance with phase synchrony
-
CATCH: Channel-Aware Multivariate Time Series Anomaly Detection via Frequency Patching
-
Local Steps Speed Up Local GD for Heterogeneous Distributed Logistic Regression
-
Robust Conformal Prediction with a Single Binary Certificate
-
Is uniform expressivity too restrictive? Towards efficient expressivity of GNNs
-
SLoPe: Double-Pruned Sparse Plus Lazy Low-Rank Adapter Pretraining of LLMs
-
From Promise to Practice: Realizing High-performance Decentralized Training
-
Causal Concept Graph Models: Beyond Causal Opacity in Deep Learning
-
Hidden in the Noise: Two-Stage Robust Watermarking for Images
-
Unifying Causal Representation Learning with the Invariance Principle
-
ForecastBench: A Dynamic Benchmark of AI Forecasting Capabilities
-
Adaptive $Q$-Network: On-the-fly Target Selection for Deep Reinforcement Learning
-
CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding
-
Task Descriptors Help Transformers Learn Linear Models In-Context
-
Do as I do (Safely): Mitigating Task-Specific Fine-tuning Risks in Large Language Models
-
Enhancing Robust Fairness via Confusional Spectral Regularization
-
TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval
-
PT-T2I/V: An Efficient Proxy-Tokenized Diffusion Transformer for Text-to-Image/Video-Task
-
HMoRA: Making LLMs More Effective with Hierarchical Mixture of LoRA Experts
-
Adversarial Score identity Distillation: Rapidly Surpassing the Teacher in One Step
-
DCT-CryptoNets: Scaling Private Inference in the Frequency Domain
-
Model Editing as a Robust and Denoised variant of DPO: A Case Study on Toxicity
-
Efficient Top-m Data Values Identification for Data Selection
-
Differentially Private Steering for Large Language Model Alignment
-
Agent S: An Open Agentic Framework that Uses Computers Like a Human
-
Contractive Dynamical Imitation Policies for Efficient Out-of-Sample Recovery
-
Enhancing Cognition and Explainability of Multimodal Foundation Models with Self-Synthesized Data
-
RFMamba: Frequency-Aware State Space Model for RF-Based Human-Centric Perception
-
BitStack: Any-Size Compression of Large Language Models in Variable Memory Environments
-
Rethinking Graph Neural Networks From A Geometric Perspective Of Node Features
-
CarbonSense: A Multimodal Dataset and Baseline for Carbon Flux Modelling
-
Enhancing Graph Of Thought: Enhancing Prompts with LLM Rationales and Dynamic Temperature Control
-
HiBug2: Efficient and Interpretable Error Slice Discovery for Comprehensive Model Debugging
-
OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces
-
Robust Root Cause Diagnosis using In-Distribution Interventions
-
FlashRNN: I/O-Aware Optimization of Traditional RNNs on modern hardware
-
One for all and all for one: Efficient computation of partial Wasserstein distances on the line
-
InstaSHAP: Interpretable Additive Models Explain Shapley Values Instantly
-
INS: Interaction-aware Synthesis to Enhance Offline Multi-agent Reinforcement Learning
-
Finally Rank-Breaking Conquers MNL Bandits: Optimal and Efficient Algorithms for MNL Assortment
-
Dobi-SVD: Differentiable SVD for LLM Compression and Some New Perspectives
-
Provably Safeguarding a Classifier from OOD and Adversarial Samples
-
Transformer Block Coupling and its Correlation with Generalization in LLMs
-
ChemAgent: Self-updating Memories in Large Language Models Improves Chemical Reasoning
-
Addressing Label Shift in Distributed Learning via Entropy Regularization
-
Circuit Transformer: A Transformer That Preserves Logical Equivalence
-
RMB: Comprehensively benchmarking reward models in LLM alignment
-
Convergent Privacy Loss of Noisy-SGD without Convexity and Smoothness
-
Reconsidering Faithfulness in Regular, Self-Explainable and Domain Invariant GNNs
-
Boosting the visual interpretability of CLIP via adversarial fine-tuning
-
Adaptive Deployment of Untrusted LLMs Reduces Distributed Threats
-
No Equations Needed: Learning System Dynamics Without Relying on Closed-Form ODEs
-
Re-evaluating Open-ended Evaluation of Large Language Models
-
PaPaGei: Open Foundation Models for Optical Physiological Signals
-
Diffusing States and Matching Scores: A New Framework for Imitation Learning
-
Safety Layers in Aligned Large Language Models: The Key to LLM Security
-
StringLLM: Understanding the String Processing Capability of Large Language Models
-
Open-Set Graph Anomaly Detection via Normal Structure Regularisation
-
ConceptPrune: Concept Editing in Diffusion Models via Skilled Neuron Pruning
-
Beyond single neurons: population response geometry in digital twins of mouse visual cortex
-
Diffusion State-Guided Projected Gradient for Inverse Problems
-
AstroCompress: A benchmark dataset for multi-purpose compression of astronomical data
-
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs
-
CHAMP: Conformalized 3D Human Multi-Hypothesis Pose Estimators
-
DeeperForward: Enhanced Forward-Forward Training for Deeper and Better Performance
-
Air Quality Prediction with Physics-Guided Dual Neural ODEs in Open Systems
-
Semantic Loss Guided Data Efficient Supervised Fine Tuning for Safe Responses in LLMs
-
Unearthing Skill-level Insights for Understanding Trade-offs of Foundation Models
-
PolaFormer: Polarity-aware Linear Attention for Vision Transformers
-
Unveiling the Magic of Code Reasoning through Hypothesis Decomposition and Amendment
-
ShortcutsBench: A Large-Scale Real-world Benchmark for API-based Agents
-
RelCon: Relative Contrastive Learning for a Motion Foundation Model for Wearable Data
-
Efficient Dictionary Learning with Switch Sparse Autoencoders
-
Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation
-
DUET: Decentralized Bilevel Optimization without Lower-Level Strong Convexity
-
Trained Transformer Classifiers Generalize and Exhibit Benign Overfitting In-Context
-
CURIE: Evaluating LLMs on Multitask Scientific Long-Context Understanding and Reasoning
-
Neural Phylogeny: Fine-Tuning Relationship Detection among Neural Networks
-
Indirect Gradient Matching for Adversarial Robust Distillation
-
CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models
-
Denoising as Adaptation: Noise-Space Domain Adaptation for Image Restoration
-
Hotspot-Driven Peptide Design via Multi-Fragment Autoregressive Extension
-
Learning a Neural Solver for Parametric PDEs to Enhance Physics-Informed Methods
-
Quantifying Generalization Complexity for Large Language Models
-
To Clip or not to Clip: the Dynamics of SGD with Gradient Clipping in High-Dimensions
-
The OMG dataset: An Open MetaGenomic corpus for mixed-modality genomic language modeling
-
Long-Sequence Recommendation Models Need Decoupled Embeddings
-
OmnixR: Evaluating Omni-modality Language Models on Reasoning across Modalities
-
U-shaped and Inverted-U Scaling behind Emergent Abilities of Large Language Models
-
Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance
-
SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters
-
AssembleFlow: Rigid Flow Matching with Inertial Frames for Molecular Assembly
-
Extending Mercer's expansion to indefinite and asymmetric kernels
-
Learning Interleaved Image-Text Comprehension in Vision-Language Large Models
-
UniDrive: Towards Universal Driving Perception Across Camera Configurations
-
AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models
-
Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens
-
Positive-Unlabeled Diffusion Models for Preventing Sensitive Data Generation
-
BLEND: Behavior-guided Neural Population Dynamics Modeling via Privileged Knowledge Distillation
-
ZETA: Leveraging $Z$-order Curves for Efficient Top-$k$ Attention
-
Attribute-based Visual Reprogramming for Vision-Language Models
-
Minimalistic Predictions for Online Class Constraint Scheduling
-
OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation
-
Optimizing importance weighting in the presence of sub-population shifts
-
Temporal Difference Learning: Why It Can Be Fast and How It Will Be Faster
-
Scaling Diffusion Language Models via Adaptation from Autoregressive Models
-
WeatherGFM: Learning a Weather Generalist Foundation Model via In-context Learning
-
CoRNStack: High-Quality Contrastive Data for Better Code Retrieval and Reranking
-
Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting
-
Autonomous Evaluation of LLMs for Truth Maintenance and Reasoning Tasks
-
Risk-Sensitive Variational Actor-Critic: A Model-Based Approach
-
Leveraging Submodule Linearity Enhances Task Arithmetic Performance in LLMs
-
M^3PC: Test-time Model Predictive Control using Pretrained Masked Trajectory Model
-
Sparse autoencoders reveal selective remapping of visual concepts during adaptation
-
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents
-
Representational Similarity via Interpretable Visual Concepts
-
SafeDiffuser: Safe Planning with Diffusion Probabilistic Models
-
Pareto Low-Rank Adapters: Efficient Multi-Task Learning with Preferences
-
The Robustness of Differentiable Causal Discovery in Misspecified Scenarios
-
Language models scale reliably with over-training and on downstream tasks
-
PaCA: Partial Connection Adaptation for Efficient Fine-Tuning
-
End-to-end Learning of Gaussian Mixture Priors for Diffusion Sampler
-
REMEDY: Recipe Merging Dynamics in Large Vision-Language Models
-
Selective Aggregation for Low-Rank Adaptation in Federated Learning
-
DeciMamba: Exploring the Length Extrapolation Potential of Mamba
-
LLaRA: Supercharging Robot Learning Data for Vision-Language Policy
-
ClassDiffusion: More Aligned Personalization Tuning with Explicit Class Guidance
-
Stochastic variance-reduced Gaussian variational inference on the Bures-Wasserstein manifold
-
3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting
-
Duoduo CLIP: Efficient 3D Understanding with Multi-View Images
-
DenseGrounding: Improving Dense Language-Vision Semantics for Ego-centric 3D Visual Grounding
-
Provably Robust Explainable Graph Neural Networks against Graph Perturbation Attacks
-
The Optimization Landscape of SGD Across the Feature Learning Strength
-
Spatial-Mamba: Effective Visual State Space Models via Structure-Aware State Fusion
-
EcoFace: Audio-Visual Emotional Co-Disentanglement Speech-Driven 3D Talking Face Generation
-
Improving Graph Neural Networks by Learning Continuous Edge Directions
-
Residual Connections and Normalization Can Provably Prevent Oversmoothing in GNNs
-
Interpreting Language Reward Models via Contrastive Explanations
-
Disentangling 3D Animal Pose Dynamics with Scrubbed Conditional Latent Variables
-
MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs
-
SymDiff: Equivariant Diffusion via Stochastic Symmetrisation
-
Parameter and Memory Efficient Pretraining via Low-rank Riemannian Optimization
-
Deep Distributed Optimization for Large-Scale Quadratic Programming
-
Neural Stochastic Differential Equations for Uncertainty-Aware Offline RL
-
DPaI: Differentiable Pruning at Initialization with Node-Path Balance Principle
-
Rethinking LLM Unlearning Objectives: A Gradient Perspective and Go Beyond
-
Investigating the Pre-Training Dynamics of In-Context Learning: Task Recognition vs. Task Learning
-
Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs
-
AutoG: Towards automatic graph construction from tabular data
-
DelTA: An Online Document-Level Translation Agent Based on Multi-Level Memory
-
Mind Control through Causal Inference: Predicting Clean Images from Poisoned Data
-
OvercookedV2: Rethinking Overcooked for Zero-Shot Coordination
-
Adversarial Search Engine Optimization for Large Language Models
-
Causal Representation Learning from Multimodal Biomedical Observations
-
Towards Robust Multimodal Open-set Test-time Adaptation via Adaptive Entropy-aware Optimization
-
Efficient Exploration and Discriminative World Model Learning with an Object-Centric Abstraction
-
Watch Less, Do More: Implicit Skill Discovery for Video-Conditioned Policy
-
SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation
-
Causal Graphical Models for Vision-Language Compositional Understanding
-
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks
-
Immunogenicity Prediction with Dual Attention Enables Vaccine Target Selection
-
Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization
-
DiTTo-TTS: Diffusion Transformers for Scalable Text-to-Speech without Domain-Specific Factors
-
GEVRM: Goal-Expressive Video Generation Model For Robust Visual Manipulation
-
Posterior-Mean Rectified Flow: Towards Minimum MSE Photo-Realistic Image Restoration
-
SAGEPhos: Sage Bio-Coupled and Augmented Fusion for Phosphorylation Site Detection
-
OBI-Bench: Can LMMs Aid in Study of Ancient Script on Oracle Bones?
-
NatureLM-audio: an Audio-Language Foundation Model for Bioacoustics
-
GI-GS: Global Illumination Decomposition on Gaussian Splatting for Inverse Rendering
-
HAMSTER: Hierarchical Action Models for Open-World Robot Manipulation
-
Improving Complex Reasoning with Dynamic Prompt Corruption: A Soft Prompt Optimization Approach
-
Sharper Guarantees for Learning Neural Network Classifiers with Gradient Methods
-
BANGS: Game-theoretic Node Selection for Graph Self-Training
-
RNNs are not Transformers (Yet): The Key Bottleneck on In-Context Retrieval
-
Mix-CPT: A Domain Adaptation Framework via Decoupling Knowledge Learning and Format Alignment
-
Sensitivity Verification for Additive Decision Tree Ensembles
-
An Auditing Test to Detect Behavioral Shift in Language Models
-
Rethinking the role of frames for SE(3)-invariant crystal structure modeling
-
Learning to Select Nodes in Branch and Bound with Sufficient Tree Representation
-
PortLLM: Personalizing Evolving Large Language Models with Training-Free and Portable Model Patches
-
T-JEPA: Augmentation-Free Self-Supervised Learning for Tabular Data
-
Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization
-
Beyond Surface Structure: A Causal Assessment of LLMs' Comprehension ability
-
Rethinking Visual Counterfactual Explanations Through Region Constraint
-
A Conditional Independence Test in the Presence of Discretization
-
Basis Sharing: Cross-Layer Parameter Sharing for Large Language Model Compression
-
ClimaQA: An Automated Evaluation Framework for Climate Question Answering Models
-
Directional Gradient Projection for Robust Fine-Tuning of Foundation Models
-
SCBench: A KV Cache-Centric Analysis of Long-Context Methods
-
Score Forgetting Distillation: A Swift, Data-Free Method for Machine Unlearning in Diffusion Models
-
Build-A-Scene: Interactive 3D Layout Control for Diffusion-Based Image Generation
-
Strategist: Self-improvement of LLM Decision Making via Bi-Level Tree Search
-
Precise Parameter Localization for Textual Generation in Diffusion Models
-
MAI: A Multi-turn Aggregation-Iteration Model for Composed Image Retrieval
-
Residual-MPPI: Online Policy Customization for Continuous Control
-
Efficient Biological Data Acquisition through Inference Set Design
-
Dynamic Loss-Based Sample Reweighting for Improved Large Language Model Pretraining
-
RFWave: Multi-band Rectified Flow for Audio Waveform Reconstruction
-
DocMIA: Document-Level Membership Inference Attacks against DocVQA Models
-
Scalable Influence and Fact Tracing for Large Language Model Pretraining
-
Towards Auto-Regressive Next-Token Prediction: In-context Learning Emerges from Generalization
-
Integrative Decoding: Improving Factuality via Implicit Self-consistency
-
CrossMPT: Cross-attention Message-passing Transformer for Error Correcting Codes
-
Unifying Unsupervised Graph-Level Anomaly Detection and Out-of-Distribution Detection: A Benchmark
-
GNNs Getting ComFy: Community and Feature Similarity Guided Rewiring
-
Beyond Interpretability: The Gains of Feature Monosemanticity on Model Robustness
-
ADBM: Adversarial Diffusion Bridge Model for Reliable Adversarial Purification
-
Eliminating Position Bias of Language Models: A Mechanistic Approach
-
Transformer Learns Optimal Variable Selection in Group-Sparse Classification
-
Robust-PIFu: Robust Pixel-aligned Implicit Function for 3D Human Digitalization from a Single Image
-
ADePT: Adaptive Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning
-
As Simple as Fine-tuning: LLM Alignment via Bidirectional Negative Feedback Loss
-
OptiBench Meets ReSocratic: Measure and Improve LLMs for Optimization Modeling
-
FIG: Flow with Interpolant Guidance for Linear Inverse Problems
-
Weighted-Reward Preference Optimization for Implicit Model Fusion
-
Causal Graph Transformer for Treatment Effect Estimation Under Unknown Interference
-
Learned Reference-based Diffusion Sampler for multi-modal distributions
-
Exploring The Forgetting in Adversarial Training: A Novel Method for Enhancing Robustness
-
CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery
-
$\phi$-Update: A Class of Policy Update Methods with Policy Convergence Guarantee
-
Teaching Human Behavior Improves Content Understanding Abilities Of VLMs
-
Enhancing End-to-End Autonomous Driving with Latent World Model
-
Statistical Advantages of Perturbing Cosine Router in Mixture of Experts
-
Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization
-
Enabling Realtime Reinforcement Learning at Scale with Staggered Asynchronous Inference
-
On Evaluating the Durability of Safeguards for Open-Weight LLMs
-
GReaTer: Gradients Over Reasoning Makes Smaller Language Models Strong Prompt Optimizers
-
Looking Backward: Retrospective Backward Synthesis for Goal-Conditioned GFlowNets
-
LongMamba: Enhancing Mamba's Long-Context Capabilities via Training-Free Receptive Field Enlargement
-
Unlearning or Obfuscating? Jogging the Memory of Unlearned LLMs via Benign Relearning
-
What is Wrong with Perplexity for Long-context Language Modeling?
-
Recovery of Causal Graph Involving Latent Variables via Homologous Surrogates
-
OpenPRM: Building Open-domain Process-based Reward Models with Preference Trees
-
Efficient Discovery of Pareto Front for Multi-Objective Reinforcement Learning
-
TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models
-
Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs
-
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models
-
API Pack: A Massive Multi-Programming Language Dataset for API Call Generation
-
Capability Localization: Capabilities Can be Localized rather than Individual Knowledge
-
Safety-Prioritizing Curricula for Constrained Reinforcement Learning
-
Parameter Expanded Stochastic Gradient Markov Chain Monte Carlo
-
What Secrets Do Your Manifolds Hold? Understanding the Local Geometry of Generative Models
-
TD-Paint: Faster Diffusion Inpainting Through Time-Aware Pixel Conditioning
-
Value-aligned Behavior Cloning for Offline Reinforcement Learning via Bi-level Optimization
-
BadRobot: Jailbreaking Embodied LLM Agents in the Physical World
-
Disentangled Representation Learning with the Gromov-Monge Gap
-
TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation
-
Learning Neural Networks with Distribution Shift: Efficiently Certifiable Guarantees
-
Filtered not Mixed: Filtering-Based Online Gating for Mixture of Large Language Models
-
Looking Inward: Language Models Can Learn About Themselves by Introspection
-
DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation
-
Mitigating Parameter Interference in Model Merging via Sharpness-Aware Fine-Tuning
-
Offline RL with Smooth OOD Generalization in Convex Hull and its Neighborhood
-
How Does Vision-Language Adaptation Impact the Safety of Vision Language Models?
-
Multi-Resolution Decomposable Diffusion Model for Non-Stationary Time Series Anomaly Detection
-
On Minimizing Adversarial Counterfactual Error in Adversarial Reinforcement Learning
-
Rethinking Audio-Visual Adversarial Vulnerability from Temporal and Modality Perspectives
-
Continuous Ensemble Weather Forecasting with Diffusion models
-
DataMan: Data Manager for Pre-training Large Language Models
-
UniDetox: Universal Detoxification of Large Language Models via Dataset Distillation
-
Arithmetic Transformers Can Length-Generalize in Both Operand Length and Count
-
Look Before You Leap: Universal Emergent Mechanism for Retrieval in Language Models
-
3D-MolT5: Leveraging Discrete Structural Information for Molecule-Text Modeling
-
Unveiling the Secret Recipe: A Guide For Supervised Fine-Tuning Small LLMs
-
Learning Clustering-based Prototypes for Compositional Zero-Shot Learning
-
Ctrl-U: Robust Conditional Image Generation via Uncertainty-aware Reward Modeling
-
Pairwise Elimination with Instance-Dependent Guarantees for Bandits with Cost Subsidy
-
Improved Techniques for Optimization-Based Jailbreaking on Large Language Models
-
Beyond Worst-Case Dimensionality Reduction for Sparse Vectors
-
Dreamweaver: Learning Compositional World Models from Pixels
-
Policy Decorator: Model-Agnostic Online Refinement for Large Policy Model
-
Ensembling Diffusion Models via Adaptive Feature Aggregation
-
Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models
-
Efficient Reinforcement Learning with Large Language Model Priors
-
Federated Residual Low-Rank Adaptation of Large Language Models
-
Convex Formulations for Training Two-Layer ReLU Neural Networks
-
RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph
-
TC-MoE: Augmenting Mixture of Experts with Ternary Expert Choice
-
DeepTAGE: Deep Temporal-Aligned Gradient Enhancement for Optimizing Spiking Neural Networks
-
Learning Diagrams: A Graphical Language for Compositional Training Regimes
-
Mitigating the Backdoor Effect for Multi-Task Model Merging via Safety-Aware Subspace
-
Aligned Datasets Improve Detection of Latent Diffusion-Generated Images
-
Progressive Token Length Scaling in Transformer Encoders for Efficient Universal Segmentation
-
Learning Structured Universe Graph with Outlier OOD Detection for Partial Matching
-
Fine-Tuning Attention Modules Only: Enhancing Weight Disentanglement in Task Arithmetic
-
Reasoning with Latent Thoughts: On the Power of Looped Transformers
-
Dynamic Sparse Training versus Dense Training: The Unexpected Winner in Image Corruption Robustness
-
Gaussian-Based Instance-Adaptive Intensity Modeling for Point-Supervised Facial Expression Spotting
-
SCOPE: A Self-supervised Framework for Improving Faithfulness in Conditional Text Generation
-
Offline Hierarchical Reinforcement Learning via Inverse Optimization
-
FreeVS: Generative View Synthesis on Free Driving Trajectory
-
Training Free Exponential Context Extension via Cascading KV Cache
-
Tool-Planner: Task Planning with Clusters across Multiple Tools
-
MOFFlow: Flow Matching for Structure Prediction of Metal-Organic Frameworks
-
MixMax: Distributional Robustness in Function Space via Optimal Data Mixtures
-
Locality-aware Gaussian Compression for Fast and High-quality Rendering
-
SyllableLM: Learning Coarse Semantic Units for Speech Language Models
-
PooDLe🐩: Pooled and dense self-supervised learning from naturalistic videos
-
CodePlan: Unlocking Reasoning Potential in Large Language Models by Scaling Code-form Planning
-
Large Language Models Assume People are More Rational than We Really are
-
Hybrid Regularization Improves Diffusion-based Inverse Problem Solving
-
Not All Language Model Features Are One-Dimensionally Linear
-
CLIBD: Bridging Vision and Genomics for Biodiversity Monitoring at Scale
-
Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs
-
MotionDreamer: One-to-Many Motion Synthesis with Localized Generative Masked Transformer
-
Identifying latent state transitions in non-linear dynamical systems
-
Exploiting Distribution Constraints for Scalable and Efficient Image Retrieval
-
Modeling Future Conversation Turns to Teach LLMs to Ask Clarifying Questions
-
Continuous Autoregressive Modeling with Stochastic Monotonic Alignment for Speech Synthesis
-
VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning
-
Towards Generalization Bounds of GCNs for Adversarially Robust Node Classification
-
Zeroth-Order Policy Gradient for Reinforcement Learning from Human Feedback without Reward Inference
-
Occlusion-aware Non-Rigid Point Cloud Registration via Unsupervised Neural Deformation Correntropy
-
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
-
Towards Interpreting Visual Information Processing in Vision-Language Models
-
On-the-fly Preference Alignment via Principle-Guided Decoding
-
GeoILP: A Synthetic Dataset to Guide Large-Scale Rule Induction
-
Flash Inference: Near Linear Time Inference for Long Convolution Sequence Models and Beyond
-
From Risk to Uncertainty: Generating Predictive Uncertainty Measures via Bayesian Estimation
-
Towards Homogeneous Lexical Tone Decoding from Heterogeneous Intracranial Recordings
-
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
-
Tight Time Complexities in Parallel Stochastic Optimization with Arbitrary Computation Dynamics
-
Accelerating Training with Neuron Interaction and Nowcasting Networks
-
CertainlyUncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness
-
Bridging Context Gaps: Leveraging Coreference Resolution for Long Contextual Understanding
-
ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities
-
Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents
-
Fine-Grained Verifiers: Preference Modeling as Next-token Prediction in Vision-Language Alignment
-
REvolve: Reward Evolution with Large Language Models using Human Feedback
-
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
-
Interaction Asymmetry: A General Principle for Learning Composable Abstractions
-
Beyond FVD: An Enhanced Evaluation Metrics for Video Generation Distribution Quality
-
Diff-PIC: Revolutionizing Particle-In-Cell Nuclear Fusion Simulation with Diffusion Models
-
Dynamical Diffusion: Learning Temporal Dynamics with Diffusion Models
-
Out-of-distribution Generalization for Total Variation based Invariant Risk Minimization
-
$\mathbb{X}$-Sample Contrastive Loss: Improving Contrastive Learning with Sample Similarity Graphs
-
Repulsive Latent Score Distillation for Solving Inverse Problems
-
Breaking the $\log(1/\Delta_2)$ Barrier: Better Batched Best Arm Identification with Adaptive Grids
-
Test-time Adaptation for Image Compression with Distribution Regularization
-
GeoLoRA: Geometric integration for parameter efficient fine-tuning
-
MR-GSM8K: A Meta-Reasoning Benchmark for Large Language Model Evaluation
-
Multimodal Unsupervised Domain Generalization by Retrieving Across the Modality Gap
-
SOO-Bench: Benchmarks for Evaluating the Stability of Offline Black-Box Optimization
-
Selective Induction Heads: How Transformers Select Causal Structures in Context
-
PolyPythias: Stability and Outliers across Fifty Language Model Pre-Training Runs
-
Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces
-
CycleResearcher: Improving Automated Research via Automated Review
-
Diffusion Transformers for Tabular Data Time Series Generation
-
No Preference Left Behind: Group Distributional Preference Optimization
-
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
-
Can Generative AI Solve Your In-Context Learning Problem? A Martingale Perspective
-
Generative Adapter: Contextualizing Language Models in Parameters with A Single Forward Pass
-
Dysca: A Dynamic and Scalable Benchmark for Evaluating Perception Ability of LVLMs
-
Prototype antithesis for biological few-shot class-incremental learning
-
CoMRes: Semi-Supervised Time Series Forecasting Utilizing Consensus Promotion of Multi-Resolution
-
Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data
-
Bridging the Gap between Variational Inference and Stochastic Gradient MCMC in Function Space
-
Towards Generalizable Reinforcement Learning via Causality-Guided Self-Adaptive Representations
-
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
-
MeshMask: Physics-Based Simulations with Masked Graph Neural Networks
-
Second-Order Fine-Tuning without Pain for LLMs: A Hessian Informed Zeroth-Order Optimizer
-
Latent Safety-Constrained Policy Approach for Safe Offline Reinforcement Learning
-
NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative
-
$\text{I}^2\text{AM}$: Interpreting Image-to-Image Latent Diffusion Models via Bi-Attribution Maps
-
A Quantum Circuit-Based Compression Perspective for Parameter-Efficient Learning
-
Mini-batch Coresets for Memory-efficient Language Model Training on Data Mixtures
-
X-Fi: A Modality-Invariant Foundation Model for Multimodal Human Sensing
-
No Training, No Problem: Rethinking Classifier-Free Guidance for Diffusion Models
-
Rethinking Shapley Value for Negative Interactions in Non-convex Games
-
Adapting Multi-modal Large Language Model to Concept Drift From Pre-training Onwards
-
BigDocs: An Open Dataset for Training Multimodal Models on Document and Code Tasks
-
TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies
-
DeepGate4: Efficient and Effective Representation Learning for Circuit Design at Scale
-
Risk-Sensitive Diffusion: Robustly Optimizing Diffusion Models with Noisy Samples
-
Inner Information Analysis Algorithm for Deep Neural Network based on Community
-
Efficient Evolutionary Search Over Chemical Space with Large Language Models
-
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second
-
RecDreamer: Consistent Text-to-3D Generation via Uniform Score Distillation
-
KAA: Kolmogorov-Arnold Attention for Enhancing Attentive Graph Neural Networks
-
Understanding and Enhancing the Transferability of Jailbreaking Attacks
-
Robust Representation Consistency Model via Contrastive Denoising
-
Adaptive Data Optimization: Dynamic Sample Selection with Scaling Laws
-
Towards Multiple Character Image Animation Through Enhancing Implicit Decoupling
-
Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models
-
Vevo: Controllable Zero-Shot Voice Imitation with Self-Supervised Disentanglement
-
Denoising Autoregressive Transformers for Scalable Text-to-Image Generation
-
Learning Geometric Reasoning Networks For Robot Task And Motion Planning
-
MMDisCo: Multi-Modal Discriminator-Guided Cooperative Diffusion for Joint Audio and Video Generation
-
VibeCheck: Discover and Quantify Qualitative Differences in Large Language Models
-
IterGen: Iterative Semantic-aware Structured LLM Generation with Backtracking
-
MotionClone: Training-Free Motion Cloning for Controllable Video Generation
-
InCoDe: Interpretable Compressed Descriptions For Image Generation
-
Searching for Optimal Solutions with LLMs via Bayesian Optimization
-
QMP: Q-switch Mixture of Policies for Multi-Task Behavior Sharing
-
Confidence Elicitation: A New Attack Vector for Large Language Models
-
Injecting Universal Jailbreak Backdoors into LLMs in Minutes
-
High-Quality Joint Image and Video Tokenization with Causal VAE
-
Group-robust Sample Reweighting for Subpopulation Shifts via Influence Functions
-
Stabilized Neural Prediction of Potential Outcomes in Continuous Time
-
Robust Feature Learning for Multi-Index Models in High Dimensions
-
ActSafe: Active Exploration with Safety Constraints for Reinforcement Learning
-
Wasserstein-Regularized Conformal Prediction under General Distribution Shift
-
Diff3DS: Generating View-Consistent 3D Sketch via Differentiable Curve Rendering
-
Self-Updatable Large Language Models by Integrating Context into Model Parameters
-
SaMer: A Scenario-aware Multi-dimensional Evaluator for Large Language Models
-
Investigating Pattern Neurons in Urban Time Series Forecasting
-
Retrieval Augmented Diffusion Model for Structure-informed Antibody Design and Optimization
-
CityGaussianV2: Efficient and Geometrically Accurate Reconstruction for Large-Scale Scenes
-
Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages
-
Self-Play Preference Optimization for Language Model Alignment
-
One-for-All Few-Shot Anomaly Detection via Instance-Induced Prompt Learning
-
Rethinking Spiking Neural Networks from an Ensemble Learning Perspective
-
DGQ: Distribution-Aware Group Quantization for Text-to-Image Diffusion Models
-
TLDR: Token-Level Detective Reward Model for Large Vision Language Models
-
PvNeXt: Rethinking Network Design and Temporal Motion for Point Cloud Video Recognition
-
Synergy and Diversity in CLIP: Enhancing Performance Through Adaptive Backbone Ensembling
-
LDAdam: Adaptive Optimization from Low-Dimensional Gradient Statistics
-
InstantPortrait: One-Step Portrait Editing via Diffusion Multi-Objective Distillation
-
What Makes Large Language Models Reason in (Multi-Turn) Code Generation?
-
Generalized Consistency Trajectory Models for Image Manipulation
-
Theory, Analysis, and Best Practices for Sigmoid Self-Attention
-
Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models
-
Step-by-Step Reasoning for Math Problems via Twisted Sequential Monte Carlo
-
ASTrA: Adversarial Self-supervised Training with Adaptive-Attacks
-
SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking
-
Actions Speak Louder Than Words: Rate-Reward Trade-off in Markov Decision Processes
-
Diff-2-in-1: Bridging Generation and Dense Perception with Diffusion Models
-
From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks
-
A Policy-Gradient Approach to Solving Imperfect-Information Games with Best-Iterate Convergence
-
ARB-LLM: Alternating Refined Binarizations for Large Language Models
-
Concept Pinpoint Eraser for Text-to-image Diffusion Models via Residual Attention Gate
-
Robust System Identification: Finite-sample Guarantees and Connection to Regularization
-
Needle In A Video Haystack: A Scalable Synthetic Evaluator for Video MLLMs
-
Simulating Training Dynamics to Reconstruct Training Data from Deep Neural Networks
-
Optimized Multi-Token Joint Decoding With Auxiliary Model for LLM Inference
-
InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation
-
Improving Equivariant Networks with Probabilistic Symmetry Breaking
-
InterMask: 3D Human Interaction Generation via Collaborative Masked Modeling
-
Cross-Domain Off-Policy Evaluation and Learning for Contextual Bandits
-
PhyloVAE: Unsupervised Learning of Phylogenetic Trees via Variational Autoencoders
-
NEAR: A Training-Free Pre-Estimator of Machine Learning Model Performance
-
CameraCtrl: Enabling Camera Control for Video Diffusion Models
-
FlexCAD: Unified and Versatile Controllable CAD Generation with Fine-tuned Large Language Models
-
Scaling Laws for Adversarial Attacks on Language Model Activations and Tokens
-
Permute-and-Flip: An optimally stable and watermarkable decoder for LLMs
-
Toward Efficient Multi-Agent Exploration With Trajectory Entropy Maximization
-
Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks
-
CirT: Global Subseasonal-to-Seasonal Forecasting with Geometry-inspired Transformer
-
MarS: a Financial Market Simulation Engine Powered by Generative Foundation Model
-
Conformalized Interactive Imitation Learning: Handling Expert Shift and Intermittent Feedback
-
W-PCA Based Gradient-Free Proxy for Efficient Search of Lightweight Language Models
-
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal
-
Dist Loss: Enhancing Regression in Few-Shot Region through Distribution Distance Constraint
-
Multilevel Generative Samplers for Investigating Critical Phenomena
-
Calibrating LLMs with Information-Theoretic Evidential Deep Learning
-
GOttack: Universal Adversarial Attacks on Graph Neural Networks via Graph Orbits Learning
-
BinaryDM: Accurate Weight Binarization for Efficient Diffusion Models
-
A Common Pitfall of Margin-based Language Model Alignment: Gradient Entanglement
-
COMBO: Compositional World Models for Embodied Multi-Agent Cooperation
-
On the Crucial Role of Initialization for Matrix Factorization
-
Progressive Parameter Efficient Transfer Learning for Semantic Segmentation
-
PADRe: A Unifying Polynomial Attention Drop-in Replacement for Efficient Vision Transformer
-
Reconstruction-Guided Policy: Enhancing Decision-Making through Agent-Wise State Consistency
-
SigDiffusions: Score-Based Diffusion Models for Time Series via Log-Signature Embeddings
-
Efficient Interpolation between Extragradient and Proximal Methods for Weak MVIs
-
Infinite-Resolution Integral Noise Warping for Diffusion Models
-
EditRoom: LLM-parameterized Graph Diffusion for Composable 3D Room Layout Editing
-
GraphArena: Evaluating and Exploring Large Language Models on Graph Computation
-
Efficient Neuron Segmentation in Electron Microscopy by Affinity-Guided Queries
-
Rapidly Adapting Policies to the Real-World via Simulation-Guided Fine-Tuning
-
MP-Mat: A 3D-and-Instance-Aware Human Matting and Editing Framework with Multiplane Representation
-
Geometry of Lightning Self-Attention: Identifiability and Dimension
-
Unlocking Guidance for Discrete State-Space Diffusion and Flow Models
-
Attributing Culture-Conditioned Generations to Pretraining Corpora
-
UNIP: Rethinking Pre-trained Attention Patterns for Infrared Semantic Segmentation
-
IntersectionZoo: Eco-driving for Benchmarking Multi-Agent Contextual Reinforcement Learning
-
Neural ODE Transformers: Analyzing Internal Dynamics and Adaptive Fine-tuning
-
TSC-Net: Prediction of Pedestrian Trajectories by Trajectory-Scene-Cell Classification
-
Optimizing Backward Policies in GFlowNets via Trajectory Likelihood Maximization
-
COAT: Compressing Optimizer states and Activations for Memory-Efficient FP8 Training
-
Self-supervised Monocular Depth Estimation Robust to Reflective Surface Leveraged by Triplet Mining
-
See It from My Perspective: How Language Affects Cultural Bias in Image Understanding
-
Agree to Disagree: Demystifying Homogeneous Deep Ensembles through Distributional Equivalence
-
Incorporating Visual Correspondence into Diffusion Model for Virtual Try-On
-
Graph Neural Networks for Edge Signals: Orientation Equivariance and Invariance
-
AnyTouch: Learning Unified Static-Dynamic Representation across Multiple Visuo-tactile Sensors
-
Going Beyond Static: Understanding Shifts with Time-Series Attribution
-
StochSync: Stochastic Diffusion Synchronization for Image Generation in Arbitrary Spaces
-
ContraDiff: Planning Towards High Return States via Contrastive Learning
-
Query-based Knowledge Transfer for Heterogeneous Learning Environments
-
DOCS: Quantifying Weight Similarity for Deeper Insights into Large Language Models
-
SimulPL: Aligning Human Preferences in Simultaneous Machine Translation
-
A Statistical Approach for Controlled Training Data Detection
-
MOOSE-Chem: Large Language Models for Rediscovering Unseen Chemistry Scientific Hypotheses
-
Fine-tuning can Help Detect Pretraining Data from Large Language Models
-
Long-time asymptotics of noisy SVGD outside the population limit
-
Everything is Editable: Extend Knowledge Editing to Unstructured Data in Large Language Models
-
Aria-MIDI: A Dataset of Piano MIDI Files for Symbolic Music Modeling
-
Multi-level Certified Defense Against Poisoning Attacks in Offline Reinforcement Learning
-
F-Fidelity: A Robust Framework for Faithfulness Evaluation of Explainable AI
-
Reassessing How to Compare and Improve the Calibration of Machine Learning Models
-
Breaking Class Barriers: Efficient Dataset Distillation via Inter-Class Feature Compensator
-
Semantic Aware Representation Learning for Lifelong Learning
-
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
-
Controllable Blur Data Augmentation Using 3D-Aware Motion Estimation
-
Optimal Non-Asymptotic Rates of Value Iteration for Average-Reward Markov Decision Processes
-
A Theoretical Perspective: How to Prevent Model Collapse in Self-consuming Training Loops
-
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
-
Anyprefer: An Agentic Framework for Preference Data Synthesis
-
Small Models are LLM Knowledge Triggers for Medical Tabular Prediction
-
GenDataAgent: On-the-fly Dataset Augmentation with Synthetic Data
-
GMValuator: Similarity-based Data Valuation for Generative Models
-
Optimal Learning of Kernel Logistic Regression for Complex Classification Scenarios
-
Building Math Agents with Multi-Turn Iterative Preference Learning
-
Decoding Game: On Minimax Optimality of Heuristic Text Generation Strategies
-
On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback
-
A Large-scale Dataset and Benchmark for Commuting Origin-Destination Flow Generation
-
Stable Hadamard Memory: Revitalizing Memory-Augmented Agents for Reinforcement Learning
-
Breaking Free from MMI: A New Frontier in Rationalization by Probing Input Utilization
-
MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions
-
Transition Path Sampling with Improved Off-Policy Training of Diffusion Path Samplers
-
Re-Evaluating the Impact of Unseen-Class Unlabeled Data on Semi-Supervised Learning Model
-
Probe Pruning: Accelerating LLMs through Dynamic Pruning via Model-Probing
-
REBIND: Enhancing Ground-state Molecular Conformation Prediction via Force-Based Graph Rewiring
-
Q-Adapter: Customizing Pre-trained LLMs to New Preferences with Forgetting Mitigation
-
Training Robust Ensembles Requires Rethinking Lipschitz Continuity
-
Learning Multi-Index Models with Neural Networks via Mean-Field Langevin Dynamics
-
Rotated Runtime Smooth: Training-Free Activation Smoother for accurate INT4 inference
-
Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation
-
Distributional Associations vs In-Context Reasoning: A Study of Feed-forward and Attention Layers
-
PivotMesh: Generic 3D Mesh Generation via Pivot Vertices Guidance
-
Adaptive Pruning of Pretrained Transformer via Differential Inclusions
-
Data Center Cooling System Optimization Using Offline Reinforcement Learning
-
FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality
-
Decision Information Meets Large Language Models: The Future of Explainable Operations Research
-
Forget the Data and Fine-Tuning! Just Fold the Network to Compress
-
Differentially Private Federated Learning with Time-Adaptive Privacy Spending
-
TimeInf: Time Series Data Contribution via Influence Functions
-
Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model
-
HaDeMiF: Hallucination Detection and Mitigation in Large Language Models
-
Navigation-Guided Sparse Scene Representation for End-to-End Autonomous Driving
-
OSCAR: Operating System Control via State-Aware Reasoning and Re-Planning
-
GoodDrag: Towards Good Practices for Drag Editing with Diffusion Models
-
Fengbo: a Clifford Neural Operator pipeline for 3D PDEs in Computational Fluid Dynamics
-
Learning Interpretable Hierarchical Dynamical Systems Models from Time Series Data
-
NeSyC: A Neuro-symbolic Continual Learner For Complex Embodied Tasks In Open Domains
-
K-HALU: Multiple Answer Korean Hallucination Benchmark for Large Language Models
-
Frequency-Guided Masking for Enhanced Vision Self-Supervised Learning
-
Correlation and Navigation in the Vocabulary Key Representation Space of Language Models
-
CREAM: Consistency Regularized Self-Rewarding Language Models
-
ObscuraCoder: Powering Efficient Code LM Pre-Training Via Obfuscation Grounding
-
MrT5: Dynamic Token Merging for Efficient Byte-level Language Models
-
Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion
-
Transformer Encoder Satisfiability: Complexity and Impact on Formal Reasoning
-
Convergence of Distributed Adaptive Optimization with Local Updates
-
Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for LLM Problem-Solving
-
Activation Gradient based Poisoned Sample Detection Against Backdoor Attacks
-
Designing Mechanical Meta-Materials by Learning Equivariant Flows
-
MCNC: Manifold-Constrained Reparameterization for Neural Compression
-
TypedThinker: Diversify Large Language Model Reasoning with Typed Thinking
-
SEAL: Safety-enhanced Aligned LLM Fine-tuning via Bilevel Data Selection
-
Accelerating 3D Molecule Generation via Jointly Geometric Optimal Transport
-
Neural Dueling Bandits: Preference-Based Optimization with Human Feedback
-
Divergence of Neural Tangent Kernel in Classification Problems
-
Regret Bounds for Episodic Risk-Sensitive Linear Quadratic Regulator
-
Advancing Prompt-Based Methods for Replay-Independent General Continual Learning
-
Robustness Auditing for Linear Regression: To Singularity and Beyond
-
Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents
-
RA-TTA: Retrieval-Augmented Test-Time Adaptation for Vision-Language Models
-
Decision Tree Induction Through LLMs via Semantically-Aware Evolution
-
Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning
-
Think Then React: Towards Unconstrained Action-to-Reaction Motion Generation
-
Triples as the Key: Structuring Makes Decomposition and Verification Easier in LLM-based TableQA
-
How DNNs break the Curse of Dimensionality: Compositionality and Symmetry Learning
-
Gnothi Seauton: Empowering Faithful Self-Interpretability in Black-Box Transformers
-
MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models
-
Mitigating Spurious Correlations in Zero-Shot Multimodal Models
-
The Breakdown of Gaussian Universality in Classification of High-dimensional Linear Factor Mixtures
-
Accurate and Scalable Graph Neural Networks via Message Invariance
-
Strong Preferences Affect the Robustness of Preference Models and Value Alignment
-
VICtoR: Learning Hierarchical Vision-Instruction Correlation Rewards for Long-horizon Manipulation
-
BOFormer: Learning to Solve Multi-Objective Bayesian Optimization via Non-Markovian RL
-
Adversarial Training for Defense Against Label Poisoning Attacks
-
SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation
-
Uncertainty Herding: One Active Learning Method for All Label Budgets
-
Trajectory-LLM: A Language-based Data Generator for Trajectory Prediction in Autonomous Driving
-
Learning Hierarchical Polynomials of Multiple Nonlinear Features
-
Circuit Representation Learning with Masked Gate Modeling and Verilog-AIG Alignment
-
Polyrating: A Cost-Effective and Bias-Aware Rating System for LLM Evaluation
-
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token
-
Self-Supervised Diffusion Models for Electron-Aware Molecular Representation Learning
-
Exploring the Design Space of Visual Context Representation in Video MLLMs
-
The 3D-PC: a benchmark for visual perspective taking in humans and machines
-
InstantSwap: Fast Customized Concept Swapping across Sharp Shape Differences
-
MambaPEFT: Exploring Parameter-Efficient Fine-Tuning for Mamba
-
Adversaries With Incentives: A Strategic Alternative to Adversarial Robustness
-
A Formal Framework for Understanding Length Generalization in Transformers
-
Audio Large Language Models Can Be Descriptive Speech Quality Evaluators
-
Perplexity Trap: PLM-Based Retrievers Overrate Low Perplexity Documents
-
Poisson-Dirac Neural Networks for Modeling Coupled Dynamical Systems across Domains
-
GPS: A Probabilistic Distributional Similarity with Gumbel Priors for Set-to-Set Matching
-
Multi-Task Dense Predictions via Unleashing the Power of Diffusion
-
Discovering Group Structures via Unitary Representation Learning
-
MIND: Math Informed syNthetic Dialogues for Pretraining LLMs
-
T2V2: A Unified Non-Autoregressive Model for Speech Recognition and Synthesis via Multitask Learning
-
FreCaS: Efficient Higher-Resolution Image Generation via Frequency-aware Cascaded Sampling
-
AvatarGO: Zero-shot 4D Human-Object Interaction Generation and Animation
-
MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding
-
A Little Goes a Long Way: Efficient Long Context Training and Inference with Partial Contexts
-
Learning Video-Conditioned Policy on Unlabelled Data with Joint Embedding Predictive Transformer
-
Real-time design of architectural structures with differentiable mechanics and neural networks
-
Language Models Trained to do Arithmetic Predict Human Risky and Intertemporal Choice
-
Bridging Information Asymmetry in Text-video Retrieval: A Data-centric Approach
-
Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process
-
Measuring And Improving Engagement of Text-to-Image Generation Models
-
Law of the Weakest Link: Cross Capabilities of Large Language Models
-
Graph Neural Preconditioners for Iterative Solutions of Sparse Linear Systems
-
UTILITY: Utilizing Explainable Reinforcement Learning to Improve Reinforcement Learning
-
State Space Model Meets Transformer: A New Paradigm for 3D Object Detection
-
On the Relation between Trainability and Dequantization of Variational Quantum Learning Models
-
Boosting Neural Combinatorial Optimization for Large-Scale Vehicle Routing Problems
-
HyperPLR: Hypergraph Generation through Projection, Learning, and Reconstruction
-
Reliable and Diverse Evaluation of LLM Medical Knowledge Mastery
-
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models
-
DECO: Unleashing the Potential of ConvNets for Query-based Detection and Segmentation
-
Neural Sampling from Boltzmann Densities: Fisher-Rao Curves in the Wasserstein Geometry
-
Metalic: Meta-Learning In-Context with Protein Language Models
-
CLIPure: Purification in Latent Space via CLIP for Adversarially Robust Zero-Shot Classification
-
Relation-Aware Diffusion for Heterogeneous Graphs with Partially Observed Features
-
ClawMachine: Learning to Fetch Visual Tokens for Referential Comprehension
-
Physics-informed Temporal Difference Metric Learning for Robot Motion Planning
-
Noise Separation guided Candidate Label Reconstruction for Noisy Partial Label Learning
-
PolyNet: Learning Diverse Solution Strategies for Neural Combinatorial Optimization
-
FIRING-Net: A filtered feature recycling network for speech enhancement
-
Improving Neural Network Accuracy by Concurrently Training with a Twin Network
-
VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks
-
Efficient Model Editing with Task-Localized Sparse Fine-tuning
-
Provence: efficient and robust context pruning for retrieval-augmented generation
-
Learning to Adapt Frozen CLIP for Few-Shot Test-Time Domain Adaptation
-
MUSE: Machine Unlearning Six-Way Evaluation for Language Models
-
Learning Harmonized Representations for Speculative Sampling
-
You Only Sample Once: Taming One-Step Text-to-Image Synthesis by Self-Cooperative Diffusion GANs
-
PARTNR: A Benchmark for Planning and Reasoning in Embodied Multi-agent Tasks
-
Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models
-
Scaling Offline Model-Based RL via Jointly-Optimized World-Action Model Pretraining
-
SpaceGNN: Multi-Space Graph Neural Network for Node Anomaly Detection with Extremely Limited Labels
-
LiveXiv - A Multi-Modal live benchmark based on Arxiv papers content
-
Fast Training of Sinusoidal Neural Fields via Scaling Initialization
-
Bridging the Gap between Database Search and *De Novo* Peptide Sequencing with SearchNovo
-
Logical Consistency of Large Language Models in Fact-Checking
-
P-SpikeSSM: Harnessing Probabilistic Spiking State Space Models for Long-Range Dependency Tasks
-
GameArena: Evaluating LLM Reasoning through Live Computer Games
-
TabM: Advancing tabular deep learning with parameter-efficient ensembling
-
UNSURE: self-supervised learning with Unknown Noise level and Stein's Unbiased Risk Estimate
-
Spurious Forgetting in Continual Learning of Language Models
-
Release the Powers of Prompt Tuning: Cross-Modality Prompt Transfer
-
Heavy-Tailed Diffusion with Denoising Levy Probabilistic Models
-
KOR-Bench: Benchmarking Language Models on Knowledge-Orthogonal Reasoning Tasks
-
Conservative Contextual Bandits: Beyond Linear Representations
-
Is In-Context Learning Sufficient for Instruction Following in LLMs?
-
VideoGrain: Modulating Space-Time Attention for Multi-Grained Video Editing
-
Fat-to-Thin Policy Optimization: Offline Reinforcement Learning with Sparse Policies
-
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF
-
Learnable Expansion of Graph Operators for Multi-Modal Feature Fusion
-
Towards Understanding Text Hallucination of Diffusion Models via Local Generation Bias
-
High-Dimensional Bayesian Optimisation with Gaussian Process Prior Variational Autoencoders
-
Learning to Clarify: Multi-turn Conversations with Action-Based Contrastive Self-Training
-
NExUME: Adaptive Training and Inference for DNNs under Intermittent Power Environments
-
TorchTitan: One-stop PyTorch native solution for production ready LLM pretraining
-
HiSplat: Hierarchical 3D Gaussian Splatting for Generalizable Sparse-View Reconstruction
-
GROOT-2: Weakly Supervised Multimodal Instruction Following Agents
-
Cafe-Talk: Generating 3D Talking Face Animation with Multimodal Coarse- and Fine-grained Control
-
Concept-ROT: Poisoning Concepts in Large Language Models with Model Editing
-
Reframing Structure-Based Drug Design Model Evaluation via Metrics Correlated to Practical Needs
-
Vec2Face: Scaling Face Dataset Generation with Loosely Constrained Vectors
-
A Riemannian Framework for Learning Reduced-order Lagrangian Dynamics
-
CLDyB: Towards Dynamic Benchmarking for Continual Learning with Pre-trained Models
-
Diffusion Models as Cartoonists: The Curious Case of High Density Regions
-
3DMolFormer: A Dual-channel Framework for Structure-based Drug Discovery
-
Optimality and Adaptivity of Deep Neural Features for Instrumental Variable Regression
-
Shape as Line Segments: Accurate and Flexible Implicit Surface Representation
-
HOPE for a Robust Parameterization of Long-memory State Space Models
-
Unleashing the Power of Task-Specific Directions in Parameter Efficient Fine-tuning
-
Kronecker Mask and Interpretive Prompts are Language-Action Video Learners
-
Closed-Form Merging of Parameter-Efficient Modules for Federated Continual Learning
-
GS-LiDAR: Generating Realistic LiDAR Point Clouds with Panoramic Gaussian Splatting
-
DiffPC: Diffusion-based High Perceptual Fidelity Image Compression with Semantic Refinement
-
A Distributional Approach to Uncertainty-Aware Preference Alignment Using Offline Demonstrations
-
Progress or Regress? Self-Improvement Reversal in Post-training
-
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
-
PointOBB-v2: Towards Simpler, Faster, and Stronger Single Point Supervised Oriented Object Detection
-
SGD with memory: fundamental properties and stochastic acceleration
-
A Simple Framework for Open-Vocabulary Zero-Shot Segmentation
-
Model-Free Offline Reinforcement Learning with Enhanced Robustness
-
Is Your Multimodal Language Model Oversensitive to Safe Queries?
-
Near, far: Patch-ordering enhances vision foundation models' scene understanding
-
Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better
-
ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time
-
Statistical Tractability of Off-policy Evaluation of History-dependent Policies in POMDPs
-
Revisiting Prefix-tuning: Statistical Benefits of Reparameterization among Prompts
-
Intrinsic Dimension Correlation: uncovering nonlinear connections in multimodal representations
-
Fréchet Wavelet Distance: A Domain-Agnostic Metric for Image Generation
-
InstaTrain: Adaptive Training via Ultra-Fast Natural Annealing within Dynamical Systems
-
Transformers Handle Endogeneity in In-Context Linear Regression
-
GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding
-
Size-Generalizable RNA Structure Evaluation by Exploring Hierarchical Geometries
-
Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF
-
Highly Efficient Self-Adaptive Reward Shaping for Reinforcement Learning
-
PEARL: Parallel Speculative Decoding with Adaptive Draft Length
-
Microcanonical Langevin Ensembles: Advancing the Sampling of Bayesian Neural Networks
-
Scale-aware Recognition in Satellite Images under Resource Constraints
-
MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequences
-
Model-agnostic meta-learners for estimating heterogeneous treatment effects over time
-
State Space Models are Provably Comparable to Transformers in Dynamic Token Selection
-
ImageFolder: Autoregressive Image Generation with Folded Tokens
-
Navigating Neural Space: Revisiting Concept Activation Vectors to Overcome Directional Divergence
-
SoftMatcha: A Soft and Fast Pattern Matcher for Billion-Scale Corpus Searches
-
Efficient Source-Free Time-Series Adaptation via Parameter Subspace Disentanglement
-
Towards Self-Supervised Covariance Estimation in Deep Heteroscedastic Regression
-
A Closer Look at Machine Unlearning for Large Language Models
-
Debiasing Mini-Batch Quadratics for Applications in Deep Learning
-
SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction
-
EC-DIT: Scaling Diffusion Transformers with Adaptive Expert-Choice Routing
-
Beyond Random Masking: When Dropout meets Graph Convolutional Networks
-
Domain Guidance: A Simple Transfer Approach for a Pre-trained Diffusion Model
-
RAG-DDR: Optimizing Retrieval-Augmented Generation Using Differentiable Data Rewards
-
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
-
Gap Preserving Distillation by Building Bidirectional Mappings with A Dynamic Teacher
-
SegLLM: Multi-round Reasoning Segmentation with Large Language Models
-
InfoGS: Efficient Structure-Aware 3D Gaussians via Lightweight Information Shaping
-
Transformers Can Learn Temporal Difference Methods for In-Context Reinforcement Learning
-
CL-DiffPhyCon: Closed-loop Diffusion Control of Complex Physical Systems
-
A transfer learning framework for weak to strong generalization
-
GenVP: Generating Visual Puzzles with Contrastive Hierarchical VAEs
-
Towards Continuous Reuse of Graph Models via Holistic Memory Diversification
-
Universal Image Restoration Pre-training via Degradation Classification
-
Connectome Mapping: Shape-Memory Network via Interpretation of Contextual Semantic Information
-
LLaMA-Omni: Seamless Speech Interaction with Large Language Models
-
Execution-guided within-prompt search for programming-by-example
-
Interpretable Unsupervised Joint Denoising and Enhancement for Real-World low-light Scenarios
-
ReSi: A Comprehensive Benchmark for Representational Similarity Measures
-
Mitigating Reward Over-Optimization in RLHF via Behavior-Supported Regularization
-
Gaussian Ensemble Belief Propagation for Efficient Inference in High-Dimensional, Black-box Systems
-
MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance
-
Humanizing the Machine: Proxy Attacks to Mislead LLM Detectors
-
RTop-K: Ultra-Fast Row-Wise Top-K Selection for Neural Network Acceleration on GPUs
-
Magnetic Preference Optimization: Achieving Last-iterate Convergence for Language Model Alignment
-
Select before Act: Spatially Decoupled Action Repetition for Continuous Control
-
Can Video LLMs Refuse to Answer? Alignment for Answerability in Video Large Language Models
-
Swift Hydra: Self-Reinforcing Generative Framework for Anomaly Detection with Multiple Mamba Models
-
QuaDiM: A Conditional Diffusion Model For Quantum State Property Estimation
-
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
-
Provable Benefit of Annealed Langevin Monte Carlo for Non-log-concave Sampling
-
Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning
-
ACES: Automatic Cohort Extraction System for Event-Stream Datasets
-
GaussianAnything: Interactive Point Cloud Flow Matching for 3D Generation
-
InstructRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales
-
Structure Language Models for Protein Conformation Generation
-
GTR: Improving Large 3D Reconstruction Models through Geometry and Texture Refinement
-
No Free Lunch: Fundamental Limits of Learning Non-Hallucinating Generative Models
-
Learning High-Degree Parities: The Crucial Role of the Initialization
-
Unlocking the Potential of Model Calibration in Federated Learning
-
Non-myopic Generation of Language Models for Reasoning and Planning
-
Fast Direct: Query-Efficient Online Black-box Guidance for Diffusion-model Target Generation
-
Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction
-
Beyond Content Relevance: Evaluating Instruction Following in Retrieval Models
-
A Theoretical Framework for Partially-Observed Reward States in RLHF
-
ET-SEED: Efficient Trajectory-Level SE(3) Equivariant Diffusion Policy
-
Robotouille: An Asynchronous Planning Benchmark for LLM Agents
-
Towards Unified Human Motion-Language Understanding via Sparse Interpretable Characterization
-
Geometry-Aware Approaches for Balancing Performance and Theoretical Guarantees in Linear Bandits
-
Rethinking Classifier Re-Training in Long-Tailed Recognition: Label Over-Smooth Can Balance
-
Nonasymptotic Analysis of Stochastic Gradient Descent with the Richardson–Romberg Extrapolation
-
SINGAPO: Single Image Controlled Generation of Articulated Parts in Objects
-
Vertical Federated Learning with Missing Features During Training and Inference
-
Progressive Mixed-Precision Decoding for Efficient LLM Inference
-
Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel
-
DoF: A Diffusion Factorization Framework for Offline Multi-Agent Reinforcement Learning
-
KaSA: Knowledge-Aware Singular-Value Adaptation of Large Language Models
-
DS-LLM: Leveraging Dynamical Systems to Enhance Both Training and Inference of Large Language Models
-
Self-supervised contrastive learning performs non-linear system identification
-
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
-
OpenHands: An Open Platform for AI Software Developers as Generalist Agents
-
Resolution Attack: Exploiting Image Compression to Deceive Deep Neural Networks
-
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution
-
Deep Random Features for Scalable Interpolation of Spatiotemporal Data
-
SmartRAG: Jointly Learn RAG-Related Tasks From the Environment Feedback
-
Erasing Concept Combination from Text-to-Image Diffusion Model
-
Model-based Offline Reinforcement Learning with Lower Expectile Q-Learning
-
Maintaining Structural Integrity in Parameter Spaces for Parameter Efficient Fine-tuning
-
Arithmetic Without Algorithms: Language Models Solve Math with a Bag of Heuristics
-
Bayesian Treatment of the Spectrum of the Empirical Kernel in (Sub)Linear-Width Neural Networks
-
Few for Many: Tchebycheff Set Scalarization for Many-Objective Optimization
-
TPO: Aligning Large Language Models with Multi-branch & Multi-step Preference Trees
-
Direct Distributional Optimization for Provable Alignment of Diffusion Models
-
Equivariant Masked Position Prediction for Efficient Molecular Representation
-
Decoupled Graph Energy-based Model for Node Out-of-Distribution Detection on Heterophilic Graphs
-
From Search to Sampling: Generative Models for Robust Algorithmic Recourse
-
ProtPainter: Draw or Drag Protein via Topology-guided Diffusion
-
Learning Mask Invariant Mutual Information for Masked Image Modeling
-
Understanding Virtual Nodes: Oversquashing and Node Heterogeneity
-
RefactorBench: Evaluating Stateful Reasoning in Language Agents Through Code
-
Probabilistic Conformal Prediction with Approximate Conditional Validity
-
Measuring And Improving Persuasiveness Of Large Language Models
-
Text2PDE: Latent Diffusion Models for Accessible Physics Simulation
-
Evaluating Semantic Variation in Text-to-Image Synthesis: A Causal Perspective
-
ActionReasoningBench: Reasoning about Actions with and without Ramification Constraints
-
EffoVPR: Effective Foundation Model Utilization for Visual Place Recognition
-
Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs
-
Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning
-
Preserving Diversity in Supervised Fine-Tuning of Large Language Models
-
Zero-shot Imputation with Foundation Inference Models for Dynamical Systems
-
CBraMod: A Criss-Cross Brain Foundation Model for EEG Decoding
-
Dynamic Modeling of Patients, Modalities and Tasks via Multi-modal Multi-task Mixture of Experts
-
[Plastic Learning with Deep Fourier Features](iclr_src/ICLR_2025_Main_Papers