- Samsung Research
- Cambridge, UK
- https://abaldrati.github.io
- in/alberto-baldrati
- @A_Baldrati
- https://scholar.google.com/citations?user=I1jaZecAAAAJ&hl=en
Stars
Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models
This repository contains code for the paper "Why Diffusion Models Don't Memorize: The Role of Implicit Dynamical Regularization in Training" by T. Bonnaire, R. Urfin, G. Biroli and M. Mézard.
Official implementation of the paper: "FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models"
Official inference repo for FLUX.2 models
Mitigating Negative Flips via Margin Preserving Training
This repository contains the official implementation code of NeurIPS 2025 paper: "Instance-Level Composed Image Retrieval".
[ICCV 2025] What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models
[TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling"
Fully Open Framework for Democratized Multimodal Training
Official PyTorch Implementation of "Vision-Free Retrieval: Rethinking Multimodal Search with Textual Scene Descriptions". Accepted at EMNLP 2025
Recurrence Meets Transformers for Universal Multimodal Retrieval
[IJCAI 2025] Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives
Official code for the paper "Two Effects, One Trigger: On the Modality Gap, Object Bias, and Information Imbalance in Contrastive Vision-Language Models" (ICLR 2025 Oral)
LorenzoAgnolucci / IISA
Forked from SonyResearch/IISA
[ICCV 2025] - Image Intrinsic Scale Assessment: Bridging the Gap Between Quality and Resolution
Official repository for "AM-RADIO: Reduce All Domains Into One"
Official code of Franca: Nested Matryoshka Clustering for Scalable Visual Representation Learning
Multi-scale Image Super Resolution with a Single Auto-Regressive Model
Single-pass Adaptive Image Tokenization for Minimum Program Search | What's the Kolmogorov Complexity of an Image?
Official implementation of the paper "Inverse Virtual Try-On: Generating Multi-Category Product-Style Images from Clothed Individuals"
This is the official repository for the paper "Modeling Human Gaze Behavior with Diffusion Models for Unified Scanpath Prediction". ICCV 2025
[ICLR'25 Oral] Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
OmniGen2: Exploration to Advanced Multimodal Generation. https://arxiv.org/abs/2506.18871
[CVPR 2025] FLAIR: VLM with Fine-grained Language-informed Image Representations
This repo contains the official implementation of the paper "Attention, Please! Revisiting Attentive Probing Through the Lens of Efficiency"
[NeurIPS 2025 Spotlight] ReSim: Reliable World Simulation for Autonomous Driving