NUS
- Singapore
- https://orcid.org/0009-0009-1121-258X
Stars
Official repo of paper "Reconstruction Alignment Improves Unified Multimodal Models". Unlocking the Massive Zero-shot Potential in Unified Multimodal Models through Self-supervised Learning.
[NeurIPS 2025] Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO
[ICCV'25 Highlight] Derm1M: A Million-Scale Vision-Language Dataset Aligned with Clinical Ontology Knowledge for Dermatology
[MICCAI'25 Early Accept] MAKE: Multi-Aspect Knowledge-Enhanced Vision-Language Pretraining for Zero-shot Dermatological Assessment
Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision
[CVPR 2025] Noise-Consistent Siamese-Diffusion for Medical Image Synthesis and Segmentation
A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.
[EMNLP 2025 Industry] Datasets and Recipes for Video Temporal Grounding via Reinforcement Learning
【CVPR 2025 Highlight】MonSter: Marry Monodepth to Stereo Unleashes Power
Smoothed Preference Optimization via ReNoise Inversion for Aligning Diffusion Models with Varied Human Preferences (ICML 2025)
InPO: Inversion Preference Optimization with Reparametrized DDIM for Efficient Diffusion Model Alignment (CVPR 2025 Highlight)
Official Repo of "Rethinking Brain Tumor Segmentation from the Frequency Domain Perspective" (IEEE TMI 2025)
A paper list for medical anomaly detection. Feel free to contribute!
[ICLR 2025] NextBestPath: Efficient 3D Mapping of Unseen Environments
Official implementation of "Auto-Regressively Generating Multi-View Consistent Images" (ICCV 2025)
[arXiv 2025] Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps
[TMLR 2025] Efficient Reasoning Models: A Survey
[AAAI'25] DLF: Disentangled-Language-Focused Multimodal Sentiment Analysis
【ICLR 2025 🔥】The code for Consistent In-Context Editing, an approach for tuning language models through contextual distributions, overcoming the limitations of traditional fine-tuning methods that …
[ICCV2025] PyTorch implementation of "Perceive, Understand and Restore: Real-World Image Super-Resolution with Autoregressive Multimodal Generative Models"
Code repo for "Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning"
[MICCAI 2025] Adaptively Distilled ControlNet: Accelerated Training and Superior Sampling for Medical Image
[NeurIPS 2025 spotlight] Official implementation for "FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving"
[JAG 2022] Multitask consistency network with single temporal supervision for semi-supervised building change detection
[TGRS 2024] CutMix-CD: Advancing Semi-Supervised Change Detection via Mixed Sample Consistency
[ACL'25] UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench