View LIULINKAI's full-sized avatar


LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning

Python 70 2 Updated May 23, 2025
Python 30 Updated Oct 28, 2025

Awesome-RAG-Vision: a curated list of advanced retrieval augmented generation (RAG) for Computer Vision

250 7 Updated Oct 11, 2025

[ICLR 2025] See What You Are Told: Visual Attention Sink in Large Multimodal Models

Python 65 7 Updated Feb 16, 2025

This repo contains the code for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" [ICLR 2025]

Python 450 42 Updated Oct 24, 2025

[ACM MM25] The official code of "Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs"

Python 94 5 Updated Aug 8, 2025
Python 1,027 304 Updated Jan 29, 2023

Collection of Composed Image Retrieval (CIR) papers.

269 18 Updated Aug 18, 2025
Python 125 2 Updated Mar 22, 2025

GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Python 1,720 102 Updated Oct 28, 2025

[ICCV2025] Where, What, Why: Towards Explainable Driver Attention Prediction

Python 36 1 Updated Oct 27, 2025

MMSearch-R1 is an end-to-end RL framework that enables LMMs to perform on-demand, multi-turn search with real-world multimodal search tools.

Python 342 17 Updated Aug 26, 2025

Awesome things about generative recommendation models.

98 3 Updated Apr 28, 2025

Official implementation of BLIP3o-Series

Python 1,563 69 Updated Oct 27, 2025

[NeurIPS 2025] Official code for paper: Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs.

Python 65 3 Updated Sep 20, 2025

Official code for paper "UniIR: Training and Benchmarking Universal Multimodal Information Retrievers" (ECCV 2024)

Python 166 16 Updated Oct 1, 2024
Python 84 3 Updated Jun 7, 2024
Python 93 4 Updated Aug 14, 2025

Collection of AWESOME vision-language models for vision tasks

2,983 221 Updated Oct 14, 2025

SkyRover, a modular and extensible simulator tailored for cross-domain pathfinding research.

Python 9 2 Updated Mar 4, 2025

[CVPR'25] Interleaved-Modal Chain-of-Thought

Python 90 4 Updated Oct 23, 2025

Physical simulation of Marsupial UAV-UGV Systems Connected by a Variable-Length Hanging Tether

Python 32 7 Updated Aug 3, 2025

This repository provides the code and model checkpoints for AIMv1 and AIMv2 research projects.

Python 1,379 65 Updated Aug 4, 2025

[Fully open] [Encoder-free MLLM] Vision as LoRA

Python 341 29 Updated Jun 12, 2025

✨✨Latest Advances on Multimodal Large Language Models

16,575 1,069 Updated Oct 30, 2025

A PyTorch implementation of ACRNet based on ICME 2023 paper "Weakly-supervised Temporal Action Localization with Adaptive Clustering and Refining Network"

Python 15 1 Updated Aug 29, 2023

[🚀ICML 2025] "Taming Rectified Flow for Inversion and Editing" Using FLUX and HunyuanVideo for image and video editing!

Python 592 15 Updated May 1, 2025

[CVPR'25] Tora: Trajectory-oriented Diffusion Transformer for Video Generation

Python 1,208 56 Updated Jul 9, 2025

Wan: Open and Advanced Large-Scale Video Generative Models

Python 14,583 2,094 Updated Jul 17, 2025

[WWW 2025] Official PyTorch Code for "CTR-Driven Advertising Image Generation with Multimodal Large Language Models"

Python 58 4 Updated Aug 3, 2025