countytown

Sotopia: an Open-ended Social Learning Environment (ICLR 2024 spotlight)

Python 256 38 Updated Oct 6, 2025

Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities

1,081 52 Updated Jul 15, 2025

[CVPR 2024] OneLLM: One Framework to Align All Modalities with Language

Python 656 38 Updated Oct 22, 2024

ToMATO: Verbalizing the Mental States of Role-Playing LLMs for Benchmarking Theory of Mind (AAAI2025)

Python 16 3 Updated Apr 16, 2025

Tarsier -- a family of large-scale video-language models designed to generate high-quality video descriptions, together with good general video understanding capability.

Python 496 28 Updated Aug 14, 2025
Python 78 5 Updated Mar 10, 2025

We propose MMAD, a novel automated pipeline for precise AD generation. MMAD introduces ambient music alongside visual and linguistic, enhancing the model's multimodal representation learning throug…

Python 15 2 Updated Dec 31, 2024

repo for active speaker detection for media videos.

Python 29 1 Updated Nov 19, 2023

an optimized, production-ready implementation of active speaker detection

Python 72 19 Updated May 29, 2024

MuMA-ToM: Multi-modal Multi-Agent Theory of Mind

Python 32 2 Updated Jan 23, 2025

Official repository for the "Powerset multi-class cross entropy loss for neural speaker diarization" paper published in Interspeech 2023.

Jupyter Notebook 91 8 Updated Oct 18, 2023

(ICLR'25) A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents

Python 87 3 Updated Feb 1, 2025
Python 58 6 Updated Jun 20, 2024

Multi-model analysis of sentiment and emotion in multi-speaker conversations.

Jupyter Notebook 27 6 Updated Jul 6, 2023

Multimodal Empathetic Chatbot

Python 51 7 Updated Jul 16, 2024

[NAACL 2025] The implementation of paper "Hello Again! LLM-powered Personalized Agent for Long-term Dialogue".

Python 62 3 Updated May 2, 2025

Wellbeing and Emotion Prediction (NeurIPS 2022)

Python 9 Updated Oct 19, 2022

A self-alignment method for role-play. Benchmark for role-play. Resources for "Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment".

Jupyter Notebook 204 18 Updated May 28, 2024

Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent

Python 386 25 Updated Apr 22, 2025

RoleInteract: Evaluating the Social Interaction of Role-Playing Agents

Python 61 5 Updated Oct 12, 2024

This repository contains a multi-agent chat application built using the autogen library. The application sets up various conversational agents with distinct personas and allows them to engage in gr…

Python 6 1 Updated Dec 7, 2024

VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.

Python 3,617 300 Updated Oct 20, 2025

[AAAI 2025] VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding

Python 114 3 Updated Dec 10, 2024

The implementation of FINER-MLLM, which is accepted by MM2024.

Python 15 1 Updated Oct 8, 2024

[ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, Lei Li, Sishuo Chen, Xu Sun, Lu Hou

Python 125 4 Updated Apr 4, 2025

This repository contains a curated list of research papers and resources focusing on saliency and scanpath prediction, human attention, human visual search.

58 3 Updated May 9, 2025

[ICCV'23] Official PyTorch implementation for paper "Exploring Predicate Visual Context in Detecting Human-Object Interactions"

Python 85 13 Updated Jul 4, 2024

[ECCV 2020] DRG: Dual Relation Graph for Human-Object Interaction Detection

Python 70 19 Updated Mar 11, 2022

Method for GAZE 2021 Competition on EVE dataset

Python 33 2 Updated Aug 2, 2021