Stars
Sotopia: an Open-ended Social Learning Environment (ICLR 2024 spotlight)
Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities
[CVPR 2024] OneLLM: One Framework to Align All Modalities with Language
ToMATO: Verbalizing the Mental States of Role-Playing LLMs for Benchmarking Theory of Mind (AAAI 2025)
Tarsier -- a family of large-scale video-language models designed to generate high-quality video descriptions, together with strong general video understanding capabilities.
We propose MMAD, a novel automated pipeline for precise AD generation. MMAD introduces ambient music alongside visual and linguistic modalities, enhancing the model's multimodal representation learning throug…
Repo for active speaker detection in media videos.
An optimized, production-ready implementation of active speaker detection
MuMA-ToM: Multi-modal Multi-Agent Theory of Mind
Official repository for the "Powerset multi-class cross entropy loss for neural speaker diarization" paper published in Interspeech 2023.
(ICLR'25) A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents
Multi-model analysis of sentiment and emotion in multi-speaker conversations.
[NAACL 2025] The implementation of paper "Hello Again! LLM-powered Personalized Agent for Long-term Dialogue".
Wellbeing and Emotion Prediction (NeurIPS 2022)
A self-alignment method for role-play. Benchmark for role-play. Resources for "Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment".
Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent
RoleInteract: Evaluating the Social Interaction of Role-Playing Agents
This repository contains a multi-agent chat application built using the autogen library. The application sets up various conversational agents with distinct personas and allows them to engage in gr…
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.
[AAAI 2025] VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding
The implementation of FINER-MLLM, accepted at ACM MM 2024.
[ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, Lei Li, Sishuo Chen, Xu Sun, Lu Hou
This repository contains a curated list of research papers and resources focusing on saliency and scanpath prediction, human attention, and human visual search.
[ICCV'23] Official PyTorch implementation for paper "Exploring Predicate Visual Context in Detecting Human-Object Interactions"
[ECCV 2020] DRG: Dual Relation Graph for Human-Object Interaction Detection