patrick-tssn/README.md

Hey There 🎸

  • 🌱 I am Yuxuan Wang (汪宇轩), a research engineer on the Qwen team at Alibaba. I completed my Master's degree at Peking University (PKU).
  • 🔭 I am keen to explore “o” for “omni”.
  • 🤝 I am always open to collaboration of any kind.

Pinned

  1. Qwen3-Omni (Public)

    Forked from QwenLM/Qwen3-Omni

    Qwen3-Omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud. It understands text, audio, images, and video, and generates speech in real time.

    Jupyter Notebook

  2. OmniMMI/M4 (Public)

    [CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts

    Python · 12 stars

  3. bigai-nlco/VideoLLaMB (Public)

    [ICCV 2025] Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges

    Python · 77 stars · 2 forks

  4. VideoHallucer (Public)

    VideoHallucer, the first comprehensive benchmark for hallucination detection in large video-language models (LVLMs)

    Python · 38 stars

  5. bigai-nlco/VideoTGB (Public)

    [EMNLP 2024] A Video Chat Agent with Temporal Prior

    Python · 32 stars · 3 forks

  6. VSTAR (Public)

    [ACL 2023] VSTAR is a multimodal dialogue dataset with scene and topic transition information

    Python · 15 stars · 2 forks