Description
Hi authors, thanks for your amazing work, which contributes a lot to long video understanding!
I'm reproducing your experiments on LLaVA-NeXT-Video. I've run into some problems and would like to know how you handled them.
- Could you share which LLaVA-NeXT-Video model you tested: lmms-lab/LLaVA-NeXT-Video-34B-DPO, lmms-lab/LLaVA-NeXT-Video-7B-DPO, or one of the models without DPO?
- I first experimented with lmms-lab/LLaVA-NeXT-Video-7B-DPO and found that the current instruction doesn't ask the assistant to answer with exactly one option. The model sometimes replies with a paragraph of reasoning, which makes extracting the answer difficult. Would you mind providing your answer extraction script, or shedding some light on how you evaluate raw responses? (Or are you evaluating in a perplexity-based mode, e.g. concatenating each of the four options with the instruction separately and choosing the one with the lowest perplexity?)
- When evaluating on LVBench with the official LLaVA-NeXT-Video repo, I found that some videos cannot be read because the decord library does not currently support the AV1 codec. I edited the video2dataset package as described in this issue and re-downloaded the LVBench videos using your download.sh, but four videos still fail to be processed. I'd really appreciate it if you could share experiment details, such as how you downloaded the videos and handled the AV1 codec.
- Section 4.4 of the paper mentions a very interesting point about "using large language models (LLMs) to filter question-answer pairs", but I'm a bit confused about what LLM filtering means. Does it mean providing only the instruction and question as input, without the video, and checking which option the LLM guesses? If so, it's hard to understand why this method gets an even higher score than input with the video. Would you mind briefly clarifying this interesting finding?
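For concreteness, this is roughly the kind of rule-based extraction I have been trying on raw responses. It is only a minimal sketch, assuming the options are labeled with capital letters A to D; the function name extract_option and the matching rules are my own guesses, not anything from your repo, and it returns None rather than guessing when a response is ambiguous:

```python
import re
from typing import Optional


# Hypothetical helper; assumes the options are labeled with capital letters A-D.
def extract_option(response: str, valid: str = "ABCD") -> Optional[str]:
    """Return the option letter in a free-form response, or None if unclear."""
    text = response.strip()
    letters = f"[{valid}]"

    # Rule 1: the response starts with a bare label such as "B", "B." or "(B)".
    # Requiring ".", ":", ")" or end-of-string avoids matching the article "A".
    m = re.match(rf"\(?({letters})(?:[.:)]|$)", text)
    if m:
        return m.group(1)

    # Rule 2: an explicit phrase such as "the answer is C" or "Answer: (C)".
    # Only "answer"/"is" are case-insensitive; the option letter must be uppercase.
    m = re.search(rf"(?i:answer(?:\s+is)?)\s*:?\s*\(?({letters})\)?(?![A-Za-z])", text)
    if m:
        return m.group(1)

    # Rule 3: exactly one distinct parenthesized label, e.g. "... so (D) fits."
    found = set(re.findall(rf"\(({letters})\)", text))
    if len(found) == 1:
        return found.pop()

    return None
```

Responses that fall through all three rules would still need manual inspection or a separate scoring method, which is why I am asking how you handled them.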
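If the perplexity-based mode is what you use, my understanding is that the final selection step would look like the sketch below. It assumes the per-token natural-log probabilities of each option's tokens (conditioned on the video, instruction, and question) have already been obtained from a forward pass of the model; that part is not shown, and the helper names are hypothetical:

```python
import math
from typing import Dict, Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity of one continuation from its per-token natural-log probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    # exp of the negative mean log-probability
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def pick_by_perplexity(option_logprobs: Dict[str, Sequence[float]]) -> str:
    """Return the option label whose continuation has the lowest perplexity."""
    return min(option_logprobs, key=lambda k: perplexity(option_logprobs[k]))
```

Averaging over tokens before exponentiating keeps options of different token lengths comparable; summing the log-probabilities instead would favor shorter options.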
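On the AV1 issue, the workaround I am considering is to detect AV1 streams with ffprobe and re-encode only those files to H.264 so that decord can open them. This is a sketch under my own assumptions (ffmpeg and ffprobe are on PATH, and the ffmpeg build includes an AV1 decoder such as libdav1d); the helper names are hypothetical, and it would not help with videos that fail to download in the first place:

```python
import subprocess
from pathlib import Path
from typing import List


def probe_cmd(path: Path) -> List[str]:
    """ffprobe command that prints only the codec name of the first video stream."""
    return ["ffprobe", "-v", "error", "-select_streams", "v:0",
            "-show_entries", "stream=codec_name",
            "-of", "default=noprint_wrappers=1:nokey=1", str(path)]


def transcode_cmd(src: Path, dst: Path) -> List[str]:
    """ffmpeg command that re-encodes video to H.264 (yuv420p) and drops audio."""
    return ["ffmpeg", "-y", "-i", str(src), "-c:v", "libx264",
            "-pix_fmt", "yuv420p", "-crf", "18", "-an", str(dst)]


def video_codec(path: Path) -> str:
    """Return the codec name reported by ffprobe, e.g. "av1" or "h264"."""
    out = subprocess.run(probe_cmd(path), capture_output=True, text=True, check=True)
    return out.stdout.strip()


def ensure_h264(src: Path, out_dir: Path) -> Path:
    """Return src unchanged unless it is AV1; otherwise return a re-encoded copy."""
    if video_codec(src) != "av1":
        return src
    out_dir.mkdir(parents=True, exist_ok=True)
    dst = out_dir / (src.stem + "_h264.mp4")
    subprocess.run(transcode_cmd(src, dst), check=True)
    return dst
```

For example, ensure_h264(Path("video.mp4"), Path("transcoded")) returns the original path for non-AV1 files and the path of the re-encoded copy otherwise. Dropping the audio track with -an assumes only the frames are needed for evaluation.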
Thanks in advance for your helpful reply.