Thanks for your great work!
I can't reproduce the reported performance when fine-tuning LLaVA on the LingoQA dataset.

Could you please help me align the experiment settings?
- For the single-image input, I selected the middle frame of the image sequence. Is that correct?
- During training, is a multi-round or single-round QA format adopted?
- During fine-tuning, do you follow the standard LLaVA recipe, i.e. freeze the visual encoder weights and fine-tune both the projection layer and the full LLM?
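For the first point, this is how I currently pick the middle frame (a minimal sketch; `frames` is assumed to be the ordered list of images for one sample). Please let me know if your indexing differs, especially for even-length sequences where "middle" is ambiguous:

```python
def select_middle_frame(frames):
    """Pick the middle frame of an ordered image sequence.

    For even-length sequences this takes the later of the two
    central frames (index len // 2); tell me if you use the
    earlier one (index (len - 1) // 2) instead.
    """
    if not frames:
        raise ValueError("empty frame sequence")
    return frames[len(frames) // 2]


# e.g. a 5-frame sequence -> index 2; a 4-frame sequence -> index 2
print(select_middle_frame([0, 1, 2, 3, 4]))
print(select_middle_frame([0, 1, 2, 3]))
```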