Hi, thanks a lot for sharing this great work and the open-source code!
I saw that the video finetuning script is on the todo list. I'd like to finetune the model on video data with SoM/ToM. Is there an estimated timeline for when that part of the code might be released?
Thanks!