Thank you for this excellent work! I have a few questions I hope the authors can clarify.
In the “Robotic Navigation” subsection on page 10 of the arXiv preprint, you propose NVILA as a new baseline that can be fine-tuned for navigation.
- Could LoRA (or similar parameter-efficient methods) be used for fine-tuning?
- Would you be willing to share the training script to help the community reproduce and build on your results?
- Is the dataset the same one employed in your prior work, NaVILA?
Looking forward to your reply.