Has anyone tested the step3 model with VLMEvalKit? From the logs, it looks like the model is not receiving the images.