This is a Cog implementation of the Primser, from the paper, Prismer: A Vision-Language Model with An Ensemble of Experts.
First, download the pre-trained weights:
cog run script/download-weights
Then, you can run predictions:
cog predict -i image=...
Replicate also provides API:
import replicate
model = replicate.models.get("cjwbw/prismer")
model.predict(image=...)