Thank you very much for your excellent work on CoVT!
I would like to confirm whether it is possible to use the Hugging Face model Wakals/CoVT-7B-seg to perform image segmentation in the following way:
- Input: an image and a target object
- Process: obtain the segmentation tokens from the model for the specified object
- Output: decode the segmentation tokens to obtain the mask for the object in the input image
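
To make the question concrete, here is a rough Python sketch of the workflow I have in mind. The names `segment_object`, `get_seg_tokens`, and `decode_mask` are hypothetical placeholders of my own, not functions from CoVT or from the Hugging Face model, and the stubs return dummy values only so that the snippet runs on its own. The token shape (2 x 4) is arbitrary as well.

```python
from typing import Callable, List

Image = List[List[int]]      # placeholder image: rows of pixel values
Tokens = List[List[float]]   # placeholder segmentation tokens
Mask = List[List[bool]]      # one boolean per pixel


def segment_object(
    image: Image,
    target: str,
    get_seg_tokens: Callable[[Image, str], Tokens],
    decode_mask: Callable[[Tokens, int, int], Mask],
) -> Mask:
    """Hypothetical pipeline: image + target -> seg tokens -> mask."""
    # Step 1 (assumed): ask the model for segmentation tokens for the target.
    tokens = get_seg_tokens(image, target)
    # Step 2 (assumed): decode those tokens into a mask at image resolution.
    height, width = len(image), len(image[0])
    return decode_mask(tokens, height, width)


# Stand-ins for the two steps I am asking about; they do no real work.
def stub_get_seg_tokens(image: Image, target: str) -> Tokens:
    return [[0.0] * 4 for _ in range(2)]


def stub_decode_mask(tokens: Tokens, height: int, width: int) -> Mask:
    return [[False] * width for _ in range(height)]


image = [[0] * 6 for _ in range(4)]  # dummy image, 4 rows by 6 columns
mask = segment_object(image, "dog", stub_get_seg_tokens, stub_decode_mask)
print(len(mask), len(mask[0]))
```

What I would like to know is whether the released checkpoint supports something equivalent to the two stubbed steps, and if so, which functions I should call in their place.
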
Thank you very much for your time and help!