There are 4 clouds in the sky, but the answer said there are 6.
Also, it would be great if you could provide a way to visualize the results for SAM segmentation, depth, and the DINO mask.
And it does not always generate all 4 perception tokens. Are they just optional features? Do they really work? How can I force it to generate all 4 perception tokens?
