Poor performance on PRAD downstream #69

Hi people,
First, thanks for your astonishing work! I am currently evaluating how HIPT performs on a grading task and noticed that performance is underwhelming with the features I extracted using the proposed two-stage HIPT4K (with the weights provided in this repo). I then realized that pre-extracted features are supplied here and repeated my experiments with them. Sadly, to no avail :( I think there is another ticket with similar issues, #19. This is the scatter plot of a 2-component PCA on the slide-level (mean) TCGA-PRAD features:

![image](https://github.com/mahmoodlab/HIPT/assets/22796428/c47dd263-2616-4157-835f-43e353cda5ab)

Note that I selected 16 relevant features out of the total 192 by computing Pearson's r against the Gleason score (which the labels in the scatter plot also refer to). There appears to be a bit of clustering for Gleason 7 and 9, but overall the pretrained models don't seem to capture the important properties. My theory is that this is because the other Gleason scores have too few examples. However, I have already worked with SSL vision transformers for prostate cancer histopathology and found that the models had good feature extraction capabilities. I am also aware of work that confirms good SSL feature extraction capabilities on the TCGA-PRAD data.
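For reference, here is a minimal sketch of the analysis described above: rank the feature dimensions by absolute Pearson r against the Gleason score, keep the top 16, and project them onto 2 principal components. It is not the exact script behind the plot. The helper names `select_by_pearson` and `pca_2d` are made up for illustration, and the data is synthetic (random 192-dimensional vectors with a signal planted in one dimension) standing in for the real slide-level mean embeddings.

```python
import numpy as np


def select_by_pearson(features, scores, k=16):
    """Return the indices of the k columns with the largest |Pearson r| vs. scores."""
    x = features - features.mean(axis=0)
    y = scores - scores.mean()
    denom = np.sqrt((x ** 2).sum(axis=0) * (y ** 2).sum())
    # Constant columns have zero variance; dividing by inf gives them r = 0.
    r = (x * y[:, None]).sum(axis=0) / np.where(denom == 0, np.inf, denom)
    order = np.argsort(-np.abs(r))[:k]
    return order, r


def pca_2d(features):
    """Project the rows of features onto their first two principal components."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


# Synthetic stand-in data (assumption): 200 slides, 192-dim mean embeddings.
rng = np.random.default_rng(0)
n_slides, n_dims = 200, 192
gleason = rng.integers(6, 11, size=n_slides).astype(float)  # scores 6..10
slide_features = rng.normal(size=(n_slides, n_dims))
slide_features[:, 0] += gleason  # planted signal so the selection finds something

selected, r = select_by_pearson(slide_features, gleason, k=16)
coords = pca_2d(slide_features[:, selected])  # (n_slides, 2) scatter coordinates
```

One caveat with this setup: because the 16 dimensions are chosen using the labels, some apparent grouping in the scatter plot may come from the selection step itself, so it is probably worth comparing against a PCA on all 192 dimensions.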

Therefore, I wanted to ask whether I really got things right here. So:
1. I am using the tensors stored in ```HIPT/3-Self-Supervised-Eval/embeddings_slide_lib/embeddings_slide_lib/vit256mean_tcga_slide_embeddings```. Am I correct in assuming that these are the features of the 4k patches extracted from each WSI?
2. The number of tumours used differs between this repo and the paper, but the provided models HAVE indeed been pretrained on TCGA-PRAD as well, right?

Besides my technical questions, I'd really love to hear what you think about this. I'm a little short on time currently, but if you think it's worth the effort, I'd also volunteer to adapt the approach so that it achieves adequate results on TCGA-PRAD. Maybe working at a different magnification could already do the trick, since for prostate tumours cell-level information is not of utmost importance (AFAIK). This would be of great value for the digital pathology world :)

Kind regards
M
