Possible bug in plotSimilarityMatrix

Hi, I'm testing this tool and find it very interesting. However, I've run into a small problem (I'm not sure whether it is a bug or whether I'm missing something).

I have a similarity matrix that I calculated by applying the Jaccard similarity to my data. In R this matrix is stored in a data frame, where identical individuals have a similarity of 1 and completely distinct individuals have a similarity of 0. I am using the function `plotSimilarityMatrix`, and the result seems correct:
![image](https://user-images.githubusercontent.com/11376639/161104662-083b4297-837d-42d3-9171-bf7d0944d0dc.png)
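A minimal sketch of how such a matrix can be computed, assuming the data are a binary (0/1) matrix with one individual per row. The toy matrix `x` and the helper `jaccard_sim()` are hypothetical and not part of this package. The Jaccard similarity of two rows is |A ∩ B| / |A ∪ B|, and base R's `dist(method = "binary")` returns the corresponding Jaccard distance 1 - J, which the last line checks:

```r
# Hypothetical toy data: 4 individuals (rows) x 6 binary features (columns)
x <- matrix(c(1, 1, 0, 0, 1, 0,
              1, 1, 0, 0, 0, 0,
              0, 0, 1, 1, 0, 1,
              0, 1, 1, 1, 0, 1),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("ind", 1:4), NULL))

# Hypothetical helper: Jaccard similarity |A and B| / |A or B| for every pair of rows
jaccard_sim <- function(x) {
  x <- (x != 0) * 1                     # numeric presence/absence (0/1)
  inter <- tcrossprod(x)                # number of shared 1s for each pair of rows
  n <- rowSums(x)
  union <- outer(n, n, "+") - inter     # |A| + |B| - |A and B|
  s <- inter / union
  s[union == 0] <- 1                    # two all-zero rows are treated as identical
  s
}

S <- jaccard_sim(x)
print(round(S, 3))

# Base R's "binary" distance is the Jaccard distance, i.e. 1 - S
D_binary <- as.matrix(dist(x, method = "binary"))
stopifnot(isTRUE(all.equal(1 - S, D_binary, check.attributes = FALSE)))
```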

Nonetheless, I tried to recreate the clustering using hclust. hclust needs a `dist` object, so I computed `1 - mySimilarityMatrix`, so that a similarity of 1 becomes a distance of 0 and a similarity of 0 becomes a distance of 1, and then called `as.dist(myDistanceMatrix)` to get a `dist` object to pass to hclust. I used the default parameters for hclust (complete linkage; since the input is already a `dist` object, hclust uses those distances directly rather than computing Euclidean ones). However, the resulting clustering is not as clean as the one I got before:
![image](https://user-images.githubusercontent.com/11376639/161106269-39a64493-bf77-441d-8472-3e82d9b8c2e0.png)
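A minimal sketch of this conversion on a small hypothetical similarity matrix `S` (the values are made up for illustration; `S_df` in the comment is likewise a hypothetical name). Because the input is already a `dist` object, `hclust()` clusters on those Jaccard distances as they are, and its default linkage is `"complete"`:

```r
# Hypothetical 4 x 4 Jaccard similarity matrix (1 on the diagonal)
ids <- paste0("ind", 1:4)
S <- matrix(c(1,   2/3, 0,   1/6,
              2/3, 1,   0,   1/5,
              0,   0,   1,   3/4,
              1/6, 1/5, 3/4, 1),
            nrow = 4, byrow = TRUE, dimnames = list(ids, ids))

# If the matrix lives in a data frame, convert it first: S <- as.matrix(S_df)
D <- as.dist(1 - S)   # similarity 1 -> distance 0, similarity 0 -> distance 1

# hclust() uses the distances in D directly; the default linkage is "complete"
hc <- hclust(D)       # equivalent to hclust(D, method = "complete")
print(hc$merge)
print(hc$height)
```

For this toy matrix the merge heights are 0.25 (ind3 with ind4), 1/3 (ind1 with ind2) and 1, i.e. the Jaccard distances themselves combined by complete linkage.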

I do not know which clustering is the correct one, but I have checked the code of `plotSimilarityMatrix` and it uses the pheatmap package. If I am not wrong, the similarity matrix received as input by `plotSimilarityMatrix` is passed on to pheatmap. I looked into the pheatmap function and found the following code, which computes the dendrogram:

```r
cluster_mat = function(mat, distance, method){
    if(!(method %in% c("ward.D", "ward.D2", "ward", "single", "complete", "average", "mcquitty", "median", "centroid"))){
        stop("clustering method has to one form the list: 'ward', 'ward.D', 'ward.D2', 'single', 'complete', 'average', 'mcquitty', 'median' or 'centroid'.")
    }
    if(!(distance[1] %in% c("correlation", "euclidean", "maximum", "manhattan", "canberra", "binary", "minkowski")) & class(distance) != "dist"){
        stop("distance has to be a dissimilarity structure as produced by dist or one measure  form the list: 'correlation', 'euclidean', 'maximum', 'manhattan', 'canberra', 'binary', 'minkowski'")
    }
    if(distance[1] == "correlation"){
        d = as.dist(1 - cor(t(mat)))
    }
    else{
        if(class(distance) == "dist"){
            d = distance
        }
        else{
            d = dist(mat, method = distance)
        }
    }
    
    return(hclust(d, method = method))
}
```
This code checks whether the `distance` argument is a `dist` object. I think that in this case it would never be one, because `plotSimilarityMatrix` expects a similarity matrix, not a dissimilarity one. Thus, the pheatmap function above treats the input matrix as data rather than as distances, and computes a new distance matrix from it with `d = dist(mat, method = distance)`. The dendrogram shown in the plot from `plotSimilarityMatrix` therefore results from distances computed between the rows of the similarity matrix, not from the similarities themselves.
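To see the difference concretely, here is a sketch that mimics the two branches of `cluster_mat` on a small hypothetical similarity matrix, assuming `plotSimilarityMatrix` leaves pheatmap's defaults (`clustering_distance_rows = "euclidean"`, `clustering_method = "complete"`) in place. Branch (a) clusters on the Jaccard distances; branch (b) treats each row of `S` as a feature vector and computes Euclidean distances between rows, which is what `dist(mat, method = distance)` does:

```r
# Hypothetical 4 x 4 Jaccard similarity matrix (values made up for illustration)
ids <- paste0("ind", 1:4)
S <- matrix(c(1,   2/3, 0,   1/6,
              2/3, 1,   0,   1/5,
              0,   0,   1,   3/4,
              1/6, 1/5, 3/4, 1),
            nrow = 4, byrow = TRUE, dimnames = list(ids, ids))

# (a) Jaccard distances used directly, as in the hclust attempt above
d_jaccard <- as.dist(1 - S)
hc_jaccard <- hclust(d_jaccard, method = "complete")

# (b) Mimics cluster_mat() when `distance` is a method name such as
#     "euclidean": the rows of S are treated as data, and new distances
#     are computed between them with dist()
d_rows <- dist(S, method = "euclidean")
hc_rows <- hclust(d_rows, method = "complete")

# Compare the two distance matrices and the resulting merge heights
print(round(as.matrix(d_jaccard), 3))
print(round(as.matrix(d_rows), 3))
print(rbind(jaccard = hc_jaccard$height, rows_of_S = hc_rows$height))
```

Both are valid distances, but they measure different things, so the dendrograms can differ in their heights and, on real data, possibly in their topology. If the goal is for the heatmap to cluster on the Jaccard distances, pheatmap accepts a precomputed `dist` object through `clustering_distance_rows` and `clustering_distance_cols` (the `class(distance) == "dist"` branch of `cluster_mat`), for example `pheatmap::pheatmap(S, clustering_distance_rows = as.dist(1 - S), clustering_distance_cols = as.dist(1 - S))`. Whether `plotSimilarityMatrix` forwards these arguments is not confirmed here.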

Am I correct? I hope I have misunderstood something, because I really like the first plot produced by your library, much more than the one I obtained with hclust.

Kind regards,
Francisco Abad.
