Possible bug in plotSimilarityMatrix #3

@fanavarro

Hi, I'm testing this tool and I find it very interesting; however, I'm having a little problem (I am not sure if this is a bug or if I am missing something).

I have a similarity matrix that I calculated by applying the Jaccard similarity to my data. In R this matrix is stored in a data frame, where identical individuals have a similarity of 1 and completely distinct individuals have a similarity of 0. I am using the function plotSimilarityMatrix and the result seems to be correct:
[Image: heatmap produced by plotSimilarityMatrix]

Nonetheless, I tried to recreate the clustering with hclust. This function needs a dist object, so I computed 1 - my similarity matrix, so that a similarity of 1 becomes a distance of 0 and a similarity of 0 becomes a distance of 1, and then called as.dist(myDistanceMatrix) to get a dist object for hclust. I used hclust with its default method (complete linkage), which clusters the supplied distances directly; however, the resulting clustering is not as nice as the one I got before:
[Image: clustering obtained with hclust]
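
For reference, the conversion described above can be sketched as follows. The 4 x 4 matrix and the names `sim`, `d`, `hc` and `groups` are made up for illustration; they are not my real data.

```r
# Toy Jaccard-style similarity matrix for four hypothetical individuals.
sim <- matrix(c(1.0, 0.8, 0.1, 0.0,
                0.8, 1.0, 0.2, 0.1,
                0.1, 0.2, 1.0, 0.7,
                0.0, 0.1, 0.7, 1.0),
              nrow = 4, byrow = TRUE,
              dimnames = list(letters[1:4], letters[1:4]))

# Turn similarities into distances (1 -> 0, 0 -> 1).
# as.matrix() also covers the case where the values are stored in a data frame.
d <- as.dist(1 - as.matrix(sim))

# hclust() works directly on the dist object; "complete" is its default method.
hc <- hclust(d, method = "complete")

# Cut the tree into two groups to inspect the result.
groups <- cutree(hc, k = 2)
```

With this toy matrix, plot(hc) draws the dendrogram built from the distances exactly as supplied, with no further transformation.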

I do not know which clustering is the correct one, but I have checked the code of the function plotSimilarityMatrix and it uses the pheatmap library. If I am not mistaken, the similarity matrix received as input by plotSimilarityMatrix is passed to pheatmap. I looked into the pheatmap function and found the following code, which is used to calculate the dendrogram:

cluster_mat = function(mat, distance, method){
    if(!(method %in% c("ward.D", "ward.D2", "ward", "single", "complete", "average", "mcquitty", "median", "centroid"))){
        stop("clustering method has to one form the list: 'ward', 'ward.D', 'ward.D2', 'single', 'complete', 'average', 'mcquitty', 'median' or 'centroid'.")
    }
    if(!(distance[1] %in% c("correlation", "euclidean", "maximum", "manhattan", "canberra", "binary", "minkowski")) & class(distance) != "dist"){
        stop("distance has to be a dissimilarity structure as produced by dist or one measure  form the list: 'correlation', 'euclidean', 'maximum', 'manhattan', 'canberra', 'binary', 'minkowski'")
    }
    if(distance[1] == "correlation"){
        d = as.dist(1 - cor(t(mat)))
    }
    else{
        if(class(distance) == "dist"){
            d = distance
        }
        else{
            d = dist(mat, method = distance)
        }
    }
    
    return(hclust(d, method = method))
}

This code checks whether the distance argument is a dist object. I think that in this case it never is, because plotSimilarityMatrix expects a similarity matrix, not a dissimilarity one. Thus, the above function from pheatmap assumes that the input matrix contains data, not distances, and computes a distance matrix through d = dist(mat, method = distance). The clustering shown in the plot from plotSimilarityMatrix therefore results from computing distances between the rows of the input similarity matrix.
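
To illustrate the difference I mean, here is a minimal sketch in base R that compares the two routes on a made-up matrix. The names `sim`, `d_direct` and `d_rows` are hypothetical, and the pheatmap call in the final comment is only my assumption about how a precomputed dist object could be passed; I have not checked whether plotSimilarityMatrix exposes that option.

```r
# Toy similarity matrix (hypothetical values, not my real data).
sim <- matrix(c(1.0, 0.8, 0.1, 0.0,
                0.8, 1.0, 0.2, 0.1,
                0.1, 0.2, 1.0, 0.7,
                0.0, 0.1, 0.7, 1.0),
              nrow = 4, byrow = TRUE,
              dimnames = list(letters[1:4], letters[1:4]))

# Route 1: use the similarities themselves, converted to distances.
d_direct <- as.dist(1 - sim)

# Route 2: what dist(mat, method = distance) does when given the similarity
# matrix: each row is treated as a feature vector and rows are compared.
d_rows <- dist(sim, method = "euclidean")

# The two dist objects hold different values for the same pairs.
print(round(as.matrix(d_direct), 3))
print(round(as.matrix(d_rows), 3))

# Both can be clustered, but the merge heights (and, on real data,
# possibly the tree shape) differ.
hc_direct <- hclust(d_direct, method = "complete")
hc_rows   <- hclust(d_rows,   method = "complete")

# Assumption, not run here: since cluster_mat accepts a "dist" object,
# something like
#   pheatmap::pheatmap(sim, clustering_distance_rows = d_direct,
#                      clustering_distance_cols = d_direct)
# might reproduce route 1 inside the heatmap.
```

On this small example both routes happen to group a with b and c with d, but the distances differ (0.2 versus about 0.316 for the pair a, b), so on larger data the two dendrograms need not agree.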

Am I correct? I hope I have misunderstood something, because I really like the first plot provided by your library, much more than the one I obtained afterwards by applying hclust.

Kind regards,
Francisco Abad.
