Documentation is wrong about KMeans.inertia_ #16594

@Volker-Weissmann

In the documentation, you wrote:

The algorithm supports sample weights, which can be given by a parameter sample_weight. This allows to assign more weight to some samples when computing cluster centers and values of inertia. For example, assigning a weight of 2 to a sample is equivalent to adding a duplicate of that sample to the dataset X.

This is disproved by the following code:

from sklearn.cluster import KMeans

# Two samples, each with weight 2
vecs = [[0], [1]]
weights = [2, 2]
kmeans = KMeans(n_clusters=1)
kmeans.fit(vecs, sample_weight=weights)
print(kmeans.inertia_)  # Outputs 0.5

# The same two samples, each duplicated, with weight 1
vecs = [[0], [0], [1], [1]]
weights = [1, 1, 1, 1]
kmeans = KMeans(n_clusters=1)
kmeans.fit(vecs, sample_weight=weights)
print(kmeans.inertia_)  # Outputs 1.0
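
For comparison, here is a minimal NumPy-only sketch of what the quoted sentence would imply if a weight of 2 really behaved like a duplicate. The helper name `weighted_inertia` is hypothetical and not part of scikit-learn; it assumes a single cluster whose center is the weighted mean of the samples.

```python
import numpy as np


def weighted_inertia(X, sample_weight):
    """Weighted sum of squared distances to the weighted mean (one cluster).

    Hypothetical helper for illustration only; not a scikit-learn function.
    """
    X = np.asarray(X, dtype=float)
    w = np.asarray(sample_weight, dtype=float)
    # Weighted mean of the samples acts as the single cluster center.
    center = np.average(X, axis=0, weights=w)
    # Squared Euclidean distance of each sample to the center.
    sq_dist = ((X - center) ** 2).sum(axis=1)
    # Each squared distance counts as many times as its weight.
    return float((w * sq_dist).sum())


a = weighted_inertia([[0], [1]], [2, 2])
b = weighted_inertia([[0], [0], [1], [1]], [1, 1, 1, 1])
print(a, b)  # 1.0 1.0
```

Under this definition both datasets give the same value, which is what the documentation's wording leads one to expect from `inertia_`.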

Suggested fix:
Add the correct definition of inertia to this site and this site. I would suggest adding the weights to the equation on this site.
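
One possible form of the weighted equation, as a sketch only: here $w_i$ denotes the weight of sample $x_i$ (notation assumed for this issue, not taken from the docs), $C$ is the set of cluster centers, and each center $\mu_j$ is assumed to be the weighted mean of its members.

```latex
\sum_{i} w_i \, \min_{\mu_j \in C} \left( \lVert x_i - \mu_j \rVert^2 \right)
```

With all $w_i = 1$ this reduces to the unweighted sum of squared distances. For the first example above (samples 0 and 1, both with weight 2, center 0.5) it gives $2 \cdot 0.25 + 2 \cdot 0.25 = 1.0$, matching the duplicated dataset.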
