Description
In the documentation, you wrote:
The algorithm supports sample weights, which can be given by a parameter sample_weight. This allows to assign more weight to some samples when computing cluster centers and values of inertia. For example, assigning a weight of 2 to a sample is equivalent to adding a duplicate of that sample to the dataset X.
This is disproved by the following code:
from sklearn.cluster import KMeans

# Case 1: two samples, each given a weight of 2
vecs = [[0], [1]]
weights = [2, 2]
kmeans = KMeans(n_clusters=1)
kmeans.fit(vecs, sample_weight=weights)
print(kmeans.inertia_)  # Outputs 0.5

# Case 2: the same two samples, each duplicated, with unit weights
vecs = [[0], [0], [1], [1]]
weights = [1, 1, 1, 1]
kmeans = KMeans(n_clusters=1)
kmeans.fit(vecs, sample_weight=weights)
print(kmeans.inertia_)  # Outputs 1.0
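For comparison, here is a minimal NumPy sketch of what the quoted sentence would imply if taken literally. It assumes that inertia is the weighted sum of squared distances to the weighted mean; the helper weighted_inertia is hypothetical and is not part of scikit-learn. Under that assumption both datasets give the same value, which is the equivalence I expected from KMeans:

```python
import numpy as np

def weighted_inertia(X, sample_weight):
    """Hypothetical helper: weighted sum of squared distances to the
    weighted mean, i.e. single-cluster inertia under the assumed definition."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(sample_weight, dtype=float)
    # Weighted mean of the samples (the only cluster center when n_clusters=1)
    center = np.average(X, axis=0, weights=w)
    # Squared Euclidean distance of every sample to that center
    sq_dist = ((X - center) ** 2).sum(axis=1)
    return float((w * sq_dist).sum())

weighted = weighted_inertia([[0], [1]], [2, 2])
duplicated = weighted_inertia([[0], [0], [1], [1]], [1, 1, 1, 1])
print(weighted, duplicated)  # prints: 1.0 1.0
```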
Suggested fix:
Explain the correct definition of the inertia on this site and this site. I would suggest adding the weights to the equation on this site.
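As a starting point, one way the weighted equation might read is sketched below. It assumes the unweighted inertia is written as a sum over samples of the squared distance to the nearest centroid, and that the weights enter as plain multiplicative factors, matching the "duplicate sample" sentence. The symbol w_i (the weight of sample x_i) is my own notation, and whether any normalization of the weights also needs to be stated depends on what the implementation actually does:

```latex
\sum_{i=0}^{n} w_i \, \min_{\mu_j \in C} \left( \lVert x_i - \mu_j \rVert^2 \right)
```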