K-MEANS CLUSTERING
EX. NO: 04
AIM:
To implement k-means clustering using Python.
ALGORITHM:
Step 1: Start.
Step 2: Read the data set into an array and choose the number of clusters k.
Step 3: Initialise the centroids by picking k data points at random.
Step 4: Assign each data point to its nearest centroid using Euclidean distance.
Step 5: Update each centroid to the mean of the data points assigned to it.
Step 6: Repeat steps 4 and 5 for the maximum number of iterations.
Step 7: Predict the cluster label of each data point from the final centroids.
Step 8: Calculate the clustering error for the chosen values of k and store the results.
Step 9: Stop.
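The error calculation for different values of k mentioned in the steps above does not appear in the source code. The sketch below shows one common way to do it, assuming the error is the within-cluster sum of squared distances (SSE). The helper names nearest, kmeans and sse and the synthetic data are assumptions made for illustration, and kmeans is a simplified stand-in rather than the class from this record.

```python
import numpy as np


def nearest(data, centroids):
    """Index of the closest centroid (squared Euclidean distance) for each row."""
    dist = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)


def kmeans(data, k, n_iter=50, seed=0):
    """Hypothetical stand-in for the clustering routine; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct rows chosen at random.
    centroids = data[rng.permutation(data.shape[0])[:k]].astype(float)
    for _ in range(n_iter):
        labels = nearest(data, centroids)
        for i in range(k):
            members = data[labels == i]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                centroids[i] = members.mean(axis=0)
    # Relabel against the final centroids so labels and centroids agree.
    return centroids, nearest(data, centroids)


def sse(data, centroids, labels):
    """Sum of squared distances from each row to its assigned centroid."""
    return float(((data - centroids[labels]) ** 2).sum())


# Synthetic data (assumed, for illustration only): two well-separated blobs.
rng = np.random.default_rng(42)
data = np.vstack([rng.normal(0.0, 0.1, size=(50, 2)),
                  rng.normal(1.0, 0.1, size=(50, 2))])

errors = {}
for k in range(1, 5):
    centroids, labels = kmeans(data, k)
    errors[k] = sse(data, centroids, labels)
    print("k = %d  SSE = %.3f" % (k, errors[k]))
```

SSE usually falls sharply until k reaches the number of natural groups in the data and flattens afterwards, so the bend in the printed values is a reasonable guide for choosing k.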
SOURCE CODE:
import numpy as np
from sklearn.metrics import pairwise_distances


class Kmeans:
    """K-means clustering.

    k: int, number of clusters
    seed: int, will be randomly set if None
    max_iter: int, number of iterations to run algorithm, default: 200
    centroids: array, k, number_features
    """

    def __init__(self, k, seed=None, max_iter=200):
        self.k = k
        self.seed = seed
        if self.seed is not None:
            np.random.seed(self.seed)
        self.max_iter = max_iter

    def initialise_centroids(self, data):
        """Pick k distinct rows of data at random as the starting centroids."""
        initial_centroids = np.random.permutation(data.shape[0])[:self.k]
        self.centroids = data[initial_centroids]
        return self.centroids

    def assign_clusters(self, data):
        """data: array or matrix, number_rows, number_features"""
        if data.ndim == 1:
            data = data.reshape(-1, 1)
        # Distance from every row to every centroid; keep the nearest one.
        dist_to_centroid = pairwise_distances(data, self.centroids, metric='euclidean')
        self.cluster_labels = np.argmin(dist_to_centroid, axis=1)
        return self.cluster_labels

    def update_centroids(self, data):
        """data: array or matrix, number_rows, number_features"""
        # Each centroid moves to the mean of the rows assigned to it.
        self.centroids = np.array([data[self.cluster_labels == i].mean(axis=0)
                                   for i in range(self.k)])
        return self.centroids

    def predict(self, data):
        """data: array or matrix, number_rows, number_features"""
        return self.assign_clusters(data)

    # The method header and the loop line are missing from the listing;
    # the name fit and the loop over max_iter are assumed here.
    def fit(self, data):
        self.centroids = self.initialise_centroids(data)
        for iteration in range(self.max_iter):
            self.cluster_labels = self.assign_clusters(data)
            self.centroids = self.update_centroids(data)
            if iteration % 100 == 0:
                print("Running Model Iteration %d " % iteration)
        print("Model finished running")
        return self
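The listing above always runs for max_iter iterations, although the centroids often stop moving much earlier. The sketch below shows one possible way to stop early, assuming the run can end once no centroid moves by more than a small tolerance. The function name fit_until_stable, the tol parameter and the synthetic data are assumptions made for illustration and are not part of the recorded program.

```python
import numpy as np


def fit_until_stable(data, k, max_iter=200, tol=1e-8, seed=0):
    """Hypothetical helper: k-means that stops once the centroids stop moving.

    Returns (centroids, labels, iterations_used).
    """
    rng = np.random.default_rng(seed)
    # Start from k distinct rows chosen at random.
    centroids = data[rng.permutation(data.shape[0])[:k]].astype(float)
    labels = np.zeros(data.shape[0], dtype=int)
    for iteration in range(1, max_iter + 1):
        # Assignment step: nearest centroid by squared Euclidean distance.
        dist = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Update step: mean of each cluster; an empty cluster keeps its centroid.
        new_centroids = centroids.copy()
        for i in range(k):
            members = data[labels == i]
            if len(members) > 0:
                new_centroids[i] = members.mean(axis=0)
        # Largest distance moved by any centroid in this iteration.
        shift = np.sqrt(((new_centroids - centroids) ** 2).sum(axis=1)).max()
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids, labels, iteration


# Synthetic data (assumed, for illustration only): two well-separated blobs.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0.0, 0.05, size=(30, 2)),
                  rng.normal(1.0, 0.05, size=(30, 2))])
centroids, labels, used = fit_until_stable(data, k=2)
print("Stopped after %d of 200 iterations" % used)
```

Keeping the old centroid for an empty cluster is a design choice in this sketch: taking the mean of zero rows would give NaN values and break the next distance calculation.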
OUTPUT:
The cluster (CLASS) predicted for each pair of VAR1 and VAR2 values:
VAR1    VAR2    CLASS
0.307   0.654   1
0.089   0.528   1
0.606   0.258   1
RESULT:
Thus the Python program for k-means clustering was executed successfully and the
output was verified.