Practical File
Of
Machine Learning
Submitted to: Er. Zubair Fayaz, Dept. of Computer Science & Engineering
Submitted by: Diljeet Singh, Class: B.Tech (Sem-4) AI-ML, AUID: 237106007
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
AKAL UNIVERSITY, TALWANDI SABO
May 2025
TABLE OF CONTENTS

S. No. Name of Practical
1. Implementation of vector algebra in machine learning
2. Implementation of matrix algebra in machine learning
3. Implementation of various data preprocessing steps in Python
4. Implementation of simple linear regression in Python
5. Implementation of multiple linear regression in Python
6. Implementation of Support Vector Machine using Python
7. Implementation of Decision Tree Regression using Python
8. Implementation of Random Forest classification using Python
9. Implementation of Random Forest regression using Python
10. Implementation of Logistic Regression using Python
11. Implementation of KNN regression in Python
12. Implementation of clustering with k-means in Python
13. Implementation of agglomerative hierarchical clustering in Python
14. Implementation of Naïve Bayes using Python
15. Implementation of hierarchical clustering using Python
16. Implementation of Ridge and Lasso Regression using Python
17. Implementation of DBSCAN using Python
18. Implementation of K-Means clustering using Python
Practical: 1
AIM: Implementation of vector algebra using Python.

# import the necessary libraries
import numpy as np

# 1. create two vectors
a = np.array([2, 3, 7])
b = np.array([1, 2, 3])

# 2. vector addition
print(a + b)              # Output: [ 3  5 10]

# 3. vector subtraction
print(a - b)              # Output: [1 1 4]

# 4. element-wise vector multiplication
print(a * b)              # Output: [ 2  6 21]

# 5. element-wise vector division
print(a / b)              # Output: [2.  1.5  2.33333333]

# 6. vector scalar multiplication
print(5 * a)              # Output: [10 15 35]

# 7. element-wise vector exponentiation
print(a ** b)             # Output: [  2   9 343]

# 8. vector dot product
print(np.dot(a, b))       # Output: 29

# 9. vector cross product
print(np.cross(a, b))     # Output: [-5  1  1]

# 10. vector norm
print(np.linalg.norm(a))  # Output: 7.874007874011811 (i.e., sqrt(62))
Practical: 2
AIM: Implementation of basic matrix algebra using Python.

# 1. create two matrices
import numpy as np
a = np.array([[1, 3, 5], [2, 4, 7], [4, 9, 2]])
b = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(a)
print(b)

# 2. matrix addition
print(np.add(a, b))       # Output: [[ 2  5  8] [ 6  9 13] [11 17 11]]

# 3. matrix subtraction
print(np.subtract(a, b))  # Output: [[ 0  1  2] [-2 -1  1] [-3  1 -7]]

# 4. matrix scalar multiplication
print(np.multiply(5, a))  # Output: [[ 5 15 25] [10 20 35] [20 45 10]]

# 5. matrix-vector multiplication
v = np.array([[1], [3], [3]])
print(np.dot(a, v))       # Output: [[25] [35] [37]]

# 6. matrix multiplication (three equivalent forms)
print(np.matmul(a, b))
print(np.dot(a, b))
result = a @ b
print(result)             # Output: [[48 57 66] [67 80 93] [54 69 84]]

# 7. determinant
print(np.linalg.det(a))   # Output: 26.999999999999996 (i.e., 27 up to floating-point error)

# 8. transpose
print(np.transpose(a))    # Output: [[1 2 4] [3 4 9] [5 7 2]]

# 9. inverse (a @ np.linalg.inv(a) gives the identity matrix)
print(np.linalg.inv(a))   # Output: approximately [[-2.037  1.444  0.037] [ 0.889 -0.667  0.111] [ 0.074  0.111 -0.074]]
Practical -3
Aim: Implementation of various data preprocessing steps in Python.
Description:-
1) Importing the libraries
2) Importing the dataset
3) Taking care of missing data
4) Encoding the categorical data
5) Feature scaling (normalization and standardization)
# importing the dataset
import pandas as pd
data = pd.read_csv("employees.csv")
data

# checking missing data
data.isnull()

# check the number of null values in each column
data.isnull().sum()

# total number of null values in the dataset
data.isnull().sum().sum()

# total number of not-null values in the dataset
data.notnull().sum().sum()

# drop all rows containing missing values
data.dropna()

# drop null values from a particular column
data["Gender"].dropna()

# dropping a column
data.drop("Gender", axis=1)

# dropping a row
data.drop(0, axis=0)

# using dropna followed by any()/all()
data.dropna().any()
data.dropna().all()

# dropping rows on the basis of a particular keyword
data[data["Team"].str.contains("Marketing") == False]

# using fillna
import numpy as np
data.fillna(50)
# data.fillna(method='pad')                          # forward fill
# data['Team'].fillna(method='bfill', inplace=True)  # backward fill
data.replace(to_replace=np.nan, value=50)
data.head(5)
# filling missing values with the column mean
dict = {'FirstScore': [100, 90, np.nan, 95],
        'SecondScore': [30, 45, 56, np.nan],
        'ThirdScore': [np.nan, 40, 80, 98]}

# creating a DataFrame from the dictionary
df = pd.DataFrame(dict)
m = df['FirstScore'].mean()
df['FirstScore'].fillna(m)

# filling missing values by linear interpolation
df.interpolate(method='linear', limit_direction='forward')
Encoding the Categorical Data
Description:- Encoding categorical data is a common task in machine learning and data analysis, especially when working with algorithms that require numerical input.
• One-hot encoding (1/0 form)
Description:- This method creates a binary column for each category and assigns a 1 or 0 to indicate the presence or absence of that category. For example, if you have the categories "red," "green," and "blue," one-hot encoding would create three columns: "red," "green," and "blue."
# encoding categorical data
import category_encoders as ce
dict = {'City': ['Delhi', 'Chennai', 'bangalore', 'Hyderabad', 'Jammu']}
df = pd.DataFrame(dict)
df
encoder = ce.OneHotEncoder(cols="City", handle_unknown=True)
encoded_data = encoder.fit_transform(df)
encoded_data

# one-hot encoding with pandas
OneHot = pd.get_dummies(df["City"])
OneHot
Merge = pd.concat([df, OneHot], axis=1)
Merge.drop(["City"], axis=1)
• Dummy encoding
Description: - Similar to one-hot encoding but drops one of the columns to avoid
multicollinearity. This is often used when building linear models to avoid
redundancy in the encoded variables.
# dummy encoding
dummy = pd.get_dummies(df["City"])
dummy = dummy.drop("Delhi", axis=1)
dummy

# effect encoding (also called deviation or sum encoding)
effect = ce.sum_coding.SumEncoder(cols=["City"])
encoded_data = effect.fit_transform(df)
encoded_data
• Label encoding
Description:- This involves assigning a unique integer to each category. For
example, if you have categories like "red," "green," and "blue," you could encode
them as 0, 1, and 2, respectively.
# label encoding
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
dict = {'City': ['Delhi', 'Chennai', 'bangalore', 'Chennai', 'Hyderabad', 'Jammu']}
df1 = pd.DataFrame(dict)
df1
df1["City_Label"] = le.fit_transform(df1["City"])
df1
• Ordinal encoding
Description:- This is suitable when there's an inherent order or hierarchy among the
categories. For instance, if you have categories like "low," "medium," and "high,"
you can assign them ordinal values like 1, 2, and 3, respectively.
# ordinal encoding
from sklearn.preprocessing import OrdinalEncoder
encoder = OrdinalEncoder()
dict = {'T-Shirt size': ['Small', 'Medium', 'Large']}
df2 = pd.DataFrame(dict)
df2
encoded_data = encoder.fit_transform(df2)
encoded_data
• Binary encoding
Description:- This method converts categories into binary digits and then splits those digits into separate columns. It reduces the number of columns compared to one-hot encoding while still preserving the information.

# binary encoding
from category_encoders import BinaryEncoder
BE = BinaryEncoder()
encoded_data = BE.fit_transform(df2)
encoded_data
• Count encoding
Description:- Count encoding is a method used to transform categorical variables
into numerical representations based on the frequency of each category in the
dataset. It replaces each category with the number of times it appears in the dataset.
This encoding technique is particularly useful for high-cardinality categorical
variables, where one-hot encoding might lead to a high-dimensional sparse matrix.
# count encoding
from category_encoders import CountEncoder
CE = CountEncoder()
df3 = pd.DataFrame({'fruits': ['Apple', 'banana', 'Cherry', 'Apple', 'Cherry']})
df3
# the pandas equivalent: map each category to its frequency
a = df3['fruits'].value_counts()
df3['fruits'].map(a)
encoded_data = CE.fit_transform(df3)
encoded_data
• BaseN encoding
Description:- BaseN encoding first converts each category to an ordinal integer and then represents that integer in a chosen base system, such as binary, octal, or hexadecimal, splitting the digits into separate columns. With base 2 it reduces to binary encoding; higher bases produce fewer columns.

# baseN encoding
import category_encoders as ce
dict = {'City': ['Delhi', 'Chennai', 'bangalore', 'Chennai', 'Hyderabad', 'Jammu']}
df1 = pd.DataFrame(dict)
df1
encoder = ce.BaseNEncoder(cols=['City'], base=5, return_df=True)
encoded_data = encoder.fit_transform(df1)
encoded_data
• Target encoding
Description:- Target encoding, also known as mean encoding or likelihood encoding, is a method used to encode categorical variables into numerical representations based on the target variable. In target encoding, each category of the categorical variable is replaced with the mean (or another statistic) of the target variable for that category.

# target encoding
import pandas as pd
import category_encoders as ce
car1 = {"cars": "bmw", "price": 20}
car2 = {"cars": "audi", "price": 30}
records = []
for i in range(10000):
    records.append(car1)
for i in range(10000):
    records.append(car2)
df = pd.DataFrame(records)
df
ce.TargetEncoder().fit_transform(df["cars"], df['price'])
➢ Feature scaling
Feature scaling is a preprocessing technique used in machine learning to
standardize the range of independent variables or features of the dataset. It ensures
that all features have the same scale, which can be crucial for certain algorithms to
perform effectively, particularly those based on distance calculations or gradient
descent optimization.
Min-Max Scaling (Normalization)
Description:- Min-max scaling, also known as normalization, is a technique used in
data preprocessing to scale numeric features to a specific range. This process
involves transforming the data such that it falls within a pre-defined interval,
typically [0, 1] or [-1, 1].
# min-max scaling
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
dict = {'weight in grams': [500, 400, 300, 700, 800],
        'price in dollars': [10, 8, 5, 12, 15]}
df5 = pd.DataFrame(dict)
df5

# scaling a full dataset
import pandas as pd
data = pd.read_csv("pima-indians-diabetes.data.csv")
df6 = pd.DataFrame(data)
scaled_data = scaler.fit_transform(df6)
# labels = ('a','b','c','d','e','f','g','h','i')
frame = pd.DataFrame(scaled_data, columns=df6.columns)
frame
Standardization
Description:- Standardization, also known as z-score normalization, is another data
preprocessing technique used to scale numeric features. Unlike min-max scaling,
standardization does not bound the data to a specific range like [0, 1] or [-1, 1].
Instead, it centers the data around the mean and scales it based on the standard
deviation. This results in transformed data with a mean of 0 and a standard
deviation of 1.
from sklearn.preprocessing import StandardScaler
rescaled_data = StandardScaler().fit_transform(df6)
print(rescaled_data)
PRACTICAL-4
AIM: Implementation of Simple linear regression in Python.
Simple linear regression: aims to find a linear relationship that describes the correlation between one independent variable and one dependent variable.
import pandas as pd
import matplotlib.pyplot as plt
d = pd.read_csv("/content/placement and cgpa.csv")
print(d)

# scatter plot of the raw data
plt.scatter(d['cgpa'], d['package'])
plt.xlabel('cgpa')
plt.ylabel('package')
x = d.iloc[:, 0:1]
print(x)
y = d.iloc[:, 1:2]
print(y)

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
print(x_train)
print(x_test)
print(y_train)
MAKING OUR MODEL: WE ALWAYS USE THE TRAINING DATASET, AND THIS IS WHERE OUR MACHINE ACTUALLY LEARNS!
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(x_train, y_train)
PREDICTIONS ON TRAINING DATASET
import pandas as pd
predictions = lr.predict(x_train)
pre_d = pd.DataFrame(predictions, columns=['predictions'])
print(pre_d)
print(x_train)
print(y_train)
VISUALIZATION ON TRAINING DATASET
plt.scatter(d['cgpa'], d['package'])
plt.plot(x_train, lr.predict(x_train), color='red')
plt.xlabel('cgpa')
plt.ylabel('package (in lpa)')
PREDICTIONS ON TEST DATASET
lr.predict(x_test.iloc[0].values.reshape(1, 1))

# Visualization on Test Dataset
plt.scatter(d['cgpa'], d['package'])
plt.plot(x_test, lr.predict(x_test), color='green')  # predicted package for each cgpa value in x_test
plt.xlabel('CGPA')
plt.ylabel('Package (in lpa)')
# DOING RANDOM PREDICTIONS FOR TESTING OUR MODEL
m = lr.coef_
print(m)
b = lr.intercept_
print(b)
y = m * 3.58 + b   # predicted package for cgpa = 3.58
print(y)
EVALUATION METRICS
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
y_pred = lr.predict(x_test)
print(y_pred)
y_test.values

print("MAE", mean_absolute_error(y_test, y_pred))
print("MSE", mean_squared_error(y_test, y_pred))
print("RMSE", np.sqrt(mean_squared_error(y_test, y_pred)))
print("R2 Score", r2_score(y_test, y_pred))

# Assuming y_test and y_pred are your actual and predicted values respectively
# Calculate Mean Squared Error (MSE)
mse = mean_squared_error(y_test, y_pred)
# Calculate the variance of the actual target values
variance_y_test = np.var(y_test)
# Calculate Relative MSE
relative_mse = mse / variance_y_test
print("Relative MSE:", relative_mse)

# Coefficient of variation of the RMSE
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(rmse)
mean_y_test = np.mean(y_test)
mean_y_test
CV = rmse / mean_y_test
CV
PRACTICAL-5
Aim: Implementation of Multiple linear Regression in Python
Multiple Linear Regression is one of the important regression algorithms; it models the linear relationship between a single dependent continuous variable and more than one independent variable. Example: prediction of CO2 emission based on the engine size and the number of cylinders in a car.

import pandas as pd
import matplotlib.pyplot as plt
d = pd.read_csv('50_Startups.csv')
d.head()

x = d.iloc[:, 0:4]
print(x)
y = d.iloc[:, 4:5]
print(y)
plt.scatter(d['R&D Spend'], d['Profit'])
plt.xlabel('R&D Spend')
plt.ylabel('Profit')

plt.scatter(d['Administration'], d['Profit'])
plt.xlabel('Administration')
plt.ylabel('Profit')

plt.scatter(d['Marketing Spend'], d['Profit'])
plt.xlabel('Marketing Spend')
plt.ylabel('Profit')

plt.scatter(d['State'], d['Profit'])
plt.xlabel('State')
plt.ylabel('Profit')
Handling Categorical Variables
import pandas as pd
dp = pd.get_dummies(data=d, drop_first=True)
print(dp)

from sklearn.model_selection import train_test_split
# splitting the data (features come from the dummy-encoded frame; 'Profit' is the target)
X = dp.drop('Profit', axis=1)
y = dp['Profit']
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

from sklearn.linear_model import LinearRegression
# creating an object of LinearRegression class
LR = LinearRegression()
# fitting the training data
LR.fit(x_train, y_train)

y_prediction = LR.predict(x_test)
y_prediction

coefficients = LR.coef_
intercept = LR.intercept_
print(coefficients)
print(intercept)
import numpy as np
from sklearn.metrics import r2_score
from sklearn.metrics import mean_squared_error
# computing the accuracy scores
score = r2_score(y_test, y_prediction)
print('r2 score is', score)
print('mean_sqrd_error is ==', mean_squared_error(y_test, y_prediction))
print('root_mean_squared error is ==', np.sqrt(mean_squared_error(y_test, y_prediction)))
Practical-6
Aim: Implementation of Support Vector Regression using Python
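A minimal sketch of Support Vector Regression with scikit-learn, assuming a small hypothetical position-level/salary dataset (the values below are illustrative, not the lab's original data):

# Support Vector Regression sketch (illustrative data)
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler

# hypothetical data: position level vs. salary
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]], dtype=float)
y = np.array([45000, 50000, 60000, 80000, 110000, 150000,
              200000, 300000, 500000, 1000000], dtype=float)

# SVR is sensitive to feature scale, so standardize both X and y
sc_X = StandardScaler()
sc_y = StandardScaler()
X_scaled = sc_X.fit_transform(X)
y_scaled = sc_y.fit_transform(y.reshape(-1, 1)).ravel()

# train an RBF-kernel SVR
regressor = SVR(kernel='rbf')
regressor.fit(X_scaled, y_scaled)

# predict the salary for level 6.5 and map it back to the original scale
pred_scaled = regressor.predict(sc_X.transform([[6.5]]))
print(sc_y.inverse_transform(pred_scaled.reshape(-1, 1)))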
Practical -7
Aim: Implementation of Decision Tree Regression using Python
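A minimal Decision Tree Regression sketch with scikit-learn, assuming hypothetical experience/salary data (not the lab's original dataset):

# Decision Tree Regression sketch (illustrative data)
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# hypothetical data: years of experience vs. salary
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
y = np.array([30000, 35000, 42000, 50000, 61000, 75000, 92000, 110000])

# fit the tree; it learns piecewise-constant predictions by splitting X
regressor = DecisionTreeRegressor(random_state=0)
regressor.fit(X, y)

# predict for an unseen experience value
print(regressor.predict([[5.5]]))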
Practical: 8
Aim: Implementation of Random Forest classification using Python

Step 1: Import the necessary libraries.
import numpy as np
import pandas as pd

Step 2: Load the dataset.
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()
data.data
data.feature_names
data.target
data.target_names

Step 3: Build a DataFrame.
df = pd.DataFrame(np.c_[data.data, data.target], columns=list(data.feature_names) + ['target'])
df.head()
df.tail()
df.shape

Step 4: Split the data.
X = df.iloc[:, 0:-1]
y = df.iloc[:, -1]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print('Shape of X_train = ', X_train.shape)
print('Shape of y_train = ', y_train.shape)
print('Shape of X_test = ', X_test.shape)
print('Shape of y_test = ', y_test.shape)

Step 5: Train the Random Forest classification model.
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(n_estimators=100, criterion='gini')
classifier.fit(X_train, y_train)
classifier.score(X_test, y_test)

Step 6: Predict cancer for a new patient.
patient1 = [17.99, 10.38, 122.8, 1001.0, 0.1184, 0.2776, 0.3001, 0.1471, 0.2419, 0.07871, 1.095,
            0.9053, 8.589, 153.4, 0.006399, 0.04904, 0.05373, 0.01587, 0.03003, 0.006193, 25.38, 17.33,
            184.6, 2019.0, 0.1622, 0.6656, 0.7119, 0.2654, 0.4601, 0.1189]
patient1 = np.array([patient1])
patient1
classifier.predict(patient1)
pred = classifier.predict(patient1)
if pred[0] == 0:
    print('Patient has Cancer (malignant tumor)')
else:
    print('Patient has no Cancer (benign tumor)')
Practical -9
Aim: Implement Random Forest Regression using Python.
Step 1: Import necessary libraries.
Step 2: Load the Height-Age dataset.
Step 3: Separate the dataset into independent and dependent variables.
Step 4: Split the dataset into training and testing sets.
Step 5: Import the Random Forest Regressor.
Step 6: Create a Random Forest Regressor object.
Step 7: Train the model with the training data.
Step 8: Make predictions on the test dataset.
Step 9: Evaluate the model using R-Square.
Step 10: Visualize the Random Forest Regression (a code sketch covering these steps follows).
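A minimal sketch following the steps above, assuming the Height-Age data sits in a CSV with 'Age' and 'Height' columns (the file name and column names are assumptions):

# Step 1: import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Step 2: load the Height-Age dataset (assumed file and column names)
df = pd.read_csv('height_age.csv')

# Step 3: separate independent (Age) and dependent (Height) variables
X = df[['Age']].values
y = df['Height'].values

# Step 4: split the dataset into training and testing sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Steps 5-7: import, create, and train the Random Forest Regressor
from sklearn.ensemble import RandomForestRegressor
regressor = RandomForestRegressor(n_estimators=100, random_state=42)
regressor.fit(X_train, y_train)

# Step 8: make predictions on the test dataset
y_pred = regressor.predict(X_test)

# Step 9: evaluate the model using R-Square
from sklearn.metrics import r2_score
print('R2 Score:', r2_score(y_test, y_pred))

# Step 10: visualize the Random Forest Regression on a fine grid
X_grid = np.arange(X.min(), X.max(), 0.01).reshape(-1, 1)
plt.scatter(X, y, color='blue', label='data')
plt.plot(X_grid, regressor.predict(X_grid), color='red', label='prediction')
plt.xlabel('Age')
plt.ylabel('Height')
plt.legend()
plt.show()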
Practical-10
Aim: Implementation of Logistic Regression using Python

Example 1:
import numpy
# X represents the size of a tumor in centimeters.
X = numpy.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65, 4.92, 4.37, 4.96, 4.52, 3.69, 5.88]).reshape(-1, 1)
# Note: X has to be reshaped into a column from a row for the LogisticRegression() function to work.
# y represents whether or not the tumor is cancerous (0 for "No", 1 for "Yes").
y = numpy.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

from sklearn import linear_model
logr = linear_model.LogisticRegression()
logr.fit(X, y)

# predict if a tumor is cancerous where the size is 3.46cm:
predicted = logr.predict(numpy.array([3.46]).reshape(-1, 1))
print(predicted)
Example 2:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn import preprocessing
from sklearn.metrics import accuracy_score
from matplotlib import pyplot as plt
import seaborn as sns

df = pd.read_csv('creditcard.csv')
df.info()
df.head()
sum(df.duplicated())
df.drop_duplicates(inplace=True)
df.drop('Time', axis=1, inplace=True)

X = df.iloc[:, df.columns != 'Class']
y = df.Class
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=5, stratify=y)

from sklearn.preprocessing import StandardScaler
# Fit the scaler to the training data and transform both the training and test data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model = LogisticRegression()
model.fit(X_train_scaled, y_train)  # training the model

# Make predictions using the trained model
y_pred = model.predict(X_test_scaled)

train_acc = model.score(X_train_scaled, y_train)
print("The Accuracy for Training Set is {}".format(train_acc * 100))
test_acc = accuracy_score(y_test, y_pred)
print("The Accuracy for Test Set is {}".format(test_acc * 100))
Practical-11
Aim: Implementation of KNN Regression using Python

# importing necessary libraries
import pandas as pd
import numpy as np

# gym data: hours of exercise, calories consumed, and weight
# (assumed illustrative values; the original DataValues were not reproduced in the file)
ColumnNames = ['Hours', 'Calories', 'Weight']
DataValues = [[1.0, 2500, 95], [2.0, 2000, 85], [2.5, 1900, 83], [3.0, 1850, 81],
              [3.5, 1600, 80], [4.0, 1500, 78], [5.0, 1500, 77], [5.5, 1600, 80],
              [6.0, 1700, 75], [6.5, 1500, 70]]
gymdata = pd.DataFrame(data=DataValues, columns=ColumnNames)
gymdata.head()

TargetVariable = 'Weight'
Predictors = ['Hours', 'Calories']
X = gymdata[Predictors].values
y = gymdata[TargetVariable].values
print(X)
print(y)

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

from sklearn.neighbors import KNeighborsRegressor
RegModel = KNeighborsRegressor(n_neighbors=2)
print(RegModel)

KNN = RegModel.fit(X_train, y_train)
prediction = KNN.predict(X_test)
print(X_test)

from sklearn import metrics
print('R2 Value:', metrics.r2_score(y_train, KNN.predict(X_train)))
print('Accuracy', 100 - (np.mean(np.abs((y_test - prediction) / y_test)) * 100))

TestingDataResults = pd.DataFrame(data=X_test, columns=Predictors)
TestingDataResults[TargetVariable] = y_test
TestingDataResults['Predicted' + TargetVariable] = prediction
TestingDataResults.head()
Practical-12
Aim: Implementation of Clustering with K-Means using Python.

from sklearn.cluster import KMeans
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from matplotlib import pyplot as plt

df = pd.read_csv("income.csv")
df.head()

plt.scatter(df.Age, df['Income($)'])
plt.xlabel('Age')
plt.ylabel('Income($)')

# cluster the 'Age' and 'Income($)' columns into three groups
km = KMeans(n_clusters=3)
y_predicted = km.fit_predict(df[['Age', 'Income($)']])
# y_predicted now contains the cluster labels
print(y_predicted)

df['cluster'] = y_predicted
df.head()

km.cluster_centers_

df1 = df[df.cluster == 0]
df2 = df[df.cluster == 1]
df3 = df[df.cluster == 2]
plt.scatter(df1.Age, df1['Income($)'], color='green')
plt.scatter(df2.Age, df2['Income($)'], color='red')
plt.scatter(df3.Age, df3['Income($)'], color='black')
plt.scatter(km.cluster_centers_[:, 0], km.cluster_centers_[:, 1], color='purple', marker='*', label='centroid')
plt.xlabel('Age')
plt.ylabel('Income ($)')
plt.legend()

Preprocessing using min-max scaling
scaler = MinMaxScaler()
scaler.fit(df[['Income($)']])
df['Income($)'] = scaler.transform(df[['Income($)']])
scaler.fit(df[['Age']])
df['Age'] = scaler.transform(df[['Age']])
df.head()

plt.scatter(df.Age, df['Income($)'])

km = KMeans(n_clusters=3)
y_predicted = km.fit_predict(df[['Age', 'Income($)']])
y_predicted
df['cluster'] = y_predicted
df.head()
km.cluster_centers_

df1 = df[df.cluster == 0]
df2 = df[df.cluster == 1]
df3 = df[df.cluster == 2]
plt.scatter(df1.Age, df1['Income($)'], color='green')
plt.scatter(df2.Age, df2['Income($)'], color='red')
plt.scatter(df3.Age, df3['Income($)'], color='black')
plt.scatter(km.cluster_centers_[:, 0], km.cluster_centers_[:, 1], color='purple', marker='*', label='centroid')
plt.legend()

Elbow Plot
sse = []
k_rng = range(1, 10)
for k in k_rng:
    km = KMeans(n_clusters=k)
    km.fit(df[['Age', 'Income($)']])
    sse.append(km.inertia_)
plt.xlabel('K')
plt.ylabel('Sum of squared error')
plt.plot(k_rng, sse)
Practical-13
Aim: Implementation of Agglomerative Hierarchical Clustering in Python.

# Importing the libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd
import warnings

# Define a function that triggers a specific warning
def trigger_warning():
    warnings.warn("This is a warning message", Warning)

# Ignore the specific warning using a context manager
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    trigger_warning()

# After the context manager, warnings are not ignored anymore
trigger_warning()

# Importing the dataset
dataset = pd.read_csv('Mall_Customers.csv')
dataset.head()

x = dataset.iloc[:, [3, 4]].values
print(x)

# Finding the optimal number of clusters using the dendrogram
import scipy.cluster.hierarchy as shc
dendro = shc.dendrogram(shc.linkage(x, method="ward"))  # "ward" is the linkage technique
mtp.title("Dendrogram Plot")
mtp.ylabel("Euclidean Distances")
mtp.xlabel("Customers")
mtp.show()

# Training the hierarchical model on the dataset
from sklearn.cluster import AgglomerativeClustering
hc = AgglomerativeClustering(n_clusters=5, metric='euclidean', linkage='ward')
y_pred = hc.fit_predict(x)
y_pred

# Visualizing the clusters
mtp.scatter(x[y_pred == 0, 0], x[y_pred == 0, 1], s=50, c='blue', label='Cluster 1')
mtp.scatter(x[y_pred == 1, 0], x[y_pred == 1, 1], s=100, c='green', label='Cluster 2')
mtp.scatter(x[y_pred == 2, 0], x[y_pred == 2, 1], s=100, c='red', label='Cluster 3')
mtp.scatter(x[y_pred == 3, 0], x[y_pred == 3, 1], s=100, c='cyan', label='Cluster 4')
mtp.scatter(x[y_pred == 4, 0], x[y_pred == 4, 1], s=100, c='magenta', label='Cluster 5')
mtp.title('Clusters of customers')
mtp.xlabel('Annual Income (k$)')
mtp.ylabel('Spending Score (1-100)')
mtp.legend()
mtp.show()
Practical -14
Aim: Implementation of Naïve Bayes in Python.
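A minimal Gaussian Naïve Bayes sketch with scikit-learn, using the Iris dataset as an assumed illustrative example:

# Gaussian Naive Bayes on the Iris dataset (illustrative choice of dataset)
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# load the data and split into train/test sets
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# train the classifier: each feature is modeled with a per-class Gaussian,
# and Bayes' theorem combines the per-feature likelihoods with the class priors
model = GaussianNB()
model.fit(X_train, y_train)

# evaluate on the test set
y_pred = model.predict(X_test)
print('Accuracy:', accuracy_score(y_test, y_pred))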
Practical -15
Aim: Implementation of Hierarchical Clustering in Python.
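A minimal hierarchical clustering sketch using SciPy's linkage and fcluster, assuming small illustrative 2-D data:

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

# illustrative 2-D points (hypothetical data)
X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]])

# build the linkage matrix with Ward's method (bottom-up agglomeration)
Z = linkage(X, method='ward')

# plot the dendrogram to inspect the merge order and distances
dendrogram(Z)
plt.title('Dendrogram')
plt.show()

# cut the tree into 2 flat clusters
labels = fcluster(Z, t=2, criterion='maxclust')
print(labels)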
Practical -16
Aim: Implementation of Ridge and Lasso Regression in Python.
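A minimal Ridge and Lasso sketch with scikit-learn on assumed synthetic regression data:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# synthetic regression data (illustrative)
X, y = make_regression(n_samples=200, n_features=10, noise=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Ridge: the L2 penalty shrinks coefficients toward zero
ridge = Ridge(alpha=1.0)
ridge.fit(X_train, y_train)
print('Ridge R2:', r2_score(y_test, ridge.predict(X_test)))

# Lasso: the L1 penalty can set some coefficients exactly to zero (feature selection)
lasso = Lasso(alpha=1.0)
lasso.fit(X_train, y_train)
print('Lasso R2:', r2_score(y_test, lasso.predict(X_test)))
print('Lasso zero coefficients:', np.sum(lasso.coef_ == 0))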
Practical -17
Aim: Implementation of DBSCAN in Python.
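A minimal DBSCAN sketch with scikit-learn, assuming synthetic two-moons data:

import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.cluster import DBSCAN

# synthetic two-moons data: k-means separates it poorly, DBSCAN handles it well
X, _ = make_moons(n_samples=300, noise=0.05, random_state=42)

# eps is the neighborhood radius; min_samples is the density threshold
db = DBSCAN(eps=0.2, min_samples=5)
labels = db.fit_predict(X)

# the label -1 marks noise points
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print('Clusters found:', n_clusters)
plt.scatter(X[:, 0], X[:, 1], c=labels)
plt.title('DBSCAN clustering')
plt.show()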
Practical -18
Aim: Implementation of K-Means Clustering in Python.
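A minimal K-Means sketch with scikit-learn, assuming synthetic blob data (Practical 12 shows the same algorithm on the income dataset):

import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# synthetic data with three well-separated blobs (illustrative)
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# fit k-means with k=3 and get the cluster assignments
km = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = km.fit_predict(X)

# plot the points colored by cluster, with the centroids marked
plt.scatter(X[:, 0], X[:, 1], c=labels)
plt.scatter(km.cluster_centers_[:, 0], km.cluster_centers_[:, 1],
            marker='*', s=200, color='red', label='centroid')
plt.legend()
plt.show()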