ML Final Record
DEPARTMENT OF ARTIFICIAL INTELLIGENCE & DATA SCIENCE
M2: To enhance the confidence level to develop sustainable solutions to upgrade the society forever.
M3: To empower the students with prerequisite professional skills for enhancing
S.A. ENGINEERING COLLEGE, CHENNAI 600 077
(An Autonomous Institution, Affiliated to Anna University)
DEPARTMENT OF ARTIFICIAL INTELLIGENCE & DATA SCIENCE
ACADEMIC YEAR: 2024-2025 SEMESTER:05
SUBJECT CODE & NAME: CS1702A/MACHINE LEARNING LAB CLASS: III YEAR
SUBJECT HANDLED BY: Mrs. D. Keerthana
COURSE OBJECTIVES:
To understand the concepts of machine learning and the types of problems tackled by machine learning.
To explore the different supervised learning techniques.
To learn different aspects of unsupervised learning and reinforcement learning.
To learn the role of probabilistic methods in machine learning.
To understand the basic concepts of neural networks and deep learning.
COs code | Cognitive Processes | Knowledge | Condition | Criteria | Course outcomes
CS1702A.1 | Implement (Apply) | supervised learning algorithms | Practical Constraint | real world dataset | Implement supervised learning algorithms for real world dataset.
CS1702A.2 | Apply | unsupervised learning algorithms | Practical Constraint | application | Apply the concept of unsupervised learning algorithms for suitable application.
CS1702A.3 | Analyze | probabilistic methods | Practical Constraint | real time application | Analyze appropriate probabilistic methods for real time application.
CS1702A.4 | Analyze | various tools Weka/MATLAB | Practical Constraint | machine learning algorithms | Analyze various tools Weka/MATLAB etc. for implementing machine learning algorithms.
CS1702A.5 | Analyze | Machine Learning algorithms | Practical Constraint | real world problems | Apply, analyze Machine Learning algorithms to solve real world problems.
CS1702A.6 | Implement (Apply) | DNN algorithm | Practical Constraint | classification problem | Implement DNN algorithm for classification problem.
COs code PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 AVERAGE
CS1702A.1 3 2 2 2 3 2 - - - - - - 2
CS1702A.2 3 2 2 2 3 2 - - - - - - 2
CS1702A.3 3 2 2 2 3 2 - - - - - - 2
CS1702A.4 3 2 2 2 3 2 - - - - - - 2
CS1702A.5 3 2 2 2 3 2 - - - - - - 2
CS1702A.6 3 2 2 2 3 2 - - - - - - 2
Average 3 2 2 2 3 2 - - - - - - 2
COs code PSO1 PSO2 PSO3 Average
CS1702A.1 1 1 - 1
CS1702A.2 1 1 - 1
CS1702A.3 1 1 - 1
CS1702A.4 1 1 - 1
CS1702A.5 1 1 - 1
CS1702A.6 1 1 - 1
Average 1 1 - 1
CS1702A.2: Apply the concept of unsupervised learning algorithms for suitable application.
POs Level Justification
PO1 3 Students should understand the basics of unsupervised learning algorithms using their engineering knowledge. So this PO is mapped with Level-3.
PO2 2 By understanding the concept of unsupervised learning algorithms, students can identify and formulate problems using problem analysis. So this PO is moderately mapped with Level-2.
PO3 2 Students need to understand the design of different types of algorithms. So this PO is moderately mapped with Level-2.
PO4 2 Mathematical models need to be analyzed to conduct the experiments on unsupervised learning algorithms. So it is mapped with Level-2.
PO5 3 Students use modern tools to improve the efficiency of the machine learning algorithms. So this PO is mapped with Level-3.
PO6 2 Students can analyze the local and global impact of the types of algorithms. So this PO is mapped with Level-2.
PO7 -
PO8 -
PO9 -
PO10 -
PO11 -
PO12 -
PSO1 1 Unsupervised learning algorithms should be analyzed and efficiently implemented for different applications. So this PSO is mapped with Level-1.
PSO2 1 Students are able to simulate current technology for unsupervised learning algorithms for suitable applications. So this PSO is mapped with Level-1.
PSO3 -
CS1702A.4: Analyze various tools Weka/MATLAB etc. for implementing machine learning algorithms.
POs Level Justification
PO1 3 Students should understand the basics of machine learning algorithms using their engineering knowledge. So this PO is mapped with Level-3.
PO2 2 By understanding the concept of machine learning, students can identify and formulate problems using problem analysis. So this PO is moderately mapped with Level-2.
PO3 2 Students need to understand the design of different types of algorithms. So this PO is moderately mapped with Level-2.
PO4 2 Mathematical models need to be analyzed to conduct the experiment. So it is mapped with Level-2.
PO5 3 Students use modern tools to improve the efficiency of the machine learning algorithms. So this PO is mapped with Level-3.
PO6 2 Students can analyze the local and global impact of the types of algorithms. So this PO is mapped with Level-2.
PO7 -
PO8 -
PO9 -
PO10 -
PO11 -
PO12 -
PSO1 1 Students have the ability to apply programming principles to understand the algorithm and apply the solution to the problem. So this PSO is mapped with Level-1.
PSO2 1 Machine learning algorithms are implemented using different current technologies. So this PSO is mapped with Level-1.
PSO3 -
CS1702A.5: Apply, analyze Machine Learning algorithms to solve real world problems.
POs Level Justification
PO1 3 By understanding basic engineering knowledge, the machine learning algorithm is implemented. So this PO is mapped with Level-3.
PO2 2 By understanding the concept of machine learning, students can identify and formulate the real world problem using problem analysis. So this PO is moderately mapped with Level-2.
PO3 2 Students need to understand the design of different types of algorithms. So this PO is moderately mapped with Level-2.
PO4 2 Mathematical models need to be analyzed to conduct the experiment. So it is mapped with Level-2.
PO5 3 Students use modern tools to improve the efficiency of the machine learning algorithms for real world problems. So this PO is mapped with Level-3.
PO6 2 Students can analyze the impact of the machine learning algorithm. So this PO is mapped with Level-2.
PO7 -
PO8 -
PO9 -
PO10 -
PO11 -
PO12 -
PSO1 1 Students have the ability to identify programming principles to understand the algorithm and apply the solution to the problem. So this PSO is mapped with Level-1.
PSO2 1 Students are able to implement the algorithms with current technologies. So this PSO is mapped with Level-1.
PSO3 -
CS1702A.6: Implement DNN algorithm for classification problem.
POs Level Justification
PO3 2 Students need to understand the design of different types of algorithms. So this PO is moderately mapped with Level-2.
PO4 2 Mathematical models need to be analyzed to conduct the experiment. So it is mapped with Level-2.
PO5 3 Students use modern tools to improve the efficiency of the neural network. So this PO is mapped with Level-3.
PO6 2 Students can analyze the local and global impact of the types of algorithms. So this PO is mapped with Level-2.
PO7 -
PO8 -
PO9 -
PO10 -
PO11 -
PO12 -
PSO1 1 Students have the ability to apply programming principles to understand the algorithm and apply the solution to the problem. So this PSO is mapped with Level-1.
PSO2 1 Students are able to implement the deep neural network with current technologies. So this PSO is mapped with Level-1.
PSO3 -
CS1702A MACHINE LEARNING LABORATORY L T P C
0 0 4 2
LIST OF EXPERIMENTS:
1. Implement the concept of decision trees with suitable data set from real world problem and
classify the data set to produce new sample.
2. Detecting Spam mails using Support vector machine
3. Implement facial recognition application with artificial neural network
4. Implement the non-parametric Locally Weighted Regression algorithm in order to fit data
points. Select appropriate data set for your experiment and draw graphs.
5. Implement character recognition using Multilayer Perceptron
6. Implement the kmeans algorithm
7. Implement the Dimensionality Reduction techniques
8. Write a program to construct a Bayesian network considering medical data. Use this model to
demonstrate the diagnosis of heart patients using standard Heart Disease Data Set. You can use
Java/Python ML library classes/API.
9. Using Weka Tool Perform a. Data preprocessing by selecting or filtering attributes b. Data
preprocessing for handling missing value
TOTAL: 45 PERIODS
COURSE OUTCOMES:
HARDWARE: 30 terminals.
INDEX
S.no | Date | Name of the experiment | CO | Page no | Marks | Signature
1. | | Decision Tree | 1 | 12 | |
5. | | K means Algorithm | 2 | 27 | |
6. | | Character recognition using Multilayer Perceptron | 3 | 29 | |
7. | | Dimensionality reduction techniques | 2 | 33 | |
8. | | Bayesian network | 3 | 37 | |
9. | | Data preprocessing using weka tool | 4 | 41 | |
Content Beyond the syllabus
EXP NO:1
DATE:
DECISION TREES
AIM :
Implement the concept of decision trees with suitable data set from real world problem and
classify the data set to produce new sample.
ALGORITHM:
Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
Step-3: Divide S into subsets that contain the possible values of the best attribute.
Step-4: Generate the decision tree node, which contains the best attribute.
Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3. Continue this process until a stage is reached where the nodes cannot be classified further; the final nodes are called leaf nodes.
PROGRAM:
INPUT (data.csv):
Age,Experience,Rank,Nationality,Go
36,10,9,UK,NO
42,12,4,USA,NO
23,4,6,N,NO
52,4,4,USA,NO
43,21,8,USA,YES
44,14,5,UK,NO
66,3,7,N,YES
35,14,9,UK,YES
52,13,7,N,YES
35,5,9,N,YES
24,3,5,USA,NO
18,3,7,UK,YES
45,9,9,UK,YES
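The PROGRAM section above is blank in this record. The following is a minimal sketch under stated assumptions: scikit-learn's DecisionTreeClassifier stands in for the unspecified implementation, the rows are the same data.csv contents shown above, and the numeric encodings for Nationality/Go and the new sample are illustrative choices, not part of the original record.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Same rows as data.csv above, built inline so the sketch is self-contained.
rows = [
    (36, 10, 9, "UK", "NO"), (42, 12, 4, "USA", "NO"), (23, 4, 6, "N", "NO"),
    (52, 4, 4, "USA", "NO"), (43, 21, 8, "USA", "YES"), (44, 14, 5, "UK", "NO"),
    (66, 3, 7, "N", "YES"), (35, 14, 9, "UK", "YES"), (52, 13, 7, "N", "YES"),
    (35, 5, 9, "N", "YES"), (24, 3, 5, "USA", "NO"), (18, 3, 7, "UK", "YES"),
    (45, 9, 9, "UK", "YES"),
]
df = pd.DataFrame(rows, columns=["Age", "Experience", "Rank", "Nationality", "Go"])

# Encode the categorical columns as numbers so the tree can split on them
# (mapping values are arbitrary illustrative choices).
df["Nationality"] = df["Nationality"].map({"UK": 0, "USA": 1, "N": 2})
df["Go"] = df["Go"].map({"NO": 0, "YES": 1})

X = df[["Age", "Experience", "Rank", "Nationality"]]
y = df["Go"]

# Entropy-based splitting corresponds to the ASM step of the algorithm.
tree = DecisionTreeClassifier(criterion="entropy", random_state=0)
tree.fit(X, y)

# Classify a new (hypothetical) sample: a 40-year-old UK candidate,
# 10 years of experience, rank 7.
new_sample = pd.DataFrame([[40, 10, 7, 0]], columns=X.columns)
print("Prediction:", tree.predict(new_sample)[0])
```

With a dataset this small the tree fits the training data exactly; the printed prediction (0 = NO, 1 = YES) classifies the new sample.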
OUTPUT:
RESULT:
Thus the decision tree is implemented successfully.
EXP NO:2
DATE:
DETECTING SPAM MAILS
AIM:
Implement Detecting Spam mails using Support vector machine.
ALGORITHM:
1. Import the modules.
2. Read the spam mails dataset using pandas.
3. Remove the duplicate rows, keeping only the first one, and use a Label Encoder to assign numbers to the labels:
   1. not spam
   2. spam
4. Create functions to extract important features from the text, removing punctuation and stop words like "the", "he", "she", etc. Stop words contribute to the formation of sentences but are not useful in detecting whether a message is spam or not.
5. Use the train_test_split function to split the dataset into training and testing sets.
6. Create the SVM model using the inbuilt class of scikit-learn.
7. Finally, predict the result. Pickle can be used to save the trained model, which is later used in a Tkinter GUI.
PROGRAM:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn import svm

spam = pd.read_csv('data.csv')
z = spam['EmailText']
y = spam["Label"]
z_train, z_test, y_train, y_test = train_test_split(z, y, test_size=0.2)

# Convert the mail text into count features and train the SVM
cv = CountVectorizer()
features = cv.fit_transform(z_train)
model = svm.SVC()
model.fit(features, y_train)

features_test = cv.transform(z_test)
print("Accuracy: {}".format(model.score(features_test, y_test)))

import pickle
from tkinter import *

def check_spam():
    text = spam_text_Entry.get()
    with open('data.csv') as file:
        contents = file.read()
    if text in contents:
        print(text, "text is spam")
        my_string_var.set("Result: text is spam")
    else:
        print(text, "text not a spam")
        my_string_var.set("Result: text not a spam")

win = Tk()
win.geometry("400x600")
win.configure(background="cyan")
win.title("Email Spam Detector")
title = Label(win, text="Email Spam Detector", bg="gray", width="300", height="2",
              fg="white", font=("Calibri 20 bold italic underline")).pack()
spam_text = Label(win, text="Enter your Text: ", bg="cyan",
                  font=("Verdana 12")).place(x=12, y=100)
spam_text_Entry = Entry(win, width=33)
spam_text_Entry.place(x=155, y=105)
my_string_var = StringVar()
my_string_var.set("Result: ")
print_spam = Label(win, textvariable=my_string_var, bg="cyan",
                   font=("Verdana 12")).place(x=12, y=200)
Button = Button(win, text="Submit", width="12", height="1", activebackground="red",
                bg="Pink", command=check_spam,
                font=("Verdana 12")).place(x=12, y=150)
win.mainloop()
Input (data.csv):
EmailText,Label
sale,spam
gasssss,ham
huge,spam
tint,spam
ginger,spam
OUTPUT:
RESULT:
Spam mails have been successfully detected using SVM.
EXP NO:3
DATE:
FACIAL RECOGNITION APPLICATION
AIM:
Implement a facial recognition (face detection) application.
ALGORITHM:
1. Import the OpenCV library.
2. Load the Haar cascade classifier for frontal faces.
3. Read the input image and convert it to grayscale.
4. Detect faces in the grayscale image using detectMultiScale.
5. Draw a rectangle around each detected face and display the image.
PROGRAM:
import cv2
face_cascade = cv2.CascadeClassifier('C:\\Users\\Admin\\AppData\\Local\\Programs\\Python\\Python310\\Lib\\site-packages\\cv2\\data\\haarcascade_frontalface_default.xml')
img = cv2.imread('C:\\Users\\Admin\\Desktop\\New notes\\LAB\\kalam.png')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)
cv2.imshow('img', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
INPUT (kalam.png):
OUTPUT:
RESULT:
Thus the Face recognition application is implemented successfully.
EXP NO:4
DATE:
REGRESSION ALGORITHM
AIM:
Implement the non-parametric Locally Weighted Regression algorithm in order to fit data
points. Select appropriate data set for your experiment and draw graphs.
ALGORITHM:
1. Read the given data sample into X and the target curve (linear or non-linear) into Y.
2. Set the value of the smoothing (free) parameter τ.
3. Set the point of interest x0, drawn from the range of X.
4. Determine the diagonal weight matrix W using:
   W(i, i) = exp( -(x_i - x0)(x_i - x0)^T / (2τ²) )
5. Determine the model parameter β using:
   β = (XᵀWX)⁻¹ (XᵀWY)
6. Prediction = x0 · β
PROGRAM:
import numpy as np               # linear algebra
import pandas as pd              # data processing
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

class LocallyWeightedRegression:
    # Maths behind the model: theta = inv(X.T*W*X)*(X.T*W*Y); a separate
    # theta is learnt for each query point.

    # initializer that stores tau as a parameter
    def __init__(self, tau=0.01):
        self.tau = tau

    # builds the diagonal weight matrix for a given query point
    def kernel(self, query_point, X):
        Weight_matrix = np.mat(np.eye(len(X)))
        for idx in range(len(X)):
            Weight_matrix[idx, idx] = np.exp(
                np.dot(X[idx] - query_point, (X[idx] - query_point).T) /
                (-2 * self.tau * self.tau))
        return Weight_matrix

    # makes the prediction of the output for a given query point
    def predict(self, X, Y, query_point):
        q = np.mat([query_point, 1])
        X = np.hstack((X, np.ones((len(X), 1))))
        W = self.kernel(q, X)
        theta = np.linalg.pinv(X.T * (W * X)) * (X.T * (W * Y))
        pred = np.dot(q, theta)
        return pred

    # fits and predicts the output of all query points
    def fit_and_predict(self, X, Y):
        Y_test, X_test = [], np.linspace(-np.max(X), np.max(X), len(X))
        for x in X_test:
            pred = self.predict(X, Y, x)
            Y_test.append(pred[0][0])
        Y_test = np.array(Y_test)
        return Y_test

    # computes the RMSE score
    def score(self, Y, Y_pred):
        return np.sqrt(np.mean((Y - Y_pred) ** 2))

    # fits and shows the scatter plot of all points
    def fit_and_show(self, X, Y):
        Y_test, X_test = [], np.linspace(-np.max(X), np.max(X), len(X))
        for x in X_test:
            pred = self.predict(X, Y, x)
            Y_test.append(pred[0][0])
        Y_test = np.array(Y_test)
        plt.style.use('seaborn')
        plt.title("The scatter plot for the value of tau = %.5f" % self.tau)
        plt.scatter(X, Y, color='red')
        plt.scatter(X_test, Y_test, color='green')
        plt.show()

# reading the csv files of the given dataset
dfx = pd.read_csv('weightedX.csv')
dfy = pd.read_csv('weightedY.csv')

# store the values of the dataframes in numpy arrays
X = dfx.values
Y = dfy.values

# normalising the data values
u = X.mean()
std = X.std()
X = (X - u) / std

tau = 0.2
model = LocallyWeightedRegression(tau)
Y_pred = model.fit_and_predict(X, Y)
model.fit_and_show(X, Y)
INPUT:(weightedX.csv)
1.2421
2.3348
0.13264
2.347
6.7389
3.7089
11.853
-1.8708
4.5025
3.2798
1.7573
3.3784
11.47
9.0595
-2.8174
9.3184
8.4211
0.86215
7.5544
-3.9883
INPUT:(weightedY.csv)
1.1718
1.8824
0.34283
2.1057
1.6477
2.3624
2.1212
-0.79712
2.0311
1.9795
1.471
2.4611
1.9819
1.1203
-1.3701
1.0287
1.3808
1.2178
1.4084
-1.5209
OUTPUT:
RESULT:
Non-parametric Locally Weighted Regression algorithm is implemented successfully.
EXP NO:5
DATE:
K MEANS ALGORITHM
AIM:
Implement the k-means algorithm and use the elbow method to choose the number of clusters.
ALGORITHM:
1. Import the required libraries.
2. Create the sample data points (x, y) and zip them into a list.
3. For k = 1 to 10, fit a KMeans model with k clusters and record its inertia.
4. Plot inertia against the number of clusters (elbow method).
5. Choose the number of clusters at the elbow of the curve.
PROGRAM:
import sys
import matplotlib
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

x = [4, 5, 10, 4, 3, 11, 14, 6, 10, 12]
y = [21, 19, 24, 17, 16, 25, 24, 22, 21, 21]
data = list(zip(x, y))

inertias = []
for i in range(1, 11):
    kmean = KMeans(n_clusters=i)
    kmean.fit(data)
    inertias.append(kmean.inertia_)

plt.plot(range(1, 11), inertias, marker='o')
plt.title('Elbow Method')
plt.xlabel('Number of Clusters')
plt.ylabel('Inertia')
plt.show()
OUTPUT:
RESULT:
Thus the program for the K-Means algorithm has been executed successfully.
EXP NO:6
DATE:
CHARACTER RECOGNITION
AIM :
Implement character recognition using Multilayer Perceptron
ALGORITHM:
1. Load the dataset using pandas.
2. Encode the categorical columns with a Label Encoder.
3. Split the data into the feature matrix X and target y, then into training and test sets.
4. Train a Multilayer Perceptron classifier on the training set.
5. Predict on the test set and calculate the accuracy.
PROGRAM:
import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Load data
data = pd.read_csv('HR_comma_sep.csv')
data.head()

# Creating labelEncoder and encoding the categorical columns
le = preprocessing.LabelEncoder()
data['Departments'] = le.fit_transform(data['Departments'])
data['salary'] = le.fit_transform(data['salary'])

# Splitting data into features and target
X = data[['satisfaction_level', 'last_evaluation', 'number_project',
          'average_montly_hours', 'time_spend_company', 'Work_accident',
          'promotion_last_5years', 'Departments', 'salary']]
y = data['left']
# (split ratio assumed; the original record omits this step)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

# Training the Multilayer Perceptron; verbose=True prints the per-iteration
# losses shown in the OUTPUT below
clf = MLPClassifier(verbose=True)
clf.fit(X_train, y_train)
ypred = clf.predict(X_test)

# Calculate accuracy
accuracy_score(y_test, ypred)
INPUT :(HR_comma_sep.csv)
satisfaction_level,last_evaluation,number_project,average_montly_hours,
time_spend_company,Work_accident,left,promotion_last_5years,Departments, salary
0.38,0.53,2,157,3,0,1,0,sales,low
0.80,0.86,5,262,6,0,1,0,sales,medium
0.11,0.88,7,272,4,0,1,0, sales,medium
0.72,0.87,5,223,5,0,1,0, sales,low
0.37,0.52,2,159,3,0,1,0,sales,low
OUTPUT:
C:\Users\Administrator\PycharmProjects\veena\venv\Scripts\python.exe C:\Users\Administrator\PycharmProjects\veena\percept.py
Iteration 1, loss = 0.00456078
Iteration 2, loss = 0.00055361
Iteration 3, loss = 0.00029325
Iteration 4, loss = 0.00025340
Iteration 5, loss = 0.00024135
Iteration 6, loss = 0.00023436
Iteration 7, loss = 0.00022864
Iteration 8, loss = 0.00022336
Iteration 9, loss = 0.00021831
Iteration 10, loss = 0.00021343
Iteration 11, loss = 0.00020870
Iteration 12, loss = 0.00020411
Iteration 13, loss = 0.00019966
Iteration 14, loss = 0.00019535
Training loss did not improve more than tol=0.000100 for 10 consecutive epochs. Stopping.
RESULT:
Thus character recognition using a Multilayer Perceptron has been implemented successfully.
EXP NO:7
DATE:
DIMENSIONALITY REDUCTION TECHNIQUES
AIM:
Implement dimensionality reduction techniques (PCA) on a suitable dataset and visualize the result.
ALGORITHM:
1. Import the required Python libraries.
2. Load the penguins dataset and drop the rows with missing values.
3. Select the numeric columns as the feature data.
4. Standardize the features and apply PCA with two components.
5. Plot the first two principal components, colored by species.
PROGRAM:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.decomposition import PCA, KernelPCA, FastICA, NMF, FactorAnalysis

penguins = sns.load_dataset("penguins")
penguins = penguins.dropna()
penguins.head()

data = penguins.select_dtypes(np.number)
data.head()

random_state = 0
pca_pl = make_pipeline(StandardScaler(), PCA(n_components=2, random_state=random_state))
pcs = pca_pl.fit_transform(data)
pcs[0:5, :]

pcs_df = pd.DataFrame(data=pcs, columns=['PC1', 'PC2'])
pcs_df['Species'] = penguins.species.values
pcs_df['Sex'] = penguins.sex.values
pcs_df.head()

plt.figure(figsize=(12, 10))
with sns.plotting_context("talk", font_scale=1.25):
    sns.scatterplot(x="PC1", y="PC2", data=pcs_df, hue="Species", style="Sex", s=100)
    plt.xlabel("PC1")
    plt.ylabel("PC2")
    plt.title("PCA", size=24)
    plt.savefig("PCA_Example_in_Python.png", format='png', dpi=75)
plt.show()
INPUT:
OUTPUT:
RESULT:
Thus the Dimensionality reduction for image is successfully performed
EXP NO:8
DATE:
BAYESIAN NETWORK
AIM:
Construct a Bayesian network considering medical data. Use this model to demonstrate the
diagnosis of heart patients using a standard Heart Disease Data Set.
ALGORITHM:
1. age: age in years
2. sex: sex (1 = male; 0 = female)
3. cp: chest pain type
   - Value 1: typical angina
   - Value 2: atypical angina
   - Value 3: non-anginal pain
   - Value 4: asymptomatic
4. trestbps: resting blood pressure (in mm Hg on admission to the hospital)
5. chol: serum cholesterol in mg/dl
6. fbs: fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
7. restecg: resting electrocardiographic results
   - Value 0: normal
   - Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV)
   - Value 2: showing probable or definite left ventricular hypertrophy by Estes' criteria
8. thalach: maximum heart rate achieved
9. exang: exercise induced angina (1 = yes; 0 = no)
10. oldpeak: ST depression induced by exercise relative to rest
11. slope: the slope of the peak exercise ST segment
    - Value 1: upsloping
    - Value 2: flat
    - Value 3: downsloping
12. thal: 3 = normal; 6 = fixed defect; 7 = reversible defect
13. Heartdisease: integer valued from 0 (no presence) to 4
40
PROGRAM:
import numpy as np
import csv
import pandas as pd
from pgmpy.models import BayesianModel
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination

heartDisease = pd.read_csv('heart.csv')
heartDisease = heartDisease.replace('?', np.nan)
print(heartDisease.head())

model = BayesianModel([('age', 'trestbps'), ('age', 'fbs'),
                       ('sex', 'trestbps'), ('exang', 'trestbps'),
                       ('trestbps', 'heartdisease'), ('fbs', 'heartdisease'),
                       ('heartdisease', 'restecg'), ('heartdisease', 'thalach'),
                       ('heartdisease', 'chol')])
model.fit(heartDisease, estimator=MaximumLikelihoodEstimator)

HeartDisease_infer = VariableElimination(model)
q = HeartDisease_infer.query(variables=['heartdisease'], evidence={'age': 28})
print(q['heartdisease'])
q = HeartDisease_infer.query(variables=['heartdisease'], evidence={'chol': 100})
print(q['heartdisease'])
OUTPUT:
0 63 1 1 145 ... 3 0 6 0
1 67 1 4 160 ... 2 3 3 2
2 67 1 4 120 ... 2 2 7 1
3 37 1 3 130 ... 3 0 3 0
4 41 0 2 130 ... 1 0 3 0
[5 rows x 14 columns]
╒════════════════╤═════════════════════╕
│ heartdisease │ phi(heartdisease) │
╞════════════════╪═════════════════════╡
│ heartdisease_0 │ 0.6791 │
├────────────────┼─────────────────────┤
│ heartdisease_1 │ 0.1212 │
├────────────────┼─────────────────────┤
│ heartdisease_2 │ 0.0810 │
├────────────────┼─────────────────────┤
│ heartdisease_3 │ 0.0939 │
├────────────────┼─────────────────────┤
│ heartdisease_4 │ 0.0247 │
╘════════════════╧═════════════════════╛
╒════════════════╤═════════════════════╕
│ heartdisease │ phi(heartdisease) │
╞════════════════╪═════════════════════╡
│ heartdisease_0 │ 0.5400 │
├────────────────┼─────────────────────┤
│ heartdisease_1 │ 0.1533 │
├────────────────┼─────────────────────┤
│ heartdisease_2 │ 0.1303 │
├────────────────┼─────────────────────┤
│ heartdisease_3 │ 0.1259 │
├────────────────┼─────────────────────┤
│ heartdisease_4 │ 0.0506 │
╘════════════════╧═════════════════════╛
RESULT:
Thus Bayesian Network on Medical Data(Heart Disease Dataset) is successfully Implemented.
EXP NO:9
DATE:
DATA PREPROCESSING
AIM :
Using Weka Tool Perform a. Data preprocessing by selecting or filtering attributes b. Data
preprocessing for handling missing value
ALGORITHM:
1. Loading the data: use the Open file... option under the Preprocess tab and select the weather-nominal.arff file. The weather database contains five fields: outlook, temperature, humidity, windy and play.
2. Selecting attributes: select an attribute from the list by clicking on it; further details of the attribute are then displayed. The table underneath this information shows the nominal values of the attribute, along with the count and weight (as a percentage) for each nominal value.
3. Removing attributes: often the data used for model building comes with many irrelevant fields. For example, the customer database may contain his mobile number; such attributes can be removed before building the model.
4. Applying filters: some machine learning techniques, such as association rule mining, require nominal attributes. Numeric attributes are converted to nominal by applying a filter on the raw data. Click the Choose button in the Filter subwindow and select the following filter:
weka → filters → supervised → attribute → Discretize
To select the best attributes for deciding the play, select and apply the following filter:
weka → filters → supervised → attribute → AttributeSelection
The problem used for the missing-value part of this experiment is the Pima Indians onset of diabetes dataset. It is a classification problem where each instance represents medical details for one patient, and the task is to predict whether the patient will have an onset of diabetes within the next five years.
OUTPUT:
RESULT:
Data preprocessing by selecting or filtering attributes, Data preprocessing for handling missing
value operations were performed using Weka tool.
EXP NO:10
DATE:
RANDOM FOREST ALGORITHM
AIM :
Construct a tree using the random forest algorithm by applying the ensemble technique
for the given data set.
ALGORITHM:
1. Import the libraries and load the dataset (user_data.csv).
2. Pre-process the data: split it into training and test sets and apply feature scaling.
3. Fit the Random Forest classifier to the training set.
4. Predict the test set results and build the confusion matrix.
5. Visualize the training and test set results.
PROGRAM:
# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd
#importing datasets
data_set= pd.read_csv('user_data.csv')
#feature Scaling
from sklearn.preprocessing import StandardScaler
st_x= StandardScaler()
x_train= st_x.fit_transform(x_train)
x_test= st_x.transform(x_test)
In the above code, we have pre-processed the data: the dataset is loaded, split into training and test sets, and scaled. The dataset is given as:
2. Fitting the Random Forest algorithm to the training set:
Now we will fit the Random forest algorithm to the training set. To fit it, we will import
the RandomForestClassifier class from the sklearn.ensemble library. The code is given below:
- n_estimators: the required number of trees in the Random Forest. The default value is 10. We can choose any number, but we need to take care of the overfitting issue.
- criterion: a function to analyze the accuracy of the split. Here we have taken "entropy" for the information gain.
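The fitting code referred to above is not reproduced in this record. The following is a minimal self-contained sketch under stated assumptions: synthetic data from make_classification stands in for user_data.csv (which is not included here), while n_estimators=10 and criterion="entropy" follow the description above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# Stand-in for the two features (Age, EstimatedSalary) of user_data.csv.
x, y = make_classification(n_samples=400, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25,
                                                    random_state=0)

# Fit the Random Forest to the training set with the parameters described above.
classifier = RandomForestClassifier(n_estimators=10, criterion="entropy",
                                    random_state=0)
classifier.fit(x_train, y_train)
```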
Output:
Since our model is fitted to the training set, we can now predict the test result. For prediction, we will create a new prediction vector y_pred. Below is the code for it:
Output:
By checking the above prediction vector and test set real vector, we can determine the incorrect
predictions done by the classifier.
Now we will create the confusion matrix to determine the correct and incorrect predictions. Below is the
code for it:
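The prediction and confusion-matrix code is not reproduced in this record. A self-contained sketch under stated assumptions (synthetic data stands in for user_data.csv; the classifier is refitted here so the snippet runs on its own):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix

# Stand-in data and the fitted classifier from the previous step.
x, y = make_classification(n_samples=400, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25,
                                                    random_state=0)
classifier = RandomForestClassifier(n_estimators=10, criterion="entropy",
                                    random_state=0)
classifier.fit(x_train, y_train)

# Predicting the test set result.
y_pred = classifier.predict(x_test)

# Rows are actual classes, columns are predicted classes; the off-diagonal
# cells count the incorrect predictions.
cm = confusion_matrix(y_test, y_pred)
print(cm)
```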
Output:
As we can see in the above matrix, there are 4+4= 8 incorrect predictions and 64+28= 92 correct
predictions.
Here we will visualize the training set result by plotting a graph for the Random Forest classifier. The classifier will predict Yes or No for the users who have either purchased or not purchased the SUV car, as we did in Logistic Regression. Below is the code for it:
mtp.title('Random Forest Algorithm (Training set)')
mtp.xlabel('Age')
mtp.ylabel('Estimated Salary')
mtp.legend()
mtp.show()
Output:
Now we will visualize the test set result. Below is the code for it:
Output:
RESULT:
Thus the random forest algorithm is successfully implemented.
----------------------------------------- COMPLETED -----------------------------------------