Ml-Final Record

The Department of Artificial Intelligence & Data Science aims to achieve academic excellence through quality education and ethical standards. Its mission includes fostering professionalism, enhancing employability, and empowering students with essential skills. The document outlines the program's educational objectives, outcomes, and specific outcomes related to machine learning, emphasizing the application of various learning techniques and tools.

DEPARTMENT OF ARTIFICIAL INTELLIGENCE & DATA SCIENCE

VISION OF THE DEPARTMENT

To develop the department into a Centre of Academic Excellence by delivering quality education with ethical standards.

MISSION OF THE DEPARTMENT

M1: To create a conducive atmosphere for active professionalism by fortifying academic proficiency with ethical standards.

M2: To build the confidence to develop sustainable solutions that uplift society.

M3: To empower students with the professional skills required for employability, entrepreneurship, continuing education and research.

DEPARTMENT OF ARTIFICIAL INTELLIGENCE & DATA SCIENCE

PROGRAMME EDUCATIONAL OBJECTIVES (PEOS)

PEO 1: Graduates shall have professional competency in the field of Computer Science and Engineering for pursuing higher education, research or entrepreneurship.

PEO 2: Graduates shall work in a business environment with the ethical standards, leadership qualities and communication skills necessary for engineering practice.

PEO 3: Graduates shall adapt to emerging technologies and respond to the challenges of the environment and society.

DEPARTMENT OF ARTIFICIAL INTELLIGENCE & DATA SCIENCE

PROGRAMME OUTCOMES (POs)


Engineering Graduates will be able to:
1. Engineering knowledge: Apply the knowledge of mathematics, science, engineering fundamentals,
and an engineering specialization to the solution of complex engineering problems.
2. Problem analysis: Identify, formulate, review research literature, and analyze complex engineering
problems reaching substantiated conclusions using first principles of mathematics, natural sciences,
and engineering sciences.
3. Design/development of solutions: Design solutions for complex engineering problems and design
system components or processes that meet the specified needs with appropriate consideration for the
public health and safety, and the cultural, societal, and environmental considerations.
4. Conduct investigations of complex problems: Use research-based knowledge and research methods
including design of experiments, analysis and interpretation of data, and synthesis of the information
to provide valid conclusions.
5. Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern
engineering and IT tools including prediction and modelling to complex engineering activities with an
understanding of the limitations.
6. The engineer and society: Apply reasoning informed by the contextual knowledge to assess societal,
health, safety, legal and cultural issues and the consequent responsibilities relevant to the professional
engineering practice.
7. Environment and sustainability: Understand the impact of the professional engineering solutions in
societal and environmental contexts, and demonstrate the knowledge of, and need for sustainable
development.
8. Ethics: Apply ethical principles and commit to professional ethics and responsibilities and norms of
the engineering practice.
9. Individual and team work: Function effectively as an individual, and as a member or leader in
diverse teams, and in multidisciplinary settings.
10. Communication: Communicate effectively on complex engineering activities with the engineering
community and with society at large, such as, being able to comprehend and write effective reports
and design documentation, make effective presentations, and give and receive clear instructions.
11. Project management and finance: Demonstrate knowledge and understanding of the engineering
and management principles and apply these to one’s own work, as a member and leader in a team, to
manage projects and in multidisciplinary environments.
12. Life-long learning: Recognize the need for, and have the preparation and ability to engage in
independent and life-long learning in the broadest context of technological change.

DEPARTMENT OF ARTIFICIAL INTELLIGENCE & DATA SCIENCE

PROGRAMME SPECIFIC OUTCOMES [PSO]

PSO 1: To analyze, design and develop computing solutions by applying foundational concepts of Computer Science and Engineering.

PSO 2: To apply software engineering principles and practices for developing quality software for scientific and business applications.

PSO 3: To adapt to emerging Information and Communication Technologies (ICT) to innovate ideas and solutions to existing/novel problems.

S.A. ENGINEERING COLLEGE, CHENNAI 600 077
(An Autonomous Institution, Affiliated to Anna University)
DEPARTMENT OF ARTIFICIAL INTELLIGENCE & DATA SCIENCE
ACADEMIC YEAR: 2024-2025 SEMESTER:05
SUBJECT CODE & NAME: CS1702A/MACHINE LEARNING LAB CLASS: III YEAR
SUBJECT HANDLED BY: Mrs. D. Keerthana

COURSE OBJECTIVES:

● To understand the concepts of machine learning and the types of problems tackled by machine learning.
● To explore the different supervised learning techniques.
● To learn different aspects of unsupervised learning and reinforcement learning.
● To learn the role of probabilistic methods in machine learning.
● To understand the basic concepts of neural networks and deep learning.

COs code    Cognitive Processes   Knowledge                          Condition              Criteria                      Course outcomes
CS1702A.1   Implement (Apply)     supervised learning algorithms     Practical Constraint   real world dataset            Implement supervised learning algorithms for real world dataset.
CS1702A.2   Apply                 unsupervised learning algorithms   Practical Constraint   application                   Apply the concept of unsupervised learning algorithms for suitable application.
CS1702A.3   Analyze               probabilistic methods              Practical Constraint   real time application         Analyze appropriate probabilistic methods for real time application.
CS1702A.4   Analyze               various tools Weka/MATLAB          Practical Constraint   machine learning algorithms   Analyze various tools (Weka/MATLAB, etc.) for implementing machine learning algorithms.
CS1702A.5   Analyze               Machine Learning algorithms        Practical Constraint   real world problems           Apply, analyze Machine Learning algorithms to solve real world problems.
CS1702A.6   Implement (Apply)     DNN algorithm                      Practical Constraint   classification problem        Implement DNN algorithm for classification problem.

COs code     PO1  PO2  PO3  PO4  PO5  PO6  PO7  PO8  PO9  PO10  PO11  PO12  AVERAGE
CS1702A.1 3 2 2 2 3 2 - - - - - - 2
CS1702A.2 3 2 2 2 3 2 - - - - - - 2
CS1702A.3 3 2 2 2 3 2 - - - - - - 2
CS1702A.4 3 2 2 2 3 2 - - - - - - 2
CS1702A.5 3 2 2 2 3 2 - - - - - - 2
CS1702A.6 3 2 2 2 3 2 - - - - - - 2

Average 3 2 2 2 3 2 - - - - - - 2

COs code     PSO1   PSO2   PSO3   Average
CS1702A.1    1      1      1      1
CS1702A.2    1      1      1      1
CS1702A.3    1      1      1      1
CS1702A.4    1      1      1      1
CS1702A.5    1      1      1      1
CS1702A.6    1      1      1      1
Average      1      1      1      1

CO-PO mapping Justification


CS1702A.1: Implement supervised learning algorithms for real world dataset.

POs    Level   Justification
PO1    3       Students understand the basic concepts of supervised learning techniques using basic engineering knowledge. So this PO is mapped with Level-3.
PO2    2       Different types of supervised learning techniques are analyzed. So this PO is moderately mapped with Level-2.
PO3    2       Students can understand and evaluate the performance of supervised algorithms. So this PO is moderately mapped with Level-2.
PO4    2       Analysis can be conducted to evaluate the performance of different algorithms. So this PO is mapped with Level-2.
PO5    3       Students use modern tools to improve the efficiency of the supervised learning algorithm. So this PO is mapped with Level-3.
PO6    2       The local and global impact of the algorithm on the profession is analyzed. So this PO is mapped with Level-2.
PO7–PO12       Not mapped.
PSO1   1       Students will be able to apply programming principles to efficiently implement and analyze different kinds of supervised learning algorithms. So this PSO is mapped with Level-1.
PSO2   1       Students will be able to understand and simulate current technologies for the implementation of supervised learning algorithms. So this PSO is mapped with Level-1.
PSO3           Not mapped.

CS1702A.2: Apply the concept of unsupervised learning algorithms for suitable application.

POs    Level   Justification
PO1    3       Students understand the basics of unsupervised learning algorithms using engineering knowledge. So this PO is mapped with Level-3.
PO2    2       By understanding the concept of unsupervised learning algorithms, students can identify and formulate problems using problem analysis. So this PO is moderately mapped with Level-2.
PO3    2       Students need to understand the design of different types of algorithms. So this PO is moderately mapped with Level-2.
PO4    2       Mathematical models need to be analyzed to conduct experiments with unsupervised learning algorithms. So this PO is mapped with Level-2.
PO5    3       Students use modern tools to improve the efficiency of machine learning algorithms. So this PO is mapped with Level-3.
PO6    2       Students can analyze the local and global impact of these algorithms. So this PO is mapped with Level-2.
PO7–PO12       Not mapped.
PSO1   1       Unsupervised learning algorithms are analyzed and efficiently implemented for different applications. So this PSO is mapped with Level-1.
PSO2   1       Students can simulate current technologies for unsupervised algorithms for a suitable application. So this PSO is mapped with Level-1.
PSO3           Not mapped.

CS1702A.3: Analyze appropriate probabilistic methods for real time application.

POs    Level   Justification
PO1    3       Students understand the basic concepts of probabilistic methods using basic engineering knowledge. So this PO is mapped with Level-3.
PO2    2       Students understand the different types of probabilistic methods to solve problems. So this PO is mapped with Level-2.
PO3    2       Students design a probabilistic method for an application. So this PO is mapped with Level-2.
PO4    2       Students analyze the usage of an application to implement probabilistic methods. So this PO is mapped with Level-2.
PO5    3       A modern tool is used for analyzing the different versions of the application. So this PO is mapped with Level-3.
PO6    2       Students can analyze the cause of impact in different algorithms. So this PO is mapped with Level-2.
PO7–PO12       Not mapped.
PSO1   1       Students have the ability to identify the programming principles to understand the algorithm and apply the solution to the problem. So this PSO is mapped with Level-1.
PSO2   1       Students are able to implement the algorithms with current technologies. So this PSO is mapped with Level-1.
PSO3           Not mapped.

CS1702A.4: Analyze various tools (Weka/MATLAB, etc.) for implementing machine learning algorithms.

POs    Level   Justification
PO1    3       Students understand the basics of machine learning algorithms using engineering knowledge. So this PO is mapped with Level-3.
PO2    2       By understanding the concepts of machine learning, students can identify and formulate problems using problem analysis. So this PO is moderately mapped with Level-2.
PO3    2       Students need to understand the design of different types of algorithms. So this PO is moderately mapped with Level-2.
PO4    2       Mathematical models need to be analyzed to conduct the experiment. So this PO is mapped with Level-2.
PO5    3       Students use modern tools to improve the efficiency of machine learning algorithms. So this PO is mapped with Level-3.
PO6    2       Students can analyze the local and global impact of these algorithms. So this PO is mapped with Level-2.
PO7–PO12       Not mapped.
PSO1   1       Students have the ability to apply the programming principles to understand the algorithm and apply the solution to the problem. So this PSO is mapped with Level-1.
PSO2   1       Machine learning algorithms are implemented using different current technologies. So this PSO is mapped with Level-1.
PSO3           Not mapped.

CS1702A.5: Apply, analyze Machine Learning algorithms to solve real world problems.

POs    Level   Justification
PO1    3       By understanding basic engineering knowledge, the machine learning algorithm is implemented. So this PO is mapped with Level-3.
PO2    2       By understanding the concepts of machine learning, students can identify and formulate the real world problem using problem analysis. So this PO is moderately mapped with Level-2.
PO3    2       Students need to understand the design of different types of algorithms. So this PO is moderately mapped with Level-2.
PO4    2       Mathematical models need to be analyzed to conduct the experiment. So this PO is mapped with Level-2.
PO5    3       Students use modern tools to improve the efficiency of machine learning algorithms for real world problems. So this PO is mapped with Level-3.
PO6    2       Students can analyze the impact of the machine learning algorithm. So this PO is mapped with Level-2.
PO7–PO12       Not mapped.
PSO1   1       Students have the ability to identify the programming principles to understand the algorithm and apply the solution to the problem. So this PSO is mapped with Level-1.
PSO2   1       Students are able to implement the algorithms with current technologies. So this PSO is mapped with Level-1.
PSO3           Not mapped.

CS1702A.6: Implement DNN algorithm for classification problem.

POs    Level   Justification
PO1    3       Students need to understand the basic knowledge of neural networks. So this PO is mapped with Level-3.
PO2    2       Students identify the basic concepts of deep neural networks. So this PO is mapped with Level-2.
PO3    2       Students need to design the working of a deep neural network. So this PO is mapped with Level-2.
PO4    2       Mathematical models need to be analyzed to conduct the experiment. So this PO is mapped with Level-2.
PO5    3       Students use modern tools to improve the efficiency of the neural network. So this PO is mapped with Level-3.
PO6    2       Students can analyze the local and global impact of these algorithms. So this PO is mapped with Level-2.
PO7–PO12       Not mapped.
PSO1   1       Students have the ability to apply the programming principles to understand the algorithm and apply the solution to the problem. So this PSO is mapped with Level-1.
PSO2   1       Students are able to implement the deep neural network with current technologies. So this PSO is mapped with Level-1.
PSO3           Not mapped.

CS1702A MACHINE LEARNING LABORATORY L T P C

0 0 4 2

LIST OF EXPERIMENTS:

1. Implement the concept of decision trees with suitable data set from real world problem and
classify the data set to produce new sample.
2. Detecting Spam mails using Support vector machine
3. Implement facial recognition application with artificial neural network
4. Implement the non-parametric Locally Weighted Regression algorithm in order to fit data
points. Select appropriate data set for your experiment and draw graphs.
5. Implement character recognition using Multilayer Perceptron
6. Implement the k-means algorithm
7. Implement the Dimensionality Reduction techniques
8. Write a program to construct a Bayesian network considering medical data. Use this model to
demonstrate the diagnosis of heart patients using standard Heart Disease Data Set. You can use
Java/Python ML library classes/API.
9. Using the Weka tool, perform (a) data preprocessing by selecting or filtering attributes, and
(b) data preprocessing for handling missing values

TOTAL: 45 PERIODS

COURSE OUTCOMES:

● Understand the implementation procedures for the machine learning algorithms.
● Design Python programs for various Learning algorithms.
● Apply appropriate Machine Learning algorithms to data sets.
● Identify and apply Machine Learning algorithms to solve real world problems.
LIST OF EQUIPMENT FOR A BATCH OF 30 STUDENTS:

SOFTWARE: Python/Java with ML Package/R

HARDWARE: 30 terminals.

INDEX

S.no   Date   Name of the experiment                               CO   Page no   Marks   Signature
1.            Decision Tree                                        1    12
2.            Detecting spam mails using support vector machine    1    15
3.            Facial recognition application                       5    19
4.            Regression Algorithm                                 1    22
5.            K means Algorithm                                    2    27
6.            Character recognition using Multilayer Perceptron    3    29
7.            Dimensionality reduction techniques                  2    33
8.            Bayesian network                                     3    37
9.            Data preprocessing using weka tool                   4    41

Content Beyond the syllabus

10.           Random forest Algorithm                              6
----------------------------------------- COMPLETED --------------------------------------------

EXP NO:1
DATE:
DECISION TREES
AIM :

Implement the concept of decision trees with suitable data set from real world problem and
classify the data set to produce new sample.

ALGORITHM:

Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
Step-3: Divide S into subsets that contain the possible values of the best attribute.
Step-4: Generate the decision tree node that contains the best attribute.
Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3.
Continue this process until a stage is reached where the nodes cannot be classified
further; such a final node is called a leaf node.
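The Attribute Selection Measure in Step-2 is most commonly information gain (entropy reduction). A minimal sketch of how it is computed, on illustrative toy labels rather than the record's dataset:

```python
import math

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(labels, groups):
    """Entropy reduction from splitting `labels` into `groups` by some attribute."""
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups)
    return entropy(labels) - remainder

# Toy split: YES/NO labels partitioned perfectly by a candidate attribute
labels = ['NO', 'NO', 'NO', 'YES', 'YES', 'YES']
groups = [['NO', 'NO', 'NO'], ['YES', 'YES', 'YES']]
print(information_gain(labels, groups))  # 1.0 — a perfect split removes all uncertainty
```

The attribute whose split yields the highest information gain becomes the node in Step-4.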

PROGRAM:

from matplotlib import pyplot as plt
import pandas
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier

# Load the dataset and map the categorical columns to numbers
df = pandas.read_csv("data.csv")
d = {'UK': 0, 'USA': 1, 'N': 2}
df['Nationality'] = df['Nationality'].map(d)
d = {'YES': 1, 'NO': 0}
df['Go'] = df['Go'].map(d)

features = ['Age', 'Experience', 'Rank', 'Nationality']
X = df[features]
y = df['Go']

# Fit the decision tree and plot it
dtree = DecisionTreeClassifier()
dtree = dtree.fit(X, y)
tree.plot_tree(dtree, feature_names=features)
plt.show()
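The aim also asks to classify the data set to produce a new sample; a self-contained sketch of predicting a new, unseen sample with a trained tree (the encoded toy rows mirror data.csv, and the sample values are illustrative):

```python
from sklearn.tree import DecisionTreeClassifier

# Toy training data in the same encoded form as the program above:
# [Age, Experience, Rank, Nationality] with UK=0, USA=1, N=2
X = [[36, 10, 9, 0], [42, 12, 4, 1], [23, 4, 6, 2], [43, 21, 8, 1], [35, 14, 9, 0]]
y = [0, 0, 0, 1, 1]  # Go: NO=0, YES=1

dtree = DecisionTreeClassifier().fit(X, y)

# Classify a new, unseen sample
new_sample = [[40, 15, 9, 0]]
print(dtree.predict(new_sample))
```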

INPUT (data.csv):

Age,Experience,Rank,Nationality,Go
36,10,9,UK,NO
42,12,4,USA,NO
23,4,6,N,NO
52,4,4,USA,NO
43,21,8,USA,YES
44,14,5,UK,NO
66,3,7,N,YES
35,14,9,UK,YES
52,13,7,N,YES
35,5,9,N,YES
24,3,5,USA,NO
18,3,7,UK,YES
45,9,9,UK,YES

OUTPUT:

S.A ENGINEERING COLLEGE, CHENNAI-77
DEPARTMENT OF AI&DS
LAB EXPT EVALUATION SCHEME

Aim & Algorithm (5)   Source Code (5)   Record (5)   Viva-Voce (5)   Total (20)   Signature
5                     5                 5

RESULT:
Thus the decision tree is implemented successfully.

EXP NO:2
DATE:
DETECTING SPAM MAILS

AIM:
Implement Detecting Spam mails using Support vector machine.

ALGORITHM:
1. Import the required modules.
2. Read the spam mail dataset using pandas.
3. Remove duplicate rows, keeping only the first occurrence, and use LabelEncoder to
assign numbers to the labels (not spam / spam).
4. Create functions to extract important features from the text, removing punctuation and
stop words such as "the", "he", "she", etc. Stop words contribute to the formation of
sentences but are not useful in detecting whether a mail is spam.
5. Use the train_test_split function to split the dataset into training and testing sets.
6. Create the SVM model using scikit-learn's built-in svm.SVC class.
7. Finally, predict the result. Pickle can be used to save the trained model, which is later
used in Tkinter to create a GUI.

PROGRAM:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn import svm

# Load the dataset and split into text and labels
spam = pd.read_csv('data.csv')
z = spam['EmailText']
y = spam["Label"]
z_train, z_test, y_train, y_test = train_test_split(z, y, test_size=0.2)

# Vectorize the text and train the SVM classifier
cv = CountVectorizer()
features = cv.fit_transform(z_train)
model = svm.SVC()
model.fit(features, y_train)

features_test = cv.transform(z_test)
print("Accuracy: {}".format(model.score(features_test, y_test)))

from tkinter import *

def check_spam():
    # Classify the entered text with the trained SVM model
    text = spam_text_Entry.get()
    label = model.predict(cv.transform([text]))[0]
    if label == 'spam':
        my_string_var.set("Result: text is spam")
    else:
        my_string_var.set("Result: text not a spam")

win = Tk()
win.geometry("400x600")
win.configure(background="cyan")
win.title("Email Spam Detector")
title = Label(win, text="Email Spam Detector", bg="gray", width="300", height="2",
              fg="white", font=("Calibri 20 bold italic underline")).pack()
spam_text = Label(win, text="Enter your Text: ", bg="cyan",
                  font=("Verdana 12")).place(x=12, y=100)
spam_text_Entry = Entry(win, width=33)
spam_text_Entry.place(x=155, y=105)
my_string_var = StringVar()
my_string_var.set("Result: ")
print_spam = Label(win, textvariable=my_string_var, bg="cyan",
                   font=("Verdana 12")).place(x=12, y=200)
Button = Button(win, text="Submit", width="12", height="1", activebackground="red",
                bg="Pink", command=check_spam, font=("Verdana 12")).place(x=12, y=150)
win.mainloop()
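Step 7 of the algorithm mentions saving the trained model with pickle, which the program above does not show; a minimal, self-contained sketch (the stand-in model, toy data and file name are illustrative):

```python
import pickle
from sklearn import svm

# Train a small stand-in model (any fitted estimator pickles the same way)
demo_model = svm.SVC()
demo_model.fit([[0, 0], [1, 1]], ['ham', 'spam'])

# Save the trained model to disk
with open('spam_model.pkl', 'wb') as f:
    pickle.dump(demo_model, f)

# Load it back later (e.g. inside the Tkinter GUI)
with open('spam_model.pkl', 'rb') as f:
    loaded = pickle.load(f)

print(loaded.predict([[1, 1]])[0])  # prints "spam"
```

Note that the vectorizer must be pickled alongside the model, since new text has to be transformed with the same vocabulary before prediction.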

Input (data.csv):

EmailText,Label
sale,spam
gasssss,ham
huge,spam
tint,spam
ginger,spam

OUTPUT:

S.A ENGINEERING COLLEGE, CHENNAI-77
DEPARTMENT OF AI&DS
LAB EXPT EVALUATION SCHEME

Aim & Algorithm (5)   Source Code (5)   Record (5)   Viva-Voce (5)   Total (20)   Signature
5                     5                 5

RESULT:
Spam mails have been successfully detected using SVM.

EXP NO:3
DATE:
FACIAL RECOGNITION APPLICATION

AIM :

Implement facial recognition application with artificial neural network .

ALGORITHM:

1. Import the Python package OpenCV (cv2).
2. Load the pre-trained Haar cascade classifier (haarcascade_frontalface_default.xml)
from the OpenCV data directory of the Python library using cv2.CascadeClassifier.
3. Read the image from the system.
4. Convert the image to grey scale.
5. Detect faces with the face_cascade.detectMultiScale method of the cv2 library,
tuning the scaling factor and other parameters.
6. Test the model on the image and mark the detected faces.

PROGRAM:

import cv2

# Load the pre-trained Haar cascade for frontal faces
face_cascade = cv2.CascadeClassifier(
    'C:\\Users\\Admin\\AppData\\Local\\Programs\\Python\\Python310\\Lib\\site-packages\\cv2\\data\\haarcascade_frontalface_default.xml')

# Read the input image and convert it to grey scale
img = cv2.imread('C:\\Users\\Admin\\Desktop\\New notes\\LAB\\kalam.png')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Detect faces and draw a rectangle around each one
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)

cv2.imshow('img', img)
cv2.waitKey(0)
cv2.destroyAllWindows()

INPUT (kalam.png):

OUTPUT:

S.A ENGINEERING COLLEGE, CHENNAI-77
DEPARTMENT OF AI&DS
LAB EXPT EVALUATION SCHEME

Aim & Algorithm (5)   Source Code (5)   Record (5)   Viva-Voce (5)   Total (20)   Signature
5                     5                 5

RESULT:
Thus the Face recognition application is implemented successfully.

EXP NO:4
DATE:
REGRESSION ALGORITHM

AIM:

Implement the non-parametric Locally Weighted Regression algorithm in order to fit data
points. Select appropriate data set for your experiment and draw graphs.

ALGORITHM:
1. Read the given data sample into X and the curve (linear or non-linear) into Y.
2. Set the value of the smoothing (free) parameter τ.
3. Set the bias / point of interest x0, which is a subset of X.
4. Determine the weight matrix using:
   w(i, i) = exp( -(x(i) - x0)ᵀ(x(i) - x0) / (2τ²) )
5. Determine the value of the model parameter β using:
   β = (XᵀWX)⁻¹ (XᵀWY)
6. Prediction = x0 · β

PROGRAM:

import numpy as np               # linear algebra
import pandas as pd              # data processing
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

class LocallyWeightedRegression:
    # Maths behind locally weighted regression:
    # theta = inv(X.T*W*X)*(X.T*W*Y) — learnt separately for each query point

    # initializer that stores tau as a parameter
    def __init__(self, tau=0.01):
        self.tau = tau

    def kernel(self, query_point, X):
        # Diagonal weight matrix: points close to the query get weight near 1
        Weight_matrix = np.mat(np.eye(len(X)))
        for idx in range(len(X)):
            Weight_matrix[idx, idx] = np.exp(np.dot(X[idx] - query_point, (X[idx] - query_point).T) /
                                             (-2 * self.tau * self.tau))
        return Weight_matrix

    # function that makes the prediction for a given query point
    def predict(self, X, Y, query_point):
        q = np.mat([query_point, 1])
        X = np.hstack((X, np.ones((len(X), 1))))
        W = self.kernel(q, X)
        theta = np.linalg.pinv(X.T * (W * X)) * (X.T * (W * Y))
        pred = np.dot(q, theta)
        return pred

    # function that fits and predicts the output of all query points
    def fit_and_predict(self, X, Y):
        Y_test, X_test = [], np.linspace(-np.max(X), np.max(X), len(X))
        for x in X_test:
            pred = self.predict(X, Y, x)
            Y_test.append(pred[0][0])
        return np.array(Y_test)

    # function that computes the RMSE score
    def score(self, Y, Y_pred):
        return np.sqrt(np.mean((Y - Y_pred) ** 2))

    # function that fits as well as shows the scatter plot of all points
    def fit_and_show(self, X, Y):
        Y_test, X_test = [], np.linspace(-np.max(X), np.max(X), len(X))
        for x in X_test:
            pred = self.predict(X, Y, x)
            Y_test.append(pred[0][0])
        Y_test = np.array(Y_test)
        plt.style.use('seaborn')
        plt.title("The scatter plot for the value of tau = %.5f" % self.tau)
        plt.scatter(X, Y, color='red')
        plt.scatter(X_test, Y_test, color='green')
        plt.show()

# read the csv files of the given dataset
dfx = pd.read_csv('weightedX.csv')
dfy = pd.read_csv('weightedY.csv')

# store the values of the dataframes in numpy arrays
X = dfx.values
Y = dfy.values

# normalise the data values
u = X.mean()
std = X.std()
X = (X - u) / std

tau = 0.2
model = LocallyWeightedRegression(tau)
Y_pred = model.fit_and_predict(X, Y)
model.fit_and_show(X, Y)
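The score method of the class computes RMSE but is never invoked above; a self-contained sketch of the same metric on illustrative values:

```python
import numpy as np

def rmse(Y, Y_pred):
    """Root mean squared error, as in LocallyWeightedRegression.score."""
    return np.sqrt(np.mean((np.asarray(Y) - np.asarray(Y_pred)) ** 2))

# One prediction is off by 2, the other two are exact
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # sqrt(4/3) ≈ 1.1547
```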

INPUT:(weightedX.csv)

1.2421
2.3348
0.13264
2.347
6.7389
3.7089
11.853
-1.8708
4.5025
3.2798
1.7573
3.3784
11.47
9.0595
-2.8174
9.3184

8.4211
0.86215
7.5544
-3.9883

INPUT:(weightedY.csv)

1.1718
1.8824
0.34283
2.1057
1.6477
2.3624
2.1212
-0.79712
2.0311
1.9795
1.471
2.4611
1.9819
1.1203
-1.3701
1.0287
1.3808
1.2178
1.4084
-1.5209

OUTPUT:

S.A ENGINEERING COLLEGE, CHENNAI-77
DEPARTMENT OF AI&DS
LAB EXPT EVALUATION SCHEME

Aim & Algorithm (5)   Source Code (5)   Record (5)   Viva-Voce (5)   Total (20)   Signature
5                     5                 5

RESULT:
Non-parametric Locally Weighted Regression algorithm is implemented successfully.

EXP NO:5
DATE:
K MEANS ALGORITHM

AIM:

To implement the k-means algorithm and find the optimal number of clusters using the elbow method.

ALGORITHM:

1. Randomly select the first centroid from the data points.
2. For each data point, compute its distance from the nearest previously chosen centroid.
3. Select the next centroid from the data points such that the probability of choosing a point as
a centroid is directly proportional to its distance from the nearest previously chosen centroid.
4. Repeat steps 2 and 3 until k centroids have been sampled.
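Steps 1–4 describe k-means++-style centroid seeding, while the program below uses scikit-learn's KMeans with the elbow method; a minimal NumPy sketch of the seeding steps themselves (the function name and toy points are illustrative):

```python
import numpy as np

def kmeans_pp_init(points, k, rng=np.random.default_rng(0)):
    """Pick k initial centroids: each new centroid is chosen with probability
    proportional to its squared distance from the nearest chosen centroid."""
    centroids = [points[rng.integers(len(points))]]  # step 1: random first centroid
    while len(centroids) < k:
        # step 2: distance of every point to its nearest already-chosen centroid
        d2 = np.min([np.sum((points - c) ** 2, axis=1) for c in centroids], axis=0)
        # step 3: sample the next centroid proportionally to that distance
        probs = d2 / d2.sum()
        centroids.append(points[rng.choice(len(points), p=probs)])
    return np.array(centroids)

# Same toy points as the program below
points = np.array([[4, 21], [5, 19], [10, 24], [4, 17], [3, 16],
                   [11, 25], [14, 24], [6, 22], [10, 21], [12, 21]], dtype=float)
centers = kmeans_pp_init(points, k=2)
print(centers.shape)  # (2, 2)
```

Scikit-learn performs this seeding internally when KMeans is created with its default init='k-means++'.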

PROGRAM:

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

x = [4, 5, 10, 4, 3, 11, 14, 6, 10, 12]
y = [21, 19, 24, 17, 16, 25, 24, 22, 21, 21]
data = list(zip(x, y))

# Fit k-means for k = 1..10 and record the inertia of each fit
inertias = []
for i in range(1, 11):
    kmean = KMeans(n_clusters=i)
    kmean.fit(data)
    inertias.append(kmean.inertia_)

# The "elbow" of this curve suggests the optimal number of clusters
plt.plot(range(1, 11), inertias, marker='o')
plt.title('Elbow Method')
plt.xlabel('Number of Clusters')
plt.ylabel('Inertia')
plt.show()
OUTPUT:

S.A ENGINEERING COLLEGE, CHENNAI-77
DEPARTMENT OF AI&DS
LAB EXPT EVALUATION SCHEME

Aim & Algorithm (5)   Source Code (5)   Record (5)   Viva-Voce (5)   Total (20)   Signature
5                     5                 5

RESULT:
Thus the program for the k-means algorithm has been executed successfully.

EXP NO:6
DATE:
CHARACTER RECOGNITION

AIM :
Implement character recognition using Multilayer Perceptron

ALGORITHM:

1. Import the Python libraries and the label encoder.
2. Load the data using the read_csv function.
3. Display the data.
4. Convert all the string labels into numbers.
5. Split the dataset into training and testing sets.
6. Visualize the learnt weights of the input layer.
7. Calculate the accuracy score.

PROGRAM:

import numpy as np
import pandas as pd

# Load data
data = pd.read_csv('HR_comma_sep.csv')
data.head()

# Import LabelEncoder
from sklearn import preprocessing

# Creating labelEncoder
le = preprocessing.LabelEncoder()

# Converting string labels into numbers
data['salary'] = le.fit_transform(data['salary'])
data['Departments'] = le.fit_transform(data['Departments'])

# Splitting data into features and target
X = data[['satisfaction_level', 'last_evaluation', 'number_project', 'average_montly_hours',
          'time_spend_company', 'Work_accident', 'promotion_last_5years', 'Departments', 'salary']]
y = data['left']

# Import train_test_split function
from sklearn.model_selection import train_test_split

# Split dataset into training set and test set (70% training, 30% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=45)

# Import MLPClassifier
from sklearn.neural_network import MLPClassifier

# Create model object
clf = MLPClassifier(hidden_layer_sizes=(6, 5),
                    random_state=5,
                    verbose=True,
                    learning_rate_init=0.01)

# Fit data onto the model
clf.fit(X_train, y_train)

# Make predictions on the test dataset
ypred = clf.predict(X_test)

# Import accuracy score
from sklearn.metrics import accuracy_score

# Calculate accuracy
accuracy_score(y_test, ypred)

INPUT :(HR_comma_sep.csv)

satisfaction_level,last_evaluation,number_project,average_montly_hours,time_spend_company,Work_accident,left,promotion_last_5years,Departments,salary
0.38,0.53,2,157,3,0,1,0,sales,low
0.80,0.86,5,262,6,0,1,0,sales,medium
0.11,0.88,7,272,4,0,1,0,sales,medium
0.72,0.87,5,223,5,0,1,0,sales,low
0.37,0.52,2,159,3,0,1,0,sales,low

OUTPUT:
C:\Users\Administrator\PycharmProjects\veena\venv\Scripts\python.exe C:\Users\Administrator\PycharmProjects\veena\percept.py
Iteration 1, loss = 0.00456078
Iteration 2, loss = 0.00055361
Iteration 3, loss = 0.00029325
Iteration 4, loss = 0.00025340
Iteration 5, loss = 0.00024135
Iteration 6, loss = 0.00023436
Iteration 7, loss = 0.00022864
Iteration 8, loss = 0.00022336
Iteration 9, loss = 0.00021831
Iteration 10, loss = 0.00021343
Iteration 11, loss = 0.00020870
Iteration 12, loss = 0.00020411
Iteration 13, loss = 0.00019966
Iteration 14, loss = 0.00019535
Training loss did not improve more than tol=0.000100 for 10 consecutive epochs. Stopping.

Process finished with exit code 0

S.A ENGINEERING COLLEGE, CHENNAI-77
DEPARTMENT OF AI&DS
LAB EXPT EVALUATION SCHEME

Aim & Algorithm (5)   Source Code (5)   Record (5)   Viva-Voce (5)   Total (20)   Signature
5                     5                 5

RESULT:

Thus the Character recognition is performed successfully using ML

EXP NO:7
DATE:
DIMENSIONALITY REDUCTION TECHNIQUES

AIM:
Implement dimensionality reduction techniques using Principal Component Analysis (PCA)
on the penguins dataset.

ALGORITHM:
1. Import the Python libraries and the label encoder.

2. Load the data using the read_csv function through input data (penguins.csv).

3. Display the data. The columns are:

• species: species of penguin
• island: island where the penguin was observed
• bill length: in mm
• bill depth: in mm
• flipper length: in mm
• body mass: in g
• sex: male/female
4. Convert all the string labels into numbers.
5. Split the dataset into training and testing sets.
6. Reduce the dimensions of the data set and plot the principal component analysis.
7. Calculate the accuracy score.

PROGRAM:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Load the penguins dataset and drop rows with missing values
penguins = sns.load_dataset("penguins")
penguins = penguins.dropna()
penguins.head()

# Keep only the numeric measurement columns for PCA
data = penguins.select_dtypes(np.number)
data.head()

# Standardize the features, then project onto the first two principal components
random_state = 0
pca_pl = make_pipeline(StandardScaler(), PCA(n_components=2, random_state=random_state))
pcs = pca_pl.fit_transform(data)
pcs[0:5, :]

# Collect the principal components and the categorical labels for plotting
pcs_df = pd.DataFrame(data=pcs, columns=['PC1', 'PC2'])
pcs_df['Species'] = penguins.species.values
pcs_df['Sex'] = penguins.sex.values
pcs_df.head()

# Scatter plot of the data in the reduced two-dimensional space
plt.figure(figsize=(12, 10))
with sns.plotting_context("talk", font_scale=1.25):
    sns.scatterplot(x="PC1", y="PC2", data=pcs_df, hue="Species", style="Sex", s=100)
    plt.xlabel("PC1")
    plt.ylabel("PC2")
    plt.title("PCA", size=24)
    plt.savefig("PCA_Example_in_Python.png", format='png', dpi=75)
plt.show()
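Under the hood, the pipeline standardizes the features and projects them onto the directions of maximum variance. A NumPy-only sketch of that computation, using made-up two-feature data (the values are illustrative, not penguin measurements):

```python
import numpy as np

# Toy two-feature data (illustrative values only, not the penguins dataset)
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])

# Standardize, as StandardScaler does in the pipeline
Xs = (X - X.mean(axis=0)) / X.std(axis=0)

# Eigendecomposition of the covariance matrix gives the principal axes
cov = np.cov(Xs, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]            # sort components by variance, descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

pcs = Xs @ eigvecs                           # coordinates in the PC1/PC2 space
explained = eigvals / eigvals.sum()          # fraction of variance per component
print(explained)
```

The first entry of `explained` is the share of total variance captured by PC1, which is what justifies plotting only the first two components above.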

INPUT:

OUTPUT:

S.A ENGINEERING COLLEGE, CHENNAI-77
DEPARTMENT OF AI&DS

LAB EXPT EVALUATION SCHEME

Aim & Algorithm | Source Code | Record | Viva-Voce | Total | Signature
      (5)       |     (5)     |  (5)   |    (5)    | (20)  |
       5        |      5      |   5    |           |       |

RESULT:
Thus dimensionality reduction using PCA is successfully performed on the penguins dataset.

EXP NO:8
DATE:
BAYESIAN NETWORK

AIM:

Construct a Bayesian network considering medical data. Use this model to demonstrate the
diagnosis of heart patients using a standard Heart Disease Data Set.

ATTRIBUTE DESCRIPTION:
1. age: age in years
2. sex: sex (1 = male; 0 = female)
3. cp: chest pain type
   • Value 1: typical angina
   • Value 2: atypical angina
   • Value 3: non-anginal pain
   • Value 4: asymptomatic
4. trestbps: resting blood pressure (in mm Hg on admission to the hospital)
5. chol: serum cholesterol in mg/dl
6. fbs: fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
7. restecg: resting electrocardiographic results
   • Value 0: normal
   • Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation
     or depression of > 0.05 mV)
   • Value 2: showing probable or definite left ventricular hypertrophy by Estes' criteria
8. thalach: maximum heart rate achieved
9. exang: exercise induced angina (1 = yes; 0 = no)
10. oldpeak: ST depression induced by exercise relative to rest
11. slope: the slope of the peak exercise ST segment
    • Value 1: upsloping
    • Value 2: flat
    • Value 3: downsloping
12. thal: 3 = normal; 6 = fixed defect; 7 = reversable defect
13. heartdisease: integer valued from 0 (no presence) to 4

PROGRAM:
import numpy as np
import pandas as pd
from pgmpy.models import BayesianModel
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination

# Read the Cleveland Heart Disease data
heartDisease = pd.read_csv('heart.csv')
heartDisease = heartDisease.replace('?', np.nan)

# Display the data
print('Few examples from the dataset are given below')
print(heartDisease.head())

# Model the Bayesian network structure
model = BayesianModel([('age', 'trestbps'), ('age', 'fbs'),
                       ('sex', 'trestbps'), ('exang', 'trestbps'),
                       ('trestbps', 'heartdisease'), ('fbs', 'heartdisease'),
                       ('heartdisease', 'restecg'), ('heartdisease', 'thalach'),
                       ('heartdisease', 'chol')])

# Learn the CPDs using Maximum Likelihood Estimation
print('\n Learning CPD using Maximum likelihood estimators')
model.fit(heartDisease, estimator=MaximumLikelihoodEstimator)

# Inference with the Bayesian network
print('\n Inferencing with Bayesian Network:')
HeartDisease_infer = VariableElimination(model)

# Compute the probability of heart disease given age
print('\n 1. Probability of HeartDisease given Age=28')
q = HeartDisease_infer.query(variables=['heartdisease'], evidence={'age': 28})
print(q['heartdisease'])

# Compute the probability of heart disease given cholesterol
print('\n 2. Probability of HeartDisease given cholesterol=100')
q = HeartDisease_infer.query(variables=['heartdisease'], evidence={'chol': 100})
print(q['heartdisease'])
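The query above is a sum-product computation over the network's CPDs. On a toy one-parent network with hypothetical numbers (not learned from heart.csv), marginalizing the parent out by enumeration looks like this:

```python
# Toy CPDs with made-up probabilities, for illustration only:
# a single parent Age influencing HeartDisease.
p_age = {'young': 0.3, 'old': 0.7}                      # P(Age)
p_hd_given_age = {'young': {0: 0.9, 1: 0.1},            # P(HeartDisease | Age)
                  'old':   {0: 0.6, 1: 0.4}}

# Sum out Age: P(hd) = sum over a of P(a) * P(hd | a)
marginal = {hd: sum(p_age[a] * p_hd_given_age[a][hd] for a in p_age)
            for hd in (0, 1)}
print({hd: round(p, 4) for hd, p in marginal.items()})  # {0: 0.69, 1: 0.31}
```

Variable elimination performs the same kind of summation on the full network, but factors the sums so that intermediate tables stay small.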

OUTPUT:

Few examples from the dataset are given below
    age  sex  cp  trestbps  ...  slope  ca  thal  heartdisease
0    63    1   1       145  ...      3   0     6             0
1    67    1   4       160  ...      2   3     3             2
2    67    1   4       120  ...      2   2     7             1
3    37    1   3       130  ...      3   0     3             0
4    41    0   2       130  ...      1   0     3             0

[5 rows x 14 columns]

Learning CPD using Maximum likelihood estimators

Inferencing with Bayesian Network:

1. Probability of HeartDisease given Age=28
╒════════════════╤═════════════════════╕
│ heartdisease   │   phi(heartdisease) │
╞════════════════╪═════════════════════╡
│ heartdisease_0 │              0.6791 │
├────────────────┼─────────────────────┤
│ heartdisease_1 │              0.1212 │
├────────────────┼─────────────────────┤
│ heartdisease_2 │              0.0810 │
├────────────────┼─────────────────────┤
│ heartdisease_3 │              0.0939 │
├────────────────┼─────────────────────┤
│ heartdisease_4 │              0.0247 │
╘════════════════╧═════════════════════╛

2. Probability of HeartDisease given cholesterol=100
╒════════════════╤═════════════════════╕
│ heartdisease   │   phi(heartdisease) │
╞════════════════╪═════════════════════╡
│ heartdisease_0 │              0.5400 │
├────────────────┼─────────────────────┤
│ heartdisease_1 │              0.1533 │
├────────────────┼─────────────────────┤
│ heartdisease_2 │              0.1303 │
├────────────────┼─────────────────────┤
│ heartdisease_3 │              0.1259 │
├────────────────┼─────────────────────┤
│ heartdisease_4 │              0.0506 │
╘════════════════╧═════════════════════╛
S.A ENGINEERING COLLEGE, CHENNAI-77
DEPARTMENT OF AI&DS

LAB EXPT EVALUATION SCHEME

Aim & Algorithm | Source Code | Record | Viva-Voce | Total | Signature
      (5)       |     (5)     |  (5)   |    (5)    | (20)  |
       5        |      5      |   5    |           |       |

RESULT:
Thus the Bayesian network on medical data (Heart Disease dataset) is successfully implemented.
EXP NO:9
DATE:
DATA PREPROCESSING

AIM :

To perform, using the Weka tool: (a) data preprocessing by selecting or filtering attributes,
and (b) data preprocessing for handling missing values.

ALGORITHM

1. To demonstrate preprocessing, we will use the Weather database that is provided in the
   installation.
   • Use the Open file ... option under the Preprocess tab and select the weather.nominal.arff file.
   • The weather database contains five fields: outlook, temperature, humidity, windy and play.
   • Select an attribute from this list by clicking on it; further details on the attribute
     itself are displayed on the right-hand side.
2. In the Selected Attribute subwindow, you can observe the following:
   • The name and the type of the attribute are displayed.
   • The type of the temperature attribute is Nominal.
   • The number of missing values is zero.
   • There are three distinct values with no unique value.
   • The table underneath this information shows the nominal values for this field as hot,
     mild and cool.
   • It also shows the count and weight as a percentage for each nominal value.
3. Removing attributes:
   • Often, the data that you want to use for model building comes with many irrelevant
     fields. For example, a customer database may contain a mobile number, which is
     irrelevant in analysing a credit rating.
4. Applying filters:
   • Some machine learning techniques, such as association rule mining, require categorical
     data. To illustrate the use of filters, we will use the weather.numeric.arff database,
     which contains two numeric attributes: temperature and humidity.
   • We will convert these to nominal by applying a filter on our raw data. Click the Choose
     button in the Filter subwindow and select the following filter:
     weka → filters → supervised → attribute → Discretize
   • To select the best attributes for deciding the play, select and apply the following filter:
     weka → filters → supervised → attribute → AttributeSelection
5. Data preprocessing for handling missing values.
6. Predict the onset of diabetes:
   The problem used for this example is the Pima Indians onset of diabetes dataset. It is a
   classification problem where each instance represents medical details for one patient, and
   the task is to predict whether the patient will have an onset of diabetes within the next
   five years.
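The Discretize filter in step 4 maps each numeric value into one of a fixed set of nominal bins. A minimal equal-width sketch in Python (toy temperature values, not the actual weather data; Weka's supervised filter additionally uses the class attribute to choose cut points):

```python
# Toy temperature values (illustrative, not the real weather.numeric.arff data)
temps = [64, 68, 71, 75, 80, 83, 85]
lo, hi = min(temps), max(temps)
width = (hi - lo) / 3                      # three equal-width bins
labels = ['cool', 'mild', 'hot']

def discretize(t):
    idx = min(int((t - lo) / width), 2)    # clamp the maximum value into the top bin
    return labels[idx]

print([discretize(t) for t in temps])      # ['cool', 'cool', 'mild', 'mild', 'hot', 'hot', 'hot']
```

After such a filter, a numeric attribute behaves exactly like a nominal one, which is what association rule mining requires.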

OUTPUT:

S.A ENGINEERING COLLEGE, CHENNAI-77
DEPARTMENT OF AI&DS

LAB EXPT EVALUATION SCHEME

Aim & Algorithm | Source Code | Record | Viva-Voce | Total | Signature
      (5)       |     (5)     |  (5)   |    (5)    | (20)  |
       5        |      5      |   5    |           |       |

RESULT:
Thus data preprocessing by selecting or filtering attributes, and data preprocessing for
handling missing values, were performed successfully using the Weka tool.

EXP NO:10
DATE:
RANDOM FOREST ALGORITHM

AIM :

To construct a classifier using the random forest algorithm by applying the ensemble
technique to the given dataset.

ALGORITHM

o Data pre-processing step
o Fitting the Random Forest algorithm to the training set
o Predicting the test result
o Testing the accuracy of the result (creation of the confusion matrix)
o Visualizing the test set result

PROGRAM:

1. Data Pre-Processing Step:

Below is the code for the pre-processing step:

# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

# importing the dataset
data_set = pd.read_csv('user_data.csv')

# extracting the independent and dependent variables
x = data_set.iloc[:, [2, 3]].values
y = data_set.iloc[:, 4].values

# splitting the dataset into training and test sets
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)

# feature scaling
from sklearn.preprocessing import StandardScaler
st_x = StandardScaler()
x_train = st_x.fit_transform(x_train)
x_test = st_x.transform(x_test)

In the above code, we have pre-processed the data and loaded the dataset, which is given as:

2. Fitting the Random Forest algorithm to the training set:

Now we will fit the Random forest algorithm to the training set. To fit it, we will import
the RandomForestClassifier class from the sklearn.ensemble library. The code is given below:

#Fitting the Random Forest classifier to the training set
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(n_estimators=10, criterion="entropy")
classifier.fit(x_train, y_train)

In the above code, the classifier object takes below parameters:

o n_estimators = the required number of trees in the Random Forest. The default value is 10.
  We can choose any number, but we need to take care of the overfitting issue.
o criterion = the function used to measure the quality of a split. Here we have taken
  "entropy" for the information gain.
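The ensemble idea behind n_estimators can be sketched in plain Python: each tree casts a vote, and the forest predicts the majority class (the per-tree votes below are hypothetical, not produced by the classifier above):

```python
from collections import Counter

# Hypothetical class votes from n_estimators = 10 individual trees for one sample
tree_votes = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]

# The forest's prediction is the majority vote across the trees
prediction = Counter(tree_votes).most_common(1)[0][0]
print(prediction)  # 1 (seven of the ten trees voted for class 1)
```

Averaging many high-variance trees this way is what makes the forest more robust than any single decision tree.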

Output:

RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
                       max_depth=None, max_features='auto', max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, n_estimators=10,
                       n_jobs=None, oob_score=False, random_state=None,
                       verbose=0, warm_start=False)

3. Predicting the Test Set result

Since our model is fitted to the training set, we can now predict the test result. For
prediction, we will create a new prediction vector y_pred. Below is the code for it:

#Predicting the test set result
y_pred = classifier.predict(x_test)

Output:

The prediction vector is given as:

By comparing the above prediction vector with the real test set vector, we can determine the
incorrect predictions made by the classifier.

4. Creating the Confusion Matrix

Now we will create the confusion matrix to determine the correct and incorrect predictions. Below is the
code for it:

#Creating the confusion matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)

Output:

As we can see in the above matrix, there are 4+4= 8 incorrect predictions and 64+28= 92 correct
predictions.
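Accuracy follows directly from those counts: correct predictions divided by the total (the matrix entries below are taken from the figures quoted above):

```python
# Confusion matrix with 64 + 28 = 92 correct (diagonal) and 4 + 4 = 8 incorrect entries
cm = [[64, 4],
      [4, 28]]

correct = cm[0][0] + cm[1][1]                 # true negatives + true positives
total = sum(sum(row) for row in cm)           # all predictions
accuracy = correct / total
print(accuracy)  # 0.92
```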

5. Visualizing the training Set result

Here we will visualize the training set result. To do so, we plot a graph for the random
forest classifier. The classifier predicts Yes or No for users who have either purchased or
not purchased the SUV, as we did in logistic regression. Below is the code for it:

from matplotlib.colors import ListedColormap
x_set, y_set = x_train, y_train
x1, x2 = nm.meshgrid(nm.arange(start=x_set[:, 0].min() - 1, stop=x_set[:, 0].max() + 1, step=0.01),
                     nm.arange(start=x_set[:, 1].min() - 1, stop=x_set[:, 1].max() + 1, step=0.01))
mtp.contourf(x1, x2, classifier.predict(nm.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape),
             alpha=0.75, cmap=ListedColormap(('purple', 'green')))
mtp.xlim(x1.min(), x1.max())
mtp.ylim(x2.min(), x2.max())
for i, j in enumerate(nm.unique(y_set)):
    mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                c=ListedColormap(('purple', 'green'))(i), label=j)
mtp.title('Random Forest Algorithm (Training set)')
mtp.xlabel('Age')
mtp.ylabel('Estimated Salary')
mtp.legend()
mtp.show()

Output:

6. Visualizing the test set result

Now we will visualize the test set result. Below is the code for it:

#Visualizing the test set result
from matplotlib.colors import ListedColormap
x_set, y_set = x_test, y_test
x1, x2 = nm.meshgrid(nm.arange(start=x_set[:, 0].min() - 1, stop=x_set[:, 0].max() + 1, step=0.01),
                     nm.arange(start=x_set[:, 1].min() - 1, stop=x_set[:, 1].max() + 1, step=0.01))
mtp.contourf(x1, x2, classifier.predict(nm.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape),
             alpha=0.75, cmap=ListedColormap(('purple', 'green')))
mtp.xlim(x1.min(), x1.max())
mtp.ylim(x2.min(), x2.max())
for i, j in enumerate(nm.unique(y_set)):
    mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                c=ListedColormap(('purple', 'green'))(i), label=j)
mtp.title('Random Forest Algorithm (Test set)')
mtp.xlabel('Age')
mtp.ylabel('Estimated Salary')
mtp.legend()
mtp.show()

Output:

S.A ENGINEERING COLLEGE, CHENNAI-77
DEPARTMENT OF AI&DS

LAB EXPT EVALUATION SCHEME

Aim & Algorithm | Source Code | Record | Viva-Voce | Total | Signature
      (5)       |     (5)     |  (5)   |    (5)    | (20)  |
       5        |      5      |   5    |           |       |

RESULT:
Thus the random forest algorithm is successfully implemented.

----------------------------------------- COMPLETED -----------------------------------------

