Machine Learning Lab

The document outlines a series of Python programming tasks related to data manipulation, preprocessing, and machine learning using libraries such as pandas, numpy, and scikit-learn. It covers importing/exporting data, handling missing values, implementing PCA for dimensionality reduction, and various supervised learning algorithms including linear and logistic regression. Each week focuses on different aspects of data science, providing code examples and explanations for practical implementation.

DEPARTMENT OF CSE

WEEK-1: Write a Python program to import and export data using the pandas library

import pickle

import numpy as np
import pandas as pd

1. Manual function

def load_csv(filepath):
    # read the file by hand: the first line is the header, the rest are rows
    data = []
    col = []
    checkcol = False
    with open(filepath) as f:
        for val in f.readlines():
            val = val.replace("\n", "")
            val = val.split(',')
            if checkcol is False:
                # the first line holds the column names
                col = val
                checkcol = True
            else:
                data.append(val)
    df = pd.DataFrame(data=data, columns=col)
    return df

2. numpy.loadtxt()

# with the default dtype, loadtxt expects numeric content
df = np.loadtxt('convertcsv.csv', delimiter=',')
print(df[:5, :])

3. numpy.genfromtxt()

# with the default dtype, non-numeric fields are read as nan
data = np.genfromtxt('100 Sales Records.csv', delimiter=',')
pd.DataFrame(data)

4. pandas.read_csv()

pdDf = pd.read_csv('100 Sales Records.csv')
pdDf.head()

5. Pickle

# export the DataFrame to a binary pickle file
with open('test.pkl', 'wb') as f:
    pickle.dump(pdDf, f)
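
The listing above mostly covers importing. As a minimal sketch of the export side, the round trip below writes a small DataFrame to CSV and to a pickle file and reads both back. The sample data and the file names scores.csv and scores.pkl are assumptions made up for illustration, and a temporary directory is used so that nothing is left on disk.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data standing in for a real CSV file.
df = pd.DataFrame({"name": ["a", "b", "c"], "score": [90, 85, 77]})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "scores.csv")
    pkl_path = os.path.join(tmp, "scores.pkl")

    # Export: index=False keeps the row index out of the CSV file.
    df.to_csv(csv_path, index=False)
    df.to_pickle(pkl_path)

    # Import both files back into new DataFrames.
    from_csv = pd.read_csv(csv_path)
    from_pkl = pd.read_pickle(pkl_path)

print(from_csv.equals(df), from_pkl.equals(df))
```

CSV is plain text and readable by other tools, while a pickle file keeps the exact pandas object but should only be loaded from trusted sources.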


WEEK-2: Data preprocessing

1. Handling missing values
• isnull()
• notnull()
• dropna()
• fillna()
• replace()
• interpolate()

# importing pandas and numpy
import pandas as pd
import numpy as np

# dictionary of lists (np.nan marks a missing value)
scores = {'First Score': [100, 90, np.nan, 95],
          'Second Score': [30, 45, 56, np.nan],
          'Third Score': [np.nan, 40, 80, 98]}

# creating a dataframe from the dictionary
df = pd.DataFrame(scores)

# using isnull() function: True wherever a value is missing
df.isnull()


# importing pandas package
import pandas as pd

# making data frame from csv file
data = pd.read_csv("employees.csv")

# creating bool series True for NaN values
bool_series = pd.isnull(data["Gender"])

# filtering data
# displaying data only with Gender = NaN
data[bool_series]

# importing pandas and numpy
import pandas as pd
import numpy as np

# dictionary of lists
scores = {'First Score': [100, 90, np.nan, 95],
          'Second Score': [30, 45, 56, np.nan],
          'Third Score': [np.nan, 40, 80, 98]}

# creating a dataframe using the dictionary
df = pd.DataFrame(scores)

# using notnull() function: True wherever a value is present
df.notnull()

# importing pandas package
import pandas as pd

# making data frame from csv file
data = pd.read_csv("employees.csv")

# creating bool series True for non-NaN values
bool_series = pd.notnull(data["Gender"])

# filtering data
# displaying data only with Gender = Not NaN
data[bool_series]

# importing pandas and numpy
import pandas as pd
import numpy as np

# dictionary of lists
scores = {'First Score': [100, 90, np.nan, 95],
          'Second Score': [30, 45, 56, np.nan],
          'Third Score': [np.nan, 40, 80, 98]}

# creating a dataframe from the dictionary
df = pd.DataFrame(scores)

# filling every missing value with 0 using fillna()
df.fillna(0)

# importing pandas and numpy
import pandas as pd
import numpy as np

# dictionary of lists
scores = {'First Score': [100, 90, np.nan, 95],
          'Second Score': [30, 45, 56, np.nan],
          'Third Score': [np.nan, 40, 80, 98]}

# creating a dataframe from the dictionary
df = pd.DataFrame(scores)

# filling a missing value with the previous valid one (forward fill);
# df.fillna(method='pad') is deprecated in recent pandas versions
df.ffill()

# importing pandas and numpy
import pandas as pd
import numpy as np

# dictionary of lists
scores = {'First Score': [100, 90, np.nan, 95],
          'Second Score': [30, 45, 56, np.nan],
          'Third Score': [np.nan, 40, 80, 98]}

# creating a dataframe from the dictionary
df = pd.DataFrame(scores)

# filling a missing value with the next valid one (backward fill);
# df.fillna(method='bfill') is deprecated in recent pandas versions
df.bfill()
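
The list at the top of this week also names dropna() and interpolate(), which the listings above do not exercise. The following is a minimal sketch of both on a small made-up DataFrame; the column names and values are assumptions chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with a few missing values.
df = pd.DataFrame({"A": [12, 4, 5, np.nan, 1],
                   "B": [np.nan, 2, 54, 3, np.nan],
                   "C": [20, 16, np.nan, 3, 8]})

# dropna(): keep only the rows that have no missing value.
complete_rows = df.dropna()

# dropna(axis=1): drop every column that contains a missing value.
complete_cols = df.dropna(axis=1)

# interpolate(): fill gaps linearly between neighbouring valid values.
# A leading NaN (column B, row 0) has no earlier neighbour and stays NaN.
filled = df.interpolate(method="linear")

print(complete_rows)
print(filled)
```

dropna() discards information, so it suits data where few rows are affected; interpolate() keeps every row but only makes sense when neighbouring rows are related, for example in an ordered series.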


WEEK-3: Dimensionality Reduction

1. Implementation of PCA
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Jupyter notebook magic; omit the next line when running as a plain script
%matplotlib inline
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
# import the breast_cancer dataset
from sklearn.datasets import load_breast_cancer
data=load_breast_cancer()
data.keys()

# Check the output classes

print(data['target_names'])

# Check the input attributes

print(data['feature_names'])
# construct a dataframe using pandas
df1=pd.DataFrame(data['data'],columns=data['feature_names'])

# Scale data before applying PCA

scaling=StandardScaler()

# Use fit and transform method

scaling.fit(df1)
Scaled_data=scaling.transform(df1)
# Set the n_components=3
principal=PCA(n_components=3)
principal.fit(Scaled_data)
x=principal.transform(Scaled_data)


# Check the dimensions of data after PCA

print(x.shape)
# Check the values of the eigenvectors
# produced by the principal components
principal.components_
plt.figure(figsize=(10,10))
plt.scatter(x[:,0],x[:,1],c=data['target'],cmap='plasma')
plt.xlabel('pc1')
plt.ylabel('pc2')
# import relevant libraries for 3d graph
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure(figsize=(10,10))

# choose projection 3d for creating a 3d graph

axis = fig.add_subplot(111, projection='3d')

# x[:,0]is pc1,x[:,1] is pc2 while x[:,2] is pc3

axis.scatter(x[:,0],x[:,1],x[:,2], c=data['target'],cmap='plasma')
axis.set_xlabel("PC1", fontsize=10)
axis.set_ylabel("PC2", fontsize=10)
axis.set_zlabel("PC3", fontsize=10)
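
To see what the scaling and PCA steps above compute, here is a minimal NumPy-only sketch that standardises a data matrix and obtains the principal components from a singular value decomposition. The random data is a made-up stand-in for the breast cancer features, and this is an illustration of the idea rather than scikit-learn's exact code path; in particular, component signs may differ from those returned by PCA.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 200 samples, 5 features, two of them strongly correlated.
X = rng.normal(size=(200, 5))
X[:, 1] = 2.0 * X[:, 0] + 0.1 * X[:, 1]

# Standardise each column to zero mean and unit variance, as StandardScaler does.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# SVD of the standardised data: the rows of Vt are the principal directions.
U, S, Vt = np.linalg.svd(Z, full_matrices=False)

# Variance carried by each component and its share of the total.
explained_variance = S ** 2 / (len(Z) - 1)
explained_ratio = explained_variance / explained_variance.sum()

# Project onto the first three components (the role of principal.transform).
scores = Z @ Vt[:3].T

print(scores.shape)
print(explained_ratio.round(3))
```

The explained-variance shares indicate how much information is retained when only the first few components are kept, which helps when choosing n_components.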


WEEK-4: Write a Python program to demonstrate various data visualisations

# importing pandas and numpy
import pandas as pd
import numpy as np

# making data frame from csv file
data = pd.read_csv("employees.csv")

# displaying rows 10 to 24 of the data frame
data[10:25]

# will replace NaN values in the dataframe with the value -99
data.replace(to_replace=np.nan, value=-99)

# Creating the dataframe
df = pd.DataFrame({"A": [12, 4, 5, None, 1],
                   "B": [None, 2, 54, 3, None],
                   "C": [20, 16, None, 3, 8],
                   "D": [14, 3, None, None, 6]})

# Print the dataframe
print(df)

# importing the required module

import matplotlib.pyplot as plt

# x axis values
x = [1,2,3]
# corresponding y axis values
y = [2,4,1]

# plotting the points

plt.plot(x, y)

# naming the x axis

plt.xlabel('x - axis')
# naming the y axis
plt.ylabel('y - axis')

# giving a title to my graph

plt.title('My first graph!')

# function to show the plot

plt.show()
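
The line plot above is only one chart type. As a rough sketch of a few others, the block below draws a bar chart, a histogram and a scatter plot side by side. All values are made up for illustration, and the Agg backend is selected only so that the sketch runs without a display; in a notebook or desktop session that line can be dropped and plt.show() called at the end.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical sample values.
categories = ["A", "B", "C"]
counts = [5, 9, 3]
values = [1, 2, 2, 3, 3, 3, 4, 4, 5]

fig, axes = plt.subplots(1, 3, figsize=(12, 4))

# Bar chart: one bar per category.
axes[0].bar(categories, counts)
axes[0].set_title("Bar chart")

# Histogram: distribution of a single numeric variable.
axes[1].hist(values, bins=5)
axes[1].set_title("Histogram")

# Scatter plot: relationship between two numeric variables.
axes[2].scatter([1, 2, 3, 4], [2, 4, 1, 3])
axes[2].set_title("Scatter plot")

fig.tight_layout()
```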

# Continuation of the Naive Bayes program listed under WEEK-8
# (last line of calculateClassProbabilities):
    return probabilities

def predict(info, test):
    # pick the class with the highest likelihood for one test row
    probabilities = calculateClassProbabilities(info, test)
    bestLabel, bestProb = None, -1
    for classValue, probability in probabilities.items():
        if bestLabel is None or probability > bestProb:
            bestProb = probability
            bestLabel = classValue
    return bestLabel

def getPredictions(info, test):
    # predict a label for every row of the test set
    predictions = []
    for i in range(len(test)):
        result = predict(info, test[i])
        predictions.append(result)
    return predictions

def accuracy_rate(test, predictions):
    # percentage of rows whose last column matches the prediction
    correct = 0
    for i in range(len(test)):
        if test[i][-1] == predictions[i]:
            correct += 1
    return (correct / float(len(test))) * 100.0

filename = r'E:\user\MACHINE LEARNING\machine learning algos\Naive bayes\filedata.csv'

mydata = csv.reader(open(filename, "rt"))
mydata = list(mydata)
mydata = encode_class(mydata)
for i in range(len(mydata)):
    mydata[i] = [float(x) for x in mydata[i]]

# 70% of the rows are used for training
ratio = 0.7
train_data, test_data = splitting(mydata, ratio)

print('Total number of examples are: ', len(mydata))
print('Out of these, training examples are: ', len(train_data))
print("Test examples are: ", len(test_data))

info = MeanAndStdDevForClass(train_data)
predictions = getPredictions(info, test_data)
accuracy = accuracy_rate(test_data, predictions)
print("Accuracy of your model is: ", accuracy)
1. Implementation of SVM Classification

import numpy as np
import matplotlib.pyplot as plt
# importing make_blobs from scikit-learn
# (sklearn.datasets.samples_generator is not available in newer releases)
from sklearn.datasets import make_blobs

# creating dataset X containing n_samples
# and Y containing two classes
X, Y = make_blobs(n_samples=500, centers=2, random_state=0, cluster_std=0.40)

# plotting scatters
plt.scatter(X[:, 0], X[:, 1], c=Y, s=50, cmap='spring')
plt.show()

# creating linspace between -1 to 3.5
xfit = np.linspace(-1, 3.5)

# plotting scatter
plt.scatter(X[:, 0], X[:, 1], c=Y, s=50, cmap='spring')

# plot three candidate separating lines, each with a margin of half-width d
for m, b, d in [(1, 0.65, 0.33), (0.5, 1.6, 0.55), (-0.2, 2.9, 0.2)]:
    yfit = m * xfit + b
    plt.plot(xfit, yfit, '-k')
    plt.fill_between(xfit, yfit - d, yfit + d, edgecolor='none',
                     color='#AAAAAA', alpha=0.4)
plt.xlim(-1, 3.5)
plt.show()

# importing required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

x = pd.read_csv(r"C:\...\cancer.csv")
a = np.array(x)
y = a[:, 30]  # classes having 0 and 1
x = np.column_stack((x.malignant, x.benign))
print(x.shape)
print(x, y)
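
The listing above draws candidate separating lines by hand but does not fit a classifier. As a minimal sketch of the fitting step, the block below trains scikit-learn's SVC with a linear kernel on the same make_blobs data. The choice of C=1.0 is an assumption for illustration, not a value taken from the manual.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Same synthetic two-class data as in the listing above.
X, Y = make_blobs(n_samples=500, centers=2, random_state=0, cluster_std=0.40)

# Linear-kernel SVM; C=1.0 is an illustrative choice.
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, Y)

# For a linear kernel the decision boundary is w . x + b = 0.
w = clf.coef_[0]
b = clf.intercept_[0]

print("w =", w, "b =", b)
print("number of support vectors:", len(clf.support_vectors_))
print("training accuracy:", clf.score(X, Y))
```

Only the support vectors, the points closest to the boundary, determine the fitted line; this is the maximum-margin idea that the shaded bands in the hand-drawn plot illustrate.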


WEEK-5: Supervised Learning

1. Implementation of Linear Regression

import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # number of observations
    n = np.size(x)
    # means of the x and y vectors
    m_x = np.mean(x)
    m_y = np.mean(y)
    # cross-deviation and deviation about x
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x
    # regression coefficients: slope b_1 and intercept b_0
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x
    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plotting the actual points as scatter plot
    plt.scatter(x, y, color="m", marker="o", s=30)
    # predicted response vector
    y_pred = b[0] + b[1]*x
    # plotting the regression line
    plt.plot(x, y_pred, color="g")
    plt.xlabel('x')
    plt.ylabel('y')
    plt.show()

def main():
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()
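
As a quick cross-check of the closed-form estimates used in estimate_coef, the sketch below recomputes them on the same ten points and compares the result with NumPy's built-in least-squares fit, np.polyfit. It is meant as a sanity check under the same data, not as a replacement for the listing.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

# Closed-form least-squares estimates, following estimate_coef.
n = x.size
b_1 = (np.sum(x * y) - n * x.mean() * y.mean()) / (np.sum(x * x) - n * x.mean() ** 2)
b_0 = y.mean() - b_1 * x.mean()

# np.polyfit with degree 1 returns the coefficients as [slope, intercept].
slope, intercept = np.polyfit(x, y, 1)

print("closed form:", b_0, b_1)
print("np.polyfit: ", intercept, slope)
```

Both approaches minimise the same sum of squared residuals, so the two pairs of numbers should agree up to floating-point rounding.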


WEEK-6: Implementation of Logistic Regression

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
import warnings
warnings.filterwarnings("ignore")

class LogitRegression():
    def __init__(self, learning_rate, iterations):
        self.learning_rate = learning_rate
        self.iterations = iterations

    def fit(self, X, Y):
        # m = number of training examples, n = number of features
        self.m, self.n = X.shape
        # weight initialization
        self.W = np.zeros(self.n)
        self.b = 0
        self.X = X
        self.Y = Y
        # gradient descent learning
        for i in range(self.iterations):
            self.update_weights()
        return self

    def update_weights(self):
        # sigmoid of the linear score
        A = 1 / (1 + np.exp(-(self.X.dot(self.W) + self.b)))
        # gradients of the log-loss with respect to W and b
        tmp = (A - self.Y.T)
        tmp = np.reshape(tmp, self.m)
        dW = np.dot(self.X.T, tmp) / self.m
        db = np.sum(tmp) / self.m
        # gradient descent step
        self.W = self.W - self.learning_rate * dW
        self.b = self.b - self.learning_rate * db
        return self

    def predict(self, X):
        Z = 1 / (1 + np.exp(-(X.dot(self.W) + self.b)))
        # class 1 when the predicted probability exceeds 0.5
        Y = np.where(Z > 0.5, 1, 0)
        return Y

def main():
    df = pd.read_csv("diabetes.csv")
    X = df.iloc[:, :-1].values
    Y = df.iloc[:, -1:].values
    X_train, X_test, Y_train, Y_test = train_test_split(
        X, Y, test_size=1/3, random_state=0)
    # our model and the scikit-learn model for comparison
    model = LogitRegression(learning_rate=0.01, iterations=1000)
    model.fit(X_train, Y_train)
    model1 = LogisticRegression()
    model1.fit(X_train, Y_train)
    Y_pred = model.predict(X_test)
    Y_pred1 = model1.predict(X_test)
    # count the correct predictions of both models
    correctly_classified = 0
    correctly_classified1 = 0
    total = np.size(Y_pred)
    for count in range(total):
        if Y_test[count] == Y_pred[count]:
            correctly_classified = correctly_classified + 1
        if Y_test[count] == Y_pred1[count]:
            correctly_classified1 = correctly_classified1 + 1
    print("Accuracy on test set by our model : ",
          (correctly_classified / total) * 100)
    print("Accuracy on test set by sklearn model : ",
          (correctly_classified1 / total) * 100)

if __name__ == "__main__":
    main()

WEEK-7: Supervised Learning

1. Implementation of Decision tree classification

import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report

def importdata():
    # load the balance-scale dataset from the UCI repository
    balance_data = pd.read_csv(
        'https://archive.ics.uci.edu/ml/machine-learning-' +
        'databases/balance-scale/balance-scale.data',
        sep=',', header=None)
    print("Dataset Length: ", len(balance_data))
    print("Dataset Shape: ", balance_data.shape)
    print("Dataset: ", balance_data.head())
    return balance_data

def splitdataset(balance_data):
    # column 0 is the class label, columns 1 to 4 are the features
    X = balance_data.values[:, 1:5]
    Y = balance_data.values[:, 0]
    X_train, X_test, y_train, y_test = train_test_split(
        X, Y, test_size=0.3, random_state=100)
    return X, Y, X_train, X_test, y_train, y_test

def train_using_gini(X_train, X_test, y_train):
    clf_gini = DecisionTreeClassifier(criterion="gini", random_state=100,
                                      max_depth=3, min_samples_leaf=5)
    clf_gini.fit(X_train, y_train)
    return clf_gini

def train_using_entropy(X_train, X_test, y_train):
    clf_entropy = DecisionTreeClassifier(criterion="entropy", random_state=100,
                                         max_depth=3, min_samples_leaf=5)
    clf_entropy.fit(X_train, y_train)
    return clf_entropy

def prediction(X_test, clf_object):
    y_pred = clf_object.predict(X_test)
    print("Predicted values:")
    print(y_pred)
    return y_pred

def cal_accuracy(y_test, y_pred):
    print("Confusion Matrix: ", confusion_matrix(y_test, y_pred))
    print("Accuracy : ", accuracy_score(y_test, y_pred)*100)
    print("Report : ", classification_report(y_test, y_pred))

def main():
    data = importdata()
    X, Y, X_train, X_test, y_train, y_test = splitdataset(data)
    clf_gini = train_using_gini(X_train, X_test, y_train)
    clf_entropy = train_using_entropy(X_train, X_test, y_train)
    print("Results Using Gini Index:")
    y_pred_gini = prediction(X_test, clf_gini)
    cal_accuracy(y_test, y_pred_gini)
    print("Results Using Entropy:")
    y_pred_entropy = prediction(X_test, clf_entropy)
    cal_accuracy(y_test, y_pred_entropy)

if __name__ == "__main__":
    main()
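
The two classifiers above differ only in their splitting criterion. As a small worked illustration of what "gini" and "entropy" measure, the sketch below computes both impurity values for a list of class labels using the standard definitions. The label lists are made up, and the helper names gini and entropy are hypothetical; they are not part of scikit-learn.

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits: -sum(p_k * log2(p_k))."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# Hypothetical node contents: an evenly mixed node and a pure node.
mixed = ["L", "L", "R", "R"]
pure = ["L", "L", "L", "L"]

print("mixed:", gini(mixed), entropy(mixed))
print("pure: ", gini(pure), abs(entropy(pure)))
```

Both measures are zero for a pure node and largest for an evenly mixed one; a decision tree chooses the split that reduces the chosen impurity the most.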
2. Implementation of K-nearest Neighbor

from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
import numpy as np
import matplotlib.pyplot as plt

# load the iris dataset and separate features and labels
irisData = load_iris()
X = irisData.data
y = irisData.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# try k = 1 to 8 and record the accuracy for each value
neighbors = np.arange(1, 9)
train_accuracy = np.empty(len(neighbors))
test_accuracy = np.empty(len(neighbors))
for i, k in enumerate(neighbors):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    train_accuracy[i] = knn.score(X_train, y_train)
    test_accuracy[i] = knn.score(X_test, y_test)

plt.plot(neighbors, test_accuracy, label='Testing dataset Accuracy')
plt.plot(neighbors, train_accuracy, label='Training dataset Accuracy')
plt.legend()
plt.xlabel('n_neighbors')
plt.ylabel('Accuracy')
plt.show()


WEEK-8

Implementation of Naïve Bayes classifier algorithm

import math
import random
import csv

def encode_class(mydata):
    # replace each class name in the last column with an integer code
    classes = []
    for i in range(len(mydata)):
        if mydata[i][-1] not in classes:
            classes.append(mydata[i][-1])
    for i in range(len(classes)):
        for j in range(len(mydata)):
            if mydata[j][-1] == classes[i]:
                mydata[j][-1] = i
    return mydata

def splitting(mydata, ratio):
    # move randomly chosen rows into the training set until it holds ratio of the data
    train_num = int(len(mydata) * ratio)
    train = []
    test = list(mydata)
    while len(train) < train_num:
        index = random.randrange(len(test))
        train.append(test.pop(index))
    return train, test

def groupUnderClass(mydata):
    # group the rows by their class label (last column)
    groups = {}
    for i in range(len(mydata)):
        if (mydata[i][-1] not in groups):
            groups[mydata[i][-1]] = []
        groups[mydata[i][-1]].append(mydata[i])
    return groups

def mean(numbers):
    return sum(numbers) / float(len(numbers))

def std_dev(numbers):
    # sample standard deviation (divides by n - 1)
    avg = mean(numbers)
    variance = sum([pow(x - avg, 2) for x in numbers]) / float(len(numbers) - 1)
    return math.sqrt(variance)

def MeanAndStdDev(mydata):
    # one (mean, std_dev) pair per column; the last pair (class label) is dropped
    info = [(mean(attribute), std_dev(attribute)) for attribute in zip(*mydata)]
    del info[-1]
    return info

def MeanAndStdDevForClass(mydata):
    info = {}
    groups = groupUnderClass(mydata)
    for classValue, instances in groups.items():
        info[classValue] = MeanAndStdDev(instances)
    return info

def calculateGaussianProbability(x, mean, stdev):
    # density of the normal distribution N(mean, stdev^2) at x
    expo = math.exp(-(math.pow(x - mean, 2) / (2 * math.pow(stdev, 2))))
    return (1 / (math.sqrt(2 * math.pi) * stdev)) * expo

def calculateClassProbabilities(info, test):
    probabilities = {}
    for classValue, classSummaries in info.items():
        probabilities[classValue] = 1
        for i in range(len(classSummaries)):
            mean, std_dev = classSummaries[i]
            x = test[i]
            probabilities[classValue] *= calculateGaussianProbability(x, mean, std_dev)
    # the listing continues with "return probabilities" and the prediction
    # code, which is printed after the WEEK-4 section
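
As a small worked example of the Gaussian likelihood step above, the sketch below scores one sample against two classes. The per-class summaries and the sample are made-up numbers. Unlike the listing, it sums log-densities instead of multiplying densities, a common variant that avoids underflow when there are many features; like the listing, it ignores class priors.

```python
import math

def gaussian_pdf(x, mean, stdev):
    """Density of a normal distribution with the given mean and standard deviation."""
    expo = math.exp(-((x - mean) ** 2) / (2 * stdev ** 2))
    return expo / (math.sqrt(2 * math.pi) * stdev)

# Hypothetical per-class summaries: one (mean, stdev) pair per feature.
summaries = {
    0: [(1.0, 0.5), (10.0, 2.0)],
    1: [(3.0, 0.5), (20.0, 2.0)],
}
sample = [1.2, 11.0]

# Sum of log-densities per class (equivalent to the log of the product).
log_scores = {}
for label, feats in summaries.items():
    log_scores[label] = sum(math.log(gaussian_pdf(x, m, s))
                            for x, (m, s) in zip(sample, feats))

# The predicted class is the one with the highest score.
best = max(log_scores, key=log_scores.get)
print(log_scores, best)
```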


WEEK-9: Implementation of K-nearest Neighbor

from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
import numpy as np
import matplotlib.pyplot as plt

# load the iris dataset and separate features and labels
irisData = load_iris()
X = irisData.data
y = irisData.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# try k = 1 to 8 and record the accuracy for each value
neighbors = np.arange(1, 9)
train_accuracy = np.empty(len(neighbors))
test_accuracy = np.empty(len(neighbors))
for i, k in enumerate(neighbors):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    train_accuracy[i] = knn.score(X_train, y_train)
    test_accuracy[i] = knn.score(X_test, y_test)

plt.plot(neighbors, test_accuracy, label='Testing dataset Accuracy')
plt.plot(neighbors, train_accuracy, label='Training dataset Accuracy')
plt.legend()
plt.xlabel('n_neighbors')
plt.ylabel('Accuracy')
plt.show()
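
To make the idea behind KNeighborsClassifier concrete, here is a minimal from-scratch sketch of the prediction rule: measure the Euclidean distance from the query to every training point and take a majority vote among the k closest. The toy points and the helper name knn_predict are assumptions for illustration; scikit-learn's implementation is more elaborate (for example, it can use tree-based neighbour search).

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point.
    distances = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    distances.sort(key=lambda pair: pair[0])
    # Labels of the k closest points.
    nearest = [label for _, label in distances[:k]]
    return Counter(nearest).most_common(1)[0][0]

# Hypothetical toy data: two small groups in the plane.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["a", "a", "a", "b", "b", "b"]

pred_near_a = knn_predict(train_X, train_y, (1.1, 1.0), k=3)
pred_near_b = knn_predict(train_X, train_y, (5.1, 5.0), k=3)
print(pred_near_a, pred_near_b)
```

A small k follows the training data closely and can overfit, while a large k smooths the decision boundary; the accuracy plot above is one way to compare values of k.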


WEEK-10: Build Artificial Neural Network model with back propagation

Let's first understand the term neural network. In a neural network, each neuron
receives inputs, computes a weighted sum over them, passes that sum through an
activation function, and sends the result on to the neurons in the next layer.

Python: to run our script

Pip: necessary to install Python packages
pip install tensorflow
pip install keras

# Importing libraries
from keras.datasets import imdb
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Flatten
# Conv1D, MaxPooling1D and Embedding are imported from keras.layers directly;
# the keras.layers.convolutional and keras.layers.embeddings paths are not
# available in recent Keras releases
from keras.layers import Conv1D
from keras.layers import MaxPooling1D
from keras.layers import Embedding
from keras.preprocessing import sequence

# Our dictionary will contain only the top 7000 words appearing most frequently
top_words = 7000

# Now we split our data-set into training and test data
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=top_words)

# Looking at the nature of training data
print(X_train[0])
print(y_train[0])

print('Shape of training data: ')
print(X_train.shape)
print(y_train.shape)

print('Shape of test data: ')
print(X_test.shape)
print(y_test.shape)

Output :
[1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36,
256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150, 4, 172,
112, 167, 2, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192,
50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16,
43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 2, 5, 62,
386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12,
16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28,
77, 52, 5, 14, 407, 16, 82, 2, 8, 4, 107, 117, 5952, 15, 256, 4, 2, 7, 3766,
5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 2, 1029, 13, 104, 88, 4,
381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 2, 18, 4, 226, 22, 21, 134,
476, 26, 480, 5, 144, 30, 5535, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65,
16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 5345, 19,
178, 32]
1
Shape of training data:
(25000,)
(25000,)
Shape of test data:
(25000,)
(25000,)

# Padding the data samples to a maximum review length in words
max_words = 450
X_train = sequence.pad_sequences(X_train, maxlen=max_words)
X_test = sequence.pad_sequences(X_test, maxlen=max_words)

# Building the CNN Model
model = Sequential()  # initializing the Sequential nature for the CNN model

# Adding the embedding layer which will take in a maximum of 450 words as input
# and provide a 32-dimensional output for those words which belong in the
# top_words dictionary
model.add(Embedding(top_words, 32, input_length=max_words))
model.add(Conv1D(32, 3, padding='same', activation='relu'))
model.add(MaxPooling1D())
model.add(Flatten())
model.add(Dense(250, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy', optimizer='adam',
              metrics=['accuracy'])
model.summary()
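
The Keras model above hides the back-propagation step named in this week's title. As a minimal sketch of that step, the block below trains a tiny two-layer network on the XOR problem with plain NumPy. The hidden-layer size, learning rate, epoch count and random seed are illustrative assumptions, and the gradient of the mean squared error is used up to a constant factor that is absorbed into the learning rate.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# XOR inputs and targets.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# One hidden layer with 4 units (sizes and learning rate are illustrative).
W1 = rng.normal(size=(2, 4))
b1 = np.zeros((1, 4))
W2 = rng.normal(size=(4, 1))
b2 = np.zeros((1, 1))
lr = 0.5
losses = []

for epoch in range(5000):
    # Forward pass: weighted sum, then activation, layer by layer.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    losses.append(float(np.mean((out - y) ** 2)))

    # Backward pass: propagate the error from the output towards the input.
    d_out = (out - y) * out * (1 - out)   # error at the output pre-activation
    d_h = (d_out @ W2.T) * h * (1 - h)    # error at the hidden pre-activation

    # Gradient-descent update of all weights and biases.
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0, keepdims=True)

print("first loss:", losses[0], "last loss:", losses[-1])
```

Calling model.fit on a Keras model performs the same forward and backward passes automatically, with the gradients derived from the chosen loss and the updates made by the chosen optimizer.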


WEEK-11
Implementing Random Forest

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

data = pd.read_csv('Salaries.csv')
print(data)

# Assumption: the columns are (position, level, salary); the feature x is
# the level column and the target y is the salary column
x = data.iloc[:, 1:2].values
y = data.iloc[:, 2].values

# Fitting Random Forest Regression to the dataset
# import the regressor
from sklearn.ensemble import RandomForestRegressor

# create regressor object
regressor = RandomForestRegressor(n_estimators=100, random_state=0)

# fit the regressor with x and y data
regressor.fit(x, y)
Y_pred = regressor.predict(np.array([6.5]).reshape(1, 1))  # test the output by changing values

# Visualising the Random Forest Regression results
# arange for creating a range of values
# from min value of x to max value of x
# with a difference of 0.01 between two consecutive values
X_grid = np.arange(x.min(), x.max(), 0.01)

# reshape for reshaping the data into a len(X_grid)*1 array,
# i.e. to make a column out of the X_grid value
X_grid = X_grid.reshape((len(X_grid), 1))

# Scatter plot for original data
plt.scatter(x, y, color='blue')

# plot predicted data
plt.plot(X_grid, regressor.predict(X_grid), color='green')
plt.title('Random Forest Regression')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()

WEEK-11(B): Model Selection, Bagging and Boosting

1. Cross Validation

# importing K-fold cross-validation from the sklearn package
# (the old sklearn.cross_validation module was replaced by sklearn.model_selection)
from sklearn.model_selection import KFold

# value of K is 10
kf = KFold(n_splits=10)

# train_set is the training data prepared beforehand;
# each item produced is a pair (train indices, validation indices)
data = kf.split(train_set)
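
To show what a K-fold split produces, here is a minimal pure-Python sketch that divides sample indices into k consecutive folds and yields a (train, validation) pair for each fold. The function name k_fold_indices is hypothetical, and the sketch does not shuffle the data.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k consecutive folds; yield (train, validation) index lists."""
    # The first n_samples % k folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size

folds = list(k_fold_indices(10, 3))
for train, val in folds:
    print("train:", train, "validation:", val)
```

Each sample appears in exactly one validation fold, so averaging the k validation scores uses every sample for testing once and for training k - 1 times.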
2. Implementing AdaBoost

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
import warnings
warnings.filterwarnings("ignore")

# Reading the dataset from the csv file
data = pd.read_csv("Iris.csv")

# Printing the shape of the dataset
print(data.shape)

# Dropping the Id column, which carries no information about the species
data = data.drop('Id', axis=1)

# Features are all columns except the last; the label is the last column
X = data.iloc[:, :-1]
y = data.iloc[:, -1]
print("Shape of X is %s and shape of y is %s" % (X.shape, y.shape))

total_classes = y.nunique()
print("Number of unique species in dataset are: ", total_classes)

distribution = y.value_counts()
print(distribution)

X_train, X_val, Y_train, Y_val = train_test_split(X, y, test_size=0.25, random_state=28)

# Creating an AdaBoost classifier (default parameters assumed) and fitting it
adb = AdaBoostClassifier()
adb_model = adb.fit(X_train, Y_train)

print("The accuracy of the model on validation set is", adb_model.score(X_val, Y_val))


WEEK-12: Unsupervised Learning

Implementing K-means Clustering

import sys
import math
from random import shuffle, uniform

def ReadData(fileName):
    # Read the file, splitting by lines
    f = open(fileName, 'r')
    lines = f.read().splitlines()
    f.close()
    items = []
    # Skip the header line; the last column (the label) is ignored
    for i in range(1, len(lines)):
        line = lines[i].split(',')
        itemFeatures = []
        for j in range(len(line)-1):
            # Convert feature value to float
            v = float(line[j])
            # Add feature value to the list
            itemFeatures.append(v)
        items.append(itemFeatures)
    shuffle(items)
    return items

def FindColMinMax(items):
    n = len(items[0])
    # sys.maxsize replaces Python 2's sys.maxint
    minima = [sys.maxsize for i in range(n)]
    maxima = [-sys.maxsize - 1 for i in range(n)]
    for item in items:
        for f in range(len(item)):
            if (item[f] < minima[f]):
                minima[f] = item[f]
            if (item[f] > maxima[f]):
                maxima[f] = item[f]
    return minima, maxima

def InitializeMeans(items, k, cMin, cMax):
    # Initialize means to random numbers between
    # the min and max of each column/feature
    f = len(items[0])  # number of features
    means = [[0 for i in range(f)] for j in range(k)]
    for mean in means:
        for i in range(len(mean)):
            # Set value to a random float
            # (adding +-1 to avoid a wide placement of a mean)
            mean[i] = uniform(cMin[i]+1, cMax[i]-1)
    return means

def EuclideanDistance(x, y):
    S = 0  # The sum of the squared differences of the elements
    for i in range(len(x)):
        S += math.pow(x[i]-y[i], 2)
    # The square root of the sum
    return math.sqrt(S)

def UpdateMean(n, mean, item):
    # Running-mean update after the n-th item joins the cluster
    for i in range(len(mean)):
        m = mean[i]
        m = (m*(n-1)+item[i])/float(n)
        mean[i] = round(m, 3)
    return mean

def Classify(means, item):
    # Classify item to the mean with minimum distance
    minimum = sys.maxsize
    index = -1
    for i in range(len(means)):
        # Find distance from item to mean
        dis = EuclideanDistance(item, means[i])
        if (dis < minimum):
            minimum = dis
            index = i
    return index

def CalculateMeans(k, items, maxIterations=100000):
    # Find the minima and maxima for columns
    cMin, cMax = FindColMinMax(items)
    # Initialize means at random points
    means = InitializeMeans(items, k, cMin, cMax)
    # Initialize clusters, the array to hold
    # the number of items in a class
    clusterSizes = [0 for i in range(len(means))]
    # An array to hold the cluster an item is in
    belongsTo = [0 for i in range(len(items))]
    # Calculate means
    for e in range(maxIterations):
        # If no change of cluster occurs, halt
        noChange = True
        for i in range(len(items)):
            item = items[i]
            # Classify item into a cluster and update the
            # corresponding means.
            index = Classify(means, item)
            clusterSizes[index] += 1
            cSize = clusterSizes[index]
            means[index] = UpdateMean(cSize, means[index], item)
            # Item changed cluster
            if (index != belongsTo[i]):
                noChange = False
            belongsTo[i] = index
        # Nothing changed, return
        if (noChange):
            break
    return means

def FindClusters(means, items):
    clusters = [[] for i in range(len(means))]  # Init clusters
    for item in items:
        # Classify item into a cluster
        index = Classify(means, item)
        # Add item to cluster
        clusters[index].append(item)
    return clusters
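
For comparison, here is a compact sketch of the batch form of k-means (often called Lloyd's algorithm), which recomputes each centroid from all of its members after every assignment pass, instead of the running-mean update used in the listing above. The toy points and the choice of initial centroids are assumptions for illustration; initial centroids are passed in explicitly so that the result is reproducible.

```python
import math

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points]

def kmeans(points, centroids, max_iter=100):
    """Batch k-means: alternate assignment and centroid update until assignments stop changing."""
    centroids = [tuple(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels

# Hypothetical toy data: two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, centroids=[points[0], points[-1]])
print(centroids)
print(labels)
```

In practice the initial centroids are usually drawn at random, as InitializeMeans does above, and the algorithm is run several times because the result depends on the starting points.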
