ML Manual

The document is a Machine Learning Lab Manual that includes various programs demonstrating data structuring, preprocessing, feature selection, performance measurement, and classification techniques using algorithms like Naive Bayes, K-Nearest Neighbors, and Bayesian networks. Each section outlines the aim, program code, output, and results of the executed programs. The manual also covers clustering with the EM algorithm and pruning techniques for decision trees.

MACHINE LEARNING LAB MANUAL

1. Demonstrate how to structure data in Machine Learning

AIM:
To write a program that loads a data set into a structured table and explores it for machine learning.
PROGRAM:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import math

# In a Jupyter notebook, run "%matplotlib inline" first so that plots render in the page.
# The raw string (r'...') keeps the backslashes of the Windows path literal.
card_approval_df = pd.read_csv(r'C:\ProgramData\Anaconda3\Lib\site-packages\pandas\io\clean_dataset.csv')
num_cols = ['Age', 'Debt', 'YearsEmployed', 'CreditScore', 'Income']

print(card_approval_df.head())
print(card_approval_df.duplicated().sum())      # number of repeated rows
print(card_approval_df[num_cols].describe())    # summary statistics
sns.countplot(x='Gender', data=card_approval_df)
plt.show()
print(card_approval_df[num_cols].corr())        # corr is a method and must be called
sns.scatterplot(x='YearsEmployed', y='Income', data=card_approval_df)
plt.ylim(2000)                                  # lower limit of the y-axis
plt.show()
# Mean of the numeric columns for one ethnicity group
print(card_approval_df[card_approval_df.Ethnicity == 'Latino'][num_cols].agg('mean'))
sns.pairplot(card_approval_df[num_cols])
sns.pairplot(card_approval_df[num_cols + ['Approved']], hue='Approved')
plt.show()
OUTPUT:
RESULT:
Thus the program was executed successfully and output was verified.
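The duplicate count and the per-group mean in the program above can also be traced by hand. The following is a minimal sketch in plain Python, assuming a tiny made-up table; the rows and column meanings are hypothetical and are not taken from clean_dataset.csv.

```python
from collections import defaultdict

# Hypothetical records: (Gender, Ethnicity, Age, Income)
rows = [
    ("M", "Latino", 30.0, 1000.0),
    ("F", "Asian", 41.0, 2500.0),
    ("M", "Latino", 30.0, 1000.0),   # exact repeat of the first row
    ("F", "Latino", 50.0, 3000.0),
]

# Count rows that repeat an earlier row, which is what duplicated().sum() reports.
seen = set()
duplicate_count = 0
for row in rows:
    if row in seen:
        duplicate_count += 1
    seen.add(row)

# Mean income per ethnicity, computed over the de-duplicated rows.
incomes = defaultdict(list)
for gender, ethnicity, age, income in seen:
    incomes[ethnicity].append(income)
mean_income = {group: sum(v) / len(v) for group, v in incomes.items()}

print(duplicate_count)
print(mean_income)
```

Removing repeated rows before aggregating keeps one duplicated record from counting twice in the group mean.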
2. Implement data pre-processing techniques on a real-time dataset
AIM:
To write a program that applies data pre-processing techniques (train/test splitting and feature scaling) before fitting a classifier.
PROGRAM:
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd
from matplotlib.colors import ListedColormap
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

data_set = pd.read_csv(r'C:\ProgramData\Anaconda3\Lib\site-packages\pandas\io\user_data.csv')
x = data_set.iloc[:, [2, 3]].values   # two feature columns
y = data_set.iloc[:, 4].values        # target column

# Hold out 25% of the rows for testing.
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)

# Fit the scaler on the training rows only, then reuse its statistics on the test rows.
st_x = StandardScaler()
x_train = st_x.fit_transform(x_train)
x_test = st_x.transform(x_test)

classifier = LogisticRegression(random_state=0)
classifier.fit(x_train, y_train)
y_pred = classifier.predict(x_test)

# Plot the decision regions together with the scaled training points.
x_set, y_set = x_train, y_train
x1, x2 = nm.meshgrid(
    nm.arange(start=x_set[:, 0].min() - 1, stop=x_set[:, 0].max() + 1, step=0.01),
    nm.arange(start=x_set[:, 1].min() - 1, stop=x_set[:, 1].max() + 1, step=0.01))
mtp.contourf(x1, x2,
             classifier.predict(nm.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape),
             alpha=0.75, cmap=ListedColormap(('purple', 'green')))
mtp.xlim(x1.min(), x1.max())
mtp.ylim(x2.min(), x2.max())
for i, j in enumerate(nm.unique(y_set)):
    mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                color=ListedColormap(('purple', 'green'))(i), label=j)
mtp.title('Logistic Regression (Training set)')   # the plotted points are the training rows
mtp.xlabel('Age')
mtp.ylabel('Estimated Salary')
mtp.legend()
mtp.show()
OUTPUT:
RESULT:
Thus the program was executed successfully and output was verified.
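The scaling step in this program fits the scaler on the training rows and only transforms the test rows. A minimal sketch of that idea in plain Python follows, assuming a single made-up feature; fit_scaler and transform are hypothetical helper names, and the population standard deviation is used because that is what StandardScaler computes.

```python
from math import sqrt

def fit_scaler(values):
    """Return the mean and population standard deviation of a list."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, sqrt(var)

def transform(values, mean, std):
    """Standardise values with statistics that were computed elsewhere."""
    return [(v - mean) / std for v in values]

train_age = [20.0, 30.0, 40.0]   # hypothetical training values
test_age = [50.0]                # hypothetical test value

mean, std = fit_scaler(train_age)               # statistics come from the training data only
train_scaled = transform(train_age, mean, std)
test_scaled = transform(test_age, mean, std)    # the test value reuses the training statistics

print(train_scaled)
print(test_scaled)
```

Reusing the training mean and standard deviation keeps information about the test rows out of the fitted model.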
3. Implement Feature subset selection techniques
AIM:
To write a program that measures the Spearman rank correlation between two features, a filter criterion used in feature subset selection.
PROGRAM:
from numpy.random import rand
from numpy.random import seed
from scipy.stats import spearmanr

seed(1)                                # fixed seed so the run is repeatable
data1 = rand(1000) * 20
data2 = data1 + (rand(1000) * 10)      # data2 depends on data1 plus noise
coef, p = spearmanr(data1, data2)
print('Spearmans correlation coefficient: %.3f' % coef)
alpha = 0.05                           # significance level
if p > alpha:
    print('Samples are uncorrelated (fail to reject H0) p=%.3f' % p)
else:
    print('Samples are correlated (reject H0) p=%.3f' % p)
OUTPUT:
Spearmans correlation coefficient: 0.900
Samples are correlated (reject H0) p=0.000
RESULT:
Thus the program was executed successfully and output was verified.
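Spearman's coefficient is the Pearson correlation of the ranks of the two samples. The sketch below shows that calculation in plain Python, assuming small made-up samples; ranks, pearson and spearman are illustrative helper names, not scipy functions.

```python
from math import sqrt

def ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1          # mean of positions i..j, counted from 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))

def spearman(a, b):
    return pearson(ranks(a), ranks(b))

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 8.0, 27.0, 64.0, 125.0]   # monotonic in x, but not linear
rho = spearman(x, y)
tied_ranks = ranks([10, 20, 20, 30])
print(rho, tied_ranks)
```

Because only the ordering matters, a monotonic but non-linear relationship still produces a coefficient of 1.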
4. Demonstrate how you will measure the performance of a machine learning model

AIM:
To write a program that measures the performance of a machine learning model using the root mean squared error (RMSE).
PROGRAM:
from math import sqrt

def rmse_metric(actual, predicted):
    # Accumulate the squared difference between each prediction and its true value.
    sum_error = 0.0
    for i in range(len(actual)):
        prediction_error = predicted[i] - actual[i]
        sum_error += (prediction_error ** 2)
    mean_error = sum_error / float(len(actual))
    return sqrt(mean_error)

actual = [0.1, 0.2, 0.3, 0.4, 0.5]
predicted = [0.11, 0.19, 0.29, 0.41, 0.5]
rmse = rmse_metric(actual, predicted)
print("Measuring performance:", rmse)
OUTPUT:
Measuring performance: 0.00894427190999915
RESULT:
Thus the program was executed successfully and output was verified.
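As a check on the printed value, the arithmetic can be written out by hand from the five actual and predicted values in the program. The prediction errors are 0.01, -0.01, -0.01, 0.01 and 0, so:

```latex
\mathrm{RMSE}
  = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}
  = \sqrt{\frac{(0.01)^2 + (-0.01)^2 + (-0.01)^2 + (0.01)^2 + 0^2}{5}}
  = \sqrt{8 \times 10^{-5}}
  \approx 0.00894
```

This agrees with the output above to the digits shown.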
5. Write a program to implement the naïve Bayesian classifier for a sample
training data set. Compute the accuracy of the classifier, considering a few test
data sets.

AIM:
To write a program that trains a naïve Bayesian classifier and computes its accuracy on test data.
PROGRAM:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
import warnings
warnings.filterwarnings('ignore')

glass = pd.read_csv(r"C:\ProgramData\Anaconda3\Lib\site-packages\pandas\io\glass.csv")
print(glass.tail())
print(glass.shape)
print(glass.isnull().sum())                        # missing values per column
sns.countplot(x='Type', data=glass, color='red')   # class distribution
plt.show()

nb = GaussianNB()
x = glass.drop(columns=['Type'])   # features
y = glass['Type']                  # class label
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=4)
nb.fit(x_train, y_train)
y_pred = nb.predict(x_test)
print(accuracy_score(y_test, y_pred))
OUTPUT:
RESULT:
Thus the program was executed successfully and output was verified.
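To show what a Gaussian naïve Bayes classifier computes, the sketch below fits one by hand in plain Python. It assumes a small made-up data set; fit_gaussian_nb and predict are hypothetical helper names, and the small variance floor is a simplification rather than scikit-learn's own smoothing rule.

```python
from math import log, pi

def fit_gaussian_nb(X, y):
    """Estimate a prior plus per-feature mean and variance for each class."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        columns = list(zip(*rows))
        means = [sum(col) / len(col) for col in columns]
        variances = [
            sum((v - m) ** 2 for v in col) / len(col) + 1e-9   # floor avoids division by zero
            for col, m in zip(columns, means)
        ]
        model[label] = (len(rows) / len(X), means, variances)
    return model

def predict(model, x):
    """Return the class with the highest log posterior, up to a shared constant."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = log(prior)
        for v, m, var in zip(x, means, variances):
            score += -0.5 * log(2 * pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)

# Hypothetical two-feature data with two classes.
X_train = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [3.0, 0.5], [3.2, 0.7], [2.8, 0.4]]
y_train = [0, 0, 0, 1, 1, 1]
X_test = [[1.1, 2.1], [2.9, 0.6]]
y_test = [0, 1]

model = fit_gaussian_nb(X_train, y_train)
y_pred = [predict(model, x) for x in X_test]
accuracy = sum(p == t for p, t in zip(y_pred, y_test)) / len(y_test)
print(y_pred, accuracy)
```

The "naïve" part is the inner loop: each feature contributes its own Gaussian term, as if the features were independent given the class.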

6. Write a program to construct a Bayesian network considering medical
data. Use this model to demonstrate the diagnosis of heart patients using the
standard Heart Disease Data Set.

AIM:
To write a program that constructs a Bayesian network from medical data and uses it to diagnose heart disease.
PROGRAM:
import numpy as np
import pandas as pd
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.models import BayesianModel   # newer pgmpy releases name this class BayesianNetwork
from pgmpy.inference import VariableElimination

heartDisease = pd.read_csv(r'C:\ProgramData\Anaconda3\Lib\site-packages\pandas\io\heart_disease.csv')
heartDisease = heartDisease.replace('?', np.nan)   # mark missing values
print('Sample instances from the dataset are given below')
print(heartDisease.head())
print('\n Attributes and datatypes')
print(heartDisease.dtypes)

# Each pair is a directed edge (parent, child) of the network.
model = BayesianModel([('age', 'heartdisease'), ('sex', 'heartdisease'), ('exang', 'heartdisease'),
                       ('cp', 'heartdisease'), ('heartdisease', 'restecg'), ('heartdisease', 'chol')])
print('\nLearning CPD using Maximum likelihood estimators')
model.fit(heartDisease, estimator=MaximumLikelihoodEstimator)

print('\n Inferencing with Bayesian Network:')
HeartDiseasetest_infer = VariableElimination(model)
print('\n 1. Probability of HeartDisease given evidence= restecg')
q1 = HeartDiseasetest_infer.query(variables=['heartdisease'], evidence={'restecg': 1})
print(q1)
print('\n 2. Probability of HeartDisease given evidence= cp ')
q2 = HeartDiseasetest_infer.query(variables=['heartdisease'], evidence={'cp': 2})
print(q2)

OUTPUT:

1. Probability of HeartDisease given evidence= restecg
+-----------------+---------------------+
| heartdisease    |   phi(heartdisease) |
+=================+=====================+
| heartdisease(0) |              0.1012 |
+-----------------+---------------------+
| heartdisease(1) |              0.0000 |
+-----------------+---------------------+
| heartdisease(2) |              0.2392 |
+-----------------+---------------------+
| heartdisease(3) |              0.2015 |
+-----------------+---------------------+
| heartdisease(4) |              0.4581 |
+-----------------+---------------------+

2. Probability of HeartDisease given evidence= cp
+-----------------+---------------------+
| heartdisease    |   phi(heartdisease) |
+=================+=====================+
| heartdisease(0) |              0.3610 |
+-----------------+---------------------+
| heartdisease(1) |              0.2159 |
+-----------------+---------------------+
| heartdisease(2) |              0.1373 |
+-----------------+---------------------+
| heartdisease(3) |              0.1537 |
+-----------------+---------------------+
| heartdisease(4) |              0.1321 |
+-----------------+---------------------+
RESULT:
Thus the program was executed successfully and output was verified.
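The queries above ask for the distribution of one variable given evidence on another. On a very small network the same kind of query can be answered by direct enumeration, as in the sketch below. It assumes a hypothetical three-node chain (smoker -> disease -> pain) with made-up probabilities; none of the numbers come from the Heart Disease Data Set.

```python
# Hypothetical conditional probability tables for the chain smoker -> disease -> pain.
p_smoker = {1: 0.3, 0: 0.7}
p_disease_given_smoker = {1: {1: 0.4, 0: 0.6}, 0: {1: 0.1, 0: 0.9}}   # [smoker][disease]
p_pain_given_disease = {1: {1: 0.8, 0: 0.2}, 0: {1: 0.1, 0: 0.9}}     # [disease][pain]

def joint(smoker, disease, pain):
    """Joint probability from the chain-rule factorisation of the network."""
    return (p_smoker[smoker]
            * p_disease_given_smoker[smoker][disease]
            * p_pain_given_disease[disease][pain])

def query_disease(pain):
    """P(disease | pain): sum out smoker, then normalise over disease."""
    unnormalised = {
        d: sum(joint(s, d, pain) for s in (0, 1)) for d in (0, 1)
    }
    total = sum(unnormalised.values())
    return {d: value / total for d, value in unnormalised.items()}

posterior = query_disease(pain=1)
print(posterior)
```

Variable elimination produces the same posterior but orders the sums so that the full joint table never has to be built, which matters once a network has more than a few variables.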

7. Apply EM algorithm to cluster a set of data stored in a .CSV file.

AIM:
To write a program that clusters data with the EM algorithm (a Gaussian mixture model) and compares the result with k-means clustering.
PROGRAM:
import matplotlib.pyplot as plt
from sklearn import datasets, preprocessing
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
import sklearn.metrics as sm
import pandas as pd
import numpy as np

iris = datasets.load_iris()
X = pd.DataFrame(iris.data)
X.columns = ['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width']
y = pd.DataFrame(iris.target)
y.columns = ['Targets']                         # column name used when colouring the plots
colormap = np.array(['red', 'lime', 'black'])   # one colour per class; colours chosen for illustration

model = KMeans(n_clusters=3)
model.fit(X)

plt.figure(figsize=(14, 7))
plt.subplot(1, 3, 1)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y.Targets], s=40)
plt.title('Real Classification')
plt.xlabel('Petal Length')
plt.ylabel('Petal Width')
plt.subplot(1, 3, 2)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[model.labels_], s=40)
plt.title('K Mean Classification')
plt.xlabel('Petal Length')
plt.ylabel('Petal Width')
# Cluster ids are arbitrary, so these scores depend on how the ids line up with the class labels.
print('The accuracy score of K-Mean: ', sm.accuracy_score(y.Targets, model.labels_))
print('The Confusion matrix of K-Mean: ', sm.confusion_matrix(y.Targets, model.labels_))

# Standardise the features, then fit a Gaussian mixture, which is trained with EM.
scaler = preprocessing.StandardScaler()
xs = pd.DataFrame(scaler.fit_transform(X), columns=X.columns)
gmm = GaussianMixture(n_components=3)
gmm.fit(xs)
y_gmm = gmm.predict(xs)

plt.subplot(1, 3, 3)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y_gmm], s=40)
plt.title('GMM Classification')
plt.xlabel('Petal Length')
plt.ylabel('Petal Width')
print('The accuracy score of EM: ', sm.accuracy_score(y.Targets, y_gmm))
print('The Confusion matrix of EM: ', sm.confusion_matrix(y.Targets, y_gmm))
plt.show()
OUTPUT:
RESULT:
Thus the program was executed successfully and output was verified.
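GaussianMixture hides the EM iterations. The sketch below spells out the E-step and M-step for a two-component mixture in one dimension, in plain Python. The data values, the starting means and the variance floor are assumptions made for illustration, and em_two_gaussians is a hypothetical helper name.

```python
from math import exp, pi, sqrt

def gaussian_pdf(x, mean, var):
    return exp(-(x - mean) ** 2 / (2 * var)) / sqrt(2 * pi * var)

def em_two_gaussians(data, iterations=50):
    """Fit a two-component 1-D Gaussian mixture with plain EM."""
    overall_mean = sum(data) / len(data)
    overall_var = sum((x - overall_mean) ** 2 for x in data) / len(data)
    means = [min(data), max(data)]          # simple deterministic starting point
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            variances[k] = max(
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk,
                1e-6,                       # floor keeps the variance positive
            )
    return weights, means, variances

data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]   # hypothetical values
weights, means, variances = em_two_gaussians(data)
print(weights, means, variances)
```

Unlike k-means, each point is shared between the components in proportion to its responsibility, so the assignments are soft rather than hard.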

8. Write a program to implement k-Nearest Neighbor algorithm to classify
the data set.

AIM:
To write a program that classifies a data set with the k-Nearest Neighbor algorithm.
PROGRAM:
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn import datasets

iris = datasets.load_iris()
x = iris.data
y = iris.target
print('sepal-length', 'sepal-width', 'petal-length', 'petal-width')
print(x)
print('class: 0-Iris-Setosa, 1- Iris-Versicolour, 2- Iris-Virginica')
print(y)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3)
classifier = KNeighborsClassifier(n_neighbors=5)
classifier.fit(x_train, y_train)   # the classifier must be fitted before predicting
y_pred = classifier.predict(x_test)
print('Confusion Matrix')
print(confusion_matrix(y_test, y_pred))
print('Accuracy Metrics')
print(classification_report(y_test, y_pred))

OUTPUT:

[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]
 [5.4 3.9 1.7 0.4]
 [4.6 3.4 1.4 0.3]
 [5.  3.4 1.5 0.2]
 [4.4 2.9 1.4 0.2]
 [4.9 3.1 1.5 0.1]
 [5.4 3.7 1.5 0.2]
 [4.8 3.4 1.6 0.2]
 [4.8 3.  1.4 0.1]
 [4.3 3.  1.1 0.1]
 [5.8 4.  1.2 0.2]

class: 0-Iris-Setosa, 1- Iris-Versicolour, 2- Iris-Virginica

[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0000000000000111111111111111111111111
1111111111111111111111111122222222222
2222222222222222222222222222222222222
2 2]

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=5, p=2,
                     weights='uniform')

Confusion Matrix
[[17  0  0]
 [ 0 15  0]
 [ 0  1 12]]

Accuracy Metrics
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        17
           1       0.94      1.00      0.97        15
           2       1.00      0.92      0.96        13

    accuracy                           0.98        45
   macro avg       0.98      0.97      0.98        45
weighted avg       0.98      0.98      0.98        45
RESULT:
Thus the program was executed successfully and output was verified.
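The classifier above reduces to a distance sort followed by a majority vote. A minimal sketch in plain Python follows, assuming a made-up two-feature training set; knn_predict is a hypothetical helper, and Euclidean distance is used to match the minkowski metric with p=2 shown in the output.

```python
from collections import Counter
from math import dist

def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training points closest to the query."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training points forming two well separated groups.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (4.0, 4.0), (4.2, 3.9), (3.8, 4.1)]
train_y = [0, 0, 0, 1, 1, 1]

queries = [(1.1, 1.0), (4.1, 4.0)]
predictions = [knn_predict(train_X, train_y, q) for q in queries]
print(predictions)
```

There is no training step beyond storing the data, which is why prediction cost grows with the size of the training set.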

9. Apply the technique of pruning to noisy data (monk2 data), and derive
the decision tree from this data. Analyze the results by comparing the
structure of the pruned and unpruned trees.

AIM:
To write a program that prunes a decision tree built from noisy data and compares the pruned and unpruned trees.
PROGRAM:
import numpy as np
import pandas as pd
import os
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.model_selection import RandomizedSearchCV, GridSearchCV
from sklearn import tree
from sklearn.metrics import accuracy_score,confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt
data = r'C:\ProgramData\Anaconda3\Lib\site-packages\pandas\io\heart.csv'   # raw string keeps the backslashes literal
df = pd.read_csv(data)
print(df.head())
X = df.drop(columns=['target'])
y = df['target']
print(X.shape)
print(y.shape)
x_train,x_test,y_train,y_test = train_test_split(X,y,stratify=y)
print(x_train.shape)
print(x_test.shape)
clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(x_train,y_train)
y_train_pred = clf.predict(x_train)
y_test_pred = clf.predict(x_test)
plt.figure(figsize=(20, 20))
features = list(X.columns)   # feature names only, without the 'target' column
classes = ['Not heart disease', 'heart disease']
tree.plot_tree(clf, feature_names=features, class_names=classes, filled=True)
plt.show()
def plot_confusionmatrix(y_train_pred, y_train, dom):
    print(f'{dom} Confusion matrix')
    # confusion_matrix expects the true labels first, so rows are true classes.
    cf = confusion_matrix(y_train, y_train_pred)
    sns.heatmap(cf, annot=True, yticklabels=classes,
                xticklabels=classes, cmap='Blues', fmt='g')
    plt.tight_layout()
    plt.show()
print(f'Train score {accuracy_score(y_train_pred,y_train)}')
print(f'Test score {accuracy_score(y_test_pred,y_test)}')
plot_confusionmatrix(y_train_pred,y_train,dom='Train')
plot_confusionmatrix(y_test_pred,y_test,dom='Test')
# Pre-pruning: search over depth and split limits with a cross-validated grid search.
params = {'max_depth': [2, 4, 6, 8, 10, 12],
          'min_samples_split': [2, 3, 4],
          'min_samples_leaf': [1, 2]}
gcv = GridSearchCV(estimator=tree.DecisionTreeClassifier(random_state=0), param_grid=params)
gcv.fit(x_train, y_train)
model = gcv.best_estimator_
model.fit(x_train, y_train)
y_train_pred = model.predict(x_train)
y_test_pred = model.predict(x_test)
print(f'Train score {accuracy_score(y_train_pred,y_train)}')
print(f'Test score {accuracy_score(y_test_pred,y_test)}')
plot_confusionmatrix(y_test_pred, y_test, dom='Test')
plt.figure(figsize=(20, 20))
features = list(X.columns)
classes = ['Not heart disease', 'heart disease']
tree.plot_tree(model, feature_names=features, class_names=classes, filled=True)
plt.show()
path = clf.cost_complexity_pruning_path(x_train, y_train)
ccp_alphas, impurities = path.ccp_alphas, path.impurities
print(ccp_alphas)
# Post-pruning: fit one tree per candidate alpha along the pruning path.
clfs = []
for ccp_alpha in ccp_alphas:
    clf = tree.DecisionTreeClassifier(random_state=0, ccp_alpha=ccp_alpha)
    clf.fit(x_train, y_train)
    clfs.append(clf)
# The last alpha prunes the tree down to a single node, so drop it.
clfs = clfs[:-1]
ccp_alphas = ccp_alphas[:-1]
node_counts = [clf.tree_.node_count for clf in clfs]
depth = [clf.tree_.max_depth for clf in clfs]
plt.scatter(ccp_alphas, node_counts)
plt.scatter(ccp_alphas, depth)
plt.plot(ccp_alphas, node_counts, label='no of nodes', drawstyle="steps-post")
plt.plot(ccp_alphas, depth, label='depth', drawstyle="steps-post")
plt.legend()
plt.show()
# Record train and test accuracy for every pruned tree.
train_acc = []
test_acc = []
for c in clfs:
    y_train_pred = c.predict(x_train)
    y_test_pred = c.predict(x_test)
    train_acc.append(accuracy_score(y_train_pred, y_train))
    test_acc.append(accuracy_score(y_test_pred, y_test))
clf_ = tree.DecisionTreeClassifier(random_state=0,ccp_alpha=0.020)
clf_.fit(x_train,y_train)
y_train_pred = clf_.predict(x_train)
y_test_pred = clf_.predict(x_test)
print(f'Train score {accuracy_score(y_train_pred,y_train)}')
print(f'Test score {accuracy_score(y_test_pred,y_test)}')
plot_confusionmatrix(y_train_pred,y_train,dom='Train')
plot_confusionmatrix(y_test_pred,y_test,dom='Test')
plt.figure(figsize=(20, 20))
features = list(X.columns)   # feature names only, without the 'target' column
classes = ['Not heart disease', 'heart disease']
tree.plot_tree(clf_, feature_names=features, class_names=classes, filled=True)
plt.show()

OUTPUT:

(303, 13)
(303,)
(227, 13)
RESULT:
Thus the program was executed successfully and output was verified.
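The ccp_alphas printed by the program are the "effective alpha" values at which subtrees become worth collapsing. For a single split this value can be worked out by hand, as in the sketch below. The class counts are made up, gini and effective_alpha are hypothetical helper names, and the risk R is taken as the sample-weighted Gini impurity, which is the definition the scikit-learn documentation describes.

```python
def gini(counts):
    """Gini impurity of a node given its per-class sample counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def effective_alpha(node_counts, leaf_counts, n_total):
    """(R(t) - R(T_t)) / (number of leaves - 1) for the subtree rooted at node t."""
    r_node = sum(node_counts) / n_total * gini(node_counts)
    r_subtree = sum(sum(c) / n_total * gini(c) for c in leaf_counts)
    return (r_node - r_subtree) / (len(leaf_counts) - 1)

# Hypothetical root with 5 samples per class, split into two leaves.
alpha = effective_alpha([5, 5], [[4, 1], [1, 4]], n_total=10)
print(alpha)
```

Under these assumptions, setting ccp_alpha above this value would collapse the split, because its impurity reduction no longer pays for the extra leaf.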
10. Build an Artificial Neural Network by implementing the Backpropagation
algorithm and test the same using appropriate data sets
AIM:
To write a program that trains a small artificial neural network with the backpropagation algorithm.
PROGRAM:
import numpy as np

X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
y = np.array(([92], [86], [89]), dtype=float)
X = X / np.amax(X, axis=0)   # scale each input column by its maximum
y = y / 100                  # scale the targets into the range 0..1

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def derivatives_sigmoid(x):
    # x is already a sigmoid output, so the slope is x * (1 - x)
    return x * (1 - x)

epoch = 7000
lr = 0.1
inputlayer_neurons = 2
hiddenlayer_neurons = 3
output_neurons = 1
wh = np.random.uniform(size=(inputlayer_neurons, hiddenlayer_neurons))
bh = np.random.uniform(size=(1, hiddenlayer_neurons))
wout = np.random.uniform(size=(hiddenlayer_neurons, output_neurons))
bout = np.random.uniform(size=(1, output_neurons))

for i in range(epoch):
    # Forward pass
    hinp1 = np.dot(X, wh)
    hinp = hinp1 + bh
    hlayer_act = sigmoid(hinp)
    outinp1 = np.dot(hlayer_act, wout)
    outinp = outinp1 + bout
    output = sigmoid(outinp)
    # Backward pass: propagate the output error to the hidden layer
    EO = y - output
    outgrad = derivatives_sigmoid(output)
    d_output = EO * outgrad
    EH = d_output.dot(wout.T)
    hiddengrad = derivatives_sigmoid(hlayer_act)
    d_hiddenlayer = EH * hiddengrad
    # Weight and bias updates
    wout += hlayer_act.T.dot(d_output) * lr
    bout += np.sum(d_output, axis=0, keepdims=True) * lr
    wh += X.T.dot(d_hiddenlayer) * lr
    bh += np.sum(d_hiddenlayer, axis=0, keepdims=True) * lr

print("Input: \n" + str(X))
print("Actual Output: \n" + str(y))
print("Predicted Output: \n", output)
OUTPUT:

Input:
[[0.66666667 1.        ]
 [0.33333333 0.55555556]
 [1.         0.66666667]]
Actual Output:
[[0.92]
 [0.86]
 [0.89]]
Predicted Output:
[[0.89101267]
 [0.88295415]
 [0.89603196]]
RESULT:
Thus the program was executed successfully and output was verified.
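In the program, derivatives_sigmoid is applied to values that have already passed through the sigmoid. The sketch below checks numerically that a * (1 - a), with a = sigmoid(x), matches the slope of the sigmoid at x. The test point and step size are arbitrary choices, and derivative_from_activation is a hypothetical name.

```python
from math import exp

def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))

def derivative_from_activation(a):
    """Sigmoid slope written in terms of the activation a = sigmoid(x)."""
    return a * (1.0 - a)

x = 0.7      # arbitrary test point
h = 1e-6     # step for the finite difference
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)   # central difference estimate
analytic = derivative_from_activation(sigmoid(x))
print(numeric, analytic)
```

A finite-difference check of this kind is a common way to catch mistakes in hand-written gradients before training a network.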
11. Implement Support Vector Classification for linear kernels.

AIM:
To write a program that performs support vector classification with a linear kernel.
PROGRAM:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn import metrics
from sklearn.metrics import classification_report, confusion_matrix

df = pd.read_csv(r"C:\ProgramData\Anaconda3\Lib\site-packages\pandas\io\diabetes.csv")
print(df.head())
print(df.notnull().sum())   # non-missing values per column
df.info()
sns.pairplot(df)
plt.figure(figsize=(16, 16))
sns.heatmap(df.corr(), annot=True)
plt.figure()
sns.boxplot(x='Outcome', y='Glucose', data=df)
plt.show()

X = df.drop(['Outcome'], axis=1)
y = df['Outcome']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=101)
model = svm.SVC(C=2, gamma='scale', kernel='linear')
model.fit(X_train, y_train)
prediction = model.predict(X_test)
print(metrics.accuracy_score(y_test, prediction) * 100)
print(confusion_matrix(y_test, prediction))
print(classification_report(y_test, prediction))
print('MAE:', metrics.mean_absolute_error(y_test, prediction))
print('MSE:', metrics.mean_squared_error(y_test, prediction))
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test, prediction)))

OUTPUT:
78.57142857142857
[[90 13]
 [20 31]]
MAE: 0.21428571428571427
MSE: 0.21428571428571427
RMSE: 0.4629100498862757
RESULT:
Thus the program was executed successfully and output was verified.
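The accuracy and error figures in the output follow directly from the confusion matrix. The sketch below recomputes them in plain Python, taking the four counts from the output above and assuming the scikit-learn convention that rows are true labels and columns are predictions.

```python
from math import sqrt

# Confusion matrix from the output: rows are true labels, columns are predictions.
cm = [[90, 13],
      [20, 31]]

total = sum(sum(row) for row in cm)
correct = cm[0][0] + cm[1][1]
wrong = total - correct

accuracy = correct / total
# With 0/1 labels every error has absolute value 1, so MAE and MSE coincide.
mae = wrong / total
mse = wrong / total
rmse = sqrt(mse)

print(accuracy * 100, mae, mse, rmse)
```

This is why MAE and MSE are identical in the output: for binary labels both reduce to the fraction of misclassified samples.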
12. Implement Logistic Regression to classify problems such as spam
detection, diabetes prediction, and so on.

AIM:
To write a program that uses logistic regression to solve a classification problem (diabetes prediction).
PROGRAM:
import numpy as np   # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

df = pd.read_csv(r'C:\ProgramData\Anaconda3\Lib\site-packages\pandas\io\diabetes.csv')
print(df.head())
df.info()
print(df.describe())
sns.heatmap(df.isnull(), yticklabels=False, cbar=False, cmap='viridis')   # highlights missing cells
sns.pairplot(df)
plt.subplots(figsize=(20, 15))
sns.boxplot(x='Age', y='BMI', data=df)
plt.show()

x = df.drop('Outcome', axis=1)   # features
y = df['Outcome']                # class label
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=101)
logmodel = LogisticRegression()
logmodel.fit(x_train, y_train)
predictions = logmodel.predict(x_test)
print(classification_report(y_test, predictions))
print(confusion_matrix(y_test, predictions))
OUTPUT:
RESULT:
Thus the program was executed successfully and output was verified.
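LogisticRegression fits its weights with a numerical optimiser. A minimal sketch of the underlying idea follows: batch gradient descent on the log-loss for a single centred feature, in plain Python. The data, learning rate and epoch count are made up, fit_logistic is a hypothetical helper, and this is not the solver scikit-learn uses.

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Batch gradient descent on the mean log-loss for one feature."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b

raw = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]        # hypothetical feature values
mean = sum(raw) / len(raw)
xs = [x - mean for x in raw]                # centre the feature before fitting
ys = [0, 0, 0, 1, 1, 1]

w, b = fit_logistic(xs, ys)
predictions = [1 if sigmoid(w * x + b) >= 0.5 else 0 for x in xs]
print(w, b, predictions)
```

The model outputs a probability, and the 0.5 threshold turns it into a class label, which is what predict returns in the program above.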
