CHAPTER 3: CLASSIFICATION
1. DECISION TREE CLASSIFIER:
A decision tree classifier predicts a target by learning simple if-then split rules from the feature values. The program below label-encodes the categorical columns of a salary dataset and fits a tree to predict whether the salary exceeds 100k.
PROGRAM:
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn import tree
# Read the CSV file
df = pd.read_csv("salaries.csv")
print(df)
# Prepare inputs and target
inputs = df.drop('salary_more_then_100k', axis='columns')
target = df['salary_more_then_100k']
# Label encode categorical features
le_company = LabelEncoder()
le_job = LabelEncoder()
le_degree = LabelEncoder()
inputs['company_n'] = le_company.fit_transform(inputs['company'])
inputs['job_n'] = le_job.fit_transform(inputs['job'])
inputs['degree_n'] = le_degree.fit_transform(inputs['degree'])
print(inputs)
inputs_n = inputs.drop(['company', 'job', 'degree'], axis='columns')
# Create and fit the decision tree model
model = tree.DecisionTreeClassifier()
model.fit(inputs_n, target)
# Print model score and make predictions
print("Model Score:", model.score(inputs_n, target))
# Is the salary for (Google, Computer Engineer, Bachelors degree) more than 100k?
print("Prediction for [2, 1, 0]:", model.predict([[2, 1, 0]]))
# Is the salary for (Google, Computer Engineer, Masters degree) more than 100k?
print("Prediction for [2, 1, 1]:", model.predict([[2, 1, 1]]))
Output:
Model Score: 1.0
Prediction for [2, 1, 0]: [0]
Prediction for [2, 1, 1]: [1]
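As a follow-up, the fitted tree's split rules can be printed as text, which shows why [2, 1, 0] and [2, 1, 1] receive different predictions. A minimal sketch using scikit-learn's export_text:
from sklearn.tree import export_text
# Print the learned if-then rules, using the encoded column names from above
print(export_text(model, feature_names=['company_n', 'job_n', 'degree_n']))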
2. NAIVE BAYES CLASSIFICATION:
Naïve Bayes is a supervised learning algorithm, based on Bayes' theorem, that is used for solving classification problems. It is mainly used in text classification, where the training data is typically high-dimensional.
The Bayesian method of calculating conditional probabilities is used in machine learning applications that involve classification tasks. Naive Bayes classification simplifies the full Bayesian calculation by assuming that the features are conditionally independent given the class, which reduces computation time and cost.
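In symbols, this is the standard Naive Bayes formulation: for a class y and features x1, ..., xn, Bayes' theorem gives
P(y | x1, ..., xn) = P(y) · P(x1, ..., xn | y) / P(x1, ..., xn)
and the naive independence assumption factorizes the likelihood, P(x1, ..., xn | y) ≈ P(x1 | y) · ... · P(xn | y), so the classifier predicts the class y that maximizes P(y) · P(x1 | y) · ... · P(xn | y).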
Applications of the Naïve Bayes classifier:
It is used for credit scoring.
It is used in medical data classification.
It can be used for real-time predictions, because the Naïve Bayes classifier is an eager learner.
It is used in text classification, such as spam filtering and sentiment analysis.
PROGRAM:
from sklearn import preprocessing
from sklearn.naive_bayes import GaussianNB
age = ['youth', 'youth', 'middle-aged', 'senior', 'senior',
'senior', 'middle-aged', 'youth', 'youth', 'senior', 'youth',
'middle-aged', 'middle-aged', 'senior']
income = ['high', 'high', 'high', 'medium', 'low', 'low',
'low', 'medium', 'low', 'medium', 'medium', 'medium',
'high', 'medium']
student = ['no', 'no', 'no', 'no', 'yes', 'yes', 'yes',
'no', 'yes', 'yes', 'yes', 'no', 'yes', 'no']
credit_rating = ['fair', 'excellent', 'fair', 'fair', 'fair',
'excellent', 'excellent', 'fair', 'fair', 'fair',
'excellent', 'excellent', 'fair', 'excellent']
buys_computer = ['no', 'no', 'yes', 'yes', 'yes', 'no',
'yes', 'no', 'yes', 'yes', 'yes', 'yes', 'yes', 'no']
# Create a LabelEncoder object (each fit_transform call below refits it on a new column)
le = preprocessing.LabelEncoder()
# Converting string labels into numbers
age_encoded = le.fit_transform(age)
print(age_encoded)
income_encoded = le.fit_transform(income)
print(income_encoded)
student_encoded = le.fit_transform(student)
print(student_encoded)
credit_encoded = le.fit_transform(credit_rating)
print(credit_encoded)
# Encode the target labels
label = le.fit_transform(buys_computer)
print(label)
# Combining age, income, student, and credit rating into a single list of tuples
features = list(zip(age_encoded, income_encoded, student_encoded, credit_encoded))
# Create a Gaussian Naive Bayes model
model = GaussianNB()
# Train the model using the training sets
model.fit(features, label)
# Predict output
predicted = model.predict([[2, 2, 1, 1]]) # 2: youth, 2: medium, 1: yes, 1: fair
print("Predicted Value:", predicted)
Output:
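The numeric prediction can be decoded back into a 'yes'/'no' label. A minimal sketch, relying on the fact that le was last fitted on buys_computer (using one encoder per column, or sklearn's CategoricalNB, would be the cleaner choice for label-encoded categorical features):
# Map the numeric prediction back to the original class label
print("Decoded prediction:", le.inverse_transform(predicted))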
3. MULTINOMIAL NAIVE BAYES CLASSIFICATION:
Multinomial Naive Bayes is one of the most popular supervised learning classifiers for categorical and count data, and is widely used to analyze text. Text classification is gaining popularity because there is an enormous amount of information available in email, documents, websites, etc. that needs to be analyzed.
Examples of categorical variables are race, sex, age group, and educational level. While the latter two
variables may also be considered in a numerical manner by using exact values for age and highest
grade completed, it is often more informative to categorize such variables into a relatively small
number of groups.
This dataset is the result of a chemical analysis of wines grown in the same region in Italy but derived from three different plant varieties. The dataset comprises 13 features (alcohol, malic_acid, ash, alcalinity_of_ash, magnesium, total_phenols, flavanoids, nonflavanoid_phenols, proanthocyanins, color_intensity, hue, od280/od315_of_diluted_wines, proline) and the type of wine plant variety as the target. The data has three types of wine: class_0, class_1, and class_2.
PROGRAM:
# Import scikit-learn dataset library
from sklearn import datasets
# Load dataset
wine = datasets.load_wine()
# Print the names of the 13 features
print("Features:", wine.feature_names)
# Print the label type of wine (class 0, class 1, class 2)
print("Labels:", wine.target_names)
# Print data (feature) shape
print(wine.data.shape)
# Print the wine data features (top 5)
print(wine.data[:5])
print(wine.target)
# Import train test split function from sklearn.model_selection
from sklearn.model_selection import train_test_split
# Split dataset into training set and test set (70% training and 30% test)
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.3, random_state=109)
# Import the Multinomial Naive Bayes model (the classifier this section describes)
from sklearn.naive_bayes import MultinomialNB
# Create a Multinomial Naive Bayes classifier
mnb = MultinomialNB()
# Train the model using the training sets
mnb.fit(X_train, y_train)
# Predict the response for the test dataset
y_pred = mnb.predict(X_test)
print("Predicted Labels:", y_pred)
# Import scikit-learn metrics module for accuracy calculation
from sklearn import metrics
# Model accuracy: how often the classifier is correct
print("Accuracy:", metrics.accuracy_score(y_test, y_pred) * 100)
Output:
Note: Wine itself can also be classified by other criteria: (1) according to color, (2) according to carbon dioxide pressure, and (3) according to sugar content.
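Because the 13 wine measurements are continuous, Gaussian Naive Bayes is also a natural candidate for this data. A minimal side-by-side sketch, reusing X_train, X_test, y_train, y_test, and metrics from the program above:
from sklearn.naive_bayes import GaussianNB, MultinomialNB
for nb in (GaussianNB(), MultinomialNB()):
    nb.fit(X_train, y_train)
    # Report each model's test accuracy on the same 70/30 split
    print(type(nb).__name__, metrics.accuracy_score(y_test, nb.predict(X_test)) * 100)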
4. LINEAR KERNEL:
Kernels, also known as kernel methods or kernel functions, are a family of pattern-analysis techniques that allow a linear classifier to solve a non-linear problem. SVMs (Support Vector Machines) use kernel methods to solve classification and regression problems.
The linear kernel is used when the data is linearly separable, that is, when it can be separated by a single line (or hyperplane). It is one of the most common kernels, and it is mostly used when a dataset has a large number of features.
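For reference, scikit-learn's SVC computes the linear kernel as the plain inner product of two samples:
K(x_i, x_j) = x_i · x_j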
PROGRAM:
# Load libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sn
from sklearn.svm import SVC
# Assign column names to the dataset
column_names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']
# Load dataset
df = pd.read_csv("iris.csv", names=column_names)
# Split dataset into features and target
X = df.drop('Class', axis=1) # Features
y = df['Class'] # Target variable
# Split dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size=0.3, random_state=1)
# Create a support vector classifier with a linear kernel (SVC was imported above)
clf = SVC(kernel='linear')
# Fit the classifier to the training data
clf.fit(X_train, y_train)
# Predict the classes on test set
y_pred = clf.predict(X_test)
print(y_pred)
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy * 100)
# Print classification report and confusion matrix
print(classification_report(y_test, y_pred))
# Generate and display confusion matrix heatmap
cm = pd.crosstab(y_test, y_pred, rownames=['Actual'], colnames=['Predicted'])
ax = sn.heatmap(cm, annot=True)
plt.show()
Output:
5. POLYNOMIAL KERNEL:
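A polynomial kernel lets the SVM fit polynomial (curved) decision boundaries. In scikit-learn's SVC it is computed as
K(x_i, x_j) = (gamma · x_i · x_j + coef0) ^ degree
where degree is the polynomial degree set in the program below, and gamma and coef0 keep their defaults ('scale' and 0.0) unless specified.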
PROGRAM:
# Import libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sn
# Assign column names to the dataset
colnames = ['sepal-length', 'sepal-width', 'petal-length',
'petal-width', 'Class']
df = pd.read_csv("iris.csv", names=colnames)
# Split dataset into features and target variable
X = df.drop('Class', axis=1) # Features
y = df['Class'] # Target variable
# Split dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size=0.3, random_state=1)
# Import support vector classifier from sklearn
from sklearn.svm import SVC
clf = SVC(kernel='poly', degree=8) # Polynomial kernel with degree 8
# Fit the classifier to the training data
clf.fit(X_train, y_train)
# Make predictions on the test data
y_pred = clf.predict(X_test)
# Print the predicted labels
print(y_pred)
# Calculate and print the accuracy score
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy*100)
# Print the classification report
print(classification_report(y_test, y_pred))
# Generate and display the confusion matrix as a heatmap
# Use a name that does not shadow the confusion_matrix function imported above
cm = pd.crosstab(y_test, y_pred,
                 rownames=['Actual'], colnames=['Predicted'])
ax = sn.heatmap(cm, annot=True)
plt.show()
Output:
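Rather than fixing the degree at 8, the degree hyperparameter can be tuned. A minimal sketch with scikit-learn's GridSearchCV, reusing X_train and y_train from the program above:
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
# Try several polynomial degrees with 5-fold cross-validation
grid = GridSearchCV(SVC(kernel='poly'), {'degree': [2, 3, 4, 5, 8]}, cv=5)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.best_score_)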
6. RADIAL BASIS FUNCTION KERNEL:
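The radial basis function (RBF) kernel measures similarity by the distance between samples. In scikit-learn's SVC it is computed as
K(x_i, x_j) = exp(-gamma · ||x_i - x_j||^2)
where gamma controls how far the influence of a single training example reaches; the program below fixes gamma=0.1.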
PROGRAM:
# Load libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sn
from sklearn.svm import SVC
# Assign column names to the dataset
col_names = ['sepal-length', 'sepal-width', 'petal-length',
'petal-width', 'Class']
# Load the dataset
dataset = pd.read_csv("iris.csv", names=col_names)
# Separate features and target variable
X = dataset.drop('Class', axis=1)
y = dataset['Class']
# Split dataset into training set and test set (70% training and 30% test)
X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size=0.3, random_state=1)
# Create a support vector classifier with the RBF kernel (SVC was imported above)
clf = SVC(kernel='rbf', gamma=0.1)
# Fit the classifier to the training data
clf.fit(X_train, y_train)
# Predict the labels for the test set
y_pred = clf.predict(X_test)
# Print the predicted labels and accuracy
print("Predicted labels:", y_pred)
print("Accuracy:", accuracy_score(y_test, y_pred))
# Print the classification report
print(classification_report(y_test, y_pred))
# Generate and display confusion matrix as a heatmap
# Use a name that does not shadow the confusion_matrix function imported above
cm = pd.crosstab(y_test, y_pred,
                 rownames=['Actual'], colnames=['Predicted'])
sn.heatmap(cm, annot=True)
plt.show()
Output:
7. K-NEAREST NEIGHBOURS:
The k-nearest neighbors algorithm, also known as KNN or k-NN, is a non-parametric, supervised
learning classifier, which uses proximity to make classifications or predictions about the grouping of
an individual data point.
The make_blobs function from sklearn.datasets is used to generate a synthetic dataset with 500 samples, 2 features, 4 centers, and a cluster standard deviation of 1.5. The X variable contains the feature vectors, and the y variable contains the corresponding labels.
PROGRAM:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
X, y = make_blobs(n_samples=500, n_features=2, centers=4, cluster_std=1.5, random_state=4)
plt.style.use('seaborn-v0_8')  # the bare 'seaborn' style name was removed in newer Matplotlib
plt.figure(figsize=(10, 10))
plt.scatter(X[:, 0], X[:, 1], c=y, marker='*', s=100, edgecolors='black')
plt.show()
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
knn5 = KNeighborsClassifier(n_neighbors=5)
knn1 = KNeighborsClassifier(n_neighbors=1)
knn5.fit(X_train, y_train)
knn1.fit(X_train, y_train)
y_pred_5 = knn5.predict(X_test)
y_pred_1 = knn1.predict(X_test)
from sklearn.metrics import accuracy_score
print("Accuracy with k=5", accuracy_score(y_test, y_pred_5)*100)
print("Accuracy with k=1", accuracy_score(y_test, y_pred_1)*100)
plt.figure(figsize=(15, 5))
plt.subplot(1, 2, 1)
plt.scatter(X_test[:, 0], X_test[:, 1], c=y_pred_5, marker='*', s=100, edgecolors='black')
plt.title("Predicted values with k=5", fontsize=20)
plt.subplot(1, 2, 2)
plt.scatter(X_test[:, 0], X_test[:, 1], c=y_pred_1, marker='*', s=100, edgecolors='black')
plt.title("Predicted values with k=1", fontsize=20)
plt.show()
Output:
Accuracy with k=5 93.60000000000001
Accuracy with k=1 90.4
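Instead of comparing only k=1 and k=5 on a single split, k can be chosen by cross-validation. A minimal sketch, reusing X_train and y_train from the program above:
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
# Evaluate odd values of k with 5-fold cross-validation on the training set
for k in range(1, 16, 2):
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X_train, y_train, cv=5)
    print("k =", k, "mean CV accuracy =", scores.mean() * 100)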
8. RANDOM FOREST:
Random forest is an ensemble method: it builds decision trees on different bootstrap samples of the data and takes their majority vote for classification (or their average for regression).
PROGRAM:
# Load libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sn
# Assign column names to the dataset
colnames = ['sepal-length', 'sepal-width', 'petal-length',
            'petal-width', 'Class']
# Load dataset (iris.csv has no header row, so pass the names explicitly,
# as in the earlier programs)
df = pd.read_csv("iris.csv", names=colnames)
# Split dataset into features and target
X = df.drop('Class', axis=1) # Features
y = df['Class'] # Target variable
# Split dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size=0.3, random_state=1)
# Import random forest classifier and fit the model
clf = RandomForestClassifier(n_estimators=100)
clf.fit(X_train, y_train)
# Make predictions on the test set
y_pred = clf.predict(X_test)
# Print predictions and accuracy score
print("Predictions:", y_pred)
print("Accuracy:", accuracy_score(y_test, y_pred) * 100)
# Print classification report and confusion matrix
print("Classification Report:")
print(classification_report(y_test, y_pred))
print("Confusion Matrix:")
cm = confusion_matrix(y_test, y_pred)
print(cm)
# Generate heatmap and display it
plt.figure(figsize=(8, 6))
sn.heatmap(cm, annot=True, fmt='d')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.show()
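As a follow-up, a fitted RandomForestClassifier exposes feature_importances_, which shows how much each measurement contributes to the predictions. A minimal sketch, reusing clf and X from the program above:
# Pair each importance score with its column name and sort
importances = pd.Series(clf.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))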