In [1]:
# Loading the dataset
# Importing the required libraries: scikit-learn's datasets package and pandas
from sklearn import datasets
import pandas as pd
# importing random forest classifier from ensemble module
from sklearn.ensemble import RandomForestClassifier
# Loading the iris plants dataset (classification)
iris = datasets.load_iris()
In [2]:
print(iris.target_names) # Dependent Variable
['setosa' 'versicolor' 'virginica']
In [3]:
print(iris.feature_names) # Independent features or columns
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
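The load_iris() call returns a Bunch object that also exposes the raw NumPy arrays. A quick optional check (not in the original notebook) confirms 150 samples with 4 features each:

In [ ]:
# Optional sketch: inspect the shapes of the raw arrays
print(iris.data.shape)    # (150, 4) - feature matrix
print(iris.target.shape)  # (150,)   - integer class labels 0/1/2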
In [4]:
# dataset holds all the independent columns from iris.data, converted to a pandas DataFrame
dataset = pd.DataFrame(iris.data)
In [5]:
# printing the top 5 rows of the iris dataset
print(dataset.head())
0 1 2 3
0 5.1 3.5 1.4 0.2
1 4.9 3.0 1.4 0.2
2 4.7 3.2 1.3 0.2
3 4.6 3.1 1.5 0.2
4 5.0 3.6 1.4 0.2
In [6]:
# Create a new column named 'species' in the DataFrame. Its values come from iris.target,
# which encodes setosa, versicolor and virginica as 0, 1 and 2 respectively.
dataset['species'] = iris.target
In [7]:
# Assigning names to the respective columns
dataset.columns =['sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species']
# displaying the DataFrame
print(dataset)
sepallength sepalwidth petallength petalwidth species
0 5.1 3.5 1.4 0.2 0
1 4.9 3.0 1.4 0.2 0
2 4.7 3.2 1.3 0.2 0
3 4.6 3.1 1.5 0.2 0
4 5.0 3.6 1.4 0.2 0
.. ... ... ... ... ...
145 6.7 3.0 5.2 2.3 2
146 6.3 2.5 5.0 1.9 2
147 6.5 3.0 5.2 2.0 2
148 6.2 3.4 5.4 2.3 2
149 5.9 3.0 5.1 1.8 2
[150 rows x 5 columns]
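Before modelling, it is worth confirming the class balance. This optional check (not in the original notebook) shows the dataset contains 50 samples of each species:

In [ ]:
# Optional sketch: class distribution and the code-to-name mapping
print(dataset['species'].value_counts())
print(dict(enumerate(iris.target_names)))  # {0: 'setosa', 1: 'versicolor', 2: 'virginica'}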
In [8]:
# Splitting arrays or matrices into random train and test subsets
from sklearn.model_selection import train_test_split
X = dataset.iloc[:, : -1]
y = dataset.iloc[:, -1]
# i.e. 70% training data and 30% test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.30, random_state = 8)
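Because the split is random, the class proportions in the two subsets can drift. An optional variant (assuming you want the 50/50/50 class ratio preserved) passes stratify=y:

In [ ]:
# Optional sketch: a stratified split keeps class proportions equal in both subsets
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X, y, test_size=0.30, random_state=8, stratify=y)
print(y_test_s.value_counts())  # roughly 15 samples per class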
In [9]:
from sklearn.ensemble import RandomForestClassifier
# Create a Random Forest Classifier
# n_estimators : int, default=100 : The number of trees in the forest.
# criterion : {"gini", "entropy"}, default="gini" : the function to measure split quality
clf = RandomForestClassifier(n_estimators=100) # 100 trees
# Train the model using the training set
clf.fit(X_train,y_train)
# Prediction on test set
y_pred=clf.predict(X_test)
# Import scikit-learn metrics module for accuracy calculation
from sklearn import metrics
# Model accuracy: how often is the classifier correct?
print("Accuracy: ",metrics.accuracy_score(y_test, y_pred))
Accuracy: 0.9111111111111111
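Accuracy alone hides per-class behaviour. An optional follow-up (not in the original notebook) prints the confusion matrix and per-class precision and recall for the same predictions:

In [ ]:
# Optional sketch: per-class evaluation metrics
print(metrics.confusion_matrix(y_test, y_pred))
print(metrics.classification_report(y_test, y_pred, target_names=iris.target_names))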
In [10]:
# Predicting the flower type for a new, unseen sample
clf.predict([[3, 3, 2, 2]])
Out[10]:
array([0])
In [11]:
# This implies the flower is setosa, the first of the three species in our dataset:
# setosa, versicolor, and virginica.
In [12]:
clf.predict([[3, 5, 5, 2]])
# Here, array([2]) indicates the flower type Virginica.
Out[12]:
array([2])
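The integer code can be mapped back to a species name using iris.target_names. This optional one-liner is not in the original notebook:

In [ ]:
# Optional sketch: turn the predicted integer code back into a species name
pred = clf.predict([[3, 5, 5, 2]])
print(iris.target_names[pred])  # e.g. ['virginica']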
In [13]:
# Next, we identify the most important features in the iris dataset for feature selection.
In [14]:
# importing random forest classifier from ensemble module
from sklearn.ensemble import RandomForestClassifier
# Create a Random forest Classifier
clf = RandomForestClassifier(n_estimators = 100)
# Train the model using the training sets
clf.fit(X_train, y_train)
Out[14]:
RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
criterion='gini', max_depth=None, max_features='auto',
max_leaf_nodes=None, max_samples=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=100,
n_jobs=None, oob_score=False, random_state=None,
verbose=0, warm_start=False)
In [15]:
# Ranking the features with the feature_importances_ attribute
feature_imp = pd.Series(clf.feature_importances_, index = iris.feature_names).sort_values(
ascending = False)
feature_imp
Out[15]:
petal width (cm) 0.519967
petal length (cm) 0.349479
sepal length (cm) 0.103166
sepal width (cm) 0.027388
dtype: float64
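The scores are easier to compare visually. A minimal matplotlib sketch (matplotlib is also imported later in the notebook):

In [ ]:
# Optional sketch: visualize the importance scores as a horizontal bar chart
import matplotlib.pyplot as plt
feature_imp.sort_values().plot(kind='barh')
plt.xlabel('Importance score')
plt.title('Random forest feature importances')
plt.show()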
In [16]:
# Generating the Model on Selected Features
# Here, we can remove the "sepal width" feature because it has very low importance,
# and select the 3 remaining features.
# Import train_test_split function
from sklearn.model_selection import train_test_split
# Split dataset into features and labels
X=dataset[['petallength', 'petalwidth','sepallength']]
# Removed feature "sepal width"
y=dataset['species']
# Split dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=5)
In [17]:
from sklearn.ensemble import RandomForestClassifier
# Create Random Forest Classifier
# n_estimators : int, default=100 : the number of trees in the forest
clf=RandomForestClassifier(n_estimators=100)
# Train the model using the training set
clf.fit(X_train,y_train)
# Prediction on test set
y_pred=clf.predict(X_test)
# Import scikit-learn metrics module for accuracy calculation
from sklearn import metrics
# Model accuracy: how often is the classifier correct?
print("Accuracy: ",metrics.accuracy_score(y_test, y_pred))
Accuracy: 0.9333333333333333
In [18]:
# After removing the least important feature (sepal width), the accuracy increased on this split,
# since discarding noisy or misleading features can help the model generalize.
# Fewer features also reduce the training time.
# (See the cross-validation sketch below for a more robust comparison.)
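Note that the two accuracy figures come from different random splits (random_state=8 versus random_state=5), so a single holdout comparison is noisy. An optional sanity check (not in the original notebook) averages accuracy over several folds:

In [ ]:
# Optional sketch: 5-fold cross-validation gives a more stable accuracy estimate
from sklearn.model_selection import cross_val_score
scores = cross_val_score(RandomForestClassifier(n_estimators=100), X, y, cv=5)
print(scores.mean(), scores.std())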
In [19]:
# The individual trees are stored in clf.estimators_, indexed 0 to 99;
# the first tree is estimators_[0]
clf.estimators_[0]
Out[19]:
DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',
max_depth=None, max_features='auto', max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, presort='deprecated',
random_state=1364519456, splitter='best')
In [20]:
# Plot first decision tree
from sklearn import tree
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(25,20))
a = tree.plot_tree(clf.estimators_[0], feature_names = X.columns, filled=True)
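If the rendered figure is too dense to read, an optional text dump of the same tree (not in the original notebook) can be handy:

In [ ]:
# Optional sketch: print the first tree's split rules as plain text
from sklearn.tree import export_text
print(export_text(clf.estimators_[0], feature_names=list(X.columns)))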
In [21]:
clf.estimators_[1]
Out[21]:
DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',
max_depth=None, max_features='auto', max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, presort='deprecated',
random_state=9010534, splitter='best')
In [22]:
# Plot second decision tree
from sklearn import tree
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(25,20))
a = tree.plot_tree(clf.estimators_[1], feature_names = X.columns, filled=True)