Create Random Forest Classifier Using Python Scikit-Learn
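Random forest is a supervised machine learning algorithm used for classification, regression, and other tasks. It builds many decision trees, each trained on a random sample of the data drawn with replacement (a bootstrap sample) and considering only a random subset of the features at each split. To classify a new sample, the forest collects the prediction from every tree and selects the final class by majority vote; for regression it averages the trees' outputs. (Scikit-learn's RandomForestClassifier uses a soft form of this vote: it averages the trees' predicted class probabilities and picks the most probable class.)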
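To make the "decision trees on data samples plus voting" idea concrete, here is a minimal sketch that builds a small forest by hand from scikit-learn's DecisionTreeClassifier. The helpers `fit_forest` and `predict_by_vote`, the choice of 25 trees, and the seeds are hypothetical and for illustration only. This uses hard majority voting, so it is a simplified version of the technique, not how RandomForestClassifier is implemented internally.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def fit_forest(X, y, n_trees=25, seed=0):
    """Hypothetical helper: fit n_trees trees, each on its own bootstrap sample of (X, y)."""
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        # Bootstrap sample: len(X) row indices drawn with replacement
        idx = rng.integers(0, len(X), size=len(X))
        # max_features="sqrt": each split considers a random subset of the features
        tree = DecisionTreeClassifier(
            max_features="sqrt", random_state=int(rng.integers(1_000_000))
        )
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees


def predict_by_vote(trees, X, n_classes):
    """Hypothetical helper: return the majority-vote class for each row of X."""
    # votes[t, i] is tree t's predicted class for sample i
    votes = np.stack([tree.predict(X) for tree in trees]).astype(int)
    # Count the votes per class for every sample, then take the most common class
    counts = np.apply_along_axis(np.bincount, 0, votes, minlength=n_classes)
    return counts.argmax(axis=0)


X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.40, stratify=y, random_state=0
)
trees = fit_forest(X_train, y_train)
y_pred = predict_by_vote(trees, X_test, n_classes=3)
print("Hand-rolled forest accuracy:", (y_pred == y_test).mean())
```

Ties in the vote go to the lowest class index here, because `argmax` returns the first maximum.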
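A key advantage of a random forest classifier is that it reduces overfitting by averaging the results of many trees. A single fully grown decision tree tends to memorize noise in its training data; because the trees in a forest are trained on different samples and feature subsets, their errors partly cancel out when combined. That is why a random forest usually generalizes better than a single decision tree.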
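One rough way to check this claim is to compare cross-validated accuracy of a single tree and a forest. The sketch below assumes 5-fold cross-validation and fixed seeds, neither of which comes from the original tutorial. On a small, clean dataset like Iris the two scores are likely to be close; the benefit of a forest usually shows more clearly on noisier, higher-dimensional data.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

models = {
    "single decision tree": DecisionTreeClassifier(random_state=0),
    "random forest (100 trees)": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    # 5-fold cross-validation: each score is the accuracy on one held-out fold
    scores = cross_val_score(model, X, y, cv=5)
    results[name] = scores
    # The spread (std) across folds is a rough indication of how stable the model is
    print(f"{name}: mean={scores.mean():.3f}, std={scores.std():.3f}")
```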
Steps to Create a Random Forest Classifier
We can follow the steps below to create a random forest classifier using Python Scikit-learn:
Step 1: Import the required libraries.
Step 2: Load the dataset.
Step 3: Divide the dataset into training and test datasets.
Step 4: Import the random forest classifier from the sklearn.ensemble module.
Step 5: Create a DataFrame from the dataset.
Step 6: Create a random forest classifier and train the model using the fit() method.
Step 7: Predict on the test dataset.
Step 8: Import metrics to compute the accuracy of the classifier.
Step 9: Print the accuracy of the random forest classifier.
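Step 3 uses a 60/40 split. The sketch below shows what that means for the 150 Iris samples. It additionally passes `stratify=y` (keep the species proportions equal in both parts) and `random_state=0` (make the shuffle reproducible); both are assumptions added here and are not used in the example that follows.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # 150 samples, 50 per species

# test_size=0.40 -> 60 test rows and 90 training rows
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.40, stratify=y, random_state=0
)

print(X_train.shape, X_test.shape)  # (90, 4) (60, 4)
print(np.bincount(y_test))          # [20 20 20]: 20 test samples per species
```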
Example
In the example below, we use the Iris plants dataset to build a random forest classifier:
# Step 1: Import required libraries
import pandas as pd
from sklearn import datasets

# Step 2: Load the Iris dataset from sklearn
iris = datasets.load_iris()
print(iris.target_names)
print(iris.feature_names)

# Step 3: Divide the dataset into training and test datasets
X, y = iris.data, iris.target
from sklearn.model_selection import train_test_split

# 60% training data and 40% test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.40)

# Step 4: Import the random forest classifier from the sklearn.ensemble module
from sklearn.ensemble import RandomForestClassifier

# Step 5: Create a DataFrame of the dataset
# (handy for inspecting the data; the model below is trained on the NumPy arrays)
data = pd.DataFrame({
    'sepallength': iris.data[:, 0],
    'sepalwidth': iris.data[:, 1],
    'petallength': iris.data[:, 2],
    'petalwidth': iris.data[:, 3],
    'species': iris.target,
})

# Step 6: Create a random forest classifier with 100 trees
RForest_clf = RandomForestClassifier(n_estimators=100)

# Train the model on the training dataset using fit()
RForest_clf.fit(X_train, y_train)

# Step 7: Predict on the test dataset
y_pred = RForest_clf.predict(X_test)

# Step 8: Import metrics for the accuracy calculation
from sklearn import metrics

# Step 9: Print the accuracy as a percentage
print("\nAccuracy of our Random Forest Classifier is:", metrics.accuracy_score(y_test, y_pred) * 100)
Output
It will produce output similar to the following (the accuracy may differ between runs, since the split and the trees are random):
['setosa' 'versicolor' 'virginica']
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']

Accuracy of our Random Forest Classifier is: 95.0
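Because neither train_test_split nor RandomForestClassifier is given a `random_state` in the example above, the printed accuracy can change each time the script runs. A minimal sketch of a reproducible run, assuming an arbitrary seed of 42 and a hypothetical helper `run_once`, seeds both random sources:

```python
from sklearn import metrics
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def run_once(seed=42):
    """Hypothetical helper: split, train and score once with every random source seeded."""
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.40, random_state=seed
    )
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    clf.fit(X_train, y_train)
    return metrics.accuracy_score(y_test, clf.predict(X_test)) * 100


first, second = run_once(), run_once()
print("Accuracy (run 1):", first)
print("Accuracy (run 2):", second)  # same value as run 1, because the seeds are fixed
```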
In an interactive session (such as the Python REPL or a Jupyter notebook), let's use the trained classifier to predict the species of new flowers:
# Predicting the type of flower
RForest_clf.predict([[5, 4, 3, 1]])
Output
It will produce the following output:
array([1])
Here, array([1]) represents the versicolor class, because index 1 in target_names is 'versicolor'.
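Rather than looking up class numbers by hand, you can index `target_names` with the prediction. In this sketch the classifier is trained on all 150 samples with `random_state=0`, purely for brevity; both choices are assumptions, not part of the original example.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(iris.data, iris.target)

pred = clf.predict([[5, 4, 3, 1]])
# target_names is a NumPy array, so it can be indexed directly with the predicted class indices
names = iris.target_names[pred]
print(pred, names)
```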
# Predicting the type of flower
RForest_clf.predict([[5, 4, 5, 2]])
Output
It will produce the following output:
array([2])
Here, array([2]) represents the virginica class (index 2 in target_names).
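To see the "voting" behind these predictions, `predict_proba` returns, for each sample, the class probabilities averaged over all trees in the forest, and `predict` picks the class with the highest average. The sketch below recomputes that average by hand from the fitted trees in `estimators_`. Training on the full dataset and `random_state=0` are assumptions made for brevity.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(iris.data, iris.target)

samples = [[5, 4, 3, 1], [5, 4, 5, 2]]

# Averaged per-tree class probabilities; each row sums to 1
proba = clf.predict_proba(samples)

# The same average, computed manually from the individual fitted trees
manual = np.mean([tree.predict_proba(samples) for tree in clf.estimators_], axis=0)

for sample, row in zip(samples, proba):
    print(sample, ", ".join(f"{name}: {p:.2f}" for name, p in zip(iris.target_names, row)))
```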