Random Forest Algorithm
Machine Learning | Supervised Learning Technique
What is Random Forest?
• Random Forest is an ensemble learning method.
• It builds multiple decision trees and merges their predictions to get a more accurate and stable result.
• Used for both classification and regression tasks.
• Reduces overfitting and increases accuracy (see the sketch below).
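A minimal sketch of the idea using scikit-learn; the dataset, parameters, and split are illustrative, not part of the slides.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# An ensemble of 100 decision trees; predictions are combined by majority vote.
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)
print("Test accuracy:", forest.score(X_test, y_test))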
Why Use Random Forest?
• Can handle large datasets with high dimensionality.
• Works well with both categorical and numerical data.
• Handles missing values efficiently.
• Provides feature importance scores (see the sketch below).
• Reduces variance and avoids overfitting.
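To illustrate the feature importance scores mentioned above, a short sketch assuming scikit-learn; the dataset and settings are illustrative.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(data.data, data.target)

# feature_importances_ sums to 1.0; higher values mean the feature
# contributed more to impurity reduction across all trees.
for name, score in sorted(zip(data.feature_names, forest.feature_importances_),
                          key=lambda pair: pair[1], reverse=True)[:5]:
    print(f"{name}: {score:.3f}")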
How Random Forest Works
• 1. Select random samples from the dataset (bootstrap sampling).
• 2. Build a decision tree for each sample.
• 3. During tree construction, select a random subset of features at each split.
• 4. Aggregate predictions from all trees (a from-scratch sketch follows this list):
• - Classification: Majority Voting
• - Regression: Average
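A from-scratch sketch of these four steps, meant as a simplified illustration rather than the full algorithm. It leans on scikit-learn's DecisionTreeClassifier, whose max_features="sqrt" option performs the per-split feature subsampling of step 3; the function name and parameters are made up for this example.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def random_forest_predict(X_train, y_train, X_test, n_trees=25, seed=0):
    """Toy random forest classifier; assumes numpy arrays and
    non-negative integer class labels."""
    rng = np.random.default_rng(seed)
    all_votes = []
    for _ in range(n_trees):
        # Step 1: bootstrap sample (draw rows with replacement).
        idx = rng.integers(0, len(X_train), size=len(X_train))
        # Steps 2-3: grow one tree; max_features="sqrt" picks a random
        # feature subset at every split.
        tree = DecisionTreeClassifier(max_features="sqrt",
                                      random_state=int(rng.integers(1_000_000)))
        tree.fit(X_train[idx], y_train[idx])
        all_votes.append(tree.predict(X_test))
    # Step 4: aggregate by majority vote (use the mean instead for regression).
    votes = np.stack(all_votes)  # shape: (n_trees, n_test_samples)
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)

# Example usage (illustrative):
# from sklearn.datasets import load_iris
# X, y = load_iris(return_X_y=True)
# print(random_forest_predict(X[:120], y[:120], X[120:]))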
Random Forest vs Decision Tree
• Decision Tree: simple and easy to interpret, but prone to overfitting.
• Random Forest: multiple trees reduce overfitting and improve accuracy.
• Decision Tree: uses all features for splitting.
• Random Forest: uses a random subset of features for each split (compared in the sketch below).
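One way to see the overfitting difference in practice: an illustrative comparison assuming scikit-learn; the exact scores depend on the dataset and settings.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# A single fully grown tree tends to fit noise; the forest averages it out.
tree_scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
forest_scores = cross_val_score(RandomForestClassifier(n_estimators=100, random_state=0), X, y, cv=5)

print("Decision tree CV accuracy:", tree_scores.mean())
print("Random forest CV accuracy:", forest_scores.mean())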
Solved Example
• Dataset: classify whether a person will buy a product based on age and income.
• Randomly sample data points to create multiple datasets.
• Build a decision tree for each sampled dataset.
• Each tree gives a prediction (Yes/No).
• Final result is based on the majority vote from all trees (see the toy sketch below).
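A toy version of this example; the age/income values and labels below are made up for illustration and are not from the slides.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical data: [age, income in thousands]; label 1 = buys, 0 = does not.
X = np.array([[22, 25], [35, 60], [48, 80], [52, 40],
              [28, 55], [60, 90], [19, 20], [45, 75]])
y = np.array([0, 1, 1, 0, 1, 1, 0, 1])

forest = RandomForestClassifier(n_estimators=50, random_state=1).fit(X, y)

# Each tree votes Yes/No; predict() returns the majority-vote class.
new_person = np.array([[30, 65]])
print("Buys product?", "Yes" if forest.predict(new_person)[0] == 1 else "No")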
Advantages and Disadvantages
• Advantages:
• - High accuracy
• - Works well with large datasets
• - Resistant to overfitting
• - Estimates feature importance
• Disadvantages:
• - Slower for real-time predictions
• - Complex model, hard to interpret
• - Requires more memory
Applications
• 1. Fraud Detection in Banking
• 2. Stock Market Predictions
• 3. Medical Diagnosis (e.g., cancer detection)
• 4. E-commerce Product Recommendations
• 5. Sentiment Analysis and NLP Tasks
Thank You
• Hope you understood the Random Forest algorithm!
• Questions and Discussions Welcome.
Ensemble Learning
• Combining multiple models to improve performance: Model 1 + Model 2 + ... + Model N = Final Prediction
• Key types (a sketch follows this list):
• - Bagging
• - Boosting
• - Stacking
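A small sketch of the "Model 1 + Model 2 + ... + Model N = Final Prediction" idea using scikit-learn's VotingClassifier; the base models and settings are illustrative.

from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Three different base models; the ensemble combines their votes into one prediction.
ensemble = VotingClassifier(estimators=[
    ("lr", LogisticRegression(max_iter=1000)),
    ("tree", DecisionTreeClassifier()),
    ("knn", KNeighborsClassifier()),
])
print("Ensemble CV accuracy:", cross_val_score(ensemble, X, y, cv=5).mean())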
Bagging vs Boosting
• Bagging:
• - Builds models independently (parallel learning: Model 1, Model 2, Model 3 trained in parallel)
• - Reduces variance
• - Example: Random Forest
• Boosting:
• - Builds models sequentially (sequential learning: Model 1 → Model 2 → Model 3)
• - Reduces bias
• - Example: Gradient Boosting
• A side-by-side sketch of both follows.
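An illustrative comparison of the two ideas with scikit-learn: bagging fits trees independently on bootstrap samples, boosting fits them one after another on the previous model's errors. Parameters are illustrative.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Bagging: independent trees on bootstrap samples
# (Random Forest additionally subsamples features at each split).
bagging = BaggingClassifier(n_estimators=50, random_state=0)

# Boosting: trees built sequentially, each correcting the previous ones.
boosting = GradientBoostingClassifier(n_estimators=50, random_state=0)

print("Bagging CV accuracy: ", cross_val_score(bagging, X, y, cv=5).mean())
print("Boosting CV accuracy:", cross_val_score(boosting, X, y, cv=5).mean())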