Decision trees are a popular machine learning algorithm for both classification and regression tasks.
They
work by recursively splitting the dataset into subsets based on feature values, creating a tree-like
structure of decisions that leads to predictions. Here’s an overview of decision trees and some
commonly used algorithms:
1. Basic Concept of Decision Trees
• Nodes: Each node represents a feature (or attribute) in the dataset.
• Edges: Each branch from a node represents a decision based on that feature’s value.
• Leaf Nodes: Represent the final output (class or value) after all decisions have been made.
• Root Node: The topmost node in a tree, representing the initial feature or question.
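To make the structure concrete, here is a minimal sketch of how such a tree could be represented; the `Node` class and `predict` helper are illustrative names chosen for this example, not part of any particular library.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

@dataclass
class Node:
    """A decision tree node: internal nodes test a feature, leaf nodes hold a prediction."""
    feature: Optional[str] = None                              # feature tested here (None at a leaf)
    children: Dict[Any, "Node"] = field(default_factory=dict)  # edge: feature value -> child subtree
    prediction: Any = None                                      # class label (or value) at a leaf

def predict(node: Node, sample: Dict[str, Any]) -> Any:
    """Walk from the root, following the edge that matches the sample's feature value."""
    while node.feature is not None:
        node = node.children[sample[node.feature]]
    return node.prediction

# A tiny hand-built tree: the root tests "raining", its two edges lead to leaves.
root = Node(feature="raining", children={
    "yes": Node(prediction="stay home"),
    "no":  Node(prediction="go outside"),
})
print(predict(root, {"raining": "yes"}))  # -> stay home
```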
2. Decision Tree Algorithms
a) ID3 (Iterative Dichotomiser 3)
• Developed by Ross Quinlan, ID3 is one of the earliest algorithms.
• Criterion: Uses information gain to decide which feature to split on, favoring splits that yield the greatest reduction in entropy (a short sketch of this calculation follows this list).
• Limitations: Prone to overfitting and cannot handle continuous numeric features directly without modification.
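The information-gain criterion can be computed directly: the entropy of a label set is -Σ p_i · log2(p_i) over the class proportions, and the gain of a split is the parent's entropy minus the weighted entropy of the resulting subsets. The sketch below is illustrative; the function names and toy data are not from any particular library.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a collection of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, feature_values):
    """Reduction in entropy achieved by splitting the labels on a feature's values."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    weighted = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - weighted

# Toy example: how much does knowing "windy" tell us about the class?
labels = ["yes", "yes", "no", "no", "yes", "no"]
windy  = ["false", "true", "true", "true", "false", "true"]
print(information_gain(labels, windy))  # ~0.46 bits
```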
b) C4.5
• An extension of ID3, also developed by Quinlan.
• Criterion: Uses gain ratio (information gain normalized by split information) as the splitting criterion, and handles both continuous and categorical features better than ID3 (see the sketch after this list).
• Pruning: Implements pruning to reduce overfitting.
• Handling of Missing Values: C4.5 can handle datasets with missing values more effectively than
ID3.
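The gain ratio divides information gain by the "split information", i.e. the entropy of the branch proportions, which penalizes features that fragment the data into many small branches. A minimal sketch, reusing an information-gain value like the one computed above (names and numbers are illustrative):

```python
import math
from collections import Counter

def split_information(feature_values):
    """Entropy of the branch proportions produced by splitting on a feature."""
    n = len(feature_values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(feature_values).values())

def gain_ratio(info_gain, feature_values):
    """C4.5-style gain ratio: information gain normalized by split information."""
    si = split_information(feature_values)
    return info_gain / si if si > 0 else 0.0

# A feature with many distinct values has high split information,
# so its gain ratio is lower even when its raw information gain looks large.
print(gain_ratio(0.46, ["false", "true", "true", "true", "false", "true"]))  # ~0.50
```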
c) CART (Classification and Regression Trees)
• Developed by Leo Breiman, CART is widely used in both classification and regression.
• Criterion: For classification, CART uses Gini impurity as the splitting criterion; for regression, it uses mean squared error (MSE). A runnable example follows this list.
• Binary Splits Only: CART splits the data into exactly two branches at each node, creating binary
trees.
• Pruning: Prunes trees based on a cost-complexity parameter to manage overfitting.
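For a runnable illustration, scikit-learn's DecisionTreeClassifier implements an optimized CART-style learner; the dataset and hyperparameter values below are arbitrary choices for the sketch.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Gini impurity is the default split criterion; ccp_alpha > 0 enables
# cost-complexity pruning (larger values prune more aggressively).
clf = DecisionTreeClassifier(criterion="gini", ccp_alpha=0.01, random_state=0)
clf.fit(X, y)

print(clf.get_depth())     # depth of the (possibly pruned) tree
print(clf.predict(X[:3]))  # predicted classes for the first three samples
```

For regression, DecisionTreeRegressor plays the analogous role with a squared-error criterion.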
d) CHAID (Chi-Square Automatic Interaction Detector)
• CHAID is used for categorical data and is based on the chi-square test.
• Criterion: Uses statistical significance tests (chi-square for classification, ANOVA F-tests for regression) to determine splits, as sketched after this list.
• Multifurcating Splits: Unlike CART, CHAID can create branches with multiple splits from a single
node.
• Use Cases: Often used for market research and survey analysis.
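The statistical-test idea can be sketched with scipy: for one candidate categorical split, form the contingency table of feature value versus class and test independence. The tiny dataset below is made up for illustration, and a full CHAID implementation additionally merges similar categories and applies multiple-testing corrections.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Made-up survey-style data: does "region" help predict "churn"?
df = pd.DataFrame({
    "region": ["north", "north", "south", "south", "east", "east", "east"],
    "churn":  ["yes",   "no",    "no",    "no",    "yes",  "yes",  "no"],
})

# CHAID-style check of one candidate split: a small p-value from the
# chi-square test of independence favors splitting on "region".
table = pd.crosstab(df["region"], df["churn"])
chi2, p_value, dof, _ = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p_value:.3f}, dof = {dof}")
```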
3. Advantages of Decision Trees
• Interpretability: Easy to understand and visualize, even for non-experts.
• Non-linearity: Can model non-linear relationships.
• Little Data Preprocessing: Typically requires minimal data preparation; normalization and scaling are usually unnecessary.
4. Limitations of Decision Trees
• Overfitting: Decision trees can easily overfit, especially when grown very deep (a common mitigation is sketched after this list).
• Instability: Sensitive to small changes in the data, which can lead to vastly different trees (high variance).
• Preference for Certain Features: Splits based on information gain tend to favor features with many distinct levels.
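As a sketch of the overfitting mitigation mentioned above (hyperparameter values are arbitrary), tree growth can be constrained by capping depth and requiring a minimum number of samples per leaf:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Constraining growth (maximum depth, minimum leaf size) trades a little
# training accuracy for better generalization than a fully grown tree.
shallow = DecisionTreeClassifier(max_depth=4, min_samples_leaf=5, random_state=0)
print(cross_val_score(shallow, X, y, cv=5).mean())
```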
5. Applications of Decision Trees
• Classification tasks (e.g., spam detection, customer churn prediction)
• Regression tasks (e.g., predicting housing prices)
• Feature selection
To build a decision tree by hand, we calculate entropy and information gain on a small dataset, use them to choose the splits, and then make a prediction for a given input.
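The full walkthrough is in the article linked below; as a stand-in, the following is a minimal ID3-style sketch (the dataset, feature names, and helper functions are made up for illustration): compute the information gain of each feature, split on the best one, recurse, and finally classify a new input.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, feature):
    """Entropy reduction from splitting the rows on one feature."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())

def build_tree(rows, labels, features):
    """Recursive ID3-style construction: returns a nested dict, or a class label at a leaf."""
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    best = max(features, key=lambda f: info_gain(rows, labels, f))
    tree = {best: {}}
    for value in {row[best] for row in rows}:
        subset = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows, sub_labels = zip(*subset)
        remaining = [f for f in features if f != best]
        tree[best][value] = build_tree(list(sub_rows), list(sub_labels), remaining)
    return tree

def classify(tree, sample):
    """Follow the nested dicts using the sample's feature values until a label is reached."""
    while isinstance(tree, dict):
        feature = next(iter(tree))
        tree = tree[feature][sample[feature]]
    return tree

# Toy weather-style dataset (made up for illustration).
rows = [
    {"outlook": "sunny",    "windy": "false"},
    {"outlook": "sunny",    "windy": "true"},
    {"outlook": "rainy",    "windy": "true"},
    {"outlook": "overcast", "windy": "false"},
    {"outlook": "rainy",    "windy": "false"},
]
labels = ["no", "no", "no", "yes", "yes"]

tree = build_tree(rows, labels, ["outlook", "windy"])
print(tree)
print(classify(tree, {"outlook": "sunny", "windy": "false"}))  # -> no
```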
Details:
https://towardsdatascience.com/decision-tree-in-machine-learning-e380942a4c96