Decision Tree Notes:
● A decision tree, as the name suggests, is a flowchart-like tree structure that makes predictions by testing conditions.
● It is an efficient algorithm that is widely used for predictive analysis.
● Its main components are internal nodes, branches, and terminal (leaf) nodes.
● Every internal node holds a “test” on an attribute, branches hold the outcomes of the test, and every leaf node represents a class label.
● It is used for both classification as well as regression. It is often termed “CART”, which stands for Classification and Regression Tree.
● Tree algorithms are often preferred for their stability and reliability.
● A decision tree makes decisions by splitting nodes into sub-nodes.
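Example: as a minimal sketch of fitting a CART classifier in code, the snippet below uses scikit-learn and its built-in iris dataset; both are our own illustrative choices, not prescribed by these notes.

```python
# Minimal sketch: fitting a CART classifier with scikit-learn.
# Library and dataset are illustrative choices, not from the notes.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# criterion="entropy" splits by information gain; "gini" (the default) also works.
clf = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```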
Last Part:
● Node splitting, or simply splitting, is the process of dividing a node into multiple
sub-nodes to create relatively pure nodes.
● There are multiple ways of doing this, which can be broadly divided into two categories based on the type of target variable:
Categorical Target Variable
● Entropy / Gini Impurity
● Information Gain
● Chi-Square
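As a small illustration of the first criterion, here is a hedged sketch of computing Gini impurity; the function name and the toy labels are our own, not from the notes.

```python
# Sketch: Gini impurity, 1 - sum(p_i^2), for a categorical target.
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

print(gini(["yes"] * 10))              # 0.0 -> completely pure node
print(gini(["yes"] * 5 + ["no"] * 5))  # 0.5 -> maximally impure binary node
```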
The algorithm can be summarized as follows (a runnable sketch appears after the steps):
1. At each stage (node), pick out the best feature as the test condition.
2. Now split the node into the possible outcomes (internal nodes).
3. Repeat the above steps until all the test conditions have been exhausted and only leaf nodes remain.
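Below is a self-contained sketch of these three steps, assuming rows stored as Python dicts and using Gini impurity (from the list above) to rank test conditions; the data layout, helper names, and toy data are all our own illustration.

```python
# Hedged sketch of the three steps above; names and data are illustrative.
from collections import Counter, defaultdict

def gini(labels):
    return 1.0 - sum((c / len(labels)) ** 2 for c in Counter(labels).values())

def split_by(rows, feature):
    groups = defaultdict(list)
    for row in rows:
        groups[row[feature]].append(row)
    return groups

def best_feature(rows, features):
    # Step 1: the best test condition leaves the lowest weighted impurity.
    def weighted_gini(f):
        return sum(len(g) / len(rows) * gini([r["label"] for r in g])
                   for g in split_by(rows, f).values())
    return min(features, key=weighted_gini)

def build_tree(rows, features):
    labels = [r["label"] for r in rows]
    if len(set(labels)) == 1 or not features:        # step 3: stop at a pure node
        return Counter(labels).most_common(1)[0][0]  # leaf node: class label
    f = best_feature(rows, features)                 # step 1: pick the test
    rest = [x for x in features if x != f]
    return {f: {value: build_tree(group, rest)       # step 2: split, then recurse
                for value, group in split_by(rows, f).items()}}

rows = [
    {"outlook": "sunny", "windy": "no",  "label": "play"},
    {"outlook": "sunny", "windy": "yes", "label": "stay"},
    {"outlook": "rain",  "windy": "no",  "label": "play"},
    {"outlook": "rain",  "windy": "yes", "label": "stay"},
]
print(build_tree(rows, ["outlook", "windy"]))
# -> {'windy': {'no': 'play', 'yes': 'stay'}}
```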
When you start to implement the algorithm, the first question is: ‘How to pick the
starting test condition?’
Choosing the best feature or attribute for a split
The answer to this question lies in the values of ‘Entropy’ and ‘Information Gain’.
Let us see what they are and how they impact the creation of our decision tree.
Entropy: Entropy in a Decision Tree measures homogeneity. If the data is
completely homogeneous, the entropy is 0; if the data is evenly divided (50-50%)
between two classes, the entropy is 1.
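For reference, the standard definition (not spelled out in these notes) for a node whose classes occur with proportions $p_i$ is

$$H(S) = -\sum_{i} p_i \log_2 p_i$$

A pure node ($p_1 = 1$) gives $H = -1 \cdot \log_2 1 = 0$; a 50-50 binary node gives $H = -0.5\log_2 0.5 - 0.5\log_2 0.5 = 1$, matching the values above.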
Information Gain: Information Gain is the decrease in Entropy value when the
node is split (a worked sketch follows the bullets below).
● The attribute with the highest information gain is selected for splitting.
● Based on the computed values of Entropy and Information Gain, we
choose the best attribute at any particular step.
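As a hedged sketch of that computation, the snippet below evaluates one candidate split; the helper names and toy labels are our own illustration.

```python
# Sketch: information gain = parent entropy minus size-weighted child entropy.
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    n = len(parent)
    return entropy(parent) - sum(len(c) / n * entropy(c) for c in children)

parent = ["yes"] * 5 + ["no"] * 5               # 50-50 node, entropy = 1.0
left   = ["yes"] * 4                            # pure child, entropy = 0.0
right  = ["yes"] * 1 + ["no"] * 5               # mixed child, entropy ~ 0.65
print(information_gain(parent, [left, right]))  # ~ 0.61
```

At a given node, the candidate split whose gain is higher than every alternative’s is the one chosen.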