i X Y
A1 2 10
A2 2 5
A3 8 4
A4 5 8
A5 7 5
A6 6 4
A7 1 2
A8 4 9
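Choose two points as the initial cluster centers (A1 and A2), then calculate the Euclidean distance from each point to each center.
i Distance to A1 (Cluster 1) Distance to A2 (Cluster 2)
A1 0 5
A2 5 0
A3 8.49 6.08
A4 3.61 4.24
A5 7.07 5
A6 7.21 4.12
A7 8.06 3.16
A8 2.24 4.47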
Assign each point to the cluster of its nearest center:
A1 belongs to Cluster 1 (center A1)
A2 belongs to Cluster 2 (center A2)
A3 belongs to Cluster 2 (center A2)
A4 belongs to Cluster 1 (center A1)
A5 belongs to Cluster 2 (center A2)
A6 belongs to Cluster 2 (center A2)
A7 belongs to Cluster 2 (center A2)
A8 belongs to Cluster 1 (center A1)
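The distance table and the assignments above can be checked with a short script. This is only a sketch: it assumes Python 3.8+ (for math.dist), and the names points, assign and first_pass are illustrative, not part of the original solution.

```python
import math

# Coordinates copied from the data table at the top of the solution.
points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
    "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9),
}

def assign(points, centers):
    """Map each point name to (rounded distances to every center, 1-based cluster)."""
    result = {}
    for name, p in points.items():
        dists = [math.dist(p, c) for c in centers]   # Euclidean distance
        cluster = dists.index(min(dists)) + 1        # nearest center wins
        result[name] = ([round(d, 2) for d in dists], cluster)
    return result

# Initial centers are the points A1 and A2, as chosen in the text.
first_pass = assign(points, [points["A1"], points["A2"]])
for name, (dists, cluster) in first_pass.items():
    print(name, dists, "-> Cluster", cluster)
```

Ties are broken in favour of the lower-numbered cluster here; that choice is an assumption, and no tie occurs in this data set.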
i X Y Cluster
A1 2 10 1
A2 2 5 2
A3 8 4 2
A4 5 8 1
A5 7 5 2
A6 6 4 2
A7 1 2 2
A8 4 9 1
Calculate the mean of Cluster 1 (A1, A4, A8)
X = (2+5+4)/3 = 3.67
Y = (10+8+9)/3 = 9
Mean of Cluster 1: (3.67, 9).
Calculate the mean of Cluster 2 (A2, A3, A5, A6, A7)
X = (2+8+7+6+1)/5 = 4.8
Y = (5+4+5+4+2)/5 = 4
Mean of Cluster 2: (4.8, 4).
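The two cluster means can be recomputed as a quick arithmetic check. A minimal sketch follows; the helper name cluster_mean and the dictionary layout are assumptions made for illustration.

```python
# Coordinates and cluster labels copied from the table above.
points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
    "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9),
}
assignment = {"A1": 1, "A2": 2, "A3": 2, "A4": 1,
              "A5": 2, "A6": 2, "A7": 2, "A8": 1}

def cluster_mean(points, assignment, cluster):
    """Return the (x, y) mean of all points assigned to the given cluster."""
    members = [points[n] for n, c in assignment.items() if c == cluster]
    mean_x = sum(x for x, _ in members) / len(members)
    mean_y = sum(y for _, y in members) / len(members)
    return mean_x, mean_y

mean1 = cluster_mean(points, assignment, 1)
mean2 = cluster_mean(points, assignment, 2)
print([round(v, 2) for v in mean1])  # [3.67, 9.0]
print([round(v, 2) for v in mean2])  # [4.8, 4.0]
```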
Step 2: Recalculate the distance from each point to the cluster means
i Distance to mean of Cluster 1 Distance to mean of Cluster 2
A1 1.94 6.62
A2 4.33 2.97
A3 6.62 3.20
A4 1.67 4.00
A5 5.21 2.42
A6 5.52 1.20
A7 7.49 4.29
A8 0.33 5.06
A1 belongs to Cluster 1
A2 belongs to Cluster 2
A3 belongs to Cluster 2
A4 belongs to Cluster 1
A5 belongs to Cluster 2
A6 belongs to Cluster 2
A7 belongs to Cluster 2
A8 belongs to Cluster 1
No point has changed cluster, so the assignment table is the same as before:
i X Y Cluster
A1 2 10 1
A2 2 5 2
A3 8 4 2
A4 5 8 1
A5 7 5 2
A6 6 4 2
A7 1 2 2
A8 4 9 1
Recalculate the mean of Cluster 1 (A1, A4, A8)
X = (2+5+4)/3 = 3.67
Y = (10+8+9)/3 = 9
Mean of Cluster 1: (3.67, 9).
Recalculate the mean of Cluster 2 (A2, A3, A5, A6, A7)
X = (2+8+7+6+1)/5 = 4.8
Y = (5+4+5+4+2)/5 = 4
Mean of Cluster 2: (4.8, 4).
Neither mean has changed, so the algorithm has converged: Cluster 1 = {A1, A4, A8} and Cluster 2 = {A2, A3, A5, A6, A7}.
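The whole procedure (assign, recompute the means, repeat until nothing changes) can be sketched as one loop. This is an illustrative sketch, not the solution's own method: the function name kmeans, the max_iter cap, and the rule of keeping the old center for an empty cluster are all assumptions. It requires Python 3.8+ for math.dist.

```python
import math

points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
    "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9),
}

def kmeans(points, centers, max_iter=100):
    """Repeat assignment and mean update until the assignment stops changing."""
    centers = list(centers)
    assignment = None
    for _ in range(max_iter):
        # Assignment step: 1-based index of the nearest center for each point.
        new_assignment = {
            name: min(range(len(centers)),
                      key=lambda k: math.dist(p, centers[k])) + 1
            for name, p in points.items()
        }
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [points[n] for n, c in assignment.items() if c == k + 1]
            if members:  # assumption: an empty cluster keeps its old center
                centers[k] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    return assignment, centers

assignment, centers = kmeans(points, [points["A1"], points["A2"]])
print(assignment)
print([tuple(round(v, 2) for v in c) for c in centers])
```

With A1 and A2 as starting centers, the loop stops on the second pass because the assignment repeats, which matches the hand calculation.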
Q2
Find the entropy of the set. We have 10 records.
Output label: 3 Yes, 7 No
E(S) = - PN * log2(PN) - PY * log2(PY)
= - 7/10 * log2(7/10) - 3/10 * log2(3/10)
= 0.88
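The entropy value can be verified numerically. A minimal sketch, where the function name entropy and its count-list argument are illustrative choices:

```python
import math

def entropy(counts):
    """Shannon entropy (base 2) of a class distribution given as raw counts."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]  # skip empty classes: 0*log(0) = 0
    return sum(-p * math.log2(p) for p in probs)

e_s = entropy([3, 7])  # 3 Yes, 7 No
print(round(e_s, 2))   # 0.88
```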
Find the information gain for each attribute. Attribute: Marital Status
Split of the 10 records (3 Yes, 7 No):
Married: 0 Yes, 4 No
Single: 2 Yes, 2 No
Divorced: 1 Yes, 1 No
E(Married) = 0
E(Single) = 1
E(Divorced) = 1
G(S, "Marital Status") = E(S) - (PM*E(Married) + PS*E(Single) + PD*E(Divorced))
= 0.88 - (4/10 * 0 + 4/10 * 1 + 2/10 * 1)
= 0.88 - 0.6 = 0.28
Find the information gain for attribute: Refund
Split of the 10 records (3 Yes, 7 No):
Refund = Yes: 0 Yes, 3 No
Refund = No: 3 Yes, 4 No
E(Refund = Yes) = 0
E(Refund = No) = - PN * log2(PN) - PY * log2(PY)
= - 4/7 * log2(4/7) - 3/7 * log2(3/7)
= 0.985
G(S, "Refund") = E(S) - (P(Refund = Yes)*E(Refund = Yes) + P(Refund = No)*E(Refund = No))
= 0.88 - (3/10 * 0 + 7/10 * 0.985)
= 0.19
Find the information gain for attribute: Taxable Income
Split of the 10 records (3 Yes, 7 No):
Taxable Income > 100k: 0 Yes, 3 No
Taxable Income <= 100k: 3 Yes, 4 No
E(>100k) = 0
E(<=100k) = - PN * log2(PN) - PY * log2(PY)
= - 4/7 * log2(4/7) - 3/7 * log2(3/7)
= 0.985
G(S, "Taxable Income") = E(S) - (P(>100k)*E(>100k) + P(<=100k)*E(<=100k))
= 0.88 - (3/10 * 0 + 7/10 * 0.985)
= 0.19
From the calculated information gains, select the attribute with the highest value:
Attribute: Refund ➔ 0.19
Attribute: Marital Status ➔ 0.28
Attribute: Taxable Income ➔ 0.19
So, select Marital Status as the root of the tree, as it has the highest information gain.
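The three gains and the choice of root attribute can be reproduced from the class counts in each branch. This is a sketch under the assumption that each branch is described by a (Yes, No) count pair taken from the calculations above; the names entropy, information_gain and splits are illustrative.

```python
import math

def entropy(counts):
    """Shannon entropy (base 2) of a class distribution given as raw counts."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return sum(-p * math.log2(p) for p in probs)

def information_gain(parent_counts, branches):
    """Parent entropy minus the size-weighted entropy of the branches."""
    total = sum(parent_counts)
    weighted = sum(sum(b) / total * entropy(b) for b in branches)
    return entropy(parent_counts) - weighted

# (Yes, No) counts per branch, as used in the hand calculations.
splits = {
    "Refund": [(0, 3), (3, 4)],                  # Refund = Yes, Refund = No
    "Marital Status": [(0, 4), (2, 2), (1, 1)],  # Married, Single, Divorced
    "Taxable Income": [(0, 3), (3, 4)],          # > 100k, <= 100k
}

gains = {attr: information_gain((3, 7), b) for attr, b in splits.items()}
best = max(gains, key=gains.get)

for attr, g in gains.items():
    print(attr, round(g, 2))
print("Root attribute:", best)
```

Working from unrounded entropies gives the same two-decimal gains as the hand calculation, so the intermediate rounding does not affect the choice of root.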
Marital Status
Married ➔ No (leaf: all 4 records are No)
Divorced ➔ remaining records:
TID Refund Tax
5 No 95k
7 Yes 220k
Single ➔ remaining records:
TID Refund Tax
1 Yes 125k
3 No 70k
8 No 85k
10 No 90k