Swe370 Data Mining
1 homework + 1 midterm = 60%
Final exam = 10%
KDD
OLAP
Data Warehouse
Data Cube
Data Mining
Descriptive / Predictive
Must be known:
Chapter 2: variance and standard deviation
Wavelet transformation
2.5.4 Numerosity Reduction
Name : Mohamed Osman Hassan
Student Number: 210 S13737
Mean = (Σ xi) / N = 31.
14, 18, 16, 16, 19, 20, 20, 21, 22, 22, 23, 25, 25, 25,
30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70, 85
Median = (25 + 30) / 2 = 27.5
Question 1b: The mode is 35, the value with the highest frequency in the data set. There is only one mode, so the modality is unimodal.
Question 1c:
Midrange is (highest value + lowest value) / 2 = (85 + 14) / 2 = 49.5
Question 1d:
The first quartile (Q1) is the median of the lower half:
14, 18, 16, 16, 19, 20, 20, 21, 22, 22, 23, 25, 25, 25
Q1 = average of the two middle values = (20 + 21) / 2 = 20.5
Q3 is the median of the upper half:
30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70, 85
Q3 = average of the two middle values = (35 + 36) / 2 = 35.5
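The answers to Question 1 can be checked with a short sketch. It assumes the 28 ages as read from the listing above (the second value is hard to read and is taken as 18 here; this does not affect the statistics computed below) and uses the median-of-halves rule for the quartiles, as in the working. The variable names are illustrative only.

```python
from statistics import median, multimode

# Ages as read from the listing above (assumption: 28 values).
ages = [14, 18, 16, 16, 19, 20, 20, 21, 22, 22, 23, 25, 25, 25,
        30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70, 85]

data = sorted(ages)
half = len(data) // 2

med = median(data)                     # average of the two middle values
modes = multimode(data)                # all values with the highest frequency
midrange = (data[0] + data[-1]) / 2    # (lowest + highest) / 2
q1 = median(data[:half])               # median of the lower half
q3 = median(data[half:])               # median of the upper half
iqr = q3 - q1                          # interquartile range

print(med, modes, midrange, q1, q3, iqr)
```

Other quartile conventions (for example interpolation-based ones) can give slightly different Q1 and Q3 values for the same data.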
Question 1f: 18
Question 2a: cosine similarity
equation: cos(x, y) = (x · y) / (||x|| ||y||) = 0.8660 = cosine similarity
correlation equation:
corr(x, y) = Σ (xi - x̄)(yi - ȳ) / sqrt(Σ (xi - x̄)² · Σ (yi - ȳ)²)
ȳ = 1.5
Σ (xi - x̄)(yi - ȳ) = (0)(0.5) + (0)(-1.5) + (0)(0.5) + (0)(0.5) = 0
Σ (yi - ȳ)² = (0.25) + (2.25) + (0.25) + (0.25)
correlation = 0/0, undefined
Euclidean distance: sqrt(Σ (xi - yi)²) = sqrt((1 - 2)² + (1 - 0)² + (1 - 2)² + (1 - 2)²) = 2
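A minimal sketch of the three measures used in Question 2a. The vectors x = (1, 1, 1, 1) and y = (2, 0, 2, 2) are an assumption: they are one reading of the terms in the working above, not values stated in the notes. The function names are illustrative, and returning `None` for an undefined correlation is a choice made here.

```python
import math

def cosine(x, y):
    # cos(x, y) = (x . y) / (||x|| ||y||)
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def correlation(x, y):
    # Pearson correlation; None when the denominator is 0 (the 0/0 case).
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return None if den == 0 else num / den

def euclidean(x, y):
    # sqrt of the sum of squared differences
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

x = (1, 1, 1, 1)   # assumed vector
y = (2, 0, 2, 2)   # assumed vector

print(round(cosine(x, y), 4), correlation(x, y), euclidean(x, y))
```

Because x is constant, every deviation from its mean is 0, which is why the correlation comes out as 0/0.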
Q2(b)
Cosine similarity = (x · y) / (||x|| ||y||) = 0.516
x̄ = 0.75
ȳ = 0.75
Σ (xi - x̄)(yi - ȳ) = (0 - 0.75)(1 - 0.75) + (1 - 0.75)(0 - 0.75) + ...
= -0.1875 - 0.1875 - 0.1875 + 0.3125
= -0.25
correlation = Σ (xi - x̄)(yi - ȳ) / sqrt(Σ (xi - x̄)² · Σ (yi - ȳ)²)
Euclidean distance: sqrt(Σ (xi - yi)²) = 2
Jaccard similarity = 0.408
Euclidean distance
F : Th to = Je = 1 752
.
Jaccard Similarity
:
God
No
Cosine similarity = 0.83
Correlation = 0.343
Jaccard similarity: 2/3 = 0.667
Q2e
No
Cosine similarity =
=
correlation 0
T = 0
Y = 2
Jaccard Similarity
:= 0
=
sqrt(4 + 4 + 25 + 4) = sqrt(37) = 6.08
= 11
Cluster analysis
Multidimensional data cubes
Multidimensional data tables
Outlier mining
Support(X => Y) = P(X ∪ Y)
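As a small illustration of the support formula: P(X ∪ Y) here means the fraction of transactions that contain every item of X and every item of Y. The transactions and item names below are hypothetical and only serve the example.

```python
# Hypothetical transactions (made up for illustration).
transactions = [
    {"milk", "bread"},
    {"milk", "diapers"},
    {"milk", "bread", "diapers"},
    {"bread"},
]

def support(x, y, transactions):
    # Fraction of transactions containing all items of X and of Y.
    both = set(x) | set(y)
    hits = sum(1 for t in transactions if both <= t)
    return hits / len(transactions)

s = support({"milk"}, {"bread"}, transactions)
print(s)
```

Two of the four transactions contain both milk and bread, so the support of milk => bread is 0.5.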
Loose coupling
Semitight coupling
Tight coupling
OLAP: Online analytical processing
Data cleaning -> Data integration -> Data transformation
Interquartile range (IQR): distance between Q1 and Q3, IQR = Q3 - Q1
Distributive measure
Algebraic measure
Weighted arithmetic mean or weighted average: x̄ = (Σ wi xi) / (Σ wi)
Trimmed mean
Holistic measure
Median = L1 + ((N/2 - (Σ freq)l) / freq_median) × width
Unimodal
Bimodal
Dispersion or variance
Quartiles, like the first quartile Q1
Boxplots for visualizing
Variance of N observations x1, x2, ..., xN:
σ² = (1/N) Σ (xi - x̄)² = (1/N) [Σ xi² - (1/N)(Σ xi)²]
where x̄ is the mean value
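The two forms of the variance can be compared numerically. This is a sketch on a hypothetical sample (not data from these notes); both functions compute the population variance, dividing by N.

```python
import math

def variance_definition(xs):
    # (1/N) * sum of squared deviations from the mean
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n

def variance_shortcut(xs):
    # (1/N) * [sum of squares - (1/N) * (sum)^2]
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / n

# Hypothetical sample values, chosen only for illustration.
sample = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

v1 = variance_definition(sample)
v2 = variance_shortcut(sample)
print(v1, v2, math.sqrt(v1))   # the last value is the standard deviation
```

The shortcut form needs only the running sums of xi and xi², which is why it can be computed in a single pass over the data.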
Quantile plot: fi = (i - 0.5) / N
Quantile-quantile plot or q-q plot
Scatter plot
Loess curve
Data smoothing: binning methods, clustering
Field overloading
Unique rule
rAB = Σ (ai - Ā)(bi - B̄) / (N σA σB)
χ² (chi-square)
Contingency table
Data transformation
Min-max normalization
z-score normalization
/DAT) -
* worde
i
C
Decision tree: ID3, C4.5, and CART
Discrete wavelet transforms
Entropy(D) = -Σ pi log2(pi)
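A minimal sketch of the entropy formula, taking class counts as input so that pi = count_i / total. The counts in the example are hypothetical, and classes with zero count are skipped, following the usual convention that 0 · log2(0) is treated as 0.

```python
import math

def entropy(counts):
    # Entropy(D) = -sum(p_i * log2(p_i)), with p_i = count_i / total.
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Hypothetical class counts: 9 tuples of one class, 5 of the other.
h = entropy([9, 5])
print(round(h, 3))
```

An evenly split two-class set gives the maximum of 1 bit, and a set with a single class gives 0.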
3-4-5 rule
↑
Waterfall
Mush
a
Spin
using ( ... - 46.44)² + (23 - 46.44)² + (27 - 46.44)² + ...
19511824
1476 1
78 +
121 54
.
( ... - 46.44)² + (49 - 46.44)² + (50 - 46.44)²
18 &
24
24706 gyz5 96 +1011
.
4477 22 .
-
Min-max: x' = ((x - min(X)) / (max(X) - min(X))) × (new_max - new_min) + new_min
z-score: x' = (x - mean) / standard deviation
z-score normalization using the mean absolute deviation (MAD): x' = (x - mean) / MAD, where MAD = (1/N) Σ |xi - mean|
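Normalization by decimal scaling: x' = x / 10^j, where j is set by the number of digits of the largest absolute value, so that max(|x'|) < 1. Dividing by 1000 corresponds to j = 3.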
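The normalization methods listed above can be sketched as follows. The sample values are hypothetical, the function names are illustrative, and the z-score uses the population standard deviation (`pstdev`), which is an assumption since the notes do not say which form is meant.

```python
from statistics import mean, pstdev

def min_max(values, new_min=0.0, new_max=1.0):
    # x' = (x - min) / (max - min) * (new_max - new_min) + new_min
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

def z_score(values):
    # x' = (x - mean) / standard deviation (population form, assumed)
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def z_score_mad(values):
    # Same idea, but dividing by the mean absolute deviation.
    m = mean(values)
    mad = sum(abs(v - m) for v in values) / len(values)
    return [(v - m) / mad for v in values]

def decimal_scaling(values):
    # Smallest j such that max(|x| / 10^j) < 1.
    j = 0
    while max(abs(v) for v in values) / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values], j

values = [200, 300, 400, 600, 1000]   # hypothetical sample

print(min_max(values))
print(z_score_mad(values))
print(decimal_scaling(values))
```

Min-max keeps the relative spacing of the values inside the new range, while the two z-score variants centre the data on 0; the MAD version is less sensitive to outliers than the standard-deviation version.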
Step 2: make a table and measure distance

Point   Distance 1   Distance 2   Distance 3   Cluster
10      8            48           198          Cluster 1
11      1            39           189          Cluster 1
13      3            37           187          Cluster 1
15      5            35           189          Cluster 1
35      25           15           165          Cluster 2
50      40           0            150          Cluster 2
55      45           5            145          Cluster 2
72      62           22           128          Cluster 2
92      82           42           108          Cluster 2

Cluster 1: 5, 10, 11, 13, 15
Cluster 2: 35, 50, 55, 72, 92
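The assignment step in the table can be sketched as below. The centroids 10, 50 and 200 are an assumption: they are not stated in the notes, but they reproduce most of the distances shown. The points are those rows of the table that read clearly, and the function names are illustrative.

```python
points = [11, 13, 15, 35, 50, 55, 72, 92]   # taken from the table rows
centroids = [10, 50, 200]                   # assumed initial centroids

def assign(points, centroids):
    # Put each point in the cluster of its nearest centroid (1-D distance).
    clusters = {c: [] for c in centroids}
    for p in points:
        nearest = min(centroids, key=lambda c: abs(p - c))
        clusters[nearest].append(p)
    return clusters

def update(clusters):
    # New centroid = mean of the members; an empty cluster keeps its centroid.
    return [sum(m) / len(m) if m else c for c, m in clusters.items()]

clusters = assign(points, centroids)
new_centroids = update(clusters)
print(clusters)
print(new_centroids)
```

In a full k-means run, `assign` and `update` would be repeated until the cluster memberships stop changing.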