
Swe370 Data Mining

Uploaded by

c.zikocherki

Data Mining

1 homework + 1 midterm = 60%
Final exam = 10%

KDD
OLAP
Data Warehouse
Data Cube
Object-Relational Databases
Temporal Database
Sequence Database
Time-Series Database

Data mining tasks: descriptive and predictive

Must know:
Chapter 2: cas function less
Variance and standard deviation

$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2$

Wavelet Transformation
2.5.4 Numerosity Reduction

Name: Mohamed Osman Hassan
Student Number: 210 S13737
Department: Software Engineering
Signature:
Question 1a answer: We can find the mean by adding all the data values and dividing by the number of values:

$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i = 31.83$

14, 16, 16, 18, 19, 00, 20, 21, 22, 22, 23, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70, 85

Median = (25 + 30) / 2 = 27.5

Question 1b: The value with the highest frequency in the data set is 35, and it is the only mode, so the data set is unimodal.

Question 1c: Midrange is (highest value + lowest value) / 2 = (85 + 14) / 2 = 49.5

Question 1d: The first quartile (Q1) is the median of the lower half of the data:
14, 16, 16, 18, 19, 00, 20, 21, 22, 22, 23, 25, 25, 25

Q1 = average of the two middle values = (20 + 21) / 2 = 20.5
Q3 is the median of the upper half of the data:
30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70, 85

Q3 = average of the two middle values = (35 + 36) / 2 = 35.5

Question 1e: It means we have to write the five-number summary: Min, Q1, median, Q3 and Max
= 14, 20.5, 27.5, 35.5, 85
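The answers to Question 1 can be checked with a short script. This is only a sketch: the data value that is unreadable in the notes (it appears as "00") is assumed to be 20, the quartiles use the median-of-halves rule applied in the answers, and the helper names `five_number_summary` and `modes` are illustrative. The mean is left out because it depends on the unreadable value.

```python
from collections import Counter
from statistics import median

# Sorted data from Question 1. ASSUMPTION: the unreadable value ("00") is 20.
ages = [14, 16, 16, 18, 19, 20, 20, 21, 22, 22, 23, 25, 25, 25,
        30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70, 85]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    # For an odd count the middle value belongs to neither half.
    upper = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    return (data[0], median(lower), median(data), median(upper), data[-1])

def modes(values):
    """Return every value that reaches the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

summary = five_number_summary(ages)
midrange = (min(ages) + max(ages)) / 2
print(summary, midrange, modes(ages))
```

A single value returned by `modes` corresponds to a unimodal data set.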

Question 1f:

: 18
E

Question 1g: Quantile plot: orders the data from smallest to largest and plots each value against its quantile.

Quantile-quantile plot: calculates the quantiles of two data sets (one theoretical, one observed) and plots them against each other.

Question 2a:

Cosine similarity: $\cos(x, y) = \frac{x \cdot y}{\|x\|\,\|y\|} = 0.866$

Correlation: $\bar{x} = 1$, $\bar{y} = \frac{2 + 0 + 2 + 2}{4} = 1.5$

$\sum_i (x_i - \bar{x})(y_i - \bar{y}) = (0)(0.5) + (0)(-1.5) + (0)(0.5) + (0)(0.5) = 0$

$\sum_i (x_i - \bar{x})^2 = 0$, $\sum_i (y_i - \bar{y})^2 = 0.25 + 2.25 + 0.25 + 0.25 = 3$

Correlation $= \frac{0}{0}$ = undefined

Euclidean distance: $\sqrt{\sum_i (x_i - y_i)^2} = \sqrt{(1-2)^2 + (1-0)^2 + (1-2)^2 + (1-2)^2} = 2$

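Q2(b)

Cosine similarity: $\cos(x, y) = \frac{x \cdot y}{\|x\|\,\|y\|} = 0.516$

Correlation: $\bar{x} = 0.75$, $\bar{y} = 0.75$

$\sum_i (x_i - \bar{x})(y_i - \bar{y}) = (0 - 0.75)(1 - 0.75) + (1 - 0.75)(0 - 0.75) + (0 - 0.75)(1 - 0.75) + (2 - 0.75)(1 - 0.75)$
$= -0.1875 - 0.1875 - 0.1875 + 0.3125 = -0.25$

$\sum_i (x_i - \bar{x})^2 = (-0.75)^2 + (0.25)^2 + (-0.75)^2 + (1.25)^2 = 2.75$

$\sum_i (y_i - \bar{y})^2 = (0.25)^2 + (-0.75)^2 + (0.25)^2 + (0.25)^2 = 0.75$

Correlation $= \frac{-0.25}{\sqrt{2.75 \times 0.75}} \approx -0.174$

Euclidean distance: $\sqrt{(0-1)^2 + (1-0)^2 + (0-1)^2 + (2-1)^2} = \sqrt{4} = 2$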
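The cosine, correlation and Euclidean results for Question 2 can be reproduced with the sketch below. The vectors are not stated legibly in the notes; they are inferred from the terms of the working (for example, the Euclidean sum for 2a and the deviations from the means of 0.75 in 2b), so treat them as assumptions. The function names are illustrative.

```python
import math

def cosine_similarity(x, y):
    """x . y / (||x|| * ||y||)"""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def correlation(x, y):
    """Pearson correlation; returns None when it is undefined (0/0)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        return None  # a constant vector has zero variance
    return cov / math.sqrt(ss_x * ss_y)

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# ASSUMED vectors, inferred from the working in the notes.
xa, ya = (1, 1, 1, 1), (2, 0, 2, 2)   # Question 2a
xb, yb = (0, 1, 0, 2), (1, 0, 1, 1)   # Question 2b

for x, y in ((xa, ya), (xb, yb)):
    print(cosine_similarity(x, y), correlation(x, y), euclidean(x, y))
```

Returning `None` for the undefined case mirrors the 0/0 result in 2a, where every entry of x is the same.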

Jaccard similarity = 0.408

Euclidean distance = 1.752

Jaccard Similarity
:
God
No
Cosine similarity
==
0

T
= = 0 ,
83 = 0
a

Correlation = 0.343

Jaccard similarity = 0.667

Q2e

Cosine similarity = 0
Correlation = 0

$\bar{x} = 0$

$\bar{y} = 2$

Jaccard similarity = 0

(a) Euclidean distance $= \sqrt{(18-20)^2 + 2^2 + (42-37)^2 + (6-4)^2} = \sqrt{4 + 4 + 25 + 4} = \sqrt{37} \approx 6.08$

Manhattan distance $= |18-20| + 2 + |42-37| + |6-4| = 11$

Minkowski distance (h = 3) $= \sqrt[3]{|18-20|^3 + 2^3 + |42-37|^3 + |6-4|^3} = \sqrt[3]{8 + 8 + 125 + 8} = \sqrt[3]{149} \approx 5.30$

Supremum distance = max(2, 2, 5, 2) = 5
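The four distances above are all members of the Minkowski family, which the following sketch makes explicit. The second attribute of each object is not legible in the notes, so the values 2 and 0 used here are hypothetical stand-ins chosen only to give the difference of 2 that appears in the working; `minkowski` and `supremum` are illustrative names.

```python
def minkowski(x, y, h):
    """Minkowski distance of order h (h=1 Manhattan, h=2 Euclidean)."""
    return sum(abs(a - b) ** h for a, b in zip(x, y)) ** (1 / h)

def supremum(x, y):
    """Limit of the Minkowski distance as h grows: the largest difference."""
    return max(abs(a - b) for a, b in zip(x, y))

# ASSUMPTION: the second coordinates (2 and 0) are placeholders.
p = (18, 2, 42, 6)
q = (20, 0, 37, 4)

manhattan = minkowski(p, q, 1)
euclid = minkowski(p, q, 2)
cubic = minkowski(p, q, 3)
sup = supremum(p, q)
print(euclid, manhattan, cubic, sup)
```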


Cluster analysis
Data cubes
Multidimensional data tables
Outlier mining

Support $(X \Rightarrow Y) = P(X \cup Y)$

Confidence $(X \Rightarrow Y) = P(Y \mid X)$
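As a small illustration of the two measures, the sketch below counts them over a toy transaction list. The transactions and the names `support` and `confidence` are hypothetical and not taken from the notes; confidence is computed as support(X ∪ Y) / support(X), which equals P(Y | X).

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(x, y, transactions):
    """P(Y | X) = support(X and Y together) / support(X)."""
    return support(set(x) | set(y), transactions) / support(x, transactions)

# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

sup_rule = support({"bread", "milk"}, transactions)
conf_rule = confidence({"bread"}, {"milk"}, transactions)
print(sup_rule, conf_rule)
```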

Parallel and distributed data mining algorithms.

Loose coupling
Semitight coupling
Tight coupling
OLAP: Online analytical processing

Chapter 2: Data preprocessing

Data cleaning -> Data integration -> Data transformation -> Data reduction

Interquartile range (IQR) = Q3 - Q1, the distance between Q1 and Q3

Distributive measure
Algebraic measure

Weighted arithmetic mean or weighted average: $\bar{x} = \frac{\sum_{i=1}^{N} w_i x_i}{\sum_{i=1}^{N} w_i}$
Trimmed mean

Holistic measure

Median $= L_1 + \left(\frac{N/2 - (\sum \mathit{freq})_l}{\mathit{freq}_{median}}\right) \times \mathit{width}$

>
-
unimodal
& Sima
al
Dispersion or variance

Quartiles, such as the first quartile Q1
Boxplots for visualizing
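Variance of N observations $x_1, x_2, \ldots, x_N$:

$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2 = \frac{1}{N}\left[\sum x_i^2 - \frac{1}{N}\left(\sum x_i\right)^2\right]$, where $\bar{x}$ is the mean value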
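The two forms of the variance give the same number, which a quick check can confirm. The sample values below are hypothetical, and the function names are illustrative only.

```python
import math

def variance_definition(values):
    """(1/N) * sum((x_i - mean)^2)"""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n

def variance_shortcut(values):
    """(1/N) * [sum(x_i^2) - (1/N) * (sum(x_i))^2]"""
    n = len(values)
    return (sum(x * x for x in values) - sum(values) ** 2 / n) / n

# Hypothetical sample, not from the notes.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

var_a = variance_definition(sample)
var_b = variance_shortcut(sample)
std_dev = math.sqrt(var_a)  # standard deviation is the square root
print(var_a, var_b, std_dev)
```

The shortcut form needs only the running sums of x and x², so it can be computed in one pass over the data.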

Quantile plot: $f_i = \frac{i - 0.5}{N}$

Quantile-quantile plot or q-q plot
Scatter plot
Loess curve

Data smoothing:
Binning methods
Clustering

Field overloading

Unique rule

Entity identification problem


Correlation analysis
Correlation coefficient (Pearson's product moment):

$r_{A,B} = \frac{\sum_{i=1}^{N}(a_i - \bar{A})(b_i - \bar{B})}{N \sigma_A \sigma_B}$

Chi-square ($\chi^2$)
Contingency table

invert
Data transformation can

constrative
Min-max normalization transform
z-score normalization
Discrete wavelet transforms (DWT)
Decision tree: ID3, C4.5 and CART

Dimensionality Reduction - principal components analysis


Pyramid algorithm
Principal components analysis
$Info_A(D) = \frac{|D_1|}{|D|} Entropy(D_1) + \frac{|D_2|}{|D|} Entropy(D_2)$

$Entropy(D_1) = -\sum_{i=1}^{m} p_i \log_2(p_i)$
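To make the entropy formula concrete, here is a minimal sketch that scores one candidate split of a data set D into two partitions. The class labels are hypothetical, and `entropy` and `info_split` are illustrative names, not functions from any library.

```python
import math
from collections import Counter

def entropy(labels):
    """-sum(p_i * log2(p_i)) over the class proportions p_i."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_split(d1, d2):
    """Size-weighted entropy of the two partitions D1 and D2."""
    n = len(d1) + len(d2)
    return len(d1) / n * entropy(d1) + len(d2) / n * entropy(d2)

# Hypothetical class labels on each side of a split point.
d1 = ["yes", "yes", "no", "no"]
d2 = ["yes", "yes", "yes", "no"]

e1 = entropy(d1)
e2 = entropy(d2)
info = info_split(d1, d2)
print(e1, e2, info)
```

A split point with a lower `info` value leaves purer partitions, which is why entropy-based discretization picks the split that minimizes it.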
3-4-5 rule

Large software can be developed by following


Waterfall
Mush
a
Spin

~ 44)2
using 4432 + (23 - 46 .

+ 127 - 42 .

19511824
1476 1
78 +

44)" + (39-46 44)


-4 6 (41-16 443
. · .

121 54
.
-us . um)2+ 149-usun) + (so-usunsh

18 &

24
24706 gyz5 96 +1011
.

4477 22 .

Min-max normalization: $X' = \frac{X - \min(X)}{\max(X) - \min(X)}\,(\mathit{new\_max} - \mathit{new\_min}) + \mathit{new\_min}$

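z-score normalization: $X' = \frac{X - \text{mean}}{\text{standard deviation}}$
z-score normalization using the mean absolute deviation (MAD) in place of the standard deviation: $\mathrm{MAD} = \frac{1}{N}\sum_{i=1}^{N} |x_i - \text{mean}|$
Normalization by decimal scaling: $X' = \frac{X}{10^j}$, where $j$ is set by the number of digits of the largest value, e.g. $X / 1000$ when $j = 3$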
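The normalization methods listed here can be compared side by side with the sketch below. The input values are hypothetical, the function names are illustrative, and the decimal-scaling helper assumes the usual rule that j is the smallest integer for which every scaled value has absolute value below 1.

```python
from statistics import mean, pstdev

def min_max(values, new_min=0.0, new_max=1.0):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

def z_score(values):
    """Standardize with the mean and the population standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def z_score_mad(values):
    """Same idea, but divide by the mean absolute deviation."""
    m = mean(values)
    mad = sum(abs(v - m) for v in values) / len(values)
    return [(v - m) / mad for v in values]

def decimal_scaling(values):
    """ASSUMED rule: smallest j such that max(|v| / 10**j) < 1."""
    largest = max(abs(v) for v in values)
    j = 0
    while largest / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values], j

# Hypothetical values, not from the notes.
values = [120, 250, 400, 630, 986]

scaled = min_max(values)
standardized = z_score(values)
robust = z_score_mad(values)
shifted, j = decimal_scaling(values)
print(scaled, standardized, robust, shifted, j)
```

The MAD variant is less sensitive to outliers than the standard deviation because the deviations are not squared.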

Equi-width binning: bin width $= \frac{\max - \min}{\text{number of bins}}$
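A minimal sketch of equal-width binning follows, assuming the bin width is (max - min) / number of bins and that the maximum value is placed in the last bin. The values are borrowed from the k-means example in these notes; `equal_width_bins` is an illustrative name.

```python
def equal_width_bins(values, k):
    """Split values into k bins of equal width (max - min) / k."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    bins = [[] for _ in range(k)]
    for v in values:
        # The maximum would land in bin k, so clamp it into the last bin.
        index = min(int((v - lo) / width), k - 1)
        bins[index].append(v)
    return bins, width

data = [5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 201, 215]
bins, width = equal_width_bins(data, 3)
print(width, bins)
```

Because the width depends only on the range, a few large values can leave most of the data in a single bin, as this sample shows.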
k-means clustering
Given data: 5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 201, 215

Step 1: initial guess

Take one value each near the beginning, the middle and the end as initial centers: 10, 50, 200

Step 2: make a table and measure the distance from each value to each center

X     Distance to 10   Distance to 50   Distance to 200   Cluster assignment
5     5                45               195               Cluster 1
10    0                40               190               Cluster 1
11    1                39               189               Cluster 1
13    3                37               187               Cluster 1
15    5                35               185               Cluster 1
35    25               15               165               Cluster 2
50    40               0                150               Cluster 2
55    45               5                145               Cluster 2
72    62               22               128               Cluster 2
92    82               42               108               Cluster 2
201   191              151              1                 Cluster 3
215   205              165              15                Cluster 3

Cluster 1: 5, 10, 11, 13, 15

Cluster 2: 35, 50, 55, 72, 92

Cluster 3: 201, 215
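The assignment step worked out by hand can be sketched in a few lines, together with the centroid update that k-means would perform next. The update step is not shown in the notes and is included only as an illustration of how the iteration would continue; `assign` and `update` are illustrative names.

```python
def assign(points, centers):
    """Put each point into the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
        clusters[nearest].append(p)
    return clusters

def update(clusters):
    """New center of each cluster = mean of its members."""
    return [sum(c) / len(c) for c in clusters]

points = [5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 201, 215]
centers = [10, 50, 200]  # initial guess from Step 1

clusters = assign(points, centers)
new_centers = update(clusters)
print(clusters, new_centers)
```

In a full run the two steps repeat until the assignments stop changing.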
