DM Unit 4
The document discusses various clustering methods in unsupervised machine learning, including partitioning, hierarchical, density-based, and grid-based methods. It also covers the types of data variables and the importance of data standardization for effective clustering. Additionally, it addresses outlier detection and the different types of outliers that can affect data analysis.
Clustering and Cluster Analysis:-
* Clustering groups similar objects together.
* The data items are clustered based on the principles of maximizing the intra-class similarity and minimizing the inter-class similarity.
Ex: In our class, placing all the girls on one side and all the boys on the other side gives a girls cluster and a boys cluster.
* It is an unsupervised machine learning technique.
Requirements of clustering:-
- Scalability
- Usability of the algorithm with multiple types of data
- Dealing with unstructured data
- Interpretability
Clustering methods:-
- Partitioning methods
- Hierarchical methods
- Density-based methods
- Grid-based methods
Data Structures in Cluster Analysis:-
Cluster analysis supports data structures of two types. They are:
1. Data Matrix
2. Dissimilarity Matrix
1. Data Matrix:-
* Here, data is represented as a table (or) an n x p matrix.
  rows -> real-world entities (names)
  columns -> properties of entities
2. Dissimilarity Matrix:-
* It is represented as an n x n matrix.
* It stores the distance d(i, j) between every pair of objects i and j:
  0
  d(2,1)  0
  d(3,1)  d(3,2)  0
  ...
  d(n,1)  d(n,2)  ...  0
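The relation between the two structures can be sketched in Python. This is a minimal illustration, assuming Euclidean distance as the dissimilarity measure; the function name `dissimilarity_matrix` and the sample data are hypothetical and not part of the notes.

```python
import math

def dissimilarity_matrix(data):
    """Build the n x n matrix of Euclidean distances between the rows of a data matrix."""
    n = len(data)
    matrix = [[0.0] * n for _ in range(n)]  # the diagonal stays 0: d(i, i) = 0
    for i in range(n):
        for j in range(i):
            d = math.dist(data[i], data[j])  # distance d(i, j)
            matrix[i][j] = d
            matrix[j][i] = d  # the matrix is symmetric: d(i, j) = d(j, i)
    return matrix

# Hypothetical data matrix: 3 objects (rows) x 2 properties (columns).
data = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
dm = dissimilarity_matrix(data)
for row in dm:
    print(row)
```

Because the matrix is symmetric with a zero diagonal, only the lower triangle needs to be written out by hand.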
Types of data:-
* There are four types of data. They are
1. Interval-scaled data
2. Binary variables
3. Categorical variables
4. Mixed variables
1. Interval-scaled data:-
* It has continuous variables.
Ex: 10-20, 20-30, ... etc.
* To convert individual data into continuous variables, standardize the data first.
* Data standardization means removal of units.
* To standardize the data we should calculate the mean absolute deviation.
* Then divide the data into intervals.
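The standardization step can be sketched as follows. This is a minimal illustration, assuming the usual textbook form in which each value is centred on the mean and divided by the mean absolute deviation; the function name `standardize` and the sample values are hypothetical.

```python
def standardize(values):
    """Return unit-free scores: (value - mean) / mean absolute deviation."""
    n = len(values)
    mean = sum(values) / n
    # Mean absolute deviation: average distance of the values from the mean.
    mad = sum(abs(v - mean) for v in values) / n
    if mad == 0:
        # All values are identical, so every standardized score is 0.
        return [0.0 for _ in values]
    return [(v - mean) / mad for v in values]

# Hypothetical measurements in some unit (for example, centimetres).
values = [10, 20, 30, 40]
z = standardize(values)
print(z)
```

The resulting scores no longer depend on the unit of measurement, which is what the notes mean by removal of units.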
2. Binary variables:-
* It has only 2 states:
  Variable -> absent
           -> present
* It has 2 subtypes:
  - Symmetric binary: the states of the variable can be interchanged (both states are equally important).
  - Asymmetric binary: the states of the variable cannot be interchanged (one state is more important than the other).
3. Categorical Variables:-
* Data that can be divided into categories.
a) Nominal variable:
   It has no particular order to the categories.
   Ex: Gender -> Male, Female (can be in any order).
b) Ordinal variable:
   It has a particular internal order to the categories.
   Ex: Temperature -> Low, Medium, High (should be in order).
4. Mixed Variables:-
* Combination of different types of variables.
Categorization of Major Clustering Methods:-
The clustering methods are categorized into four types. They are:
1. Partitioning Method
2. Hierarchical Method
3. Density-based Method
4. Grid-based Method
1. Partitioning Methods:-
* The data is divided into partitions.
* Each partition represents a cluster.
* The number of partitions should be less than or equal to the total number of data items.
* Partitions should satisfy 2 rules:
  - Each partition should have at least one object.
  - Each object should belong to only one partition.
Example: K-Means algorithm
- Data is divided into clusters based on distance and centroid values.
     Height (x)   Weight (y)
1)   185          72
2)   170          56
3)   168          60
4)   179          68
* Divide the data into two clusters.
* Each cluster should have a centroid.
  Cluster 1: K1 = (185, 72)        Cluster 2: K2 = (170, 56)
* We can take any ordered pairs from the example as the initial centroids.
* Now the remaining values should be divided based on Euclidean distance (ED).
ED = $\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$
3) (168, 60):
   to K1: $\sqrt{(168 - 185)^2 + (60 - 72)^2} \approx 20.81$
   to K2: $\sqrt{(168 - 170)^2 + (60 - 56)^2} \approx 4.47$
* It belongs to cluster 2.
Now, calculate the new centroid for (170, 56) and (168, 60):
   K2 = ((170 + 168)/2, (56 + 60)/2) = (169, 58)
4) (179, 68):
   to K1: $\sqrt{(179 - 185)^2 + (68 - 72)^2} \approx 7.21$
   to K2: $\sqrt{(179 - 169)^2 + (68 - 58)^2} \approx 14.14$
* It belongs to cluster 1.
Calculate the new centroid for (185, 72) and (179, 68):
   K1 = ((185 + 179)/2, (72 + 68)/2) = (182, 70)
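The worked example can be replayed in Python. This is a minimal sketch, assuming the four (height, weight) points are (185, 72), (170, 56), (168, 60) and (179, 68) and that the first two serve as initial centroids. It follows the incremental procedure of the notes (assign one point, then update that cluster's centroid), which is a simplification of the usual iterative K-Means; the helper name `centroid` is hypothetical.

```python
import math

def centroid(points):
    """Mean of the x values and mean of the y values of a cluster."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

# Assumed data: (height, weight) pairs from the example.
points = [(185, 72), (170, 56), (168, 60), (179, 68)]

# The first two points act as the initial centroids K1 and K2.
clusters = [[points[0]], [points[1]]]
centroids = [points[0], points[1]]

for p in points[2:]:
    # Euclidean distance from the point to each current centroid.
    distances = [math.dist(p, c) for c in centroids]
    nearest = distances.index(min(distances))
    clusters[nearest].append(p)
    # Recompute only the centroid of the cluster that just changed.
    centroids[nearest] = centroid(clusters[nearest])

print(clusters)
print(centroids)
```

A full K-Means run would repeat the assignment over all points until the centroids stop changing; with only four points the single pass already gives a stable result.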
The data is divided into two different clusters.
2. Hierarchical Methods:-
* It groups the data into a tree of clusters.
* The structure is called a dendrogram.
* A dendrogram records the sequence of merges and splits.
* The hierarchical method has two sub-methods:
1. Agglomerative Method
2. Divisive Method
1. Agglomerative Method:-
* It is a bottom-up method.
Steps involved:
1) Consider every data point as an individual cluster.
2) Calculate the similarity of each cluster with respect to all other clusters.
3) Merge the clusters with the highest similarity.
4) Recalculate the similarity for each cluster.
5) Repeat 3 and 4 until a single cluster is obtained.
* There are 3 modes:
- Single linkage - min
- Complete linkage - max
- Average linkage - avg
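The steps and the three linkage modes can be sketched as below. This is a minimal illustration, assuming Euclidean distance, where the most similar pair of clusters is the pair with the smallest linkage distance. The function names and the sample points are hypothetical.

```python
import math

def linkage_distance(a, b, mode="single"):
    """Distance between clusters a and b under the chosen linkage mode."""
    dists = [math.dist(p, q) for p in a for q in b]
    if mode == "single":      # min: distance of the closest pair
        return min(dists)
    if mode == "complete":    # max: distance of the farthest pair
        return max(dists)
    if mode == "average":     # avg: mean over all pairs
        return sum(dists) / len(dists)
    raise ValueError("mode must be 'single', 'complete' or 'average'")

def agglomerative(points, mode="single"):
    """Merge clusters bottom-up until one remains; return the merge history."""
    clusters = [[p] for p in points]  # every point starts as its own cluster
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest linkage distance.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage_distance(clusters[i], clusters[j], mode)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

# Hypothetical data points.
points = [(1, 1), (2, 1), (5, 5), (6, 5)]
merges = agglomerative(points, mode="single")
for left, right, d in merges:
    print(left, right, round(d, 2))
```

The merge history is the information a dendrogram draws: which clusters were joined and at what distance.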
2. Divisive Method:-
* It is a top-down method.
* We take all the data items into a single cluster and, over the iterations, we split the data.
* At the end, we get N clusters.
3. Density-Based Methods:-
* Data objects are clustered based on density.
* Density means mass / volume.
Example: DBSCAN
(Density-Based Spatial Clustering of Applications with Noise)
* It has 2 inputs (parameters):
  - ε (epsilon)
  - minimum points (min pts)
  ε - radius of the circle formed with a data object as its centre.
  min points - minimum number of data points inside the circle.
  Ex: min pts = 2
Types of data points
1. Core point:
   It should satisfy the condition: number of points inside its circle ≥ min points.
2. Boundary point:
   It is a neighbour of a core point.
3. Noise point:
   It is neither a core point nor a boundary point.
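The three point types can be illustrated with a short Python sketch. It only labels points and does not perform the full DBSCAN cluster expansion. It assumes Euclidean distance and that a point counts as a member of its own neighbourhood (a common convention, but an assumption here); the function name `classify_points` and the sample data are hypothetical.

```python
import math

def classify_points(points, eps, min_pts):
    """Label each point as 'core', 'boundary' or 'noise'."""
    # Neighbourhood of each point: indices of all points within radius eps
    # (the point itself is included).
    neighbours = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]
    # Core points have at least min_pts points inside their circle.
    core = {i for i, nb in enumerate(neighbours) if len(nb) >= min_pts}
    labels = []
    for i, nb in enumerate(neighbours):
        if i in core:
            labels.append("core")
        elif any(j in core for j in nb):
            labels.append("boundary")  # not core, but a neighbour of a core point
        else:
            labels.append("noise")     # neither core nor boundary
    return labels

# Hypothetical data points.
points = [(0, 0), (0, 1), (1, 0), (2, 0), (10, 10)]
labels = classify_points(points, eps=1.0, min_pts=3)
print(labels)
```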
4. Grid-Based Methods:-
* It uses a multi-resolution grid data structure.
* We divide the objects into a finite number of cells that form a grid-like structure.
- Then the density is calculated for these cells.
- Sort the cells according to density.
- Identify the cluster centers.
- Update the neighbouring cells.
* Grid-based methods have a quick processing time.
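The first steps (dividing objects into cells, computing cell density, sorting by density) can be sketched as follows. This is a minimal illustration, assuming two-dimensional points, square cells of a fixed size, and density measured as the number of points per cell; the function name `grid_density` and the sample data are hypothetical.

```python
from collections import Counter

def grid_density(points, cell_size):
    """Count how many points fall into each square grid cell."""
    counts = Counter()
    for x, y in points:
        # Integer cell coordinates of the cell that contains (x, y).
        cell = (int(x // cell_size), int(y // cell_size))
        counts[cell] += 1
    return counts

# Hypothetical data points.
points = [(0.5, 0.5), (0.7, 0.2), (0.9, 0.9), (1.5, 0.5), (3.2, 3.8)]
density = grid_density(points, cell_size=1.0)

# Cells sorted from the densest to the sparsest.
dense_first = density.most_common()
print(dense_first)
```

The work after binning depends on the number of cells rather than the number of points, which is one reason grid-based methods are fast.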
Example: STING
(Statistical Information Grid clustering algorithm)
- Spatial data is divided into rectangular cells at different levels of resolution; these cells form a tree structure.
- Each cell at a higher level is divided into smaller cells at the next lower level.
- Clustering is done based on parameters such as mean, count, min and max.
- Examination of these parameters should start from the root and go down to the bottom layer.
Outlier Analysis:-
* Among the data, there may be some items in the database which do not follow the general behaviour of the data.
* Those data items are called outliers.
* Analysis of outliers is known as outlier analysis.
Outlier Detection:-
The process of identifying outliers and subsequently removing them.
There are two methods. They are:
1. Statistical Approach
2. Proximity Approach
1. Statistical Approach:-
* It is based on the probability of data points.
* A data point with low probability is considered an outlier.
- Parametric method
- Non-parametric method
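A parametric check can be sketched with z-scores. This is a minimal illustration, assuming the data roughly follows a normal distribution, so that values far from the mean are improbable. The threshold of 2 standard deviations, the function name `zscore_outliers`, and the sample values are assumptions made for this example.

```python
import statistics

def zscore_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values are identical, so nothing deviates
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Hypothetical measurements with one suspicious value.
values = [10, 12, 11, 13, 12, 11, 10, 50]
outliers = zscore_outliers(values)
print(outliers)
```

A non-parametric method would instead estimate the distribution from the data itself, for example with a histogram, without assuming a particular distribution.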
2. Proximity Approach:-
* It is based on the location of data points.
- Density-based approach
- Distance-based approach
- Grid-based approach
- Deviation-based approach
Types of Outliers:-
There are three types of outliers. They are:
1. Global / point outlier
2. Collective outlier
3. Conditional outlier
1. Global / point outlier:-
When a single data object deviates from the rest of the data.
[Figure: data points with a global / point outlier]
2. Collective outlier:-
When a group of data objects deviates from the rest of the data.
3. Conditional (contextual) outlier:-
Data objects deviate from others because of a specific condition.