
DM Unit 4

The document discusses various clustering methods in unsupervised machine learning, including partitioning, hierarchical, density-based, and grid-based methods. It also covers the types of data variables and the importance of data standardization for effective clustering. Additionally, it addresses outlier detection and the different types of outliers that can affect data analysis.
Copyright © All Rights Reserved
Clustering and Cluster Analysis:
- Clustering groups similar objects together to form clusters.
- The data items are clustered based on the principle of maximizing the intra-class similarity and minimizing the inter-class similarity.
- Example: in our class, placing all the girls on one side and all the boys on the other side gives two clusters: the girls cluster and the boys cluster.
- It is an unsupervised machine learning algorithm.

Requirements of clustering:
- Scalability
- Algorithm usability with multiple types of data
- Dealing with unstructured data
- Interpretability

Clustering methods:
- Partitioning methods
- Hierarchical methods
- Density-based methods
- Grid-based methods

Data structures in cluster analysis:
Cluster analysis supports two types of data structures. They are:
1. Data matrix
2. Dissimilarity matrix

1. Data matrix: the data is represented as a table, i.e. an n × p matrix, where the rows are real-world entities (n objects, e.g. names) and the columns are properties of the entities (p variables).
2. Dissimilarity matrix: it is represented as an n × n matrix whose entry d(i, j) is the dissimilarity between objects i and j. Since d(i, i) = 0 and d(i, j) = d(j, i), usually only the lower triangle is stored.

Types of data variables:
There are four types. They are:
1. Interval-scaled data
2. Binary variables
3. Categorical variables
4. Mixed variables

1. Interval-scaled data:
- It has continuous variables. Ex: 10-20, 20-30, ... etc.
- To convert individual data into continuous variables, the data is first standardized.
- Data standardization means removal of units.
- To standardize the data, we calculate the mean absolute deviation.
- Then divide the data into intervals.

2. Binary variables:
- It has only 2 states: 0 means the variable is absent, 1 means the variable is present.
- It has 2 subtypes:
  - Symmetric binary: both states are equally important, so the two states can be interchanged.
  - Asymmetric binary: the states cannot be interchanged, because the two outcomes are not equally important (e.g. a positive vs. negative test result).

3. Categorical variables: data that can be divided into categories. There are two kinds:
a) Nominal variable: there is no particular order among the categories. Ex: Gender → male, female (can be in any order).
b) Ordinal variable: there is a particular internal order among the categories. Ex: Temperature → low, medium, high (must be kept in order).

4. Mixed variables: a combination of different types of variables.

Categorization of major clustering methods:
The clustering methods are categorized into four types. They are:
1. Partitioning method
2. Hierarchical method
3. Density-based method
4. Grid-based method

1. Partitioning methods:
- The data is divided into k partitions, and each partition represents a cluster.
- The number of partitions k should be less than or equal to the total number of data items n.
- The partitions should satisfy 2 rules:
  - Each partition should have at least one object.
  - Each object should belong to exactly one partition.

Example: k-means algorithm
- The data is divided into clusters based on distance and centroid values.

No. | Height (x) | Weight (y)
1 | 185 | 72
2 | 170 | 56
3 | 168 | 60
4 | 179 | 68

Divide the data into two clusters (k = 2); each cluster should have a centroid. We can take any two data points from the table as the initial centroids:
Cluster 1: K1 = (185, 72)
Cluster 2: K2 = (170, 56)

Now the remaining values are assigned based on the Euclidean distance (ED):
$ED = \sqrt{(x_o - x_c)^2 + (y_o - y_c)^2}$
where (x_o, y_o) is the data point and (x_c, y_c) is the cluster centroid.

Point 3 (168, 60):
$ED(K_1) = \sqrt{(168 - 185)^2 + (60 - 72)^2} = \sqrt{433} \approx 20.81$
$ED(K_2) = \sqrt{(168 - 170)^2 + (60 - 56)^2} = \sqrt{20} \approx 4.47$
It belongs to cluster 2. Now calculate the new centroid of (170, 56) and (168, 60):
$K_2 = \left(\frac{170 + 168}{2}, \frac{56 + 60}{2}\right) = (169, 58)$

Point 4 (179, 68):
$ED(K_1) = \sqrt{(179 - 185)^2 + (68 - 72)^2} = \sqrt{52} \approx 7.21$
$ED(K_2) = \sqrt{(179 - 169)^2 + (68 - 58)^2} = \sqrt{200} \approx 14.14$
It belongs to cluster 1. Calculate the new centroid of (185, 72) and (179, 68):
$K_1 = \left(\frac{185 + 179}{2}, \frac{72 + 68}{2}\right) = (182, 70)$

The data is divided into two different clusters:
- Cluster 1: (185, 72), (179, 68), centroid (182, 70)
- Cluster 2: (170, 56), (168, 60), centroid (169, 58)

2. Hierarchical methods:
- It groups the data into a tree of clusters.
- The tree structure is called a dendrogram.
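For the interval-scaled variables described earlier, standardization with the mean absolute deviation can be sketched as follows. The sketch uses the usual textbook formulas, which the notes name but do not write out. For a variable f with values x_1, ..., x_n and mean m_f, the mean absolute deviation is s_f = (1/n)(|x_1 − m_f| + ... + |x_n − m_f|). Each value is then replaced by the unit-free score z_i = (x_i − m_f) / s_f. The function name standardize is hypothetical, and the sample heights are borrowed from the k-means table.

```python
def standardize(values):
    """Standardize one interval-scaled variable using the mean absolute deviation."""
    n = len(values)
    mean = sum(values) / n
    # Mean absolute deviation: average absolute distance from the mean.
    mad = sum(abs(v - mean) for v in values) / n
    if mad == 0:
        raise ValueError("all values are equal; cannot standardize")
    # z-score: (value - mean) / MAD, which removes the original units.
    return [(v - mean) / mad for v in values]

heights_cm = [185, 170, 168, 179]
print([round(z, 3) for z in standardize(heights_cm)])  # [1.462, -0.846, -1.154, 0.538]
```

The mean absolute deviation is used here rather than the standard deviation because the deviations are not squared, so a single extreme value has less influence. After standardization the scores have mean 0 and mean absolute deviation 1.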
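The k-means worked example above updates a cluster's centroid as soon as a point joins that cluster. The sketch below reproduces that incremental procedure exactly, seeding the clusters with the first k points as the example does. The function names are hypothetical. Standard batch k-means (Lloyd's algorithm) works differently: it reassigns all points and recomputes all centroids repeatedly until the assignments stop changing.

```python
import math

def euclidean(p, q):
    # ED = sqrt((x_o - x_c)^2 + (y_o - y_c)^2)
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

def incremental_kmeans(points, k):
    """Assign points one by one, updating the winning centroid immediately."""
    # The first k points seed the clusters and act as the initial centroids.
    clusters = [[p] for p in points[:k]]
    centroids = list(points[:k])
    for p in points[k:]:
        distances = [euclidean(p, c) for c in centroids]
        best = distances.index(min(distances))  # nearest centroid wins
        clusters[best].append(p)
        members = clusters[best]
        # New centroid = mean of all current members of that cluster.
        centroids[best] = (
            sum(m[0] for m in members) / len(members),
            sum(m[1] for m in members) / len(members),
        )
    return clusters, centroids

data = [(185, 72), (170, 56), (168, 60), (179, 68)]  # (height, weight)
clusters, centroids = incremental_kmeans(data, k=2)
print(clusters)   # [[(185, 72), (179, 68)], [(170, 56), (168, 60)]]
print(centroids)  # [(182.0, 70.0), (169.0, 58.0)]
```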
A dendrogram records the sequence of merges (or splits) of the clusters.
- The hierarchical method has two sub-methods:
1. Agglomerative method
2. Divisive method

1. Agglomerative method:
- It is a bottom-up method.
- Steps involved:
1) Consider every data point as an individual cluster.
2) Calculate the similarity of each cluster with respect to all other clusters.
3) Merge the clusters with the highest similarity (smallest distance).
4) Recalculate the similarity for each cluster.
5) Repeat steps 3 and 4 until a single cluster is obtained.
- The similarity between two clusters can be measured in 3 modes:
  - Single linkage → minimum distance between points of the two clusters
  - Complete linkage → maximum distance
  - Average linkage → average distance

2. Divisive method:
- It is a top-down method.
- We take all the data items into a single cluster and, in successive iterations, split the data.
- At the end, we get n clusters (one per data item).

3. Density-based methods:
- Data objects are clustered based on density.
- Density means mass/volume; here, the number of data points within a given region.
Example: DBSCAN (Density-Based Spatial Clustering of Applications with Noise).
- It has 2 input parameters:
  - ε (epsilon) → radius of the circle formed with a data object as its centre.
  - MinPts → minimum number of data points inside the circle. Ex: MinPts = 2.
- Types of data points:
1. Core point: it satisfies the MinPts condition, i.e. its ε-circle contains at least MinPts points.
2. Boundary point: a neighbour of a core point that is not itself a core point.
3. Noise point: it is neither a core point nor a boundary point.

4. Grid-based methods:
- It uses a multi-resolution grid data structure.
- The object space is quantized into a finite number of cells that form a grid structure.
- Then the density is calculated for these cells, and the cells are sorted according to density.
- Identify the cluster centres, then traverse (update) the neighbouring cells.
- Grid-based methods have a fast processing time.
Example: STING (STatistical INformation Grid clustering algorithm)
- The spatial data is divided into rectangular cells at different levels of resolution; these cells form a tree structure.
- Each cell at a higher level is partitioned into smaller cells at the next lower level.
- Clustering is done based on parameters stored for each cell: mean, count, min and max.
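The agglomerative steps listed under the hierarchical methods can be sketched as follows. This version is deliberately naive: it recomputes every pairwise cluster distance on each pass (roughly cubic time) so that it mirrors the steps directly. The function and parameter names are hypothetical. For real workloads, SciPy's scipy.cluster.hierarchy.linkage with method "single", "complete" or "average" is the efficient route.

```python
import math
from itertools import combinations

def linkage_distance(a, b, points, mode):
    """Distance between clusters a and b, given as tuples of point indices."""
    ds = [math.dist(points[i], points[j]) for i in a for j in b]
    if mode == "single":    # minimum distance between the two clusters
        return min(ds)
    if mode == "complete":  # maximum distance
        return max(ds)
    if mode == "average":   # average distance
        return sum(ds) / len(ds)
    raise ValueError(f"unknown linkage mode: {mode}")

def agglomerative(points, mode="single"):
    """Bottom-up clustering; returns the merge sequence (a dendrogram as a list)."""
    # Step 1: every data point starts as an individual cluster.
    clusters = [(i,) for i in range(len(points))]
    merges = []
    # Steps 2-5: find the closest pair, merge it, repeat until one cluster is left.
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: linkage_distance(pair[0], pair[1], points, mode))
        d = linkage_distance(a, b, points, mode)
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(tuple(sorted(a + b)))
        merges.append((a, b, d))
    return merges

pts = [(0, 0), (0, 1), (4, 0), (4, 2), (10, 0)]
for a, b, d in agglomerative(pts, "single"):
    print(a, "+", b, "at distance", round(d, 2))
# (0,) + (1,) at distance 1.0
# (2,) + (3,) at distance 2.0
# (0, 1) + (2, 3) at distance 4.0
# (4,) + (0, 1, 2, 3) at distance 6.0
```

Switching mode to "complete" produces the same tree for these points, but the later merges occur at larger distances (√20 and √101). The linkage choice mainly changes the merge heights and, on less well-separated data, the shape of the clusters.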
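The DBSCAN description (ε, MinPts, core, boundary and noise points) can be turned into a short sketch. In this sketch, MinPts counts the point itself, as in the original DBSCAN neighbourhood definition. A point first marked as noise is relabelled when a core point later reaches it, which is how boundary points end up in a cluster. The function name dbscan and the sample data are illustrative assumptions, not a library API.

```python
import math

def dbscan(points, eps, min_pts):
    """Return a cluster id (0, 1, ...) for each point, or -1 for noise."""
    def region(i):
        # Points inside the circle of radius eps centred on point i (i included).
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region(i)
        if len(neighbours) < min_pts:
            labels[i] = -1         # noise for now; may later become a boundary point
            continue
        cluster_id += 1            # i is a core point, so it starts a new cluster
        labels[i] = cluster_id
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # reachable from a core point: boundary point
            if labels[j] is not None:
                continue                # already labelled; boundary points are not expanded
            labels[j] = cluster_id
            j_neighbours = region(j)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is a core point too: keep growing
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1), (10.4, 8), (8, 8), (8, 9), (9, 8), (25, 25)]
print(dbscan(points, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

In this run, (10.4, 8) has only one neighbour, so it is not a core point. It is first marked as noise and then becomes a boundary point of cluster 1 because it lies within ε of the core point (9, 8). The point (25, 25) stays as noise.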
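The general grid-based steps listed above can be illustrated in a few lines: quantize the space, measure cell density, sort by density, then grow clusters from the densest cells through neighbouring cells. This is a simplified single-resolution sketch with assumed parameters (cell_size, density_threshold, and the function name grid_cluster are hypothetical). It is not STING, which additionally keeps a multi-level tree of cells with count, mean, min and max statistics.

```python
from collections import defaultdict, deque

def grid_cluster(points, cell_size, density_threshold):
    """Group dense grid cells into clusters; returns {cell: cluster_id}."""
    # 1. Quantize the space: map each point to the integer cell that contains it.
    counts = defaultdict(int)
    for x, y in points:
        counts[(int(x // cell_size), int(y // cell_size))] += 1
    # 2. Keep only dense cells and sort them by density (densest first).
    dense = sorted((c for c, n in counts.items() if n >= density_threshold),
                   key=lambda c: -counts[c])
    dense_set = set(dense)
    labels = {}
    cluster_id = 0
    for start in dense:
        if start in labels:
            continue
        # 3. An unlabelled dense cell acts as a new cluster centre.
        labels[start] = cluster_id
        queue = deque([start])
        # 4. Traverse neighbouring dense cells (8-neighbourhood) breadth-first.
        while queue:
            cx, cy = queue.popleft()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in dense_set and nb not in labels:
                        labels[nb] = cluster_id
                        queue.append(nb)
        cluster_id += 1
    return labels

pts = [(0.5, 0.5), (0.7, 0.2), (1.2, 0.4), (1.5, 0.9),
       (5.1, 5.2), (5.3, 5.8), (5.6, 5.1), (9.9, 0.1)]
print(grid_cluster(pts, cell_size=1.0, density_threshold=2))
# {(5, 5): 0, (0, 0): 1, (1, 0): 1}
```

The work after quantization depends on the number of cells rather than the number of points, which is the source of the fast processing time mentioned in the notes. The sparse cell (9, 0) is simply left out of every cluster.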
In STING, the parameters of higher-level cells are calculated from those of the lower-level cells; processing a query then starts from the root (top layer) and goes down to the bottom layer.

Outlier Analysis:
- Among the data items in a database, there may be some items that do not follow the general behaviour of the data.
- Those data items are called outliers.
- The analysis of outliers is known as outlier analysis (outlier mining).

Outlier Detection:
- The process of identifying outliers and subsequently removing them.
- There are two methods. They are:
1. Statistical approach
2. Proximity approach

1. Statistical approach:
- It is based on the probability of the data points; a data point with a low probability is considered an outlier.
- It has two kinds: the parametric method and the non-parametric method.

2. Proximity approach:
- It is based on the location of the data points.
- Density-based approach
- Distance-based approach
- Grid-based approach
- Deviation-based approach

Types of Outliers:
There are three types of outliers. They are:
1. Global/point outliers
2. Collective outliers
3. Conditional outliers

1. Global/point outlier: when a single data object deviates from the rest of the data points, it is a global/point outlier.
2. Collective outlier: when a group of data objects deviates from the rest of the data.
3. Conditional outlier: data objects deviate from the others because of a specific condition (context).
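As an example of the parametric statistical approach, one common choice assumes the data is roughly normal and flags values with a large z-score. The sketch below uses that assumption. The function name zscore_outliers, the readings and the thresholds are illustrative choices, not values from the notes.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Flag values whose |z-score| exceeds threshold (assumes roughly normal data)."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)   # population standard deviation
    if sd == 0:
        return []                    # all values identical: nothing deviates
    return [v for v in values if abs(v - mean) / sd > threshold]

readings = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]
print(zscore_outliers(readings, threshold=2.5))  # [50]
```

With only n values, a population z-score can never exceed √(n − 1). For these ten readings that bound is 3, so the usual threshold of 3 flags nothing (the value 50 scores about 2.99), and the example lowers the threshold to 2.5. Small samples are a known weakness of this parametric test.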
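For the distance-based approach under the proximity methods, one common formulation flags a point when too few other points lie near it. The sketch below follows that formulation. The names radius, min_neighbours and distance_outliers are hypothetical and chosen for illustration.

```python
import math

def distance_outliers(points, radius, min_neighbours):
    """Flag points that have fewer than min_neighbours other points within radius."""
    outliers = []
    for i, p in enumerate(points):
        # Count the other points inside the circle of the given radius around p.
        close = sum(1 for j, q in enumerate(points)
                    if j != i and math.dist(p, q) <= radius)
        if close < min_neighbours:
            outliers.append(p)
    return outliers

data_points = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8)]
print(distance_outliers(data_points, radius=1.5, min_neighbours=2))  # [(8, 8)]
```

Here (8, 8) is a global/point outlier in the sense defined above: it is a single object far from the rest of the data. Unlike the z-score test, this check makes no assumption about the distribution of the data.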
