DM Total Notes
The document provides an overview of data warehousing and data mining systems, detailing the types of storage environments such as file systems, DBMS, and data warehouses. It discusses the processes involved in data mining, including data preprocessing, integration, and transformation, as well as the challenges faced in handling large datasets. Additionally, it outlines the functionalities of data mining tasks, differentiating between descriptive and predictive tasks, and covers various types of data attributes.
Introduction to Data Warehousing and Data Mining
A data mining system requires input data from three types of storage environments:
(1) File system
(2) DBMS (Database Management System) (OLTP)
(3) Data warehouse (OLAP system)
(1) File System:
-> In a file system, the data is organized as a set of files. Generally, in a mining system the input is given only through flat files such as .csv, .xls, .arff (Attribute-Relation File Format), etc.
-> Related connectivity interfaces:
   ODBC - Open Database Connectivity
   JDBC - Java Database Connectivity
   OLE - Object Linking and Embedding (database connectivity)
-> The main drawbacks of a file system are low security, no validation of data, and a lack of management operations on data.
(2) DBMS:
-> In a DBMS, the data is organized as a set of databases. Generally, databases are used to store large amounts of operational data.
-> The main advantages of a DBMS are high security, data validation, efficient management operations on data, etc.
(3) Data warehouse:
-> In a data warehouse, the data is organized from different sources of data; that means it can store a heterogeneous collection of data.
-> Generally, data warehouses are used for storing non-operational or historical data.
The input data requires 4 types of preprocessing activities. They are:
(1) Data cleaning - remove the noise in the data, fill in the missing values, and remove inconsistency in the data.
(2) Data transformation - the process of converting the data into a suitable form for mining applications.
(3) Data integration - the process of combining multiple data sources into a single data source.
(4) Data reduction - the large volume of data is converted into a small set of representations without losing any information.
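The four preprocessing activities can be sketched on a toy data set. This is only an illustration under stated assumptions: the records, the field names (id, amount, city) and the helper functions are hypothetical, mean-filling is just one way to handle missing values, and the summary used for reduction is a simple aggregation that keeps less detail than the "no loss of information" ideal described in the notes.

```python
from statistics import mean

# Hypothetical toy records; None marks a missing value.
sales = [
    {"id": 1, "amount": 100.0},
    {"id": 2, "amount": None},
    {"id": 3, "amount": 300.0},
    {"id": 3, "amount": 300.0},  # duplicate row (an inconsistency)
]
customers = {1: "Hyderabad", 2: "Chennai", 3: "Delhi"}  # hypothetical second source


def clean(rows):
    """Data cleaning: drop duplicate ids, fill missing amounts with the mean."""
    seen, unique = set(), []
    for row in rows:
        if row["id"] not in seen:
            seen.add(row["id"])
            unique.append(dict(row))
    fill = mean(r["amount"] for r in unique if r["amount"] is not None)
    for r in unique:
        if r["amount"] is None:
            r["amount"] = fill
    return unique


def transform(rows):
    """Data transformation: min-max scale the amounts into [0, 1]."""
    lo = min(r["amount"] for r in rows)
    hi = max(r["amount"] for r in rows)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all values are equal
    return [{**r, "scaled": (r["amount"] - lo) / span} for r in rows]


def integrate(rows, cities):
    """Data integration: combine two sources on the shared id."""
    return [{**r, "city": cities.get(r["id"])} for r in rows]


def reduce_rows(rows):
    """Data reduction: keep a small summary instead of every row."""
    amounts = [r["amount"] for r in rows]
    return {"count": len(amounts), "mean": mean(amounts), "max": max(amounts)}


cleaned = clean(sales)
prepared = integrate(transform(cleaned), customers)
summary = reduce_rows(prepared)
print(prepared)
print(summary)
```

The order used here (clean, transform, integrate, reduce) is a choice made for the sketch; real pipelines often integrate the sources first.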
UNIT-I: Introduction to Data Mining
I. What is Data Mining? Origin and motivational challenges
(Knowledge Discovery in Databases, KDD)
[Figure: the KDD process, ending in pattern evaluation and knowledge representation (visualization techniques)]
Data mining is a process to discover knowledge from large amounts of data held in databases, data warehouses, or files. Data mining is also called "Knowledge Discovery in Databases" (KDD). The above diagram shows the working process of a data mining system, which includes both preprocessing and post-processing steps. They are:
(1) Data cleaning - remove the noise in the data, fill in the missing values, and remove inconsistency in the data.
(2) Data transformation - convert the data into a suitable form for the mining process.
(3) Data integration - combine the data from multiple sources into a single source.
(4) Data reduction - the large volume of data is converted into a small set of representations.
(5) Pattern evaluation - once the knowledge is extracted, it is evaluated by identifying the interesting patterns.
(6) Knowledge representation - the knowledge is presented using different visualization techniques such as tables, cross tables, decision trees, graphs, pie charts, etc.
The following are the basic origins of a data mining system:
(1) Statistical approaches
(2) Artificial intelligence
(3) Machine learning
(4) Deep learning
(5) Pattern recognition
The following are the motivational challenges of a data mining system:
(1) Scalability - handling large amounts of data.
(2) High dimensionality - handling a large set of dimensions.
(3) Complex data - which includes multimedia data, web data, etc.
(4) Data distribution - collecting the data from different geographical locations.
(5) Non-traditional statistical analysis - which includes regression, the Bayesian approach, hypothesis test paradigms, etc.
II. Data Mining Tasks (or) Functionalities:
There are two categories of data mining tasks:
(1) Descriptive data mining tasks
(2) Predictive data mining tasks
(1) Descriptive data mining tasks -
Focus on finding human-interpretable patterns that describe the data.
(2) Predictive data mining tasks -
Focus on estimating unknown data or future data based on current data.
According to the above data mining tasks, the functionalities are:
(1) Association analysis
(2) Classification
(3) Clustering
(4) Outlier analysis
(1) Association Analysis:
Association analysis is mainly used for market basket analysis. This analysis is used to find the frequent patterns in the given data set. Accordingly, the process is to discover the association rules that show attribute values occurring frequently together.
Generally, association rules are described in the format: A => B
The expanded format of the above rule is: A1 ∧ A2 ∧ ... ∧ An => B1 ∧ B2 ∧ ... ∧ Bn
ex: age(X, ...) ∧ occupation(X, student) => buys(X, laptop)
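A rule of the form A => B is usually judged by how often it holds in the data. The sketch below is a minimal illustration, assuming the standard support and confidence measures (the notes above do not name them) and a hypothetical list of market-basket transactions.

```python
# Hypothetical market-basket transactions (one set of items per purchase).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    hits = sum(1 for basket in baskets if itemset <= basket)
    return hits / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimated P(consequent | antecedent) for the rule A => B."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


# Rule under test: bread => milk
rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
print(rule_support, rule_confidence)
```

Here the rule bread => milk appears in 2 of 4 baskets, and milk is present in 2 of the 3 baskets that contain bread.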
(2) Classification:
Classification is the process of deriving a model that describes and distinguishes the data with a set of classes, for the purpose of using the model to predict the class of objects whose class label is unknown.
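As a small illustration of predicting the class of an object whose label is unknown, the sketch below uses a 1-nearest-neighbour rule. This technique is swapped in only because it is short; the notes do not prescribe a particular classifier, and the training objects and class labels are hypothetical.

```python
import math

# Hypothetical labelled training objects: (age, monthly income in thousands) -> class.
training = [
    ((22, 15), "student"),
    ((24, 18), "student"),
    ((45, 90), "employee"),
    ((50, 85), "employee"),
]


def predict(point, labelled):
    """Return the class label of the closest training object (1-NN)."""
    nearest = min(labelled, key=lambda pair: math.dist(point, pair[0]))
    return nearest[1]


label_a = predict((23, 20), training)
label_b = predict((48, 80), training)
print(label_a, label_b)
```

The "model" here is simply the stored training set; other classifiers (decision trees, Bayesian classifiers) build an explicit model instead.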
(3) Clustering:
Clustering is the process of analyzing the data set without using class labels. It follows two principles: maximizing the intra-class similarity and minimizing the inter-class similarity between the objects.
-> The objects are grouped based on their similarity and dissimilarity.
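Grouping objects without class labels can be sketched with a tiny one-dimensional k-means. This is an assumption-laden illustration: k-means is only one clustering method, and the data points and starting centers are hypothetical.

```python
def kmeans_1d(values, centers, rounds=10):
    """Tiny 1-D k-means: assign each value to its nearest center, then re-average."""
    centers = list(centers)
    clusters = []
    for _ in range(rounds):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return clusters, centers


# Hypothetical unlabelled data; the starting centers are an arbitrary choice.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
clusters, centers = kmeans_1d(points, centers=[1.0, 12.0])
print(clusters, centers)
```

Values inside each resulting group lie close together (high intra-class similarity), while the two groups are far apart (low inter-class similarity).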
(4) Outlier Analysis:
Outlier analysis is also called outlier mining or anomaly mining. Generally, outlier analysis is used for identifying the misbehavior of data objects in a data set.
Its applications include credit card fraud detection and intrusion detection.
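One simple way to flag objects that behave differently from the rest is a z-score test. The sketch below assumes this statistical approach and a threshold of 2.5, neither of which comes from the notes; the transaction amounts are hypothetical.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=2.5):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical card transaction amounts; one is far from the usual range.
amounts = [50, 52, 49, 51, 48, 50, 53, 47, 51, 500]
flagged = zscore_outliers(amounts)
print(flagged)
```

The z-score is sensitive to the outliers themselves because they inflate the mean and standard deviation, so the threshold has to be chosen with the data size in mind.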
III. Types of data
(i) Types of attributes:

Attribute
  Qualitative
    Nominal
    Ordinal
    Binary
      Symmetric
      Asymmetric
  Quantitative
    Numeric
      Interval scale
      Ratio scale
    Discrete
    Continuous
(1) Qualitative:
(a) Nominal: Nominal attributes are also called categorical attributes, where the attribute values are defined in different categories and the values do not have a meaningful order.
ex: profession, hair colour, department, etc.
(b) Ordinal: This is a subset of the nominal attribute, where the values are represented in different categories and the values have a meaningful order.
ex: grade, drink size, age, credit rating, etc.
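The difference between nominal and ordinal attributes shows up in which comparisons make sense. In the sketch below, the rank table and the sample values are hypothetical: nominal values support only an equality test, while ordinal values can also be sorted and compared.

```python
# Hypothetical rank table for an ordinal attribute (drink size).
SIZE_RANK = {"small": 1, "medium": 2, "large": 3}


def is_larger(a, b):
    """Ordinal comparison: meaningful because the categories are ordered."""
    return SIZE_RANK[a] > SIZE_RANK[b]


ordered = sorted(["large", "small", "medium"], key=SIZE_RANK.get)

# Nominal attribute (profession): only "same or different" is meaningful.
same_profession = "teacher" == "doctor"
print(ordered, is_larger("large", "small"), same_profession)
```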
(c) Binary: where the attribute values have only two outcomes.
ex: gender, result, marital status, tossing a coin, etc.
There are two types of binary attributes. They are:
(i) Symmetric: where the outcomes of the two values have equal importance or priority.
ex: gender
(ii) Asymmetric: where the outcomes of the two values do not have equal importance or priority.
ex: result, marital status, medical diagnosis reports
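Whether a binary attribute is symmetric or asymmetric changes how two objects are compared. The sketch below assumes the commonly used simple-matching and Jaccard-style dissimilarity measures, which the notes do not mention; the patient vectors are hypothetical.

```python
def binary_dissimilarity(a, b, symmetric=True):
    """Dissimilarity of two equal-length 0/1 vectors.

    symmetric=True  -> simple matching: 0-0 and 1-1 matches both count.
    symmetric=False -> Jaccard-style: 0-0 matches are ignored.
    """
    both_one = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    both_zero = sum(1 for x, y in zip(a, b) if x == 0 and y == 0)
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    denominator = both_one + mismatches + (both_zero if symmetric else 0)
    return mismatches / denominator if denominator else 0.0


# Hypothetical medical test results for two patients (1 = positive).
patient_a = [1, 0, 1, 0, 0, 0]
patient_b = [1, 0, 0, 0, 0, 0]

sym = binary_dissimilarity(patient_a, patient_b, symmetric=True)
asym = binary_dissimilarity(patient_a, patient_b, symmetric=False)
print(sym, asym)
```

With asymmetric attributes such as medical test results, two shared negatives say little, so ignoring them makes the patients look less alike than the symmetric measure suggests.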
(2) Quantitative:
(a) Numeric:
(i) Interval scale: where the attribute values have a categorical nature, a meaningful order, and equal spacing between values, but there is no true zero point.
ex: temperature in Celsius or Fahrenheit, calendar dates, satisfaction levels, etc.
(ii) Ratio scale: This is a subset of the interval scale attribute. These attribute values also have a categorical value, a meaningful order, and equal spacing, but there is a true zero point.
ex: temperature in Kelvin, weight, etc.
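The role of the true zero point can be checked with temperatures. In this small sketch (the two readings are hypothetical), differences agree on the Celsius and Kelvin scales, but the statement "twice as hot" only makes sense on the Kelvin (ratio) scale.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius reading to Kelvin, which has a true zero point."""
    return celsius + 273.15


t1_c, t2_c = 10.0, 20.0  # hypothetical readings

# Differences are meaningful on both scales (interval property).
difference_c = t2_c - t1_c
difference_k = celsius_to_kelvin(t2_c) - celsius_to_kelvin(t1_c)

# Ratios depend on where zero sits, so they are only meaningful in Kelvin.
ratio_c = t2_c / t1_c
ratio_k = celsius_to_kelvin(t2_c) / celsius_to_kelvin(t1_c)
print(difference_c, difference_k, ratio_c, ratio_k)
```

The Celsius ratio of 2.0 is an artifact of the arbitrary zero; in Kelvin the same two temperatures differ by only a few percent.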
(b) Discrete: Discrete attribute values are defined in a finite or countable range, where the attribute values are defined in a concrete approach; that means the attribute values are defined in a specific or fixed set of values.
ex: number of children, no. of hours working, no. of languages spoken, etc.
(c) Continuous: This attribute is the opposite of a discrete attribute: the values are defined over an infinite range, but the values vary according to the application of data processing.
ex: height, weight, age, etc.
(ii) Types of data sets:
There are different types of data sets.