Data Science Notes Unit 1

Data Science
Data science combines math and statistics, specialized programming, advanced analytics, artificial intelligence (AI) and machine learning with specific subject matter expertise to uncover actionable insights hidden in an organization's data. These insights can be used to guide decision making and strategic planning.
* Data Science is a combination of multiple disciplines that uses statistics, data analysis, and machine learning to analyse data and to extract knowledge and insights from it.
* Data Science is about data gathering, analysis and decision-making.
* Data Science is about finding patterns in data, through analysis, and making future predictions.
* By using Data Science, companies are able to make:
+ Better decisions (should we choose A or B)
+ Predictive analysis (what will happen next?)
+ Pattern discoveries (find pattern, or maybe hidden information in the data)
Where is Data Science Needed?
Data Science is used in many industries in the world today, e.g. banking, consultancy,
healthcare, and manufacturing.
Examples of where Data Science is needed:
+ For route planning: To discover the best routes to ship
+ To foresee delays for flight/ship/train etc. (through predictive analysis)
+ To create promotional offers
+ To find the best suited time to deliver goods
+ To forecast the next year's revenue for a company
+ To analyze health benefit of training
+ To predict who will win elections
Data Science can be applied in nearly every part of a business where data is available.
Examples are:
+ Consumer goods
+ Stock markets
+ Industry
+ Politics
+ Logistic companies
+ E-commerce
The accelerating volume of data sources, and subsequently data, has made data science one of the fastest growing fields across every industry.
The data science lifecycle involves various roles, tools, and processes, which enable analysts to glean actionable insights. Typically, a data science project undergoes the following stages:
Data ingestion: The lifecycle begins with the data collection, both raw structured and unstructured data from all relevant sources using a variety of methods. These methods can include manual entry, web scraping, and real-time streaming data from systems and devices. Data sources can include structured data, such as customer data, along with unstructured data like log files, video, audio, pictures, the Internet of Things (IoT), social media, and more.
Data storage and data processing: Since data can have different formats and structures, companies need to consider different storage systems based on the type of data that needs to be captured. Data management teams help to set standards around data storage and structure, which facilitate workflows around analytics, machine learning and deep learning models. This stage includes cleaning data, deduplicating, transforming and combining the data using ETL (extract, transform, load) jobs or other data integration technologies. This data preparation is essential for promoting data quality before loading into a data warehouse, data lake, or other repository.
Data analysis: Here, data scientists conduct an exploratory data analysis to examine biases, patterns, ranges, and distributions of values within the data. This data analytics exploration drives hypothesis generation for A/B testing. It also allows analysts to determine the data's relevance for use within modelling efforts for predictive analytics, machine learning, and/or deep learning. Depending on a model's accuracy, organizations can become reliant on these insights for business decision making, allowing them to drive more scalability.
Communicate: Finally, insights are presented as reports and other data visualizations that make the insights and their impact on business easier for business analysts and other decision-makers to understand. A data science programming language such as R or Python includes components for generating visualizations; alternatively, data scientists can use dedicated visualization tools.
How Does a Data Scientist Work?
A Data Scientist requires expertise in several backgrounds:
Machine Learning
Statistics
Programming (Python or R)
Mathematics
Databases
A Data Scientist must find patterns within the data. Before he/she can find the patterns, he/she must organize the data in a standard format.
Here is how a Data Scientist works:
1. Ask the right questions - To understand the business problem.
2. Explore and collect data - From database, web logs, customer feedback, etc.
3. Extract the data - Transform the data to a standardized format.
4. Clean the data - Remove erroneous values from the data.
5. Find and replace missing values - Check for missing values and replace them with a suitable value (e.g. an average value).
6. Normalize data - Scale the values in a practical range (e.g. 140 cm is smaller than 1.8 m. However, the number 140 is larger than 1.8, so scaling is important).
7. Analyze data, find patterns and make future predictions.
8. Represent the result - Present the result with useful insights in a way the "company" can understand.
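Replacing missing values with an average and scaling values into a common range can be sketched in a few lines of Python. This is only an illustration: the function names, the sample heights, and the choice of min-max scaling to the range 0 to 1 are assumptions, not something the notes prescribe.

```python
from statistics import mean


def fill_missing(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    avg = mean(observed)
    return [avg if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical heights in centimetres, with one missing entry.
heights_cm = [140.0, None, 180.0, 160.0]
filled = fill_missing(heights_cm)
scaled = min_max_scale(filled)
print(filled)
print(scaled)
```

After scaling, every column lies in the same range, so a value recorded in centimetres no longer dominates one recorded in metres.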
Data Preprocessing
Data preprocessing involves cleaning and transforming raw data into a usable format for accurate and reliable analysis.
Data Analysis
Data analysis is the process of inspecting data to discover meaningful insights and trends to make informed decisions.
Data Visualization
Data visualization uses graphical representations such as charts and graphs to understand and interpret complex data.
Machine Learning
Machine learning focuses on developing algorithms that help computers to learn from data and make predictions or decisions without explicit programming.
What is Data Analytics?
Data Analytics is the process of collecting, organizing and studying data to understand what's happening and make better decisions. It helps people and businesses learn from data, like what worked in the past, what is happening now and what might happen in the future.
People often mix up data analytics and data analysis but they're not exactly the same. Data analysis is just one part of data analytics; it focuses on finding meaning in data. On the other hand, data analytics includes more than just analysis. It also involves things like coming up with ideas and predictions from data and building the tools and systems needed to handle large amounts of data.
Importance and Usage of Data Analytics
Data analytics is used in many fields like banking, farming, shopping, government and more. It helps in many ways:
[Figure: Data Analytics Importance]
* Helps in Decision Making: It gives clear facts and patterns from data which help people make smarter choices.
* Helps in Problem Solving: It points out what's going wrong and why, making it easier to fix problems.
* Helps Identify Opportunities: It shows trends and new chances for growth that might not be obvious.
* Improved Efficiency: It helps reduce waste, saves time and makes work smoother by finding better ways to do things.
Process of Data Analytics
Data analysts, data scientists and data engineers together create data pipelines which help to set up the model and do further analysis. Data Analytics can be done in the following steps mentioned below:
[Figure: Data Analytics Process]
1. Data Collection: Data collection is the first step where raw information is gathered from different places like websites, apps, surveys or machines. Sometimes data comes from many sources and needs to be joined together. Other times only a small useful part of the data is selected.
2. Data Cleansing: Once the data is collected it usually contains mistakes like wrong entries, missing values or repeated rows. In this step the data is cleaned to fix those problems and remove anything that isn't needed. Clean data makes the results more accurate and trustworthy.
3. Data Analysis and Data Interpretation: After cleaning, the data is studied using tools like Excel, Python, R or SQL. Analysts look for patterns, trends or useful information that can help solve problems or answer questions. The goal here is to understand what the data is telling us.
4. Data Visualization: Data visualization is the process of creating visual representations of data using plots, charts and graphs, which helps to analyze the patterns and trends and get valuable insights from the data. By comparing the datasets and analyzing them, data analysts find the useful data in the raw data.
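The cleansing step (removing repeated rows, missing values and wrong entries) can be sketched as follows. The record layout, the field names `customer` and `amount`, and the rule that an amount must be positive are hypothetical choices made for this example only.

```python
def clean_rows(rows):
    """Drop repeated rows and rows whose amount is missing or not positive."""
    seen = set()
    cleaned = []
    for row in rows:
        key = (row["customer"], row["amount"])
        if key in seen:
            continue  # repeated row
        seen.add(key)
        if row["amount"] is None or row["amount"] <= 0:
            continue  # missing value or wrong entry
        cleaned.append(row)
    return cleaned


# Hypothetical raw records gathered from several sources.
raw = [
    {"customer": "A", "amount": 120.0},
    {"customer": "A", "amount": 120.0},  # repeated row
    {"customer": "B", "amount": None},   # missing value
    {"customer": "C", "amount": -5.0},   # wrong entry
    {"customer": "D", "amount": 75.5},
]
clean = clean_rows(raw)
print(clean)
```

In practice the validity rule depends on the data; here it simply stands in for whatever check marks an entry as wrong.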
Types of Data Analytics
There are different types of data analysis in which raw data is converted into valuable insights. Some of the types of data analysis are mentioned below:
[Figure: Types of Data Analytics]
1. Descriptive Data Analytics: Descriptive data analytics helps to summarize and understand past data. It shows what has happened by using tables, graphs and averages, to compare results, find strengths and weaknesses and spot any unusual patterns.
2. Diagnostic Data Analytics: Diagnostic data analytics looks at why something happened in the past. It uses tools like correlation, regression or comparison to find the cause of a problem. This helps companies understand the reason behind a drop in sales or a sudden change in performance.
3. Predictive Data Analytics: Predictive data analytics is used to guess what might happen in the future. It looks at current and past data to find patterns and make forecasts. Businesses use it to predict things like customer behavior, future sales or possible risks.
4. Prescriptive Data Analytics: Prescriptive data analytics helps to choose the best action or solution. It looks at different options and suggests what should be done next. Companies use it for things like loan approval, pricing decisions and managing machines or schedules.
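The first three types can be illustrated on a tiny dataset: an average (descriptive), a correlation (diagnostic) and a straight-line forecast (predictive). The advertising and sales figures below are made up for the example, and ordinary least squares is just one possible forecasting choice.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


# Hypothetical monthly figures.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.0, 4.0, 6.0, 8.0, 10.0]

average_sales = sum(sales) / len(sales)      # descriptive: what happened
correlation = pearson(ad_spend, sales)       # diagnostic: why it happened
slope, intercept = fit_line(ad_spend, sales)
forecast = slope * 6.0 + intercept           # predictive: what may happen next
print(average_sales, correlation, forecast)
```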
Methods of Data Analytics
There are two types of methods in data analytics which are mentioned below:
1. Qualitative Data Analytics
Qualitative data analysis doesn't use statistics and derives data from words, pictures and symbols. Some common qualitative methods are:
+ Narrative Analytics is used for working with data acquired from diaries, interviews and so on.
+ Content Analytics is used for analysis of verbal data and behaviour.
+ Grounded theory is used to explain some given event by studying.
2. Quantitative Data Analysis
Quantitative data analytics is used to collect data and then process it into numerical data. Some of the quantitative methods are mentioned below:
+ Hypothesis testing assesses the given hypothesis of the data set.
+ Sample size determination is the method of taking a small sample from a large group of people and then analysing it.
+ Average or mean of a subject is found by dividing the sum total of the numbers in the list by the number of items present in that list.
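As a small worked example of the quantitative methods, the mean of a sample and a one-sample t statistic can be computed with Python's standard library. The sample values and the hypothesised mean of 47 are invented for illustration, and the t statistic is only one of many possible hypothesis tests.

```python
import math
from statistics import mean, stdev

# Hypothetical small sample drawn from a larger group.
sample = [52, 48, 50, 55, 45]
hypothesised_mean = 47  # assumed value being tested

sample_mean = mean(sample)                        # sum of values / number of values
standard_error = stdev(sample) / math.sqrt(len(sample))
t_statistic = (sample_mean - hypothesised_mean) / standard_error
print(sample_mean, round(t_statistic, 4))
```

A larger absolute t statistic means the sample mean lies further from the hypothesised mean relative to the spread of the sample.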
Machine Learning
+ Supervised learning
> Training data is labelled, so the model gets direct feedback.
> Both input (i/p) and output (o/p) are given.
> Used for classification, e.g. with a Naive Bayes model.
+ Unsupervised learning
> Only the input (i/p) is given.
> Clustering (based on similarity), e.g. k-means.
+ Reinforcement learning
> Learns from reward / penalty.
> Decisions are taken by an agent.
Artificial Neural Networks
> The brain is made up of neurons (its basic unit).
> Neurons take input (i/p) and pass it on; the brain then takes action.
> Node: a replica of a neuron.
> Example: on sensing heat, a neuron sends the signal to the brain, which takes an action.
> x1, x2, ..., xn - inputs
> w1, w2, ..., wn - weights associated with the inputs
> Weighted sum: $v = x_1 w_1 + x_2 w_2 + \dots + x_n w_n$
> Layers: input, hidden, output.
> Bias (b): a constant added to the weighted sum.
> Example: $z = 0.8 x_1 + 0.9 x_2 + b$ gives the value passed to the activation function (AF).
> Since $z > 0$, the ReLU AF gives $z$ itself; for $z \le 0$ it gives 0.
> ReLU and Leaky ReLU give different results when $z$ is negative.
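A single neuron's computation can be sketched in Python as a weighted sum plus bias, followed by an activation. The weights 0.8 and 0.9 follow the example in the notes; the input values, the bias of 0.5 and the Leaky ReLU slope of 0.01 are assumptions chosen only for illustration.

```python
def weighted_sum(inputs, weights, bias):
    """Compute x1*w1 + x2*w2 + ... + xn*wn + b."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias


def relu(z):
    """ReLU: pass positive values through, clamp the rest to 0."""
    return max(0.0, z)


def leaky_relu(z, slope=0.01):
    """Leaky ReLU: like ReLU, but keeps a small slope for negative z."""
    return z if z > 0 else slope * z


weights = [0.8, 0.9]
bias = 0.5  # assumed value

z_pos = weighted_sum([1.0, 2.0], weights, bias)    # positive pre-activation
z_neg = weighted_sum([-1.0, -2.0], weights, bias)  # negative pre-activation

print(relu(z_pos), leaky_relu(z_pos))  # both return z unchanged
print(relu(z_neg), leaky_relu(z_neg))  # these differ for negative z
```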
Activation Function
Linear function:
$$f(v) = a + v$$
where $v = \sum_i w_i x_i$ is the weighted sum of the inputs.
Sigmoid function:
$$f(v) = \frac{1}{1 + e^{-v}}$$
> It squashes any input (i/p) into an output (o/p) between 0 and 1.
Vanishing Gradient Problem
> For inputs of large magnitude the sigmoid curve flattens, so the gradient is close to 0.
> The weights then barely change, so learning slows down.
ReLU - Rectified Linear Unit:
$$f(x) = \begin{cases} x & \text{if } x > 0 \\ 0 & \text{if } x \le 0 \end{cases}$$
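The activation functions and the vanishing gradient effect can be checked numerically. In this sketch the sigmoid gradient is evaluated at 0 and at 10; the test points are arbitrary choices, and the derivative formula $f'(v) = f(v)(1 - f(v))$ is the standard one for the sigmoid.

```python
import math


def sigmoid(v):
    """Sigmoid: 1 / (1 + e^(-v)), always between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-v))


def sigmoid_grad(v):
    """Derivative of the sigmoid: f(v) * (1 - f(v))."""
    s = sigmoid(v)
    return s * (1.0 - s)


def relu(x):
    """ReLU: x if x > 0, else 0."""
    return x if x > 0 else 0.0


print(sigmoid(0.0), sigmoid_grad(0.0))  # gradient is largest at v = 0
print(sigmoid_grad(10.0))               # nearly zero: the gradient vanishes
print(relu(-3.0), relu(2.0))
```

Because the ReLU gradient stays at 1 for every positive input, it does not shrink in this way, which is one reason ReLU is often preferred in deep networks.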
Neural Network: Numerical
Backpropagation example using sigmoid activations.
We will perform the following steps:
(i) Forward pass
(ii) Compute the total error
(iii) Backpropagate the error
(iv) Calculate gradients
(v) Update weights
Given: the input layer has 2 nodes, $x_1 = 0.05$ and $x_2 = 0.10$.
> Weights (input -> hidden): $w_1 = 0.15$, $w_2 = 0.20$, $w_3 = 0.25$, $w_4 = 0.30$
> Weights (hidden -> output): $w_5 = 0.40$, $w_6 = 0.45$, $w_7 = 0.50$, $w_8 = 0.55$
> Biases: hidden bias $b_1 = 0.35$, output bias $b_2 = 0.60$
> Targets: $t_1 = 0.01$, $t_2 = 0.99$
> Activation function (sigmoid): $\sigma(z) = \frac{1}{1 + e^{-z}}$
> Error: $E = \frac{1}{2} \sum (t - o)^2$, where $t$ is the target output and $o$ is the obtained output.
Forward Pass
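(1) Compute hidden layer net inputs and activations.
$$net_{h1} = w_1 x_1 + w_2 x_2 + b_1 = 0.15 \times 0.05 + 0.20 \times 0.10 + 0.35 = 0.0075 + 0.02 + 0.35 = 0.3775$$
$$net_{h2} = w_3 x_1 + w_4 x_2 + b_1 = 0.25 \times 0.05 + 0.30 \times 0.10 + 0.35 = 0.0125 + 0.03 + 0.35 = 0.3925$$
Apply the sigmoid activation function:
$$out_{h1} = \sigma(net_{h1}) = \frac{1}{1 + e^{-0.3775}} \approx 0.5933$$
$$out_{h2} = \sigma(net_{h2}) = \frac{1}{1 + e^{-0.3925}} \approx 0.5969$$
(2) Compute output layer net inputs and activations.
$$net_{o1} = out_{h1} w_5 + out_{h2} w_6 + b_2 = 0.5933 \times 0.40 + 0.5969 \times 0.45 + 0.60 \approx 0.2373 + 0.2686 + 0.60 = 1.1059$$
$$net_{o2} = out_{h1} w_7 + out_{h2} w_8 + b_2 = 0.5933 \times 0.50 + 0.5969 \times 0.55 + 0.60 \approx 0.2966 + 0.3283 + 0.60 = 1.2249$$
Apply the sigmoid activation function to get the outputs:
$$out_{o1} = \sigma(net_{o1}) = \frac{1}{1 + e^{-1.1059}} \approx 0.7514$$
$$out_{o2} = \sigma(net_{o2}) = \frac{1}{1 + e^{-1.2249}} \approx 0.7729$$
(3) Compute the total error (sum of squared errors).
$$E_{o1} = \frac{1}{2}(t_1 - out_{o1})^2 = \frac{1}{2}(0.01 - 0.7514)^2 \approx 0.2748$$
$$E_{o2} = \frac{1}{2}(t_2 - out_{o2})^2 = \frac{1}{2}(0.99 - 0.7729)^2 \approx 0.0236$$
$$E_{total} = E_{o1} + E_{o2} \approx 0.2984$$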
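The forward pass and total error can be reproduced with a short script, which is a convenient way to check the hand calculation. The variable names are my own; the numbers are the ones given in the example. Full-precision results may differ from hand-rounded figures in the fourth decimal place.

```python
import math


def sigmoid(z):
    """Sigmoid activation: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


# Values given in the numerical example.
x1, x2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30   # input -> hidden
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55   # hidden -> output
b1, b2 = 0.35, 0.60                       # hidden bias, output bias
t1, t2 = 0.01, 0.99                       # target outputs

# Hidden layer: net input, then activation.
net_h1 = w1 * x1 + w2 * x2 + b1
net_h2 = w3 * x1 + w4 * x2 + b1
out_h1, out_h2 = sigmoid(net_h1), sigmoid(net_h2)

# Output layer: net input, then activation.
net_o1 = w5 * out_h1 + w6 * out_h2 + b2
net_o2 = w7 * out_h1 + w8 * out_h2 + b2
out_o1, out_o2 = sigmoid(net_o1), sigmoid(net_o2)

# Squared error per output and in total.
e_o1 = 0.5 * (t1 - out_o1) ** 2
e_o2 = 0.5 * (t2 - out_o2) ** 2
e_total = e_o1 + e_o2

print(round(out_h1, 4), round(out_h2, 4))
print(round(out_o1, 4), round(out_o2, 4))
print(round(e_total, 4))
```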
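Backpropagation (gradients and updates)
> Backpropagation is the algorithm used to train neural networks. It adjusts the weights in the network so that the network's predictions get closer to the desired outputs.
> Gradient: the partial derivative of the error function with respect to a weight, computed systematically for each weight $w_i$.
> Weight update rule:
$$w_i^{new} = w_i - \alpha \frac{\partial E}{\partial w_i}$$
where $\alpha$ is the learning rate.
Now, in the given numerical, the total error is $E_{total} \approx 0.2984$. By the chain rule, the gradient for $w_5$ is
$$\frac{\partial E_{total}}{\partial w_5} = \frac{\partial E_{total}}{\partial out_{o1}} \times \frac{\partial out_{o1}}{\partial net_{o1}} \times \frac{\partial net_{o1}}{\partial w_5}$$
> $\partial E_{total} / \partial w_5$: the gradient, i.e. the partial derivative of the total error w.r.t. $w_5$
> $\partial E_{total} / \partial out_{o1}$: how the total error changes w.r.t. $out_{o1}$
> $\partial out_{o1} / \partial net_{o1}$: how $out_{o1}$ changes w.r.t. $net_{o1}$
> $\partial net_{o1} / \partial w_5$: how $net_{o1}$ changes w.r.t. $w_5$
(w.r.t. means "with respect to")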
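$$\frac{\partial E_{total}}{\partial out_{o1}} = out_{o1} - target_{o1} = 0.7514 - 0.01 = 0.7414$$
$$\frac{\partial out_{o1}}{\partial net_{o1}} = out_{o1}(1 - out_{o1}) = 0.7514 \times (1 - 0.7514) \approx 0.1868$$
$$\frac{\partial net_{o1}}{\partial w_5} = out_{h1} = 0.5933$$
$$\frac{\partial E_{total}}{\partial w_5} = 0.7414 \times 0.1868 \times 0.5933 \approx 0.0822$$
Now the new value of $w_5$ is
$$w_5^{new} = w_5 - \alpha \frac{\partial E_{total}}{\partial w_5}$$
where $w_5^{new}$ is the new $w_5$, $w_5$ is the old $w_5$ and $\alpha$ is the learning rate (0.6 in this example).
$$w_5^{new} = 0.40 - 0.6 \times 0.0822 \approx 0.3507$$
Similarly we find $w_6$, $w_7$ and $w_8$.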
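The output-layer gradient for $w_5$ can be verified with the chain rule written out factor by factor. The sketch below repeats the forward pass so that it runs on its own; the variable names are illustrative and the learning rate of 0.6 is the one used in the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Given values from the numerical example.
x1, x2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6 = 0.40, 0.45
b1, b2 = 0.35, 0.60
t1 = 0.01
alpha = 0.6  # learning rate used in the example

# Forward pass up to output o1.
out_h1 = sigmoid(w1 * x1 + w2 * x2 + b1)
out_h2 = sigmoid(w3 * x1 + w4 * x2 + b1)
out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b2)

# Chain rule: dE/dw5 = dE/dout_o1 * dout_o1/dnet_o1 * dnet_o1/dw5.
d_error_d_out = out_o1 - t1
d_out_d_net = out_o1 * (1.0 - out_o1)
d_net_d_w5 = out_h1
grad_w5 = d_error_d_out * d_out_d_net * d_net_d_w5

# Gradient descent update.
w5_new = w5 - alpha * grad_w5
print(round(grad_w5, 4), round(w5_new, 4))
```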
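Adjusting weights in the hidden layer
For $w_1$ the chain rule gives
$$\frac{\partial E_{total}}{\partial w_1} = \frac{\partial E_{total}}{\partial out_{h1}} \times \frac{\partial out_{h1}}{\partial net_{h1}} \times \frac{\partial net_{h1}}{\partial w_1}$$
Since $out_{h1}$ feeds both outputs,
$$\frac{\partial E_{total}}{\partial out_{h1}} = \frac{\partial E_{o1}}{\partial out_{h1}} + \frac{\partial E_{o2}}{\partial out_{h1}}$$
For output $o_1$:
$$\frac{\partial E_{o1}}{\partial out_{o1}} = out_{o1} - target_{o1} = 0.7514 - 0.01 = 0.7414, \qquad \frac{\partial out_{o1}}{\partial net_{o1}} = out_{o1}(1 - out_{o1}) \approx 0.1868$$
$$\frac{\partial E_{o1}}{\partial out_{h1}} = \frac{\partial E_{o1}}{\partial out_{o1}} \times \frac{\partial out_{o1}}{\partial net_{o1}} \times \frac{\partial net_{o1}}{\partial out_{h1}} = 0.7414 \times 0.1868 \times w_5 = 0.7414 \times 0.1868 \times 0.40 \approx 0.0554$$
For output $o_2$:
$$\frac{\partial E_{o2}}{\partial out_{o2}} = out_{o2} - target_{o2} = 0.7729 - 0.99 = -0.2171, \qquad \frac{\partial out_{o2}}{\partial net_{o2}} = out_{o2}(1 - out_{o2}) = 0.7729 \times (1 - 0.7729) \approx 0.1755$$
$$\frac{\partial E_{o2}}{\partial out_{h1}} = \frac{\partial E_{o2}}{\partial out_{o2}} \times \frac{\partial out_{o2}}{\partial net_{o2}} \times \frac{\partial net_{o2}}{\partial out_{h1}} = -0.2171 \times 0.1755 \times w_7 = -0.2171 \times 0.1755 \times 0.50 \approx -0.01905$$
So
$$\frac{\partial E_{total}}{\partial out_{h1}} = 0.0554 + (-0.01905) = 0.03635$$
Now we need to calculate
$$\frac{\partial out_{h1}}{\partial net_{h1}} = out_{h1}(1 - out_{h1}) = 0.5933 \times (1 - 0.5933) \approx 0.2413$$
We know that $net_{h1} = w_1 x_1 + w_2 x_2 + b_1$. Partially differentiating this equation w.r.t. $w_1$:
$$\frac{\partial net_{h1}}{\partial w_1} = x_1 = 0.05$$
Now, with all components calculated:
$$\frac{\partial E_{total}}{\partial w_1} = 0.03635 \times 0.2413 \times 0.05 \approx 0.000438568$$
(using unrounded intermediate values). Now we calculate the new value of $w_1$:
$$w_1^{new} = w_1 - \alpha \frac{\partial E_{total}}{\partial w_1}$$
where $w_1^{new}$ is the new $w_1$, $w_1$ is the old $w_1$ and $\alpha$ is the learning rate.
$$w_1^{new} = 0.15 - 0.6 \times 0.000438568 = 0.149736859$$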
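The hidden-layer gradient for $w_1$ can be checked the same way. The point of this sketch is the sum over both outputs, since $out_{h1}$ influences the error through $o_1$ and $o_2$. Variable names are illustrative; all numbers come from the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Given values from the numerical example.
x1, x2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55
b1, b2 = 0.35, 0.60
t1, t2 = 0.01, 0.99
alpha = 0.6  # learning rate used in the example

# Forward pass.
out_h1 = sigmoid(w1 * x1 + w2 * x2 + b1)
out_h2 = sigmoid(w3 * x1 + w4 * x2 + b1)
out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b2)
out_o2 = sigmoid(w7 * out_h1 + w8 * out_h2 + b2)

# dE_o/dnet_o for each output: (out - target) * out * (1 - out).
delta_o1 = (out_o1 - t1) * out_o1 * (1.0 - out_o1)
delta_o2 = (out_o2 - t2) * out_o2 * (1.0 - out_o2)

# out_h1 reaches o1 through w5 and o2 through w7, so the two paths are summed.
d_error_d_out_h1 = delta_o1 * w5 + delta_o2 * w7
d_out_h1_d_net_h1 = out_h1 * (1.0 - out_h1)
d_net_h1_d_w1 = x1

grad_w1 = d_error_d_out_h1 * d_out_h1_d_net_h1 * d_net_h1_d_w1
w1_new = w1 - alpha * grad_w1
print(grad_w1, w1_new)
```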
Similarly we calculate $w_2$, $w_3$ and $w_4$.
Propagate these changed weights through the network and repeat.
> One cycle is called an epoch (one full pass through the neural network).
> Continue this process until the error is minimal or reaches an acceptable level.
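Repeating the forward pass and weight updates over many epochs can be sketched as a loop. This is a minimal illustration for the 2-2-2 network of the example, with two simplifying assumptions that I am making here: the biases are kept fixed, and the number of epochs (1000) is arbitrary.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(epochs, alpha=0.6):
    """Train the example network; return the error per epoch and final weights."""
    x = [0.05, 0.10]
    t = [0.01, 0.99]
    w_h = [[0.15, 0.20], [0.25, 0.30]]  # w_h[j][i]: input i -> hidden j
    w_o = [[0.40, 0.45], [0.50, 0.55]]  # w_o[k][j]: hidden j -> output k
    b1, b2 = 0.35, 0.60                 # biases are not updated (assumption)
    errors = []
    for _ in range(epochs):
        # Forward pass.
        out_h = [sigmoid(sum(w * xi for w, xi in zip(w_h[j], x)) + b1)
                 for j in range(2)]
        out_o = [sigmoid(sum(w * h for w, h in zip(w_o[k], out_h)) + b2)
                 for k in range(2)]
        errors.append(sum(0.5 * (tk - ok) ** 2 for tk, ok in zip(t, out_o)))
        # Backward pass: all gradients use the weights from before the update.
        delta_o = [(out_o[k] - t[k]) * out_o[k] * (1 - out_o[k])
                   for k in range(2)]
        delta_h = [sum(delta_o[k] * w_o[k][j] for k in range(2))
                   * out_h[j] * (1 - out_h[j]) for j in range(2)]
        # Weight updates.
        for k in range(2):
            for j in range(2):
                w_o[k][j] -= alpha * delta_o[k] * out_h[j]
        for j in range(2):
            for i in range(2):
                w_h[j][i] -= alpha * delta_h[j] * x[i]
    return errors, w_h, w_o


errors, w_h, w_o = train(1000)
print(errors[0], errors[-1])
```

The first entry of `errors` is the total error from the hand calculation, and later entries show how it shrinks as the epochs continue.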
CNN
> Before input to an ANN, the image goes through: input layer (image) -> convolution -> activation (ReLU) -> pooling.
> Convolution: the important element is the filter (feature detector).
> Example: a loop is a feature common to several handwritten digits, and a vertical line is common to others. These are features that we need to identify, for example with a loop filter.
> We place the filter on the image; wherever it overlaps, the values are multiplied and added up.
> Then we slide the filter and place it again, until the whole image is covered. The results form the feature map.
CNNs are widely used for tasks like image classification, object detection, and segmentation.
> Convolutional networks use a process called convolution, which combines two functions to show how one changes the shape of the other.
> The role of the convolutional network is to reduce the images into a form that is easier to process, without losing features that are critical for getting a good prediction.
How does CNN work?
> First we understand what an image is and how it is represented.
An RGB image is nothing but a matrix of pixel values having three planes, whereas a grayscale image is the same, but it has a single plane.
Example image:
For simplicity let us consider grayscale images.
[Figure: input data, kernel, and convolved feature]
> The above image shows what a convolution is. We take a filter/kernel (3x3 matrix) and apply it to the input image to get the convolved feature. This convolved feature is passed on to the next layer.
> The number of parameters in a CNN layer depends on the size of the receptive fields (filter kernels) and the number of filters.
> Each neuron in a CNN layer receives inputs from a local region of the previous layer, known as its receptive field.
> The receptive fields move over the input, calculating dot products and creating a convolved feature map as the output.
> This map then goes through a rectified linear unit (ReLU) activation function.
> Classic CNN architectures like LeNet and more modern ones like ResNet employ this fundamental principle.
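Sliding a 3x3 kernel over a grayscale image and taking the dot product at each position can be sketched in plain Python. The 5x5 image and the kernel values below are made-up illustration data, not taken from the figure. As in most CNN libraries, the kernel is not flipped, so strictly speaking this is a cross-correlation. The parameter count assumes one bias per filter.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_rows = len(image) - kh + 1
    out_cols = len(image[0]) - kw + 1
    feature_map = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            # Dot product of the kernel with the receptive field at (r, c).
            row.append(sum(image[r + i][c + j] * kernel[i][j]
                           for i in range(kh) for j in range(kw)))
        feature_map.append(row)
    return feature_map


def relu_map(feature_map):
    """Apply ReLU to every entry of the feature map."""
    return [[max(0, v) for v in row] for row in feature_map]


def conv_param_count(kernel_h, kernel_w, in_channels, num_filters):
    """Weights per filter plus one bias per filter (assumed), over all filters."""
    return (kernel_h * kernel_w * in_channels + 1) * num_filters


image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

convolved = relu_map(convolve2d(image, kernel))
print(convolved)
print(conv_param_count(3, 3, 1, 8))
```

A 5x5 input with a 3x3 kernel yields a 3x3 convolved feature, because the kernel fits in three positions along each axis.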
Convolutional neural networks are composed of multiple layers of artificial neurons.
When we input an image into a ConvNet, each layer generates several activation functions that are passed on to the next layer for feature extraction.
Feature extraction in CNN:
> The first layer usually extracts basic features such as horizontal or diagonal edges. This output is passed on to the next layer, which detects more complex features such as corners or combinational edges. As we move deeper into the network, it can identify even more complex features such as objects, faces, etc.
> ConvNets are feed-forward networks that process the input data in a single pass.
> Based on the activation map of the final convolution layer, the classification layer outputs a set of confidence scores (values between 0 and 1) that specify how likely the image is to belong to a class.
> Gradient descent is commonly used as the optimization algorithm during training to adjust the weights of the input layer and subsequent layers.
> CNN was first developed and used around the 1980s.
What is a pooling layer?
> Pooling is responsible for reducing the spatial size of the convolved feature. This is to decrease the computational power required to process the data by reducing the dimensions.
There are two types of pooling:
+ i) average pooling
+ ii) max pooling
[Figure: max pooling and average pooling applied to a feature map]
Max pooling: returns the maximum value of a pixel from the portion of the image covered by the kernel.
> Max pooling also performs as a noise suppressant.
> It discards the noisy activations altogether and also performs de-noising along with dimensionality reduction.
Average pooling: returns the average of all the values from the portion of the image covered by the kernel.
> Max pooling performs a lot better than average pooling.
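Both kinds of pooling can be sketched with one small function that walks over non-overlapping windows. The 4x4 feature map, the 2x2 window and the function name are assumptions made for this illustration.

```python
def pool2d(feature_map, size, mode="max"):
    """Pool non-overlapping size x size windows using max or average."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            if mode == "max":
                row.append(max(window))
            else:
                row.append(sum(window) / len(window))
        pooled.append(row)
    return pooled


# Hypothetical 4x4 convolved feature.
feature_map = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 0],
    [1, 8, 3, 4],
]

max_pooled = pool2d(feature_map, 2, mode="max")
avg_pooled = pool2d(feature_map, 2, mode="average")
print(max_pooled)
print(avg_pooled)
```

Either way the 4x4 map shrinks to 2x2, which is the dimensionality reduction described above.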
What happens after pooling in CNN?
> Many CNNs stack multiple convolutional and pooling layers to extract increasingly abstract features.
Example: the first layer detects edges -> deeper layers detect shapes, objects, etc.
Flattening layer
> Converts the 2D feature maps (after pooling) into a 1D vector.
> This prepares the data for input into a fully connected layer.
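Flattening is simple enough to show directly: every value of every pooled 2D feature map is written out, row by row, into one long list. The two 2x2 maps below are invented sample data.

```python
def flatten(feature_maps):
    """Turn a list of 2D feature maps into a single 1D vector."""
    return [value for fmap in feature_maps for row in fmap for value in row]


# Two hypothetical pooled feature maps, one per filter.
pooled_maps = [
    [[6, 4], [8, 9]],
    [[1, 0], [2, 3]],
]

vector = flatten(pooled_maps)
print(vector)
```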
Fully connected layer
> A dense neural network in which every neuron is connected to the neurons of the next layer.
> We use the fully connected network to classify the image into categories, based on the features extracted from the image by convolution and pooling.
> The number of neurons in the output layer will be the same as the number of categories we have.
> If we are doing binary classification, then we use the sigmoid activation function and we have 1 neuron in the output layer.
> If we have several categories, we use the softmax activation function.
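The two output-layer choices can be compared in a short sketch: one sigmoid neuron for binary classification, and a softmax over one neuron per category otherwise. The raw scores and the choice of four categories are assumptions for illustration; subtracting the maximum score before exponentiating is a common numerical-stability step, not something the notes require.

```python
import math


def sigmoid(z):
    """Single output neuron: probability of the positive class."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    """One output neuron per category: probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # shift by max for stability
    total = sum(exps)
    return [e / total for e in exps]


# Binary classification: 1 output neuron.
binary_probability = sigmoid(0.0)

# Multi-class classification: 4 hypothetical categories, 4 output neurons.
class_probabilities = softmax([2.0, 1.0, 0.1, -1.0])
predicted_class = class_probabilities.index(max(class_probabilities))

print(binary_probability)
print(class_probabilities, predicted_class)
```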