ADS Assignment 1

Applied data science: some questions and answers

Uploaded by

yifiw66477
Q. Explain the data science life cycle.

Data science is the domain of study that deals with vast volumes of data using modern tools and techniques to find unseen patterns, derive meaningful information and make business decisions.

The data science life cycle revolves around the use of machine learning (ML) and different analytical strategies to produce insights and predictions from information in order to achieve a commercial enterprise objective. It is an iterative set of steps required to deliver a data science project.

Life cycle of data science:

1) Identifying problems and understanding the business
Identifying the problem is one of the major steps in the data science process. It gives the work a clear objective, since it decides the final goal of the analysis. This phase should examine and assess past case studies of similar analyses, and assess in-house resources, infrastructure, total time and technology needs.
This phase should:
- Clearly state the problem that requires a solution and why it should be resolved at once
- Define the potential value of the business project
- Find risks, including ethical aspects involved in the project
- Build and communicate a highly integrated, flexible project plan

2) Data collection
In this step raw data is captured from relevant sources. The captured data can be in either structured or unstructured form.
The data might come from logs, websites, social media, online repositories, and even data streamed from online sources via APIs, web scraping, or data present in Excel.
One must know the difference between the various data sets available and the data investment strategy of the organisation, and keep track of where each piece of data comes from and whether it is up to date or not.

3) Data preparation
In this step the data is converted into a unified format for smooth data processing.
Data is processed with the ETL process (Extract, Transform and Load) before data science operations are carried out.
The actions to be performed at this stage are:
- Selection of applicable data
- Data integration by means of merging data sets
- Data cleaning and filtration of relevant information
- Treating missing values by either eliminating them or imputing them
- Treating inaccurate data by eliminating it
- Testing for outliers with the use of box plots, and coping with them
This is the most time-consuming but most essential step, as your model will only be as accurate as your data.

4) Data exploration
Data analysis is carried out with various statistical tools. With the support of a data engineer, the following steps are carried out for exploratory data analysis:
- Examine the data by formulating the various statistical functions
- Identify independent and dependent variables
- Analyze key features of the data to work on
- Define the spread of the data
Data scientists explore the distribution of data inside distinct variables graphically, using bar graphs. Relations between distinct features are captured via graphical representations such as scatter plots and heat maps.

5) Data modeling
A model should use the prepared and analysed data to provide the desired output. The environment needed for executing the data model is decided and created before meeting the specific requirements.
The team works together to develop datasets for training and testing the model for production purposes.
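The idea of developing separate datasets for training and testing can be sketched as below. This is only a minimal illustration in plain Python: the function name `train_test_split`, the `test_fraction` and `seed` parameters, and the sample records are assumptions made for this example and do not come from the notes.

```python
import random


def train_test_split(records, test_fraction=0.2, seed=0):
    """Shuffle a copy of `records` and split it into (train, test) lists.

    Hypothetical helper for illustration; real projects usually rely on a
    library routine, but the idea is the same.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(records)               # copy, so the input is not modified
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for repeatability
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    rows = list(range(10))                 # stand-in for prepared data rows
    train, test = train_test_split(rows, test_fraction=0.3, seed=42)
    print("train:", train)
    print("test:", test)
```

Keeping the test rows out of training is what lets the later evaluation say something about unseen data.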
It also involves tasks like choosing the appropriate model type and determining whether the problem is a classification, regression or clustering problem. After analysing this, choose the algorithm to implement.

6) Model deployment
The model is finally prepared to be deployed in the desired format and preferred channel. Machine learning models have to be recorded before the deployment process. In general, these models are integrated and coupled with products and applications. This stage involves the creation of a delivery mechanism required to get the model out in the market among the users or to another system. ML models are also deployed on devices and are gaining adoption and popularity in the field of computing.

Q. Role of statistics in data science.

Statistics is a field of mathematics that helps us work with data, whether it is numbers or descriptions. It is all about collecting, checking and making sense of data so that we can make better choices.
Statistics is a crucial topic for data science. Using statistics in data science helps us uncover insights, make better decisions and even predict future trends.

Role of statistics in data science:
1) Data cleaning: Statistics aids in identifying and dealing with errors or outliers in datasets, ensuring data quality.
2) Descriptive analysis: It allows us to summarize and understand data through measures like mean, median and standard deviation.
3) Inferential analysis: Statistics helps make predictions and draw conclusions about larger populations based on sample data.
4) A/B testing and experimentation: Statistics comes into play in A/B testing, a powerful technique in data science.
5) Hypothesis testing: It is vital for testing ideas and hypotheses and determining whether observed patterns are statistically significant.
6) Machine learning: Statistical methods underpin many ML algorithms, guiding model development and evaluation.
7) Data visualization: Statistics enhances data visualization, making it easier to convey insights to non-technical stakeholders.

Benefits of statistics in data science:
1) Statistics helps data scientists make sense of the heaps of data they work with. It helps us to catch trends and connections.
2) With statistics, we can make informed decisions. Imagine you are choosing between two ice creams. Statistics can tell you which one people like more based on surveys, so you pick the right one.
3) Statistics helps us predict what might happen in the future. It is like looking at past weather data to guess whether it will rain tomorrow or not.
4) Data scientists use statistics to cut errors and mistakes in data analysis.

Q. Explain the significance of data science considering the volume and dimensions of data.

Data science is a multidisciplinary field.

1) Volume of data
- Big data handling: In today's digital world, large amounts of data are generated daily, commonly referred to as "big data". This includes data from social media, sensors, online transactions and more. Data science provides tools and techniques to efficiently process, analyze and extract valuable insights from these massive datasets.
- Scalability: Traditional data processing methods may struggle to handle large volumes of data. Data science leverages distributed computing and parallel processing to scale up the processing power.
This enables the analysis of enormous datasets in a reasonable amount of time.
- Pattern recognition: With a large volume of data, data scientists can identify patterns, trends and correlations that may not be apparent in smaller datasets. This helps in making more accurate predictions and informed decisions.

2) Dimensions of data
- High-dimensional data: Data often has many attributes or features. For example, in image data, each pixel can be a dimension. Data science methods help manage and make sense of such complex datasets.
- Feature engineering: Data scientists work on feature selection and engineering to identify the most relevant dimensions for analysis. This involves choosing the right variables or attributes that contribute the most to the model's predictive power.
- Dimensionality reduction: Because of the curse of dimensionality, data analysis employs techniques like Principal Component Analysis (PCA) or t-Distributed Stochastic Neighbor Embedding (t-SNE) to reduce dimensionality while preserving the essential information.

Thus the significance of data science lies in its ability to tackle the challenges posed by the volume and dimensions of data. It empowers organizations and individuals to extract meaningful insights, make data-driven decisions and unlock the potential within vast and complex datasets.

Q. Explain hypothesis testing in detail.

A hypothesis test is a statistical method that is used to make a statistical decision using experimental data. A hypothesis is basically an assumption that we make about a population. The test evaluates two mutually exclusive statements about a population to determine which statement is better supported by the sample data.

Working of hypothesis testing:

1) Define the null and alternate hypothesis
Null hypothesis ($H_0$): a general statement that there is no relationship between the two measured cases.
Alternate hypothesis ($H_1$): the statement that contradicts the null hypothesis.

2) Choose a significance level
Select a significance level $\alpha$.
The significance level is typically 0.05. It is the threshold for rejecting the null hypothesis.
The p-value is the probability of getting the observed results when $H_0$ for the given problem is true. If p-value $< \alpha$, reject the null hypothesis. Thus the p-value is the criterion used to judge significance.

3) Collect and analyze data
Gather relevant data through observation or experimentation. Analyze the data using appropriate statistical methods to obtain a test statistic.

4) Calculate the test statistic
The data for the test is evaluated in this step. The choice of test statistic depends on the type of test being conducted.

a) Z-test: used when the population mean and standard deviation are known and the sample size is large.
$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
where $\bar{x}$ = sample mean, $\mu$ = population mean, $\sigma$ = population SD, $n$ = sample size.

b) t-test: used to compare means when the population SD is unknown and the sample size is small ($n < 30$).
$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$
where $s$ = SD of the sample, $n$ = sample size.

c) Chi-square test: used for categorical data or for testing independence in contingency tables.
$\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$
where $O_{ij}$ = observed frequency in cell $(i, j)$, $E_{ij}$ = expected frequency in cell $(i, j)$, and $i, j$ index the rows and columns. The expected frequency is calculated as:
$E_{ij} = \frac{\text{row total} \times \text{column total}}{\text{total observations}}$

d) ANOVA test: Analysis of Variance is a statistical technique used to check whether the means of two or more groups are significantly different from each other.
$F = \frac{MSB}{MSE}$
where $F$ = ANOVA coefficient, $MSB$ = mean sum of squares between groups, $MSE$ = mean sum of squares of errors.

5) Compare the test statistic
Here we have to decide whether to reject the null hypothesis. There are two ways to decide this:
a) Using critical values: compare the test statistic with the tabulated critical value.
   Test statistic > critical value -> reject the null hypothesis
   Test statistic < critical value -> do not reject the null hypothesis
b) Using p-values:
   p-value < $\alpha$ -> reject the null hypothesis
   p-value >= $\alpha$ -> do not reject the null hypothesis

6) Interpret the results
At last, we can conclude the experiment using method (a) or (b).
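The test statistics and the critical-value rule from steps 4 and 5 can be sketched in a few lines of Python. This is a rough illustration rather than a full testing routine: the function names are made up for this example, and the numbers in the demo (sample mean 202.04, n = 25, and the 2x2 table) are hypothetical values chosen only to exercise the formulas.

```python
import math


def one_sample_statistic(sample_mean, reference_mean, sd, n):
    """(x_bar - mu) / (sd / sqrt(n)).

    With the population SD this is the z statistic; with the sample SD it
    is the one-sample t statistic.
    """
    return (sample_mean - reference_mean) / (sd / math.sqrt(n))


def chi_square_statistic(observed):
    """Chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected frequency = row total * column total / total observations
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (obs - expected) ** 2 / expected
    return chi2


def reject_null(test_statistic, critical_value):
    """Critical-value rule: reject H0 when |statistic| exceeds the critical value."""
    return abs(test_statistic) > critical_value


if __name__ == "__main__":
    # Hypothetical two-tailed z-test at alpha = 0.05 (critical value 1.96).
    z = one_sample_statistic(202.04, 200, 5, 25)
    print("z =", round(z, 2), "reject H0:", reject_null(z, 1.96))

    # Hypothetical 2x2 contingency table.
    print("chi-square =", round(chi_square_statistic([[10, 20], [20, 10]]), 3))
```

The critical value itself still has to be read from the appropriate table (z, t or chi-square) for the chosen significance level and degrees of freedom.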
Example: Suppose a sample of patients is taken and their cholesterol levels are measured (in mg/dL):
205, 196, 210, 190, 218, 205, 200, 192, 198, 202, 208, 200, 205, 198, 208, [illegible], [illegible], 192, [illegible], 196, 205, 210, 192, [illegible]
Population mean = 200
Population SD ($\sigma$) = 5 mg/dL

Step 1: Define the hypothesis
Null hypothesis ($H_0$): $\mu = 200$
Alternate hypothesis ($H_1$): $\mu \neq 200$

Step 2: Define the significance level
As it is a two-tailed test based on the z-distribution table, the critical value for a significance level of 0.05 can be read from the z-table as 1.96.

Step 3: Compute the test statistic
We use the z-test, as the population SD and mean are known:
$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
which gives $|z| = 2.04$ for this sample.

Step 4: Result
As the absolute value of the test statistic, 2.04, is greater than the critical value of 1.96, the null hypothesis is rejected. Thus there is statistically significant evidence that the average cholesterol level in the population is different from 200 mg/dL.

Sol: In this case, we want to test whether the average thickness of the washers produced by the machine is significantly different from the expected thickness of 0.025 cm.

Step 1: Define the hypothesis
Null hypothesis ($H_0$): $\mu = 0.025$
Alternate hypothesis ($H_1$): $\mu \neq 0.025$

Step 2: Define the significance level
It is a two-tailed test. The critical value at a significance level of 5% (0.05) is read from the distribution table.

Step 3: Compute the test statistic
Here we use the t-test:
$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$
with $\bar{x} = 0.024$ cm, $s = 0.002$ cm, $\mu = 0.025$ cm and $\sqrt{n}$ taken as 3:
$t = \frac{(0.024 - 0.025) \times 3}{0.002} = \frac{-0.001 \times 3}{0.002} = -1.5$
This statistic follows a t-distribution with $\nu = n - 1$ degrees of freedom.

Step 4: Result
$|t| = 1.5$ is below the two-tailed 5% critical value (which is at least 1.96 for any sample size), so the null hypothesis is not rejected: the deviation in thickness is not statistically significant.

Q. For the following data, calculate the correlation coefficient using Spearman's rank correlation coefficient. [The data table is not recoverable.]

Q. Suppose a restaurant receives an average of 100 customers per day.
Use the Poisson distribution to find the probability that the restaurant receives more than a certain number of customers. Plot the Poisson distribution for at least any 5 discrete numbers of customers.

Sol: Average rate of customers per day: $\lambda = 100$.
The Poisson probability of exactly $k$ customers is:
$P(X = k) = \frac{e^{-\lambda} \lambda^{k}}{k!}$

1) The restaurant receives more than 110 customers per day:
$P(X > 110) = 1 - P(X \le 110) = 1 - 0.852 = 0.148$
The probability that the restaurant receives more than 110 customers per day is 0.148.

2) Let us consider 5 discrete numbers of customers: 111, 112, 113, 114, 115.
$P(X = 111) = \frac{e^{-100} \, 100^{111}}{111!} = 0.0211$
$P(X = 112) = \frac{e^{-100} \, 100^{112}}{112!} = 0.0188$
$P(X = 113) = \frac{e^{-100} \, 100^{113}}{113!} = 0.0167$
$P(X = 114) = \frac{e^{-100} \, 100^{114}}{114!} = 0.0146$
$P(X = 115) = \frac{e^{-100} \, 100^{115}}{115!} = 0.0127$

Q.8 Explain in brief: 1) Measure of central tendency 2) Measure of spread 3) Skewness in data, with examples.

1] Measure of central tendency
Central tendencies in statistics are the numerical values that are used to represent the mid-value or central value of a large collection of numerical data.
A measure of central tendency is the representative value of a dataset: the central value or the most frequently occurring value, which gives a general idea of the whole dataset.
Some of the most commonly used measures of central tendency are the mean, median and mode.

1> Mean
The mean is the average of the given set of values. It denotes the equal distribution of values for a given dataset.
There are three types of mean: arithmetic mean, geometric mean and harmonic mean.

a) Arithmetic mean
When we say "calculate the mean", we usually calculate the arithmetic mean.
Mean for ungrouped data:
$\bar{x} = \frac{\sum x}{n} = \frac{\text{sum of all observations}}{\text{total number of observations}}$
Example: 10 students have scored the following percentages (partly illegible in the source): 88, 83, 87, ..., 84, 82, 83, 85. Find the mean.
Mean = (sum of the 10 scores) / 10 = 852 / 10 = 85.2
Mean $\bar{x} = 85.2$
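The three types of mean named above can be compared on a small sample with Python's standard `statistics` module. The sample values 4, 8 and 9 and the helper name `summarize_means` are chosen only for illustration.

```python
import statistics


def summarize_means(values):
    """Return the arithmetic, geometric and harmonic means of positive values."""
    return {
        "arithmetic": statistics.mean(values),            # sum(x) / n
        "geometric": statistics.geometric_mean(values),   # n-th root of the product
        "harmonic": statistics.harmonic_mean(values),     # n / sum(1 / x)
    }


if __name__ == "__main__":
    for name, value in summarize_means([4, 8, 9]).items():
        print(f"{name} mean: {value:.4f}")
```

For positive data the arithmetic mean is never smaller than the geometric mean, which in turn is never smaller than the harmonic mean, so the three printed values should appear in decreasing order.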
Mean for grouped data:
$\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$
where $f_i$ = frequency of the $i$-th data point and $x_i$ = the $i$-th data point.
Example:
$x_i$: 5, 10, 15
$f_i$: 5, 10, 5
$\bar{x} = \frac{(5 \times 5) + (10 \times 10) + (15 \times 5)}{5 + 10 + 5} = \frac{200}{20} = 10$

b) Geometric mean
GM of 4, 8, 9: $\sqrt[3]{4 \times 8 \times 9} = \sqrt[3]{288} = 6.603$

c) Harmonic mean
$HM = \frac{n}{\sum \frac{1}{x_i}}$

2> Median
The median, in statistics, is the middle value of the given data when it is arranged in ascending or descending order. The median is the number that separates the higher half of a data sample from the lower half.
Median of ungrouped data:
For an odd number of observations: Median = $\left(\frac{n+1}{2}\right)$-th term
For an even number of observations: Median = $\frac{\left(\frac{n}{2}\right)\text{-th term} + \left(\frac{n}{2} + 1\right)\text{-th term}}{2}$
Median of grouped data:
Median = $l + \left(\frac{\frac{N}{2} - cf}{f}\right) \times h$
where $l$ = lower limit of the median class, $h$ = magnitude (width) of the class, $f$ = frequency of the median class, $N$ = total frequency, $cf$ = cumulative frequency of the class preceding the median class.

3> Mode
The mode is the value that has the maximum frequency corresponding to it.
Mode of ungrouped data: the observation with the highest frequency.
Example: 2, 3, 3, 4, ... Here 3 occurs most often (2 times), so the mode of the data is 3.
Mode of grouped data:
Mode = $l + \left(\frac{f_1 - f_0}{2 f_1 - f_0 - f_2}\right) \times h$
where $l$ = lower limit of the modal class, $h$ = class width, $f_1$ = frequency of the modal class, $f_0$ = frequency of the class which precedes the modal class, $f_2$ = frequency of the class which succeeds the modal class.
Example:
Class:     10-20 | 20-30 | 30-40 | 40-50 | 50-60
Frequency:   5   |   8   |  12   |  16   |  10
As class 40-50 has the highest frequency, it is the modal class.
$l = 40$, $h = 10$, $f_1 = 16$, $f_0 = 12$, $f_2 = 10$
Mode = $40 + \left(\frac{16 - 12}{2(16) - 12 - 10}\right) \times 10 = 40 + 4 = 44$

2] Measure of spread
A measure of spread describes how similar or varied the set of observed values is for a particular variable (data item).
It is used to describe the spread of data within a dataset. These measures tell us how much the observations vary from, or are similar to, each other.
Some major ways to measure the spread:

1> Range
- The range of the data is given as the difference between the maximum and the minimum values of the observations.

2> Variance ($\sigma^2$)
- The variance of the data is given by measuring the distance of the observed values from the mean of the distribution.
- $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$ (population)
- If all the data values are identical, the variance is 0.
- A small variance means the data points are close to the mean, whereas data points that are spread far from the mean and from one another indicate a high variance.
- It is the average of the squared distance from each point to the mean.

3> Standard deviation ($\sigma$)
- It is a measure which shows how much variation from the mean exists.
- It calculates the extent to which the values differ from the mean.
- It is based on all values, so even a change in one value will affect the value of the SD.
- $SD = \sqrt{\sigma^2} = \sqrt{\text{variance}}$

4> Quartile deviation
- Quartiles are the values that divide a list of numerical data into four quarters: $Q_1$, $Q_2$, $Q_3$.
- $Q_1$ = lower quartile, $Q_2$ = median, $Q_3$ = upper quartile.
- The quartile deviation (QD) is half the difference between the upper and lower quartiles:
  $QD = \frac{Q_3 - Q_1}{2}$

Sol: Data in ascending order:
2, 5, 7, 7, 8, 8, 10, 10, 14, 15, 17, 18, 24, 27, 28, 48
Median = $(10 + 14)/2 = 12$
Lower half of the data: 2, 5, 7, 7, 8, 8, 10, 10 (even count)
$Q_1$ = median of the lower half = $(7 + 8)/2 = 7.5$
Upper half of the data: 14, 15, 17, 18, 24, 27, 28, 48 (even count)
$Q_3$ = median of the upper half = $(18 + 24)/2 = 21$
$QD = (21 - 7.5)/2 = 6.75$

3] Skewness in data
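As a rough numerical illustration, the sketch below takes the sample 2, 5, 7, 7, 8, 8, 10, 10, 14, 15, 17, 18, 24, 27, 28, 48 (as read from the quartile deviation example), recomputes its quartile deviation with the median-of-halves method, and then estimates its skewness. The skewness measure used here, the moment coefficient m3 / m2^1.5, is an assumption of this sketch and is not a definition taken from the notes. The function names are likewise illustrative.

```python
import statistics


def quartile_deviation(values):
    """Return (Q1, Q3, QD) using the median of the lower and upper halves."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    # For an odd count, leave the middle value out of both halves.
    upper = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    q1 = statistics.median(lower)
    q3 = statistics.median(upper)
    return q1, q3, (q3 - q1) / 2


def moment_skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


if __name__ == "__main__":
    sample = [2, 5, 7, 7, 8, 8, 10, 10, 14, 15, 17, 18, 24, 27, 28, 48]
    q1, q3, qd = quartile_deviation(sample)
    print("Q1, Q3, QD:", q1, q3, qd)
    print("mean:", statistics.mean(sample), "median:", statistics.median(sample))
    print("skewness:", moment_skewness(sample))
```

A positive coefficient, together with a mean larger than the median, points to a longer right tail (here driven by the value 48); a negative coefficient would point to a longer left tail.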
