Unit 3

Uploaded by

shraddha chauhan
Data science is a multidisciplinary field that uses scientific methods to extract insights from structured and unstructured data. It is such a huge field and concept that it is often intermingled with other disciplines; it unifies statistics, machine learning and related fields.

Data science life cycle

The data science life cycle provides the structure to the development of a data science project. The lifecycle outlines the major steps, from start to finish, that projects usually follow. There are various approaches to managing DS projects, among them the cross-industry standard process for data mining (aka CRISP-DM), the process of knowledge discovery in databases (aka KDD), any custom procedure conjured up by a company, and a few other simplified processes.

CRISP-DM

CRISP-DM is an open standard process model that describes common approaches used by data mining scientists. In 2015, it was refined and extended into a new methodology called Analytics Solutions Unified Method for Data Mining/Predictive Analytics (aka ASUM-DM).

The CRISP-DM model steps are:
1. Business Understanding
2. Data Understanding
3. Data Preparation
4. Modelling
5. Evaluation
6. Deployment

[Figure: the six CRISP-DM phases with their generic tasks and outputs]

Knowledge discovery in databases (KDD)

KDD is commonly defined with the following stages:
✓ Selection
✓ Pre-processing
✓ Transformation
✓ Data mining
✓ Interpretation/evaluation

The simplified process looks as follows: (1) Pre-processing, (2) Data Mining, and (3) Results Validation.

Suppose we have a standard DS project (without any industry-specific peculiarities); then the lifecycle would typically include:
✓ Business understanding
✓ Data acquisition and understanding
✓ Modelling
✓ Deployment
✓ Customer acceptance

The DS project life cycle is an iterative process of research and discovery that provides guidance on the tasks needed to use predictive models. The goal of this process is to move a DS project to a clear end-point by providing means for easier and clearer communication between teams and customers, with a well-defined set of artifacts and standardized templates to homogenize procedures and avoid misunderstandings.

Each stage has the following information:
• Goals and specific objectives of the stage
• A clear outline of specific tasks and instructions on how to complete them
• The expected deliverables (artifacts)

Business understanding

Before you even embark on a DS project, you need to understand the problem you're trying to solve and define the central objectives of your project by identifying the variables to predict.

Goals:
✓ Identify key variables that will serve as model targets and as the metrics for defining the success of the project
✓ Identify data sources that the business already has access to or needs to obtain access to

Guidelines:
Work with customers and stakeholders to define business problems and formulate questions that data science needs to answer. The goal here is to identify the key variables (aka model targets) that your analysis needs to predict and that the project's success will be assessed against. Take sales forecasts as an example: the sales volume is what needs to be predicted, and at the end of your project you compare your predictions to the actual volume of sales.

Define project goals by asking specific questions related to data science, such as:
• How much/many? (regression)
• Which category? (classification)
• Which group? (clustering)
• Does this make sense? (anomaly detection)
• Which option should be taken? (recommendation)

Business Requirements

✓ The purpose of business requirements is to define a project's business need, as well as the criteria of its success.
✓ Business requirements describe why a project is needed, whom it will benefit, when and where it will take place, and what standards will be used to evaluate it.
✓ Business requirements generally do not define how a project is to be implemented; the requirements of the business need do not encompass a project's implementation details.
✓ "Business requirements are higher-level statements of the goals, objectives, or needs of the enterprise."
✓ "They describe the reasons why a project has been initiated, the objectives that the project will achieve, and the metrics that will be used to measure its success."
✓ In short, business requirements chart where a project is going, not how it's going to get there.
The business requirements the analyst creates for this project (a theatre chain's online ticketing service) would include (but not be limited to):
• Identification of the business problem (key objectives of the project), i.e., "Declining ticket sales require a strategy to increase the number of customers at our theatres."
• Why the solution has been proposed (its benefits; why it will produce the desired outcome of returning ticket sales to higher levels), i.e., "Customers have overwhelmingly cited the inconvenience of standing in line as the primary reason they no longer attend our theatre. We will remove this impediment by enabling customers to buy and print their theatre tickets at home with just a few clicks."
• The scope of the project. An example might be: "While the plan is to bring this project to all 400 theatres eventually, we will start with 50 theatres in the most populated metropolitan areas."
• Rules, policies, and regulations. For example, "We will design our web site and commerce so that all other relevant governmental regulations are properly adhered to."
• Key features of the service (without details as to how they will be implemented). A few examples might include: "1. We will provide a secure site for the user to select the number of tickets and showing they wish, and to enter their payment information. 2. We will give the user the option to store his or her card information in our system so that they do not have to re-enter it in a later session. 3. The system will accommodate credit, debit, or PayPal payment methods only."
• Key performance and security features (without details as to how they will be implemented), such as how many customers the system must serve at any given time without a loss of performance, and that each ticket will carry a unique identifier.
• Success criteria, i.e., the measurable result the project must achieve within 12 months for it to be considered a success.

Business requirements would not include:
• Any description of how to adhere to governmental or regulatory requirements.
• Any description of how performance requirements will be implemented, such as a statement of which server holds customer information and how often it will be backed up.
• Any description of how the unique ticket identifier will be implemented.
• Any detail or specifics related to the service's features, such as the length of the ticket identifier in characters, or which storage system the information is loaded into when the user selects "Yes".

While the above examples and selected bullet points are textual, business requirements may include graphs, models, or any combination of these that best serves the needs of a project. Effective business requirements require strategic thinking, significant input from a project's business owners, and the ability to clearly state the needs of a project at a high level.

As with all requirements, business requirements should be:
• Verifiable. Just because business requirements state business needs rather than technical specifications doesn't mean they mustn't be demonstrable. Verifiable requirements are specific and objective. A quality control expert must be able to check, for example, that the system accommodates the debit, credit, and PayPal methods specified in the business requirements. They could not do so if the requirements were more vague, i.e., "The system will accommodate appropriate payment methods." ("Appropriate" is subject to interpretation.)
• Unambiguous, stating precisely what problem is being solved. For example, "This project will be deemed successful if ticket sales increase sufficiently" is probably too vague for all stakeholders to agree on its meaning at the project's end.
• Comprehensive. Business requirements are indeed big picture, but they are thorough. In the aforementioned example, the developers would know to design a system that could accommodate the theatre chain's customers at any one time without performance problems. Remember that business requirements answer the what's, not the how's, but they are meticulously thorough in doing so. No business point is overlooked. At a project's end, the business requirements should serve as a methodical record of the initial business problem and the scope of its solution.

Understanding the project objectives and requirements from a domain perspective, and then converting this knowledge into a data science problem definition with a preliminary plan designed to achieve the objectives, is the first step. Data science projects are often structured around the specific needs of an industry sector, or even tailored and built for a single organization. A successful data science project starts from a well-defined question or need.

Data Acquisition

Data acquisition (commonly abbreviated as DAQ or DAS) is the process of sampling signals that measure real-world physical phenomena and converting them into a digital form that can be manipulated by a computer and software.

Data acquisition is generally accepted to be distinct from earlier forms of recording. Unlike those methods, the signals are converted from the analog domain to the digital domain and then recorded to a digital medium such as ROM, flash media, or hard disk drives.

The Purposes of Data Acquisition
The primary purpose of a data acquisition system is to acquire and store the data. But such systems are also intended to provide real-time and post-recording visualization and analysis of the data. Furthermore, most data acquisition systems have some analytical and report generation capability built in:
• Real-time data visualization
• Post-recording data review
• Data analysis using various mathematical and statistical calculations
• Report generation

Data Preparation

Data is information, typically the results of measurement (numerical) or counting (categorical). Variables serve as placeholders for data. There are two types of variables, numerical and categorical.

A numerical or continuous variable is one that can accept any value within a finite or infinite interval (e.g., height, weight, temperature, blood glucose). There are two types of numerical data, interval and ratio. Data on an interval scale can be added and subtracted but cannot be meaningfully multiplied or divided because there is no true zero. For example, we cannot say that one day is twice as hot as another day. Data on a ratio scale, on the other hand, has a true zero and can be added, subtracted, multiplied or divided (e.g., weight).

A categorical or discrete variable is one that can accept two or more values (categories). There are two types of categorical data, nominal and ordinal. Nominal data does not have an intrinsic ordering in the categories, for example "gender" with two categories, male and female. In contrast, ordinal data does have an intrinsic ordering in the categories, for example "level of energy" with three ordered categories (low, medium and high).

[Figure: numerical versus categorical variables]

Dataset

A dataset is a collection of data, usually presented in tabular form. Each column represents a particular variable, and each row corresponds to a given member of the data.

[Figure: a dataset table with its columns, rows and values labelled]
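To make the four measurement scales and the row/column layout concrete, here is a minimal Python sketch. The toy dataset, the column names and the `SCALES` mapping are all made up for illustration; the point is only which operations are sensible for each type of variable.

```python
from collections import Counter
from statistics import mean

# Hypothetical toy dataset: each dict is one row (record),
# each key is one column (variable).
dataset = [
    {"weight_kg": 62.0, "temp_c": 21.5, "gender": "female", "energy": "low"},
    {"weight_kg": 80.5, "temp_c": 30.0, "gender": "male", "energy": "high"},
    {"weight_kg": 71.2, "temp_c": 15.0, "gender": "female", "energy": "medium"},
]

# Assumed measurement scale of each column, using the four types from the text.
SCALES = {
    "weight_kg": "ratio",    # true zero: ratios such as "twice as heavy" make sense
    "temp_c": "interval",    # no true zero: only differences make sense
    "gender": "nominal",     # categories without an intrinsic order
    "energy": "ordinal",     # categories with an intrinsic order
}

# The order of an ordinal variable has to be stated explicitly.
ENERGY_ORDER = {"low": 0, "medium": 1, "high": 2}


def column(rows, name):
    """Return all values of one variable (one column of the table)."""
    return [row[name] for row in rows]


def is_numerical(name):
    """Interval and ratio variables are numerical; the rest are categorical."""
    return SCALES[name] in ("interval", "ratio")


# Ratio and interval data: arithmetic summaries are meaningful.
mean_weight = mean(column(dataset, "weight_kg"))
temp_range = max(column(dataset, "temp_c")) - min(column(dataset, "temp_c"))

# Nominal data: counting per category is the natural summary.
gender_counts = Counter(column(dataset, "gender"))

# Ordinal data: values can be sorted, but only by their declared order.
sorted_energy = sorted(column(dataset, "energy"), key=ENERGY_ORDER.get)

print(mean_weight, temp_range, gender_counts, sorted_energy)
```

Sorting the `energy` column alphabetically would put "high" before "low", which is why the ordinal order is passed in as a key rather than left to the default string comparison.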
There are some alternative names for columns, rows and values:
• Columns, Fields, Attributes, Variables
• Rows, Records, Objects, Cases, Instances, Examples, Vectors
• Values, Data

In predictive modeling, predictors or attributes are the input variables, and the target or class attribute is the output variable, whose value is determined by the values of the predictors and the function of the predictive model.

Database

A database collects, stores and manages information so that users can retrieve, add, update or remove such information. It presents information in tables with rows and columns. A table is referred to as a relation in the sense that it is a collection of objects of the same type (rows). Data in a table can be related according to common keys or concepts, and the ability to retrieve related data from related tables is the basis of the term relational database. A database management system (DBMS) handles the way data is stored, maintained and retrieved.

SQL (Structured Query Language) is a database computer language for managing and manipulating data in relational database management systems (RDBMS).

SQL Data Definition Language (DDL) permits database tables to be created, altered or deleted. We can also define indexes (keys), specify links between tables, and impose constraints between database tables.
• CREATE TABLE : creates a new table
• ALTER TABLE : alters a table
• DROP TABLE : deletes a table
• CREATE INDEX : creates an index
• DROP INDEX : deletes an index

SQL Data Manipulation Language (DML) is a language which enables users to access and manipulate data.
• SELECT : retrieval of data from the database
• INSERT INTO : insertion of new data into the database
• UPDATE : modification of data in the database
• DELETE : deletion of data in the database

ETL (Extraction, Transformation and Loading)

ETL extracts data from data sources and loads it into data destinations using a set of transformation functions.
• Data extraction provides the ability to extract data from a variety of data sources, such as flat files, relational databases and other data sources.
• Data transformation provides the ability to cleanse, convert, aggregate, merge and split data.
• Data loading provides the ability to load data into destination databases via update, insert or delete statements, or in bulk.
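The DDL, DML and ETL ideas above can be tried end to end with Python's built-in `sqlite3` module. This is only a minimal sketch under stated assumptions: the in-memory CSV, the `spend` table and the three helper functions `extract`, `transform` and `load` are hypothetical names chosen for illustration, not part of any particular ETL tool.

```python
import csv
import io
import sqlite3

# Hypothetical source: a small CSV "flat file" held in memory.
SOURCE_CSV = """customer,amount
alice, 120.50
bob,
alice, 30.00
carol, 75.25
"""


def extract(text):
    """Extraction: read rows from the flat-file source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transformation: cleanse (drop rows without an amount),
    convert strings to numbers, and aggregate per customer."""
    totals = {}
    for row in rows:
        raw = (row["amount"] or "").strip()
        if not raw:
            continue  # cleansing step: skip incomplete records
        name = row["customer"].strip()
        totals[name] = totals.get(name, 0.0) + float(raw)
    return sorted(totals.items())


def load(conn, records):
    """Loading: create the destination table (DDL), then bulk insert (DML)."""
    conn.execute("CREATE TABLE spend (customer TEXT PRIMARY KEY, total REAL)")
    conn.execute("CREATE INDEX idx_spend_total ON spend (total)")
    conn.executemany("INSERT INTO spend (customer, total) VALUES (?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")  # throwaway destination database
load(conn, transform(extract(SOURCE_CSV)))

# The remaining DML statements from the list above.
conn.execute("UPDATE spend SET total = total + 9.5 WHERE customer = ?", ("alice",))
conn.execute("DELETE FROM spend WHERE customer = ?", ("carol",))
rows = conn.execute("SELECT customer, total FROM spend ORDER BY customer").fetchall()
print(rows)
```

Keeping the three stages as separate functions mirrors the extract, transform, load split in the text; in a real pipeline each stage would usually be far larger and the destination would not be an in-memory database.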
[Figure: ETL process from source to destination (credit default datasets)]

Data Exploration

Data exploration is about describing the data by means of statistical and visualization techniques. We explore data in order to bring important aspects of that data into focus for further analysis.
✓ Univariate Analysis

Modeling

Predictive modeling is the process by which a model is created to predict an outcome. If the outcome is categorical it is called classification, and if the outcome is numerical it is called regression. Descriptive modeling, or clustering, is the assignment of observations into clusters so that observations in the same cluster are similar. Finally, association rules can find interesting associations amongst observations.

Model Evaluation

Model evaluation is an integral part of the model development process. It helps to find the model that best represents our data and to estimate how well the chosen model will work in the future. Evaluating model performance with the data used for training is not acceptable in data science because it can easily generate overoptimistic and overfitted models. There are two methods of evaluating models in data science, hold-out and cross-validation. To avoid overfitting, both methods use a test set (not seen by the model) to evaluate model performance.

Hold-Out

In this method, a dataset (usually a large one) is randomly divided into three subsets:
1. The training set is a subset of the dataset used to build predictive models.
2. The validation set is a subset of the dataset used to assess the performance of the model built in the training phase. It provides a test platform for fine-tuning the model's parameters and selecting the best-performing model. Not all modeling algorithms need a validation set.
3. The test set, or unseen examples, is a subset of the dataset used to assess the likely future performance of a model.
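The three-way hold-out split just described can be sketched in a few lines of plain Python. The function name `hold_out_split` and the 60/20/20 proportions are assumptions made for this example; the text does not prescribe particular fractions.

```python
import random


def hold_out_split(rows, train_frac=0.6, valid_frac=0.2, seed=0):
    """Randomly divide rows into training, validation and test subsets.

    The fractions are illustrative defaults; whatever is left after the
    training and validation subsets becomes the test set.
    """
    if not 0 < train_frac + valid_frac < 1:
        raise ValueError("train_frac + valid_frac must be between 0 and 1")
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test


data = list(range(100))  # stand-in for 100 records
train, valid, test = hold_out_split(data)
print(len(train), len(valid), len(test))
```

Shuffling before slicing is what makes the division random; slicing an unshuffled dataset would put, say, the oldest records in training and the newest in test, which can bias the estimate.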
If a model fits the training set much better than it fits the test set, overfitting is probably the cause.

Cross-Validation

When only a limited amount of data is available, we use k-fold cross-validation to achieve an unbiased estimate of the model performance. In k-fold cross-validation, we divide the data into k subsets of equal size. We build models k times, each time leaving out one of the subsets from training and using it as the test set. If k equals the sample size, this is called "leave-one-out".

Model evaluation can be divided into two sections:
• Classification Evaluation
• Regression Evaluation

Model Deployment

The concept of deployment in data science refers to the application of a model for prediction using new data. Building a model is generally not the end of the project. Even if the purpose of the model is to increase knowledge of the data, the knowledge gained will need to be organized and presented in a way that the customer can use. Depending on the requirements, the deployment phase can be as simple as generating a report or as complex as implementing a repeatable data science process. In many cases, it will be the customer, not the data analyst, who carries out the deployment steps. For example, a credit card company may want to deploy a trained model or set of models (e.g., neural networks, meta-learner) to quickly identify transactions which have a high probability of being fraudulent. However, even if the analyst will not carry out the deployment effort, it is important for the customer to understand up front what actions will need to be carried out in order to actually make use of the created models.

Model deployment methods:
In general, there are four ways of deploying models in data science.
1. Data science tools (or cloud)
2. Programming language (Java, C, VB, ...)
3. Database and SQL script (TSQL, PL-SQL, ...)
4. PMML (Predictive Model Markup Language)

[Figure: an example of using a data mining tool to deploy a decision tree model]
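Returning to the k-fold procedure described under Cross-Validation, the following is a minimal plain-Python sketch of it. The helper names (`k_fold_indices`, `cross_validate`) and the deliberately trivial "predict the training mean" model are assumptions chosen to keep the example self-contained; any fit and score functions with the same shape could be swapped in.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs so that every sample is
    used for testing exactly once across the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k subsets of near-equal size
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def cross_validate(xs, ys, k, fit, score, seed=0):
    """Build the model k times, each time scoring it on the left-out fold,
    and return the average score."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(xs), k, seed):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(score(model, [xs[i] for i in test_idx],
                            [ys[i] for i in test_idx]))
    return sum(scores) / len(scores)


# Stand-in model (an assumption for the example): always predict the training mean.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)


def mean_abs_error(model, xs, ys):
    return sum(abs(y - model) for y in ys) / len(ys)


xs = list(range(10))
ys = [2.0 * x for x in xs]
cv_error = cross_validate(xs, ys, k=5, fit=fit_mean, score=mean_abs_error)
loo_error = cross_validate(xs, ys, k=len(xs), fit=fit_mean, score=mean_abs_error)
print(cv_error, loo_error)
```

Setting `k=len(xs)` gives the leave-one-out case from the text: each test fold then holds a single sample. When the sample count is not divisible by k, the folds here differ in size by at most one.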