Unit – 2
(Learning Notes)
SYLLABUS:
Introduction to Big Data Analytics: Big Data Overview
State of Practice in Analytics, Role of Data Scientists,
Big Data Analytics in Industry Verticals
Key Roles for a Successful Analytics Project
Business User: Someone who understands the domain area and
usually benefits from the results. This person can consult and advise
the project team on the context of the project, the value of the results,
and how the outputs will be operationalized. Usually a business
analyst, line manager, or deep subject matter expert in the project
domain fulfils this role.
Project Sponsor: Responsible for the genesis of the project. Provides
the impetus and requirements for the project and defines the core
business problem. Generally provides the funding and gauges the
degree of value from the final outputs of the working team. This
person sets the priorities for the project and clarifies the desired
outputs.
Project Manager: Ensures that key milestones and objectives are met
on time and at the expected quality.
Business Intelligence Analyst: Provides business domain expertise
based on a deep understanding of the data, key performance
indicators (KPIs), key metrics, and business intelligence from a
reporting perspective. Business Intelligence Analysts generally create
dashboards and reports and have knowledge of the data feeds and
sources.
Database Administrator (DBA): Provisions and configures the
database environment to support the analytics needs of the working
team. These responsibilities may include providing access to key
databases or tables and ensuring the appropriate security levels are in
place related to the data repositories.
Data Engineer: Leverages deep technical skills to assist with tuning
SQL queries for data management and data extraction, and provides
support for data ingestion into the analytic sandbox, which was
discussed in Chapter 1, "Introduction to Big Data Analytics." Whereas
the DBA sets up and configures the databases to be used, the data
engineer executes the actual data extractions and performs
substantial data manipulation to facilitate the analytics. The data
engineer works closely with the data scientist to help shape data in
the right ways for analyses.
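As an illustration of this extraction work, the short Python sketch
below pushes filtering and aggregation into a SQL query and hands the
result to the analysts as a DataFrame. The table and column names
(transactions, customer_id, amount, order_date) are hypothetical, and
an in-memory SQLite database stands in for a real enterprise source.

import sqlite3
import pandas as pd

# In-memory database stands in for the enterprise SQL source.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (customer_id INT, amount REAL, order_date TEXT);
    INSERT INTO transactions VALUES
        (1, 120.0, '2023-03-01'),
        (1,  80.0, '2023-04-15'),
        (2, 200.0, '2023-02-20');
""")

# Push filtering and aggregation into the database so only the data
# needed for analysis moves into the sandbox.
query = """
    SELECT customer_id,
           SUM(amount) AS total_spend,
           COUNT(*)    AS num_orders
    FROM transactions
    WHERE order_date >= '2023-01-01'
    GROUP BY customer_id
"""
df = pd.read_sql_query(query, conn)
print(df)
conn.close()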
Data Scientist: Provides subject matter expertise for analytical
techniques, data modelling, and applying valid analytical techniques
to given business problems. Ensures overall analytics objectives are
met. Designs and executes analytical methods and approaches with
the data available to the project.
Overview of Data Analytics Lifecycle
Phase 1- Discovery: In Phase 1, the team learns the business
domain, including relevant history such as whether the organization
or business unit has attempted similar projects in the past from
which they can learn. The team assesses the resources available to
support the project in terms of people, technology, time, and data.
Important activities in this phase include framing the business
problem as an analytics challenge that can be addressed in
subsequent phases and formulating initial hypotheses (IHs) to test
and begin learning the data.
Phase 2- Data preparation: Phase 2 requires the presence of an
analytic sandbox, in which the team can work with data and perform
analytics for the duration of the project. The team needs to execute
extract, load, and transform (ELT) or extract, transform, and load
(ETL) processes to get data into the sandbox. ELT and ETL are
sometimes abbreviated together as ETLT. Data should be transformed
in the ETLT process so the team can work with it and analyze it. In
this phase, the team also needs to familiarize itself with the data
thoroughly and take steps to condition the data.
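A minimal Python sketch of this ETLT flow is given below. The raw
feed, column names, and SQLite sandbox file are illustrative
assumptions, not prescribed tools; a real project would substitute
its own sources and sandbox platform.

import sqlite3
import pandas as pd

# Extract: a tiny stand-in for a raw feed (in practice, read from
# files or source systems, e.g. with pd.read_csv).
raw = pd.DataFrame({
    "customer_id": [1, 1, 2, 3],
    "signup_date": ["2023-01-05", "2023-01-05", "bad-date", "2023-02-10"],
    "age": [34, 34, None, 51],
})

# Transform / condition: fix types, drop duplicates, handle missing values.
raw["signup_date"] = pd.to_datetime(raw["signup_date"], errors="coerce")
raw = raw.drop_duplicates(subset="customer_id")
raw["age"] = raw["age"].fillna(raw["age"].median())

# Load: write the conditioned table into the sandbox for later phases.
sandbox = sqlite3.connect("analytic_sandbox.sqlite")
raw.to_sql("customers", sandbox, if_exists="replace", index=False)
sandbox.close()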
Phase 3-Model planning: Phase 3 is model planning, where the team
determines the methods, techniques, and workflow it intends to follow
for the subsequent model building phase. The team explores the data
to learn about the relationships between variables and subsequently
selects key variables and the most suitable models.
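The exploration step can be as simple as the Python sketch below,
which uses synthetic data (the variables age, num_orders, and
total_spend are invented for illustration) to examine correlations
and distributions before selecting variables and candidate models.

import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
df = pd.DataFrame({
    "age": rng.integers(20, 70, size=100),
    "num_orders": rng.poisson(5, size=100),
})
df["total_spend"] = 30 * df["num_orders"] + rng.normal(0, 20, size=100)

# Pairwise correlations highlight candidate predictors and redundancy.
print(df.corr())

# Distribution summaries help decide between continuous and discrete
# treatments of each variable.
print(df.describe())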
Phase 4-Model building: In Phase 4, the team develops data sets for
testing, training, and production purposes. In addition, in this phase
the team builds and executes models based on the work done in the
model planning phase. The team also considers whether its existing
tools will suffice for running the models, or if it will need a more
robust environment for executing models and workflows (for example,
fast hardware and parallel processing, if applicable).
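The sketch below shows one possible shape of this phase using
scikit-learn, one of the Python toolkits listed later in these notes.
The generated data set and the choice of a random forest classifier
are illustrative assumptions only.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Stand-in data; in practice this comes from the analytic sandbox.
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

# Develop separate training and testing sets, as Phase 4 describes.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Evaluate on held-out data before considering production deployment.
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))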
Phase 5-Communicate results: In Phase 5, the team, in collaboration
with major stakeholders, determines if the results of the project are a
success or a failure based on the criteria developed in Phase 1. The
team should identify key findings, quantify the business value, and
develop a narrative to summarize and convey findings to stakeholders.
Phase 6-Operationalize: In Phase 6, the team delivers final reports,
briefings, code, and technical documents. In addition, the team may
run a pilot project to implement the models in a production
environment.
Common Tools for the Model Building Phase
SAS Enterprise Miner allows users to run predictive and descriptive
models based on large volumes of data from across the enterprise. It
interoperates with other large data stores, has many partnerships,
and is built for enterprise-level computing and analytics.
SPSS Modeler (provided by IBM and now called IBM SPSS Modeler)
offers methods to explore and analyze data through a GUI.
Matlab provides a high-level language for data analytics, algorithm
development, and data exploration.
Alpine Miner provides a GUI front end for users to develop analytic
workflows and interact with Big Data tools and platforms on the back
end.
STATISTICA and Mathematica are also popular and well-regarded
data mining and analytics tools.
R and PL/R: R was described earlier in the model planning phase, and
PL/R is a procedural language for PostgreSQL with R. Using this
approach means that R commands can be executed in-database. This
technique provides higher performance and is more scalable than
running R in memory.
Octave, a free software programming language for computational
modelling, has some of the functionality of Matlab. Because it is freely
available, Octave is used in major universities when teaching machine
learning.
WEKA is a free data mining software package with an analytic
workbench. The functions created in WEKA can be executed within
Java code.
Python is a programming language that provides toolkits for machine
learning and analysis, such as scikit-learn, numpy, scipy, pandas,
and related data visualization using matplotlib.
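A brief sketch of these toolkits working together, on synthetic data
invented purely for illustration:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.normal(size=200)})
df["y"] = 2 * df["x"] + rng.normal(scale=0.5, size=200)

print(df.describe())                 # pandas: quick numeric summary

plt.scatter(df["x"], df["y"], s=10)  # matplotlib: visualize the relationship
plt.xlabel("x")
plt.ylabel("y")
plt.title("Synthetic linear relationship")
plt.show()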
SQL in-database implementations, such as MADlib, provide an
alternative to in-memory desktop analytical tools. MADlib provides an
open-source machine learning library of algorithms that can be
executed in-database, for PostgreSQL or Greenplum.
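As a hedged illustration of in-database execution, the sketch below
calls MADlib's linear regression from Python via psycopg2. It assumes
a reachable PostgreSQL database with the MADlib extension installed
and a hypothetical houses(price, size, bedrooms) table; the
connection string is a placeholder.

import psycopg2

conn = psycopg2.connect("dbname=analytics user=analyst")  # assumed DSN
cur = conn.cursor()

# Train a linear regression entirely in-database; only the small model
# summary leaves the database. Assumes houses_linregr does not yet exist.
cur.execute("""
    SELECT madlib.linregr_train(
        'houses',                       -- source table
        'houses_linregr',               -- output (model) table
        'price',                        -- dependent variable
        'ARRAY[1, size, bedrooms]'      -- independent variables
    );
""")
cur.execute("SELECT coef FROM houses_linregr;")
print(cur.fetchone())
conn.commit()
cur.close()
conn.close()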
Key Outputs from a Successful Analytic Project
Phase – 1: Discovery
Learning Business Domain
Resources – Technology, Tools, Systems, Data & People
Problem Formulation
Identify the key stakeholders
Interview stakeholders (prepare questions and ask open-ended ones)
Develop initial hypotheses
Identify Data sources
Phase – 2: Prepare Data
Prepare Analytics Sandbox
Perform ETLT (move data from OLTP and other sources into the sandbox)
Learning Data
Data Conditioning (Tables – Columns)
Survey & Visualize
Phase – 3: Model Planning
Variable Selection
Model Selection (regression, classification, clustering) | Input / Output –
Continuous & Discrete
Phase – 4: Model Building
Building
Accuracy Analysis
Deploy
Testing
Phase – 5: Communicate Results
Build Reports
Communicate Reports
Summary Presentation
Phase – 6: Operationalize
Closure of the Project