Unit - 2 Learning Notes

The document outlines the syllabus for a unit on Big Data Analytics, detailing key roles in analytics projects such as Business User, Project Sponsor, and Data Scientist. It describes the data analytics lifecycle in six phases: Discovery, Data Preparation, Model Planning, Model Building, Communicating Results, and Operationalizing. Additionally, it lists common tools used in the model building phase and key outputs expected from a successful analytics project.

Unit – 2

(Learning Notes)
SYLLABUS:
 Introduction to Big Data Analytics: Big Data Overview
 State of Practice in Analytics
 Role of Data Scientists
 Big Data Analytics in Industry Verticals

Key Roles for a Successful Analytics Project


 Business User: Someone who understands the domain area and
usually benefits from the results. This person can consult and advise
the project team on the context of the project, the value of the results,
and how the outputs will be operationalized. Usually a business
analyst, line manager, or deep subject matter expert in the project
domain fulfils this role.

 Project Sponsor: Responsible for the genesis of the project. Provides the impetus and requirements for the project and defines the core business problem. Generally provides the funding and gauges the degree of value from the final outputs of the working team. This person sets the priorities for the project and clarifies the desired outputs.

 Project Manager: Ensures that key milestones and objectives are met
on time and at the expected quality.

 Business Intelligence Analyst: Provides business domain expertise based on a deep understanding of the data, key performance indicators (KPIs), key metrics, and business intelligence from a reporting perspective. Business Intelligence Analysts generally create dashboards and reports and have knowledge of the data feeds and sources.

 Database Administrator (DBA): Provisions and configures the database environment to support the analytics needs of the working team. These responsibilities may include providing access to key databases or tables and ensuring the appropriate security levels are in place related to the data repositories.

 Data Engineer: Leverages deep technical skills to assist with tuning SQL queries for data management and data extraction, and provides support for data ingestion into the analytic sandbox, which was discussed in Chapter 1, "Introduction to Big Data Analytics." Whereas the DBA sets up and configures the databases to be used, the data engineer executes the actual data extractions and performs substantial data manipulation to facilitate the analytics. The data engineer works closely with the data scientist to help shape data in the right ways for analyses.

 Data Scientist: Provides subject matter expertise for analytical techniques, data modelling, and applying valid analytical techniques to given business problems. Ensures overall analytics objectives are met. Designs and executes analytical methods and approaches with the data available to the project.
Overview of Data Analytics Lifecycle

 Phase 1 - Discovery: In Phase 1, the team learns the business domain, including relevant history such as whether the organization or business unit has attempted similar projects in the past from which it can learn. The team assesses the resources available to support the project in terms of people, technology, time, and data. Important activities in this phase include framing the business problem as an analytics challenge that can be addressed in subsequent phases and formulating initial hypotheses (IHs) to test and begin learning the data.
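An initial hypothesis from the discovery phase can be framed as a simple, testable check. The sketch below is a minimal illustration in plain Python; the campaign scenario, numbers, and the `min_lift` threshold are all made up for illustration, and this is a first sanity check rather than a formal statistical test.

```python
from statistics import mean

# Initial hypothesis (IH): customers contacted by the new campaign
# spend more on average than those who were not. (Hypothetical data.)
campaign = [120.0, 135.5, 128.0, 142.0, 131.0]
control = [118.0, 121.5, 119.0, 125.0, 120.5]

def hypothesis_holds(treated, baseline, min_lift=5.0):
    """Return True if the treated group's mean exceeds the baseline
    mean by at least min_lift -- a quick sanity check, not a formal test."""
    return mean(treated) - mean(baseline) >= min_lift

print(hypothesis_holds(campaign, control))  # → True
```

In practice the team would follow up with a proper significance test on the full data, but a check like this helps decide early whether the IH is worth pursuing.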

 Phase 2 - Data preparation: Phase 2 requires the presence of an analytic sandbox, in which the team can work with data and perform analytics for the duration of the project. The team needs to execute extract, load, and transform (ELT) or extract, transform, and load (ETL) to get data into the sandbox. ELT and ETL are sometimes abbreviated together as ETLT. Data should be transformed in the ETLT process so the team can work with it and analyze it. In this phase, the team also needs to familiarize itself with the data thoroughly and take steps to condition the data.
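The transform-and-condition step can be sketched in plain Python. The record layout and field names below are invented for illustration; real ETLT would run against actual source systems and a real sandbox, not in-memory lists.

```python
# Toy ETLT pass: raw records as extracted from a hypothetical source.
raw_rows = [
    {"id": "1", "amount": "19.99", "region": "north"},
    {"id": "2", "amount": "", "region": "south"},    # missing amount
    {"id": "3", "amount": "7.50", "region": "NORTH"},
]

def transform(rows):
    """Condition the data: cast types, normalize text, drop unusable rows."""
    clean = []
    for row in rows:
        if not row["amount"]:          # skip rows with no amount recorded
            continue
        clean.append({
            "id": int(row["id"]),
            "amount": float(row["amount"]),
            "region": row["region"].lower(),
        })
    return clean

sandbox = transform(raw_rows)          # "load" step into the sandbox
print(sandbox)
```

Even in this toy form, the pass shows the three conditioning concerns the phase calls out: type casting, normalization, and handling missing values.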

 Phase 3 - Model planning: Phase 3 is model planning, where the team determines the methods, techniques, and workflow it intends to follow for the subsequent model building phase. The team explores the data to learn about the relationships between variables and subsequently selects key variables and the most suitable models.
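One common way to screen candidate variables during model planning is to look at their correlation with the target. A minimal sketch, with a hand-rolled Pearson correlation and made-up variables and threshold (in practice the team would use R, pandas, or similar):

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient -- for illustration only."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical candidate variables measured against a target metric
target = [10, 12, 15, 19, 24]
candidates = {
    "ad_spend": [1, 2, 3, 4, 5],    # strongly related to the target
    "store_id": [7, 3, 9, 1, 5],    # unrelated noise
}

# Keep variables whose |correlation| with the target clears a threshold
selected = [name for name, xs in candidates.items()
            if abs(pearson(xs, target)) > 0.8]
print(selected)  # → ['ad_spend']
```

Correlation screening is only one heuristic; the team would normally combine it with domain knowledge and checks for redundancy among the selected variables.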

 Phase 4 - Model building: In Phase 4, the team develops data sets for testing, training, and production purposes. In addition, in this phase the team builds and executes models based on the work done in the model planning phase. The team also considers whether its existing tools will suffice for running the models, or if it will need a more robust environment for executing models and workflows (for example, fast hardware and parallel processing, if applicable).
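The train/test split and model fit at the heart of this phase can be sketched in plain Python. The data set is synthetic (noise-free y = 2x + 1) and the model is a closed-form least-squares line; a real project would use a proper library such as scikit-learn or R.

```python
import random

# Hypothetical data set: (x, y) pairs following y = 2x + 1 exactly
data = [(x, 2.0 * x + 1.0) for x in range(20)]

random.seed(42)
random.shuffle(data)
split = int(len(data) * 0.8)
train, test = data[:split], data[split:]   # training vs. testing sets

# "Model building": closed-form least-squares fit of y = a*x + b
n = len(train)
sx = sum(x for x, _ in train); sy = sum(y for _, y in train)
sxx = sum(x * x for x, _ in train); sxy = sum(x * y for x, y in train)
a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
b = (sy - a * sx) / n

# Evaluate on the held-out testing set
mse = sum((a * x + b - y) ** 2 for x, y in test) / len(test)
print(round(a, 3), round(b, 3), round(mse, 6))
```

Holding out a testing set before fitting is the key discipline here: accuracy measured on the training data alone would overstate how well the model generalizes.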

 Phase 5 - Communicate results: In Phase 5, the team, in collaboration with major stakeholders, determines if the results of the project are a success or a failure based on the criteria developed in Phase 1. The team should identify key findings, quantify the business value, and develop a narrative to summarize and convey findings to stakeholders.

 Phase 6 - Operationalize: In Phase 6, the team delivers final reports, briefings, code, and technical documents. In addition, the team may run a pilot project to implement the models in a production environment.
Common Tools for the Model Building Phase

 SAS Enterprise Miner allows users to run predictive and descriptive models based on large volumes of data from across the enterprise. It interoperates with other large data stores, has many partnerships, and is built for enterprise-level computing and analytics.

 SPSS Modeler (provided by IBM and now called IBM SPSS Modeler)
offers methods to explore and analyze data through a GUI.

 Matlab provides a high-level language for performing a variety of data analytics, algorithms, and data exploration.

 Alpine Miner provides a GUI front end for users to develop analytic workflows and interact with Big Data tools and platforms on the back end.

 STATISTICA and Mathematica are also popular and well-regarded data mining and analytics tools.

 R and PL/R: R was described earlier in the model planning phase, and PL/R is a procedural language for PostgreSQL with R. Using this approach means that R commands can be executed in-database. This technique provides higher performance and is more scalable than running R in memory.

 Octave, a free software programming language for computational modelling, has some of the functionality of Matlab. Because it is freely available, Octave is used in major universities when teaching machine learning.

 WEKA is a free data mining software package with an analytic workbench. The functions created in WEKA can be executed within Java code.

 Python is a programming language that provides toolkits for machine learning and analysis, such as scikit-learn, numpy, scipy, and pandas, and related data visualization using matplotlib.

 SQL in-database implementations, such as MADlib, provide an alternative to in-memory desktop analytical tools. MADlib provides an open-source machine learning library of algorithms that can be executed in-database, for PostgreSQL or Greenplum.
Key Outputs from a Successful Analytic Project

Phase – 1: Discovery

 Learning Business Domain
 Resources – Technology, Tools, Systems, Data & People
 Problem Formulation
 Identify the Key Stakeholders
 Interview Stakeholders (prepare questions and ask open-ended ones)
 Develop Hypothesis
 Identify Data sources

Phase – 2: Prepare Data

 Prepare Analytics Sandbox
 Perform ETLT (move data from OLTP/source systems into the sandbox)
 Learning Data
 Data Conditioning (Tables – Columns)
 Survey & Visualize
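The "Survey & Visualize" step above can be approximated with a quick per-column profile: counts, missing values, and ranges. A minimal sketch in plain Python, with an invented table; in practice this would be done with pandas `describe()` or similar.

```python
# Hypothetical conditioned table, as a list of column->value dicts
table = [
    {"amount": 19.99, "region": "north"},
    {"amount": None, "region": "south"},
    {"amount": 7.50, "region": "north"},
]

def survey(rows):
    """Per-column profile: non-null count, missing count, and
    min/max for columns whose present values are all numeric."""
    report = {}
    for col in rows[0]:
        values = [r[col] for r in rows]
        present = [v for v in values if v is not None]
        entry = {"count": len(present),
                 "missing": len(values) - len(present)}
        if all(isinstance(v, (int, float)) for v in present):
            entry["min"], entry["max"] = min(present), max(present)
        report[col] = entry
    return report

print(survey(table))
```

A profile like this makes data-conditioning problems (missing values, implausible ranges) visible before any modelling begins.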

Phase – 3: Model Planning

 Variable Selection
 Model Selection (Selection, Classification, Clustering) | Input / Output –
Continuous & Discrete
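The model-selection checklist above can be read as a rule of thumb: whether labeled outputs exist, and whether the output is continuous or discrete, narrows the candidate model families. The sketch below is an illustrative mapping only, not an authoritative decision procedure; the specific model names are common textbook examples.

```python
def candidate_models(has_labels, output_type):
    """Illustrative rule of thumb.
    output_type: 'continuous' or 'discrete' (ignored when unlabeled)."""
    if not has_labels:
        # No target variable -> unsupervised learning
        return ["clustering (e.g., k-means)"]
    if output_type == "continuous":
        return ["linear regression", "regression trees"]
    # Discrete / categorical target -> classification
    return ["logistic regression", "decision trees", "naive Bayes"]

print(candidate_models(True, "continuous"))
print(candidate_models(False, "discrete"))
```

Real model planning would weigh many more factors (data volume, interpretability needs, tooling), but this captures the input/output distinction the checklist draws.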
Phase – 4: Model Building

 Building
 Accuracy Analysis
 Deploy
 Testing

Phase – 5: Communicate Results

 Build Reports
 Communicate Reports
 Summary Presentation

Phase – 6: Operationalize

 Closure of the Project
