
UNIT 2

CONTENTS: Introduction to Data Analytics, Introduction to tools and environment, application of modeling in business, Data modeling techniques, Need for business modeling, Database and types of data variables, Missing imputations.

1. Introduction to Data Analytics


Definition: Data analytics is the science of extracting meaningful, valuable information from raw data.

Definition: Data analytics is the process of examining data sets to draw conclusions, predict trends and inform decision-making through various analytical techniques.

The goal of data analytics is to derive actionable insights from raw data, resulting in better decisions.

Data analysis is a key part of data analytics.

It involves scrutinizing existing data to gain insights and draw conclusions.

The data analytics lifecycle is a process that consists of six basic stages/phases:

1. Data discovery 2. Data preparation 3. Model planning

4. Model building 5. Communicate results 6. Operationalize

• The data analytics lifecycle provides a framework for performing each phase well, from the creation of the project until its completion.
Phase 1 - Data Discovery:

• Data discovery is the 1st phase, in which the project's objectives are set and ways to achieve a complete data analytics lifecycle are identified.

• The data discovery phase involves defining the purpose of the data and how that purpose will be achieved by the end of the data analytics lifecycle.

• The data discovery phase consists of identifying the critical objectives a business is trying to achieve by mapping out the data.

Phase 2 - Data Preparation:

• The data preparation phase of the data analytics lifecycle includes the steps to explore, preprocess and condition data prior to modeling and analysis.

• Data are loaded into the sandbox in three ways, namely ETL (Extract, Transform and Load), ELT (Extract, Load and Transform) and ETLT.
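
To make the ETL path concrete, here is a minimal Python sketch; the source file sales.csv, its amount column, and the sandbox.db SQLite file are all hypothetical:

```python
# Minimal ETL sketch: extract a CSV, transform it, load it into a
# SQLite "sandbox". File, column, and table names are hypothetical.
import sqlite3
import pandas as pd

# Extract: read raw data from a source file
df = pd.read_csv("sales.csv")

# Transform: clean and condition the data before analysis
df.columns = [c.strip().lower() for c in df.columns]
df = df.dropna(subset=["amount"])          # drop rows missing the key measure
df["amount"] = df["amount"].astype(float)  # enforce a numeric type

# Load: write the conditioned data into the analytics sandbox
with sqlite3.connect("sandbox.db") as conn:
    df.to_sql("sales", conn, if_exists="replace", index=False)
```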

Phase 3 - Model Planning:

• The 3rd phase of the lifecycle is model planning, where the data analytics team members plan the methods to be adopted and the workflows to be followed during the next phase of model building.

• Model planning is the phase in which the data analytics team members analyze the quality of data and find a suitable model for the project.

Phase 4 - Model Building:

• In this phase the team works on developing datasets for training and testing
as well as for production purposes.

• This phase is based on the planning made in the previous phase; the
execution of the model is carried out by the team.

• Model building is the process in which the team deploys the planned model in a real-time environment.

• The environment needed for the execution of the model is decided and
prepared so that if a more robust environment is required, it is accordingly
applied.
Phase 5 - Communicate Results:

• The 5th phase of the life cycle of data analytics checks the results of the
project to find whether it is a success or failure.

• The result is scrutinized by the entire team to draw inferences on the key
findings and summarize the entire work done.

Phase 6 - Operationalize:

• In the 6th phase, the team delivers the final reports, along with the briefings, source code and related technical documents.

• The operationalize phase also involves running the pilot project to implement the model and test it in a real-time environment.

• As soon as the team prepares the detailed report including the key findings, documents, and briefings, the data analytics life cycle comes to an end.

Types of Data Analytics


There are four major types of data analytics:
1. Predictive (forecasting)
2. Descriptive (business intelligence and data mining)
3. Prescriptive (optimization and simulation)
4. Diagnostic analytics
Predictive analytics:
Predictive analytics turns data into valuable, actionable information. It employs a variety of statistical techniques that analyze current and historical facts to make predictions about future events.
Techniques that are used for predictive analytics are:
 Linear Regression
 Time Series Analysis and Forecasting
 Data Mining
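
As a minimal sketch of predictive analytics, the following fits a linear regression with scikit-learn to made-up monthly sales figures and forecasts the next period (all numbers are illustrative):

```python
# Linear-regression sketch: fit a trend to made-up monthly sales
# and predict the next period.
import numpy as np
from sklearn.linear_model import LinearRegression

months = np.arange(1, 13).reshape(-1, 1)          # periods 1..12
sales = np.array([210, 215, 230, 228, 245, 250,
                  262, 270, 268, 285, 290, 301])  # illustrative values

model = LinearRegression().fit(months, sales)
print("Forecast for month 13:", model.predict([[13]])[0])
```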

Descriptive Analytics
Descriptive analytics looks at past performance and understands the
performance by mining historical data to understand the cause of success
or failure in the past. Almost all management reporting such as sales,
marketing, operations, and finance uses this type of analysis.
Prescriptive Analytics
Prescriptive analytics goes beyond predicting future outcomes by also suggesting actions to benefit from the predictions. Prescriptive analytics not only anticipates what will happen and when it will happen but also why it will happen. Further, prescriptive analytics can suggest decision options on how to take advantage of a future opportunity or mitigate a future risk.
Diagnostic Analytics
Diagnostic analytics generally uses historical data to answer the question of why something happened. It looks for dependencies and patterns in the historical data of the particular problem.
Common techniques used for Diagnostic Analytics are:
 Data discovery
 Data mining
 Correlations
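
As a minimal diagnostic sketch, the following computes a correlation matrix with Pandas over made-up historical data to look for dependencies between variables:

```python
# Diagnostic sketch: probe dependencies in historical data via
# correlations. All columns and values are made up.
import pandas as pd

history = pd.DataFrame({
    "ad_spend":  [10, 12, 9, 15, 14, 18],
    "discounts": [5, 4, 6, 3, 4, 2],
    "sales":     [200, 220, 190, 260, 250, 300],
})

# Pairwise correlation matrix: which factors move with sales?
print(history.corr())
```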

2. Introduction to tools and environment


Many tools are available to facilitate data analysis, ranging from simple
spreadsheets to advanced software with powerful machine-learning
capabilities. Some popular tools include:

1. Power BI
 Power BI provides interactive visualizations and business intelligence
capabilities with a user-friendly interface for creating reports and
dashboards.
 It's a data simplification powerhouse, connecting to numerous data
sources, and delivering visually engaging reports that translate into
meaningful business insights.
 Available as software as a service (SaaS), a desktop application, and a mobile app, Power BI offers a comprehensive view of business data, making team collaboration easy.

2. Excel
 It is a versatile tool for both professionals and beginners.
 Excel can handle large datasets, perform complex calculations, and
automate tasks using macros.
 It features tables for dynamic data summarization and a variety of charting tools for data visualization. Excel supports functions for statistical analysis, financial modeling, and data cleaning.

3. Tableau

 Tableau is a powerful data visualization tool that specializes in creating interactive and shareable dashboards. It converts raw data into an understandable format using advanced visualizations, making it easier to extract insights.
 Tableau supports drag-and-drop functionalities, real-time data
analysis, and integration with various data sources, including
databases, spreadsheets, and cloud services.
 Its analytical capabilities are enhanced by advanced chart types,
filtering, and dashboard actions, while its user-friendly interface
allows for quick learning and application.

4. SAS

 SAS (Statistical Analysis System) is a comprehensive software suite designed for advanced analytics, business intelligence, data management, and predictive analytics. It is known for its robust analytical power and extensive capabilities.
 SAS provides advanced data manipulation, extensive statistical analysis,
predictive modeling, and machine learning capabilities.
 It supports large-scale data analysis with highly customizable options,
allowing for detailed data preparation, reporting, and visualization. SAS
also offers specialized solutions for various industries, including
healthcare, finance, and retail.
5. R programming
 R is an open-source programming language and software
environment tailored for statistical computing and graphics.
 It is widely used for data analysis, modeling, and visualization,
particularly in academic and research settings.
 R excels in data manipulation, calculation, and graphical display.
 It offers a vast library of statistical and graphical techniques, including
linear and nonlinear modeling, classical statistical tests, time-series
analysis, and machine learning.
6. Python

 Python is a versatile, high-level programming language renowned for its readability and extensive libraries.
 It is widely used in data science, machine learning, and data analysis,
offering a powerful and flexible environment for various tasks.
 Python thrives in automating data manipulation tasks, performing complex statistical analyses, and creating visualizations. Libraries such as Pandas, NumPy, SciPy, and Matplotlib enhance Python's data analysis capabilities, while frameworks like TensorFlow, scikit-learn, and Keras enable sophisticated machine learning models.
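
As a small illustration of this workflow, the following sketch loads a hypothetical customers.csv with Pandas, prints summary statistics, and draws a histogram with Matplotlib (the file and its age column are assumptions):

```python
# Quick-analysis sketch with Pandas and Matplotlib. The file
# customers.csv and its "age" column are hypothetical.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("customers.csv")

# Summary statistics for the numeric columns
print(df.describe())

# Simple visualization: distribution of one numeric column
df["age"].hist(bins=20)
plt.xlabel("age")
plt.ylabel("count")
plt.show()
```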

3. Data modeling techniques


The main objective of data modeling is to provide a precise and well-organized framework for data organization and representation, since it enables efficient analysis and decision-making. By building models, analysts can discover trends, understand the connections between various data items, and make sure that data is stored efficiently and accurately.

What is a Data Model?


Data models are visual representations of an enterprise's data elements and the connections between them. Models help define and arrange data in the context of key business processes, thereby facilitating the creation of successful information systems.
Data Modeling Process
The practice of conceptually representing data items and their connections to one
another is known as data modeling.
1. Identifying Data Sources: The first stage is to identify and investigate the different sources of data, both inside and outside the company. Determining the sources of data is essential since it guarantees a thorough framework for data modeling.
2. Defining Entities and Attributes: This stage is all about identifying the entities (items or ideas) and the characteristics that go along with them. Entities constitute the subject matter of the data, whereas attributes specify the particular qualities of each entity. The definition of entities and attributes is the foundation of data modeling.
3. Mapping Relationships: Relationships show the connections or associations
between various things. Relationship mapping entails locating and
characterizing these linkages, indicating the nature and cardinality of every
relationship. In order to capture the interdependencies within the data, it is
essential to understand relationships. It improves the correctness of the model
by capturing the relationships between various data pieces that exist in the real
world.
4. Choosing a Model Type: The right data model type is selected based on the project needs and data properties. Choosing between conceptual, logical, or physical models, or going with a particular model like relational or object-oriented, may be part of this decision. The degree of abstraction and detail in the representation is determined by the model type that is selected.

5. Implementing and Maintaining: The process of implementation converts a physical or logical data model into a database schema. This entails establishing constraints, generating tables, and adding database-specific information. Updating the model to account for shifting technological or commercial needs is called maintenance.

Types of Data Modeling


These are the 5 different types of data models:
1. Hierarchical Model: The structure of the hierarchical model resembles a tree. There is only one root (parent) node, and the remaining child nodes are arranged in a certain sequence. However, the hierarchical approach is no longer widely applied. Real-world hierarchical relationships may be modeled using this approach.
2. Relational Model: The relational model represents data as rows and columns in tables and captures the links between those tables. It is frequently utilized in database design and is strongly related to relational database management systems (RDBMS).
3. Object-Oriented Data Model: In this model, data is represented as objects, similar to those used in object-oriented programming; creating objects with stored values is the object-oriented method. In addition to allowing data abstraction, inheritance, and encapsulation, the object-oriented architecture facilitates communication.
4. Network Model: The network model provides a versatile approach to representing objects and the relationships among them. One of its features is a schema, which is a graph representation of the data. An item is stored within a node, and a relationship between items is represented as an edge. This generalizes the model to maintain multiple parent and child records.
5. ER Model: The entity-relationship model (ER model) is a high-level relational model used to specify the data pieces and the relationships between the entities in a system. This conceptual design gives an easier-to-understand view of the data. An entity-relationship diagram, which is made up of entities, attributes, and relationships, is used in this model to depict the whole database.
A relationship between entities is called an association. Mapping cardinalities describe how many entities can take part in an association: one to one, one to many, many to one, and many to many. A small sketch of these relational ideas appears below.
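
The sketch below uses Python's built-in sqlite3 module to define two entities (customer and orders) whose one-to-many association is captured with a foreign key; all table and column names are illustrative:

```python
# Relational-model sketch with Python's built-in sqlite3 module:
# two entities linked by a one-to-many association via a foreign key.
# Table and column names are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
CREATE TABLE customer (
    id   INTEGER PRIMARY KEY,   -- primary key identifies each row
    name TEXT NOT NULL
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id),  -- foreign key
    amount      REAL
);
""")
conn.execute("INSERT INTO customer VALUES (1, 'Asha')")
conn.execute("INSERT INTO orders VALUES (1, 1, 99.50)")

# Join across the foreign key to recover the association
for row in conn.execute(
        "SELECT c.name, o.amount FROM customer c "
        "JOIN orders o ON o.customer_id = c.id"):
    print(row)   # ('Asha', 99.5)
```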

4. Application of modeling in business

Data modeling plays a crucial role in business data analytics. It enables organizations to transform raw data into actionable insights, fostering better decision-making and business growth.
Data modeling is applied in business data analytics as follows:

1. Organizing and understanding data


 Data modeling structures data in a logical and organized manner, making it easier to understand and manage.

 It provides a clear and standardized way of illustrating data structures, allowing analysts and stakeholders to effectively communicate about data requirements.

 Through this organization, analysts can identify gaps and inconsistencies in existing data structures and business rules.

2. Enhancing data quality and integrity


 Data modeling improves data quality by identifying and correcting errors and inconsistencies.

 It ensures data integrity and prevents anomalies through the enforcement of constraints and relationships.

 By minimizing redundancy, data modeling reduces unnecessary data duplication.

3. Optimizing for analytics and reporting
 Data modeling provides a structured framework for data analysis and reporting, aiding in the generation of insights.

 It facilitates efficient data retrieval, improving system performance and reducing query and report generation time.

 This optimization supports faster query execution and scalable analytics infrastructure, particularly with large datasets.

4. Supporting data-driven decision-making

 Data modeling enables organizations to make decisions based on empirical evidence.

 It helps optimize strategies for better results and competitiveness by analyzing past performance and predicting future outcomes.

 It empowers employees to make data-driven decisions by ensuring data is available and interpretable.

5. Improving customer experience and operational efficiency

 Data modeling and analytics provide insights into customer preferences and behaviors through the analysis of customer data and feedback.

 This enables personalized offerings and tailored marketing, enhancing customer experience.

 Analyzing operational data through data modeling allows businesses to streamline processes and allocate resources efficiently, increasing productivity and reducing costs.

6. Facilitating database design and application development

 Data modeling is a crucial step in designing efficient database structures, acting as a blueprint.

 It guides application development by defining data requirements upfront, reducing errors.
5. Need for business modeling
Business modeling is essential in data analytics for providing a structured approach
to understanding, analyzing, and improving business operations and decision-
making. It helps organizations define their goals, identify key performance
indicators (KPIs), and understand how data can be used to achieve those goals. By
creating business models, companies can ensure alignment between their data
strategies and overall business objectives.

Business modeling is crucial in data analytics for the following reasons:

1. Defining Business Objectives and Requirements:


 Business modeling helps clarify what the organization wants to achieve and
identifies the specific data needed to support those goals.
 It ensures that data analytics efforts are aligned with business needs and priorities,
preventing wasted resources on irrelevant data analysis.
2. Optimizing Business Processes:
 By modeling business processes, organizations can identify areas for improvement,
such as inefficiencies or bottlenecks.
 Data analytics can then be used to analyze these processes, identify root causes of
problems, and develop data-driven solutions.
3. Enhancing Decision-Making:
 Business models provide a framework for understanding the relationships between
different aspects of the business.
 This understanding allows for more informed and effective decision-making,
leading to better business outcomes.
4. Improving Data Quality and Accessibility:
 Data modeling techniques, often part of business modeling, ensure data
consistency, accuracy, and accessibility.
 This is crucial for reliable data analysis and reporting, preventing errors and
inconsistencies that can lead to poor decisions.
5. Facilitating Communication and Collaboration:
 Business models provide a common language and framework for communication
between business and technical teams.
 This facilitates collaboration and ensures that everyone is working towards the
same goals with a shared understanding of the business.
6. Supporting Predictive Analytics and Forecasting:
 Business models can be used to build predictive models that help organizations
anticipate future trends and make proactive decisions.
 This can include forecasting sales, predicting customer behavior, or identifying
potential risks.
7. Driving Innovation and Competitive Advantage:
 By understanding their business and leveraging data effectively, organizations can
identify new opportunities for innovation and gain a competitive edge.
 Data analytics, guided by business modeling, can help organizations develop new
products, services, or business models.

6. Database and types of data variables


A database is a structured collection of information, or data, stored electronically in
a computer system. It is designed for efficient storage, retrieval, and manipulation
of data, often managed using a Database Management System (DBMS).

1. Hierarchical Databases
Hierarchical databases organize data in a tree-like structure, where each
parent record can have multiple child records. This model works well for
scenarios where data follows a predefined hierarchical relationship, where
data is arranged in levels or ranks.
2. Network Databases
A network database builds on the hierarchical model but allows child records to be linked to multiple parent records, creating a web-like structure of interconnected data. This results in a more flexible structure, often referred to as a graph model, where entities can be connected in many different ways.

3. Object-Oriented Databases
Object-oriented databases are based on the principles of object-oriented
programming where data is stored as objects. These objects include
attributes (data) and methods (functions), making them easily referenced
and manipulated. These databases are designed to handle complex data
structures such as multimedia, graphics, and large files.
4. Relational Databases
Relational databases are the most widely used type of database today. They store data in tables, with rows representing records and columns representing attributes of the records. In these databases, pieces of information are related to one another: every row of data is identified by a primary key, and one table is linked to another using a foreign key.

5. Cloud Databases
A cloud database operates in a virtual environment hosted on cloud
computing platforms. It is designed for storing, managing, and executing
data over the internet, providing flexibility and scalability. Cloud databases
are widely used for applications requiring dynamic workloads, as they
eliminate the need for on-premises infrastructure.

6. Centralized Databases
A centralized database is a database stored and managed at a single
location, such as a central server or data center. It ensures higher security
and consistency as all data are maintained in one place, making it easier to
control and manage.
Users can access the database remotely to fetch or update information.
Centralized databases are commonly used in enterprise systems where data
consistency and security are critical. However, scalability and performance
limitations should be carefully considered.
7. Personal Databases
A personal database is a small-scale database designed for a single user,
typically used on personal computers or mobile devices. These databases
are ideal for managing individual data like contacts, budgets, notes, or
schedules. They are lightweight, easy to use, and require minimal database
administration, making them accessible for non-technical users.
8. Operational Databases
An operational database is designed to manage and process real-time data
for daily operations within organizations and businesses. It allows users to
create, update, and delete data efficiently, ensuring that the database
reflects current activities and transactions.
9. NoSQL Databases
A NoSQL database (short for "non-SQL" or "non-relational") provides a
mechanism for storing and retrieving data that does not rely on traditional
table-based relational models.
NoSQL databases are known for their simplicity of design, horizontal
scalability (adding more servers for scaling), and high availability. Unlike
relational databases, their data structures allow faster operations in certain
use cases. MongoDB, for instance, is a widely used document-based
NoSQL database.
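
As a minimal sketch of the document model, the following uses pymongo (it assumes a MongoDB server running locally on the default port; the shop database and products collection are hypothetical):

```python
# Document-store sketch with pymongo. Assumes a MongoDB server is
# running locally on the default port; names below are hypothetical.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db = client["shop"]

# Documents need no fixed schema: fields can vary per record
db.products.insert_one({"name": "lamp", "price": 25, "tags": ["home"]})

# Query by field value
print(db.products.find_one({"name": "lamp"}))
```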

Types of data

Data can be broadly classified into two main categories: qualitative and
quantitative, each further subdivided.

1. Qualitative (Categorical) data

This type describes qualities or characteristics and is typically non-numeric.

 Nominal Data: Labels variables without any order or numerical value, like hair
color or nationality.

 Ordinal Data: Presents a natural order or ranking but does not quantify the
differences between categories, such as customer satisfaction ratings.

2. Quantitative (Numerical) data

This data represents measurable quantities or numerical values that can be counted
or expressed using numbers.

 Discrete Data: Consists of countable, distinct values, usually whole numbers, like
the number of students in a class.

 Continuous Data: Can assume any value within a given range, such as height,
temperature, or time.

Variables: the building blocks of data


Understanding the types of data and variables is crucial for effective data analysis. Different data types require specific statistical methods for accurate interpretation.

A variable is any characteristic, number, or quantity that can be measured or counted.

Types of variables

 Independent Variable: The factor that is manipulated or changed in an experiment to observe its effect on the dependent variable.

 Dependent Variable: The outcome or effect that is measured in response to changes in the independent variable.

 Categorical Variables: Represent categories or groups, including nominal and ordinal variables.

 Continuous Variables: Quantitative variables capable of taking an infinite number of values within a range.

 Confounding Variables: Extraneous variables that can cause a false association between independent and dependent variables, potentially leading to incorrect conclusions.
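
A small Pandas sketch of these variable types, using made-up values: nominal, ordinal, discrete, and continuous columns side by side, with the ordinal column given an explicit category order:

```python
# Variable-type sketch in Pandas: nominal, ordinal, discrete, and
# continuous columns in one small made-up dataset.
import pandas as pd

df = pd.DataFrame({
    "hair_color":   ["brown", "black", "red"],    # nominal (no order)
    "satisfaction": ["low", "high", "medium"],    # ordinal (ranked)
    "children":     [0, 2, 1],                    # discrete (countable)
    "height_cm":    [162.5, 180.1, 175.0],        # continuous (any value)
})

# Encode the ordinal column with an explicit category order
df["satisfaction"] = pd.Categorical(
    df["satisfaction"], categories=["low", "medium", "high"], ordered=True)

print(df.dtypes)
```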

7. Missing imputations
Missing data is a common issue in data analysis and machine learning. It occurs
when data is not recorded for certain variables or participants, appearing as blank
cells, null values (like "NA" or "NaN"), or special symbols. Failing to address
missing data can negatively impact the accuracy and reliability of models and
analysis.

Imputation is a frequently used method to handle missing data by replacing absent values with substituted ones.

Imputation is important because it:

 Preserves data integrity: Replacing missing values helps retain valuable data
points without deleting rows or columns.

 Improves model accuracy: Addressing missing data helps reduce bias and
enhance model performance by training on more complete datasets.

 Reduces bias: Proper handling of missing data helps avoid biased results, especially when the missingness is not random.

 Enables use of machine learning algorithms: Many algorithms require complete datasets to function effectively.

Types of missing data


Identifying the type of missing data is essential for selecting the appropriate
imputation method.
 Missing Completely at Random (MCAR): Missingness is random and unrelated to other variables.

 Missing at Random (MAR): Missingness is related to other observed variables but not the missing values themselves.

 Missing Not at Random (MNAR): Missingness is related to the missing values themselves or to unobserved factors.

Imputation techniques
Common imputation techniques include:

 Mean/Median/Mode Imputation: Replacing missing values with the mean, median, or mode of the feature.

 Forward Fill and Backward Fill: Filling missing values with the last or next known value, often used for time-series data.

 K-Nearest Neighbors (KNN) Imputation: Replacing missing values based on similar data points.

 Regression Imputation: Using a regression model to predict missing values based on relationships in the data.

 Multiple Imputation: Creating multiple datasets with different imputed values to account for uncertainty.
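
As a minimal sketch of two of these techniques, the following applies mean imputation and KNN imputation with scikit-learn to a small made-up matrix containing missing values:

```python
# Imputation sketch with scikit-learn on a made-up matrix with NaNs.
import numpy as np
from sklearn.impute import SimpleImputer, KNNImputer

X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [7.0, np.nan],
              [4.0, 5.0]])

# Mean imputation: replace each NaN with its column mean
print(SimpleImputer(strategy="mean").fit_transform(X))

# KNN imputation: replace each NaN using the most similar rows
print(KNNImputer(n_neighbors=2).fit_transform(X))
```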

Choosing the right method


Selecting the best imputation method depends on factors such as:

 Type of Data: Different data types may require different approaches.

 Missing Data Mechanism: Understanding the missing data type helps minimize
bias.

 Proportion of Missing Data: The amount of missing data can influence the
complexity of the needed method.

 Impact on Analysis: Consider how the chosen method will affect your results or
model performance.
