Skills for Hire

Data Analytics
Week 1 – Data Analytics 101

What is Data Analytics?

Data analytics is the process of analyzing raw data in order to draw out meaningful, actionable insights, which are
then used to inform and drive smart business decisions.

Data Science vs Data Analytics

Data science is the process of building, cleaning, and structuring datasets to analyze and extract meaning.

Data analytics, on the other hand, refers to the process and practice of analyzing data to answer questions, extract
insights, and identify trends.

You can think of data science as a precursor to data analysis. If your dataset isn’t structured, cleaned, and wrangled,
how will you be able to draw accurate, insightful conclusions?

7 Data Analytics Skills You Need

● Critical Thinking: If you’re interested in using data to solve business problems, you need to be adept at
thinking critically about challenges and solutions. While data can provide many answers, it’s nothing
without a human’s discerning eye. “From the first steps of determining the quality of a data source to
determining the success of an algorithm, critical thinking is at the heart of every decision data
scientists—and those who work with them—make,” Tingley says in the Harvard Online course Data
Science Principles. “Data science is a discipline that’s built on a foundation of critical thinking.”

● Hypothesis Formation and Testing: At the heart of data and analytics is the desire to answer questions.
The proposed explanations for these leading questions are called hypotheses, which must be formed
before analysis takes place. An example of a hypothesis is, “I predict that a person’s likelihood of
recommending our product is directly proportional to their reported satisfaction with the product.” You
predict the data will show this trend and must prove or disprove the hypothesis through analysis. Without
a hypothesis, your analysis has no clear direction.
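
To make this concrete, here is a minimal sketch of testing such a hypothesis in Python, assuming the SciPy library is available; the survey scores are invented for illustration.

# Does reported satisfaction correlate with likelihood to recommend?
from scipy import stats

satisfaction = [3, 4, 2, 5, 4, 1, 5, 3, 4, 5]    # survey scores, 1-5
recommend = [6, 8, 4, 9, 7, 2, 10, 5, 8, 9]      # likelihood to recommend, 0-10

# Pearson's r measures linear association; the p-value indicates whether
# the observed correlation could plausibly be due to chance alone.
r, p_value = stats.pearsonr(satisfaction, recommend)
print(f"r = {r:.2f}, p = {p_value:.4f}")
# A small p-value (conventionally below 0.05) would support the hypothesis
# that satisfaction and likelihood to recommend move together.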

● Data Wrangling: Data wrangling is the process of cleaning raw data in preparation for analysis. It involves
identifying and resolving mistakes, filling in missing data, and organizing and transforming it into an easily
understandable format. This is an important skill for anyone dealing with data to acquire because it leads
to a more efficient and organized data analysis process. You can extract valuable insights from data more
quickly when it’s cleaned and in its optimal viewing format.

● Mathematical Ability: You don’t have to be a mathematician to become data literate, but strong math skills
become increasingly important as you deal with more complex analyses. A seasoned data professional
needs a solid understanding of statistics, probability, linear algebra, and multivariable calculus. Data
scientists often call on statistical methods to find structure in data and make predictions, and linear
algebra and calculus can make machine-learning algorithms easier to comprehend. If you’re not a data
scientist or analyst, your work may not require you to understand the more complex mathematical
concepts, but having a basic understanding of statistics can go a long way.

● Data Visualization: It’s crucial to know how to transform raw data into compelling visuals that tell a story.
Rather than simply presenting a list of values to your stakeholders, it’s more effective to visually
communicate data in a way that’s easily digestible. Some popular data visualization techniques that all
business professionals should know include pie charts, bar charts, and histograms. To create these
visualizations, use a data visualization tool, a form of software designed to present data. Each tool’s
capabilities vary but, at their most basic, allow you to input a dataset and visually manipulate it. Most, but
not all, come with built-in templates you can use to generate basic visualizations. Examples include
Microsoft Excel, Power BI, Google Charts, and Tableau.
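
As a small illustration in Python rather than one of those tools, the sketch below draws a basic bar chart with matplotlib; the figures are invented.

# A minimal visualization sketch: one bar chart instead of a list of values.
import matplotlib.pyplot as plt

months = ["Sep", "Oct", "Nov", "Dec"]
units_sold = [120, 340, 560, 610]

plt.bar(months, units_sold)
plt.title("Console sales by month")
plt.xlabel("Month")
plt.ylabel("Units sold")
plt.show()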

● Programming: Programming languages, like Python and R, are commonly used to solve complex statistical
problems with data. Proficiency in a database querying language, like SQL, can also help you more easily
extract and change data in a database. While programming skills are immensely valuable, they’re not
necessary for beginners dabbling in data. It’s more important to focus on effectively analyzing and
visualizing data to draw conclusions.

● Machine Learning: As artificial intelligence grows in popularity, machine learning is a highly valuable skill
for professionals working with big data.

Types of Data Analytics

● Descriptive analysis purely describes what has happened and presents it in a digestible snapshot.

Descriptive analytics answers the question, “What happened?”

For example, imagine you’re analyzing your company’s data and find there’s a seasonal surge in sales for
one of your products: a video game console. Here, descriptive analytics can tell you, “This video game
console experiences an increase in sales in October, November, and early December each year.”

Data visualization is a natural fit for communicating descriptive analysis because charts, graphs, and maps
can show trends in data—as well as dips and spikes—in a clear, easily understandable way.
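
A minimal descriptive-analytics sketch in pandas, using invented sales figures, might look like this:

# Descriptive analytics: a digestible snapshot of what happened, by month.
import pandas as pd

sales = pd.DataFrame({
    "month": ["Sep", "Oct", "Oct", "Nov", "Nov", "Dec"],
    "units": [120, 300, 340, 520, 560, 610],
})

# Totals and averages per month describe the seasonal surge.
summary = sales.groupby("month", sort=False)["units"].agg(["sum", "mean"])
print(summary)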

● Diagnostic analysis seeks to establish why those things may have happened.

Taking the analysis a step further, this type includes comparing coexisting trends or movements,
uncovering correlations between variables, and determining causal relationships where possible.

Continuing the example, you may dig into video game console users’ demographic data and find that
they’re between the ages of eight and 18. The customers, however, tend to be between the ages of 35 and
55.

Analysis of customer survey data reveals that one primary motivator for customers to purchase the video
game console is to gift it to their children. The spike in sales in the fall and early winter months may be
due to the holidays that include gift-giving.

Diagnostic analytics is useful for getting at the root of an organizational issue.
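
To make the diagnostic step concrete, here is a small pandas sketch that tabulates invented survey data to probe why the spike occurs:

# Diagnostic analytics: dig into *why* sales spiked by examining
# purchase motivations and buyer ages from (invented) survey data.
import pandas as pd

survey = pd.DataFrame({
    "buyer_age": [38, 45, 52, 16, 41, 36, 50],
    "motivation": ["gift", "gift", "gift", "own use",
                   "gift", "own use", "gift"],
})

# How many purchases fall into each motivation category?
print(survey["motivation"].value_counts())

# Are gift purchases concentrated among older buyers?
print(survey.groupby("motivation")["buyer_age"].mean())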

● Predictive analysis makes use of past patterns and trends in data to estimate the likelihood of a future
outcome or event.

By analyzing historical data in tandem with industry trends, you can make informed predictions about
what the future could hold for your company.

For instance, knowing that video game console sales have spiked in October, November, and early
December every year for the past decade provides you with ample data to predict that the same trend will
occur next year. Backed by upward trends in the video game industry as a whole, this is a reasonable
prediction to make.
Making predictions for the future can help your organization formulate strategies based on likely
scenarios.
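
A minimal predictive sketch, fitting a straight-line trend to invented historical figures with NumPy:

# Predictive analytics: extrapolate the holiday-season trend one year ahead.
import numpy as np

years = np.array([2018, 2019, 2020, 2021, 2022])
holiday_sales = np.array([400, 430, 470, 500, 540])  # units sold, Oct-Dec

# Least-squares line through the historical points.
slope, intercept = np.polyfit(years, holiday_sales, 1)
forecast = slope * 2023 + intercept
print(f"Predicted 2023 holiday sales: {forecast:.0f} units")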

● Prescriptive analysis is the conclusion of the other forms of analysis: now that we’ve found out what
happened, why it happened, and what may happen in the future, what should be done next?

Prescriptive analytics takes into account all possible factors in a scenario and suggests actionable
takeaways. This type of analytics can be especially useful when making data-driven decisions.

Rounding out the video game example: What should your team decide to do given the predicted trend in
seasonality due to winter gift-giving? Perhaps you decide to run an A/B test with two ads: one that caters
to product end-users (children) and one targeted to customers (their parents). The data from that test can
inform how to capitalize on the seasonal spike and its supposed cause even further. Or, maybe you decide
to increase marketing efforts in September with holiday-themed messaging to try to extend the spike into
another month.

While manual prescriptive analysis is doable and accessible, machine-learning algorithms are often
employed to help parse through large volumes of data to recommend the optimal next step. Algorithms
use “if” and “else” statements, which work as rules for parsing data. If a specific combination of
requirements is met, an algorithm recommends a specific course of action. While there’s far more to
machine-learning algorithms than just those statements, they—along with mathematical
equations—serve as a core component in algorithm training.
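
As a toy illustration of such rule-based prescription, the sketch below encodes invented thresholds as plain "if/else" rules in Python:

# Prescriptive analytics: simple rules that turn a forecast into a
# recommended action. The thresholds and actions are invented.
def recommend_action(predicted_lift: float, gift_share: float) -> str:
    """Recommend a next step from forecast lift and gift-purchase share."""
    if predicted_lift > 0.20 and gift_share > 0.50:
        return "Run parent-targeted holiday ads starting in September"
    elif predicted_lift > 0.20:
        return "Increase general marketing spend for Q4"
    else:
        return "Maintain the current marketing plan"

print(recommend_action(predicted_lift=0.35, gift_share=0.70))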

Data Analytics Process

The steps are as follows:

● The data analyst will first need to define their objective, otherwise known as a ‘problem statement.’
● Once the analyst has established their objective for the analysis, they’ll need to design a strategy for
collecting the appropriate data. Firstly, they’ll need to determine what kind of data they’ll need:
quantitative (numeric) data such as sales figures, or qualitative (descriptive) data, which may include
customer surveys.
● The data analyst will need to clean the data to make sure it’s of high quality. This cleaning—or
“scrubbing”—process involves:
o Removing unwanted data points
o Removing major errors, duplicates, and outliers
o Filling in any missing data
o Bringing structure to the data
● The data analyst will apply the methodologies associated with the analysis type that will best “solve” their
problem statement.
● The data analyst must now present their findings in a way that’s clear and easily understood by key
stakeholders. In order to do this, an analyst may use visualization software—such as Tableau or Microsoft
Power BI—that will generate reports, dashboards, or interactive visualizations.

Basic Data Analytics Tools

Excel

● Excel is a spreadsheet and a simple yet powerful tool for data collection and analysis.
● Excel is not free; it is a part of the Microsoft Office “suite” of programs.
● Excel does not need a separate UI to enter data; you can start entering it right away.
● It is readily available, widely used, and easy to learn, making it a quick way to get started with data analysis.
● The Analysis ToolPak in Excel offers a variety of options to perform statistical analysis of your data.
● The charts and graphs in Excel give a clear interpretation and visualization of your data, which helps in
decision-making as they are easy to understand.
● Demo – Basic Excel Functions (Live Demo)
o Arithmetic Operations (+, -, *,/)
o Sorting and Filtering
o VLOOKUP and IF functions
o Visualization - Charts.
o Pivot Table - A Pivot Table is a summary of a large dataset that usually includes total figures,
averages, minimums, maximums, etc. For example, if you have sales data for different regions, a
pivot table lets you summarize the data by region and find the average, maximum, and minimum
sales per region. Pivot tables allow us to analyze, summarize, and show only relevant data in our
reports.
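
For readers heading toward Python later in the course, the same pivot-table idea can be expressed in pandas; the regions and figures below are invented:

# A pivot table in pandas: summarize sales by region.
import pandas as pd

sales = pd.DataFrame({
    "region": ["East", "East", "West", "West", "North"],
    "amount": [250, 400, 150, 300, 500],
})

pivot = pd.pivot_table(sales, index="region", values="amount",
                       aggfunc=["mean", "min", "max"])
print(pivot)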

Tableau

● Tableau is a BI (Business Intelligence) tool developed for data analysts where one can visualize, analyze,
and understand data.
● Tableau is not free software, and the pricing varies as per different data needs.
● Tableau provides fast analytics; it can explore any type of data – spreadsheets, databases, data on Hadoop,
and cloud services.
● It is easy to use, with powerful drag-and-drop features that anyone can pick up intuitively.
● The data visualization with smart dashboards can be shared within seconds.

Python
● Python was initially designed as an object-oriented programming language for software and web
development and was later enhanced for data science. Python is one of the fastest-growing programming
languages today.
● It is a powerful Data Analysis tool and has a great set of friendly libraries for any aspect of scientific
computing.
● Python is free, open-source software, and it is easy to learn.
● Python’s data analysis library Pandas was built over NumPy, which is one of the earliest libraries in Python
for data science.
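
A minimal sketch of that relationship: a pandas DataFrame wraps a NumPy array and layers labels and convenience methods on top.

# pandas is built over NumPy: a DataFrame wraps NumPy data with labels.
import numpy as np
import pandas as pd

arr = np.array([[1.0, 2.0], [3.0, 4.0]])     # raw NumPy array
df = pd.DataFrame(arr, columns=["a", "b"])   # labeled pandas DataFrame

print(df.to_numpy())   # the underlying NumPy data
print(df.describe())   # summary statistics pandas adds on top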

*Python will be covered in detail in the coming weeks

Databases

● Databases provide a tremendous amount of capacity and flexibility in working with our data, far beyond
that of a spreadsheet. When we need to store large quantities of data, when the data has complex
relationships, or when we simply need to interact with it in more sophisticated or advanced ways,
databases provide all the capabilities we need, whereas a spreadsheet is limited to what you can do by
hand in Excel.

Two major types of databases

o Relational: In a relational database, every piece of information has a relationship with every other
piece of information. This is on account of every data value in the database having a unique
identity in the form of a record.
▪ Note that all data is tabulated in this model. Therefore, every row of data in the
database is linked with another row using a primary key. Similarly, every table is linked
with another table using a foreign key.
▪ Primary and foreign keys are what link two tables together; a minimal sketch
follows after this list.

o NoSQL: A NoSQL database is a non-relational data management system that does not require a
fixed schema. It avoids joins and is easy to scale. A traditional RDBMS uses SQL syntax to store and
retrieve data for further insights. A NoSQL database system, instead, encompasses a wide range of
database technologies that can store structured, semi-structured, unstructured, and polymorphic
data.
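
A minimal sketch of primary and foreign keys, using Python's built-in sqlite3 module; the table and column names are invented for illustration:

# Two tables linked by keys: customers (primary key) and orders (foreign key).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,   -- unique identity for each row
        name        TEXT
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),  -- foreign key
        amount      REAL
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 99.0), (11, 2, 45.0);
""")

# The foreign key lets us join the two tables back together.
for row in conn.execute("""
        SELECT c.name, o.amount
        FROM orders o JOIN customers c ON o.customer_id = c.customer_id"""):
    print(row)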

*We will cover SQL in detail in the upcoming weeks

Data Modelling

Data modelling (also spelled data modeling) is the process of creating a data model for the data to be stored in a
database. This data model is a conceptual representation of data objects, the associations between different data
objects, and the rules. Data modelling helps in the visual representation of data and enforces business rules,
regulatory compliance, and government policies on the data.

Types

● Conceptual Data Model: This Data Model defines WHAT the system contains. This model is typically
created by Business stakeholders and Data Architects. The purpose is to organize, scope, and define
business concepts and rules.
● Logical Data Model: Defines HOW the system should be implemented regardless of the DBMS. This model
is typically created by Data Architects and Business Analysts. The purpose is to develop a technical map of
rules and data structures.
● Physical Data Model: This Data Model describes HOW the system will be implemented using a specific
DBMS system. This model is typically created by DBAs and developers. The purpose is the actual
implementation of the database.

Key Terms

● Relationship: Defines how tables relate to each other so that you can do your transactional processing,
your analytics, or whatever else you are using your database for. The different types of relationships are
one-to-one, one-to-many (and conversely many-to-one), and many-to-many.

● Normalization is used to minimize redundancy in a relation or set of relations. It is also used to
eliminate undesirable characteristics like insertion, update, and deletion anomalies.
o Insertion Anomaly: An insertion anomaly refers to when one cannot insert a new tuple into
a relation due to a lack of data.
o Deletion Anomaly: The delete anomaly refers to the situation where the deletion of data
results in the unintended loss of some other important data.
o Update Anomaly: An update anomaly occurs when an update of a single data value
requires multiple rows of data to be updated.
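
As a rough illustration of normalization, the pandas sketch below splits an invented, redundant table into two relations so that each customer fact is stored only once:

# Normalization sketch: remove redundancy by splitting one table into two.
import pandas as pd

# Denormalized: the customer's city is repeated on every order row, so
# changing Ada's city would require touching multiple rows (update anomaly).
orders_flat = pd.DataFrame({
    "order_id": [10, 11, 12],
    "customer": ["Ada", "Ada", "Grace"],
    "city": ["Paris", "Paris", "London"],
    "amount": [99.0, 12.5, 45.0],
})

# Normalized: customer facts live in one place...
customers = orders_flat[["customer", "city"]].drop_duplicates()
# ...and orders keep only a reference to the customer.
orders = orders_flat[["order_id", "customer", "amount"]]
print(customers)
print(orders)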

Data Warehouse

Data Warehousing (DW) is the process of collecting and managing data from varied sources to provide meaningful
business insights. It is the electronic storage of a large amount of information by a business, designed for query
and analysis instead of transaction processing. It is a process of transforming data into information and making it
available to users in a timely manner so that it can inform decisions.

Data may be structured, semi-structured, or unstructured.

The data is processed, transformed, and ingested so that users can access the processed data in the Data
Warehouse through Business Intelligence tools, SQL clients, and spreadsheets. A data warehouse merges
information coming from different sources into one comprehensive database.

A data mart is a subset of a data warehouse focused on a particular line of business, department, or subject area.
Data marts make specific data available to a defined group of users, which allows those users to quickly access
critical insights without wasting time searching through an entire data warehouse.

Star Schema – the foundation of the data warehouse

The star schema is the simplest and most fundamental of the data mart schemas. It is widely used to develop or
build a data warehouse and dimensional data marts. It includes one or more fact tables indexing any number of
dimension tables.

It is said to be a star because its physical model resembles a star shape, with a fact table at its center and the
dimension tables at its periphery representing the star’s points. A minimal sketch below demonstrates the star
schema:
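
Here is a minimal star-schema sketch in pandas, with one invented fact table joined out to two invented dimension tables:

# Star schema: a fact table at the center, dimension tables at the points.
import pandas as pd

dim_product = pd.DataFrame({"product_id": [1, 2],
                            "product": ["Console", "Game"]})
dim_date = pd.DataFrame({"date_id": [1, 2],
                         "month": ["Nov", "Dec"]})
fact_sales = pd.DataFrame({"product_id": [1, 1, 2],
                           "date_id": [1, 2, 2],
                           "units": [560, 610, 200]})

# Analysis joins the central fact table out to its dimensions.
report = (fact_sales
          .merge(dim_product, on="product_id")
          .merge(dim_date, on="date_id"))
print(report[["month", "product", "units"]])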

Big Data

Big Data is a collection of data that is huge in volume, yet growing exponentially with time. It is data of so large a
size and complexity that none of the traditional data management tools can store or process it efficiently. For
example, more than 500 terabytes of new data are ingested into the databases of the social media site Facebook
every day, generated mainly by photo and video uploads, message exchanges, comments, and the like.

Big data can be described by the following characteristics:

o Volume refers to the size of the data.
o Variety refers to heterogeneous sources and the nature of data, both structured and unstructured.
o Velocity refers to the speed at which data is generated. How fast the data is generated and processed to
meet demand determines its real potential.
o Variability refers to the inconsistency the data can show at times, which hampers the process of
handling and managing the data effectively.

Excel Lab

1. Sort, Search and Filter Functions
2. Arithmetic Operations
3. VLookup
4. IF
