Skills for Hire

Data Analytics
Week 1 – Data Analytics 101

What is Data Analytics?

Data analytics is the process of analyzing raw data in order to draw out meaningful, actionable insights, which are
then used to inform and drive smart business decisions.

Data Science vs Data Analytics

Data science is the process of building, cleaning, and structuring datasets to analyze and extract meaning.

Data analytics, on the other hand, refers to the process and practice of analyzing data to answer questions, extract
insights, and identify trends.

You can think of data science as a precursor to data analysis. If your dataset isn’t structured, cleaned, and wrangled,
how will you be able to draw accurate, insightful conclusions?

7 Data Analytics Skills You Need

● Critical Thinking: If you’re interested in using data to solve business problems, you need to be adept at
thinking critically about challenges and solutions. While data can provide many answers, it’s nothing
without a human’s discerning eye. “From the first steps of determining the quality of a data source to
determining the success of an algorithm, critical thinking is at the heart of every decision data
scientists—and those who work with them—make,” Tingley says in the Harvard Online course Data
Science Principles. “Data science is a discipline that’s built on a foundation of critical thinking.”

● Hypothesis Formation and Testing: At the heart of data and analytics is the desire to answer questions.
The proposed explanations for these leading questions are called hypotheses, which must be formed
before analysis takes place. An example of a hypothesis is, “I predict that a person’s likelihood of
recommending our product is directly proportional to their reported satisfaction with the product.” You
predict the data will show this trend and must prove or disprove the hypothesis through analysis. Without
a hypothesis, your analysis has no clear direction.
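
To make this concrete, here is a minimal sketch of testing such a hypothesis in Python, assuming the SciPy library is available; the survey scores are invented for illustration.

# Does reported satisfaction correlate with likelihood to recommend?
from scipy import stats

satisfaction = [3, 4, 2, 5, 4, 1, 5, 3, 4, 5]    # survey scores, 1-5
recommend = [6, 8, 4, 9, 7, 2, 10, 5, 8, 9]      # likelihood to recommend, 0-10

# Pearson's r measures linear association; the p-value indicates whether
# the observed correlation could plausibly be due to chance alone.
r, p_value = stats.pearsonr(satisfaction, recommend)
print(f"r = {r:.2f}, p = {p_value:.4f}")
# A small p-value (conventionally below 0.05) would support the hypothesis
# that satisfaction and likelihood to recommend move together.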

● Data Wrangling: Data wrangling is the process of cleaning raw data in preparation for analysis. It involves
identifying and resolving mistakes, filling in missing data, and organizing and transforming it into an easily
understandable format. This is an important skill for anyone dealing with data to acquire because it leads
to a more efficient and organized data analysis process. You can extract valuable insights from data more
quickly when it’s cleaned and in its optimal viewing format.

● Mathematical Ability: You don’t have to be a mathematician to become data literate, but strong math skills
become increasingly important as you deal with more complex analyses. A seasoned data professional
needs a solid understanding of statistics, probability, linear algebra, and multivariable calculus. Data
scientists often call on statistical methods to find structure in data and make predictions, and linear
algebra and calculus can make machine-learning algorithms easier to comprehend. If you’re not a data
scientist or analyst, your work may not require you to understand the more complex mathematical
concepts, but having a basic understanding of statistics can go a long way.

● Data Visualization: It’s crucial to know how to transform raw data into compelling visuals that tell a story.
Rather than simply presenting a list of values to your stakeholders, it’s more effective to visually
communicate data in a way that’s easily digestible. Some popular data visualization techniques that all
business professionals should know include pie charts, bar charts, and histograms. To create these
visualizations, use a data visualization tool, a form of software designed to present data. Each tool’s
capabilities vary but, at their most basic, allow you to input a dataset and visually manipulate it. Most, but
not all, come with built-in templates you can use to generate basic visualizations. Examples include
Microsoft Excel, Power BI, Google Charts, and Tableau.
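
As a small illustration in Python rather than one of those tools, the sketch below draws a basic bar chart with matplotlib; the figures are invented.

# A minimal visualization sketch: one bar chart instead of a list of values.
import matplotlib.pyplot as plt

months = ["Sep", "Oct", "Nov", "Dec"]
units_sold = [120, 340, 560, 610]

plt.bar(months, units_sold)
plt.title("Console sales by month")
plt.xlabel("Month")
plt.ylabel("Units sold")
plt.show()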

● Programming: Programming languages, like Python and R, are commonly used to solve complex statistical
problems with data. Proficiency in a database querying language, like SQL, can also help you more easily
extract and change data in a database. While programming skills are immensely valuable, they’re not
necessary for beginners dabbling in data. It’s more important to focus on effectively analyzing and
visualizing data to draw conclusions.

● Machine Learning: As artificial intelligence grows in popularity, machine learning is a highly valuable skill
for professionals working with big data.

Types of Data Analytics

● Descriptive analysis purely describes what has happened and presents it in a digestible snapshot.

Descriptive analytics answers the question, “What happened?”

For example, imagine you’re analyzing your company’s data and find there’s a seasonal surge in sales for
one of your products: a video game console. Here, descriptive analytics can tell you, “This video game
console experiences an increase in sales in October, November, and early December each year.”

Data visualization is a natural fit for communicating descriptive analysis because charts, graphs, and maps
can show trends in data—as well as dips and spikes—in a clear, easily understandable way.
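
A minimal descriptive-analytics sketch in pandas, using invented sales figures, might look like this:

# Descriptive analytics: a digestible snapshot of what happened, by month.
import pandas as pd

sales = pd.DataFrame({
    "month": ["Sep", "Oct", "Oct", "Nov", "Nov", "Dec"],
    "units": [120, 300, 340, 520, 560, 610],
})

# Totals and averages per month describe the seasonal surge.
summary = sales.groupby("month", sort=False)["units"].agg(["sum", "mean"])
print(summary)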

● Diagnostic analysis seeks to establish why those things may have happened.

Taking the analysis a step further, this type includes comparing coexisting trends or movements,
uncovering correlations between variables, and determining causal relationships where possible.

Continuing the example, you may dig into video game console users’ demographic data and find that
they’re between the ages of eight and 18. The customers, however, tend to be between the ages of 35 and
55.

Analysis of customer survey data reveals that one primary motivator for customers to purchase the video
game console is to gift it to their children. The spike in sales in the fall and early winter months may be
due to the holidays that include gift-giving.

Diagnostic analytics is useful for getting at the root of an organizational issue.
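
To make the diagnostic step concrete, here is a small pandas sketch that tabulates invented survey data to probe why the spike occurs:

# Diagnostic analytics: dig into *why* sales spiked by examining
# purchase motivations and buyer ages from (invented) survey data.
import pandas as pd

survey = pd.DataFrame({
    "buyer_age": [38, 45, 52, 16, 41, 36, 50],
    "motivation": ["gift", "gift", "gift", "own use",
                   "gift", "own use", "gift"],
})

# How many purchases fall into each motivation category?
print(survey["motivation"].value_counts())

# Are gift purchases concentrated among older buyers?
print(survey.groupby("motivation")["buyer_age"].mean())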

● Predictive analysis makes use of past patterns and trends in data to estimate the likelihood of a future
outcome or event.

By analyzing historical data in tandem with industry trends, you can make informed predictions about
what the future could hold for your company.

For instance, knowing that video game console sales have spiked in October, November, and early
December every year for the past decade provides you with ample data to predict that the same trend will
occur next year. Backed by upward trends in the video game industry as a whole, this is a reasonable
prediction to make.
Making predictions for the future can help your organization formulate strategies based on likely
scenarios.
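
A minimal predictive sketch, fitting a straight-line trend to invented historical figures with NumPy:

# Predictive analytics: extrapolate the holiday-season trend one year ahead.
import numpy as np

years = np.array([2018, 2019, 2020, 2021, 2022])
holiday_sales = np.array([400, 430, 470, 500, 540])  # units sold, Oct-Dec

# Least-squares line through the historical points.
slope, intercept = np.polyfit(years, holiday_sales, 1)
forecast = slope * 2023 + intercept
print(f"Predicted 2023 holiday sales: {forecast:.0f} units")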

● Prescriptive analysis is the conclusion of the other forms of analysis: now that we’ve found out what
happened, why it happened, and what may happen in the future, what should be done next?

Prescriptive analytics takes into account all possible factors in a scenario and suggests actionable
takeaways. This type of analytics can be especially useful when making data-driven decisions.

Rounding out the video game example: What should your team decide to do given the predicted trend in
seasonality due to winter gift-giving? Perhaps you decide to run an A/B test with two ads: one that caters
to product end-users (children) and one targeted to customers (their parents). The data from that test can
inform how to capitalize on the seasonal spike and its supposed cause even further. Or, maybe you decide
to increase marketing efforts in September with holiday-themed messaging to try to extend the spike into
another month.

While manual prescriptive analysis is doable and accessible, machine-learning algorithms are often
employed to help parse through large volumes of data to recommend the optimal next step. Algorithms
use “if” and “else” statements, which work as rules for parsing data. If a specific combination of
requirements is met, an algorithm recommends a specific course of action. While there’s far more to
machine-learning algorithms than just those statements, they—along with mathematical
equations—serve as a core component in algorithm training.
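
As a toy illustration of such rule-based prescription, the sketch below encodes invented thresholds as plain "if/else" rules in Python:

# Prescriptive analytics: simple rules that turn a forecast into a
# recommended action. The thresholds and actions are invented.
def recommend_action(predicted_lift: float, gift_share: float) -> str:
    """Recommend a next step from forecast lift and gift-purchase share."""
    if predicted_lift > 0.20 and gift_share > 0.50:
        return "Run parent-targeted holiday ads starting in September"
    elif predicted_lift > 0.20:
        return "Increase general marketing spend for Q4"
    else:
        return "Maintain the current marketing plan"

print(recommend_action(predicted_lift=0.35, gift_share=0.70))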

Data Analytics Process

The steps are as follows:

● The data analyst will first need to define their objective, otherwise known as a ‘problem statement.’
● Once the analyst has established their objective for the analysis, they’ll need to design a strategy for
collecting the appropriate data. Firstly, they’ll need to determine what kind of data they’ll need:
quantitative (numeric) data such as sales figures, or qualitative (descriptive) data, which may include
customer surveys.
● The data analyst will need to clean the data to make sure it’s of high quality. This cleaning—or
“scrubbing”—process involves:
o Removing unwanted data points
o Removing major errors, duplicates, and outliers
o Filling in any missing data
o Bringing structure to the data
● The data analyst will apply the methodologies associated with the analysis type that will best “solve” their
problem statement.
● The data analyst must now present their findings in a way that’s clear and easily understood by key
stakeholders. In order to do this, an analyst may use visualization software—such as Tableau or Microsoft
Power BI—that will generate reports, dashboards, or interactive visualizations.

Basic Data Analytics Tools

Excel

● Excel is a spreadsheet and a simple yet powerful tool for data collection and analysis.
● Excel is not free; it is a part of the Microsoft Office “suite” of programs.
● Excel does not need a separate UI to enter data; you can start entering it right away.
● It is readily available, widely used, and easy to learn, making it a quick way to get started with data analysis.
● The Analysis ToolPak in Excel offers a variety of options to perform statistical analysis of your data.
● The charts and graphs in Excel give a clear interpretation and visualization of your data, which helps in
decision-making as they are easy to understand.
● Demo – Basic Excel Functions (Live Demo)
o Arithmetic Operations (+, -, *,/)
o Sorting and Filtering
o VLOOKUP and IF functions
o Visualization - Charts.
o Pivot Table - A Pivot Table is a summary of a large dataset that usually includes total figures,
averages, minimums, maximums, etc. For example, if you have sales data for different regions, a
pivot table lets you summarize the data by region and find the average, maximum, and minimum
sales per region. Pivot tables allow us to analyze, summarize, and show only relevant data in our
reports.
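
For readers heading toward Python later in the course, the same pivot-table idea can be expressed in pandas; the regions and figures below are invented:

# A pivot table in pandas: summarize sales by region.
import pandas as pd

sales = pd.DataFrame({
    "region": ["East", "East", "West", "West", "North"],
    "amount": [250, 400, 150, 300, 500],
})

pivot = pd.pivot_table(sales, index="region", values="amount",
                       aggfunc=["mean", "min", "max"])
print(pivot)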

Tableau

● Tableau is a BI (Business Intelligence) tool developed for data analysts where one can visualize, analyze,
and understand data.
● Tableau is not free software, and the pricing varies as per different data needs.
● Tableau provides fast analytics; it can explore any type of data – spreadsheets, databases, data on Hadoop,
and cloud services.
● It is easy to use, with powerful drag-and-drop features that anyone can pick up intuitively.
● The data visualization with smart dashboards can be shared within seconds.

Python
● Python was initially designed as an object-oriented programming language for software and web
development and was later enhanced for data science. Python is one of the fastest-growing programming
languages today.
● It is a powerful Data Analysis tool and has a great set of friendly libraries for any aspect of scientific
computing.
● Python is free, open-source software, and it is easy to learn.
● Python’s data analysis library Pandas was built over NumPy, which is one of the earliest libraries in Python
for data science.
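
A minimal sketch of that relationship: a pandas DataFrame wraps a NumPy array and layers labels and convenience methods on top.

# pandas is built over NumPy: a DataFrame wraps NumPy data with labels.
import numpy as np
import pandas as pd

arr = np.array([[1.0, 2.0], [3.0, 4.0]])     # raw NumPy array
df = pd.DataFrame(arr, columns=["a", "b"])   # labeled pandas DataFrame

print(df.to_numpy())   # the underlying NumPy data
print(df.describe())   # summary statistics pandas adds on top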

*Python will be covered in detail in the coming weeks

Databases

● Databases provide a tremendous amount of capacity and flexibility in working with our data, far beyond
that of a spreadsheet. When we need to store large quantities of data, when the data has complex
relationships, or when we simply need to interact with it in more sophisticated or advanced ways,
databases provide all the capabilities we need, whereas a spreadsheet is limited to what you can do by
hand in Excel.

Two major types of databases

o Relational: In a relational database, every piece of information has a relationship with every other
piece of information. This is on account of every data value in the database having a unique
identity in the form of a record.
▪ Note that all data is tabulated in this model. Therefore, every row of data in the
database is linked with another row using a primary key. Similarly, every table is linked
with another table using a foreign key.
▪ Primary and foreign keys are what link two tables together; a minimal sketch
follows after this list.

o NoSQL: A NoSQL database is a non-relational data management system that does not require a
fixed schema. It avoids joins and is easy to scale. A traditional RDBMS uses SQL syntax to store and
retrieve data for further insights. A NoSQL database system, instead, encompasses a wide range of
database technologies that can store structured, semi-structured, unstructured, and polymorphic
data.
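
A minimal sketch of primary and foreign keys, using Python's built-in sqlite3 module; the table and column names are invented for illustration:

# Two tables linked by keys: customers (primary key) and orders (foreign key).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,   -- unique identity for each row
        name        TEXT
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),  -- foreign key
        amount      REAL
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 99.0), (11, 2, 45.0);
""")

# The foreign key lets us join the two tables back together.
for row in conn.execute("""
        SELECT c.name, o.amount
        FROM orders o JOIN customers c ON o.customer_id = c.customer_id"""):
    print(row)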

*We will cover SQL in detail in the upcoming weeks

Data Modelling

Data modelling (also spelled data modeling) is the process of creating a data model for the data to be stored in a
database. This data model is a conceptual representation of data objects, the associations between different data
objects, and the rules. Data modelling helps in the visual representation of data and enforces business rules,
regulatory compliance, and government policies on the data.

Types

● Conceptual Data Model: This Data Model defines WHAT the system contains. This model is typically
created by Business stakeholders and Data Architects. The purpose is to organize, scope, and define
business concepts and rules.
● Logical Data Model: Defines HOW the system should be implemented regardless of the DBMS. This model
is typically created by Data Architects and Business Analysts. The purpose is to develop a technical map of
rules and data structures.
● Physical Data Model: This Data Model describes HOW the system will be implemented using a specific
DBMS system. This model is typically created by DBAs and developers. The purpose is the actual
implementation of the database.

Key Terms

● Relationship: Defines how tables relate to each other so that you can do your transactional processing,
your analytics, or whatever else you are using your database for. The different types of relationships are
one-to-one, one-to-many (and conversely many-to-one), and many-to-many.

● Normalization is used to minimize redundancy in a relation or set of relations. It is also used to
eliminate undesirable characteristics like insertion, update, and deletion anomalies.
o Insertion Anomaly: An insertion anomaly refers to when one cannot insert a new tuple into
a relation due to a lack of data.
o Deletion Anomaly: The delete anomaly refers to the situation where the deletion of data
results in the unintended loss of some other important data.
o Update Anomaly: An update anomaly occurs when an update of a single data value
requires multiple rows of data to be updated.
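
As a rough illustration of normalization, the pandas sketch below splits an invented, redundant table into two relations so that each customer fact is stored only once:

# Normalization sketch: remove redundancy by splitting one table into two.
import pandas as pd

# Denormalized: the customer's city is repeated on every order row, so
# changing Ada's city would require touching multiple rows (update anomaly).
orders_flat = pd.DataFrame({
    "order_id": [10, 11, 12],
    "customer": ["Ada", "Ada", "Grace"],
    "city": ["Paris", "Paris", "London"],
    "amount": [99.0, 12.5, 45.0],
})

# Normalized: customer facts live in one place...
customers = orders_flat[["customer", "city"]].drop_duplicates()
# ...and orders keep only a reference to the customer.
orders = orders_flat[["order_id", "customer", "amount"]]
print(customers)
print(orders)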

Data Warehouse

Data Warehousing (DW) is the process of collecting and managing data from varied sources to provide meaningful
business insights. It is the electronic storage of a large amount of information by a business, designed for query
and analysis instead of transaction processing. It is a process of transforming data into information and making it
available to users in a timely manner so that it can inform decisions.

Data may be structured, semi-structured, or unstructured.

The data is processed, transformed, and ingested so that users can access the processed data in the Data
Warehouse through Business Intelligence tools, SQL clients, and spreadsheets. A data warehouse merges
information coming from different sources into one comprehensive database.

A data mart is a subset of a data warehouse focused on a particular line of business, department, or subject area.
Data marts make specific data available to a defined group of users, which allows those users to quickly access
critical insights without wasting time searching through an entire data warehouse.

Star Schema – the foundation of the data warehouse

The star schema is the simplest and most fundamental of the data mart schemas. It is widely used to develop or
build a data warehouse and dimensional data marts. It includes one or more fact tables indexing any number of
dimension tables.

It is said to be a star because its physical model resembles a star shape, with a fact table at its center and the
dimension tables at its periphery representing the star’s points. A minimal sketch below demonstrates the star
schema:
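
Here is a minimal star-schema sketch in pandas, with one invented fact table joined out to two invented dimension tables:

# Star schema: a fact table at the center, dimension tables at the points.
import pandas as pd

dim_product = pd.DataFrame({"product_id": [1, 2],
                            "product": ["Console", "Game"]})
dim_date = pd.DataFrame({"date_id": [1, 2],
                         "month": ["Nov", "Dec"]})
fact_sales = pd.DataFrame({"product_id": [1, 1, 2],
                           "date_id": [1, 2, 2],
                           "units": [560, 610, 200]})

# Analysis joins the central fact table out to its dimensions.
report = (fact_sales
          .merge(dim_product, on="product_id")
          .merge(dim_date, on="date_id"))
print(report[["month", "product", "units"]])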

Big Data

Big Data is a collection of data that is huge in volume, yet growing exponentially with time. It is data of so large a
size and complexity that none of the traditional data management tools can store or process it efficiently. For
example, more than 500 terabytes of new data are ingested into the databases of the social media site Facebook
every day, generated mainly by photo and video uploads, message exchanges, comments, and the like.

Big data can be described by the following characteristics:

o Volume refers to the size of the data.
o Variety refers to heterogeneous sources and the nature of data, both structured and unstructured.
o Velocity refers to the speed at which data is generated. How fast the data is generated and processed to
meet demand determines its real potential.
o Variability refers to the inconsistency the data can show at times, which hampers the process of
handling and managing the data effectively.

Excel Lab

1. Sort, Search and Filter Functions
2. Arithmetic Operations
3. VLookup
4. IF
