Business Analytics Material
DATA VISUALISATION
Business Analytics - Definition, Journey of Information System. Types of analytics - Descriptive,
Diagnostics, Discovery, Predictive and Prescriptive analytics. Data visualisation - Overview of Data
Visualization, The Shapes of Data, Common Visualization Idioms, Visualization of Spatial Data, Data
Storytelling, Visualization of Non-Numerical Data, Using Colour and Size in Visualization, Visualization of
Numerical Data.
Unit II MARKETING ANALYTICS 9
Market Segmentation: Cluster Analysis and Collaborative Filtering. What do Consumers want? – Conjoint
Analysis. Pricing Analytics: Estimating Demand Curves and Optimize Price, Price Bundling, Non-linear
Pricing, Price Skimming and Sales, Revenue Management. Promotion Analytics. Retail Analytics: Market
Basket Analysis and Lift, RFM analysis.
Unit III FINANCIAL ANALYTICS 9
Profitability Analysis for firm’s business performance & profit earning ability. Working Capital Analysis for
firms’ operational efficiency. Activity Analysis for evaluation of a company’s production process, human
resource requirements, time taken, raw materials consumed, and value creation. Financial Structure Analysis
for the interpretation of the business capital structure to balance the firm’s debt and equity proportion.
Unit IV OPERATIONS ANALYTICS 9
Forecasting as Integral part of Business Planning – Qualitative and Quantitative Forecasting Techniques.
Decision Tree analysis for facility planning. Evaluating location alternatives – Cost profit and volume
analysis, Center of gravity method. Line Balancing Heuristics – Incremental utilization heuristics, longest
time heuristics. Project Planning and Control techniques. Determining Order quantities.
Unit V HUMAN RESOURCE ANALYTICS 9
Workforce Segmenting – based on employee facts, visualizing headcount by segment, analyzing metrics by
segment, understanding segment hierarchies, creating calculated segments. Estimating Employee Lifetime
Value (ELV) – importance, calculating ELV and making better decisions with ELV. Creating Employee
journey Map. Talent acquisition analytics. Measuring ABCs of a Productive Worker, Analyzing Employee
commitment and Attrition.
1.1. Definition
Business analytics, or simply analytics, is the use of data, information technology, statistical
analysis, quantitative methods, and mathematical or computer-based models to help managers gain
improved insight about their business operations and make better, fact-based decisions.
Business analytics is “a process of transforming data into actions through analysis and insights in the context
of organizational decision making and problem solving.”
Modern business analytics can be viewed as an integration of BI/IS, statistics, and modeling and
optimization as illustrated in Figure. While the core topics are traditional and have been used for decades,
the uniqueness lies in their intersections.
Data mining is focused on better understanding characteristics and patterns among variables
in large databases using a variety of statistical and analytical tools. It draws on data pre-processing,
exploratory data analysis, data selection, and knowledge discovery.
Simulation and risk analysis rely on spreadsheet models and statistical analysis to examine
the impacts of uncertainty in the estimates and their potential interaction with one another on
the output variable of interest.
Spreadsheets and formal models allow one to manipulate data to perform what-if analysis—
how specific combinations of inputs that reflect key assumptions will affect model outputs.
What-if analysis is also used to assess the sensitivity of optimization models to changes in
data inputs and provide better insight for making good decisions.
Visualization. Visualizing data and results of analyses provides a way of easily
communicating data at all levels of a business and can reveal surprising patterns and
relationships. Software such as IBM’s Cognos system exploits data visualization for query and
reporting, data analysis, dashboard presentations, and scorecards linking strategy to
operations.
o The Cincinnati Zoo, for example, has used this on an iPad to display hourly, daily,
and monthly reports of attendance, food and retail location revenues and sales, and
other metrics for prediction and marketing strategies.
o UPS uses telematics to capture vehicle data and display them to help make decisions
to improve efficiency and performance.
o A tag cloud is a visualization of text that shows words that appear more frequently
using larger fonts.
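The tag cloud idea can be sketched in a few lines of Python. This is a minimal illustration, not a full rendering tool: the function name `tag_cloud_sizes`, the font-size range, and the sample sentence are all assumptions made for the example. It counts how often each word appears and scales the font size linearly with that count.

```python
from collections import Counter
import re


def tag_cloud_sizes(text, min_size=10, max_size=40):
    """Map each word to a font size that grows with its frequency.

    Hypothetical helper: the size range is an arbitrary choice, and the
    text is assumed to contain at least one word.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    low, high = min(counts.values()), max(counts.values())
    span = max(high - low, 1)  # avoid division by zero when all counts match
    return {
        word: min_size + (count - low) * (max_size - min_size) // span
        for word, count in counts.items()
    }


sample = "data drives decisions and data reveals patterns in data"
sizes = tag_cloud_sizes(sample)
# "data" appears three times and gets the largest font; every other word
# appears once and gets the smallest.
```

A real tag cloud would also drop common filler words such as "and" or "in" before counting.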
1.3. Impacts and Challenges
The impact of applying business analytics can be significant. Companies report
1. Reduced costs,
2. Better risk management,
3. Faster decisions,
4. Better productivity, and
5. Enhanced bottom-line performance such as profitability and customer satisfaction.
For example, 1-800-flowers.com uses analytic software to target print and online promotions with
greater accuracy; change prices and offerings on its Web site (sometimes hourly); and optimize its
marketing, shipping, distribution, and manufacturing operations, resulting in a $50 million cost savings in
one year.
IBM suggests that traditional management approaches are evolving in today’s analytics-driven
environment to include more fact-based decisions as opposed to judgment and intuition, more prediction
rather than reactive decisions, and the use of analytics by everyone at the point where decisions are made
rather than relying on skilled experts in a consulting group.
Nevertheless, organizations face many challenges in developing analytics capabilities, including lack
of understanding of how to use analytics, competing business priorities, insufficient analytical skills,
difficulty in getting good data and sharing information, and not understanding the benefits versus perceived
costs of analytics studies. Successful application of analytics requires more than just knowing the tools; it
requires a high-level understanding of how analytics supports an organization’s competitive strategy and
effective execution that crosses multiple disciplines and managerial levels.
A. Discovery analytics.
Data discovery refers to the process of exploring and analyzing data to uncover patterns, identify
relationships, and gain insights that improve decision making and business performance. It
involves combining and transforming data from various sources, examining data structures,
and applying visualization techniques to understand and extract valuable information.
Utilizes data mining and machine learning algorithms to identify hidden relationships and patterns
in large datasets, often without a predetermined hypothesis. Aims to discover new insights and
opportunities that might not be readily apparent.
B. Descriptive analytics.
Most businesses start with descriptive analytics—the use of data to understand past and current
business performance and make informed decisions.
Uses basic statistical analysis to present summaries of historical data like sales figures, customer
demographics, or website traffic.
Answers questions like "How many products were sold last month?" or "What is the average
customer age?"
These techniques categorize, characterize, consolidate, and classify data to convert it into useful
information for the purposes of understanding and analyzing business performance.
Descriptive analytics summarizes data into meaningful charts and reports, for example, about
budgets, sales, revenues, or cost.
This process allows managers to obtain standard and customized reports and then drill down into
the data and make queries to understand it in more detail.
Descriptive analytics also helps companies to classify customers into different segments, which
enables them to develop specific marketing campaigns and advertising strategies.
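The two questions quoted above ("How many products were sold last month?" and "What is the average customer age?") can be answered with simple summaries. The sketch below uses a small, made-up list of orders; the records and variable names are illustrative assumptions, not data from any real business.

```python
from statistics import mean

# Hypothetical records: (product, units sold last month, customer age)
orders = [
    ("laptop", 3, 34),
    ("phone", 5, 27),
    ("laptop", 2, 45),
    ("tablet", 4, 38),
]

# "How many products were sold last month?"
total_units = sum(units for _, units, _ in orders)

# "What is the average customer age?"
average_age = mean(age for _, _, age in orders)

# Consolidate the raw records into a per-product summary for a report.
units_by_product = {}
for product, units, _ in orders:
    units_by_product[product] = units_by_product.get(product, 0) + units
```

The per-product totals are the kind of consolidated figure that would feed a chart or a standard report.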
C. Diagnostic analytics.
Explores the underlying causes behind trends or patterns identified in descriptive analytics.
Uses techniques like drill-down analysis to identify specific factors contributing to a problem.
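Drill-down analysis can be sketched as repeated grouping: summarize at a coarse level, pick the segment that stands out, and then summarize again inside it. The records, the helper `total_by`, and the two levels (region, then product) are assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical records: (region, product, revenue change vs. last month)
changes = [
    ("North", "laptop", 1200),
    ("North", "phone", -300),
    ("South", "laptop", -2500),
    ("South", "phone", -400),
]


def total_by(records, key_index):
    """Sum the revenue change (field 2) grouped by the chosen field."""
    totals = defaultdict(int)
    for record in records:
        totals[record[key_index]] += record[2]
    return dict(totals)


# Level 1: which region contributes most to the decline?
by_region = total_by(changes, 0)
worst_region = min(by_region, key=by_region.get)

# Level 2: drill down into that region by product.
in_worst_region = [r for r in changes if r[0] == worst_region]
by_product = total_by(in_worst_region, 1)
worst_product = min(by_product, key=by_product.get)
```

In this made-up data the decline traces back to one product in one region, which is the kind of specific factor diagnostic analytics looks for.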
D. Predictive analytics.
Predictive analytics seeks to predict the future by examining historical data, detecting patterns or
relationships in these data, and then extrapolating these relationships forward in time.
Leverages statistical models and machine learning to predict future outcomes based on historical
data.
Examples include forecasting customer churn, predicting sales trends, or identifying potential risks.
A marketer might wish to predict the response of different customer segments to an advertising
campaign, a commodities trader might wish to predict short-term movements in commodities
prices, or a skiwear manufacturer might want to predict next season’s demand for skiwear of a
specific colour and size.
Predictive analytics can predict risk and find relationships in data not readily apparent with
traditional analyses. Using advanced techniques, predictive analytics can help to detect hidden
patterns in large quantities of data to segment and group data into coherent sets to predict
behaviour and detect trends.
For instance, a bank manager might want to identify the most profitable customers or predict the
chances that a loan applicant will default, or alert a credit-card customer to a potential fraudulent
charge. Predictive analytics helps to answer questions such as “What will happen if demand falls
by 10% or if supplier prices go up 5%?” “What do we expect to pay for fuel over the next several
months?” “What is the risk of losing money in a new business venture?”
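A question such as "What will happen if demand falls by 10% or if supplier prices go up 5%?" can be explored with a very small model. The sketch below assumes a simple profit formula and made-up base figures; both are illustrative assumptions rather than a recommended forecasting model.

```python
def profit(demand, price, unit_cost, fixed_cost):
    """Simple assumed model: margin per unit times units sold, less fixed cost."""
    return demand * (price - unit_cost) - fixed_cost


# Hypothetical base-case inputs.
base = dict(demand=1000, price=20.0, unit_cost=12.0, fixed_cost=5000.0)

scenarios = {
    "base case": base,
    "demand falls 10%": {**base, "demand": base["demand"] * 0.90},
    "supplier prices up 5%": {**base, "unit_cost": base["unit_cost"] * 1.05},
}

results = {name: profit(**inputs) for name, inputs in scenarios.items()}
for name, value in results.items():
    print(f"{name}: profit = {value:.2f}")
```

With these inputs, the 10% drop in demand reduces profit more than the 5% rise in supplier prices. That conclusion depends entirely on the assumed figures.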
E. Prescriptive analytics.
Goes beyond prediction by suggesting the optimal course of action to achieve a desired outcome.
Often incorporates optimization algorithms to identify the best possible decision based on
predicted scenarios.
Answers questions like "What marketing campaign should be launched to maximize sales?"
Many problems, such as aircraft or employee scheduling and supply chain design, for example,
simply involve too many choices or alternatives for a human decision maker to effectively consider.
Prescriptive analytics uses optimization to identify the best alternatives to minimize or maximize
some objective. Prescriptive analytics is used in many areas of business, including operations,
marketing, and finance. For example, we may determine the best pricing and advertising strategy to
maximize revenue, the optimal amount of cash to store in ATMs, or the best mix of investments in a
retirement portfolio to manage risk. The mathematical and statistical techniques of predictive
analytics can also be combined with optimization to make decisions that take into account the
uncertainty in the data. Prescriptive analytics addresses questions such as “How much should we
produce to maximize profit?” “What is the best way of shipping goods from our factories to
minimize costs?” “Should we change our plans if a natural disaster closes a supplier’s factory? If so,
by how much?”
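As a minimal sketch of the optimization idea, the code below searches a grid of candidate prices for the one that maximizes revenue. The linear demand curve and its coefficients are hypothetical, and a plain grid search stands in for the more sophisticated optimization algorithms used in practice.

```python
def demand(price):
    """Hypothetical linear demand curve: higher prices sell fewer units."""
    return max(0.0, 500 - 20 * price)


def revenue(price):
    return price * demand(price)


# Candidate prices from 0.00 to 25.00 in steps of 0.25.
candidate_prices = [step * 0.25 for step in range(0, 101)]

# Prescriptive step: pick the alternative with the best objective value.
best_price = max(candidate_prices, key=revenue)
best_revenue = revenue(best_price)
```

For this assumed demand curve the search settles on a price of 12.50, which matches the analytic optimum of a linear demand model (the midpoint between zero and the price at which demand reaches zero).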
Data visualization is the process of displaying data (often in large quantities) in a meaningful fashion to
provide insights that will support better decisions.
Researchers have observed that data visualization improves decision-making, provides managers with
better analysis capabilities that reduce reliance on IT professionals, and improves collaboration and
information sharing.
"The shapes of data" refers to the visual pattern a dataset forms when plotted on a graph.
When a data set is graphed, its points arrange themselves into one of dozens of different shapes. The shape of the
distribution can be described in terms of the following features:
1. Number of peaks
The peaks are usually called modes. A mode tells you that the data count is higher in that area than in the
neighbouring areas of the graph.
2. Symmetry
A symmetric graph has two sides that are mirror images of each other. The normal distribution is one example
of a symmetric graph.
Another type of symmetric graph is the U-distribution, which—perhaps not surprisingly— looks like the letter
“U”.
A symmetric box plot has the “box” in the center of the graph.
3. Skewness
Shapes of distributions can differ in skewness; these distributions are not symmetrical distributions. Instead,
they have more points plotted on one side of the mean than on the other. This causes long tails either in the
negative direction on the number line (a negative, or left skew) or in the positive direction on the number
line (a positive, or right skew).
A left-skewed (negative) distribution has a long tail in the negative direction of the number line.
The tails of a distribution (i.e. how thin or fat they are) can also be described by kurtosis, which is measured
against the normal distribution. A positive value for excess kurtosis means the tails are fatter than those of the
normal distribution, usually with a sharper peak. A negative value means the tails are thinner, usually with a
flatter peak.
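Skewness and kurtosis can be computed directly from the data. The sketch below uses the population (moment-based) formulas and reports excess kurtosis, that is, kurtosis minus 3, so that a normal distribution scores zero. The function name and the two small data sets are assumptions for illustration. Statistical packages often apply small-sample corrections, so their figures may differ slightly.

```python
from statistics import fmean


def shape_statistics(values):
    """Return (skewness, excess kurtosis) using population moments."""
    n = len(values)
    mu = fmean(values)
    m2 = sum((x - mu) ** 2 for x in values) / n  # variance
    m3 = sum((x - mu) ** 3 for x in values) / n
    m4 = sum((x - mu) ** 4 for x in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    return skewness, excess_kurtosis


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]

skew_sym, kurt_sym = shape_statistics(symmetric)
skew_right, kurt_right = shape_statistics(right_skewed)
```

The symmetric data set has zero skewness, while the data set with one large value on the right has positive skewness.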
The shape of distribution helps us understand the spread and behavior of a given distribution. With visual
representations such as the distribution’s shapes, we can easily represent important data components and help others
understand how our data behave visually.
The shape of distribution provides helpful insights about the distribution. This includes the distribution’s
peaks, symmetry, uniformity, as well as its tendency to lean towards the left or right corner.
Thanks to the shape of the distribution, identifying the descriptive statistics of the distribution will be much easier.
This also means that the distribution’s shape will come in handy when reporting and observing distributions.
The fundamental features of a distribution’s curve, covered below, can be used to describe the shape of a given
distribution.
The shape of the distribution is a helpful feature that easily reflects the frequency of values within given
intervals. When given a distribution and its shape, we can learn other helpful details about the data set. For example,
the shape helps identify the range in which the mean of the data set lies.
As we have learned in the past, we can visualize distributions such as the frequency or probability
distribution using histograms. The shape formed by the histogram represents the shape of the distribution.
Here’s an example of a distribution and its shape. By inspecting its shape, we’ll have an idea of the peaks of the
data set. The distribution’s shape also allows us to identify whether the distribution is skewed or symmetric, unimodal
or bimodal, and more.
The shape of the distribution will depend on many factors, so let’s break down these factors and understand what
they represent.
There are different factors that affect the shape of a distribution as discussed in the previous section. These factors
also help us identify key measures of the distribution.
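1. The number of peaks tells us how many modes the distribution has.
When the shape shows one peak, the distribution is unimodal; when it shows two peaks, the distribution is bimodal.
When the shape shows three or more peaks, the distribution is multimodal.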
2. As with a function’s curve, distributions and their shapes may or may not exhibit symmetry.
When the distribution’s shape is folded and the left and right folds are each other’s mirror images, the
distribution is symmetrical.
When the folds are not mirror images of each other, the distribution is asymmetrical.
3. When the shape of the distribution is asymmetric, we can also see whether the distribution is positively or
negatively skewed.
When the tail of the distribution stretches towards the right, with the peak sitting at the lower end, the distribution
is positively skewed.
Meanwhile, when the tail stretches towards the left, with the peak sitting at the higher end, the distribution is
negatively skewed.
These are the properties needed for us to describe the shape of a given distribution. By being aware of these
factors, we also immediately know the important components and behavior of the distribution. In the next section,
we’ll explore different distributions and shapes to help you master the process of describing the shape of a
distribution.
Describe the shape of distribution by using the different factors affecting its shape: its peaks, symmetry,
skewness, and at times, uniformity.
Observe the curve’s shape – this represents the shape of the distribution.
Use the features we’ve discussed to thoroughly describe the shape of a distribution.
After determining whether the shape or curve has one or more peaks, study the curve’s symmetry or lack thereof.
When the distribution is symmetric and has a single peak, as the normal distribution does, its mean, mode, and median
will have the same values.
When the curve is negatively skewed, we expect that the mode has the largest value followed by the
median and then the mean. Similarly, when the shape of the distribution is positively skewed, the mean has the
highest value followed by the median and then the mode.
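The ordering of the three central measures can be checked on a small example. The data below are invented for illustration: one right-skewed list and its mirror image, which is left-skewed. The ordering is a common tendency in skewed data rather than a guarantee for every data set.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed data: most values are small, one is large.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 6, 9, 18]

# Mirroring the values around 10 gives a left-skewed data set.
left_skewed = [20 - x for x in right_skewed]

right_summary = (mode(right_skewed), median(right_skewed), mean(right_skewed))
left_summary = (mode(left_skewed), median(left_skewed), mean(left_skewed))

print("right-skewed: mode, median, mean =", right_summary)
print("left-skewed:  mode, median, mean =", left_summary)
```

For the right-skewed list the mode is smallest and the mean largest; for the mirrored list the order is reversed.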
Suppose that we have the data of the test results from an online quiz of a virtual math class. The histogram of the
frequency distribution is as shown below.
By observing the chart alone, we can see that the histogram is symmetric. This means that when we fold this
chart, its left half will be the mirror image of its right. The chart also has only one peak and, consequently,
one mode.
The peak occurs at 44. Since the distribution is symmetric, we also expect the mean and the median to occur at
the peak. This means that the average score of the students from the virtual math class is 44.
When the line of symmetry lies on the peak of the distribution, we can also call the curve a bell-shaped curve.
When it’s the reverse, where the line of symmetry lies at its minimum, we call the distribution a U-shaped curve.
Suppose that we have the test results represented by the distribution shown above. From inspection, we can see
that the distribution is also symmetric. However, the line of symmetry is at the test score 44, where the frequency is lowest.
Taking a look at its peaks, we can see that the mode occurs twice: when the test score is 38 and when the test
score is 50. This means that the distribution is bimodal.
Let’s now take a look at the third distribution, a histogram that’s heavily skewed to the right. As expected, the
distribution’s peak (or its mode) lies within the lower end of the range. When the distribution
is positively skewed, we also expect that the mode has the least value among the three central measures.
Last but not least, what if we’re given a distribution such as the one shown above?
We can see that the distribution is skewed to the left, where the peak lies at the higher end. As we have learned, in
a negatively skewed distribution the mode will have the highest value.
These are just four examples of different distributions with different shapes. A worked example follows.
Example 1
Harry runs a convenience store with his partner. On Monday, he did a quick survey to understand his customers’
coffee size preferences. The convenience store currently offers four sizes: Small ($1.00), Medium ($1.20), Large
($1.40), and XL ($1.60). After one whole day of asking their customers who ordered coffee, Harry tallied the chart
shown below.
Coffee Size        Number of Customers
Small ($1.00)      24
Medium ($1.20)     12
Large ($1.40)      12
XL ($1.60)         24
What is the shape of the distribution that represents the chart shown above?
Solution
Sketching the data’s distribution, we’ll see that the histogram is symmetric with its lowest value found at the line
of symmetry.
This means that we’re looking at a U-shaped curve. Aside from the distribution being symmetric, there is the
same number of customers who ordered coffee in small and extra-large cups. From this, we can see that the
distribution is also bimodal.
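The conclusions of Example 1 can be verified directly from the tally. The sketch below uses Harry's counts as given in the table; the variable names and the simple U-shape test (highest at both ends, lowest in the middle) are assumptions made for this illustration.

```python
# Harry's tally, in order of cup size.
counts = {"Small": 24, "Medium": 12, "Large": 12, "XL": 24}
frequencies = list(counts.values())

# Symmetric: the frequencies read the same from left to right and right to left.
is_symmetric = frequencies == frequencies[::-1]

# Modes: every size that reaches the highest count.
highest = max(frequencies)
modes = [size for size, n in counts.items() if n == highest]

# U-shaped: symmetric, highest at the ends, lowest around the line of symmetry.
middle = frequencies[len(frequencies) // 2]
is_u_shaped = is_symmetric and frequencies[0] == highest and middle == min(frequencies)

print(is_symmetric, modes, is_u_shaped)
```

Two sizes share the highest count, which is why the distribution is described as bimodal.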
There is a very large number of potential shapes of data sets, but there are only a handful of data distributions that are
commonly found. An example of each common shape of data is drawn below.
In each instance, the x-axis can represent something different (the number of coin tosses that result in heads, or length,
or time, etc.) depending on what is measured in the experiment. The y-axis indicates how likely each data value is in
the data set.
Amazingly, there are only about 14 shapes of data that are commonly found. The shape of a data set is a graphical way
of understanding the distribution in a data set, which describes how frequently each value occurs within a data set.
1. The Bernoulli distribution has exactly two outcomes: for example, heads or tails, success or failure, 0's and
1's. Consider flipping a coin and either getting 0 heads (i.e., tails) or 1 head. These two outcomes are equally
likely, as illustrated in the diagram with two lines of equal height.
The Bernoulli distribution could represent outcomes that aren’t equally likely, like the result of an
unfair coin toss. Then, the probability of heads is not 0.5, but some other value p, and the probability
of tails is 1-p. (For example, it could be 0.4 probability of getting heads and 0.6 probability of getting
tails.)
2. A uniform distribution has many equally-likely outcomes, characterized by its flat shape. For example,
imagine rolling a standard 6 sided die, such that the outcomes 1 to 6 are equally likely.
3. A binomial distribution describes the sum of multiple Bernoulli trials. For example, if you flipped a coin 10
times, how often would you expect to get 5 heads? or 4 heads? A binomial distribution describes how many
successes occur when n trials are conducted, where each trial can be a success or failure (e.g., heads or tails if
flipping a coin n times). In a binomial distribution, each trial is independent. For example, the number of times a
coin comes up heads when it is flipped 20 times follows a binomial distribution because each flip, or trial, is
independent and has the same probability of success regardless of how the other trials turned out.
4. The Poisson distribution describes situations where random events occur at a certain rate over a period of
time. The Poisson distribution is used for determining the probability for a number of events occurring in a
fixed interval of time given a process in which events occur continuously and independently and at a constant
average rate.
5. Whereas the binomial distribution plots how many successes occur in n trials, the geometric
distribution describes how many failures occur before a success (e.g., how many tails do you get before a
heads).
6. While the geometric distribution describes how many failures occur before 1 success, the negative binomial
distribution is a generalization that plots the number of failures until r successes have occurred, not just 1.
7. The exponential distribution is the probability distribution of the time between events in a process in which
events occur continuously and independently at a constant average rate.
8. The Weibull distribution is a generalization of the exponential distribution. Whereas the exponential
distribution describes the time between events that are constantly occurring, the Weibull distribution can
model increasing (or decreasing) rates of failure over time. The exponential is simply a special case of the
Weibull.
9. The symmetric, bell-shaped curve of a normal distribution is one of the most important distributions in
statistics because it arises naturally in so many applications. Large sums of random variables often turn out to
be normally distributed. In a normal distribution, the mean, median, and mode are all equal, and the curve is
symmetric around this center value. Exactly half of the values are to the left of center and exactly half of the
values are to the right.
10. The Student’s t distribution has fatter tails than those of the normal distribution.
11. The chi-squared distribution is the distribution of the sum of squares of independent standard normal values.
12. Like the exponential distribution, the gamma distribution is used to model waiting times.
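Several of the distributions listed above have short formulas that can be written down directly. The sketch below covers the binomial, Poisson, geometric, negative binomial, exponential, and Weibull distributions using only the Python standard library. The function names and the parameter values in the usage lines are illustrative assumptions; the geometric and negative binomial functions count failures before the success(es), matching the wording in the list.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, rate):
    """Probability of k events in an interval with the given average rate."""
    return rate**k * exp(-rate) / factorial(k)


def geometric_pmf(failures, p):
    """Probability of this many failures before the first success."""
    return (1 - p) ** failures * p


def negative_binomial_pmf(failures, r, p):
    """Probability of this many failures before the r-th success."""
    return comb(failures + r - 1, failures) * (1 - p) ** failures * p**r


def exponential_survival(t, rate):
    """Probability that the waiting time exceeds t."""
    return exp(-rate * t)


def weibull_survival(t, shape, scale):
    """Probability that the time to failure exceeds t."""
    return exp(-((t / scale) ** shape))


def weibull_hazard(t, shape, scale):
    """Failure rate at time t: rising if shape > 1, falling if shape < 1."""
    return (shape / scale) * (t / scale) ** (shape - 1)


# Flipping a fair coin 10 times: chance of exactly 5 heads and of 4 heads.
p_five_heads = binomial_pmf(5, 10, 0.5)  # 252/1024
p_four_heads = binomial_pmf(4, 10, 0.5)  # 210/1024

# Special cases mentioned in the list:
# a Bernoulli trial is a binomial with n = 1,
# a geometric is a negative binomial with r = 1,
# an exponential is a Weibull with shape = 1.
bernoulli_heads = binomial_pmf(1, 1, 0.4)
geometric_as_nb = negative_binomial_pmf(3, 1, 0.5)
exponential_as_weibull = weibull_survival(2.0, 1, 4.0)
```

Writing the special cases as function calls makes the "generalization" relationships in the list concrete: setting one parameter to 1 recovers the simpler distribution.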
Idioms are “a distinct approach to creating and manipulating visual representations” (Munzner 2014). In other words,
they are the form or type of visualization that you choose to represent the data. Common idioms include bar
graphs and pie charts.
Data visualization presents raw data in graphical visualization formats that allow users to answer questions and
discover insights. As such, there are many different ways to visually present the data. An idiom is the specific way to
visually create and manipulate data [5]. Common idioms include bar graphs, pie charts, scatter plots, bubble charts,
and heat maps. Each idiom has its own strengths and weaknesses; deciding which idiom to use to represent a dataset
depends on the research question and the type of data present. Common univariate visualization idioms include the bar
chart and the pie chart. To visualize bivariate data, graphs are commonly used to provide information on the
relationship between the two variables. Scatter plots are one of the most popular options for graphically representing
bivariate data. There are many idiom options for visualizing multivariate data, including radar charts and tree maps.
Additionally, some multivariate data may also be tightly coupled with geospatial regions. Common idioms used to
visualize these kinds of data include spiral theme plots, cartograms, heat maps and ring maps.
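The guidance above can be condensed into a small lookup. The sketch below only restates the idioms named in the text; the function name `suggest_idioms` and its parameters are hypothetical, and the final choice still depends on the research question and the data.

```python
# Idioms named in the text, grouped by the kind of data they suit.
IDIOMS_BY_DATA_KIND = {
    "univariate": ["bar chart", "pie chart"],
    "bivariate": ["scatter plot"],
    "multivariate": ["radar chart", "tree map"],
    "geospatial": ["spiral theme plot", "cartogram", "heat map", "ring map"],
}


def suggest_idioms(variable_count, geospatial=False):
    """Return candidate idioms for a data set (hypothetical helper)."""
    if variable_count < 1:
        raise ValueError("a data set needs at least one variable")
    if geospatial:
        return IDIOMS_BY_DATA_KIND["geospatial"]
    if variable_count == 1:
        return IDIOMS_BY_DATA_KIND["univariate"]
    if variable_count == 2:
        return IDIOMS_BY_DATA_KIND["bivariate"]
    return IDIOMS_BY_DATA_KIND["multivariate"]


print(suggest_idioms(1))
print(suggest_idioms(2))
print(suggest_idioms(5, geospatial=True))
```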
Börner and Polley identified 5 major categories of data in Visual Insights. Of course, this list isn't exhaustive and
your data might sit outside of these categories.
1. Temporal Data - This type of data answers the "when" question and highlights the temporal distribution of
datasets; to identify growth rates, latency to peak times, or decay rates; to see patterns in time-series data, such
as trends, seasonality, or bursts.
2. Geospatial Data - This type of data answers the "where" question and uses location information to identify
position or movement over geographic space.
3. Topical Data - This type of data is textual, linguistic or semantic data, often used in the humanities and social
sciences. This data answers the "what" question.
4. Tree Data - Answers the question "with whom." Tree datasets, such as directory structures, organizational
hierarchies, branching processes, genealogies, or classification hierarchies are commonly organized and
displayed using tree visualizations: for example, tree views, treemaps, or tree graphs.
5. Network Data - Answers the question "with whom" as well. This type of data aims to increase our
understanding of natural and manmade networks. This can look like social network analysis, bibliometrics,
mapping in physics etc.
Data Storytelling
Data storytelling is the concept of building a compelling narrative based on complex data and analytics that help tell
your story and influence and inform a particular audience.
Data storytelling is very similar to human storytelling but provides the added benefits of deeper insights and
supporting evidence through graphs and charts. Through data storytelling, complicated information is simplified so
that your audience can engage with your content and make critical decisions quicker and more confidently.
Constructing a data story that moves a person to take action can be a very powerful tool. Effective data storytelling
can have a positive impact on people and your organization. Some benefits of successful data storytelling include:
Interpreting complex information and highlighting essential key points for the audience.
Through a structured approach, data storytelling and data visualization work together to communicate your insights
through three essential elements: narrative, visuals, and data. As you create your data story, it is important to combine
these three elements into a well-rounded account of your findings and the resulting actions you’d like to
see from users.
Data storytelling is the art of presenting data with a contextual narrative. There are a few different ways to present
your data story. A data dashboard presents all available data so you’re able to create your narrative. Below are a few
examples of eye-catching data storytelling.
A dashboard presents all your information front and center. While your dashboard might provide some context, you
will need to build your narrative and connect the dots. Simplicity works best. Just providing an intro sentence with a
data-driven graphic is often the quickest way to tell a short data story.
Source: Microsoft Power BI
Another data storytelling example connects two or more data visualizations: a call center analysis that shows
customer satisfaction based on subject, percentage of satisfied and unsatisfied customers, total number of satisfied and
unsatisfied customers, and other smaller stories that together tell a larger story.
Data storytelling has the potential to change how we consume data and analytics. It adds a human touch to the
sometimes-indecipherable numbers and figures that raw data presents to us. Building a narrative is a major component of
the process, but creating a strong story depends on your being able to understand and translate that information from
an unbiased point of view. Tools such as Microsoft Power BI can help you tell that story.
Data storytelling is the practice of crafting compelling narratives to effectively convey data-driven insights to
stakeholders. Its objective is to boil down complex information into only its most essential elements so that it is easily
understood and grasped by others through a compelling, engaging narrative.
Conceptually, data storytelling is similar to storytelling in general: a narrative unfolds as the natural consequence of a
series of events. The difference is that in data storytelling, those events are data points (rather than characters or plot
points) that, taken together, start to tell their story.
Two examples of data storytelling:
A social media marketer illustrates a particular post's positive impact on engagement by showcasing it beside
others that performed less well. They organize their data so that the reason for its strong performance is clear
to stakeholders.
A public health agency releases a report detailing the personal experiences of individuals impacted by a
disease alongside statistics about infection and hospitalization rates and demographic breakdowns.
Data storytelling is important because it helps communicate data insights in a way that others can understand and
encourages them to take meaningful action. Storytelling has been shown to activate certain areas of the brain that
assist with developing long-term memories, making it more likely people will retain the information presented through
data storytelling.
In addition to making the information easier to remember, data storytelling allows you to present findings in a
digestible way. When people fully understand what the data suggests, they can feel more confident when
making data-based decisions and reach those conclusions sooner.
3 key elements of data storytelling
Effective data storytelling primarily involves three key areas: data, visualizations, and narrative. The following offers
a closer look at those elements to help you properly utilize all three to assemble a quality story with your data.
1. Data
You'll first need to understand the data and the information it contains before constructing a data story. The insights
the data analysis provides ultimately work as the basis of your story and give you something to center your narrative around.
Before performing your analysis, you will have to sift through the data set to identify the most relevant insights. This
makes it crucial to have robust data literacy and the ability to go in and analyze the data.
2. Visualizations
Data visualizations not only help make your story more interesting to your audience, but they’re also useful tools for
helping to further explain and uncover data insights.
Selecting a visualization type that correctly represents the data is essential. You should consider factors such as who
you’ll be presenting the data to, the question your data answers, how much data you’re working with, and the type of
data used in the analysis. For example, you could use columns to represent quantitative data or implement maps when
you develop a visual narrative surrounding geographical data.
3. Narrative
The narrative ties everything together in an impactful way. Before developing your story, consider who your audience
is so that you can convey your message in a manner that will interest them.
A good narrative should have several fundamental elements. Within your story, identify a “hero.” In this context, your
hero could be the individual or team helping to work towards an established goal, such as improved customer
retention metrics. The narrative should also have a beginning, middle, and end that’s easy to follow, rather than
jumping back and forth between timelines. Lastly, construct a narrative that your listeners can relate to, so that it
makes a real impact on them. Doing so will get them more invested and help them get more out of the information.
Implementing data storytelling has plenty of benefits. Not only will the data and its insights be easier for the
audience to understand and remember, but you can also get them more engaged and ready to take action. It also
encourages data-driven decision-making and helps increase data literacy throughout your organization.
However, this process has some challenges as well. For example, it’s critical that you choose the proper data
visualization method, or you risk conveying information incorrectly. Another challenge associated with data
storytelling is the lack of professionals with adequate data skills, although you can view this as a positive if
you’re looking to enter the industry: by developing your data storytelling abilities, you can become sought after
by employers.
https://www.writingbeginner.com/data-storytelling/
https://www.geeksforgeeks.org/storytelling-in-data-science/
Data visualization charts present data in a visual way. This guide explores the different types of data visualization
charts in detail.
Basic charts are the ones you'll encounter when starting with data visualization. They are simple to create, easy to
understand, and help you start making sense of your data right away. Python libraries such as Matplotlib and Seaborn
can be used to create these types of charts.
1. Bar Charts
Bar charts are one of the most common visualization tools, used to compare values by showing rectangular bars. A bar
chart has an X axis and a Y axis, where the X axis typically represents the categories and the Y axis represents the
values being compared. There are various types of bar charts, such as the horizontal bar chart, stacked bar chart,
grouped bar chart and diverging bar chart.
Comparing Categories: Used to show differences between categories and understand relationships in the
data.
Ranking: When records with categories need to be arranged from highest to lowest, we use bar charts.
Relationship between categories: When a dataset has multiple categorical variables, a bar chart can
display the relationship between them to reveal patterns and tendencies.
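The ranking use case can be sketched in Python. This is only a minimal illustration: the category names and totals
are made up, and rank_categories and plot_ranked_bars are hypothetical helper names. Matplotlib is imported inside
the plotting helper so that the ranking step runs even where no plotting library is installed.

```python
def rank_categories(totals):
    """Return (category, value) pairs sorted from highest to lowest value."""
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)


def plot_ranked_bars(totals, path="ranked_bars.png"):
    """Draw the ranked categories as a vertical bar chart and save it to `path`."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    ranked = rank_categories(totals)
    labels = [name for name, _ in ranked]
    values = [value for _, value in ranked]

    fig, ax = plt.subplots()
    ax.bar(labels, values)      # X axis: categories, Y axis: values
    ax.set_xlabel("Category")
    ax.set_ylabel("Value")
    fig.savefig(path)
    plt.close(fig)


# Hypothetical sales totals per product category.
sales = {"Books": 120, "Toys": 340, "Games": 210}
ranked = rank_categories(sales)
```

Calling plot_ranked_bars(sales) would write the chart to ranked_bars.png. Swapping ax.bar for ax.barh gives the
horizontal variant.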
2. Line Charts
A line chart is a type of graph that displays information over time. It uses markers to represent data points, and
these markers are connected by lines to show how the values change over a period. This makes it easy to see trends,
such as whether something is increasing, decreasing, or staying the same.
Line charts are also used to compare trends across more than one data series.
3. Pie Charts
A pie chart is a circular visualization divided into slices to show numerical percentages of a whole. Each slice
represents a category and its size is proportional to the share it represents. Pie charts work well only with a small
number of categories. The simple pie chart and the exploded pie chart are the main varieties.
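The proportionality between a slice and its share can be made concrete with a little arithmetic. The sketch below
uses made-up traffic figures, and slice_shares and slice_angles are hypothetical helper names: each category's share
is its value divided by the total, and its slice angle is that share of 360 degrees.

```python
def slice_shares(counts):
    """Convert raw category totals into percentage shares of the whole."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("a pie chart needs a positive total")
    return {name: 100 * value / total for name, value in counts.items()}


def slice_angles(counts):
    """Return the angle in degrees that each slice occupies in the circle."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("a pie chart needs a positive total")
    return {name: 360 * value / total for name, value in counts.items()}


# Hypothetical number of visits per traffic source.
visits = {"Email": 50, "Social": 30, "Search": 20}
shares = slice_shares(visits)
angles = slice_angles(visits)
```

Plotting libraries normally do this normalisation themselves, so in practice the raw totals can be passed straight to
the pie-chart function.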
4. Scatter Chart
A scatter chart uses dots to represent data points, showing the relationship between two numerical
variables. The X-axis represents the independent variable and the Y-axis represents the dependent variable. Types of
scatter chart include the simple scatter chart, the scatter chart with a trendline and the scatter chart with color
coding. They are used for identifying outliers or unusual observations in your data.
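A scatter chart with a trendline needs the slope and intercept of that line. The sketch below fits it with ordinary
least squares, which is one common choice and an assumption here, since the text does not say how the trendline is
obtained. The spend and sales figures are invented and fit_trendline is a hypothetical helper name.

```python
def fit_trendline(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: advertising spend (independent) and sales (dependent).
ad_spend = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]
slope, intercept = fit_trendline(ad_spend, sales)
```

Points that sit far from the fitted line are candidates for the outliers mentioned in the text.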
5. Histogram
A histogram shows the distribution of numerical data by dividing it into intervals (bins) and displaying the frequency
of data points as bars. It helps visualize patterns like skewness, central tendency, and variability.
Distribution Visualization: Histograms are best for visualizing the distribution of numerical data and
allow users to recognize the shape of the data.
Data Exploration: They support data exploration by revealing patterns, trends, and outliers inside
datasets.
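The binning step behind a histogram can be sketched in a few lines of Python. The ages below are invented and
histogram_counts is a hypothetical helper name. It assumes equal-width bins over a closed range, with the upper edge
counted in the last bin.

```python
def histogram_counts(values, bins, low, high):
    """Count how many values fall into each of `bins` equal-width intervals on [low, high]."""
    width = (high - low) / bins
    counts = [0] * bins
    for value in values:
        if low <= value <= high:
            # A value equal to `high` belongs to the last bin, not a new one.
            index = min(int((value - low) / width), bins - 1)
            counts[index] += 1
    return counts


# Hypothetical customer ages, grouped into four 10-year bins from 20 to 60.
ages = [21, 25, 29, 33, 34, 38, 41, 45, 52, 60]
counts = histogram_counts(ages, bins=4, low=20, high=60)
```

Each count becomes the height of one bar. Changing the number of bins changes how much detail of the distribution's
shape is visible.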
After the basic charts, let's move on to advanced charts. They allow you to dive deeper into your data, helping you
find detailed insights, show multiple variables and uncover hidden patterns or relationships.
1. Heatmap
A heatmap visualizes data in a matrix layout, using colors to represent the values of individual cells. It is good
for identifying patterns, correlations and variations within big datasets. Heatmaps are commonly used in
finance for portfolio analysis, in biology for gene expression analysis, and in marketing for customer segmentation.
Identify Clusters: Heatmaps help us identify clusters or groups within datasets, making it easier to segment the
data.
Correlation Analysis: They are useful for visualizing correlations between variables to discover relationships
and trends.
Risk Assessment: They are useful for assessing risk like identifying high-risk areas in financial portfolios or
spotting unusual patterns in network traffic.
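For the correlation use case, the matrix that a heatmap colors can be computed as follows. This is a sketch under
stated assumptions: it uses the Pearson correlation coefficient, the three columns are invented, and pearson and
correlation_matrix are hypothetical helper names.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length number lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Return a square list-of-lists; cell [i][j] correlates column i with column j."""
    names = list(columns)
    return [[pearson(columns[a], columns[b]) for b in names] for a in names]


# Hypothetical observations for three variables.
data = {
    "price": [1, 2, 3, 4],
    "demand": [8, 6, 4, 2],
    "ads": [1, 3, 2, 4],
}
matrix = correlation_matrix(data)
```

Each cell of the matrix maps to one colored cell of the heatmap, with values near 1 or -1 drawn in the strongest
shades.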
2. Area Chart
An area chart displays data trends over time by filling the area beneath lines. It is similar to a line chart and is
used for displaying time-series data, where data points are measured over a specific period.
Tracking Trends: It shows how something changes over time, like stock prices or temperatures.
Comparative Analysis: Area charts allow you to compare multiple categories or variables at a time.
Highlighting Patterns: Area charts are useful for spotting patterns such as seasonality or cyclical
tendencies in time-series data.
3. Box Plot
A box plot summarizes the distribution of numerical data by showing the quartiles, the median, and outliers. It helps
to identify variability, skewness, and outliers in datasets and is commonly used in statistical analysis, quality
control and data exploration.
Identify Outliers: Box plots are used to identify outliers in datasets for data cleaning and anomaly detection.
Visualize Spread: They visualize the spread and variability of the data.
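The numbers a box plot draws can be computed with Python's standard library. The sketch below assumes the widely
used 1.5 x IQR rule for flagging outliers, which the text does not specify. The sample values are invented and
box_plot_summary is a hypothetical helper name.

```python
from statistics import quantiles


def box_plot_summary(values):
    """Return quartiles, whisker ends and outliers using the 1.5 * IQR rule."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),    # lowest value that is not an outlier
        "whisker_high": max(inside),   # highest value that is not an outlier
        "outliers": outliers,
    }


# Hypothetical delivery times in days, with one unusually slow delivery.
delivery_days = [2, 4, 4, 5, 6, 7, 8, 9, 30]
summary = box_plot_summary(delivery_days)
```

The box spans q1 to q3, the line inside it marks the median, and any value listed under outliers is drawn as a
separate point.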
4. Bubble Chart
A bubble chart represents data points as bubbles, in which the size and color of each bubble convey
additional information. It is effective for visualizing three-dimensional data and comparing more than one variable
simultaneously. Bubble charts are commonly used in finance for portfolio analysis, in marketing for market
segmentation, and in biology for gene expression analysis.
Multivariate Analysis: Bubble charts allow you to compare three or more variables in a single visualization.
Size and Color Encoding: They use size and color to convey extra information such as value or category.
Relationship Visualization: Bubble charts help visualize relationships between variables and make it easier to
find patterns.
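Size encoding is easy to get wrong: if the bubble radius grows linearly with the value, the visible area grows with
its square and large values look exaggerated. A minimal sketch of the usual remedy, scaling the radius by the square
root so that area tracks the value, is shown below. The market sizes are invented and bubble_radii is a hypothetical
helper name.

```python
from math import sqrt


def bubble_radii(values, max_radius=20.0):
    """Scale radii so that each bubble's area is proportional to its value."""
    largest = max(values)
    if largest <= 0:
        raise ValueError("bubble sizes need at least one positive value")
    return [max_radius * sqrt(value / largest) for value in values]


# Hypothetical market sizes for three customer segments.
market_sizes = [25, 100, 400]
radii = bubble_radii(market_sizes)
```

Matplotlib's scatter function interprets its s argument as marker area (in points squared), so with that library
the values can be scaled linearly instead.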
5. Tree Map
A tree map displays hierarchical data using nested rectangles where each rectangle's size represents a quantitative
value. It is useful for visualizing hierarchical structures and comparing proportions within the hierarchy.
6. Parallel Coordinates
Parallel coordinates visualize multivariate data by representing each data point as a line connecting
values across multiple variables. They are useful for exploring relationships among variables and identifying patterns
or trends. Parallel coordinates are commonly used in data analysis, machine learning, and pattern recognition.
Multivariate Analysis: Parallel coordinates help you compare many variables at the same time to find
patterns.
Relationship Visualization: They help visualize relationships among variables such as correlations or
clusters.
Outlier Detection: They help us to find unusual data points that don't follow the common pattern.
7. Choropleth Map
A choropleth map uses color shading or patterns to represent statistical data over geographic regions. It is generally
used to visualize variations and identify geographic patterns. Choropleth maps are widely used in fields
including demography for population density mapping, economics for income distribution visualization, and
epidemiology for disease prevalence mapping.
Spatial Analysis: Choropleth maps are well suited to spatial analysis because they make variations in the
data across regions visible.
Geographic Patterns: They help to identify geographic patterns, such as clusters or gradients, in
datasets used in trend analysis and decision-making.
Comparison Across Regions: They allow easy comparison of data values across geographic
regions and support regional analysis.
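Before a choropleth map can be shaded, each region's value has to be assigned to a color class. The sketch below
uses equal-interval classification, which is one of several possible schemes and an assumption here. The region
names, densities and shade labels are invented, and classify_regions is a hypothetical helper name.

```python
def classify_regions(values, shades):
    """Assign each region a shade by splitting the value range into equal intervals."""
    low = min(values.values())
    high = max(values.values())
    span = high - low
    classes = {}
    for region, value in values.items():
        if span == 0:
            index = 0  # all regions share one value, so one shade is enough
        else:
            # The maximum value would land one past the end, so clamp it.
            index = min(int((value - low) / span * len(shades)), len(shades) - 1)
        classes[region] = shades[index]
    return classes


# Hypothetical population density (people per square km) by region.
density = {"North": 120, "South": 480, "East": 300, "West": 60}
shading = classify_regions(density, ["light", "medium", "dark"])
```

The choice of classification scheme changes how the map reads, so it is worth stating the class boundaries in the
legend.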
8. Sankey Diagram
A Sankey diagram is a type of flow chart that shows how data or resources move between different points (called
nodes) using arrows. The width of each arrow shows how much flow there is so thicker arrows represent more flow.
They are helpful for understanding complex systems and finding patterns in data, and are used in areas like energy
flow analysis, supply chain management and web analytics.
Bottleneck Identification: They help to spot bottlenecks or areas where the flow of resources slows down or
becomes inefficient.
Comparative Analysis: They are useful for comparing how flows change over time or across different scenarios,
which helps in evaluating performance and finding opportunities to improve efficiency.
9. Radar Chart
A radar chart shows multivariate data on a two-dimensional plane with multiple axes emanating from a
central point. It is useful for comparing multiple variables across different categories and identifying strengths
and weaknesses. Radar charts are commonly used in sports for performance analysis and in decision-making
for multi-criteria decision analysis.
Multi-Criteria Comparison: Radar charts allow the comparison of multiple criteria or variables
across different categories.
Strengths and Weaknesses Analysis: They help to identify strengths and weaknesses within categories or
variables by visualizing their relative performance.
Pattern Recognition: Radar charts aid in pattern recognition, highlighting similarities or
differences between categories.
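The geometry of a radar chart is simple: the axes are spaced at equal angles around the central point, and each score
sets the distance from the center along its axis. A minimal sketch, with invented scores and a hypothetical helper
name radar_vertices, could look like this:

```python
from math import cos, pi, sin


def radar_vertices(scores, max_score):
    """Return the (x, y) corners of the radar polygon, closed back to the first corner."""
    count = len(scores)
    points = []
    for i, score in enumerate(scores):
        angle = 2 * pi * i / count   # axes are spread evenly around the circle
        radius = score / max_score   # normalise so the best score sits at radius 1
        points.append((radius * cos(angle), radius * sin(angle)))
    points.append(points[0])         # repeat the first corner to close the shape
    return points


# Hypothetical ratings (out of 10) for speed, strength, stamina and skill.
ratings = [8, 6, 10, 4]
polygon = radar_vertices(ratings, max_score=10)
```

Drawing one polygon per player or product on the same axes makes strengths and weaknesses directly comparable.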
10. Network Graph
A network graph represents relationships between entities as nodes and edges. It is useful for visualizing complex
networks such as social networks, transportation networks, and biological networks. Network graphs are commonly
used in social network analysis for community detection and in biology for gene interaction analysis.
Relationship Visualization: Network graphs visualize relationships among entities, such as
connections or interactions, which makes them valuable for network analysis.
Community Detection: They help to discover communities or clusters within networks by visualizing
node connections and densities.
Path Analysis: They help in route analysis by showing the shortest paths or routes between points, which makes it
easier to optimize routes and plan efficiently.
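The path analysis use case can be illustrated with a breadth-first search, which finds a route with the fewest edges
in an unweighted network. This is a sketch only: the network below is invented, shortest_path is a hypothetical
helper name, and weighted networks (for example, road distances) would need a different algorithm.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Return a path with the fewest edges from start to goal, or None if unreachable."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


# Hypothetical network stored as an adjacency list (node -> connected nodes).
network = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
    "F": [],  # an isolated node with no connections
}
route = shortest_path(network, "A", "E")
```

Highlighting the edges of the returned route on the drawn network is a simple way to show the result to
stakeholders.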
11. Donut Chart
In a donut chart, the outer ring represents 100% and each slice represents a category. The size of each slice shows
how much each category contributes to the whole.
It is useful for showing how different categories contribute to a total, for example market share or
sales breakdowns.
Donut charts are great for showing progress towards a goal, like a percentage of a target achieved.
12. Gauge Chart
A gauge chart is used to display the progress of a single value, like a key performance indicator (KPI), toward a
goal. It looks like a speedometer, with a circular arc showing how close the value is to the target. There are two
kinds of gauge chart: the circular (or radial) gauge, which resembles a speedometer, and the linear gauge.
It is useful for monitoring metrics like income or customer satisfaction against set benchmarks.
It can be used in project management to track the status of project progress against the project timeline.
13. Sunburst Chart
A sunburst chart presents hierarchical data using nested rings, in which each ring represents a level within the
hierarchy. It is useful for visualizing hierarchical structures with multiple levels of aggregation. Sunburst charts
allow users to explore relationships and proportions inside complex datasets in an interactive way.
Communicating complex data structures and dependencies in a visually appealing layout.
14. Hexbin Plot
A hexbin plot represents the distribution of two-dimensional data by binning data points into hexagonal cells and
coloring each cell based on the number of points it contains. It is effective for visualizing density in scatter
plots with a large number of data points. It provides insights into spatial patterns and concentrations within
datasets.
Handling massive datasets with overlapping data points in a clear and informative way.
15. Violin Plot
A violin plot combines a box plot with a kernel density plot to show the distribution of data together with its
summary statistics. It is useful for comparing the distributions of multiple groups or categories. It provides
insights into the shape, spread, and central tendency of data distributions.
Visualizing the shape and spread of data distributions, including skewness and multimodality.
Presenting detailed information and outliers within data distributions in a visually appealing layout.
Data visualization charts for textual and symbolic data represent information made up of words, symbols, or other
non-numeric forms. These charts are helpful for displaying data that isn't numeric but still needs to be visualized.
There are mainly two types. Let's understand them:
1. Word Cloud
A word cloud is a visual representation of text data in which words are sized based on their
frequency or importance within the text. Common words appear larger and more prominent,
while less common words are smaller. Word clouds provide a quick and intuitive way to identify prominent
words or themes within a body of text.
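The sizing rule behind a word cloud can be sketched with Python's standard library. This is a simplified
illustration: the sample text is invented, word_sizes is a hypothetical helper name, font sizes are interpolated
linearly between an assumed minimum and maximum, and no stop words (such as "the" or "and") are removed.

```python
import re
from collections import Counter


def word_sizes(text, min_size=10, max_size=40):
    """Map each word to a font size that grows with how often the word appears."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    if not counts:
        return {}
    rarest = min(counts.values())
    commonest = max(counts.values())
    span = (commonest - rarest) or 1  # avoid dividing by zero when all counts match
    return {
        word: min_size + (max_size - min_size) * (count - rarest) / span
        for word, count in counts.items()
    }


# Hypothetical snippet of text to visualise.
sizes = word_sizes("data story data chart data story")
```

A real word cloud would also filter out very common filler words first, otherwise they dominate the picture.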
2. Pictogram Chart
A pictogram chart uses icons or symbols to represent data values, where the size or number of icons
corresponds to the value they represent. It is an effective way to convey information in a visually appealing way,
particularly when dealing with categorical or qualitative data.
Emphasizing key data points or trends using easily recognizable symbols or icons.
Temporal and trend charts are used to show patterns and changes over time especially for time-series data where
each data point is linked to a specific time. Let's understand them one by one:
1. Streamgraph
A streamgraph shows how the composition of a dataset changes over time using stacked areas along a baseline. It's
great for visualizing trends and changes in data distribution over time.
2. Bullet Graph
A bullet graph is a variant of a bar chart but it includes markers and reference lines to show progress toward a goal. It's
useful for tracking performance against a target.
3. Gantt Chart
A Gantt chart shows project tasks as horizontal bars along a time axis. It is useful for planning, scheduling, and
monitoring progress in project management. Gantt charts offer a visual overview of project timelines, dependencies,
and resource allocation.
When to use Gantt Chart:
Planning and scheduling complex projects with multiple tasks and dependencies.
Tracking progress and managing resources throughout the project lifecycle.
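How the bars of a Gantt chart are positioned can be sketched as follows. The tasks, durations and dependencies are
invented, schedule is a hypothetical helper name, and the sketch assumes that tasks are listed after the tasks they
depend on and that each task starts as soon as all of its dependencies finish.

```python
def schedule(tasks):
    """Return {task: (start_day, duration)} where each task starts once its dependencies end."""
    bars = {}
    for name, duration, depends_on in tasks:
        # Earliest start is the latest finish among the dependencies (day 0 if none).
        start = max((bars[d][0] + bars[d][1] for d in depends_on), default=0)
        bars[name] = (start, duration)
    return bars


# Hypothetical project: (task name, duration in days, tasks it depends on).
project = [
    ("Design", 3, []),
    ("Build", 5, ["Design"]),
    ("Test", 2, ["Build"]),
    ("Docs", 2, ["Design"]),
]
bars = schedule(project)
project_length = max(start + duration for start, duration in bars.values())

# Print a rough text version of the chart, one character per day.
for name, (start, duration) in bars.items():
    print(f"{name:<8}" + " " * start + "#" * duration)
```

In a plotting library, each (start, duration) pair becomes one horizontal bar whose left edge is the start day and
whose width is the duration.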
4. Waterfall Chart
A waterfall chart visualizes the cumulative effect of sequential positive and negative values on a starting value.
It is commonly used in financial analysis to show changes in net value over time. Waterfall charts provide a clear
visual representation of how individual factors contribute to the overall change in a dataset.
Analyzing and visualizing changes in financial performance or budget allocations over time.
Identifying the sources of gains or losses within a dataset and their cumulative impact.
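The cumulative logic of a waterfall chart is a running total: each bar starts where the previous one ended. A
minimal sketch, with invented figures and a hypothetical helper name waterfall_steps, is shown below.

```python
def waterfall_steps(start, changes):
    """Return the floating bars as (label, from_value, to_value) plus the final total."""
    steps = []
    running = start
    for label, delta in changes:
        # Each bar floats between the running total before and after the change.
        steps.append((label, running, running + delta))
        running += delta
    return steps, running


# Hypothetical figures: an opening balance followed by gains and losses.
steps, closing_balance = waterfall_steps(
    100,
    [("Sales", 50), ("Costs", -30), ("Tax", -10)],
)
```

Bars whose end value is above their start value are usually colored as gains and the others as losses, which makes
the sources of the overall change easy to spot.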
In this article we learned how different charts are used to show data in a simple way. Basic charts like bar charts and
line charts help compare things, advanced charts like heatmaps and box plots show deeper details, and charts like
word clouds and streamgraphs help display text data or data over time.
https://www.safegraph.com/guides/visualizing-geospatial-data
https://humansofdata.atlan.com/2016/10/7-techniques-to-visualize-geospatial-data/