Fallsem2023-24 Cse3020 Eth5


Visualizing Amounts

Bar Charts
Charts
Marks
Points: Cartesian Coordinates

• The most widely used coordinate system for data visualization
is the 2D Cartesian coordinate system, where each location is
uniquely specified by an x and a y value. The x and y axes run
orthogonally to each other, and data values are placed in an
even spacing along both axes (Figure 3-1). The two axes are
continuous position scales, and they can represent both positive
and negative real numbers. To fully specify the coordinate
system, we need to specify the range of numbers each axis
covers. In Figure 3-1, the x axis runs from –2.2 to 3.2 and
the y axis runs from –2.2 to 2.2. Any data values between these
axis limits are placed at the appropriate respective location in
the plot. Any data values outside the axis limits are discarded.
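The effect of axis limits can be sketched in a few lines of Python. The limits below are the ones quoted for Figure 3-1; the data points and the helper name `within_limits` are made up for illustration.

```python
# Axis limits as described for Figure 3-1 in the text.
X_LIMITS = (-2.2, 3.2)
Y_LIMITS = (-2.2, 2.2)


def within_limits(point, x_limits=X_LIMITS, y_limits=Y_LIMITS):
    """Return True if an (x, y) point lies inside both axis ranges (inclusive)."""
    x, y = point
    return x_limits[0] <= x <= x_limits[1] and y_limits[0] <= y <= y_limits[1]


# Hypothetical data points, invented for illustration only.
points = [(-1.0, 0.5), (3.0, 2.0), (3.5, 0.0), (0.0, -2.5), (-2.2, 2.2)]

visible = [p for p in points if within_limits(p)]        # placed in the plot
discarded = [p for p in points if not within_limits(p)]  # outside the limits

print("plotted:  ", visible)
print("discarded:", discarded)
```

Treating the limits as inclusive is an assumption of this sketch; plotting libraries may differ in how they handle points exactly on the boundary.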
Cartesian Coordinates

• Data values usually aren’t just numbers, however. They
come with units. For example, if we’re measuring
temperature, the values may be measured in degrees
Celsius or Fahrenheit. Similarly, if we’re measuring
distance, the values may be measured in kilometers or
miles, and if we’re measuring duration, the values may be
measured in minutes, hours, or days. In a Cartesian
coordinate system, the spacing between grid lines along an
axis corresponds to discrete steps in these data units. In a
temperature scale, for example, we may have a grid line
every 10 degrees Fahrenheit, and in a distance scale, we
may have a grid line every 5 kilometers.
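As a rough sketch of how grid lines follow from a step size in data units, the helper below lists the multiples of a step that fall inside an axis range. The function name `grid_lines` and the example ranges are assumptions for illustration; integer inputs are used to avoid floating-point drift.

```python
def grid_lines(axis_min, axis_max, step):
    """Return the multiples of `step` that lie within [axis_min, axis_max].

    Intended for integer inputs (e.g. whole degrees or kilometers).
    """
    if step <= 0:
        raise ValueError("step must be positive")
    # Smallest multiple of step that is >= axis_min (ceiling division).
    first = -(-axis_min // step) * step
    positions = []
    value = first
    while value <= axis_max:
        positions.append(value)
        value += step
    return positions


# Hypothetical axis ranges: a temperature axis in degrees Fahrenheit
# with a grid line every 10 degrees, and a distance axis every 5 km.
temperature_grid = grid_lines(32, 95, 10)
distance_grid = grid_lines(0, 22, 5)

print(temperature_grid)
print(distance_grid)
```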
Using Color Scales
Color as a Tool to Highlight

• Color can also be an effective tool to highlight specific
elements in the data. There may be specific categories or
values in the dataset that carry key information about the
story we want to tell, and we can strengthen the story by
emphasizing the relevant figure elements to the reader. An
easy way to achieve this emphasis is to color these figure
elements in a color or set of colors that vividly stand out
against the rest of the figure. This effect can be achieved
with accent color scales, which are color scales that contain
both a set of subdued colors and a matching set of stronger,
darker, and/or more saturated colors (Figure 4-7).
• As an example of how the same data can support differing
stories with different coloring approaches, I have created a
variant of Figure 4-2 where now I highlight two specific
states, Texas and Louisiana (Figure 4-8). Both states are in
the South, they are immediate neighbors, and yet one state
(Texas) was the fifth fastest growing state within the US
from 2000 to 2010 whereas the other (Louisiana) was the third
slowest growing.
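One possible way to apply an accent color scale in code is to map highlighted categories to strong colors and everything else to a single subdued color. The hex values, the helper name `assign_colors`, and the two extra states are assumptions chosen for illustration, not the colors used in the figures.

```python
# Hypothetical colors: one subdued gray for context, stronger colors for accents.
SUBDUED = "#B3B3B3"
ACCENTS = ["#D55E00", "#0072B2"]


def assign_colors(categories, highlighted):
    """Map each category to an accent color if highlighted, else to the subdued color."""
    accent_for = {
        name: ACCENTS[i % len(ACCENTS)] for i, name in enumerate(highlighted)
    }
    return {name: accent_for.get(name, SUBDUED) for name in categories}


# Texas and Louisiana are the states highlighted in the text;
# the other two entries are illustrative extras.
states = ["Texas", "Louisiana", "Oklahoma", "Arkansas"]
colors = assign_colors(states, highlighted=["Texas", "Louisiana"])

for state in states:
    print(state, colors[state])
```

The resulting dictionary can be passed to a plotting library as the per-element color list, so the highlighted elements stand out against the gray context.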
• In many scenarios, we are interested in the magnitude of some set of
numbers.
• For example, we might want to visualize the total sales volume of
different brands of cars, or the total number of people living in
different cities, or the age of Olympians performing different sports.
• In all these cases, we have a set of categories (e.g., brands of cars,
cities, or sports) and a quantitative value for each category.
• I refer to these cases as visualizing amounts, because the main
emphasis in these visualizations will be on the magnitude of the
quantitative values.
• The standard visualization in this scenario is the bar plot, which has
several variations, including simple bars as well as grouped and
stacked bars.
• Alternatives to the bar plot are the dot plot and the heatmap.
Bar Charts
• A bar chart (aka bar graph, column chart) plots numeric values for levels of a categorical feature
as bars. Levels are plotted on one chart axis, and values are plotted on the other axis. Each
categorical value claims one bar, and the length of each bar corresponds to the bar’s value. Bars
are plotted on a common baseline to allow for easy comparison of values.
• The primary variable of a bar chart is its categorical variable. A categorical variable takes discrete
values, which can be thought of as labels.
• Examples include state or country, industry type, website access method (desktop, mobile), and
visitor type (free, basic, premium).
• Some categorical variables have ordered values, like dividing objects by size (small, medium,
large).
• In addition, some non-categorical variables can be converted into groups, like aggregating
temporal data based on date (e.g., dividing by quarter into 20XX-Q1, 20XX-Q2, 20XX-Q3,
20XX-Q4, etc.)
• The important point for this primary variable is that the groups are distinct.
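Converting dates into quarter groups, as mentioned above, might look like the following sketch. The helper name `quarter_label` and the sample dates are invented for illustration.

```python
from datetime import date


def quarter_label(d):
    """Return a label such as '2023-Q1' for the calendar quarter containing d."""
    quarter = (d.month - 1) // 3 + 1
    return f"{d.year}-Q{quarter}"


# Hypothetical dates, invented for illustration.
dates = [date(2023, 1, 15), date(2023, 3, 31), date(2023, 4, 1), date(2023, 11, 20)]
labels = [quarter_label(d) for d in dates]

print(labels)
```

Each date falls into exactly one quarter, so the resulting groups are distinct, which is the property the primary variable needs.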
• In contrast, the secondary variable will be numeric in nature.
• The secondary variable’s values determine the length of each
bar.
• These values can come from a great variety of sources. In its
simplest form, the values may be a simple frequency count or
proportion for how much of the data is divided into each
category – not an actual data feature at all.
• For example, the following plot counts pageviews over a period
of six months. You can see from this visualization that there
was a small peak in June and July before returning to the
previous baseline.
• Other times, the values may be an average, total, or some other
summary measure computed separately for each group.
• In the following example, the height of each bar depicts the average
transaction size by method of payment.
• Note that while the average payments are highest with checks, it
would take a different plot to show how often customers actually
use them.
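The two kinds of bar values described above, a frequency count per category and a summary measure per category, can be computed with the standard library. The transaction records below are invented for illustration and are not the data behind the plots.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical transactions as (payment method, amount); invented for illustration.
transactions = [
    ("card", 20.0), ("card", 30.0), ("cash", 5.0),
    ("cash", 15.0), ("cash", 10.0), ("check", 100.0),
]

# Bar values as frequency counts: how many records fall into each category.
counts = Counter(method for method, _ in transactions)

# Bar values as a summary measure: the average amount within each category.
amounts_by_method = defaultdict(list)
for method, amount in transactions:
    amounts_by_method[method].append(amount)
averages = {method: mean(amounts) for method, amounts in amounts_by_method.items()}

print(dict(counts))
print(averages)
```

In this made-up data the check has the highest average but the lowest count, which mirrors the point that average size and frequency of use need separate plots.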
Bar Plots

• To motivate the concept of a bar plot, consider the total
ticket sales for the most popular movies on a given
weekend. Table 6-1 shows the top five highest-grossing
films for the weekend before Christmas in 2017.
• Star Wars: The Last Jedi was by far the most popular movie on
that weekend, outselling the fourth- and fifth-ranked
movies, The Greatest Showman and Ferdinand, by almost a
factor of 10.
• This kind of data is commonly visualized with vertical bars.
For each movie, we draw a bar that starts at zero and
extends all the way to the dollar value for that movie’s
weekend gross (Figure 6-1). This visualization is called a bar
plot or bar chart.
• One problem we commonly encounter with vertical bars is
that the labels identifying each bar take up a lot of
horizontal space. In fact, I had to make Figure 6-1 fairly
wide and space out the bars so that I could place the movie
titles underneath. To save horizontal space, we could place
the bars closer together and rotate the labels (Figure 6-2).
However, I am not a big proponent of rotated labels. I find
the resulting plots awkward and difficult to read. And, in
my experience, whenever the labels are too long to place
horizontally, they also don’t look good rotated.
• The better solution for long labels is usually to swap
the x and y axes, so that the bars run horizontally (Figure 6-3).
After swapping the axes, we obtain a compact figure in
which all visual elements, including all text, are
horizontally oriented. As a result, the figure is much easier
to read than Figure 6-2 or even Figure 6-1.
• This example bar chart depicts the number of purchases made
on a site by different types of users. The categorical feature,
user type, is plotted on the horizontal axis, and each bar’s
height corresponds to the number of purchases made under
each user type.
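Swapping the axes so that bars run horizontally can be sketched with matplotlib's `barh`. This is a minimal sketch, not the code behind the figures: the labels and values are placeholders, and the helper names are assumptions. Because `barh` draws the first item at the bottom, the data is sorted ascending so the largest bar ends up on top.

```python
def order_for_horizontal_bars(values_by_label):
    """Return (labels, values) sorted ascending by value.

    Matplotlib's barh places the first item at the bottom of the axis,
    so ascending order puts the largest bar at the top.
    """
    items = sorted(values_by_label.items(), key=lambda item: item[1])
    labels = [label for label, _ in items]
    values = [value for _, value in items]
    return labels, values


def plot_horizontal_bars(values_by_label, xlabel):
    """Draw horizontal bars on a common zero baseline and return the figure."""
    # Imported here so the ordering helper can be used without matplotlib.
    import matplotlib.pyplot as plt

    labels, values = order_for_horizontal_bars(values_by_label)
    fig, ax = plt.subplots(figsize=(6, 3))
    ax.barh(labels, values)   # long labels stay horizontal on the y axis
    ax.set_xlim(left=0)       # bars start at zero
    ax.set_xlabel(xlabel)
    fig.tight_layout()
    return fig


# Placeholder labels and values, invented for illustration.
sales = {
    "Example Film With a Long Title": 72,
    "Another Rather Long Film Title": 36,
    "Short Title": 8,
}
labels, values = order_for_horizontal_bars(sales)
print(labels, values)
```

Calling `plot_horizontal_bars(sales, "Value")` and then `fig.savefig(...)` on the result would write the chart to a file.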
Use a common zero-valued baseline
Include value annotations
• Data for grouped bar charts usually come in a tabular form like
the one above.
• The first column indicates the levels of the primary categorical
variable, while the second and subsequent columns correspond
with each level of the secondary categorical variable.
• The numeric variables in the cells indicate the height of each
bar; bars are plotted by row to generate the bar groups.
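To illustrate how such a table turns into bar groups, the sketch below reads one row per primary level and computes an x position for every bar so that each group is centered on its own tick. The table contents, the names `table` and `grouped_bar_positions`, and the group width of 0.8 are assumptions for illustration.

```python
# Hypothetical table: one row per primary level, one column per secondary level.
table = {
    "Group 1": {"A": 3, "B": 5},
    "Group 2": {"A": 4, "B": 2},
    "Group 3": {"A": 6, "B": 1},
}


def grouped_bar_positions(n_groups, n_series, group_width=0.8):
    """Return (positions, bar_width), where positions[j][i] is the x center
    of series j's bar in group i; group i is centered on x = i."""
    bar_width = group_width / n_series
    positions = []
    for j in range(n_series):
        offset = (j - (n_series - 1) / 2) * bar_width
        positions.append([i + offset for i in range(n_groups)])
    return positions, bar_width


rows = list(table)                 # primary categorical levels
columns = list(table[rows[0]])     # secondary categorical levels
# One list of bar heights per secondary level, read row by row.
heights = [[table[row][col] for row in rows] for col in columns]
positions, bar_width = grouped_bar_positions(len(rows), len(columns))

for col, xs, hs in zip(columns, positions, heights):
    print(col, xs, hs)
```

Each (positions, heights) pair could then be handed to a bar-drawing call, one call per secondary level, with a consistent color per level.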
Choosing efficient colors
Stacked bar chart and grouped bar chart

• Bar charts can be extended when we introduce a second categorical
variable to divide each of the groups in the original categorical
variable.
• If the bar values depict group frequencies, the second categorical
variable can divide each bar’s count into subgroups.
• Applied to the original bars, this results in a stacked bar chart, seen
on the left in the figure below.
• Alternatively, if we move the different subgroups’ bars to the
baseline, the resulting chart type is the grouped bar chart, seen on
the right.
• We also use the grouped bar chart when we compute statistical
summary measures across levels of two categorical variables.
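The difference between stacking and grouping comes down to where each sub-bar starts. In a stacked bar chart, each segment begins where the previous one ended; in a grouped chart, every bar starts at the baseline. The sketch below computes the starting heights for the stacked case, using invented counts and an assumed helper name `stack_bottoms`.

```python
def stack_bottoms(series_values):
    """Given one list of per-group values for each subgroup, return
    (bottoms, totals): bottoms[j][i] is where subgroup j's segment starts
    in group i, and totals[i] is the full bar height of group i."""
    n_groups = len(series_values[0])
    running = [0] * n_groups
    bottoms = []
    for values in series_values:
        bottoms.append(list(running))
        running = [r + v for r, v in zip(running, values)]
    return bottoms, running


# Hypothetical counts: three subgroups across two primary groups.
subgroup_counts = [
    [3, 5],   # subgroup 1
    [2, 1],   # subgroup 2
    [4, 4],   # subgroup 3
]
bottoms, totals = stack_bottoms(subgroup_counts)

print(bottoms)
print(totals)
```

For the grouped version of the same data, every bottom would simply be 0 and the bars would be placed side by side instead.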
Grouped bar chart
• A grouped bar chart (aka clustered bar chart, multi-series bar
chart) extends the bar chart, plotting numeric values for levels
of two categorical variables instead of one. Bars are grouped by
position for levels of one categorical variable, with color
indicating the secondary category level within each group.

• The grouped bar chart above compares new quarterly revenue for four sales
representatives across a year.
• One bar cluster is plotted for each quarter, and in each cluster, one bar for each
representative.
• Colors and positions are consistent within each cluster: for example, we can see that Kent
is always in blue and plotted first.
• We can see from the plot that Lincoln had the best performance in Q1 with Kent best in
all remaining quarters.
• We can also check individual performances such as Mersey’s relatively stable
performance across the year, or York’s major bump in Q4 after a slide from Q1 through
Q3.
Pie Charts
• If the values in a bar chart represent parts of a whole (the sum
of bar lengths totals the number of data points or 100%), then
an alternative chart type you could use is the pie chart. While
the pie chart is much-maligned, it still fills a niche when there
are few categories to plot and the parts-to-whole division needs
to be put front and center. Still, you are most likely to use a
bar chart in general usage, as it makes comparisons between
categories easier.
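The parts-of-a-whole condition can be checked numerically: each category's share is its value divided by the total, and a pie wedge spans the same fraction of 360 degrees. The counts and the helper name `shares_and_angles` are assumptions for illustration.

```python
def shares_and_angles(counts):
    """Return ({category: percent of total}, {category: wedge angle in degrees})."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("total must be positive")
    percents = {name: 100 * value / total for name, value in counts.items()}
    angles = {name: 360 * value / total for name, value in counts.items()}
    return percents, angles


# Hypothetical visit counts by access method, invented for illustration.
visits = {"desktop": 60, "mobile": 30, "tablet": 10}
percents, angles = shares_and_angles(visits)

print(percents)
print(angles)
```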
Histograms
• Histograms are a close cousin of bar charts that depict
frequency values. While a bar chart’s primary variable is
categorical in nature, a histogram’s primary variable is
continuous and numeric. The bars in a histogram are typically
placed right next to each other to emphasize this continuous
nature; bar charts, by contrast, usually have some space between
bars to emphasize the categorical nature of the primary variable.
• A histogram is a chart that plots the distribution of a numeric
variable’s values as a series of bars. Each bar typically covers a
range of numeric values called a bin or class; a bar’s height
indicates the frequency of data points with a value within the
corresponding bin.
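Binning can be sketched as follows: each value is assigned to the bin whose range contains it, and the bar height is the number of values per bin. The helper name `histogram_counts`, the half-open bin convention, and the sample values are assumptions for illustration.

```python
def histogram_counts(values, start, bin_width, n_bins):
    """Count values per bin, where bin k covers
    [start + k * bin_width, start + (k + 1) * bin_width).
    Values outside the covered range are ignored."""
    counts = [0] * n_bins
    for v in values:
        k = int((v - start) // bin_width)
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts


# Hypothetical response times in hours, invented for illustration.
response_hours = [0.5, 1.2, 2.1, 2.4, 2.9, 3.5, 13.2]
counts = histogram_counts(response_hours, start=0, bin_width=1, n_bins=4)

print(counts)  # one count per one-hour bin from 0 to 4 hours
```

Whether a value on a bin edge belongs to the lower or upper bin is a convention; this sketch puts it in the upper bin, and plotting libraries may choose differently for the last bin.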

• The histogram above shows a frequency distribution for time to response for tickets sent
into a fictional support system. Each bar covers one hour of time, and the height indicates
the number of tickets in each time range. We can see that the largest frequency of
responses was in the 2-3 hour range, with a longer tail to the right than to the left.
There is also a smaller hill whose peak (mode) is in the 13-14 hour range. If we looked only at
numeric statistics like mean and standard deviation, we might miss the fact that
these two peaks both contribute to the overall statistics.
• Histograms are good for showing general distributional
features of dataset variables. You can see roughly where the
peaks of the distribution are, whether the distribution is
skewed or symmetric, and if there are any outliers.
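A small sketch can show why a summary statistic may hide a two-peaked shape. The bin counts below are invented to resemble a distribution with a main peak at 2-3 hours and a smaller one at 13-14 hours; they are not the data behind the histogram, and the helper name `local_peaks` is an assumption.

```python
def local_peaks(counts):
    """Return indices of bins whose count exceeds both neighboring bins."""
    padded = [0] + list(counts) + [0]
    return [
        i - 1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    ]


# Hypothetical counts for one-hour bins 0-1, 1-2, ..., 14-15 hours.
bin_counts = [2, 5, 9, 6, 3, 1, 0, 0, 0, 0, 0, 0, 1, 4, 2]

peaks = local_peaks(bin_counts)

# Approximate mean using each bin's midpoint (k + 0.5 hours).
total = sum(bin_counts)
mean_hours = sum(c * (k + 0.5) for k, c in enumerate(bin_counts)) / total

print("peak bins start at hours:", peaks)
print("approximate mean (hours):", round(mean_hours, 2))
```

In this made-up data the mean falls between the two peaks, in a bin that holds almost no tickets, which is the kind of feature a histogram reveals and a single number does not.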
Heatmap
