DVT UNIT-V
1. Multiple Quantities
📍 Scatterplots
Definition:
A scatterplot is a graph that shows the relationship between two numeric (quantitative) variables by using dots.
Each dot shows one observation or item.
Purpose:
Used to find out if two things are related, for example, does more sales mean more profit?
Important Principles (Ben Jones):
o Know Your Goal:
Are you checking for correlation, finding outliers, or checking the impact of another factor?
o Use Correct Visualization:
Scatterplots need two quantitative (numeric) variables, one on each axis.
o Clarity & Aesthetics:
Label X and Y axes properly with units.
Axes need not start at zero if doing so would hide useful detail, but indicate a truncated axis clearly.
Make sure dots are visible; avoid too many overlapping dots:
Make dots smaller.
Add transparency (alpha).
Use outlines.
Sample fewer points if needed.
Use jitter (small random movement).
o Using Color, Size, Shape:
Add a third variable (like Region or Discount) by changing color or size of dots.
o Annotations:
Mark special points, such as the biggest sale or a loss-making order.
o Check the Output:
Make sure your scatterplot clearly shows the pattern or the message you want.
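The jitter technique listed above can be sketched in a few lines. This is a minimal illustration, not a prescribed method: the function name `jitter`, the `amount` parameter, and the discount values are assumptions made up for the example.

```python
import random

def jitter(values, amount, seed=None):
    """Return a copy of values with uniform noise in [-amount, +amount] added.

    The noise is only for display: it separates overlapping dots and should
    stay small relative to the spacing of the real values.
    """
    rng = random.Random(seed)  # a fixed seed makes the plot reproducible
    return [v + rng.uniform(-amount, amount) for v in values]

# Hypothetical data: many orders share the same discount level, so dots overlap.
discounts = [0.0, 0.0, 0.1, 0.1, 0.1, 0.2, 0.2]
jittered = jitter(discounts, amount=0.01, seed=42)
```

Plot the jittered values but keep the original values for any calculation, since jitter deliberately distorts the data slightly.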
Reading a Scatterplot:
o Positive Correlation: Dots rise from left to right.
o Negative Correlation: Dots fall from left to right.
o No Correlation: Dots are randomly scattered.
o Tight or Loose Clusters:
Tight = Strong relation; spread out = Weak relation.
o Shape:
Straight line = Linear; Curved = Non-linear.
o Outliers:
Dots far away from others — special cases needing investigation.
Warning:
Scatterplots can only show correlation — they do not prove one thing causes the other.
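The direction and strength of a linear relationship read from a scatterplot can also be summarized with a number. The sketch below computes the Pearson correlation coefficient r (+1 = perfect positive, -1 = perfect negative, near 0 = no linear relation). The function name and the sample figures are illustrative assumptions, not data from these notes.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of X and Y, and the variation of each on its own.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values for illustration only.
sales = [100, 200, 300, 400, 500]
profit_rising = [10, 22, 29, 41, 50]    # dots rise from left to right
profit_falling = [50, 41, 29, 22, 10]   # dots fall from left to right

r_rising = pearson_r(sales, profit_rising)    # close to +1
r_falling = pearson_r(sales, profit_falling)  # close to -1
```

Pearson r only measures linear association, so a strong curved pattern can still give a value near 0. As the warning above says, a high r does not show causation.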
Example:
o Sales vs. Profit scatterplot.
o Color dots by Product Category.
o Add a horizontal line at Profit = 0 to separate profits and losses.
📍 Stacked Bars
Definition:
A stacked bar chart is a bar chart divided into segments, each showing part of the total.
Purpose:
To compare totals and also the components inside each total.
Types:
o Standard Stacked Bar:
Bars have different lengths based on total value.
o 100% Stacked Bar:
All bars are the same length; segments show each part's percentage share of the total.
Important Principles (Ben Jones):
o Know Your Goal:
Are you comparing totals or compositions (parts)?
o Use Correct Visualization:
If comparing parts, stacked bar is good; if only totals, simple bar may be better.
o Design for Clarity:
Label each part clearly.
Choose different colors for different parts.
Always stack segments in the same order for all bars.
o Limitation:
It’s harder to compare the middle and top segments across bars because they do not start from a common baseline (the bottom segment is easiest to compare).
Example:
o Sales in different regions stacked by product category (Furniture, Office Supplies,
Technology).
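To turn a standard stacked bar into a 100% stacked bar, each segment is divided by its bar's total. A minimal sketch follows, assuming made-up sales figures and a hypothetical helper named `to_percent_shares`.

```python
def to_percent_shares(segments):
    """Convert a {segment: value} mapping into percentage shares of the total."""
    total = sum(segments.values())
    if total == 0:
        raise ValueError("cannot compute shares when the total is zero")
    # Dict order is preserved, so segments keep the same stacking order.
    return {name: 100.0 * value / total for name, value in segments.items()}

# Hypothetical sales by region and product category (illustration only).
sales_by_region = {
    "East": {"Furniture": 200, "Office Supplies": 300, "Technology": 500},
    "West": {"Furniture": 300, "Office Supplies": 300, "Technology": 600},
}

shares = {region: to_percent_shares(parts)
          for region, parts in sales_by_region.items()}
# East: Furniture 20%, Office Supplies 30%, Technology 50%
# West: Furniture 25%, Office Supplies 25%, Technology 50%
```

The 100% version hides the fact that West has the larger total, which is why the goal (totals or composition) should be decided first.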
📍 Regression and Trend Lines
Definition:
o A Trend Line shows the overall direction of data points.
o A Regression Line is the best-fit line, calculated mathematically (usually by least squares).
Purpose:
o To find and explain trends.
o To check if the relation is positive, negative, or absent.
Key Points:
o Slope (m): How fast Y changes when X changes.
o Intercept (c): Y value when X = 0.
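o R² (R-squared): Share of the variation in Y that the trend line explains (0 to 1, i.e. 0%–100%); higher means a closer fit.
o P-value: Probability of seeing a trend at least this strong if there were no real relationship; p < 0.05 is a common threshold for calling the trend statistically significant.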
Principles:
o Add trend line only if it helps understanding.
o Choose right model (linear or curved).
o Always show the equation and statistics if possible.
o Trend line should not hide data points.
Example:
o Trend Line on scatterplot of Sales vs. Profit:
Positive slope → Higher sales = Higher profits (generally).
Moderate R² → Other factors also affect profit.
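The slope, intercept, and R² described above can be computed by ordinary least squares. The sketch below is one way to do it, assuming a straight-line model y = m·x + c; the function name `linear_fit` and the data are illustrative, and the p-value is left out because it needs a t-distribution that the standard library does not provide.

```python
def linear_fit(xs, ys):
    """Least-squares fit of y = m*x + c. Returns (slope, intercept, r_squared)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx                      # m: change in Y per unit change in X
    intercept = mean_y - slope * mean_x    # c: predicted Y when X = 0
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if ss_tot == 0:
        raise ValueError("all y values are identical; R-squared is undefined")
    r_squared = 1 - ss_res / ss_tot        # share of Y variation explained
    return slope, intercept, r_squared

# Hypothetical Sales vs. Profit values (illustration only).
sales = [100, 200, 300, 400, 500]
profit = [12, 18, 35, 38, 52]
m, c, r2 = linear_fit(sales, profit)
# m = 0.1, c = 1.0, r2 is about 0.965
```

Here a slope of 0.1 means each extra unit of sales goes with about 0.1 more profit in this made-up data.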
📍 Quadrant Chart
Definition:
A scatterplot with two crossing reference lines that divide the plot into 4 sections (quadrants).
Purpose:
To categorize observations into groups for better strategic decisions.
Key Points:
o Choose important X and Y variables.
o Choose meaningful reference lines (mean, median, zero, or benchmark).
o Label quadrants clearly.
o Explain how reference lines were set.
Quadrant Interpretation (assuming X = Sales, Y = Profit):
o Top-Right: High on both axes → Best performers.
o Top-Left: High Y, Low X → Good profit but low sales.
o Bottom-Left: Low on both axes → Poor performance.
o Bottom-Right: High X, Low Y → High sales but low profit; needs attention.
Example:
o Quantity Sold vs. Profit Ratio for Products.
o Top-Right quadrant → "Star Products."
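Assigning each observation to a quadrant is a pair of comparisons against the reference lines. A minimal sketch, assuming the medians are used as reference lines and that points lying exactly on a line count as "high"; the product names and numbers are hypothetical.

```python
from statistics import median

def quadrant(x, y, x_ref, y_ref):
    """Name the quadrant of point (x, y) relative to the two reference lines."""
    high_x = x >= x_ref   # assumption: a point on the line counts as high
    high_y = y >= y_ref
    if high_x and high_y:
        return "Top-Right"
    if not high_x and high_y:
        return "Top-Left"
    if not high_x and not high_y:
        return "Bottom-Left"
    return "Bottom-Right"

# Hypothetical products: (quantity sold, profit ratio).
products = {"A": (120, 0.30), "B": (20, 0.25), "C": (15, 0.02), "D": (150, 0.05)}

x_ref = median([qty for qty, _ in products.values()])      # 70.0
y_ref = median([ratio for _, ratio in products.values()])  # about 0.15

labels = {name: quadrant(qty, ratio, x_ref, y_ref)
          for name, (qty, ratio) in products.items()}
# A is Top-Right ("Star Product"), B Top-Left, C Bottom-Left, D Bottom-Right
```

Swapping the median for a mean, zero, or a business benchmark only changes how `x_ref` and `y_ref` are set, which is why the choice should be explained on the chart.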
2. Changes Over Time
📍 Line Chart
Definition:
A chart connecting data points over time using lines.
Purpose:
To show trends, patterns, seasonality, and overall movement across time.
Construction:
o X-axis: Time (continuous: Year, Quarter, Month).
o Y-axis: Measures like Sales, Profit.
Interpretation:
o Upward slope = Increasing trend.
o Downward slope = Decreasing trend.
o Peaks and valleys that repeat at regular intervals = Seasonal variations.
Tips:
o A zero baseline is not always required.
o Annotate important events.
Example:
o Monthly Sales Trend.
📍 Dual-Axis Line Chart
Definition:
Line chart with two Y-axes showing two different measures.
Purpose:
To compare two trends together over time.
Caution:
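o The choice of scale for each axis can mislead (the two lines may appear more or less related than they are).
o Label both axes carefully and explain which line belongs to which axis.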
Example:
o Sales and Quantity Sold over the years.
📍 Connected Scatterplot
Definition:
A scatterplot where points are connected based on time order.
Purpose:
To show both the relationship between two variables and how it changes over time.
Steps:
o Plot Sales vs. Profit.
o Connect points by Date.
Tip:
o Keep it simple with fewer points.
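The two steps above amount to sorting the observations by date and then joining consecutive points. A minimal sketch, assuming each record is a (date, sales, profit) tuple; the helper name `connect_by_date` and the figures are illustrative.

```python
from datetime import date

def connect_by_date(records):
    """Sort (date, sales, profit) records by date and return line segments.

    Each segment is a pair of consecutive (sales, profit) points.
    """
    ordered = sorted(records, key=lambda record: record[0])
    points = [(sales, profit) for _, sales, profit in ordered]
    return list(zip(points, points[1:]))

# Hypothetical monthly figures, deliberately out of order.
records = [
    (date(2023, 3, 1), 300, 40),
    (date(2023, 1, 1), 100, 10),
    (date(2023, 2, 1), 200, 15),
]
segments = connect_by_date(records)
# [((100, 10), (200, 15)), ((200, 15), (300, 40))]
```

Connecting points in the wrong order produces a meaningless tangle, so the sort by date is the essential step.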
📍 Date Field Type and Seasonality
Understanding Date Field:
o Continuous Dates: Smooth time flow (good for trends).
o Discrete Dates: Separate categories (good for comparison).
Finding Seasonality:
o Check for repeating patterns.
o Add a Moving Average to smooth out short-term fluctuations.
Example:
o Observe Sales patterns across months or years.
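A moving average replaces each value with the mean of the last few values, which smooths short-term noise so the underlying pattern is easier to see. The sketch below uses a trailing window; the function name, window size, and sales values are assumptions for illustration.

```python
def moving_average(values, window):
    """Trailing moving average: mean of each run of `window` consecutive values.

    The result has len(values) - window + 1 entries because the first
    window - 1 positions do not have enough history.
    """
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]

# Hypothetical monthly sales (illustration only).
monthly_sales = [10, 20, 30, 20, 10, 20, 30]
smoothed = moving_average(monthly_sales, window=3)
# First value: (10 + 20 + 30) / 3 = 20.0
```

A larger window smooths more but also hides short seasonal cycles, so the window should be chosen with the expected season length in mind.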
📍 Timeline
Definition:
A graphical representation of events or changes in chronological order.
Purpose:
To highlight key milestones or important periods.
Steps:
o Date on X-axis.
o Sales or Profit on Y-axis.
o Add annotations for key events.
📍 Slope Graph
Definition:
A simple graph to compare two time points for multiple categories using sloped lines.
Purpose:
To show increase or decrease across time for categories.
Steps:
o Plot Year (start and end) on X-axis.
o Plot Measure (Sales, Profit) on Y-axis.
o Draw lines for each Category or Region.
o Label start and end points.
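Before drawing a slope graph, it helps to pair up each category's start and end values and note the direction of change. A minimal sketch, assuming two dictionaries keyed by category; the helper name `slope_changes` and the values are hypothetical.

```python
def slope_changes(start, end):
    """Pair start and end values per category and label the direction of change.

    Only categories present in both periods are kept, since a slope line
    needs both end points.
    """
    changes = {}
    for category, start_value in start.items():
        if category not in end:
            continue
        end_value = end[category]
        diff = end_value - start_value
        direction = "up" if diff > 0 else "down" if diff < 0 else "flat"
        changes[category] = (start_value, end_value, direction)
    return changes

# Hypothetical regional sales for a start year and an end year.
sales_start = {"East": 100, "West": 150, "South": 80}
sales_end = {"East": 140, "West": 120, "South": 80}

changes = slope_changes(sales_start, sales_end)
# East: (100, 140, "up"), West: (150, 120, "down"), South: (80, 80, "flat")
```

Each tuple gives the two points to plot and label, and the direction can drive line color (for example, one color for increases and another for decreases).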
3. Maps and Locations
📍 Circle Maps
Definition:
Maps with circles sized by a measure (e.g., Sales).
Purpose:
o Compare data across locations.
Example:
o State-wise Sales shown with different sized circles.
Tips:
o Adjust circle size to avoid overlap.
o Add labels and interactive tooltips.
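One common convention for sizing circles (an assumption here, not something these notes prescribe) is to make the circle's area, rather than its radius, proportional to the value, so that a state with four times the sales does not look sixteen times bigger. A minimal sketch with a hypothetical `circle_radius` helper:

```python
import math

def circle_radius(value, max_value, max_radius):
    """Radius such that circle area is proportional to value.

    The largest value gets max_radius; lowering max_radius is one way to
    reduce overlap between neighbouring circles.
    """
    if value < 0 or max_value <= 0:
        raise ValueError("value must be >= 0 and max_value must be > 0")
    return max_radius * math.sqrt(value / max_value)

# Hypothetical state sales (illustration only).
state_sales = {"State A": 100, "State B": 25, "State C": 0}
largest = max(state_sales.values())
radii = {state: circle_radius(sales, largest, max_radius=20)
         for state, sales in state_sales.items()}
# State A: 20.0, State B: 10.0 (a quarter of the sales gives half the radius)
```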
📍 Filled Maps
Definition:
Maps where areas (states, countries) are shaded by a value (like Sales).
Purpose:
o Spot patterns geographically.
Example:
o State-wise Sales shown by color (dark blue = high, light blue = low).
Tips:
o Use a color range that suits the data (e.g., light-to-dark shades for low-to-high values).
o Use maps only if geography matters.
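Choosing a color range usually means grouping values into a small number of classes and giving each class a shade. The sketch below uses equal-width intervals, which is only one possible scheme; the function name `color_class`, the three-shade palette, and the sales values are assumptions for illustration.

```python
def color_class(value, low, high, n_classes):
    """Index (0 .. n_classes - 1) of the equal-width class that value falls in."""
    if high <= low or n_classes < 1:
        raise ValueError("need high > low and at least one class")
    fraction = (value - low) / (high - low)
    index = int(fraction * n_classes)
    # Clamp so that value == high (or out-of-range values) stays in range.
    return min(max(index, 0), n_classes - 1)

# Light-to-dark palette, matching "light blue = low, dark blue = high".
palette = ["light blue", "medium blue", "dark blue"]

# Hypothetical state sales (illustration only).
state_sales = {"State A": 100, "State B": 550, "State C": 1000}
low, high = min(state_sales.values()), max(state_sales.values())

state_colors = {
    state: palette[color_class(sales, low, high, len(palette))]
    for state, sales in state_sales.items()
}
# State A: light blue, State B: medium blue, State C: dark blue
```

With heavily skewed data, equal-width classes can put almost every area in the lightest shade, in which case quantile-based classes may show the pattern better.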
📍 Dual-Encoded Maps
Definition:
Map using two visual encodings together: Size + Color.
Purpose:
o Show two measures at once (e.g., Sales and Profit).
Example:
o Size = Sales volume, Color = Profitability.
Tips:
o Keep encodings clear.
o Add tooltips and legends.