STATISTICS 1
DATA REPRESENTATION AND ANALYSIS
• Types of Data
• Qualitative (categorical): Non-numerical (e.g., eye colour,
type of school).
• Quantitative (numerical): Values that are counted or measured.
• Discrete: Countable (e.g., number of students).
• Continuous: Measured (e.g., height, time).
• Data Collection Methods
• Primary data: Collected directly (e.g., survey, experiment).
• Secondary data: Already collected (e.g., published reports,
databases).
• Sampling methods: Random, stratified, systematic, quota,
convenience.
• Good sampling avoids bias and ensures
representativeness.
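Three of the sampling methods listed above can be sketched in a few lines of Python. This is an illustrative sketch only: the population of 100 numbered students, the two strata, and the function names are assumptions made for the example, not part of the notes.

```python
import random

def simple_random_sample(population, n, seed=0):
    """Every member has an equal chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, n, seed=0):
    """Pick a random start, then take every k-th member."""
    step = len(population) // n
    rng = random.Random(seed)
    start = rng.randrange(step)
    return population[start::step][:n]

def stratified_sample(strata, n, seed=0):
    """Sample from each stratum in proportion to its size (rounded)."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = []
    for members in strata.values():
        k = round(n * len(members) / total)
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: students numbered 1 to 100.
students = list(range(1, 101))
strata = {"lower school": students[:60], "upper school": students[60:]}

random_pick = simple_random_sample(students, 10)
systematic_pick = systematic_sample(students, 10)
stratified_pick = stratified_sample(strata, 10)  # 6 lower, 4 upper
```

Quota and convenience sampling are not shown because they depend on the judgement of whoever collects the data rather than on a random mechanism.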
DATA REPRESENTATION METHODS
1. TABLES (FREQUENCY DISTRIBUTION TABLES)
• Advantages:
• Easy to construct.
• Good for summarizing large sets of raw numbers.
• Shows exact frequencies.
• Disadvantages:
• Not very visual—hard to see trends quickly.
• Large tables may look overwhelming.
2. BAR CHARTS (FOR CATEGORICAL OR DISCRETE DATA)
• Advantages:
• Clear and simple to understand.
• Easy to compare different categories.
• Disadvantages:
• Not suitable for continuous data.
• Can mislead if bar widths are inconsistent or scales
distorted.
3. HISTOGRAMS (FOR CONTINUOUS DATA)
• Advantages:
• Shows distribution shape (normal, skewed,
etc.).
• Handles large data sets well.
• Disadvantages:
• Exact values are lost (grouped).
• Choice of class intervals can affect
interpretation.
4. PIE CHARTS
• Advantages:
• Easy to visualize relative proportions.
• Works well for survey or categorical data.
• Disadvantages:
• Hard to compare slices with close sizes.
• Not effective with many categories.
5. LINE GRAPHS
• Advantages:
• Excellent for showing patterns and changes over time.
• Good for making predictions.
• Disadvantages:
• Not suitable for categorical data.
• Can exaggerate trends if axes aren’t scaled
carefully.
6. SCATTER DIAGRAMS
• Advantages:
• Makes correlation visible (positive, negative,
none).
• Useful for identifying outliers.
• Disadvantages:
• Doesn’t prove cause and effect.
• Dense plots can be hard to interpret.
7. STEM-AND-LEAF DIAGRAMS
• Advantages:
• Retains all raw data values.
• Quick to construct for small sets.
• Disadvantages:
• Not practical for large data sets.
• Less familiar to many audiences.
SUMMARY
• Categorical or discrete data: Bar charts, pie charts.
• Continuous data: Histograms, line graphs.
• Relationships: Scatter diagrams.
• Exact data values: Tables, stem-and-leaf.
DRAWING & INTERPRETING
STATISTICAL DIAGRAMS
• 1. Stem-and-Leaf Diagrams
• Drawing
• Split numbers into a stem (leading digits) and a leaf
(last digit).
• Write stems in a vertical column, leaves in ascending
order.
• Include a key (e.g., “2 | 5 = 25”).
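The drawing steps above can be sketched in Python. This is a minimal sketch that assumes non-negative whole numbers with a single-digit leaf; the marks list and the function name are made up for illustration.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return the rows of a stem-and-leaf diagram as strings."""
    rows = defaultdict(list)
    for v in sorted(values):          # sorting keeps leaves in ascending order
        stem, leaf = divmod(v, 10)    # stem = leading digits, leaf = last digit
        rows[stem].append(leaf)
    lines = []
    # Include every stem between the smallest and largest, even if it is empty.
    for stem in range(min(rows), max(rows) + 1):
        leaves = " ".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines

# Hypothetical test marks.
marks = [25, 31, 18, 22, 25, 40, 37, 12]
diagram = stem_and_leaf(marks)
for line in diagram:
    print(line)
print("Key: 2 | 5 = 25")
# 1 | 2 8
# 2 | 2 5 5
# 3 | 1 7
# 4 | 0
# Key: 2 | 5 = 25
```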
INTERPRETING
• Shows raw data clearly.
• Easy to spot median, mode, range.
• Useful for small to medium data sets.
• Advantage: Retains all actual values.
• Disadvantage: Becomes messy with large data.
2. BOX-AND-WHISKER PLOTS (BOXPLOTS)
• Drawing
• Based on the five-number summary:
• Minimum, Q1 (lower quartile), Median, Q3
(upper quartile), Maximum.
• Draw a box from Q1 to Q3, with a line at the
median.
• Extend whiskers to the min and max.
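The five-number summary behind the plot can be computed as in the sketch below. Quartile conventions differ between textbooks and software; this sketch assumes the "median of each half" method (leaving out the middle value when the count is odd), and the marks data are made up.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum).

    Assumed convention: Q1 and Q3 are the medians of the lower and
    upper halves, excluding the middle value when n is odd.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    return min(data), median(lower), median(data), median(upper), max(data)

# Hypothetical marks for one class.
marks = [12, 15, 17, 20, 22, 25, 28, 31, 40]
summary = five_number_summary(marks)
print(summary)  # (12, 16.0, 22, 29.5, 40)
```

The box would run from 16 to 29.5 with a line at 22, and the whiskers would reach 12 and 40.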
INTERPRETING
• Shows spread and skewness.
• Compares distributions between groups.
• Long whisker = more spread in that direction.
• Median closer to Q1 = positive (right) skew; median closer to Q3 = negative (left) skew.
• Advantage: Excellent for comparing data sets.
• Disadvantage: Raw data detail is lost.
3. HISTOGRAMS
• Drawing
• For continuous data divided into intervals.
• Area (not height) of each bar is proportional to
frequency.
• If class widths differ, use frequency density:
• Frequency Density = Frequency / Class Width
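A short worked example of the frequency density formula, using a made-up grouped table with unequal class widths. Each class is written as (lower boundary, upper boundary, frequency); the numbers are illustrative only.

```python
import math

def frequency_densities(classes):
    """Frequency density = frequency / class width, for each class."""
    return [freq / (upper - lower) for lower, upper, freq in classes]

# Hypothetical grouped data: (lower boundary, upper boundary, frequency).
classes = [(0, 10, 5), (10, 20, 12), (20, 40, 16), (40, 70, 6)]
densities = frequency_densities(classes)

for (lower, upper, freq), fd in zip(classes, densities):
    width = upper - lower
    # The bar's area (width x height) gives back the frequency.
    assert math.isclose(width * fd, freq)
    print(f"{lower}-{upper}: width {width}, frequency {freq}, density {fd}")
```

The class 20-40 has the largest frequency (16) but not the tallest bar, because its width is 20: its density is 0.8, below the 1.2 of the class 10-20.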
INTERPRETING
• Reveals shape of distribution: normal,
skewed, bimodal.
• Taller bars (higher frequency density) show where the data are most concentrated.
• Advantage: Handles large continuous data well.
• Disadvantage: Exact values are lost when grouped.
4. CUMULATIVE FREQUENCY GRAPHS
• Drawing
• Plot cumulative frequency against
the upper-class boundary of
each interval.
• Join points with a smooth curve.
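The plotting points, and the readings usually taken from the graph, can be sketched as follows. Note the substitution: the sketch uses straight-line (linear) interpolation between plotted points rather than a smooth curve, so its estimates may differ slightly from values read off a hand-drawn curve. The grouped data and function names are made up.

```python
from itertools import accumulate

def cumulative_points(classes):
    """Return (upper class boundary, cumulative frequency) pairs."""
    uppers = [upper for _, upper, _ in classes]
    totals = list(accumulate(freq for _, _, freq in classes))
    return list(zip(uppers, totals))

def estimate_percentile(classes, p):
    """Estimate the p-th percentile by linear interpolation."""
    # Start the graph at the first lower boundary with cumulative frequency 0.
    points = [(classes[0][0], 0)] + cumulative_points(classes)
    target = p / 100 * points[-1][1]
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= target <= c1 and c1 > c0:
            return x0 + (target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("percentile must be between 0 and 100")

# Hypothetical grouped data: (lower boundary, upper boundary, frequency).
classes = [(0, 10, 5), (10, 20, 12), (20, 40, 16), (40, 70, 7)]

points = cumulative_points(classes)          # [(10, 5), (20, 17), (40, 33), (70, 40)]
median_est = estimate_percentile(classes, 50)  # 23.75
q1_est = estimate_percentile(classes, 25)
q3_est = estimate_percentile(classes, 75)      # 36.25
```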
INTERPRETING
• Estimate median (50th percentile), quartiles, and
percentiles.
• Shape shows spread: steep slope = many data points
in that interval.
• Advantage: Useful for finding medians, quartiles, percentiles.
• Disadvantage: Cannot recover individual data values.
SUMMARY
• Stem-and-leaf: Good for raw values, small data.
• Boxplot: Compares spread & skew between sets.
• Histogram: Shows distribution shape for
continuous data.
• Cumulative frequency graph: Finds medians,
quartiles, percentiles.
MEASURES OF CENTRAL TENDENCY
• a) Mean
• Mean = Sum of all data values / Number of values
• Advantages: Uses all data; good for further
statistical calculations.
• Disadvantages: Affected by outliers and skewed
data.
• b) Median
• Middle value when data is ordered.
• For even number of values: average of two
middle values.
• Advantages: Not affected by extreme values.
• Disadvantages: Ignores actual data values
(except middle one).
• c) Mode
• Most frequent value (can be more than one).
• Advantages: Easy to identify; useful for
categorical data.
• Disadvantages: May not represent data well if
multiple modes or no clear mode.
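The three measures can be compared on one small data set using Python's standard `statistics` module. The data are made up, with one deliberately extreme value (30) to show how each measure reacts to an outlier.

```python
from statistics import mean, median, multimode

# Hypothetical data with one outlier (30).
data = [2, 3, 3, 4, 5, 5, 30]

data_mean = mean(data)        # 52 / 7, pulled upwards by the outlier
data_median = median(data)    # 4, the middle of the seven ordered values
data_modes = multimode(data)  # [3, 5], two values tie for most frequent

# Removing the outlier changes the mean a lot but the median only slightly.
trimmed = data[:-1]
trimmed_mean = mean(trimmed)      # 22 / 6
trimmed_median = median(trimmed)  # 3.5, the average of the two middle values
```

Here the mean (about 7.4) is larger than six of the seven values, which is why the median is usually preferred for skewed data.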
SUMMARY
• Use mean for symmetric, outlier-free data.
• Use median when data is skewed or has
outliers.
• Use mode for categorical/discrete data.
MEASURES OF VARIATION
(SPREAD OF THE DATA)
• a) Range
• Range = Maximum − Minimum
• Advantages: Simple, quick.
• Disadvantages: Affected by extremes;
ignores distribution of rest of data.
• b) Interquartile Range (IQR)
• IQR = Q3 − Q1
• Measures the spread of the middle 50% of the data.
• Advantages: Resistant to outliers; shows central spread.
• Disadvantages: Ignores extremes (which may be
relevant).
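The sketch below compares the range and the IQR on a made-up data set, then repeats the calculation with one extreme value to show why the IQR is described as resistant to outliers. It assumes the "median of each half" convention for quartiles; other conventions can give slightly different values.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumed convention: the middle value is left out when n is odd.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), median(upper)

def data_range(values):
    return max(values) - min(values)

def iqr(values):
    q1, q3 = quartiles(values)
    return q3 - q1

# Hypothetical data, then the same data with the largest value made extreme.
marks = [12, 15, 17, 20, 22, 25, 28, 31, 40]
with_outlier = [12, 15, 17, 20, 22, 25, 28, 31, 400]

print(data_range(marks), iqr(marks))                # 28 13.5
print(data_range(with_outlier), iqr(with_outlier))  # 388 13.5
```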
• c) Variance & Standard Deviation (SD)
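The notes do not give formulas under this heading, so the sketch below assumes the standard definitions: variance is the mean of the squared deviations from the mean, and standard deviation is its square root. Whether the course divides by n (population) or n − 1 (sample) is not stated here, so both are shown; the data are made up.

```python
from math import sqrt
from statistics import mean, pstdev, pvariance, stdev, variance

# Hypothetical data set.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Population variance worked out step by step (divide by n).
m = mean(data)                                 # 5
squared_deviations = [(x - m) ** 2 for x in data]
manual_variance = sum(squared_deviations) / len(data)   # 32 / 8 = 4.0
manual_sd = sqrt(manual_variance)                        # 2.0

# The same quantities from the standard library.
population_variance = pvariance(data)  # divides by n
population_sd = pstdev(data)
sample_variance = variance(data)       # divides by n - 1
sample_sd = stdev(data)
```

The sample versions are slightly larger than the population versions because they divide by 7 instead of 8.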