Statistics 1 Notes

The document covers data representation and analysis, detailing types of data (qualitative and quantitative), data collection methods, and various data representation techniques like tables, bar charts, histograms, and more. It explains the advantages and disadvantages of each method, as well as measures of central tendency and variation. The summary emphasizes the appropriate use of different statistical tools based on data characteristics.

Uploaded by Tristan Graham
Copyright © All Rights Reserved

STATISTICS 1

DATA REPRESENTATION AND ANALYSIS

• Types of Data
• Qualitative (categorical): Non-numerical (e.g., eye colour, type of school).
• Quantitative (numerical): Numerical values (counts or measurements).
• Discrete: Countable (e.g., number of students).
• Continuous: Measured (e.g., height, time).

• Data Collection Methods
• Primary data: Collected directly (e.g., survey, experiment).
• Secondary data: Already collected by someone else (e.g., published reports, databases).
• Sampling methods: Random, stratified, systematic, quota, convenience.
• Good sampling aims to avoid bias so that the sample is representative of the population.
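As an illustration, here is a minimal sketch of three of the sampling methods listed above (random, systematic, stratified) using Python's standard `random` module. The function names and the toy populations are hypothetical. The stratified version assumes proportional allocation and rounds each stratum's share down, so in general it can return slightly fewer than `n` items.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Every member has an equal chance of selection (no replacement)."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, n, seed=None):
    """Pick a random start, then take every k-th member, where k = N // n."""
    rng = random.Random(seed)
    k = len(population) // n
    start = rng.randrange(k)
    return [population[start + i * k] for i in range(n)]

def stratified_sample(strata, n, seed=None):
    """Sample from each stratum in proportion to its size (share rounded down)."""
    rng = random.Random(seed)
    total = sum(len(group) for group in strata.values())
    sample = []
    for group in strata.values():
        share = n * len(group) // total
        sample.extend(rng.sample(group, share))
    return sample

# Hypothetical population: 100 numbered students, 60 in Year 12 and 40 in Year 13
students = list(range(1, 101))
print(systematic_sample(students, 10, seed=1))
strata = {"Year 12": list(range(60)), "Year 13": list(range(60, 100))}
print(len(stratified_sample(strata, 10, seed=1)))  # 10 (6 from Year 12, 4 from Year 13)
```

Quota and convenience sampling are not shown because they are not random selection procedures, which is also why they are more prone to bias.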

DATA REPRESENTATION METHODS
1. TABLES (FREQUENCY DISTRIBUTION TABLES)

• Advantages:
• Easy to construct.
• Good for summarizing large sets of raw numbers.
• Shows exact frequencies.
• Disadvantages:
• Not very visual—hard to see trends quickly.
• Large tables may look overwhelming.
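A frequency distribution table can be built from raw values by counting how often each value occurs. The sketch below uses `collections.Counter`; the data set (number of siblings for 12 students) is hypothetical.

```python
from collections import Counter

def frequency_table(values):
    """Return (value, frequency) pairs sorted by value."""
    counts = Counter(values)
    return sorted(counts.items())

# Hypothetical raw data: number of siblings reported by 12 students
siblings = [0, 1, 2, 1, 3, 1, 0, 2, 1, 4, 2, 1]

print("Value | Frequency")
for value, freq in frequency_table(siblings):
    print(f"{value:>5} | {freq}")
```

The table shows exact frequencies (e.g., five students report one sibling), which is the main advantage listed above.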

2. BAR CHARTS (FOR CATEGORICAL OR DISCRETE DATA)

• Advantages:
• Clear and simple to understand.
• Easy to compare different categories.
• Disadvantages:
• Not suitable for continuous data.
• Can mislead if bar widths are inconsistent or scales are distorted.

3. HISTOGRAMS (FOR CONTINUOUS DATA)
• Advantages:
• Shows distribution shape (normal, skewed, etc.).
• Handles large data sets well.
• Disadvantages:
• Exact values are lost because the data are grouped.
• Choice of class intervals can affect interpretation.

4. PIE CHARTS

• Advantages:
• Easy to visualize relative proportions.
• Works well for survey or categorical data.
• Disadvantages:
• Hard to compare slices with close sizes.
• Not effective with many categories.

5. LINE GRAPHS

• Advantages:
• Excellent for showing patterns and changes over time.
• Useful for spotting trends that can support predictions.
• Disadvantages:
• Not suitable for categorical data.
• Can exaggerate trends if axes aren’t scaled carefully.

6. SCATTER DIAGRAMS

• Advantages:
• Makes correlation visible (positive, negative, none).
• Useful for identifying outliers.
• Disadvantages:
• Doesn’t prove cause and effect.
• Dense plots can be hard to interpret.

7. STEM-AND-LEAF DIAGRAMS

• Advantages:
• Retains all raw data values.
• Quick to construct for small sets.
• Disadvantages:
• Not practical for large data sets.
• Less familiar to many audiences.

SUMMARY

• Categorical or discrete data: Bar charts, pie charts.
• Continuous data: Histograms, line graphs.
• Relationships: Scatter diagrams.
• Exact data values: Tables, stem-and-leaf.


DRAWING & INTERPRETING STATISTICAL DIAGRAMS
1. STEM-AND-LEAF DIAGRAMS
• Drawing
• Split numbers into a stem (leading digits) and a leaf (last digit).
• Write stems in a vertical column, leaves in ascending order.
• Include a key (e.g., “2 | 5 = 25”).
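The drawing steps above can be sketched in code for non-negative whole numbers: the stem is every digit except the last (`v // 10`) and the leaf is the last digit (`v % 10`). The function name and data are hypothetical; empty stems are kept so that gaps in the data stay visible.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Build stem-and-leaf rows for non-negative integers."""
    rows = defaultdict(list)
    for v in sorted(values):          # sorting first puts leaves in ascending order
        rows[v // 10].append(v % 10)  # stem = leading digits, leaf = last digit
    lines = []
    for stem in range(min(rows), max(rows) + 1):  # include empty stems
        leaves = " ".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return lines

# Hypothetical data set of 10 values
data = [25, 31, 18, 22, 37, 25, 40, 33, 29, 12]
for line in stem_and_leaf(data):
    print(line)
print("Key: 2 | 5 = 25")
```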


INTERPRETING

• Shows raw data clearly.
• Easy to spot median, mode, range.
• Useful for small to medium data sets.
• Advantage: Retains all actual values.
• Disadvantage: Becomes messy with large data sets.

2. BOX-AND-WHISKER PLOTS (BOXPLOTS)

• Drawing
• Based on the five-number summary:
• Minimum, Q1 (lower quartile), Median, Q3 (upper quartile), Maximum.
• Draw a box from Q1 to Q3, with a line at the median.
• Extend whiskers to the min and max.
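The five-number summary behind a boxplot can be computed as below. Quartile conventions differ between textbooks and software; this sketch assumes one common convention (Q1 and Q3 are the medians of the lower and upper halves, excluding the overall median when n is odd), so it may give slightly different quartiles from the method used in a particular course. It needs at least two values.

```python
def median(sorted_vals):
    """Median of an already-sorted list."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2 == 1:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) under the convention above."""
    s = sorted(values)
    n = len(s)
    lower = s[: n // 2]        # values below the median position
    upper = s[(n + 1) // 2 :]  # values above the median position
    return s[0], median(lower), median(s), median(upper), s[-1]

# Hypothetical data set of 9 values
print(five_number_summary([3, 7, 8, 5, 12, 14, 21, 13, 18]))  # (3, 6.0, 12, 16.0, 21)
```

The box is then drawn from 6.0 to 16.0 with a line at 12, and the whiskers run to 3 and 21.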

INTERPRETING

• Shows spread and skewness.
• Compares distributions between groups.
• Long whisker = more spread in that direction.
• Median closer to Q1 = positive (right) skew; median closer to Q3 = negative (left) skew.
• Advantage: Excellent for comparing data sets.
• Disadvantage: Raw data detail is lost.

3. HISTOGRAMS

• Drawing
• For continuous data divided into intervals.
• Area (not height) of each bar is proportional to frequency.
• If class widths differ, use frequency density:
• Frequency density = frequency / class width
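Applying the frequency density formula to classes of unequal width gives the bar heights; multiplying each height by its class width gives back the frequency, which is why area represents frequency. The grouped data below are hypothetical.

```python
def frequency_densities(classes):
    """classes: list of (lower_boundary, upper_boundary, frequency).
    Returns (lower, upper, frequency density) for each class."""
    return [(lo, hi, f / (hi - lo)) for lo, hi, f in classes]

# Hypothetical grouped data: times in minutes, with unequal class widths
times = [(0, 10, 5), (10, 15, 12), (15, 20, 18), (20, 40, 10)]
for lo, hi, fd in frequency_densities(times):
    print(f"{lo}-{hi}: frequency density {fd}")
```

The first and last classes both have density 0.5, so their bars are the same height even though the last class holds twice as many values; its bar is twice as wide.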

INTERPRETING

• Reveals shape of distribution: normal, skewed, bimodal.
• Bar height (frequency density) shows where data are concentrated; bar area shows frequency.
• Advantage: Handles large continuous data sets well.
• Disadvantage: Exact values are lost when grouped.

4. CUMULATIVE FREQUENCY GRAPHS

• Drawing
• Plot cumulative frequency against the upper class boundary of each interval.
• Start at a cumulative frequency of 0 at the lower boundary of the first class.
• Join points with a smooth curve.
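The points to plot can be computed by keeping a running total of the frequencies and pairing each total with its class's upper boundary. The function name and the parcel data are hypothetical.

```python
from itertools import accumulate

def cumulative_frequency_points(classes):
    """classes: list of (lower_boundary, upper_boundary, frequency), in order.
    Returns the (x, y) points to plot, starting at (first lower boundary, 0)."""
    points = [(classes[0][0], 0)]
    totals = accumulate(f for _, _, f in classes)  # running totals of frequency
    for (_, upper, _), cf in zip(classes, totals):
        points.append((upper, cf))
    return points

# Hypothetical grouped data: masses (kg) of 40 parcels
parcels = [(0, 2, 4), (2, 4, 10), (4, 6, 15), (6, 8, 8), (8, 10, 3)]
print(cumulative_frequency_points(parcels))
# [(0, 0), (2, 4), (4, 14), (6, 29), (8, 37), (10, 40)]
```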

INTERPRETING

• Estimate median (50th percentile), quartiles, and percentiles.
• Shape shows spread: steep slope = many data points in that interval.
• Advantage: Useful for finding medians, quartiles, percentiles.
• Disadvantage: Cannot recover individual data values.
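Reading the median or quartiles off the graph means finding the x-value where the curve reaches the target cumulative frequency (n/2 for the median, n/4 and 3n/4 for the quartiles). This sketch swaps the hand-drawn smooth curve for straight lines between the plotted points (linear interpolation), so its estimates are close to, but not always identical with, values read from a curve. The points reuse a hypothetical data set of 40 parcels.

```python
def estimate_percentile(points, p):
    """Estimate the p-th percentile from cumulative frequency points
    [(boundary, cumulative frequency), ...] by linear interpolation."""
    total = points[-1][1]
    target = p / 100 * total  # e.g. the median is read at n/2
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= target <= y1 and y1 > y0:
            return x0 + (target - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("target outside the range of the data")

# Hypothetical points for 40 parcels (boundary in kg, cumulative frequency)
points = [(0, 0), (2, 4), (4, 14), (6, 29), (8, 37), (10, 40)]
median_est = estimate_percentile(points, 50)
q1 = estimate_percentile(points, 25)
q3 = estimate_percentile(points, 75)
print(median_est, q1, q3, q3 - q1)
```

Here the median is read at a cumulative frequency of 20, which falls in the 4 to 6 kg class, giving an estimate of about 4.8 kg.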

SUMMARY

• Stem-and-leaf: Good for raw values, small data sets.
• Boxplot: Compares spread & skew between sets.
• Histogram: Shows distribution shape for continuous data.
• Cumulative frequency graph: Finds medians, quartiles, percentiles.

MEASURES OF CENTRAL TENDENCY

• a) Mean
• Mean = Sum of all data values / Number of values
• Advantages: Uses all data; good for further statistical calculations.
• Disadvantages: Affected by outliers and skewed data.
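A short sketch of the mean formula, and of how a single outlier pulls the mean. The pocket-money figures are hypothetical.

```python
def mean(values):
    """Mean = sum of all data values / number of values."""
    return sum(values) / len(values)

# Hypothetical data: weekly pocket money (£) for six students
pocket_money = [5, 6, 6, 7, 8, 10]
with_outlier = pocket_money + [50]  # one extreme value added

print(mean(pocket_money))  # 7.0
print(mean(with_outlier))  # roughly 13.14: one outlier pulls the mean up sharply
```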

• b) Median
• Middle value when data is ordered.
• For an even number of values: average of the two middle values.
• Advantages: Not affected by extreme values.
• Disadvantages: Does not use the actual values of most data points (only the middle one or two).
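The odd and even cases of the median can be sketched as follows; the data values are hypothetical. The last line reuses the outlier example from the mean: adding 50 leaves the median at a typical value.

```python
def median(values):
    """Middle value of the ordered data; for an even count,
    the average of the two middle values."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

print(median([7, 3, 9]))                # 7   (odd count: middle value)
print(median([7, 3, 9, 4]))             # 5.5 (even count: average of 4 and 7)
print(median([5, 6, 6, 7, 8, 10, 50]))  # 7   (the outlier 50 barely matters)
```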

• c) Mode
• Most frequent value (there can be more than one).
• Advantages: Easy to identify; useful for categorical data.
• Disadvantages: May not represent data well if there are multiple modes or no clear mode.
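Because a data set can have more than one mode, this sketch returns every value that shares the highest frequency. It works for categorical data such as eye colour as well as numbers; the data are hypothetical.

```python
from collections import Counter

def modes(values):
    """Return every value with the highest frequency (there may be more than one)."""
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]

print(modes(["blue", "brown", "brown", "green"]))  # ['brown']
print(modes([1, 2, 2, 3, 3]))                      # [2, 3] (bimodal)
```

If every value occurs exactly once, the function returns all of them, which is the "no clear mode" case mentioned above.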

SUMMARY

• Use mean for symmetric, outlier-free data.
• Use median when data is skewed or has outliers.
• Use mode for categorical/discrete data.

MEASURES OF VARIATION (SPREAD OF THE DATA)

• a) Range
• Range = Maximum − Minimum
• Advantages: Simple, quick.
• Disadvantages: Affected by extremes; ignores the distribution of the rest of the data.
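The range formula in code, showing how one extreme value dominates it. The function is named `data_range` (a hypothetical name) to avoid shadowing Python's built-in `range`; the scores are hypothetical.

```python
def data_range(values):
    """Range = maximum - minimum."""
    return max(values) - min(values)

scores = [12, 15, 15, 16, 18, 19]
print(data_range(scores))         # 7
print(data_range(scores + [60]))  # 48: a single extreme value dominates
```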

• b) Interquartile Range (IQR)
• IQR = Q3 − Q1
• Measures the spread of the middle 50% of the data.
• Advantages: Resistant to outliers; shows central spread.
• Disadvantages: Ignores extremes (which may be relevant).
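For comparison with the range, here is the IQR of the same hypothetical scores, before and after adding the extreme value 60. The quartile convention is an assumption (medians of the lower and upper halves, excluding the overall median when n is odd); other conventions can give slightly different quartiles.

```python
def median(s):
    """Median of an already-sorted list."""
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

def iqr(values):
    """IQR = Q3 - Q1, with Q1 and Q3 taken as the medians of the lower and
    upper halves (one common convention, assumed here)."""
    s = sorted(values)
    n = len(s)
    q1 = median(s[: n // 2])
    q3 = median(s[(n + 1) // 2 :])
    return q3 - q1

scores = [12, 15, 15, 16, 18, 19]
print(iqr(scores))         # 3
print(iqr(scores + [60]))  # 4
```

The outlier moves the range from 7 to 48 but the IQR only from 3 to 4, illustrating the resistance to outliers listed above.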

• c) Variance & Standard Deviation (SD)
