E9 Statistics

The document provides an overview of statistical measures including mean, median, mode, range, and interquartile range, explaining their definitions and applications. It discusses methods for estimating mean and comparing distributions, as well as various statistical diagrams such as bar charts, pie charts, and histograms. Additionally, it covers cumulative frequency, box-and-whisker plots, and scatter diagrams, emphasizing the importance of context and outliers in data analysis.

Mean – the total of all the values divided by the number of values.
• What occurs on average
• Problems with the mean occur when there are one or two unusually high or low values in the data, or outliers.

Median – the middle value when data is arranged into numerical order

• Use median to avoid extreme values (outliers)


• Odd number of values: use one middle value
• Even number of values: find the mean of the middle values

Mode – the value occurring most often

• Also referred to as the modal value


• If no value occurs more often than the others, there is no mode
• If two values occur the most, there are two modes (bi-modal)
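
As a minimal sketch of the three averages above, the snippet below uses Python's built-in `statistics` module. The list `scores` is made-up data, chosen so that one unusually high value (95) shows how an outlier pulls the mean up while the median stays put.

```python
from statistics import mean, median, multimode

# Hypothetical test scores; 95 is an unusually high value (an outlier).
scores = [4, 5, 5, 6, 7, 95]

print(mean(scores))       # total of the values divided by the number of values
print(median(scores))     # even number of values, so the mean of the two middle values
print(multimode(scores))  # list of the value(s) occurring most often
```

Note that `multimode` returns every value when nothing repeats, which these notes would describe as having no mode.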

In Data Tables

• Mean: multiply each data value by its frequency, add these up to find the overall total, and divide it by the total frequency
• Median: find where the (n + 1)/2th value lies, where n is the total frequency
• Mode: look for the highest frequency and the corresponding data value
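
The three rules for data tables can be sketched as follows. The table (number of siblings against number of students) and the helper `value_at` are hypothetical, made up for illustration. Because the total frequency here is even, the (n + 1)/2th position falls between two values, so the sketch averages the two middle values.

```python
# Hypothetical frequency table: number of siblings -> number of students.
table = {0: 3, 1: 7, 2: 5, 3: 4, 4: 1}

n = sum(table.values())  # total frequency
mean = sum(value * freq for value, freq in table.items()) / n


def value_at(table, k):
    """Return the k-th value (counting from 1) when the data is listed in order."""
    running = 0
    for value in sorted(table):
        running += table[value]
        if running >= k:
            return value
    raise ValueError("k is larger than the total frequency")


if n % 2 == 1:
    median = value_at(table, (n + 1) // 2)
else:
    # (n + 1)/2 is halfway between two positions, so average those two values.
    median = (value_at(table, n // 2) + value_at(table, n // 2 + 1)) / 2

mode = max(table, key=table.get)  # data value with the highest frequency

print(mean, median, mode)
```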

9.4 Grouped and Continuous Data

• Grouping data into classes (raw data is lost)


• Use for continuous data (height)
• Mean: can be estimated
• Median: the exact value is lost, but you can find the class interval that the median lies in
• Mode: there is a modal class, the class with the highest frequency

Estimating Mean

1. Use the class midpoints as data values (the midpoints of 150 ≤ x < 160 would be 155)
2. Multiply frequency by midpoint
3. Divide the total of (frequency × midpoint) by the total frequency (the number of data values)

Be careful when finding the midpoints: add the two class boundaries and divide by 2
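
The three steps for estimating the mean can be sketched as below. The grouped heights are hypothetical; each tuple is assumed to mean lower ≤ x < upper, as in the 150 ≤ x < 160 example.

```python
# Hypothetical grouped heights in cm: (lower bound, upper bound, frequency).
classes = [(150, 160, 4), (160, 170, 9), (170, 180, 5), (180, 190, 2)]

total_frequency = sum(freq for _, _, freq in classes)

# Steps 1 and 2: use each class midpoint as the data value, multiply by frequency.
total_fx = sum((lower + upper) / 2 * freq for lower, upper, freq in classes)

# Step 3: divide the total of (frequency x midpoint) by the number of data values.
estimated_mean = total_fx / total_frequency

print(estimated_mean)
```

The result is only an estimate, because the raw data inside each class has been replaced by the midpoint.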

Range – the difference between the highest data value and the lowest data value

• Measures how spread out the data is


• Measures the variation in the data
• Problem: highly influenced by anomalies / outliers in the data, so it may not give a true representation of how spread out the rest of the data is
Interquartile Range (IQR)

• First quartile / lower quartile is a quarter of the way along the data (when in order)
• Second quartile / median splits the data set into two parts, halfway along the data
• Third quartile / upper quartile is three quarters of the way along the data

[Diagram labels: Lower Quartile, Median, Upper Quartile]

Interquartile Range is the difference between the upper quartile and the lower quartile

• Might be a better measure of how spread out the data is, as an outlier could heavily affect the range
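
A small sketch of range against interquartile range is given below. The weights are made-up data with one outlier (30), and `quartiles` is a hypothetical helper. It assumes one common convention for quartiles: take the median of the lower half and of the upper half, leaving the median itself out when the number of values is odd. Other conventions exist and can give slightly different answers.

```python
from statistics import median


def quartiles(data):
    """Return (lower quartile, median, upper quartile) using the halves method."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    # For an odd number of values, skip the median itself.
    upper_half = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower_half), median(ordered), median(upper_half)


# Hypothetical weights in kg; 30 is an outlier.
weights = [2, 3, 4, 4, 5, 6, 7, 8, 30]

q1, q2, q3 = quartiles(weights)
data_range = max(weights) - min(weights)
iqr = q3 - q1

print(data_range)  # heavily affected by the outlier
print(iqr)         # describes only the middle 50% of the data
```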

Comparing Distributions

Look to compare two things: the average of the distributions and the spread of the distributions

Average
• Choose the appropriate average
- Mean: includes all the data
- Median: not affected by extreme values
- Mode: used for non-numerical data
• Consider whether it is better for the average to be bigger or smaller
- Time to complete a puzzle: the smaller the better
- Test scores: the bigger the average the better
• Give numerical values for the average and explicitly compare
• Give the comparison in context
• Example:
- The mean for dogs is 17 kg, which is bigger than the mean for cats, which is 13 kg
- The mean for dogs is bigger, which suggests that, on average, dogs are heavier than cats

Spread
• Choose the appropriate measure of spread
- Range: affected by extreme values
- Interquartile range: focuses on the middle 50%
• Consider whether it is better for the range to be bigger or smaller
- Smaller range / IQR = consistency
- Bigger range / IQR = more spread
• Give numerical values for the range and explicitly compare
• Give the comparison in context
• Example:
- The interquartile range for dogs is 6 kg, which is bigger than the interquartile range for cats, which is 4 kg
- The interquartile range for cats is smaller, which suggests that the weights of cats are more consistent and less spread out than those of dogs

Check for outliers when comparing raw data. If one, or both, of the data sets has a data value that is much larger or smaller than the others, this may need mentioning and a possible reason given.

9.3 Bar Charts, Pie Charts, Pictograms, Stem-and-leaf Diagrams, Simple Frequency Distributions,
Histograms, Scatter Diagrams

Statistical Diagrams

Stem & Leaf Diagrams

Simple but effective way to show data

• Raw data is available, as the numbers themselves create the diagram


• Puts data into order and classes (groups)
• Must have a key (two-digit data could be 26 but could also be 2.6)
• Median: use the formula or cross out the next lowest/highest numbers until you meet in the middle
• Quartiles: repeat the process for median on the upper/bottom half of the data (not including the median)
Bar Charts

• Represent qualitative and discrete data (colours of cars, shoe sizes, names of students)
• Use when there is a small number of possible outcomes
• Data is categorical (non-numerical)

Drawing

• Y-axis: frequency
• X-axis: the different outcomes
• Bars should all have the same width
• Leave a gap between each bar

Pictogram

• Visual way to represent qualitative or discrete data


• Used for categorical (non-numerical) data, alternative to bar charts
• No axes, frequency is represented by symbols

Pie Charts

• Circle divided into slices to show proportion, the relative size of categories of data
• It’s a pie chart come on now
• Find the angles and use a protractor to draw the pie
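
Finding the angles can be sketched as below: each category gets its share of the full 360°, so angle = frequency × 360 ÷ total frequency. The survey data is hypothetical.

```python
# Hypothetical survey: favourite pet -> number of students.
frequencies = {"dog": 12, "cat": 9, "fish": 6, "other": 3}

total = sum(frequencies.values())

# Each slice's angle is its proportion of the total, scaled to 360 degrees.
angles = {category: freq * 360 / total for category, freq in frequencies.items()}

for category, angle in angles.items():
    print(category, angle)
```

A useful check before drawing is that the angles add up to 360°.
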
Reading and Interpreting Statistical Diagrams

• Read the question carefully


• Gather the meaningful, required information from the diagram, such as the values needed to calculate the mean, median, mode, range, and interquartile range
• Look for:
o A key or shading that indicates what certain parts of the diagram mean
o Information given through the labels on the axes (frequency)
o Anything unusual or unexpected
o Anomalies / Outliers

Comparing Statistical Diagrams

• Compare one diagram with another that represents a different group (dogs and cats)
• Comment on differences or similarities in: mean, median, mode, range, interquartile range, anomalies
• Make at least two pairs of comments, one pair about the average and one about the range. Each pair contains:
o A comparison of the average/range mentioning the numbers involved
o What the comparison means in the context of the question
• Suggest assumptions / problems with the data that could affect the reliability of results and comparisons

Example

1. Class A's median of 11 was higher than class B's median of 6


2. On average class A scored higher marks on the test than class B
3. Class A's interquartile range of 5 was higher than class B's interquartile range of 3
4. The test scores in class A showed more variation than the scores in class B

• e.g. do we assume that the test class A and class B took were the same?
• e.g. were class A and class B of similar ability/age?
[Figure: temperatures for the week. The modal temperature is 9 °C, the mean temperature for the week is 10 °C, and the range of temperature for the week is 3 °C]


Histograms

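Frequency density – the frequency of a class divided by its class width (frequency density = frequency ÷ class width). It measures how densely the data is packed within its class interval, relative to the width of the interval. The larger the frequency density, the closer together the data in that class is.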

Worked example
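
As a hedged illustration with made-up numbers (not the original worked example), the sketch below calculates frequency density as frequency ÷ class width for classes of unequal width. These are the bar heights that would be drawn on a histogram.

```python
# Hypothetical grouped data: (lower bound, upper bound, frequency).
classes = [(0, 10, 5), (10, 15, 10), (15, 20, 12), (20, 40, 8)]

densities = []
for lower, upper, freq in classes:
    width = upper - lower
    densities.append(freq / width)  # frequency density = frequency / class width

for (lower, upper, freq), density in zip(classes, densities):
    print(f"{lower} <= x < {upper}: frequency {freq}, frequency density {density}")
```

The last class has the second-highest frequency in the raw counts but the lowest density, because its 8 values are spread over a width of 20.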


• Used for continuous data, grouped in unequal class intervals


• The area of the bar determines the frequency (frequency of a class interval is proportional to the area of the
bar for that interval)

Drawing

• Bars are drawn with the widths being measured on the x-axis
• Vertical axis is frequency density, the height of each bar
• Area is proportional to frequency
• As the data is continuous, the bars will be touching

Another Example: https://www.savemyexams.com/igcse/maths_extended/cie/23/revision-notes/4-probabilityand-statistics/4-6-histograms/4-6-1-histograms/

9.6 Cumulative Frequency, Box-and-whisker Plots

Cumulative Frequency – the running total of the frequencies, adding up all the groups up to the end of that group

• Tables may be relabelled so that every class starts at the beginning of the data (x < 20, x < 40, x < 60)
Cumulative Frequency Graph

• Plotted against the end of the class interval


• Join points up with a smooth curve

Finding the Median, Quartiles and Percentiles (for n data values)

• Lower quartile: n / 4
• Median: n / 2
• Upper quartile: 3n / 4
• pth percentile: p × n / 100

1. Draw a horizontal line from the result on the cumulative frequency axis until it hits the curve
2. Draw a vertical line from the curve down to the horizontal axis and take the reading

Example
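
The reading-off steps above can be sketched in code with made-up data. This sketch assumes the plotted points are joined with straight lines, so a hand-drawn smooth curve would give slightly different readings. It also assumes the first class starts at 0. The helper `read_off` is hypothetical.

```python
from itertools import accumulate

# Hypothetical grouped data: the end of each class interval and its frequency.
upper_bounds = [20, 40, 60, 80, 100]
frequencies = [4, 10, 16, 8, 2]

cumulative = list(accumulate(frequencies))  # running totals
n = cumulative[-1]

# Cumulative frequency is plotted against the END of each class interval.
points = list(zip([0] + upper_bounds, [0] + cumulative))


def read_off(target):
    """Go across from `target` on the cumulative frequency axis, then down."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y1 > y0 and y0 <= target <= y1:
            return x0 + (target - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("target must lie between 0 and n")


lower_quartile = read_off(n / 4)
median = read_off(n / 2)
upper_quartile = read_off(3 * n / 4)

print(lower_quartile, median, upper_quartile)
```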
Box and Whisker Diagrams

• Divides data up into quartiles


• Data might contain extreme values, and we can see what is happening at the low, middle, and high points
• The box represents the interquartile range (middle 50% of the data)
• The lowest data value and the highest data value are joined to the box by horizontal lines (whiskers)
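
The five values needed to draw a box-and-whisker diagram can be sketched as below, using made-up goals data. It relies on `statistics.quantiles` with `method="inclusive"`, which is one of several quartile conventions, so quartiles found by hand may differ slightly.

```python
from statistics import quantiles

# Hypothetical goals scored per game by one team.
goals = [1, 2, 2, 3, 3, 4, 5, 6, 8]

# n=4 splits the data into quarters and returns the three cut points.
q1, q2, q3 = quantiles(goals, n=4, method="inclusive")

# Lowest value, lower quartile, median, upper quartile, highest value.
summary = (min(goals), q1, q2, q3, max(goals))
iqr = q3 - q1  # the length of the box

print(summary, iqr)
```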

Comparing the number of goals scored per game by the two teams:

1. The median number of goals per game is higher for Union Athletic (4 goals) than Albion Rovers (3 goals).
2. This means that on average, Union Athletic scored more goals per game than Albion Rovers.
3. The interquartile range (IQR) is higher for Union Athletic (4) than Albion Rovers (3).
4. This means that Albion Rovers were more consistent regarding the number of goals they scored per game.

9.7 Scatter Diagrams

• Positive correlation: when one quantity increases, the other quantity increases
• Negative correlation: when one quantity increases, the other quantity decreases
• No correlation: no relationship between the two quantities
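
These notes judge correlation by eye from the diagram. As an optional numerical check that goes beyond the notes, the sketch below computes Pearson's correlation coefficient r, which is close to +1 for strong positive correlation, close to -1 for strong negative correlation, and near 0 when there is no linear relationship. All data and names are hypothetical.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson's r for paired data (assumes neither list is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical paired data.
hours_revised = [1, 2, 3, 4, 5]
test_score = [52, 55, 61, 64, 70]    # goes up as hours go up
errors_made = [70, 64, 61, 55, 52]   # goes down as hours go up

print(correlation(hours_revised, test_score))   # positive
print(correlation(hours_revised, errors_made))  # negative
```
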
9.8 Line of Best Fit
• In some cases it makes sense to draw the line through the origin (height and weight of a kitten)
• There should be roughly as many points on one side of the line as the other
• Spaces between the points and the line should be roughly the same
• Do not extrapolate with the line of best fit (extrapolation is estimating the value of a variable or function
outside of the observed range, using a value outside the points that have been plotted)
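
In these notes the line of best fit is drawn by eye. As a sketch of a calculated alternative, the code below fits a least-squares line, which is a different technique from drawing by eye, and refuses to estimate outside the plotted range to respect the rule about extrapolation. The kitten data and the helper names `best_fit` and `estimate` are hypothetical.

```python
def best_fit(xs, ys):
    """Return (gradient, intercept) of the least-squares line through the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x
    return gradient, intercept


def estimate(x, xs, gradient, intercept):
    """Read a value off the line, but only inside the range of plotted x values."""
    if not min(xs) <= x <= max(xs):
        raise ValueError("x is outside the plotted data: this would be extrapolation")
    return gradient * x + intercept


# Hypothetical data: age of a kitten in weeks and its weight in grams.
age_weeks = [2, 4, 6, 8, 10]
weight_g = [300, 500, 690, 910, 1100]

gradient, intercept = best_fit(age_weeks, weight_g)
print(estimate(5, age_weeks, gradient, intercept))  # inside the data range
```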
