
Unit 3

The document provides an overview of central tendency and measures of dispersion in statistics, including definitions and calculations for mean, median, mode, range, standard deviation, and variance. It also covers skewness and kurtosis, explaining their significance in data distribution and how they relate to symmetry and outliers. Additionally, it describes quartiles and quartile deviation, emphasizing their role in understanding data spread.

Uploaded by

Sayyan Shaikh

STATISTICS & ANALYTICS (20SC02P)

Central Tendency:
Central tendency is a descriptive summary of a data set: a single value that reflects
the centre of the data distribution. It summarises the dataset as a whole and gives no
information about individual observations.
Definition: A statistical measure that represents an entire distribution or dataset by
a single value is called a measure of central tendency. It aims to provide an accurate
description of the entire data in the distribution.
The three most important measures of central tendency are the Mean, the Median and the
Mode. Each attempts to describe a set of data by identifying the central position within
it, so these measures are sometimes called measures of central location. The mean
(otherwise known as the average) is the most commonly used measure of central tendency,
but the median and the mode are also widely used.

Range:
The range of a set of data is the difference between the largest and smallest values,
found by subtracting the smallest value from the largest. It gives a rough idea of how
widely the data are spread.

Mean:
The mean (average) of a data set is found by adding all the values in the data set and
then dividing by the number of values in the set. Unless stated otherwise, "mean" refers
to the arithmetic mean.
$$\text{Mean} = \frac{\text{Sum of values}}{\text{Number of values}}$$
In general, if $x_1, x_2, x_3, \ldots, x_n$ are the values, then the mean is given by
$$\text{Mean} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}$$
For example: if 10, 11, 15, 20, 25, 33 are the numbers, then the mean is
$$\text{Mean} = \frac{10 + 11 + 15 + 20 + 25 + 33}{6} = \frac{114}{6} = 19$$
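The calculation above can be sketched in a few lines of Python. This is only an illustration; the function name `arithmetic_mean` is an assumption made for this example and does not come from the notes.

```python
def arithmetic_mean(values):
    """Sum of the values divided by the number of values."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


numbers = [10, 11, 15, 20, 25, 33]   # the data from the worked example
print(sum(numbers))                  # 114
print(arithmetic_mean(numbers))      # 19.0
```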

Mode:
The mode is the value that appears most frequently in a data set; that is, the most
repeated value in a given data set is called the mode. A set of data may have one mode,
more than one mode, or no mode at all.

Examples of the Mode:


In the following list of numbers, 16 is the mode, since it appears more times in the
set than any other number:

• 3, 3, 6, 9, 16, 16, 16, 27, 27, 37, 48

In the example given below, both the number 3 and the number 16 are modes, as they
each occur three times and no other number occurs more often.

• 3, 3, 3, 9, 16, 16, 16, 27, 37, 48

If no number in a set of numbers occurs more than once, that set has no mode:

• 3, 6, 9, 16, 27, 37, 48
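The three cases above (one mode, several modes, no mode) can be checked with a short Python sketch. The helper name `modes` is hypothetical, and the rule "no value repeats, so there is no mode" follows the convention used in these notes; some libraries instead report every value as a mode in that case.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s), or [] if no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:          # every value occurs once: no mode
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([3, 3, 6, 9, 16, 16, 16, 27, 27, 37, 48]))  # [16]
print(modes([3, 3, 3, 9, 16, 16, 16, 27, 37, 48]))      # [3, 16]
print(modes([3, 6, 9, 16, 27, 37, 48]))                 # []
```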

Median:
The median of a set of data is the middlemost number or centre value in the set: the
number that lies halfway into the ordered set.
To find the median, the data should first be arranged in ascending or descending order.
If the number of values is odd, the median is the middle value; if it is even, the
median is the average of the two middle values.
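A minimal Python sketch of finding the median is given below. It assumes the usual convention that an odd-sized set takes its middle value and an even-sized set takes the average of its two middle values; the function name `median_value` is chosen only for this example.

```python
def median_value(values):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(values)          # arrange the data first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count


print(median_value([7, 1, 5, 3, 6]))            # 5
print(median_value([10, 11, 15, 20, 25, 33]))   # 17.5
```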


Dispersion:
Dispersion is the extent to which numerical data are spread out, or vary, about an
average value. Measures of dispersion help to interpret the variability in the data and
to understand the data distribution.

Types of Measures of Dispersion:


There are two main types of dispersion methods in statistics which are:
• Absolute Measure of Dispersion
• Relative Measure of Dispersion

Absolute Measure of Dispersion:


Absolute measures of dispersion use the original units of data, and are most useful for
understanding the dispersion within the context of your experiment and measurements.
Absolute measures of dispersion include:
• The range,
• The quartile deviation,
• The mean deviation,
• The standard deviation and variance.

Relative Measures of Dispersion:


Relative measures of dispersion are calculated as ratios or percentages; for example,
one relative measure of dispersion is the ratio of the standard deviation to the mean.
Relative measures of dispersion are always dimensionless, and they are particularly
useful for making comparisons between separate data sets or different experiments that
might use different units. They are sometimes called coefficients of dispersion.
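As an illustration of why a relative measure is dimensionless, the sketch below computes the ratio of the standard deviation to the mean (often called the coefficient of variation) for the same data expressed in two different units. The data values and the function name are hypothetical, and the sample standard deviation is assumed.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Ratio of the sample standard deviation to the mean (dimensionless)."""
    return stdev(values) / mean(values)


# Hypothetical data: the same five heights in centimetres and in metres.
heights_cm = [150, 160, 170, 180, 190]
heights_m = [1.5, 1.6, 1.7, 1.8, 1.9]

cv_cm = coefficient_of_variation(heights_cm)
cv_m = coefficient_of_variation(heights_m)

# The two ratios agree (up to floating-point rounding) although the units differ.
print(round(cv_cm, 4), round(cv_m, 4))
```

The absolute measures (for example, the standard deviation itself) differ by a factor of 100 between the two lists, while the ratio does not.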

Some Commonly Used Measures of Absolute Dispersion:
Range: The simplest measure of absolute dispersion is the range: the largest data point
(H) minus the smallest data point (L).
We can write this as R = H – L.
Example: 1, 3, 5, 6, 7 => Range = 7 - 1 = 6
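The same example as a short Python sketch (the function name `data_range` is an assumption for illustration):

```python
def data_range(values):
    """R = H - L: largest value minus smallest value."""
    return max(values) - min(values)


print(data_range([1, 3, 5, 6, 7]))   # 6
```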

Standard Deviation: The standard deviation is a more complicated measure of absolute
dispersion. To calculate the sample standard deviation, square the difference between
each data point and the mean, sum those squares, divide by one less than the number of
data points, and then take the square root. Since the values are squared and the square
root is taken at the end, the standard deviation is expressed in the original units of
measure.

Variance: Subtract the mean from each value in the set, square each difference, add the
squares, and divide the sum by the total number of values in the data set; the result is
the (population) variance. Dividing by one less than the number of values instead gives
the sample variance.
The square root of the variance is known as the standard deviation.
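The steps for the variance and standard deviation can be sketched as follows. This is an illustrative implementation, not a prescribed one: the `sample` flag is an assumption introduced here to switch between dividing by n (population) and by n - 1 (sample).

```python
from math import sqrt


def variance(values, sample=False):
    """Average squared deviation from the mean (divide by n, or n - 1 if sample=True)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    divisor = n - 1 if sample else n
    return squared_deviations / divisor


def standard_deviation(values, sample=False):
    """Square root of the variance, in the original units of the data."""
    return sqrt(variance(values, sample=sample))


data = [1, 3, 5, 6, 7]
print(variance(data))                # population variance, about 4.64
print(variance(data, sample=True))   # sample variance, about 5.8
print(standard_deviation(data, sample=True))
```

For this data the mean is 4.4 and the squared deviations sum to 23.2, so dividing by 5 gives about 4.64 and dividing by 4 gives about 5.8.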

Standard Deviation in Excel


Standard deviation in Excel helps you understand how much your values deviate from the
average (mean); that is, it tells you whether your data lie close to the average or
fluctuate a lot. A high value means that the data fluctuate a lot, and a low value means
that they stay close to the mean. To calculate the standard deviation in Excel we use
the STDEV function, for example STDEV(A1:A20), which computes the sample standard
deviation (dividing by one less than the number of values).

Quartile:
A quartile is a statistical term that describes a division of observations into four
defined intervals based on the values of the data and how they compare to the entire
set. Each interval contains 25% of the total observations. With the data arranged from
smallest to largest, the four intervals are:

1. First quartile: the lowest 25% of numbers
2. Second quartile: from 25% to 50% (up to the median)
3. Third quartile: from 50% to 75% (above the median)
4. Fourth quartile: the highest 25% of numbers

The three cut points between these intervals are written Q1, Q2 (the median) and Q3.


Value            QUARTILE function       Equivalent function
Minimum Value    QUARTILE(A1:A20,0)      MIN(A1:A20)
1st Quartile     QUARTILE(A1:A20,1)
Median           QUARTILE(A1:A20,2)      MEDIAN(A1:A20)
3rd Quartile     QUARTILE(A1:A20,3)
Maximum Value    QUARTILE(A1:A20,4)      MAX(A1:A20)

Select the data and apply the functions above to obtain the quartile values.
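As a rough Python counterpart to the spreadsheet functions above, the sketch below returns the minimum, the three quartile cut points and the maximum. It uses `statistics.quantiles` with `method="inclusive"`, which treats the smallest value as the 0% point and the largest as the 100% point; that this matches the spreadsheet's interpolation is an assumption of this example, and the function name `five_number_summary` is hypothetical.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) of the data."""
    ordered = sorted(values)
    q1, q2, q3 = quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, q2, q3, ordered[-1]


print(five_number_summary([1, 3, 5, 6, 7]))   # (1, 3.0, 5.0, 6.0, 7)
```

Different tools use slightly different interpolation rules for quartiles, so small differences between packages are expected on small data sets.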

Quartile Deviation:
Quartile deviation is based on the difference between the third quartile (Q3) and the
first quartile (Q1) of the frequency distribution. This difference, Q3 - Q1, is known as
the interquartile range; half of it, (Q3 - Q1) / 2, is known as the quartile deviation
or semi-interquartile range.
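A small Python sketch of the quartile deviation follows. The quartiles are taken from `statistics.quantiles` with the "inclusive" method, which is one of several common conventions and is an assumption here; the function names and the data are made up for illustration.

```python
from statistics import quantiles


def interquartile_range(values):
    """Q3 - Q1, using inclusive (min = 0%, max = 100%) quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


def quartile_deviation(values):
    """Half of the interquartile range (semi-interquartile range)."""
    return interquartile_range(values) / 2


data = [2, 4, 6, 8, 10, 12, 14, 16, 18]   # hypothetical data
print(interquartile_range(data))          # 8.0  (Q1 = 6, Q3 = 14)
print(quartile_deviation(data))           # 4.0
```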

Mean and Mean Deviation:


The average of numbers is known as the mean and the arithmetic mean of the absolute
deviations of the observations from a measure of central tendency is known as the mean
deviation (also called mean absolute deviation).
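The mean deviation can be sketched in Python as below. The function name and the optional `center` argument are assumptions for this example; by default the deviations are measured from the mean, but any measure of central tendency (such as the median) can be passed in.

```python
def mean_deviation(values, center=None):
    """Arithmetic mean of the absolute deviations from a central value."""
    if center is None:
        center = sum(values) / len(values)   # default: deviations about the mean
    return sum(abs(x - center) for x in values) / len(values)


data = [10, 11, 15, 20, 25, 33]   # mean is 19
# Absolute deviations from 19: 9, 8, 4, 1, 6, 14 (sum 42)
print(mean_deviation(data))       # 7.0
```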

SKEWNESS and KURTOSIS:


SKEWNESS:
If one tail is longer than the other, the distribution is skewed. Such distributions are
sometimes called asymmetric or asymmetrical distributions, as they do not show any kind
of symmetry. Symmetry means that one half of the distribution is a mirror image of the
other half. For example, the normal distribution is a symmetric distribution with no
skew: its tails are exactly the same.
Skewness is the degree of distortion from the symmetrical bell curve, or normal
distribution. It measures the lack of symmetry in the data distribution by comparing the
extreme values in one tail with those in the other. A symmetrical distribution has a
skewness of 0.

There are two types of Skewness: Positive and Negative

Positive Skewness is when the tail on the right side of the distribution is longer or
fatter than the tail on the left side. The mean and median will typically be greater
than the mode.

Negative Skewness is when the tail on the left side of the distribution is longer or
fatter than the tail on the right side. The mean and median will typically be less than
the mode.

So, when is the skewness too much?


The rule of thumb seems to be:
• If the skewness is between -0.5 and 0.5, the data are fairly symmetrical.
• If the skewness is between -1 and -0.5 (negatively skewed) or between 0.5 and
1 (positively skewed), the data are moderately skewed.
• If the skewness is less than -1 (negatively skewed) or greater than 1 (positively
skewed), the data are highly skewed.
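The rule of thumb can be tried out with a short Python sketch. The notes do not say which skewness formula is intended, so this example assumes the moment coefficient of skewness (third central moment divided by the second central moment raised to the power 1.5); the function names and data are hypothetical.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (undefined if all values are equal)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n   # third central moment
    return m3 / m2 ** 1.5


def describe_skewness(g):
    """Apply the rule of thumb to a skewness value."""
    direction = "positively" if g > 0 else "negatively"
    if abs(g) <= 0.5:
        return "fairly symmetrical"
    if abs(g) <= 1:
        return f"moderately skewed ({direction})"
    return f"highly skewed ({direction})"


symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 2, 10]      # one large value stretches the right tail

print(skewness(symmetric), describe_skewness(skewness(symmetric)))
print(round(skewness(right_tailed), 2), describe_skewness(skewness(right_tailed)))
```

Spreadsheet and statistics packages often apply a small-sample correction to this coefficient, so their results can differ slightly from this sketch.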

Kurtosis:
Kurtosis is all about the tails of the distribution, not the peakedness or flatness. It
describes how extreme the values in the tails are, taking both tails together, and so
acts as a measure of the outliers present in the distribution.

High kurtosis in a data set is an indicator that the data have heavy tails or outliers.
If the kurtosis is high, we need to investigate why there are so many outliers; the
cause could be wrong data entry, among other things.

Low kurtosis in a data set is an indicator that the data have light tails or a lack of
outliers. If the kurtosis is low (too good to be true), we also need to investigate and
trim the dataset of unwanted results.

Mesokurtic (Kurtosis = 3): This distribution has a kurtosis statistic similar to that of
the normal distribution, meaning that its extreme values behave like those of a normal
distribution. Under the definition of kurtosis used here, the standard normal
distribution has a kurtosis of three.

Leptokurtic (Kurtosis > 3): The distribution is longer and its tails are fatter than
those of the normal distribution. The peak is higher and sharper than in a mesokurtic
distribution, which means that the data are heavy-tailed or have a profusion of
outliers.
Outliers stretch the horizontal axis of the histogram, which makes the bulk of the data
appear in a narrow (“skinny”) vertical range, thereby giving the “skinniness” of a
leptokurtic distribution.

Platykurtic (Kurtosis < 3): The distribution is shorter and its tails are thinner than
those of the normal distribution. The peak is lower and broader than in a mesokurtic
distribution, which means that the data are light-tailed or lack outliers.
This is because the extreme values are less extreme than those of the normal
distribution.
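The three categories can be illustrated with a short Python sketch. It assumes the moment definition of kurtosis (fourth central moment divided by the squared second central moment), under which the normal distribution scores 3. The function names, the `tolerance` parameter and the data are hypothetical; the tolerance is there only because a sample value is rarely exactly 3.

```python
def kurtosis(values):
    """Moment coefficient of kurtosis: m4 / m2**2 (a normal distribution gives 3)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n   # fourth central moment
    return m4 / m2 ** 2


def classify_kurtosis(k, tolerance=0.0):
    """Label a kurtosis value relative to the normal-distribution value of 3."""
    if k > 3 + tolerance:
        return "leptokurtic"
    if k < 3 - tolerance:
        return "platykurtic"
    return "mesokurtic"


evenly_spread = [1, 2, 3, 4, 5]                      # no extreme values
with_outliers = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]    # two extreme values

print(round(kurtosis(evenly_spread), 2), classify_kurtosis(kurtosis(evenly_spread)))
print(kurtosis(with_outliers), classify_kurtosis(kurtosis(with_outliers)))
```

Some tools report "excess kurtosis" (this value minus 3), in which case the comparison point is 0 rather than 3.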
