Basic Statistical Concepts - Measures of Location

Statistics is a scientific discipline focused on collecting, organizing, summarizing, analyzing, and drawing conclusions from numerical data to aid decision-making under uncertainty. It encompasses two main areas: descriptive statistics, which describes data, and inferential statistics, which makes inferences about populations based on samples. Key concepts include types of variables, levels of measurement, sampling methods, and measures of central tendency such as mean, median, and mode.

DATA MANAGEMENT

What is Statistics?

❖ Statistics is a scientific discipline consisting of theory and methods for processing numerical information that one can use when making decisions in the face of uncertainty.
❖ It is the science of conducting studies to collect, organize, summarize, analyze, and draw conclusions from data.

Two Main Areas of Statistics

1. Descriptive Statistics

❖ Here, statisticians try to describe a situation.
❖ It consists of the collection, organization, summarization, and presentation of data.

2. Inferential Statistics

❖ Here, statisticians try to make inferences from samples to populations.
❖ It uses probability, i.e., the chance of an event occurring.
❖ It consists of generalizing from samples to populations, performing estimation and hypothesis tests, determining relationships among variables, and making predictions.

Key Definitions

A universe is the collection of things or observational units under consideration.

A variable is a characteristic observed or measured on every unit of the universe. It is a characteristic or attribute that can assume different values.

Data are the values that the variables can assume.

A data set is a collection of data values.

A population consists of all subjects (human or otherwise) that are being studied.

A sample is a group of subjects selected from a population.

Types of Variables

1. Qualitative variables

▪ These are non-numerical values.
▪ These are variables that can be placed into distinct categories according to some characteristics or attributes.
Examples: Type of School, Educational Qualification, Ethnicity, Economic Status

2. Quantitative variables

▪ These are numerical values that can be ordered or ranked.

Example: age, height, weight, body temperature

Classification of Quantitative Variables

1. Discrete Variable – assumes values that can be counted.

Example: no. of children in a family, no. of students in a classroom, etc.

2. Continuous Variable – assumes an infinite number of values between any two specific values. These include fractions and decimals.

Example: height, weight, etc.

A length recorded as 15 cm lies between 14.5 cm and 15.5 cm; a weight recorded as 1.6 g lies between 1.55 g and 1.65 g.

* Since continuous data must be measured, answers must be rounded because of the limited precision of measuring devices.
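As a small illustration of the rounding rule above, the sketch below computes the boundaries of a recorded measurement as half a unit on either side of it. The function name `measurement_bounds` is hypothetical, and it assumes you know the smallest unit the device reports (1 cm, 0.1 g, and so on); `Decimal` is used so that values such as 1.55 are not distorted by binary floating point.

```python
from decimal import Decimal


def measurement_bounds(recorded: str, unit: str) -> tuple[Decimal, Decimal]:
    """Return the (lower, upper) boundaries of a rounded measurement.

    `recorded` is the value as written down and `unit` is the smallest
    unit reported by the measuring device; the true value lies within
    half a unit on either side of the recorded value.
    """
    value, step = Decimal(recorded), Decimal(unit)
    half = step / 2
    return value - half, value + half


print(measurement_bounds("15", "1"))     # (Decimal('14.5'), Decimal('15.5'))
print(measurement_bounds("1.6", "0.1"))  # (Decimal('1.55'), Decimal('1.65'))
```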

Levels of Measurement

1. Nominal Level

▪ It classifies data into mutually exclusive (nonoverlapping) categories in which no order or ranking can be imposed on the data.
▪ Numbers or symbols are used to classify.

Examples: classifying teachers according to subject taught, classifying subjects according to educational attainment, etc.

2. Ordinal Level

▪ Classifies data into categories that can be ranked; however, precise differences between the ranks do not exist.

Example:

Student evaluations might rank the faculty as excellent, satisfactory, poor, etc.
Children in a family might be ranked as 1st child, 2nd child, etc.

3. Interval Level

- Ranks data, and precise differences between units of measure do exist; however, there is no meaningful zero.

Example: Temperature. There is a meaningful difference of 1°C between 37°C and 38°C.

▪ No meaningful/absolute zero means that, say, a temperature of 0°C does not mean no heat at all.

4. Ratio Level

- Possesses all the characteristics of interval measurement, and there exists a true zero.

- True ratios exist when the same variable is measured on two different members of the population.

Example: If one person can lift 50 kg and another can lift 100 kg, then the ratio between them is 50:100, or 1:2; the second person can lift twice as much as the first.

Methods of Presenting Data

1. Textual
2. Tabular
3. Graphical

Sampling Methods

1. Random Sampling

Random Sample is a sample in which all members of the population have an equal chance of being selected.

2. Systematic Sampling

Systematic Sample is a sample obtained by selecting every kth member of the population.

3. Stratified Sampling

Stratified Sample is a sample obtained by dividing the population into subgroups (strata) according to some characteristics relevant to the study (there can be several subgroups). Then subjects are selected from each subgroup.

4. Cluster Sampling

Cluster Sample is a sample selected by dividing the population into sections or clusters, then selecting one or more clusters and using all members in the cluster(s) as the members of the sample.

▪ It is used when the population is large or when it involves subjects residing in a large geographic area.
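The four sampling methods can be sketched with Python's standard `random` module. All function names and the student/section data below are hypothetical; the random starting point in `systematic_sample` is a common convention that the handout does not specify (it only says "every kth member"), and a seeded `random.Random` is used only so that a run can be repeated.

```python
import random


def simple_random_sample(population, n, rng):
    """Random sampling: every member has the same chance of being selected."""
    return rng.sample(population, n)


def systematic_sample(population, k, rng):
    """Systematic sampling: random start among the first k members, then every kth member."""
    start = rng.randrange(k)
    return population[start::k]


def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Stratified sampling: split into strata by a characteristic, then sample each stratum."""
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample


def cluster_sample(clusters, n_clusters, rng):
    """Cluster sampling: choose whole clusters at random and keep all of their members."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]


# Hypothetical population: student ID numbers 1-100, grouped into four class sections.
rng = random.Random(2024)  # seeded so the run is reproducible
students = list(range(1, 101))
sections = {"A": [1, 2, 3, 4], "B": [5, 6, 7], "C": [8, 9, 10, 11], "D": [12, 13]}

print(simple_random_sample(students, 5, rng))
print(systematic_sample(students, 10, rng))
print(stratified_sample(students, lambda s: "odd" if s % 2 else "even", 3, rng))
print(cluster_sample(sections, 2, rng))
```

Note the contrast the code makes visible: stratified sampling draws a few members from every subgroup, while cluster sampling keeps every member of only a few subgroups.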

Frequency Distribution and Graphs

The most convenient method of organizing data is to construct a frequency distribution, and the most useful method of presenting data is by the use of statistical tables and graphs.

A frequency distribution is the organization of raw data in table form using classes and frequencies.

Types of Frequency Distribution

1. Categorical Frequency Distribution

- used for data that can be placed in specific categories, such as nominal or ordinal level data.

Examples: Data such as gender, civil status, educational attainment, and age or income brackets.

2. Grouped Frequency Distribution

▪ It is used when the data set is large.

Definition of Terms:

1. Range = Highest Value – Lowest Value

2. Class Limits (lower and upper)

▪ The lower class limit is the smallest data value that can be included in a class; the upper class limit is the largest data value that can be included in the class.

▪ They should have the same decimal place value as the data.

3. Class Boundaries

▪ They separate the classes so that there are no gaps in the frequency distribution. They should have one additional decimal place value compared with the data and end in 5.

4. Class Width

▪ It is the difference obtained by subtracting the lower (upper) class limit of one class from the lower (upper) class limit of the next class.

5. Class Midpoint

▪ It is obtained by adding the lower and upper boundaries, or the lower and upper limits, and dividing by 2.
▪ It is the numerical location of the center of the class.

Rules in Constructing Frequency Distribution

1. There should be between 5 and 20 classes.
2. It is preferable but not absolutely necessary that the class width be an odd number. This ensures that the midpoint of each class has the same place value as the data.

$$\text{Class Midpoint} = \frac{\text{lower boundary} + \text{upper boundary}}{2}$$

3. Classes must be mutually exclusive. Mutually exclusive classes have nonoverlapping class limits so that no data value can be placed into two classes.
4. The classes must be continuous.
5. The classes must be exhaustive. There should be enough classes to accommodate all the data.
6. The classes must be equal in width.
Example: The following data represent the scores of 40 students in a 100-item STAT 221
exam. Construct a frequency distribution table using 9 classes. Find the mean class and the
median.

67 67 45 56 56 56 43 77 67 78
39 67 39 29 45 39 27 78 23 45
89 67 92 59 60 79 58 23 96 19
93 79 67 78 89 45 67 18 45 20
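The handout does not show the finished table, so the sketch below is one way to build it under stated assumptions: the class width is the range divided by 9 and rounded up, the first class starts at the lowest score, and because the scores are whole numbers each class boundary sits 0.5 below its lower limit. The grouped median uses the common interpolation formula L + ((n/2 − cf) / f) × w (L = lower boundary of the median class, cf = cumulative frequency before it, f = its frequency, w = class width), which the handout does not state. The function names are hypothetical.

```python
import math

# Scores of the 40 students in the STAT 221 example above.
scores = [
    67, 67, 45, 56, 56, 56, 43, 77, 67, 78,
    39, 67, 39, 29, 45, 39, 27, 78, 23, 45,
    89, 67, 92, 59, 60, 79, 58, 23, 96, 19,
    93, 79, 67, 78, 89, 45, 67, 18, 45, 20,
]


def frequency_table(data, n_classes):
    """Return (rows, width); each row is (lower limit, upper limit, frequency).

    Assumes whole-number data, so consecutive class limits differ by 1.
    """
    low, high = min(data), max(data)
    width = math.ceil((high - low) / n_classes)  # range / classes, rounded up
    if low + n_classes * width <= high:          # keep the classes exhaustive
        width += 1
    rows = []
    for i in range(n_classes):
        lower = low + i * width
        upper = lower + width - 1
        rows.append((lower, upper, sum(lower <= x <= upper for x in data)))
    return rows, width


def grouped_mean(rows):
    """Sum of f * midpoint divided by n, with midpoint = (lower + upper) / 2."""
    n = sum(f for _, _, f in rows)
    return sum(f * (lo + up) / 2 for lo, up, f in rows) / n


def grouped_median(rows, width):
    """Interpolate inside the class that contains the (n/2)-th value."""
    n = sum(f for _, _, f in rows)
    cumulative = 0
    for lo, _, f in rows:
        if cumulative + f >= n / 2:
            lower_boundary = lo - 0.5
            return lower_boundary + (n / 2 - cumulative) * width / f
        cumulative += f
    raise ValueError("empty frequency table")


rows, width = frequency_table(scores, 9)
for lo, up, f in rows:
    print(f"{lo}-{up}\t{f}")
print("width:", width)                                 # width: 9
print("grouped mean:", round(grouped_mean(rows), 2))   # grouped mean: 57.55
print("grouped median:", grouped_median(rows, width))  # grouped median: 59.5
```

Under these assumptions the classes are 18–26, 27–35, …, 90–98 with frequencies 5, 2, 4, 5, 6, 7, 6, 2, 3, so the class with the highest frequency is 63–71. The width of 9 is odd, which (per rule 2) keeps every midpoint a whole number. The grouped mean (57.55) is close to, but not equal to, the mean of the raw scores (2286/40 = 57.15), because each score is replaced by its class midpoint.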

MEASURES

1.​ Measure of Location

A Measure of Location summarizes a data set by giving a “typical value” within the range of the data values that describes its location relative to the entire data set.

Some Common Measures:

☞ Central Tendency

☞ Percentiles, Deciles, Quartiles

a. Percentile
▪ It is a numerical measure that gives the position of a data value relative to the entire data set.

▪ It divides an array (raw data arranged in increasing or decreasing order of magnitude) into 100 equal parts.

Percentiles are also used to compare an individual’s test score with the national norm.

A percentage score indicates the proportion of a test that someone has completed correctly.

A percentile score tells us what percent of other scores are less than the data point we are investigating.

Example:
1. (Exam Result) John is in the 78th percentile on the CPA exam; this means he performed better than 78% of the takers.
2. (Height) Jamie is in the 98th percentile for height among the students in the class. This means 98% of the class are shorter than she is.
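The definition of a percentile score above ("what percent of other scores are less than the data point") can be sketched as a simple count. The function name `percentile_rank` is hypothetical, and this is only one simple convention: some textbooks also count half of any values tied with the data point, which gives slightly different results. The data set is the one from the quartile example later in this section.

```python
def percentile_rank(data, value):
    """Percent of the data values that are strictly less than `value`."""
    below = sum(x < value for x in data)
    return 100 * below / len(data)


data = [15, 13, 6, 5, 12, 50, 22, 18]
print(percentile_rank(data, 18))  # 62.5: five of the eight values are below 18
```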

b. Decile
▪ It divides an array into ten equal parts, each part having ten percent of the distribution of the data values, denoted by Dj.

The 1st decile is the 10th percentile, the 2nd decile is the 20th percentile, and so on.

c. Quartile

▪ It divides an array into four equal parts, each part having 25% of the distribution of the data values, denoted by Qj.
The 1st quartile is the 25th percentile; the 2nd quartile is the 50th percentile, which is also the median; and the 3rd quartile is the 75th percentile.

Steps in Finding Quartiles

Step 1. Arrange the data in order from lowest to highest.
Step 2. Find the median of the data values. This is the value for Q2.
Step 3. Find the median of the data values that fall below Q2. This is the value for Q1.
Step 4. Find the median of the data values that fall above Q2. This is the value for Q3.

Example:​
Find the Q1, Q2 and Q3 for the data set:

15, 13, 6, 5, 12, 50, 22, 18

Solution:

Step 1. Arrange the data in order:

5, 6, 12, 13, 15, 18, 22, 50

Step 2. Find the median (Q2). The two middle values are 13 and 15.

Q2 = (13 + 15)/2 = 14

Step 3. Find the median of the data values less than 14.
5, 6, 12, 13

Q1 = (6 + 12)/2 = 9

Step 4. Find the median of the data values greater than 14.
15, 18, 22, 50

Q3 = (18 + 22)/2 = 20

The interquartile range (IQR) is defined as the difference between Q3 and Q1.

IQR = Q3 - Q1 = 20 - 9 = 11
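The four steps can be sketched directly in Python; the function names are hypothetical. When n is odd, the middle value itself is left out of both halves, which is one reading of "values that fall below/above Q2". Library routines such as Python's `statistics.quantiles` use interpolation rules instead and can give slightly different quartiles for the same data.

```python
def median(values):
    """Middle value of an already sorted list, or the mean of the two middle values."""
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2


def quartiles(data):
    ordered = sorted(data)                # Step 1: arrange from lowest to highest
    n = len(ordered)
    q2 = median(ordered)                  # Step 2: median of all values
    q1 = median(ordered[: n // 2])        # Step 3: median of the values below Q2
    q3 = median(ordered[(n + 1) // 2 :])  # Step 4: median of the values above Q2
    return q1, q2, q3


q1, q2, q3 = quartiles([15, 13, 6, 5, 12, 50, 22, 18])
print(q1, q2, q3)  # 9.0 14.0 20.0
print(q3 - q1)     # 11.0
```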

MEASURES OF CENTRAL TENDENCY

1. Mean

The arithmetic mean, often simply called the mean, is the most frequently used measure of central tendency. The mean is the only common measure in which all values play an equal role; to determine its value, you need to consider all the values in the data set. The mean is appropriate for describing the central tendency of interval or ratio data. The symbol x̄, called “x bar”, is used to represent the mean of a sample, and the symbol μ, called “mu”, is used to denote the mean of a population.
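For reference, the usual formulas behind these symbols are given below (standard notation, assumed here since the handout names only the symbols): n is the sample size, N the population size, Σx the sum of all data values, and, for a grouped frequency distribution, each class is represented by its midpoint x_m weighted by its frequency f.

```latex
\bar{x} = \frac{\sum x}{n}
\qquad
\mu = \frac{\sum x}{N}
\qquad
\bar{x}_{\text{grouped}} = \frac{\sum f\,x_m}{n}
```

For example, the four values 5, 6, 12, 13 have x̄ = 36/4 = 9.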

A. Properties of Mean

1. The mean is found by using all the values of the data.
2. The mean varies less than the median or mode when samples are taken from the same population and all three measures are computed for these samples.
3. The mean is used in computing other statistics, such as the variance.
4. The mean for a data set is unique and not necessarily one of the data values.
5. The mean cannot be computed for data in a frequency distribution that has an open-ended class.
6. The mean is affected by extremely high or low values, called outliers, and may not be the appropriate average to use in these situations.

2. Median

The median is the midpoint of the data array. When a data set is ordered, whether ascending or descending, it is called a data array. The median is an appropriate measure of central tendency for data that are ordinal or above, but it is most valuable for ordinal data.

A. Properties of the Median

1. The median is unique; there is only one median for a data set.

2. The median is used to find the center or middle value of a data set.

3. The median is used when it is necessary to find out whether the data values fall in the upper or lower half of the distribution.

4. The median is not affected by extreme values.

5. The median can be computed for an open-ended frequency distribution.

6. The median can be applied to ordinal, interval, and ratio data.

B. Median for the Ungrouped Data

To determine the value of the median for ungrouped data, we need to consider two rules:

1. If n is odd, the median is the middle-ranked value.

2. If n is even, the median is the average of the two middle-ranked values.

$$\text{Median (Rank Value)} = \frac{n + 1}{2}$$

Example 1: Find the median of the ages of the middle-management employees of a certain company. The ages are 53, 45, 59, 48, 54, 46, 51, 58, and 55.

Solution:

1. Arrange the data in ascending order.

45, 46, 48, 51, 53, 54, 55, 58, 59
2. Select the middle rank value using the formula:
$$\text{Median (Rank Value)} = \frac{n + 1}{2} = \frac{9 + 1}{2} = \frac{10}{2} = 5$$
3. Identify the median in the data set. The 5th value in the array is 53.
45, 46, 48, 51, 53, 54, 55, 58, 59

Hence, the median age is 53 years.

Example 2: The daily rates of eight employees of a certain Municipality of Davao del Sur are Php 550, 420, 560, 500, 700, 670, 860, and 480. Find the median daily rate of the employees.

Solution:

1. Arrange the data (in Php) in ascending order.

420, 480, 500, 550, 560, 670, 700, 860

2. Select the middle rank value using the formula:

$$\text{Median (Rank Value)} = \frac{n + 1}{2} = \frac{8 + 1}{2} = \frac{9}{2} = 4.5$$

3. Identify the median in the data set. The 4.5th position lies between the 4th value (550) and the 5th value (560).

420, 480, 500, 550, 560, 670, 700, 860

Since the middle point falls between 550 and 560, we determine the median of the data set by getting the average of the two values.

$$\text{Median} = \frac{550 + 560}{2} = \frac{1{,}110}{2} = 555$$

Therefore, the median daily rate is Php 555.
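The rank-value procedure from both examples can be sketched as follows; `median_by_rank` is a hypothetical name. Python's standard `statistics.median` follows the same two rules (middle value for odd n, average of the two middle values for even n), so it can be used to double-check hand calculations.

```python
def median_by_rank(data):
    ordered = sorted(data)             # Step 1: arrange the data
    rank = (len(ordered) + 1) / 2      # Step 2: rank value (n + 1) / 2
    if rank.is_integer():              # n odd: a single middle value
        return ordered[int(rank) - 1]  # ranks start at 1, list indices at 0
    below = ordered[int(rank) - 1]     # n even: the values at ranks n/2 ...
    above = ordered[int(rank)]         # ... and n/2 + 1
    return (below + above) / 2


ages = [53, 45, 59, 48, 54, 46, 51, 58, 55]
rates = [550, 420, 560, 500, 700, 670, 860, 480]
print(median_by_rank(ages))   # 53
print(median_by_rank(rates))  # 555.0
```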

3. Mode

The mode is the value in the data set that appears most frequently. Like the median and unlike the mean, extreme values in the data set do not affect the mode. A data set may not contain any mode if none of the values is “most typical.” A data set that has only one value occurring with the greatest frequency is said to be unimodal. If the data set has two values with the same greatest frequency, both values are considered modes and the data set is bimodal. If the data set has more than two modes, it is said to be multimodal. If all the values in a data set are different from each other, the data set is said to have no mode.

A. Properties of Mode

1. The mode is used when the most typical case is desired.
2. The mode is the easiest average to compute.
3. The mode can be used when the data are nominal or categorical, such as religious affiliation, gender, or political affiliation.
4. The mode is not always unique. A data set can have more than one mode, or the mode may not exist for a data set.

Example 1: The following data represent the total unit sales for PSP 2000 from a sample of 10 gaming centers for the month of August: 15, 17, 10, 12, 13, 10, 14, 10, 8, and 9. Find the mode.

Solution: The ordered array for these data is 8, 9, 10, 10, 10, 12, 13, 14, 15, 17. Because 10 appears three times, more often than any other value, the mode is 10.
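A sketch of the mode rules above, returning every value tied for the greatest frequency and an empty list when all values are distinct. The function name `modes` and the last two data sets are hypothetical. Python's `statistics.multimode` is similar but returns every value when all values are distinct, so it does not follow the "no mode" rule used in this handout.

```python
from collections import Counter


def modes(data):
    """Return every value that occurs with the greatest frequency.

    An empty list means there is no mode (every value occurs only once).
    """
    counts = Counter(data)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(value for value, count in counts.items() if count == top)


sales = [15, 17, 10, 12, 13, 10, 14, 10, 8, 9]
print(modes(sales))            # [10]    unimodal
print(modes([3, 3, 5, 5, 7]))  # [3, 5]  bimodal (hypothetical data)
print(modes([4, 8, 15, 16]))   # []      no mode (hypothetical data)
```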

Example 2: An operations manager in charge of a company’s manufacturing keeps track of the number of LCD televisions manufactured each day. The following data represent the number of LCD televisions manufactured over the past three weeks:

20, 18, 19, 25, 20, 21, 20, 25, 20, 29, 28, 29, 25, 27, 26, 22, and 20.
Find the mode of the given data set.

Solution: The ordered array for these data is:

18, 19, 20, 20, 20, 20, 21, 22, 25, 25, 25, 25, 26, 27, 28, 29, 29, 30.

There are two modes, 20 and 25, since each of these values occurs four times in the data set.

Measures of Variation

▪ A measure of variation is a single value that is used to describe the spread of the distribution.
▪ A measure of central tendency alone does not uniquely describe a distribution.

❖ Range - The difference between the maximum and minimum values in a data set, i.e.,

R = MAX – MIN
▪ The larger the value of the range, the more dispersed the observations are.
▪ It is quick and easy to understand.
▪ It is a rough measure of dispersion.

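❖ Variance
▪ It is an important measure of variation.
▪ It shows variation about the mean.
a. Population Variance

Formula: $\sigma^2 = \dfrac{\sum (x - \mu)^2}{N}$

b. Sample Variance

Formula: $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$

❖ Standard Deviation

a. Population SD

Formula: $\sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$

b. Sample SD

Formula: $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$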

Properties of Standard Deviation

▪ It is the most widely used measure of dispersion (Chebyshev’s Inequality).
▪ It is based on all the items and is rigidly defined.
▪ It is used to test the reliability of measures calculated from samples.
▪ The standard deviation is sensitive to the presence of extreme values.
▪ It is not easy to calculate by hand (unlike the range).

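Coefficient of Variation (CV)

▪ measure of relative variation
▪ usually expressed in percent
▪ shows variation relative to the mean
▪ used to compare 2 or more groups

Formula: $CV = \dfrac{s}{\bar{x}} \times 100\%$ (sample) or $CV = \dfrac{\sigma}{\mu} \times 100\%$ (population)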
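The range, variance, standard deviation, and coefficient of variation can be sketched in a few lines; the only difference between the sample and population versions is the divisor (n − 1 versus N). The function names and the data set are hypothetical; Python's standard library already provides `statistics.variance`/`statistics.stdev` (sample) and `statistics.pvariance`/`statistics.pstdev` (population), used here as a cross-check.

```python
import math
import statistics


def data_range(data):
    return max(data) - min(data)


def variance(data, sample=True):
    """Sum of squared deviations from the mean, divided by n - 1 (sample) or N (population)."""
    n = len(data)
    mean = sum(data) / n
    squared_deviations = sum((x - mean) ** 2 for x in data)
    return squared_deviations / (n - 1 if sample else n)


def std_dev(data, sample=True):
    return math.sqrt(variance(data, sample))


def coefficient_of_variation(data, sample=True):
    """Standard deviation expressed as a percentage of the mean."""
    mean = sum(data) / len(data)
    return std_dev(data, sample) / mean * 100


values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean = 5
print(data_range(values))                              # 7
print(variance(values, sample=False))                  # 4.0
print(std_dev(values, sample=False))                   # 2.0
print(coefficient_of_variation(values, sample=False))  # 40.0
print(math.isclose(variance(values), statistics.variance(values)))  # True
```

Because the CV divides by the mean, it allows groups measured on different scales (or with very different averages) to be compared, which the raw standard deviation cannot do.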

Measures of Skewness

▪ describes the degree to which the distribution of the data departs from symmetry.
▪ The degree of skewness is measured by the coefficient of skewness, denoted as SK and computed as

$$SK = \frac{3(\text{Mean} - \text{Median})}{SD}$$

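A minimal sketch of the coefficient of skewness above, using Python's `statistics` module. The handout does not say whether SD is the sample or population standard deviation; this sketch assumes the sample SD (`statistics.stdev`). A positive SK means the mean is pulled above the median by a long right tail; a negative SK means the opposite. The function name is hypothetical, and the data set is the one from the quartile example.

```python
import statistics


def pearson_skewness(data):
    """SK = 3 * (mean - median) / SD, assuming the sample standard deviation."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    sd = statistics.stdev(data)  # n - 1 divisor (assumption, see lead-in)
    return 3 * (mean - median) / sd


data = [15, 13, 6, 5, 12, 50, 22, 18]
print(round(pearson_skewness(data), 2))  # 0.76: positively skewed (the value 50 pulls the mean up)
print(pearson_skewness([1, 2, 3, 4, 5]))  # 0.0: symmetric, mean equals median
```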
Types of Distributions

Frequency distributions can assume many shapes. The three most familiar shapes are symmetric, positively skewed, and negatively skewed. In a symmetric distribution, the data values are evenly distributed on both sides of the mean. In addition, when the distribution is unimodal, the mean, median, and mode are the same and are at the center of the distribution.

What is Symmetry?

Exercise No. 3.

1. The hourly outputs of two groups of employees assembling plug-in units at Zenith were selected at random. The sample outputs were:
Complete the table; all values should be rounded to 2 decimal places.

a. Which shift performed better? ____________________________________


b. Justify your answer.____
_________________________________________

2.​ Complete the table and find the mean for the following grouped frequency distribution.

N = ________

∑ fx = _______

x̄ = _______

Correlation – a statistical method used to determine whether a relationship exists between variables.

Regression – a statistical method used to describe the nature of the relationship between variables, that is, positive or negative, linear or nonlinear.

One Way - ANOVA (Analysis of Variance) – a technique used to determine if there is a significant difference among
