Topic1 3

The document provides an overview of basic statistical concepts essential for experimental design, including definitions of population, sample, and types of statistics (descriptive vs. inferential). It discusses variables, measures of central tendency and variation, and the importance of sampling distributions and the Central Limit Theorem. The content emphasizes the significance of understanding these concepts for making inferences about populations based on sample data.

Uploaded by

benjaminzhang0728

Statistics for Experimental Design

Topic 1:
A Review of Basic Statistical Concepts

Mohammad Darainy
Goal: study a population by using a sample that is representative of that population (no biased sampling).

Population and Sample

§ Population: The entire set of things of interest.
§ Parameter: A property descriptive of the population (e.g., mean, median).
§ Example: population mean

§ Sample: A subset of the population. Typically this provides the data we will look at.
§ Estimate (or statistic): A property of a sample.
§ Example: sample mean

Descriptive vs. Inferential Statistics

§Descriptive Statistics:
§Summarize/describe the properties of
samples (or populations when they are
completely known)
§Inferential Statistics:
§Draw conclusions/make inferences about
the properties of populations from sample
data
[Figure: a sample with Mean(X) = 167 is summarized (descriptive) and then used to test a hypothesis about the population mean, Mean(μ) = 150 (inferential).]
Variable: Represents a characteristic of an individual in a sample or population

• How many people in this sample are female?
• How many people in this sample are overweight?
• What is the IQ level of the people in this sample?
• What was the grade of this sample in PSYC 204?
Variables (measurement level)

Qualitative
– Nominal (no rank): Gender (Female, Male); Occupation (Student, Teacher)
– Ordinal (rank): Walking speed (Very slow, Slow, Normal, Fast)

Quantitative
– Interval (no meaningful zero): IQ level (80, 85, 90, 95, 100, 102)
– Ratio (meaningful zero): Age (23, 25, 36); Weight (44, 58)
Types of Variables

• Dependent variables (Y):

– Outcomes/Responses
– Predicted variables

• Independent variables (X):

– Aka factors in experimental designs
– Aka predictors/covariates
Types of Variables
• Dependent variables (Y):
Walking speed
• Independent variables (X):
Age
A marketing researcher wants to test the effect of
a new ad on consumers’ preference ratings.

[Diagram: random sampling, then random assignment to Group 1 (treatment: Ad) or Group 2 (control: No Ad)]

Y = Consumer preference (1-10)

X = Ad (0 = no, 1 = yes)
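The random-assignment step in this design can be sketched in a few lines of Python. This is only an illustration: the participant IDs, the group size of 20, and the function name `randomly_assign` are assumptions made for the example, not part of the lecture.

```python
import random

def randomly_assign(participants, seed=None):
    """Shuffle participants, then split them into treatment (Ad) and control (No Ad)."""
    rng = random.Random(seed)
    shuffled = list(participants)      # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    # X = 1 for the treatment group (sees the ad), X = 0 for the control group
    return {pid: (1 if i < half else 0) for i, pid in enumerate(shuffled)}

# Hypothetical sample of 20 consumers (IDs are made up for illustration)
participants = [f"P{i:02d}" for i in range(1, 21)]
assignment = randomly_assign(participants, seed=42)

treatment = sorted(p for p, x in assignment.items() if x == 1)
control = sorted(p for p, x in assignment.items() if x == 0)
print(len(treatment), len(control))
```

Because the split happens after shuffling, every participant has the same chance of landing in either group, which is the point of random assignment.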
In this course

§ We focus on the relationships between one dependent variable and one or multiple independent variables.
– DV: Continuous (typically, normally distributed)
– IVs: Categorical/continuous
– Ad example:
– DV = Continuous (Preference: 1-10)
– IV = Categorical (Ad: 0/1)
Descriptive Statistics
Note: get the mean and standard deviation to describe the data.

§ Summarize/describe the properties of

samples (or populations when they are
completely known)
§ How are the data distributed?
– Where is the center? (central tendency)
– What is the range? (variation)
– What is the shape of the distribution?
(shape)
Descriptive Statistics
To summarize/describe the samples

– Central tendency: Mean, Median, Mode
– Variation: Range, Variance, Standard deviation
– Shape: Skewness (0 for a normal distribution), Kurtosis (pointy or flat curve)

Measures of Central Tendency

§ Mean
§ Median
§ Mode
Measures of Central Tendency: Mean

The average: sum of values divided by sample size (N)

$$\bar{X} = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$$

For data points 6, 8, 8, 10, 12, 12, 15, 50:

$$\bar{X} = \frac{6+8+8+10+12+12+15+50}{8} = \frac{121}{8} = 15.125$$

Note on variance: variance = mean of squared deviation scores (mean square).

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N_p}$$

where µ = population mean and σ² = population variance.

$$\frac{\sum (X - \bar{X})^2}{N}$$

is a descriptive statistic only; it is biased, because on average it is smaller than σ².

$$\frac{\sum (X - \bar{X})^2}{N-1}$$

is both a descriptive statistic and an unbiased estimate of σ². N − 1 = degrees of freedom.
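As a quick check of the mean and of the two variance formulas (dividing by N versus N − 1), here is a small sketch using Python's standard `statistics` module on the slide's eight data points. The variable names are chosen for this example only.

```python
from statistics import mean, pvariance, variance

data = [6, 8, 8, 10, 12, 12, 15, 50]

x_bar = mean(data)                # sum of values divided by N
var_n = pvariance(data)           # sum of squared deviations divided by N
var_n_minus_1 = variance(data)    # sum of squared deviations divided by N - 1

print(f"mean              = {x_bar}")
print(f"variance (/N)     = {var_n:.3f}")
print(f"variance (/(N-1)) = {var_n_minus_1:.3f}")
```

The N-divisor version is always the smaller of the two, which matches the note that it underestimates σ² on average.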
Measures of Central Tendency : Mean
§ Mean is affected by extreme values (outliers).

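[Figure: two dot plots on a 0 to 10 axis, one without and one with an extreme value]

Without the extreme value: $\text{Mean} = \frac{1+2+3+4+5}{5} = \frac{15}{5} = 3$
With the extreme value: $\text{Mean} = \frac{1+2+3+4+10}{5} = \frac{20}{5} = 4$
Measures of Central Tendency: Median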

The exact middle value

§ Calculation:
– If there are an odd number of observations,
find the middle value.
– If there are an even number of observations,
find the middle two values and average them.
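$$\text{Median rank} = \frac{N+1}{2}$$

For 6, 8, 8, 10, 12, 12, 15, 50 (N = 8, so the median rank is 4.5 and the 4th and 5th values are averaged):

$$\text{Median} = \frac{10+12}{2} = 11$$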
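The median-rank rule above can be written out directly. The helper below is a sketch for illustration (the name `median_by_rank` is made up here) and assumes a non-empty list of numbers.

```python
import math

def median_by_rank(values):
    """Median via the (N + 1) / 2 rank rule; assumes at least one value."""
    ordered = sorted(values)
    n = len(ordered)
    rank = (n + 1) / 2                      # 1-based median rank, e.g. 4.5 for N = 8
    lower = ordered[math.floor(rank) - 1]   # value at the rank rounded down
    upper = ordered[math.ceil(rank) - 1]    # value at the rank rounded up
    return (lower + upper) / 2              # same value twice when N is odd

print(median_by_rank([6, 8, 8, 10, 12, 12, 15, 50]))
print(median_by_rank([1, 2, 3, 4, 10]))
```

For an odd N the rank is a whole number, so `lower` and `upper` are the same middle value; for an even N they are the two middle values.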
Measures of Central Tendency: Median
§ Median is NOT affected by extreme values (outliers).
Note: take out outliers to get an accurate mean.

[Figure: two dot plots on a 0 to 10 axis; Median = 3 in both]
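To see the contrast between the mean and the median numerically, the sketch below reuses the two five-point data sets from the mean example (1, 2, 3, 4, 5 and 1, 2, 3, 4, 10). That the median figure uses the same data is an assumption, although it is consistent with Median = 3 in both panels.

```python
from statistics import mean, median

without_outlier = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]    # the last value is replaced by an extreme one

mean_shift = mean(with_outlier) - mean(without_outlier)
median_shift = median(with_outlier) - median(without_outlier)

print("mean:  ", mean(without_outlier), "->", mean(with_outlier))
print("median:", median(without_outlier), "->", median(with_outlier))
```

The extreme value moves the mean by one unit while the median stays where it was.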
Measures of Central Tendency: Mode
§ The most frequently observed value

6,8, 8,10,12,12,15,50
Modes of this distribution are 8 and 12

§ Not affected by extreme values

§ Used for either numerical or categorical data
§ There may be no mode
§ There may be several modes
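These properties of the mode can be tried out with `statistics.multimode` (available in Python 3.8 and later). The occupation list is a made-up example. Note that when every value occurs equally often, `multimode` returns all of them, which is how this function represents the "no mode" case.

```python
from statistics import multimode

data = [6, 8, 8, 10, 12, 12, 15, 50]
modes = multimode(data)                       # several modes are possible

occupations = ["Student", "Teacher", "Student"]   # hypothetical categorical data
occupation_modes = multimode(occupations)     # works for categorical data too

all_unique = [1, 2, 3]
tied = multimode(all_unique)                  # every value tied: no single mode

print(modes, occupation_modes, tied)
```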
Which measure of central tendency is
the best?

• Mean is generally used, unless extreme

values (outliers) exist.

• Median is often used, since the median is

not sensitive to extreme values.
– Example: Median home prices may be
reported for a region – less sensitive to
outliers.
Measures of Variation
Measures of variation
give information on the
spread or variability of
data values.

• Range
• Variance
• Standard Deviation
Same center,
different variation
Range

- Considers only the starting point and the end point

- Does not show how the data are spread

e.g., two sets of 10 numbers that both range from 2 to 26:

2 3 4 … … 26

2 … … 24 25 26
Variance

• Average (approximately) of squared deviations of values from the mean

Sample variance (unbiased):

$$S^2 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N-1}$$

where $\bar{X}$ = mean
N = sample size
$X_i$ = i-th value of the variable X
Standard Deviation

§ Most commonly used measure of variation

§ A statistic that measures the dispersion of a
dataset relative to its mean.
§ Has the same units as the original data
– Sample standard deviation:

$$S = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N-1}}$$
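The sample standard deviation formula translates almost term by term into code. The sketch below (the function name `sample_sd` is chosen for this example) applies it to the eight data points used earlier in the deck and compares the result with `statistics.stdev`, which also uses the N − 1 divisor.

```python
import math
from statistics import stdev

def sample_sd(values):
    """Square root of the sum of squared deviations divided by N - 1."""
    n = len(values)
    x_bar = sum(values) / n
    sum_of_squares = sum((x - x_bar) ** 2 for x in values)
    return math.sqrt(sum_of_squares / (n - 1))

data = [6, 8, 8, 10, 12, 12, 15, 50]
s = sample_sd(data)
print(f"S = {s:.3f}  (statistics.stdev gives {stdev(data):.3f})")
```

The result is in the same units as the data, unlike the variance, which is in squared units.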
Comparing Standard Deviations
[Figure: three dot plots on an 11 to 21 axis with the same mean and different spread]
Data A: Mean = 15.5, S = 3.338
Data B: Mean = 15.5, S = 0.926
Data C: Mean = 15.5, S = 4.570
Shape of a Distribution
• Describes how data are distributed
• Measures of shape
– Symmetric or skewed
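[Figure: positions of the mode, median and mean in a positively skewed and a negatively skewed distribution]
– Positively skewed: the mean is typically pulled above the median and the mode (long right tail)
– Negatively skewed: the mean is typically pulled below the median and the mode (long left tail)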
Normal (Gaussian or bell-shaped)
Distribution
• In most statistical techniques for
experimental designs, the dependent
variable (Y) is assumed to be continuous
and normally distributed.
• If normally distributed,
• Mean = Median = Mode (checking this works like a skewness test)
• Mean (μ) and standard deviation (σ) are sufficient to describe a normal distribution:

$$Y = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(X-\mu)^2}{2\sigma^2}\right)$$
Normal distribution
[Figure: normal curve with the areas within 1, 2 and 3 standard deviations of the mean shaded]
– μ ± σ: 68% of the values in the population or the sample
– μ ± 2σ: 95%
– μ ± 3σ: 99.7%
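These percentages can be reproduced from the normal distribution itself. For a normal variable, the probability of falling within k standard deviations of the mean is erf(k / √2). The sketch below evaluates that with `math.erf`, and the helper name `share_within` is just for this example.

```python
import math

def share_within(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"mu +/- {k} sigma: {share_within(k):.4f}")
```

The exact values are slightly different from the rounded 68%, 95% and 99.7% figures, which are the usual rule-of-thumb roundings.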
Standard score (Z-score)

§ Definition: number of standard deviations from mean.

$$z = \frac{X - \bar{X}}{s} \quad \text{or} \quad z = \frac{X - \mu}{\sigma}$$

§ The standard score (z-score) follows the standard normal distribution if your original data are normally distributed: μ = 0, σ = 1, z ~ N(0,1)
The purpose of the z statistic is to transform any normal distribution to the standard normal distribution. The shape of our curve doesn't change, only the units.

[Figure: a normal curve on the X axis (157 to 187, σ = 5) next to the same curve on the z axis (-3 to +3, σ = 1)]

§ This transformation is useful because we can easily examine how extreme our sample score (X) is by simply looking at the corresponding z score.
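A short sketch of this change of units, using the tick marks from the figure. The mean of 172 is an assumption read off the centre of the X axis (157 to 187). Only σ = 5 is stated on the slide.

```python
mu = 172      # assumed: centre of the X axis in the figure
sigma = 5     # stated on the slide

x_ticks = [157, 162, 167, 172, 177, 182, 187]
z_ticks = [(x - mu) / sigma for x in x_ticks]   # z = (X - mu) / sigma

for x, z in zip(x_ticks, z_ticks):
    print(f"X = {x}  ->  z = {z:+.0f}")
```

Each X tick maps to exactly one z tick, so the curve keeps its shape and only the axis labels change.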
Example: You get a GRE score of 800. Is that
good? Will you get into grad school?

$$z = \frac{X - \mu}{\sigma}$$

• Let’s say µ = 600, σ = 100

• z = (800 - 600)/100 = 2
[Figure: normal curve on a 300 to 900 axis; the shaded area above 800 is 2.28% of the distribution]

With the help of a z table, the area above z = 2.0 is only 0.0228. That means that you scored in the top 2.28%!
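Instead of a printed z table, the same tail area can be looked up in code. This sketch uses `statistics.NormalDist` from the Python standard library (3.8 and later) with the example's values µ = 600 and σ = 100.

```python
from statistics import NormalDist

mu, sigma, score = 600, 100, 800

z = (score - mu) / sigma                 # number of standard deviations above the mean
upper_tail = 1 - NormalDist().cdf(z)     # area above z under the standard normal curve

print(f"z = {z}, area above z = {upper_tail:.4f}")
```

The result rounds to the table value of 0.0228, that is, the top 2.28% of scores.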
Sampling Distribution of the Mean
• Three Types of Distributions
• Population Distributions
The distribution of all scores in the population. Imagine we are
interested in the height of all currently enrolled McGill students. The
resulting frequency distribution will be our population distribution.

Male ~ N(175.1 , 4.5)

Female ~ N(162.3 , 4.0)
Combined ~ (μ = 168.7 , σ = 7.7)
* The true standard deviations are
male = 7.42 and female = 7.11 which
were changed for effect.
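The combined figures can be checked with the formulas for a mixture of two groups. The sketch assumes that the second number in N(·, ·) is a standard deviation and that males and females make up equal halves of the population. Neither is stated on the slide, but both are consistent with the reported μ = 168.7 and σ = 7.7.

```python
import math

# (weight, mean, standard deviation); the 50/50 weights are an assumption
groups = [
    (0.5, 175.1, 4.5),   # male
    (0.5, 162.3, 4.0),   # female
]

mu = sum(w * m for w, m, _ in groups)
# Mixture variance = within-group variance + squared distance of each group mean from mu
var = sum(w * (s ** 2 + (m - mu) ** 2) for w, m, s in groups)
sd = math.sqrt(var)

print(f"combined mean = {mu:.1f}, combined sd = {sd:.1f}")
```

The combined standard deviation is larger than either group's because the two group means sit well apart.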
Sampling Distribution of the Mean
• Sample Distributions
Draw a McGill student at random and measure his/her
height.
Put him/her back (replacement)
Draw again and measure the height of the student
Do you expect the two heights to be identical?
Suppose we repeat this procedure, but draw 50 students
each time instead of one
Do you expect the two sets of heights to be identical?
Random Variation: Two samples drawn randomly from
the same population will practically never be identical.
Height Distribution: Samples (n=50)
Sampling Distribution of the Mean
• Sampling Distributions
Draw two McGill students at random and measure their
heights
Put them back (replacement)
Draw two students again and measure their heights
Do you expect the means of these two samples to be
identical?
Again, what about for two samples of 50 students?
Random Variation in Sample Statistics:
Just like individual observations vary randomly between
samples, so do the statistics generated from those samples.
And just like the variation among observations can be
described by probability distributions, so can the variation
in the sample statistics.
Sampling Distributions: The distribution of a
statistic generated from samples.
Why are Sampling Distributions Important?
They are the foundation for statistical inference and
hypothesis testing
Every statistic has a sampling distribution:
Means, standard deviations, medians, maxima/minima,
etc.
In this course we are interested in Sampling Distribution
of the Mean.
Sampling Distribution of the Mean
To explore sampling distributions, let’s use the following online
applet.
http://onlinestatbook.com/stat_sim/sampling_dist/

Explore on your own: What aspects of the population and sample

distributions affect the resulting sampling distribution of the Mean?
Central Limit Theorem
[Figure: sampling distribution of the mean, centred at $\mu_{\bar{X}}$]

Normality of the Sampling Distribution of the Mean

Central Limit Theorem

When the sample size is large (i.e., > 30):

Even if the variable is not normally distributed, the sampling distribution of the mean approaches normality, with $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{N}}$.
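A small simulation can make the theorem concrete. The sketch below is an illustration under stated assumptions: the population is exponential with μ = 1 and σ = 1 (chosen only because it is clearly skewed), each sample has N = 50 observations, and 5000 samples are drawn with a fixed seed. None of these settings come from the lecture.

```python
import math
import random
from statistics import mean, stdev

rng = random.Random(0)       # fixed seed so the run is repeatable
N = 50                       # sample size
REPLICATES = 5000            # number of samples drawn

# Skewed population: exponential with mean 1 and standard deviation 1
sample_means = [
    sum(rng.expovariate(1.0) for _ in range(N)) / N
    for _ in range(REPLICATES)
]

expected_se = 1.0 / math.sqrt(N)     # sigma / sqrt(N)
observed_mean = mean(sample_means)   # should be close to mu = 1
observed_se = stdev(sample_means)    # should be close to sigma / sqrt(N)

print(f"mean of sample means = {observed_mean:.3f} (mu = 1)")
print(f"sd of sample means   = {observed_se:.3f} (sigma / sqrt(N) = {expected_se:.3f})")
```

A histogram of `sample_means` would look close to a bell curve even though the individual observations are strongly skewed.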
