Design and Analysis
of Experiments
Chapter 2
SIMPLE COMPARATIVE
EXPERIMENTS
Dr. Tran Thanh Hung
Department of Automation Technology,
College of Engineering, Can Tho University
Email: [email protected]
Chapter objectives
At the end of the chapter, students can
• review basic statistical concepts
• choose the sample size for two-sample
problems based on the t-test
• analyze the results of comparative
experiments using the t-test
• review the assumptions underlying the t-
test and how to test for these
assumptions
Contents
• Basic Statistical Concepts
• Simple comparative experiments
– The hypothesis testing framework
– The two-sample t-test
– Checking assumptions, validity
• Sample size determination
Basic Statistical Concepts
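• Probability Distributions:
The probability structure of a random variable.
y continuous, with probability density function f(y):
$$P(a \le y \le b) = \int_a^b f(y)\,dy, \qquad \int_{-\infty}^{\infty} f(y)\,dy = 1$$
y discrete, with probability function p(y_j):
$$P(y_a \le y_j \le y_b) = \sum_{j=a}^{b} p(y_j), \qquad \sum_{\text{all values of } y_j} p(y_j) = 1 \qquad (2.1)$$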
Basic Statistical Concepts
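• Mean of a probability distribution is a measure of its
central tendency or location, or expected value or the
long-run average value of the random variable y:
$$\mu = E(y) = \begin{cases} \int_{-\infty}^{\infty} y\, f(y)\,dy, & y \text{ continuous} \\ \sum_{\text{all values of } y_j} y_j\, p(y_j), & y \text{ discrete} \end{cases} \qquad (2.2)$$
• Variance: The variability or dispersion of a probability
distribution
$$\sigma^2 = V(y) = \begin{cases} \int_{-\infty}^{\infty} (y - \mu)^2 f(y)\,dy, & y \text{ continuous} \\ \sum_{\text{all values of } y_j} (y_j - \mu)^2\, p(y_j), & y \text{ discrete} \end{cases} \qquad (2.3)$$
$$\sigma^2 = E\left[(y - \mu)^2\right] \qquad (2.4)$$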
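As a small illustration of the discrete cases of (2.2) and (2.3), the sketch below uses a fair six-sided die. The die is an assumed example and is not part of the chapter's data.

```python
from fractions import Fraction

# Assumed example (not from the chapter): a fair six-sided die.
values = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * len(values)

# The probabilities of a discrete distribution must sum to 1, as in (2.1).
total_probability = sum(probs)

# Mean, discrete case of (2.2): sum of y_j * p(y_j).
mu = sum(y * p for y, p in zip(values, probs))

# Variance, discrete case of (2.3): sum of (y_j - mu)^2 * p(y_j).
sigma2 = sum((y - mu) ** 2 * p for y, p in zip(values, probs))

print(total_probability, mu, sigma2)  # prints: 1 7/2 35/12
```

Exact fractions are used here only so that the result can be checked by hand.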
Basic Statistical Concepts
Suppose that y1, y2, . . . , yn represent a sample of size n.
• Random sample: a sample that has been randomly
selected from the population
• Sample mean:
$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} \qquad (2.7)$$
• Sample variance:
$$S^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1} \qquad (2.8)$$
• Relationship between $\bar{y}$ and $\mu$, $S^2$ and $\sigma^2$?
• Number of degrees of freedom: n - 1
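Equations (2.7) and (2.8) can be checked on a few numbers. The sample below is hypothetical, chosen only to keep the arithmetic easy to follow.

```python
import statistics

# Hypothetical measurements, not data from the chapter.
sample = [4.0, 7.0, 5.0, 8.0, 6.0]
n = len(sample)

# Sample mean, equation (2.7).
y_bar = sum(sample) / n

# Sample variance, equation (2.8): divide by the n - 1 degrees of freedom.
s2 = sum((y - y_bar) ** 2 for y in sample) / (n - 1)

# The standard library's variance() uses the same n - 1 divisor.
assert abs(s2 - statistics.variance(sample)) < 1e-12

print(y_bar, s2)  # prints: 6.0 2.5
```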
Basic Statistical Concepts
• Normal Distribution:
$$y \sim N(\mu, \sigma^2)$$
If $\mu = 0$ and $\sigma^2 = 1$:
Standard Normal Distribution
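Any normal variable can be reduced to the standard normal by the transformation z = (y - μ)/σ. A minimal sketch using Python's standard library, with assumed values of μ, σ and y chosen for illustration only:

```python
from statistics import NormalDist

# Assumed parameters and observation, for illustration only.
mu, sigma = 17.0, 0.3
y = 17.45

# Standardize: z = (y - mu) / sigma follows N(0, 1) when y follows N(mu, sigma^2).
z = (y - mu) / sigma

p_original = NormalDist(mu, sigma).cdf(y)  # P(Y <= y) under N(mu, sigma^2)
p_standard = NormalDist().cdf(z)           # P(Z <= z) under N(0, 1)

print(round(z, 2), round(p_original, 4), round(p_standard, 4))
```

The two probabilities agree, which is why a single table of the standard normal distribution is enough.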
Simple Comparative Experiment
• Two Sample Experiment:
Example: Study the formulation of a Portland cement
mortar with 2 formulations:
- Normal (unmodified)
- Polymer latex emulsion added
What is the meaning of the
results? Which is better?
It is hard to get a sense of the
data when looking only at a table
of numbers.
Simple Comparative Experiment
Graphical view of the data:
• Dot diagram/ dot plot:
What can you now understand about the results?
Dot diagrams work well to get a sense of the distribution.
These work especially well for very small sets of data.
Simple Comparative Experiment
• Box plot:
What can you now understand about the results?
A box plot gives a quick snapshot of the
distribution of the data.
Box plots are useful for small or larger data sets.
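The box plot is drawn from a five-number summary (minimum, first quartile, median, third quartile, maximum). The sketch below computes it for hypothetical readings, not the chapter's data. Quartile conventions differ between software packages, so Minitab may report slightly different quartiles.

```python
import statistics

# Hypothetical strength readings, not the chapter's data.
data = [16.4, 16.5, 16.6, 16.7, 16.8, 16.9, 17.0, 17.1, 17.2]

# Quartiles using the "inclusive" convention of the standard library;
# other conventions exist and can give slightly different values.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")

five_number_summary = (min(data), q1, median, q3, max(data))
iqr = q3 - q1  # interquartile range: the height of the box

print(five_number_summary, round(iqr, 2))
```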
The Hypothesis Testing
Framework
• Statistical hypothesis testing is a
useful framework for many experimental
situations
• Origins of the methodology date from the
early 1900s
• We will use a procedure known as the
two-sample t-test
The Hypothesis Testing
• Sampling from a normal distribution
• Statistical hypotheses
- Null hypothesis: $H_0: \mu_1 = \mu_2$
- Alternative hypothesis: $H_1: \mu_1 \ne \mu_2$
The Hypothesis Testing
• Statistical hypotheses:
- Null hypothesis: $H_0: \mu_1 = \mu_2$
- Alternative hypothesis: $H_1: \mu_1 \ne \mu_2$
• Errors may be committed when testing
hypotheses:
- Type I: $\alpha = P(\text{type I error}) = P(\text{reject } H_0 \mid H_0 \text{ is true})$
- Type II: $\beta = P(\text{type II error}) = P(\text{fail to reject } H_0 \mid H_0 \text{ is false})$
- Power of a test:
$$\text{Power} = 1 - \beta = P(\text{reject } H_0 \mid H_0 \text{ is false})$$
Estimation of Parameters
$$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i \quad \text{estimates the population mean } \mu$$
$$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2 \quad \text{estimates the variance } \sigma^2$$
Summary Statistics
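                     Formulation 1          Formulation 2
                     “New recipe”           “Original recipe”
Sample mean          $\bar{y}_1 = 16.76$    $\bar{y}_2 = 17.04$
Sample variance      $S_1^2 = 0.100$        $S_2^2 = 0.061$
Sample std. dev.     $S_1 = 0.316$          $S_2 = 0.248$
Sample size          $n_1 = 10$             $n_2 = 10$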
How the Two-Sample t-Test
Works:
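Use the sample means to draw inferences about the population means
Difference in sample means:
$$\bar{y}_1 - \bar{y}_2 = 16.76 - 17.04 = -0.28$$
Standard deviation of the difference in sample means, using
$$\sigma_{\bar{y}}^2 = \frac{\sigma^2}{n}$$
This suggests a statistic:
$$Z_0 = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$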
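The denominator of this statistic is the standard deviation of the difference in sample means. A short derivation, assuming the two samples are independent so that the variances add:

```latex
V(\bar{y}_1 - \bar{y}_2) = V(\bar{y}_1) + V(\bar{y}_2)
                         = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}
\quad\Longrightarrow\quad
\sigma_{\bar{y}_1 - \bar{y}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}
```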
How the Two-Sample t-Test
Works:
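Use $S_1^2$ and $S_2^2$ to estimate $\sigma_1^2$ and $\sigma_2^2$
The previous ratio becomes
$$\frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$
However, we have the case where $\sigma_1^2 = \sigma_2^2 = \sigma^2$
Pool the individual sample variances:
$$S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}$$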
How the Two-Sample t-Test Works:
The test statistic is
$$t_0 = \frac{\bar{y}_1 - \bar{y}_2}{S_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
• Values of t0 that are near zero are consistent with the
null hypothesis
• Values of t0 that are very different from zero are
consistent with the alternative hypothesis
• t0 is a “distance” measure: how far apart the averages
are, expressed in standard deviation units
• Notice the interpretation of t0 as a signal-to-noise
ratio
The Two-Sample (Pooled) t-Test
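$$S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2} = \frac{9(0.100) + 9(0.061)}{10 + 10 - 2} = 0.081$$
$$S_p = 0.284$$
$$t_0 = \frac{\bar{y}_1 - \bar{y}_2}{S_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{16.76 - 17.04}{0.284 \sqrt{\dfrac{1}{10} + \dfrac{1}{10}}} = -2.20$$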
The two sample means are a little over two standard deviations apart
Is this a "large" difference?
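The calculation above can be reproduced from the summary statistics alone. This is a minimal sketch; the function name `pooled_t` is illustrative and not something Minitab or the chapter defines.

```python
import math

def pooled_t(y_bar1, s2_1, n1, y_bar2, s2_2, n2):
    """Return (pooled variance, pooled std. dev., t0) from summary statistics."""
    sp2 = ((n1 - 1) * s2_1 + (n2 - 1) * s2_2) / (n1 + n2 - 2)
    sp = math.sqrt(sp2)
    t0 = (y_bar1 - y_bar2) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return sp2, sp, t0

# Summary statistics of the two mortar formulations from this chapter.
sp2, sp, t0 = pooled_t(16.76, 0.100, 10, 17.04, 0.061, 10)
print(round(sp2, 4), round(sp, 3), round(t0, 2))
```

Carrying full precision gives a pooled variance of 0.0805 and t0 of about -2.21. The value -2.20 shown above results from rounding Sp to 0.284 before dividing.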
The Two-Sample (Pooled) t-Test
• So far, we haven’t really done any “statistics”
• We need an objective basis for deciding how
large the test statistic t0 = -2.20 really is
• In 1908, W. S. Gosset derived the reference
distribution for t0, called the t distribution
In general, if $|t_0| > t_{\alpha/2,\, n_1 + n_2 - 2}$, we would reject $H_0$.
$\alpha$ is the probability of a type I error, sometimes called the level of significance.
The Two-Sample (Pooled) t-Test
• A value of t0 between -2.101 and 2.101 is
consistent with equality of means
(2.101 is the critical value $t_{0.025,\,18}$ for $\alpha = 0.05$)
• It is possible for the means to be equal
and t0 to exceed either 2.101 or -2.101, but it would
be a “rare event”; observing it leads to the
conclusion that the means are different
• Here t0 = -2.20 falls outside this interval
• Could also use the P-value approach
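How rare is the "rare event"? The simulation below is a rough Monte Carlo sketch, not part of the original analysis. It draws two samples of 10 from normal populations and counts how often |t0| exceeds 2.101. The effect size of one standard deviation, the number of trials and the seed are all assumed values.

```python
import math
import random

T_CRIT = 2.101  # critical value t_{0.025, 18} quoted in this chapter

def mean_var(xs):
    """Sample mean and sample variance (n - 1 divisor)."""
    n = len(xs)
    m = sum(xs) / n
    return m, sum((x - m) ** 2 for x in xs) / (n - 1)

def pooled_t(a, b):
    """Pooled two-sample t statistic for two lists of observations."""
    (m1, v1), (m2, v2) = mean_var(a), mean_var(b)
    n1, n2 = len(a), len(b)
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

def rejection_rate(delta, trials=4000, n=10, sigma=1.0, seed=1):
    """Fraction of simulated experiments in which H0 is rejected."""
    rng = random.Random(seed)
    rejected = 0
    for _ in range(trials):
        a = [rng.gauss(0.0, sigma) for _ in range(n)]
        b = [rng.gauss(delta, sigma) for _ in range(n)]
        if abs(pooled_t(a, b)) > T_CRIT:
            rejected += 1
    return rejected / trials

alpha_hat = rejection_rate(delta=0.0)  # means equal: estimates the type I error rate
power_hat = rejection_rate(delta=1.0)  # means differ by one sigma: estimates the power
print(alpha_hat, power_hat)
```

With equal means the rejection rate should come out near 0.05, the chosen level of significance. With a true difference it estimates the power of the test for that assumed difference.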
The Two-Sample (Pooled) t-Test
• The P-value is the risk of wrongly rejecting the null hypothesis of
equal means (it measures the rareness of the event).
• P-value = smallest level of significance ($\alpha$) that would lead to
rejection of the null hypothesis.
• The P-value in our problem (t0 = -2.20) is P = 0.042, so $H_0$
would be rejected at any level of significance $\alpha \ge 0.042$.
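A two-sided P-value is the area in both tails of the t distribution beyond |t0|. The sketch below obtains it by numerically integrating the t density with Simpson's rule, using only the standard library. In practice a statistics package would be used, and the helper names here are illustrative.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t0, df, steps=2000):
    """P(|T| > |t0|), from Simpson's rule applied to the density on [0, |t0|]."""
    b = abs(t0)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3  # area between 0 and |t0|
    return 1.0 - 2.0 * central_half

p_value = two_sided_p(-2.20, df=18)
print(round(p_value, 3))
```

With the rounded t0 = -2.20 this gives a value of roughly 0.04, in line with the P = 0.042 reported above. The small difference presumably comes from the software working with unrounded data.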
Minitab Two-Sample t-Test Results
Checking Assumptions –
The Normal Probability Plot
Assumptions of the t-test:
- independence: both samples are random
samples drawn from independent populations
- normality: both populations are normally distributed
- equal standard deviations (equivalently, equal
variances)
Importance of the t-Test
• Provides an objective framework for
simple comparative experiments
• Could be used to test all relevant
hypotheses in a two-level factorial
design, because all of these hypotheses
involve the mean response at one “side”
of the cube versus the mean response at
the opposite “side” of the cube
Confidence Intervals
• Hypothesis testing gives an objective
statement concerning the difference in
means, but it doesn’t specify “how different”
they are
• General form of a confidence interval:
$$L \le \theta \le U \quad \text{where} \quad P(L \le \theta \le U) = 1 - \alpha$$
• The 100(1 - α)% confidence interval on the
difference in two means:
$$\bar{y}_1 - \bar{y}_2 - t_{\alpha/2,\, n_1 + n_2 - 2}\, S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \;\le\; \mu_1 - \mu_2 \;\le\; \bar{y}_1 - \bar{y}_2 + t_{\alpha/2,\, n_1 + n_2 - 2}\, S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
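Applied to the mortar example, the interval can be computed from the summary statistics and the critical value 2.101 (α = 0.05, 18 degrees of freedom) quoted earlier in this chapter. A minimal sketch:

```python
import math

# Summary statistics of the two formulations from this chapter.
y_bar1, s2_1, n1 = 16.76, 0.100, 10
y_bar2, s2_2, n2 = 17.04, 0.061, 10

# Critical value t_{0.025, 18} used in this chapter for alpha = 0.05.
t_crit = 2.101

# Pooled standard deviation and the half-width of the interval.
sp = math.sqrt(((n1 - 1) * s2_1 + (n2 - 1) * s2_2) / (n1 + n2 - 2))
half_width = t_crit * sp * math.sqrt(1 / n1 + 1 / n2)

diff = y_bar1 - y_bar2
lower, upper = diff - half_width, diff + half_width
print(round(lower, 2), round(upper, 2))
```

The resulting 95% interval runs from about -0.55 to about -0.01. It does not contain zero, which agrees with rejecting the null hypothesis at α = 0.05.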
Choice of Sample Size
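• The half-length of the confidence interval is
$$t_{\alpha/2,\, n_1 + n_2 - 2}\, S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = t_{\alpha/2,\, 2n - 2}\, S_p \sqrt{\frac{2}{n}}, \quad \text{if } n_1 = n_2 = n$$
• Increasing n shortens the interval, so n can be chosen
to reach a desired precision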
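The half-length formula can be turned into a search for the smallest equal sample size that reaches a desired precision. The sketch below makes several assumptions: Sp = 0.284 from the mortar example serves as a planning value, the target half-length of 0.15 is hypothetical, and the t critical value is found by numerical integration and bisection because only the standard library is used.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail(t, df, steps=200):
    """P(T > t) for t >= 0, via Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_critical(alpha, df):
    """Value t with P(T > t) = alpha / 2, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if upper_tail(mid, df) > alpha / 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def half_length(n, sp, alpha=0.05):
    """Half-length of the CI on mu1 - mu2 when n1 = n2 = n."""
    return t_critical(alpha, 2 * n - 2) * sp * math.sqrt(2 / n)

SP_PLANNING = 0.284  # pooled std. dev. from the example, used as a planning value
TARGET = 0.15        # hypothetical target half-length

n_needed = next(n for n in range(2, 1000) if half_length(n, SP_PLANNING) <= TARGET)
print(n_needed, round(half_length(n_needed, SP_PLANNING), 3))
```

Because Sp is itself an estimate, the resulting n is only a planning figure. A larger pilot estimate of the standard deviation would call for more observations.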
What If There Are More Than
Two Factor Levels?
• The t-test does not directly apply
• There are lots of practical situations where there are
either more than two levels of interest, or there are
several factors of simultaneous interest
• The analysis of variance (ANOVA) is the appropriate
analysis “engine” for these types of experiments –
Chapter 3
• The ANOVA was developed by Fisher in the early
1920s, and initially applied to agricultural experiments
• Used extensively today for industrial experiments
Chapter 2 Practice
• Exercise 1: Use Minitab to analyze the experimental
results for the two mortar formulations in the Chapter 2 example.
• Exercise 2: Using the same type of paper, fold 2
different types of paper airplanes. Launch each
type 10 times and record the distance of each flight.
Use Minitab to analyze the results. Draw
conclusions.