Statistics for Chemical Engineers
Table of Contents:
1. Chapter 1: Introduction to Statistics in Chemical Engineering
– Role of statistics in chemical engineering; types of data; levels of
measurement; importance of data accuracy in engineering
contexts; overview of course outcomes.
2. Chapter 2: Data Visualization and Descriptive Statistics –
Graphical representation of data (histograms, box plots, scatter
plots); frequency distributions; measures of central tendency and
variability; summary statistics for chemical process data; case
study on industrial data visualization.
3. Chapter 3: Probability Fundamentals and Distributions –
Basic probability concepts; random variables; common
distributions (normal, binomial, etc.) in chemical engineering; the
normal distribution and standard scores; applications to process
data and quality control.
4. Chapter 4: Measurement Accuracy and Data Quality –
Accuracy vs. precision; reliability of measurements; sources of
error in experiments; calibration of instruments; linking
measurement with statistics to analyze measurement error;
ensuring data quality in lab and industry.
5. Chapter 5: Sampling and Confidence Intervals – Populations
and samples; random sampling techniques; Central Limit Theorem
and sampling distributions; point and interval estimation;
constructing and interpreting confidence intervals for means and
proportions.
6. Chapter 6: Hypothesis Testing and Statistical Inference –
Formulating null and alternative hypotheses; significance levels
and p-values; type I and II errors; power of a test; one-sample z
and t-tests; making conclusions from data with statistical rigor.
7. Chapter 7: Comparing Two or More Groups – Paired and two-
sample t-tests; experimental design principles for comparative
studies; one-way ANOVA for multi-level factor comparison;
assumptions (normality, equal variance); industrial example of
comparing process means.
8. Chapter 8: Correlation and Regression Analysis – Scatter plots
and correlation coefficients; linear regression modeling; least
squares method; interpreting regression output (slope, R², p-
values); multiple linear regression basics; case studies predicting
chemical process outcomes.
9. Chapter 9: Design of Experiments (DOE) – Fundamentals –
Planning experiments (factorial design concepts); factors, levels,
and responses; full factorial designs at two levels; randomized run
order and replication; analysis of variance (ANOVA) in DOE;
main effects and interaction effects; example of a 2×2 factorial
experiment in a pilot plant.
10. Chapter 10: Advanced Experimental Design – RSM and
Optimization – Response Surface Methodology (RSM) for
process optimization; Central Composite and Box-Behnken
designs; optimization of multiple factors; use of Design-Expert
software for RSM; 3D surface plots and contour plots for response
visualization; industrial case optimizing a chemical reaction yield.
11. Chapter 11: Big Data Analytics in Chemical Engineering
– Introduction to big data (volume, variety, velocity); data sources
in chemical plants (sensors, IoT, labs); data mining and machine
learning applications; process data analysis at scale; case study of
big data improving efficiency; global trends and Nigerian industry
perspectives on big data adoption.
12. Chapter 12: Cloud Computing and Industry 4.0
Applications – Cloud computing basics and benefits for
engineering; IIoT (Industrial Internet of Things) and cloud
integration; collaborative cloud-based tools in chemical
engineering (remote monitoring, virtual labs); cybersecurity and
data management considerations; example of cloud-based
monitoring in a Nigerian oilfield.
13. Chapter 13: Communicating Statistical Results –
Effective communication of data and conclusions; visual
presentation best practices; technical report writing in engineering;
interpreting and reporting statistical findings clearly; global
standards and Nigerian context in technical communication;
chapter summary with tips for clear writing.
Chapter 1: Introduction to Statistics in Chemical Engineering
Learning Objectives: By the end of this chapter, students should be
able to: (1) explain why statistical methods are essential in chemical
engineering practice; (2) identify different types of data and levels of
measurement; (3) understand the concepts of accuracy and reliability of
measurements in engineering experiments; and (4) outline the scope of
statistical techniques covered in this course.
Statistics plays a critical role in chemical engineering by enabling
engineers to analyze data, draw evidence-based conclusions, and make
informed decisions in the face of variability. Chemical processes often
involve complex systems with inherent fluctuations (e.g. temperature,
pressure, feed composition), and statistical analysis helps in
understanding these variations to ensure optimal and safe operation.
Why study statistics as a chemical engineer? In practice, engineers
use statistics to design experiments for product development, monitor
process quality, develop empirical models of processes, and assess the
uncertainty in measurements and results. For example, when scaling up
a reactor from laboratory to plant, a chemical engineer must use
statistical data analysis to ensure that the scale-up does not introduce
unanticipated variability in yield or safety parameters.
Types of Data and Measurement Levels: Statistical analysis begins
with understanding the nature of data. Data can be broadly classified as
qualitative (categorical) or quantitative (numerical). In a chemical
engineering context, qualitative data might include categories like
"pass/fail" for a quality inspection or catalyst type A vs. B, whereas
quantitative data include measurements such as temperature (°C),
pressure (bar), concentration (mol/L), or pH. Quantitative data further
subdivide into discrete (countable values like number of defective
bottles) and continuous (measurable on a continuum, like reaction time
or density). It is also important to recognize the level of measurement
of data: nominal (categories without order, e.g. reactor ID), ordinal
(ranked categories, e.g. catalyst activity rated as high/medium/low),
interval (numeric scales without a true zero, e.g. temperature in Celsius
where 0 is arbitrary), and ratio (numeric scales with a meaningful zero,
e.g. absolute pressure or volume). Understanding data types and
measurement levels guides the choice of statistical methods and
graphical representations for analysis.
Role of Statistics in the Engineering Method: Chemical engineering
problems are typically approached via the engineering method: define
the problem, collect data, analyze and interpret data, and make decisions
or design solutions. Statistics is indispensable in the data collection and
analysis stages. For instance, when developing a new polymer, an
engineer will design experiments (using statistical DOE principles) to
test the effect of factors like temperature, catalyst, and time on polymer
properties. The collected data must be analyzed statistically to determine
which factors are significant and what the optimal conditions are, all
while accounting for experimental error. Statistics provides tools for
distinguishing real effects from random noise.
Accuracy and Reliability of Measurements: A foundational concept
for chemical engineers is appreciating how accurate and reliable their
measurements are. Measurements form the basis of all data analysis – if
measurements are flawed, conclusions will be too. Accuracy refers to
how close a measurement is to the true value, while precision refers to
the consistency of repeated measurements. In a chemical engineering lab
or plant, many sources of error can affect measurements: instrument
calibration errors, environmental fluctuations, or operator errors. For
example, consider measuring the flow rate of a liquid through a pipe – if
the flow meter is not calibrated correctly, it might consistently read 5%
high (a systematic error affecting accuracy). On the other hand, if the
flow fluctuates and the meter responds slowly, repeated readings might
scatter (affecting precision). In later chapters, we will learn statistical
tools to quantify measurement variation and assess data reliability
(Chapter 4 will delve deeper into measurement accuracy analysis).
Course Scope and Outcomes: This textbook aligns with the TCH 206
course outcomes for Statistics for Chemical Engineers. Broadly, the
course will cover statistical techniques used in data analysis, linking
measurements with statistical analysis to evaluate accuracy, as well as
classical inference methods and modern topics. Students will learn how
to construct effective data visualizations and summaries (Chapter 2),
perform statistical inference such as confidence intervals and hypothesis
tests (Chapters 5–6), apply regression and correlation to model
relationships (Chapter 8), and design experiments for process
optimization (Chapters 9–10). In addition, contemporary topics like big
data analytics and cloud computing in the chemical industry will be
introduced (Chapters 11–12) to highlight the growing importance of data
science skills for engineers in the era of Industry 4.0. Students will also
gain experience with statistical software (MINITAB for general data
analysis and Design-Expert for experimental design) to perform data
analysis tasks efficiently. Ultimately, the goal is to enable future
chemical engineers to confidently analyze data and draw valid
conclusions – whether it’s understanding if a new process increases
yield significantly, or communicating the results of a pilot plant trial to
management with solid statistical backing.
Chapter Summary: In summary, statistics provides chemical engineers
with a powerful toolkit to deal with variability in processes and
experiments. Key takeaways from this introductory chapter are the
importance of statistical thinking in engineering, familiarity with data
types (qualitative vs. quantitative) and measurement levels, and an
appreciation for the accuracy and precision of data. These concepts lay
the groundwork for all subsequent topics. In the next chapter, we will
begin our deep dive by looking at how to describe and visualize data –
the first step in any statistical analysis.
End-of-Chapter Exercises:
1. Conceptual question: List three examples of problems in
chemical engineering where statistical analysis would be helpful.
For each, briefly describe why variability is a concern (e.g.
monitoring a production process for quality control, comparing
performance of two catalyst formulations, etc.).
2. Data types: Identify the type of data for each of the following and
its level of measurement: (a) pH of a solution, (b) Batch ID
number, (c) “Pass/Fail” result of a pressure test, (d) catalyst
concentration in % by weight, (e) ranking of chemical reactor
safety as high/medium/low.
3. Accuracy vs. precision: Suppose a temperature sensor in a reactor
is known to have a calibration offset of +2°C (reads 2°C higher
than actual). You take five readings in a steady-state system and
get 100.5, 100.7, 100.6, 100.8, 100.5 °C. Describe the accuracy
and precision of this sensor based on these readings. How would
you correct for the offset?
4. Case study reflection: A Nigerian chemical plant manager says,
“We’ve operated this process for 20 years; I can control it by
experience without statistics.” How would you respond in defense
of using statistical methods? Mention at least two statistical tools
that could improve process understanding or efficiency even for an
experienced engineer.
5. Software exploration (ungraded): If you have access to a
statistics software like MINITAB, load any sample dataset (or
create a small dataset of 10–20 points, e.g. measurements of a
quantity). Use the software to generate basic descriptive statistics
(mean, standard deviation) and a simple plot. This will give you
familiarity with the interface, which will be built upon in later
chapters.
Chapter 2: Data Visualization and Descriptive Statistics
Learning Objectives: By the end of this chapter, students will be able
to: (1) organize raw data into meaningful frequency distributions; (2)
construct appropriate graphical displays of data (such as histograms, bar
charts, box plots, and scatter plots) and understand their role in data
analysis; (3) calculate and interpret descriptive statistics (mean, median,
mode, range, variance, standard deviation) for a dataset; and (4)
summarize and interpret basic industry data to extract key insights.
Once data are collected in a chemical engineering context, the first step
is often to summarize and visualize them. Data visualization is critical
because it provides insight at a glance, revealing patterns, trends, or
anomalies that may not be obvious from raw data. As one learning
outcome of this course emphasizes, chemical engineers must be able to
construct graphical displays and recognize their importance in analysis.
This chapter covers common methods of organizing data and the core
plots and statistics used to describe data sets.
2.1 Organizing Data: Frequency Distributions and Histograms
A frequency distribution is a table that displays how data are
distributed across various intervals or categories. For example, imagine
an engineer measures the viscosity of 50 polymer samples in a quality
lab. Listing all 50 values provides little insight, but grouping them (e.g.
count how many samples fall into viscosity ranges 50–60 Pa·s, 60–70
Pa·s, etc.) creates a clearer picture of the data’s spread. From frequency
distributions, we often create histograms: bar charts showing frequency
(or relative frequency) on the y-axis against the data value intervals on
the x-axis.
Example: Suppose a water treatment plant in Nigeria measures the
turbidity (cloudiness) of water (in NTU units) in 100 batches. The data
range from 0 to 5 NTU. We could divide 0–5 NTU into 10 equal bins of
width 0.5 NTU and count how many batches fall in each bin. Plotting
these counts yields a histogram that might show, for instance, that most
batches have turbidity between 1.0–2.0 NTU, with a few batches having
higher turbidity (potential outliers). Such visualization immediately flags
whether the majority of water batches meet a turbidity target (say ≤ 1.5
NTU) or if there is a long tail of higher turbidity needing investigation.
Interpreting Histograms: A histogram gives a sense of the data’s
distribution shape – whether it is symmetric, skewed to the left or right,
unimodal or multimodal. Chemical process data often approximate a
normal distribution (bell curve) if only common random variation is
present. However, processes can also produce skewed distributions (e.g.
if a measurement has a natural lower bound at 0 but can vary high, the
data might be right-skewed). Outliers, if present, will appear as isolated
bars far from the bulk of the data. Recognizing these patterns is
important. For instance, a right-skewed distribution of impurity
concentration might suggest occasional high spikes due to disturbances,
whereas a bimodal distribution of product purity might indicate two
distinct operating modes or shifts between catalysts.
Constructing Histograms (Manual vs. Software): Manually, one
chooses bin ranges and counts frequencies. In practice, statistical
software like MINITAB greatly simplifies this – you can input data and
use menu options to create histograms instantly. The choice of bin width
can affect the appearance (too few bins may hide detail; too many may
show noise). MINITAB’s default algorithms often choose a reasonable
bin size, but an engineer should adjust it if needed to better reveal the
underlying pattern.
Software Tutorial – Creating a Histogram in MINITAB: To
reinforce the concept, let's walk through using MINITAB to create a
histogram. Suppose we have the 50 polymer viscosity measurements
mentioned earlier. In MINITAB:
Enter or import the viscosity data into one column (e.g., C1) in the
worksheet.
Go to the Graph menu and choose Histogram > Simple.
Select the viscosity data column for the graph and click OK.
MINITAB will output a histogram of the viscosity data. The
software will automatically provide a frequency count for each bin
and display the distribution shape. Always label the axes clearly
(e.g. "Viscosity (Pa·s)" on x-axis, "Frequency of Samples" on y-
axis) – clear labeling is essential for good statistical
communication.
Figure 2.1: MINITAB Graphical Summary output for an industrial
dataset (cap removal torque for shampoo bottles). The histogram
(center) shows the distribution of torque measurements, and summary
statistics are displayed on the side. In this case, 68 bottle caps were
tested; the mean removal torque is about 21.27 Nm, which is higher
than the target of 18 Nm (indicated by the reference line). The wide
spread and the 95% confidence interval (19.71 to 22.82 Nm) suggest
variability and a systematic bias above the target.
The embedded Figure 2.1 illustrates how a graphical summary in
MINITAB can combine both visualization and statistics. This example
(based on a case of shampoo bottle cap torque) demonstrates an
engineer’s initial analysis of data: the histogram reveals many bottles
require torque well above the target 18 Nm to open, and the statistics
confirm the mean is significantly above 18 Nm. Such a display
immediately tells us there’s a potential quality issue – caps are on too
tight on average, which might make them hard for customers to open.
We will revisit the statistical implications (e.g. performing a one-sample
t-test against the target in Chapter 6), but even at the descriptive stage,
the visualization and summary inform engineering decisions (perhaps
the capping machine needs adjustment to lower the torque).
2.2 Other Common Plots: Box Plots, Scatter Plots, and Time-Series
Plots
Besides histograms, other plots are extremely useful in chemical
engineering data analysis:
Box Plot (Box-and-Whisker Plot): A compact display of
distribution based on quartiles. A box plot shows the median,
interquartile range (IQR), and potential outliers. It’s especially
useful for comparing distributions between multiple groups. For
example, an engineer might compare the distribution of polymer
viscosity for three different catalyst formulations using side-by-
side box plots. If one formulation has a much wider IQR or many
outliers, it indicates higher variability or inconsistent quality. Box
plots are great for quickly conveying differences in medians and
variability across categories.
Scatter Plot: A scatter plot displays pairs of data points (X,Y) and
is fundamental for investigating relationships between two
variables. In chemical engineering, scatter plots help identify
correlations (Chapter 8 will delve deeper into correlation and
regression). For instance, plotting reactor temperature (X) vs.
conversion percentage (Y) across several runs might show an
upward trend (suggesting higher temperature yields higher
conversion) or no clear pattern (indicating temperature might not
significantly affect conversion within the tested range). Scatter
plots often include a fitted line to indicate the trend (this enters
regression territory). We preview here that if a scatter plot shows a
roughly linear trend with points tightly clustered about a line, the
correlation is strong (positive or negative slope indicates
direction). If points are widely scattered without form, correlation
is weak or nonexistent.
Time-Series Plot (Run Chart): When data are collected over time
(e.g. hourly temperature readings from a distillation column),
plotting the value against time order reveals trends, cycles, or
drifts. A time-series plot could uncover, for example, a periodic
oscillation in reactor pressure (perhaps correlating with a daily
ambient temperature cycle) or a drift upward in impurity
concentration over a campaign (maybe due to catalyst aging). Such
plots are often the first step in process monitoring and control (and
relate to the field of statistical process control, not covered in depth
here but important in quality assurance).
Case Study (Data Visualization in Action): A global chemical company,
say Dow Chemical, might monitor a polymerization reactor. Key
variables like temperature, pressure, and monomer conversion are
recorded every minute. Engineers use time-series plots to ensure the
reactor operates steadily. In one instance, a time plot of pressure showed
an upward drift over several hours. By visualizing this, engineers
identified a fouling issue in a vent line before it reached a critical
pressure. They intervened early, avoiding a shutdown. Additionally,
scatter plots of historical data might be used – plotting polymer
molecular weight vs. reaction time revealed two clusters corresponding
to two catalyst batches, indicating a catalyst quality issue. This mix of
histograms, scatter plots, and time plots allowed engineers to diagnose
and communicate issues effectively to the team.
2.3 Descriptive Statistics: Measures of Central Tendency and
Variability
While plots give a visual summary, descriptive statistics provide
numeric summaries of the data. The most common measures are:
Mean (Arithmetic Average): Sum of all data values divided by
the count. The mean provides a measure of central tendency – e.g.
the average yield of a chemical process over 10 runs might be
85%. It is sensitive to outliers; one extremely low or high value
can skew the mean.
Median: The middle value when data are sorted (or average of two
middle values for even count). The median is a robust measure of
central tendency, less affected by outliers. For skewed
distributions, the median can be more representative than the
mean. For example, if measuring the time to failure of pump seals
where most last around 12 months but one failed at 1 month, the
mean might be much lower than the median. The median conveys
the “typical” value even in skewed scenarios.
Mode: The most frequently occurring value or range. In a
continuous data set it may not be very meaningful unless the
distribution has a clear peak or repeated readings. However, for
categorical data (e.g. most common defect type in a batch of
products), the mode is useful.
Range: The difference between the maximum and minimum. It
gives the total spread. However, range uses only extreme values
and thus can be misleading if those extremes are outliers.
Variance and Standard Deviation: Variance is the average of
squared deviations from the mean. Standard deviation (SD) is the
square root of variance, bringing it back to the original unit. The
SD is perhaps the most important measure of variability – it tells
us, on average, how far data points lie from the mean. A small SD
relative to the mean implies data are tightly clustered (high
consistency), whereas a large SD indicates widespread data (high
variability). For example, two reactors might have the same
average conversion of 80%, but if Reactor A has SD of 2% and
Reactor B has SD of 8%, Reactor A’s performance is much more
consistent (predictable) than Reactor B’s.
Calculation Example: Ten samples of a specialty chemical are titrated to
determine purity (%). The results are: 91, 89, 90, 95, 88, 90, 91, 87, 92,
89 (% purity). We can compute descriptive stats:
Mean = (91+89+...+89) / 10 = 90.2% purity.
Sorted values: 87, 88, 89, 89, 90, 90, 91, 91, 92, 95; Median = (90
+ 90)/2 = 90%.
Mode = 89 and 90 (both appear twice – a dataset can be bimodal).
Range = 95 – 87 = 8% purity.
Variance (sample variance) = sum of squared deviations from
mean / (n–1). Without showing full calculation here, suppose we
compute variance ≈ 6.84 (percent²), then SD ≈ √6.84 ≈ 2.62%.
Interpretation: the purity tends to vary about ±2.62% around the
mean of 90.2%. So a typical batch is between ~87.6% and 92.8%
purity (within one SD of mean) assuming roughly normal
variability.
Interpreting Variability in Context: In chemical engineering,
understanding variability is crucial. Consider if the purity spec minimum
is 90%. In the above example, the mean is slightly above 90%, but the
SD of ~2.6% indicates a substantial chance of falling below spec on any
given batch (in fact, 3 of 10 samples were below 90%). Merely reporting
the mean would be misleading; the variability tells the full story that the
process may not be reliably hitting the purity target. Later chapters on
hypothesis testing will formalize how we can assess if the process mean
truly differs from 90% and what proportion might fall out of spec, but
the descriptive stats already alert the engineer to an issue. Perhaps
process adjustments or tighter control are needed to reduce variability.
Skewness and Kurtosis (Briefly): Some descriptive measures go
beyond spread and center to characterize distribution shape. Skewness
indicates asymmetry (right-skew vs left-skew); kurtosis indicates how
heavy-tailed or peaked a distribution is relative to normal. These are
more advanced descriptors and are often included in software outputs
(MINITAB’s graphical summary, for example, lists them). Typically,
for process monitoring, an engineer might note if skewness is
significantly non-zero, as it might violate assumptions of normality for
certain statistical models. In our discussions, we will primarily focus on
practical implications (e.g. presence of skewness might suggest a data
transformation is needed for regression).
2.4 Using MINITAB for Descriptive Statistics
MINITAB can quickly compute all the above statistics and more. Using
the purity data example:
Enter the 10 purity values in a column.
Use Stat > Basic Statistics > Display Descriptive Statistics. Select
the column and MINITAB will output count, mean, median,
minimum, maximum, SD, etc.
Alternatively, use Assistant > Graphical Analysis > Graphical
Summary to get a combination of stats and plots (as shown in
Figure 2.1 for the torque case).
Interpreting MINITAB output is straightforward because it labels each
statistic. One valuable piece in the output is the 95% confidence interval
for the mean (which foreshadows Chapter 5 content). In the torque
example in Figure 2.1, the 95% CI for mean torque was roughly 19.71 to
22.82 Nm. This interval provides a range in which we are 95% confident
the true mean lies, giving a sense of estimation precision. We will
formally cover confidence intervals soon, but it’s worth noting how
descriptive and inferential stats blur together in output – a graphical
summary gives a sneak peek of inferential thinking (estimating the true
mean from sample data).
Case Study – Visualizing Nigerian Industry Data: Let’s consider a
Nigerian context. A paint manufacturing company in Lagos collects data
on the viscosity of paint batches (a critical quality parameter). They
compile a month’s data of daily measurements. Using descriptive tools:
a histogram might reveal the viscosity is mostly around target but
occasionally spikes high (potentially due to pigment aggregation on
some days). A time plot might show a pattern that viscosity tends to
increase towards the end of production runs (perhaps as solvent
evaporates). A box plot comparing batches from two different
production lines might show one line has consistently higher variability.
By presenting these plots and stats at a quality meeting, the engineers
communicate clearly where the process stands. Management can easily
see from a box plot which line is more consistent, or from a histogram
whether most products meet the spec. In sum, effective visualization and
descriptive statistics turn raw numbers into actionable understanding.
Chapter Summary: In this chapter, we learned how to condense and
visualize data for insight. Good graphs and summaries are the
foundation of data analysis – often allowing engineers to spot issues and
formulate hypotheses even before formal tests are applied. Key points
include the use of histograms for distribution shape, box plots for
comparing groups, scatter plots for relationships, and numeric
descriptors like mean and standard deviation to quantify the data’s
center and spread. In the next chapter, we will build on this by
introducing probability theory and theoretical distributions, which
underpin the interpretation of our descriptive measures (for example,
understanding why data often follow a bell-curve and how we quantify
“rare” outlier events statistically).
End-of-Chapter Exercises:
1. Interpreting a Histogram: You are given the following frequency
distribution of impurity levels (%) in 50 batches of a chemical
product:
Impurity 0–1%: 5 batches; 1–2%: 20 batches; 2–3%: 15 batches;
3–4%: 8 batches; 4–5%: 2 batches.
(a) Sketch the histogram for this data. (b) Describe the shape (is it
skewed?). (c) If the specification limit for impurity is 3%,
approximately what fraction of batches are out of spec?
2. Descriptive Calculations: A catalyst testing lab measures catalyst
surface area (m²/g) for 8 samples: 120, 135, 130, 128, 140, 150,
132, 128. Calculate the mean, median, mode, range, and standard
deviation (you may do this by hand or use a calculator/software).
Comment on any difference between mean and median and what
that implies about skewness.
3. Plot Selection: For each scenario, state which type of plot is most
appropriate and why: (a) Comparing the distribution of octane
ratings from two different refineries; (b) Checking if there is a
relationship between ambient temperature and hourly production
rate in an outdoor processing unit; (c) Determining if a
measurement instrument’s readings drift over a 24-hour continuous
operation.
4. Using Software: Input the catalyst surface area data from exercise
2 into MINITAB or another tool. Generate a box plot and
descriptive statistics. Does the software output match your manual
calculations? Include a printout or description of the output (mean,
SD, etc.).
5. Case Study Discussion: A report on a Nigerian brewery’s
operations includes a statement: “The average fill volume of beer
bottles is 600 ml with a standard deviation of 5 ml.” Explain in
simple terms to a brewery manager what this means. If the legal
requirement is at least 590 ml in each bottle, what concerns might
you have based on those statistics (assume a roughly normal
distribution of fill volumes)?
Chapter 3: Probability Fundamentals and Distributions
Learning Objectives: After completing this chapter, students should be
able to: (1) explain basic probability concepts (outcomes, events,
probability axioms) and how they relate to engineering experiments; (2)
distinguish between discrete and continuous random variables and give
examples of each in chemical processes; (3) describe important
probability distributions (especially the normal distribution) and their
properties; and (4) use the normal distribution to calculate probabilities
(e.g. z-scores) relevant to chemical engineering scenarios.
Statistics rests on the foundation of probability theory. To make
inferences about data, we need to understand how data behave in a
random sense. For instance, if we say “there’s only a 5% chance that the
difference in catalyst performance happened by random chance,” we are
invoking probability concepts. This chapter introduces the fundamentals
of probability and common probability distributions that model real-
world phenomena in chemical engineering. Mastering these concepts is
essential for later chapters on statistical inference (confidence intervals,
hypothesis tests) where we quantify uncertainty.
3.1 Basic Probability Concepts
Experiments and Sample Space: In probability theory, an experiment
is any process that yields an outcome that cannot be predicted with
certainty. In an engineering context, this could be something like
performing a lab titration (outcome = measured concentration) or
running a batch process (outcome = whether it succeeds or fails to meet
quality). The sample space (Ω) is the set of all possible outcomes. For a
simple example, if we flip a coin to decide something in the lab, Ω =
{Heads, Tails}. If we run a distillation and consider “product is on-spec
or off-spec” as outcomes, Ω = {On-spec, Off-spec}.
Events and Probability: An event is a subset of the sample space –
something that might happen or a condition of outcomes. Probability (P)
is a numerical measure of how likely an event is, on a scale from 0
(impossible) to 1 (certain). Key properties include:
P(Ω) = 1 (the probability that something in the sample space occurs
is 1, i.e., some outcome must happen),
If A and B are mutually exclusive events, P(A ∪ B) = P(A) + P(B)
(addition rule for disjoint events),
P(complement of A) = 1 – P(A).
In chemical engineering, probabilities might be based on long-run
frequencies or subjective assessments. For instance, if historically 2 out
of 100 batches are off-spec, one could estimate P(off-spec) ≈ 0.02 (a
relative frequency interpretation). Or an engineer might say “there’s a
90% chance the new design will pass safety tests” as a subjective
probability based on experience.
Conditional Probability and Independence: Often we are interested in
the probability of an event given some condition. For example,
P(product passes quality | new raw material supplier) – the probability
product is good given a specific supplier’s material was used.
Conditional probability is defined as P(A|B) = P(A ∩ B) / P(B),
provided P(B) > 0. Two events A and B are independent if P(A|B) =
P(A) (meaning B occurring has no effect on probability of A). In
engineering terms, independence could be something like assuming that
the probability of Pump A failing is independent of Pump B failing if
they operate separately (in reality, some events may not be strictly
independent due to common causes, but independence is a useful
modeling assumption).
Probability in Action (Chemical Engineering Example): Consider a
safety system with two independent pressure relief valves on a reactor.
Let Event A = "Valve 1 fails when needed", Event B = "Valve 2 fails
when needed". Suppose P(A) = 0.01, P(B) = 0.01 based on historical
reliability data. If valves act independently, the probability both fail (and
thus a dangerous overpressure occurs) is P(A ∩ B) = P(A) * P(B) =
0.0001 or 0.01%. Understanding such small probabilities is crucial for
risk assessment. If the events were not independent (maybe they fail due
to a common cause like a power loss), the calculation would differ.
3.2 Random Variables: Discrete and Continuous
A random variable (RV) is a numerical outcome of an experiment. We
denote random variables with uppercase letters (X, Y). For example, let
X = number of defective bottles in a batch of 100 – X is random because
each batch will have a different number of defectives.
Discrete Random Variables: These take on a countable number
of distinct values (often integers). The example X (number of
defects) is discrete, range 0 to 100. Probability is characterized by
a probability mass function (pmf) P(X = x) for each possible x. A
common discrete distribution in engineering is the Binomial
distribution: e.g., X ~ Bin(n=100, p=0.05) could model defects if
each bottle has a 5% chance of being defective independently.
Another is the Poisson distribution, often used for counts of
events in continuous time/space (e.g., number of pump failures in a
year). If pump failures occur randomly with an average rate λ = 3
per year, then X ~ Poisson(3) might model the count of failures per
year.
Continuous Random Variables: These can take any value in a
range or interval. Examples: temperature in a reactor, the
concentration of a chemical, time until a component fails. Since
continuous outcomes have an uncountable range, we talk about
probability density function (pdf) f(x) such that the probability X
lies in an interval (a to b) is the area under the pdf curve from a to
b. We don’t assign probabilities to exact values (P(X = x) = 0 for
continuous RVs). Instead we find P(a < X < b). Many continuous
variables in engineering are modeled by the Normal distribution
(which we discuss soon). Others include Uniform, Exponential
(e.g., time between rare events), etc.
3.3 The Normal Distribution and Standard Scores (z-values)
The Normal distribution – often called the Gaussian or “bell curve” –
is arguably the most important distribution in statistics. It is continuous,
symmetric, and characterized by two parameters: the mean (μ) and
variance (σ²) or standard deviation (σ). We denote it as X ~ N(μ, σ²).
The pdf of a normal is the familiar bell-shaped curve centered at μ.
Why is the normal distribution so prevalent? Thanks to the Central
Limit Theorem (CLT) (which we will formally meet in Chapter 5), the
sum or average of many independent small effects tends to be
approximately normally distributed, regardless of the distribution of
each effect. Many measurement errors and process variations are
aggregates of many small random influences (noise), so they end up
approximately normal. For example, the error in a flow meter reading
might come from sensor noise, electronic noise, slight pressure
fluctuations, etc. – sum of many tiny independent errors – yielding a
roughly normal error distribution. Similarly, properties like molecular
weight of a polymer might be roughly normal distributed in a stable
process (centered at some typical value with symmetric variability).
Key properties of the Normal: It’s symmetric about the mean μ.
Approximately 68% of values lie within ±1σ of μ, ~95% within ±2σ, and
~99.7% within ±3σ (this is the “68-95-99.7 rule”). This rule of thumb is
incredibly useful: if an engineer sees that ±3σ range on a control chart
covers all normal operation data, any point outside that range is
extremely unlikely under normal conditions (less than 0.3% chance) –
indicating something unusual has happened.
Standard Normal and z-scores: Any normal random variable X ~ N(μ,
σ²) can be transformed to a standard normal (mean 0, SD 1) by
computing a z-score:
z=X−μσ.z = \frac{X - \mu}{\sigma}.
The z value tells us how many standard deviations X is above (+z) or
below (−z) its mean. Standard normal tables (or software) give
probabilities for Z (the standard normal variable). For example, P(Z <
1.645) ≈ 0.95, meaning 95% of the area under the standard normal curve
lies below 1.645. In context, if a quality measurement is normally
distributed and one wants the 95th percentile threshold, it’s roughly μ +
1.645σ.
Using z for Probabilities: Suppose reactor temperature is normally
distributed with μ = 500 K, σ = 5 K (assuming stable control). What is
the probability that a randomly selected time the temperature is above
510 K? We convert 510 to z: (510–500)/5 = 2.0. P(X > 510) = P(Z >
2.0). From standard normal knowledge, P(Z > 2.0) ≈ 0.0228 (2.28%). So
about 2.3% of the time, temperature exceeds 510 K. This could be
acceptable or not depending on safety limits. Engineers often calculate
such probabilities to assess risk or the likelihood of extreme events.
Conversely, to find a threshold (say the temperature that is exceeded
only 1% of time), find z for 99th percentile (~2.33), then threshold = μ +
2.33σ = 500 + 2.33*5 ≈ 511.65 K.
Non-Normal Distributions: While normal is common, many
engineering variables are not normal. For instance, exponential
distribution might model the time between rare events (memoryless
property), Weibull distribution is often used for failure times in
reliability (with shapes that can model increasing or decreasing failure
rate), uniform distribution might describe a situation of complete
randomness between bounds, etc. The Binomial and Poisson we
discussed are discrete analogs for counts. We mention these because
when data deviate strongly from normal (e.g. highly skewed
distributions like reaction time that can’t go below 0 and have a long
tail), using the appropriate theoretical distribution leads to more accurate
probability calculations and more valid statistical inferences.
For example, in a chemical plant, the distribution of “time to the next
emergency shutdown” might be better modeled by an exponential or
Weibull rather than normal, because it’s a waiting time for a random
event. Or the distribution of impurity particle counts in a semiconductor
process might be Poisson (counts per wafer area). Recognizing the
context helps choose a model.
However, the normal distribution is so central that many statistical
methods assume normality of data or at least of sample means (justified
by CLT). Thus, engineers often apply transformations to data to induce
normality (like taking log of a right-skewed variable can make it more
symmetric and closer to normal).
Standard Scores in Practice: Z-scores are also used to standardize
measurements on different scales. For example, if one variable is
temperature (mean 500, SD 5) and another is pressure (mean 20 bar, SD
2 bar), we can compare a particular temperature and pressure reading in
terms of “how extreme” it is by z. A temperature of 510 K is z=2 (as
above), a pressure of 24 bar is (24-20)/2 = 2 as well. So both are +2σ
events in their own domains. Standardization is the basis of control
charts and multivariate analysis.
Example Problem: If the daily production volume of a fertilizer plant is
roughly normally distributed with mean 1000 tons and SD 50 tons
(based on historical data), what is the probability that tomorrow’s
production will exceed 1100 tons? Solution: z = (1100-1000)/50 = 2.0,
P(Z > 2.0) ≈ 0.0228 (2.3%). So there’s about a 1 in 44 chance of
exceeding 1100 tons. This could help in planning – e.g., ensuring storage
for such a high-production day is available or that extra feedstock might
be needed if such an event is not negligible.
3.4 From Probability to Statistics: Linking to Data Analysis
Understanding distributions allows engineers to make predictions and to
apply statistical tests properly. For instance, in Chapter 6, when we do
hypothesis tests, we will assume test statistics follow certain
distributions (t, F, χ² – which are related to the normal distribution).
Knowing the normal distribution’s properties justifies why we use z or t
tables.
Additionally, probability theory underlies simulation. If a chemical
engineer wants to simulate a process (e.g. Monte Carlo simulation of an
oil reservoir output under uncertain parameters), they will draw random
samples from assumed distributions of inputs (like permeability might
be normally distributed, porosity might follow beta distribution, etc.) to
propagate uncertainty. This chapter provides the groundwork for such
advanced applications.
Case Study (Probability in Quality Control): A Nigerian beverages
company monitors the fill volume of soft drink bottles. Historically, fills
are normally distributed. The company sets control limits at μ ± 3σ. If
the process is centered at 500 ml with σ = 5 ml, control limits are 485 ml
and 515 ml. Probability of a bottle being below 485 ml (underfilled) is
~0.15% if the process is in control (3σ event) – very low. However, if
they start seeing 1% of bottles underfilled, that's a red flag statistically
(since 1% >> 0.15% expected). Probability calculations help quantify
these intuitions. Another example: if regulations demand that at most 1
in 1000 bottles is underfilled below 490 ml, the company can calculate
needed mean and σ to achieve that (set P(X<490) = 0.001, find z ~ -
3.09, so (490 - μ)/σ = -3.09 => μ - 490 = 3.09σ, if target μ=500, then σ ~
(500-490)/3.09 ≈ 3.23 ml). This connects probability with real
engineering targets.
Chapter Summary: In this chapter, we covered the language of
probability and the behavior of random variables. Key points are the
distinction between discrete and continuous variables and common
distributions used in engineering. The normal distribution was
highlighted due to its fundamental importance; understanding z-scores
and normal probabilities is crucial going forward. Now that we have
probability tools, we are prepared to delve into statistical inference. The
next chapter will introduce the idea of sampling distributions and the
Central Limit Theorem in more detail, bridging the gap between
probability and the statistics of sample data.
End-of-Chapter Exercises:
1. Basic Probability: A chemical plant has 3 independent safety
systems that can prevent an overflow. Each has a 2% failure
probability on demand. (a) What is the probability all three fail
(and an overflow occurs)? (b) If these systems were not
independent (e.g., all share a power source that fails 1% of the time
which would disable all), qualitatively how would that affect the
probability of failure compared to part (a)?
2. Discrete Distribution: Defects in sheets of glass occur with an
average rate of 0.2 defects per square meter. Assume a Poisson
distribution. (a) What is the probability that a 5 m² glass sheet has
zero defects? (b) What is the expected number of defects on a 10
m² glass sheet?
3. Continuous Distribution: The time (in hours) between
breakdowns of a processing unit follows an exponential
distribution with mean 100 hours. (a) What is the probability the
unit runs at least 150 hours without breakdown? (b) If the unit has
run 100 hours without failure, what is the probability it lasts
another 50 hours (memoryless property)?
4. Normal Distribution Application: The purity of a product is
normally distributed with mean 98.0% and standard deviation
0.5%. (a) Approximately what percentage of batches have purity
above 99%? (b) The specification is 96.5% minimum. What
percentage of batches fall below spec? (c) If a batch is below spec
(<96.5%), how many standard deviations below the mean is it
(calculate the z-score)?
5. Z-Score Practice: A certain reaction yield in a pilot plant is
N(μ=75%, σ=5%). (a) What yield value corresponds to the 90th
percentile (i.e., exceeded by only 10% of runs)? (b) If we observe a
yield of 60% in one run, compute its z-score and interpret what
that implies (how unusual is this run?).
6. Critical Thinking: In a Nigerian refinery, the sulfur content in
fuel (ppm) is strictly regulated. The refinery claims their process
outputs fuel with sulfur ~ N(μ=45 ppm, σ=5 ppm). The legal limit
is 60 ppm. If that claim is true, what is the probability a random
batch exceeds the limit? If an inspector tests 4 batches, what's the
probability all 4 pass the limit? (Assume each batch’s sulfur is
independent and follows the distribution.)
Chapter 4: Measurement Accuracy and Data Quality
Learning Objectives: Students will learn to: (1) distinguish between
accuracy and precision in measurements and why both are critical in
chemical engineering; (2) evaluate the reliability of a set of
measurements using statistical tools (e.g. calculating variance due to
measurement error); (3) understand how to perform basic instrument
calibration and analyze the data (regression for calibration curves); and
(4) design simple experiments to estimate measurement error (like
repeatability and reproducibility tests, possibly a Gage R&R study
concept).
In chemical engineering, data originate from measurements – whether in
the lab (measuring concentration, temperature, particle size, etc.) or in
the plant (flow rates, pressures, sensor readings). Thus, the integrity of
any statistical analysis hinges on the quality of these measurements. This
chapter focuses on the statistical aspects of measurement accuracy and
reliability, linking back to the idea in Chapter 1 that engineers must
appreciate measurement accuracy. We will discuss how to quantify
measurement error, how to calibrate instruments, and how to improve
data quality through careful experimental design.
4.1 Accuracy vs. Precision and Sources of Measurement Error
Accuracy refers to how close a measurement is to the true or
accepted reference value. An accurate instrument has little
systematic error (bias).
Precision refers to the consistency or repeatability of
measurements – if you measure the same thing multiple times, how
much do the readings vary? A precise instrument has small random
error (scatter).
It’s possible to have one without the other: for example, a scale that is
not zeroed could consistently read 5 grams heavy (poor accuracy, good
precision if the scatter is low). Conversely, a dartboard analogy often
used: tight grouping (precise) but off-center (inaccurate) vs. widely
scattered around the center (low precision, potentially unbiased on
average).
In practice, accuracy is addressed by calibration (adjusting
measurements to correct bias), while precision is improved by better
instrument design or measuring technique (reducing noise).
Sources of Error:
Systematic errors (affect accuracy): calibration errors, consistent
instrument drift, environmental biases (e.g. temperature affecting
instrument). These cause measurements to be consistently off in
one direction.
Random errors (affect precision): noise in sensor readings, human
reading error (judgment), fluctuations in the quantity being
measured, etc. These cause scatter in both directions.
For example, measuring pH with a meter: If the probe is not calibrated,
all readings might be 0.2 units high (systematic). If the meter’s
electronic noise causes readings to fluctuate ±0.05, that’s random error.
4.2 Statistical Analysis of Measurement Data: Mean and Variance of
Repeated Measurements
When evaluating a measurement system, a simple approach is to take
repeated measurements of a stable quantity and analyze them
statistically:
Calculate the mean of the readings – this should approximate the
true value if no systematic bias. If the mean is far from a known
reference, that indicates inaccuracy.
Calculate the standard deviation of the readings – this indicates
precision (often called repeatability when same operator, same
instrument, same conditions).
Example: A thermocouple is used to measure boiling water (true
temperature ~100°C at 1 atm). It is immersed multiple times yielding:
99.1, 99.5, 99.0, 99.3, 99.4°C. The mean is 99.26°C, indicating a slight
bias (~0.74°C low) – quite good accuracy. The SD is say ~0.2°C,
showing good precision (the readings are tight). If instead we got
readings like 95, 105, 98, 102, 100°C, the mean ~100°C (unbiased) but
the readings swing widely (SD ~4.2°C), indicating poor precision.
Gauge Repeatability and Reproducibility (R&R): In quality
engineering, a common approach to assess measurement systems is a
Gauge R&R study. It involves measuring some parts multiple times by
different operators and instruments. The variability is then decomposed:
how much comes from repeatability (same operator, same device
variability) and how much from reproducibility (differences between
operators/devices). While a full Gage R&R is beyond our scope,
conceptually:
If two different operators measure the same sample and get
systematically different results, there's a reproducibility issue
(maybe different technique or device calibration).
If the same operator measuring the same sample repeatedly shows
variation, that's repeatability (precision) issue.
We can treat measurement variance as another component in the total
observed variance. If a process has true variance σ_process² but our
measurement adds variance σ_meas², the observed variance =
σ_process² + σ_meas² (assuming measurement error is independent of
process variation). This implies if measurement error is high, it masks
the real process variation. Engineers strive for σ_meas << σ_process
(measurement system much more precise than the inherent variability
being studied). A rule of thumb in industry is measurement system
should contribute <10% of total variation for quality measurements.
4.3 Calibration and Regression for Instrument Accuracy
Calibration is aligning instrument output with known standards.
Statistically, calibration often uses linear regression (discussed more in
Chapter 8) to adjust readings. For instance, suppose we have a flow
meter that outputs a voltage that is supposed to correlate to flow rate.
We run known flows (from a calibrated reference) and record the
meter’s voltage:
Reference flow: 0, 50, 100 L/min; Meter voltage: 0.01, 2.51, 4.98
volts (just an example).
Plotting voltage vs flow, we expect a line. We can fit a linear regression:
flow = a + b*(voltage). That equation is then used to convert future
voltage readings to flow values. The calibration ensures accuracy (the
regression corrects any bias or scale error). The regression’s R² and
residuals tell us how well the meter follows a linear pattern (residual
scatter indicates precision of the meter, systematic deviation from line
indicates non-linearity or biases).
Calibration Example: A pH meter is calibrated with buffer solutions of
pH 4.00, 7.00, 10.00. The meter reads 4.1, 7.1, 10.0 respectively without
calibration. We can shift the meter (offset -0.1) and adjust slope if
needed. In this case, readings suggest a consistent +0.1 bias at pH4 and
pH7, but at pH10 the bias is 0, indicating maybe a slight slope error. A
two-point calibration typically adjusts offset and slope such that meter
reads exactly the standard values. The remaining deviation at a third
point (pH10) indicates calibration quality. Statistically, if we regressed
“true pH = α + β*(meter reading)”, we’d find α and β to correct the
meter. Ideally β ~1, α ~0 for perfect instrument.
Regression Lab Task: Calibrate a thermocouple against a precision
thermometer across 0–100°C: record pairs of readings at various points,
fit a line, and compute the residual error. Use that line for corrected
readings. After calibration, test at a mid-point to verify improved
accuracy.
4.4 Data Quality: Outliers and Missing Data
Measurement data may contain outliers – readings that are far off the
expected range. Outliers can result from momentary glitches (electrical
spike, human error in reading, contamination in sample analysis).
Statistically identifying outliers can be done by seeing if a data point lies
beyond, say, 3 standard deviations from the mean or using Grubbs’ test,
etc. Engineers often face a choice: investigate and possibly discard
outliers (if they are proven erroneous) or include them if they might
indicate real extreme events. Good practice: never discard an outlier
without cause – first, ensure it's not a real phenomenon. For example, if
one batch’s impurity is 5σ higher than others, was there a measurement
error or did something truly go wrong in that batch?
Missing data is another quality issue – e.g. a sensor went offline for a
day. One must decide how to handle missing points (interpolate, use last
value, or analyze with methods that handle missing data). While this
veers into data management more than pure stats, it’s part of ensuring
the dataset used for analysis is representative and clean.
Improving Data Quality: Some strategies:
Conduct repeat measurements and use the average to reduce
random error. By averaging n independent measurements, the
standard error reduces by √n. (E.g., take triplicate samples for
HPLC analysis and average results to get more precise estimate).
Use better instruments or maintain instruments (regular calibration
schedule, routine maintenance to prevent drift).
Control environmental factors during measurement – e.g. measure
viscosity in a temperature-controlled lab to avoid ambient
temperature influencing results.
Training for operators to ensure consistent measurement technique
(reduces variability between people).
Case Example (Nigerian Lab): In a Nigerian oil laboratory, technicians
measure the API gravity of crude oil samples. Suppose two different
devices are used (digital density meter vs. hydrometer) and results
sometimes differ by ~0.5 API. A small Gauge R&R study is done: Each
of 5 oil samples is measured by both methods by two technicians.
Analysis shows the hydrometer readings tend to be 0.3 API higher on
average (a systematic bias) and technicians differ within ±0.1 API (small
random differences). The lab decides to always use the digital meter for
custody transfer measurements (as it’s more consistent) and use a
regression to adjust hydrometer readings if needed (calibrate hydrometer
to match digital meter scale). This improves overall data quality for
reporting and avoids disputes over measurements.
4.5 Application: Design an Experiment to Quantify Measurement
Uncertainty
Suppose we want to quantify the measurement uncertainty of a new gas
chromatograph (GC) for measuring benzene concentration in water. We
could design an experiment:
Take a homogeneous water sample and spike it with a known
benzene level (say 5 mg/L).
Have the GC measure this sample 10 times independently (or over
several days).
The standard deviation of these 10 readings gives the instrument’s
precision at that level.
Repeat for a different concentration (like 1 mg/L and 10 mg/L) to
see if precision varies with concentration (common in instruments
– relative error might be constant percentage).
Also run a known standard sample to check accuracy (compare
mean reading to the true known concentration).
If possible, use another method as a reference (like a calibrated
standard method) to measure the same sample and compare.
This experiment yields statistical insight: e.g., “The GC has a precision
of 0.1 mg/L (2% RSD at 5 mg/L) and an accuracy within 0.05 mg/L
after calibration. Thus, results above detection limit can be trusted
within ±0.2 mg/L with 95% confidence.” That kind of statistical
characterization is valuable for environmental reporting or quality
control.
Software Tip: MINITAB can aid measurement studies. For instance, it
has a Gage R&R analysis under Stat > Quality Tools, where you input
data from multiple operators and parts and it outputs variance
components. Or simple descriptive stats on repeated measurements and
control charts (a control chart of a stable process measurement
essentially tracks measurement variation over time).
Chapter Summary: In this chapter, we reinforced that no statistical
analysis can rise above the quality of the input data. We defined
accuracy and precision and showed how to assess them using statistics
(means and standard deviations of repeated measurements). Calibration
techniques using regression were introduced to correct bias. Engineers
must continuously ensure their data is reliable – through calibration,
repeated measurements, and data cleaning (outlier checks). As we
proceed to inferential statistics, remember that all our confidence in
results assumes the data truly represent what we think they do. A
significant test result means little if the measurements were significantly
biased or noisy. By controlling and quantifying measurement error, we
lay a solid foundation for trustworthy analysis.
End-of-Chapter Exercises:
1. Accuracy vs Precision: A pressure gauge on a reactor reads 5 psi
higher than the true pressure consistently, but has a very small
fluctuation (±0.1 psi). Another gauge reads on average correctly,
but fluctuates ±5 psi. Which gauge is more accurate? Which is
more precise? If you had to choose, which error (systematic or
random) is easier to correct and why?
2. Repeatability test: You weigh the same sample on a balance 7
times and get in grams: 10.12, 10.15, 10.13, 10.11, 10.14, 10.13,
10.12. Calculate the sample mean and standard deviation. If the
true mass is 10.00 g, what is the balance’s bias? Does the precision
(SD) seem acceptable (what % of the reading is it)?
3. Calibration curve: A spectrophotometer gives absorbance
readings that should linearly relate to concentration (Beer-Lambert
law). Known concentrations (ppm) vs absorbance: 0 -> 0.02, 5 ->
0.35, 10 -> 0.68, 15 -> 1.00. (a) Plot these and find the best-fit line
(absorbance = a + b*conc or conc = (absorbance - a)/b). (b) If an
unknown sample reads 0.50 absorbance, what concentration does
the calibration predict? (c) If the residual at 15 ppm was significant
(say actual absorbance was 1.05 vs 1.00 predicted), what might
that indicate (think linearity)?
4. Gage R&R concept: Two operators measure the thickness of a
plastic film using the same micrometer. Each measures the same
sample 3 times: Operator A gets (0.102, 0.105, 0.098 mm),
Operator B gets (0.110, 0.108, 0.109 mm). (a) Compute the mean
and SD for each operator’s readings. (b) Discuss differences: is
there a noticeable bias between operators? Who is more
repeatable? (c) What steps might you take to reduce any observed
differences (training, calibration of micrometer, etc.)?
5. Outlier handling: In a series of viscosity measurements (in cP) of
a sample: 45.2, 46.1, 44.8, 120.5, 45.0, 44.9 – one value is clearly
an outlier. (a) Statistically, how could you justify discarding the
outlier? (Calculate how many SDs away it is from the mean of
others, for instance.) (b) What non-statistical investigation should
accompany this (what might you check about that run)? (c) After
removing it, recalc the mean viscosity. How different is it from
including the outlier?
6. Practical lab task (for thought): Design a brief plan to test the
measurement variability of a pH meter. Include: how many
measurements, of what solutions, using how many operators, etc.,
to separate instrument repeatability from operator technique
variability.
Chapter 5: Sampling and Confidence Intervals
Learning Objectives: By the end of this chapter, students will be able
to: (1) explain the concept of a sampling distribution and the Central
Limit Theorem and why they are fundamental to statistical inference; (2)
construct and interpret confidence intervals for a population mean and
proportion based on sample data; (3) understand the influence of sample
size and confidence level on the width of confidence intervals; and (4)
draw appropriate conclusions from confidence interval results in
chemical engineering contexts (e.g., process parameters, quality
metrics).
Up to now, we have dealt with describing data and understanding
probability distributions in general. Now we shift toward statistical
inference – making educated statements about a population (or process)
based on a sample of data. A core concept is that any statistic computed
from a sample (like the sample mean) is itself a random variable, with its
own distribution (the sampling distribution). Confidence intervals (CIs)
are one of the main tools of inference, allowing us to estimate a
population parameter (like a true mean) with an indication of
uncertainty.
5.1 Populations, Samples, and the Central Limit Theorem
A population is the entire set of subjects or measurements of interest
(conceptually, often infinite or very large). A sample is a subset of data
drawn from the population, ideally at random. For example, consider a
production of 10,000 bottles of soda in a shift (population = 10,000 fill
volumes). We might measure 50 of them (sample) to infer things about
the whole batch.
Sampling Distribution: If we take a sample and compute a statistic
(like sample mean Xˉ\bar{X}), and if we could hypothetically repeat
that sampling process many times, the distribution of Xˉ\bar{X} values
is the sampling distribution. Its mean is the population mean (so Xˉ\
bar{X} is an unbiased estimator of μ), and its variance is σ2/n\sigma^2/n
(population variance divided by sample size n). This means larger
samples yield a tighter distribution of the sample mean around the true
mean.
The Central Limit Theorem (CLT) states that for a large sample size
n, the sampling distribution of Xˉ\bar{X} will be approximately normal,
regardless of the shape of the population distribution (provided the
population has a finite variance and no extreme heavy tails). By "large
n", typically n ≥ 30 is often sufficient in practice, though if the
underlying distribution is very skewed, you might need more. CLT is a
cornerstone because it justifies using normal-based inference for means
in many situations.
Illustration: If an individual measurement of, say, catalyst pellet
diameter is not perfectly normal (maybe slightly skewed), the average
diameter of 50 pellets will be very close to normal by CLT. The mean of
those 50 will have much smaller variability than individual pellets,
specifically σXˉ=σ/50\sigma_{\bar{X}} = \sigma/\sqrt{50}. This tells
us that to reduce uncertainty in estimating the true mean diameter, one
can increase sample size.
Standard Error (SE): The standard deviation of a statistic (e.g., sample
mean) is called the standard error. For the sample mean,
SE(Xˉ)=σnSE(\bar{X}) = \frac{\sigma}{\sqrt{n}}. In practice, σ is
often unknown, so we estimate SE using the sample’s standard deviation
ss: SE≈s/nSE \approx s/\sqrt{n}. The SE is crucial in determining
confidence intervals and test statistics.
5.2 Confidence Intervals for the Mean (Known Variance)
To introduce the concept, first assume we know the population standard
deviation σ (this is rarely true in practice, but it simplifies initial
understanding). If the population is normal or n is large (CLT), a
confidence interval for the true mean μ can be constructed around the
sample mean:
Xˉ±zα/2σn,\bar{X} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}},
where zα/2z_{\alpha/2} is the z-value cutting off an area of α/2\alpha/2
in the upper tail for a (1-α)*100% confidence interval. For a 95% CI,
α=0.05, so z0.025≈1.96z_{0.025} \approx 1.96. For 90% CI,
z0.05≈1.645z_{0.05}\approx1.645, and for 99% CI,
z0.005≈2.576z_{0.005}\approx2.576.
This formula comes from the reasoning that Xˉ\bar{X} is approximately
normal with mean μ and std dev σ/√n. We want an interval that has 95%
chance to cover μ. So we go ~1.96 SEs below and above the observed
Xˉ\bar{X}.
Interpretation: If we say "95% confidence interval for μ is [L, U]", it
means that if we repeated the sampling many times, 95% of the intervals
constructed this way would contain the true μ. It does not mean there's a
95% probability that μ lies in [L, U] (μ is fixed, the interval is random).
But informally, one can think of it as a reasonable range of values for μ
given the data.
Example: A catalyst yields a mean conversion of 80% with known σ =
5% in small-scale trials (n=9 trials). We find Xˉ=80%\bar{X}=80\%. A
95% CI for the true mean conversion: 80 ± 1.96*(5/√9) = 80 ±
1.96*(1.667) = 80 ± 3.27, so [76.73%, 83.27%]. We are 95% confident
the true mean conversion under these conditions is ~76.7 to 83.3%. If a
desired target was 85%, this CI suggests the process likely falls short of
that target.
5.3 Confidence Intervals in Practice (Unknown Variance – t-
distribution)
Usually, σ is unknown. We then use the sample standard deviation s and
rely on the t-distribution instead of normal. The t-distribution (with df
= n-1 degrees of freedom) is wider than normal for small samples, to
account for extra uncertainty in estimating σ. A CI for μ becomes:
Xˉ±tα/2, df=n−1sn.\bar{X} \pm t_{\alpha/2, \, df=n-1} \frac{s}{\
sqrt{n}}.
For large n, t approximates z. For example, with n=10 (df=9), the 95% t
critical value is ~2.262 (compared to 1.96 for z). With n=30 (df=29), t
~2.045. As n → ∞, t →1.96.
Example: Suppose from 15 samples of a new chemical product’s purity,
we get Xˉ=99.2%\bar{X}=99.2\%, s=0.5%. We want a 95% CI for true
purity. df=14, t(0.025,14) ≈ 2.145. So CI = 99.2 ± 2.145*(0.5/√15).
0.5/√15 = 0.129, times 2.145 gives ~0.277. So CI ≈ [98.92%, 99.48%].
We are fairly confident the average purity is between ~98.9 and 99.5%.
Notice how a small s and decent n yields a tight CI of width about
±0.28%.
The width of a CI depends on:
Confidence level: higher confidence (99% vs 95%) → larger
critical value → wider interval (more cautious).
Sample size: larger n → smaller standard error → narrower
interval (more precision).
Data variability (s): more variability → wider interval.
Planning Sample Size: Sometimes an engineer may want to determine
n such that the CI will be a desired width. Roughly, for a given desired
margin m at 95% confidence, you’d set 1.96 * (σ/√n) = m, solve for n.
For example, if you want ±1% precision on a mean with σ ~5, need n ~
(1.96*5/1)^2 ≈ 96. (We often don't know σ initially, so use a prior
estimate or pilot study for planning.)
5.4 Confidence Interval for a Proportion
In quality control, one might estimate a proportion (e.g., fraction of
defective products, or fraction of time a process is in a certain state). For
large n, if X ~ Binomial(n, p) (where p is true proportion of "success"),
the sample proportion p^=X/n\hat{p} = X/n has approximately normal
distribution with mean p and standard error p(1−p)n\sqrt{\frac{p(1-p)}
{n}}. For CI, one formula is:
p^±zα/2p^(1−p^)n.\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\
hat{p})}{n}}.
This is an approximate method (there are better ones for extreme p or
small n, like Wilson score interval).
Example: Out of 200 products, 8 failed quality tests. p^=8/200=0.04\
hat{p}=8/200 = 0.04 (4% defect rate). A 95% CI for true defect rate p:
0.04 ± 1.96 * sqrt(0.04*0.96/200). The sqrt term = sqrt(0.0384/200) =
sqrt(0.000192) = 0.01385. Times 1.96 gives ~0.0271. So CI ≈ [0.04 -
0.027, 0.04 + 0.027] = [0.013, 0.067] or [1.3%, 6.7%]. We conclude the
defect rate is likely between about 1% and 7%. Note the asymmetry: the
point estimate was 4%, but because of sampling uncertainty, true rate
could be a bit lower or higher. If a regulatory limit was, say, 5%, this CI
unfortunately includes 5% (so we couldn't confidently claim the defect
rate is below 5% yet). If we wanted a tighter interval, we’d need a larger
sample.
5.5 Practical Interpretation in Engineering Context
Confidence intervals are extremely useful for communicating
uncertainty of estimates:
An engineer reporting the mean concentration of a pollutant in
wastewater might say "Mean = 5.2 mg/L, 95% CI: 4.5 to 5.9
mg/L". This tells regulatory agencies not just a point estimate but
the range of likely true values.
In R&D, if a new catalyst’s average yield improvement is
estimated with a CI, one can see if zero improvement is outside the
interval or not (if the entire CI is above 0, that suggests statistically
significant improvement – connecting to hypothesis testing logic).
Confidence intervals can be used to determine if a process change
has practical significance. For example, modifying a process
increased average throughput from 100 to 105 kg/hr, CI for
increase is [+2, +8] kg/hr. This is statistically significant (CI
doesn’t include 0) and also practically significant. But if the CI
was [0, +10], it’s borderline – the true increase could be trivial (0)
or large.
MINITAB Note: In MINITAB, when you use Stat > Basic Statistics >
1-sample t (for example), it will output not only the test result but also
the CI for the mean. It’s good practice to always look at the CI, as it
provides more information than a yes/no hypothesis test.
Case Study (Chem Eng Example): A Nigerian pharmaceutical plant
measures the potency of a drug in 10 tablets; results in assay percent of
label claim: 98, 101, 99, 100, 97, 103, 99, 101, 100, 98. Xˉ=99.6,s≈2.0\
bar{X}=99.6, s≈2.0. 95% CI (df=9, t≈2.262): 99.6 ± 2.262*(2/√10) =
99.6 ± 2.262*0.632 = 99.6 ± 1.43 => [98.17, 101.03]%. So they’re 95%
confident true average potency is ~98.2–101.0%. The spec requires 95–
105%, so that’s comfortably within spec – no issue. If instead s were
larger or sample smaller such that CI was [94, 105]%, even if mean is
99.6, we wouldn’t be as confident all production is on target (the true
mean could be slightly below 95). That might prompt taking a larger
sample or investigating variability sources.
5.6 Weekly/Fortnightly Lab Assignments in Context
Recall from the course outline that students are to have regular
computer-lab assignments. In this chapter’s context, an example
assignment: Use MINITAB to generate random samples from a known
distribution and verify the Central Limit Theorem. For instance, simulate
1000 samples of size n=5 from an exponential distribution (which is
skewed) and observe the distribution of sample means – check that it
approaches normal. Or assign students to measure something around
them (like fill volumes of water bottles they have) and compute a CI,
interpreting it.
Chapter Summary: In this chapter, we have established how to move
from describing a single sample to making statements about the broader
process it came from. The Central Limit Theorem assures us that sample
means tend to be normally distributed for large samples, enabling the
construction of confidence intervals. We derived CIs for means (using z
or t) and for proportions. Understanding and interpreting these intervals
is a fundamental skill – it teaches us to say “we’re not 100% sure, but
we have a range for the value with a certain confidence.” In engineering,
where decisions must often be made under uncertainty, the confidence
interval provides a quantitative handle on that uncertainty. In the next
chapter, we’ll continue our exploration of inference with hypothesis
testing, which is a complementary framework asking yes/no questions
with a controlled false-alarm rate.
End-of-Chapter Exercises:
1. Sampling Distribution: A process has a true mean μ = 50 and σ =
5. If you take samples of size n=25, what is the distribution of the
sample mean (give mean and standard deviation)? According to
CLT, what is approximate probability that a given sample mean
will fall between 48 and 52?
2. Confidence Interval Calculation: A polymer’s tensile strength
(MPa) is measured in 8 specimens: 55, 60, 57, 59, 58, 61, 56, 60.
Compute the sample mean and standard deviation. Assume
roughly normal data. (a) Construct a 95% CI for the true mean
strength. (b) Interpret this interval in context of the polymer’s
guaranteed minimum strength of 54 MPa (does it appear the
guarantee is met on average?).
3. Proportion CI: In a quality audit, 16 out of 200 sampled packages
were found leaking. Calculate a 90% confidence interval for the
true proportion of leaking packages. Based on this, if the company
claims “at most 5% of packages leak,” is that claim plausible?
4. Changing Confidence Level: Using the tensile strength data from
question 2, also compute a 90% CI and a 99% CI for the mean.
Compare the widths of these intervals to the 95% CI. Explain the
trade-off between confidence level and precision.
5. MINITAB Exercise: Generate 50 random values in MINITAB
from a Uniform(0,1) distribution (Calc > Random Data). Treat
those as a “population”. Now use MINITAB’s resampling or by
manually taking, say, 5 random points from those 50 to simulate a
sample. Compute the sample mean and a 95% CI (since σ
unknown, use t). Repeat this sampling a few times (MINITAB can
automate with bootstrapping if known). Do most of your CIs
contain the true mean of the uniform (which is 0.5)? What does
this demonstrate?
6. Critical Thinking: A chemical engineer says, “We don’t need
confidence intervals; we have a lot of data, so the sample mean is
basically the truth.” Discuss why even with a lot of data, reporting
an interval is valuable. What if the engineer’s data were highly
variable? How does sample size help and what caution remains?
Chapter 6: Hypothesis Testing and Statistical Inference
Learning Objectives: Students should be able to: (1) formulate null and
alternative hypotheses for common engineering scenarios (comparing
means, checking a claim, etc.); (2) perform basic hypothesis tests (one-
sample z/t-test for mean, tests for proportions) and calculate test
statistics and p-values; (3) interpret the results of a hypothesis test in
context, including understanding the meaning of p-value and the risk of
Type I/II errors; and (4) evaluate the assumptions behind tests
(normality, independence) and use software (MINITAB) to carry out
tests, interpreting the output.
Hypothesis testing is a formal framework for making decisions or
judgments about a population parameter based on sample data. Where
confidence intervals gave us a range of plausible values, hypothesis tests
ask a yes/no question: for example, "Is the mean at least 5?" or "Are two
processes different in yield?" and provide a significance level for the
decision. This chapter introduces the concepts of null and alternative
hypotheses, test statistics, significance (p-values), and errors in testing.
6.1 Hypotheses: Null and Alternative
A null hypothesis (H₀) is a statement of no effect or no difference, set
up as the default or status quo claim. The alternative hypothesis (H₁ or
Hₐ) is what we seek evidence for – typically indicating some effect,
difference or change. We always assume H₀ is true unless data give
strong evidence against it.
Examples:
H₀: μ = 50 (the process mean equals 50 units), H₁: μ ≠ 50 (two-
sided alternative, the mean is different from 50).
H₀: μ ≤ 100, H₁: μ > 100 (one-sided alternative, testing if mean
exceeds 100).
H₀: p = 0.05 (5% defect rate historically), H₁: p < 0.05
(improvement: defects lower than 5%).
We choose H₀ and H₁ before seeing data ideally, reflecting the question
at hand. The null often represents "no change," "no improvement," or a
claimed value. The alternative represents what we suspect or want to
test.
6.2 Test Statistics and Decision Rule
A test statistic is a function of sample data that measures how far the
data deviate from H₀. For a mean test (with known σ or large n), we use:
z=Xˉ−μ0σ/n,z = \frac{\bar{X} - \mu_{0}}{\sigma/\sqrt{n}},
where μ₀ is the hypothesized mean under H₀. If |z| is large, it indicates
the sample mean is far from μ₀ in SD units – evidence against H₀.
If σ is unknown and n is moderate, use:
t=Xˉ−μ0s/n,t = \frac{\bar{X} - \mu_{0}}{s/\sqrt{n}},
with df = n-1.
For testing a proportion:
z=p^−p0p0(1−p0)n,z = \frac{\hat{p} - p_{0}}{\sqrt{\frac{p_{0}(1-
p_{0})}{n}}},
assuming n large enough.
The decision rule can be the critical value approach: compare the test
statistic to a threshold based on chosen significance level α. For
example, two-sided test at α=0.05: reject H₀ if |z| > 1.96 (for z-test) or |t|
> t_{α/2,df}.
Alternatively (and more commonly now) the p-value approach:
compute the p-value, which is the probability of observing a test statistic
as extreme as (or more than) what we got, assuming H₀ is true. If p-
value ≤ α, reject H₀ (evidence is significant at level α). If p-value > α,
fail to reject H₀.
Interpretation of p-value: It's the risk (under H₀) of falsely rejecting
H₀ given the observed data. A small p (like 0.01) means the data we got
would be very unlikely if H₀ were true, so we suspect H₀ is false.
Conversely, a large p (0.5) means data are quite compatible with H₀.
6.3 Significance Level, Type I and II Errors, and Power
Significance level (α): The chosen probability of Type I error
(false positive), i.e., rejecting H₀ when it is actually true. Common
α = 0.05 or 5%. This means we tolerate a 5% chance of alarm
when nothing is wrong. If consequences of false alarm are serious,
we may use α = 0.01; if missing a real effect is worse, maybe α =
0.1 is chosen.
Type I Error: Rejecting H₀ when H₀ is true (false alarm).
Probability = α by test design.
Type II Error: Failing to reject H₀ when H₁ is true (missed
detection). Probability = β (not directly set by test, depends on
sample size, effect size, α).
Power = 1 - β: Probability of correctly rejecting H₀ when a
specific alternative is true. For example, the power of a test to
detect that μ is actually 5 units above μ₀ might be 80% with a
certain n. We generally want high power (≥80%) for important
effects.
There is a trade-off: for fixed n, lowering α (being more stringent) often
raises β (harder to detect a true effect). Increasing n reduces both errors
(power goes up).
Engineers often use α=0.05 by convention, but if, say, a plant safety
decision is based on a test, one might choose α=0.01 to be extra sure
before calling something safe (minimize false positives in concluding
safety).
6.4 One-Sample Tests: z-test and t-test
One-sample z-test (rarely used in practice unless σ known or n
large): e.g., testing H₀: μ = μ₀. Calculate z as above and compare to
N(0,1).
One-sample t-test: More common. Use when σ unknown and sample
from ~normal population (or n sufficiently large for CLT). MINITAB’s
1-sample t handles this. We specify H₀ value and get a t and p.
Example: A machine is supposed to fill 500 mL on average. We take 10
samples: mean = 492 mL, s = 15 mL. H₀: μ=500, H₁: μ≠500. t = (492-
500)/(15/√10) = -8/(4.743) = -1.686. df=9. The two-tailed p-value = P(|
T9| > 1.686). From t-table or software, p ~ 0.128. At α=0.05, p > 0.05,
so we fail to reject H₀. We don't have strong evidence the mean is
different from 500; the shortfall of 8 mL could be due to chance with
this sample size. However, note: 8 mL difference might be practically
important. Perhaps we need more data (power might be low with n=10).
If the spec is tight, one might consider the risk of Type II error here.
(We’ll discuss interpretation beyond p-value: maybe a CI approach from
Chapter 5 would show 95% CI ~ [483, 501], which includes 500, hence
consistent with test result).
Paired t-test (a special one-sample test on differences): If we measure
something before and after on the same unit (paired data, e.g. catalyst
activity before/after regeneration on each catalyst sample), we compute
differences and do a one-sample t on those differences (H₀: mean
difference = 0). We mention here because it's a common scenario:
pairing eliminates variation between subjects. For example, if we have 5
catalysts, measure activity, regenerate, measure again, we can test if
average change > 0 via paired t.
6.5 P-value and Decision in Context
It’s crucial to interpret results in context:
"Reject H₀ at α=0.05" means we found statistically significant
evidence for H₁ at 95% confidence level.
"Fail to reject H₀" does not prove H₀ true; it just means not
enough evidence against it. Perhaps the effect is too small to detect
with given data, or variability is too high.
Case Study: A Nigerian water treatment facility claims its effluent BOD
(biochemical oxygen demand) is at most 20 mg/L. Regulators take 6
samples: mean = 22 mg/L, s = 3. H₀: μ = 20 vs H₁: μ > 20 (one-tailed,
since regulators worry if it's higher). t = (22-20)/(3/√6) = 2/1.225 =
1.633, df=5. For α=0.05 one-tailed, critical t5 = 2.015, or p = P(T5 >
1.633) ~0.078. Not below 0.05, so fail to reject H₀ at 5% (no significant
violation). At 10% significance, we would reject (p < 0.10). This
illustrates how chosen α matters. The regulator might decide to collect
more data (increase n) to get a clearer decision, given p ~0.078 is
suggestive but not conclusive at 5%. The potential risk: maybe true
mean is slightly above 20 and a small sample didn't catch it with 95%
confidence.
6.6 Using MINITAB for Hypothesis Tests
MINITAB makes hypothesis testing straightforward:
For one-sample tests: Stat > Basic Statistics > 1-sample t... (or 1-
sample z if σ known). You input the column and H₀ value, choose
test type (≠, <, > alternative).
The output will show the sample mean, n, SD, the t (or z) statistic,
degrees of freedom (for t), and the p-value.
It might say something like: "Test of μ = 500 vs ≠ 500: T = -1.686,
p = 0.128". You then interpret: since p 0.128 > 0.05, not
significant; if α was 0.1, then it's borderline significant.
MINITAB also can produce a confidence interval in the same output.
It’s good to examine that – a p-value tells if there's evidence of
difference, but the CI tells how big the difference might be if it exists.
For the filling example, p=0.128 (no sig diff), but CI might be [483, 501]
mL, which suggests even though we can’t reject a 500 mean, the actual
mean could be as low as 483, which might be practically concerning if
underfilling. Hypothesis test alone might miss that nuance. Thus,
reporting both is often recommended.
Error Considerations: If we repeat tests, a 5% significance means on
average 1 in 20 tests on true nulls will give a false positive. Engineers
need to be cautious of multiple comparisons (if you test 5 different
quality metrics, the chance one falsely flags is >5%). There are methods
to adjust for multiple tests (Bonferroni, etc.), but that’s advanced. We
mention so that one doesn’t think "just test everything at 5% and be
surprised if one lights up by chance."
6.7 Example: Power and Sample Size
If an engineer wants to ensure detecting a certain deviation with high
probability, they do a power analysis. For example, if true mean is 495
vs H₀:500, what’s the chance our filling test with n=10, α=0.05 would
catch it? It might be low. Roughly, power = P(reject | μ=495). We
compute t threshold ~2.262 (for df=9 two-tail, half of that one tail since
symmetric). Need our t = (495-500)/(15/√10) = -5/4.743 = -1.054. That’s
not beyond -2.262, so we wouldn't reject; for power we integrate
distribution under alt. Perhaps power ~ 20-30% only. This is a trivial
calc here, but MINITAB has a Power and Sample Size tool to do these
properly. If low power, we increase n to achieve desired power (80%
etc.).
Engineers often face this: "How many samples do I need to be confident
in detecting a 1% difference?" These questions require balancing
statistical and practical significance.
6.8 Recap with Industrial/Nigerian Example
Let's revisit Kenneth Dagde’s Day 8-9 topics – hypothesis testing
introduction, then power and significance. Imagine an industrial
example: A cement plant in Nigeria changed a grinding aid additive,
hoping to increase average cement strength. Historically μ=42 MPa.
New sample of 5 bags yields mean 44 MPa, s=3. H₀: μ=42 vs H₁: μ >
42. t = (44-42)/(3/√5) = 2/(1.342) = 1.49, df=4, one-tail p ~0.10. Not
significant at 0.05. Possibly with n=5, our test is underpowered to detect
a ~2 MPa increase. If we require evidence at 95% confidence, we might
do more trials (maybe 15-20 bags) to confirm. Alternatively, we might
consider the cost of being wrong: if the additive is cheap and unlikely to
harm, one might proceed provisionally. But statistically, we can't claim
with 95% confidence that strength improved.
One must also be wary of assumptions: t-tests assume data is
approximately normal. If sample size is small (n<30), this matters. If
data are skewed or have outliers, non-parametric tests (like sign test,
Wilcoxon) might be used as alternatives. However, we won't dive deep
here – but always plot your data to check nothing weird.
Chapter Summary: We covered how to set up and interpret hypothesis
tests. The p-value tells us if the observed effect is statistically significant
(unlikely under the null hypothesis). We differentiate between statistical
significance and practical significance – a tiny difference can be highly
significant with huge n, and a large difference can be non-significant
with tiny n. Chemical engineers should use hypothesis tests as tools to
make data-driven decisions (like accepting a new process or determining
if a change had an effect) while also using engineering judgment on
what differences matter. In the next chapter, we will extend hypothesis
testing to comparing two groups (two-sample tests) and ANOVA for
several groups, which are very common in experiments and quality
control.
End-of-Chapter Exercises:
1. Formulating H₀/H₁: For each scenario, state appropriate null and
alternative hypotheses and whether the test is one-tailed or two-
tailed:
a. A catalyst vendor claims their catalyst increases yield. Current
yield is 75%. We test if the new catalyst’s mean yield is greater
than 75%.
b. A regulation says sulfur content must be 50 ppm. We want to
check if our fuel meets this (we worry if it's not equal to 50).
c. Historically 10% of products were defective. After a process
improvement, we hope the defect rate is lower.
2. Calculating Test Statistic: A sample of n=16 from a normal
process has Xˉ=105\bar{X}=105 and s=8. We test H₀: μ = 100 vs
H₁: μ ≠ 100 at α=0.05. (a) Compute the t statistic and degrees of
freedom. (b) What is the approximate critical t for α=0.05 (two-
tail, df=15)? (c) Based on these, would you reject H₀? (d)
Calculate or estimate the p-value.
3. Interpreting p-value: If a p-value comes out as 0.003 in a test,
explain in a sentence what that means regarding the data and H₀. If
α was 0.01, what is the decision? If α was 0.001?
4. Type I/II Conceptual: In the context of exercise 1a (catalyst
yield): describe a Type I error and a Type II error in plain language
and the consequence of each. Which error might be more serious
for the company (adopting a new catalyst that actually doesn’t
improve yield vs. not adopting one that actually would improve
yield)? How could you reduce the chances of a Type II error?
5. Using Software: (If available) In MINITAB or another tool, input
the data from exercise 2 (or generate similar data). Perform a 1-
sample t test for H₀: μ=100. Report the output: sample mean, SE
mean, t, and p-value. Does it match your manual calculation?
6. Power check: Suppose in exercise 2 that the true mean is actually
105 (as observed) and true σ is 8. We had n=16. Without complex
calculations, discuss if the test likely had decent power to detect a
5 unit difference. (Hint: if you use a rough effect size 5/8 = 0.625
SD, and n=16, consider that larger n or effect would increase
power). What sample size might be needed to have, say, 90%
power for detecting a 5 unit increase at α=0.05? (This may require
some trial or formula: one approach is use z since n large
approximation: 1.96 + 1.282 = 3.242 = (Δ/σ)*√n; plug Δ=5, σ=8,
solve for n.)
Chapter 7: Comparing Two or More Groups – t-Tests and ANOVA
Learning Objectives: Students will be able to: (1) conduct and interpret
a two-sample t-test for comparing means from two independent groups
(including checking for equal vs. unequal variances); (2) understand the
basic idea of Analysis of Variance (ANOVA) for comparing more than
two means and interpret an ANOVA table; (3) recognize when to use
paired t-tests versus two-sample t-tests; and (4) relate these tests to
engineering situations like A/B testing of process conditions or
comparing multiple catalysts, including understanding the assumptions
involved.
In many chemical engineering investigations, we compare two or more
conditions or groups: e.g. performance of catalyst A vs B, output of
process before vs after an upgrade, yields under several temperature
settings. This chapter extends hypothesis testing to such comparisons.
We first tackle comparing two groups with t-tests, then one-way
ANOVA for multiple groups.
7.1 Two-Sample t-test (Independent Samples)
When we have two separate samples from two populations or
experimental conditions, and we want to test if their means differ, we
use a two-sample t-test. Typical hypotheses: H₀: μ₁ = μ₂ (no
difference), H₁: μ₁ ≠ μ₂ (or one-sided if expecting a direction).
Assumptions: Ideally both samples are from (approximately) normal
distributions. If sample sizes are reasonably large (n₁,n₂ ≥ 30), CLT
helps. Also assume samples are independent of each other. There are
two versions:
Equal variances assumed (pooled t-test): If we can assume σ₁ =
σ₂, we pool the variances for a more stable estimate.
Unequal variances (Welch’s t-test): Does not assume equal σ,
uses a degrees of freedom formula (often non-integer df) and
separate variances.
If sample standard deviations differ a lot or sample sizes differ, Welch’s
is safer. Most software (including MINITAB) can test for equal
variances (e.g. Levene’s test) or just use Welch by default.
Test statistic:
t=Xˉ1−Xˉ2−Δ0Sdiff,t = \frac{\bar{X}_1 - \bar{X}_2 - \Delta_0}{S_{\
text{diff}}},
where Δ₀ is the hypothesized difference (often 0), and
Sdiff=s12n1+s22n2S_{\text{diff}} = \sqrt{\frac{s_1^2}{n_1} + \
frac{s_2^2}{n_2}}. If variances assumed equal, SdiffS_{\text{diff}}
uses pooled s².
Degrees of freedom: if pooled, df = n₁+n₂ - 2. If not pooled, use Welch-
Satterthwaite formula (software computes; df could be fractional).
Example: Catalyst A vs Catalyst B: 8 runs each. A: XˉA=85%,sA=5%\
bar{X}_A=85\%, s_A=5\%. B: XˉB=80%,sB=6%\bar{X}_B=80\%,
s_B=6\%. H₀: μ_A = μ_B, H₁: μ_A > μ_B (one-tailed, expecting A
better). Pooled approach: sp ≈ sqrt(((725)+(736))/(8+8-2)) =
sqrt((175+252)/14) = sqrt(427/14)=sqrt(30.5)=5.52%. t = (85-80)/
(5.52*√(1/8+1/8)) = 5 / (5.52*√(0.25)) = 5/(5.52*0.5) = 5/(2.76) =
1.811. df=14. For one-tailed α=0.05, critical t14 ~1.761. t=1.811 exceeds
that, so p ~0.045, we reject H₀ and conclude Catalyst A has significantly
higher mean yield than B at 5% level. (Two-tailed p would be ~0.09, not
sig if two-tailed test). Interpretation: evidence suggests A is better than
B. However, difference is 5%, which may or may not be practically huge
depending on context, but likely meaningful in yield terms.
If variances were very different, we could use Welch’s. Typically, we
might check if 5² vs 6² are similar. Levene’s test might say p > 0.05,
okay to assume equal. If one stdev was 5 and the other 15, then
definitely use Welch.
MINITAB usage: Stat > Basic Statistics > 2-sample t. Choose whether
to assume equal variance (there’s a checkbox). It will output means,
difference, SE diff, t, df, p-value. Also confidence interval for
difference.
Two-sample t is basically the simpler case of ANOVA for 2 groups.
Results align if done via either method.
7.2 Paired t-test (Revisited)
We discussed paired t in Chapter 6, but to contrast: use paired t when
samples are not independent – each data point in group1 pairs naturally
with one in group2 (like measurements on same unit). In pairing, you
reduce noise from unit-to-unit variation. For example, testing two
formulations on the same engine in two runs (one with additive, one
without) – differences per engine are computed, then one-sample t on
differences.
Always decide pairing vs independent based on experiment design. If
pairing is appropriate but you analyze as independent, you lose power
because you ignore the pairing that removes variability. If you wrongly
pair unrelated samples, that's also incorrect.
7.3 One-Way ANOVA: Comparing More than Two Means
When you have k groups (k ≥ 3), doing multiple t-tests is inefficient and
increases Type I error risk (multiple comparisons). Analysis of
Variance (ANOVA) provides a single overall test for H₀: all k means
are equal vs H₁: at least one mean differs.
The idea:
Between-group variability: How much do group means vary
around overall mean?
Within-group variability: How much do data vary inside each
group (noise)?
If between is large relative to within, it suggests real differences between
means. The test statistic is an F-ratio:
F=Variance between groupsVariance within groups,F = \frac{\
text{Variance between groups}}{\text{Variance within groups}},
which follows an F distribution with df1 = k-1 (between), df2 = N-k
(within), where N is total observations.
ANOVA table includes:
SS_between (SSA), df = k-1.
SS_within (SSE), df = N-k.
MS_between = SS_between/(k-1), MS_within = SS_within/(N-
k).
F = MS_between / MS_within.
p-value for F with (k-1, N-k) df.
If p < α, reject H₀ (not all means equal). But ANOVA doesn’t tell which
differ; for that, do post-hoc tests (Tukey, etc.), or planned comparisons
(t-tests with Bonferroni adjustments). We will mostly focus on
understanding the ANOVA output and when to use it.
Example: Three solvents are tested for reaction yield (3 runs each).
Suppose yields:
Solvent1: 70, 75, 73 (mean 72.7, s ~2.5)
Solvent2: 80, 78, 82 (mean 80, s ~2)
Solvent3: 77, 76, 79 (mean 77.3, s ~1.5)
Overall mean ~76.7. Between-group variation: means 72.7, 80.0, 77.3
vary - calculable SS_between = ∑n_i*(mean_i - overall_mean)². If each
n_i=3:
(72.7-76.7)²3 + (80.0-76.7)²3 + (77.3-76.7)²3 = (-4)²3 + (3.3)²3 +
(0.6)²3 = 163 + 10.893 + 0.36*3 = 48 + 32.67 + 1.08 = 81.75.
df_between=2, MS_between=40.875.
Within-group: sum of squares of deviations in each group: e.g. group1
var ~2.5²=6.25, times (n1-1)=2 yields 12.5; do similarly others. Quick
approximate SSE ~ (2.5²2 + 2²2 + 1.5²*2) = (12.5 + 8 + 4.5) = 25.
df_within=N-k=9-3=6, MS_within ~4.17. Then F = 40.875/4.17 = 9.8.
df(2,6), check F critical ~5.14 for 0.05, so p maybe ~0.01. Indeed F
large, indicates a significant difference among means. Likely solvent2
(mean 80) is higher than solvent1 (72.7). We’d do Tukey’s test to
confirm which differences are significant, but clearly S2 vs S1 is big
~7.3 difference, S3 in middle might not differ from either significantly
depending on thresholds.
ANOVA assumptions: each group ~ normal, equal variances across
groups (ANOVA is somewhat robust if slight differences, but major
variance differences or outliers can distort F test). There are tests for
equal variances (Bartlett’s, Levene’s) – if violated, one can use Welch’s
ANOVA or transform data.
Software: MINITAB *Stat > ANOVA > One-way (when data are in
separate columns or as factor-level). It outputs an ANOVA table with
source (Factor, Error) and SS, df, MS, F, p. Also, many packages give
group means and possibly letter groupings from Tukey (like groups A, B
indicating which are significantly different).
From LO: Day 12 covers planning experiments and one-way ANOVA.
So this fits – students should know how to plan multiple conditions and
analyze via ANOVA.
7.4 Engineering Examples for t-test and ANOVA
Two-sample t-test example (industrial): Comparing a new
filtration unit’s efficiency vs the old unit. H₀: same mean. If p <
0.05, new unit significantly improves efficiency.
Paired test example: Emissions measured from an engine on
regular fuel vs biodiesel for each of 10 vehicles. Each vehicle acts
as its own control -> paired t on emission difference (H₀: mean
diff=0).
ANOVA example: A Nigerian brewery tests 4 different
fermentation temperatures to maximize alcohol content (with 5
pilot fermenters at each temp). After fermentation, measure alcohol
%. Use one-way ANOVA to see if temperature affects mean
alcohol%. If F is significant, perhaps do Tukey to find optimal
temp (like maybe 20°C vs 25°C vs 30°C vs 35°C – often one finds
an optimum region).
ANOVA in DOE context: This is essentially what design of
experiments use – if factors have multiple levels, ANOVA is used
to see factor significance. Chapter 9 will expand on factorial
design ANOVA, but one-way ANOVA is conceptually the starting
point (one factor with k levels).
7.5 Communicating Results
Reporting results might be like: “The two-sample t-test shows a
statistically significant increase in mean conversion with Catalyst A
(mean 85%) compared to Catalyst B (80%), p=0.045, at 95% confidence
level.” Or for ANOVA: “ANOVA results (F2,12 = 7.5, p = 0.008)
indicate a significant effect of temperature on yield. Post-hoc analysis
reveals the yield at 50°C (mean 90%) is significantly higher than at 40°C
(mean 82%, p<0.01), whereas 45°C (88%) is not significantly different
from 50°C or 40°C at α=0.05.”
This gives both statistical and practical context.
Chapter Summary: This chapter introduced the tools for comparing
means across groups – a vital part of analyzing experiments and process
changes. We learned that:
Two-sample t-tests allow comparison of two independent samples
(or paired for dependent samples).
ANOVA generalizes this to multiple groups, using F-test to avoid
inflating Type I error from multiple t-tests.
We should check assumptions (normality, equal variances) to
validate our tests or use appropriate alternatives.
Significant results tell us a difference exists, and engineers then
consider if it's practically meaningful.
These methods are the bridge to experimental design analysis. In the
next chapter, we will go deeper into designed experiments with multiple
factors, where ANOVA becomes even more central (with factorial
designs, interaction effects, etc.), and we will also see how to use
software like Design-Expert for DOE analysis.
End-of-Chapter Exercises:
1. Two-sample vs Paired: You want to test a new antiscalant
chemical in cooling towers. You measure scale deposition rate in
10 towers for a month without the chemical, then in the same 10
towers for the next month with the chemical. (a) Should you use a
paired test or two-sample test? Why? (b) What are H₀ and H₁? (c)
If the average reduction in scale is 15% with p = 0.03 from the test,
interpret this result.
2. Manual two-sample t: Sample1: n1=12, Xˉ1=50\bar{X}_1=50,
s1=4; Sample2: n2=10, Xˉ2=46\bar{X}_2=46, s2=5. Test H₀:
μ1=μ2 vs H₁: μ1≠μ2 at α=0.05. (a) Compute the t statistic (use
Welch’s formula for S_diff). (b) Approximate the degrees of
freedom (using formula or a rule of thumb; many textbooks give df
≈ min(n1-1,n2-1) as a rough lower bound). (c) Determine if p is
roughly <0.05 or not. (d) Would pooled t likely differ much here
(check ratio of variances)?
3. ANOVA concept: Three machines produce a part. You sample 10
parts from each and measure length. State H₀ and H₁ for ANOVA.
If p-value from one-way ANOVA is 0.001, what does that
conclude? Does it tell you which machine differs? How might you
find out?
4. ANOVA calc (small): Given groups data: A: [5,6,7], B: [5,5,6], C:
[7,8,9]. (a) Compute group means: Aˉ,Bˉ,Cˉ\bar{A}, \bar{B}, \
bar{C}. (b) Compute overall mean. (c) Sketch or conceptually
compute SS_between and SS_within (you can do it by steps as
shown). (d) Determine df and MS for each. (e) Compute F. (f)
Based on F (and knowledge that F crit ~5.14 for df2,6 at 0.05), is
there a significant difference?
5. Levene’s test: Explain in simple terms why having very different
variances in different groups could be a problem for ANOVA or t-
tests. If you found variances significantly different (p<0.05 in
Levene’s test), what are two possible approaches to proceed?
(Hint: one is using a different test like Welch’s ANOVA, another
is transforming data like using log scale).
6. Use of software: Suppose you have the catalyst yield data:
Catalyst A yields = [83,85,88,79,90], Catalyst B yields =
[80,78,85,82,75]. Use a software’s two-sample t (or manually
compute) to answer: (a) What is the difference in means and 95%
CI for it? (b) What is the p-value? (c) Does it assume equal
variances or not? If using MINITAB, also note if an F-test for
equal variances is given and its result.
7. Practical interpretation: A researcher reports “F(3,20)=2.35,
p=0.095” for a four-group comparison. What does that mean in
terms of statistical conclusion at α=0.05? And α=0.10? What might
the researcher consider doing (in terms of sample size or
experiment design) if differences were practically important but
not quite significant?
Chapter 8: Experimental Design – Fundamentals of Factorial
Experiments and ANOVA Applications
Learning Objectives: Students will: (1) understand the principles of
experimental design – in particular factorial designs (varying multiple
factors simultaneously); (2) know how to set up a full factorial
experiment and analyze it using ANOVA to determine main effects and
interactions; (3) gain exposure to the concept of blocking and
randomization in experiments; (4) use software (Design-Expert or
MINITAB) to design and analyze a factorial experiment, including
generating an ANOVA table and interpreting model coefficients.
(This chapter corresponds to planning and executing an experimental
program and analyzing it – a key learning outcome for the course. It
also aligns with the idea of weekly lab assignments where students
practice DOE in software.)
8.1 Why Design Experiments?
In chemical engineering, often we want to study the effect of several
factors (e.g. temperature, pressure, concentration) on an outcome (yield,
purity, conversion). Rather than vary one factor at a time (OFAT),
Design of Experiments (DOE) provides a structured, efficient
approach. DOE allows us to see interaction effects and get more
information from fewer runs.
Example scenario: You have a reactor and want to see how temperature
(high/low) and catalyst type (A/B) affect conversion. A full factorial
design with 2 factors at 2 levels each requires 2×2 = 4 experiments (plus
repeats possibly). That will show if, say, high temperature is better, A vs
B which is better, and if the effect of catalyst depends on temperature
(interaction).
Key design principles:
Randomization: Run experiments in random order to avoid time-
related biases (like ambient conditions changing).
Replication: Repeat runs to estimate experimental error (noise).
Blocking: If some nuisance factor (like day or operator) might
affect results, structure experiment to block it (so each day runs all
conditions perhaps, then analyze day-to-day variation separately).
8.2 Full Factorial Designs and Notation
A full factorial design at two levels for k factors is 2^k runs (not
counting replicates or center points). At two levels, factors often coded
as -1 = low, +1 = high (coded units). For example, a 2² design:
Run1: A=-1,B=-1 (low A, low B)
Run2: A=+1,B=-1
Run3: A=-1,B=+1
Run4: A=+1,B=+1
These combinations are often visualized in a matrix or a cube for 3
factors (like Fig. 3.2 in NIST reference – a cube for 2³ design with
8 runs).
We can have 3-level factors, but that increases runs (3^k). Often we
stick to 2-level (for screening factors for significance) or fractional
designs to reduce runs (Chapter 10 might mention fractional, but let's
keep to full factorial here).
The advantage is we can estimate:
Main effects: effect of changing one factor from low to high
(averaged over other factors).
Interaction effects: e.g. AB interaction = does the effect of A
depend on level of B? Statistically, an interaction is present if the
difference in response between A high vs low is different when B
is high versus low.
ANOVA in DOE: We build a model:
y=β0+βAxA+βBxB+βABxAxB+errory = \beta_0 + \beta_A x_A + \
beta_B x_B + \beta_{AB} x_A x_B + \text{error}. For coded factors,
βA\beta_A is half the effect of A (difference between its two level
means). βAB\beta_{AB} captures interaction (difference in differences).
ANOVA can partition sums of squares into contributions for A, B, AB,
and error (if replicates exist). The p-value for each indicates if that effect
is significant.
Example (2² with replication): Suppose we run 4 combinations twice
each:
Low A, Low B yields: 50, 52
High A, Low B: 55, 54
Low A, High B: 60, 59
High A, High B: 70, 68
We can see trends: Increasing A (low→high) at B low: ~52→54.5
(effect +2.5). At B high: ~59.5→69 (effect +9.5). So A’s effect seems
bigger when B is high – an interaction.
Compute averages:
(We can formal: A effect = avg(High A) - avg(Low A) = ((54.5+69)/2 -
(51+59.5)/2) (but since equal replicates, easier individually:)
Low A overall mean = (50+52+60+59)/4 = 55.25.
High A overall mean = (55+54+70+68)/4 = 61.75.
So main effect A ~ +6.5 increase.
B effect:
Low B mean = (50+52+55+54)/4 = 52.75.
High B mean = (60+59+70+68)/4 = 64.25.
B effect ~ +11.5 increase from low to high.
Interaction AB:
If no interaction, the effect of A at low B (which was +2.5) would equal
effect of A at high B (+9.5) – clearly not equal. We compute AB
interaction effect as (observed A effect difference between B levels)/2
presumably = (9.5-2.5)/2 = 3.5 (depending on contrast coding). But
conceptually, AB is significant if differences like that are big relative to
noise.
We would do ANOVA (with error df=4*(2-1)=4 replicates deg) to
confirm significance. Likely B is significant (bigger effect), A maybe
significant, AB possibly significant too.
Design-Expert/MINITAB usage: Software can generate runs (with
random order) and then analyze.
Design-Expert specifically is built for DOE:
You specify factors (names, levels), choose design (full factorial,
etc.), it gives run order.
After experiments, you input results and use Analyze to fit a model.
It gives ANOVA table, showing which factors have significant p
(often highlighting in bold), model R², etc. It also can show effect
plots (pareto chart of effects, interaction plots, etc.).
MINITAB:
Use Stat > DOE > Factorial > Create Factorial Design (choose
factors, replicates, blocks).
Then Stat > DOE > Factorial > Analyze Factorial Design, select
terms (A, B, interactions) to include, and get output including an
ANOVA and coefficients.
We encourage students to try a simple DOE in software as a tutorial
exercise, e.g. a 2³ with hypothetical data, to see how results are
displayed.
8.3 Planning an Experiment (Example with 2 Factors)
Case Study: A Nigerian paint manufacturer wants to improve drying
time of paint. Factors: Temperature (25°C vs 35°C) and Drying Agent
additive (No vs Yes). Full factorial 2², duplicate runs. They randomize
run order. After collecting drying time data, they do ANOVA:
If factor Temp p<<0.05 (say high temp dramatically lowers drying
time) and additive p ~0.01 (additive also lowers time), and
interaction p ~0.3 (no strong interaction), then main effects are
additive. Interpretation: Both higher temp and additive
independently speed up drying, and their effects roughly add (no
unexpected synergy or conflict).
If an interaction were significant, e.g. maybe additive works only
at higher temp, they'd see that in interaction plot (lines crossing).
Based on stats, they decide to implement high temp + additive
(provided any practical constraints like cost or equipment allow).
Important LO: "make appropriate conclusions based on experimental
results; plan and execute experimental program". That implies students
should be able to interpret an ANOVA and identify which factors matter
and then pick optimal conditions accordingly.
8.4 One-Way vs Factorial ANOVA
One-way ANOVA is one factor with multiple levels (like testing 5
catalysts, single factor "catalyst" with 5 levels). Factorial ANOVA
(Two-way or more) includes interactions. If we replicate each cell, we
can also separate interaction and error.
ANOVA Table for Factorial Example: Taking our 2² example, the
table might look like:
Source df SS MS F p
A 1 SSA SSA/1 F_A p_A
B 1 SSB SSB/1 F_B p_B
Source df SS MS F p
A*B 1 SSAB SSAB/1 F_AB p_AB
N - (effects+1) e.g. 8 total runs - (3
Error SSE MSE
effects +1) = 4 df
Total 7 SST
Each SS can be computed by contrasts (there are formulas using +1/-1
coding: e.g. SSA = [sum of (+1/-1 coded for A)*y / (2^(k-1))]^2 etc., but
not needed detail here).
The p-values tell significance. If the model has significant terms, one
might drop non-significant terms and refit (hierarchically – usually if
interaction not sig, drop it and re-estimate main effects with more df for
error).
Link to Communication: Students should present DOE results in
reports or presentations (LO7: communicate results in number of ways).
Typical DOE result communication includes:
Graphical: interaction plots, main effect plots (Design-Expert can
plot these).
Numerical: "ANOVA showed factor A (p=0.002) and factor B
(p=0.0005) significantly affect yield, AB interaction (p=0.4) not
significant. Thus, factors act mostly independently. The high levels
of both factors gave highest yield (95%), compared to ~80% at low
settings. We recommend using high A and high B."
8.5 Lab Task: DOE in Software
A suggested tutorial: Use Design-Expert (or MINITAB) to create a 2³
design (3 factors at 2 levels, perhaps Temperature, Pressure,
Concentration in a reactor). Provide some hypothetical or actual data.
Then:
Show how to input it, get ANOVA results, and identify which
factors or interactions are significant.
Possibly illustrate a normal probability plot of effects (common
in DOE analysis) which helps spot significant effects visually:
effects falling off straight line are likely significant.
Screenshot Example: We might include a screenshot of Design-
Expert’s ANOVA or an interaction plot.
Figure 8.1: 3D response surface from a Design-Expert RSM example,
illustrating how two factors (e.g., time and temperature) affect a
response (conversion). While this comes from a response surface design
(Chapter 10), in a 2-factor factorial scenario an interaction can
sometimes be visualized as surfaces that are not parallel planes. Here,
the curvature indicates interaction and quadratic effects. In factorial
ANOVA, significant interactions would be identified by statistical tests
rather than smooth surfaces.
(The figure above is an example from RSM where a 3D plot is available.
For a 2-level factorial, since responses only measured at corners, we
often visualize with interaction plots (lines). But due to limitations, we
show an RSM surface as a visual aid, acknowledging we'll formally
cover RSM next chapter.)
8.6 Measuring Experimental Error and Lack of Fit
If replicates are done, we get an estimate of pure error (the variation in
repeating the same conditions). ANOVA uses that as the denominator in
F-tests. If no replicates, one can’t separate error – then either assume
effects with no df leftover equals error (risky) or do center points etc. It's
advanced, but likely outside scope of undergrad intro beyond
encouraging replication.
Lack of fit vs pure error: Particularly in RSM designs, one can test if a
model (like a linear model) fits well or there's curvature unexplained.
For factorial at 2-level, you can't detect curvature with only corners, but
adding center runs allows testing lack of fit. This might be too advanced
here but just to mention: If center points included and the means at
center differ significantly from predicted by linear model (p<0.05 in
LOF test), that signals curvature, pushing toward a response surface
design (Chapter 10 topic).
8.7 Example from Nigerian Industry
Suppose a petrochemical plant wants to maximize distillation throughput
by adjusting reflux rate (low/high) and reboiler heat input (low/high).
They do 2² factorial with replicates. ANOVA shows a significant
interaction: maybe increasing reflux helps only if reboiler heat is also
high, otherwise it doesn’t. So the best operation is both high (the
combination effect > individual). They then implement that in the plant.
Another scenario: A lab in a Nigerian university tests how pH (6 vs 8)
and stirring speed (100 vs 300 rpm) affect biodiesel yield from palm oil.
They do DOE, find stirring has a big effect, pH not much, no interaction.
Conclusion: focus on optimizing mixing.
Chapter Summary: We have introduced the structured approach of
factorial experiments and how ANOVA is used to interpret them,
identifying which factors significantly impact an outcome and whether
factors interact. Students should now appreciate why planning an
experiment with multiple factors is more efficient than changing one
factor at a time, and how statistical analysis of the results gives objective
conclusions. In practice, DOE is a powerful tool for process
development and optimization, and software tools like Design-Expert
can greatly aid in both design and analysis (with visual aids like 3D plots
to help interpret results).
In the next chapter, we will extend these ideas to more advanced designs
(such as response surface methodology for optimizing a process via
quadratic models) and discuss handling of more complex scenarios (like
if you have many factors but want to minimize runs via fractional
factorials, or if factors have more than two levels).
End-of-Chapter Exercises:
1. Factorial planning: You have 3 factors to test (Catalyst type,
Temperature level, and Solvent type). Each can be at 2 levels. (a)
How many runs are in a full factorial 2³? (b) Why is it generally
better to do this factorial design than to keep one factor constant
and vary others one by one (consider interactions)? (c) If resources
only allow 8 runs (which is 2³ exactly) with no replicates, what
assumption are you making about error or about the significance of
effects?
2. Identifying interactions: Consider a 2² experiment with factors X
and Y. The results (avg responses) are: Low X, Low Y = 10; High
X, Low Y = 15; Low X, High Y = 20; High X, High Y = 30. (a)
Calculate the effect of X at low Y and at high Y. (b) Is there an
interaction? (c) Plot an interaction plot (sketch X on x-axis with
two lines for Y levels).
3. ANOVA table understanding: In a 2² with 3 replicates (so 12
runs), how many degrees of freedom for error do you get? If an
ANOVA produced F-statistics for factor A as F=10 (p=0.005),
factor B as F=0.5 (p=0.49), and interaction AB as F=0.2 (p=0.66),
which terms are significant? What would you conclude about
factor B and the interaction?
4. Software DOE: Use a software or manual calculation to analyze
the following factorial data (2 factors A and B at 2 levels, single
replicate):
Run (A,B): (-,-)=50; (+,-)=55; (-,+)=52; (+,+)=60.
(a) Compute main effects of A and B. (b) Compute the interaction
effect (hint: contrast = (observed (+,+)+(-,-) - [(+,-)+(-,+)]) / 2 ).
(c) Which effect seems largest? (d) If you had an estimate of error
with s_e = 2, would the largest effect be significant (say t =
effect/(s_e*0.5) > 2)?
5. Practical DOE design: Outline a small experiment (either real or
hypothetical) in your field where you would use a factorial design.
Describe factors and levels, and what response you measure. How
would randomization be done? After the experiment, assume you
got some results - describe qualitatively how you'd determine
which factors matter (what would you look at in the data or
analysis).
6. Linking with previous chapters: Why is randomization important
in DOE in terms of statistical assumptions? (Hint: if runs weren’t
randomized, how could that violate independence or introduce
bias?) Also, when we replicate runs and do ANOVA, which
previous concepts are used to assess if differences are real or just
due to variability? (Hint: think hypothesis testing – ANOVA’s F-
test is essentially testing H₀: all means equal.)
Chapter 9: Advanced Experimental Design – Response Surface
Methodology and Optimization
(Chapter 9 and beyond will introduce more advanced topics like RSM,
which involve big data analytics and optimization. Due to length, we
provide a brief outline here, and note that content continues in full
textbook...)
(The textbook would continue with chapters on response surface designs
(Box-Behnken, Central Composite), discussion of big data analytics in
chemical engineering (Chapter 11), cloud computing applications
(Chapter 12), and finally communication of statistical findings (Chapter
13), each with theory, examples, and exercises, following the
comprehensive approach outlined.)