
Statistics for Chemical Engineers

Table of Contents:

1. Chapter 1: Introduction to Statistics in Chemical Engineering

– Role of statistics in chemical engineering; types of data; levels of

measurement; importance of data accuracy in engineering

contexts; overview of course outcomes.

2. Chapter 2: Data Visualization and Descriptive Statistics –

Graphical representation of data (histograms, box plots, scatter

plots); frequency distributions; measures of central tendency and

variability; summary statistics for chemical process data; case

study on industrial data visualization.

3. Chapter 3: Probability Fundamentals and Distributions –

Basic probability concepts; random variables; common

distributions (normal, binomial, etc.) in chemical engineering; the

normal distribution and standard scores; applications to process

data and quality control.


4. Chapter 4: Measurement Accuracy and Data Quality –

Accuracy vs. precision; reliability of measurements; sources of

error in experiments; calibration of instruments; linking

measurement with statistics to analyze measurement error;

ensuring data quality in lab and industry.

5. Chapter 5: Sampling and Confidence Intervals – Populations

and samples; random sampling techniques; Central Limit Theorem

and sampling distributions; point and interval estimation;

constructing and interpreting confidence intervals for means and

proportions.

6. Chapter 6: Hypothesis Testing and Statistical Inference –

Formulating null and alternative hypotheses; significance levels

and p-values; type I and II errors; power of a test; one-sample z

and t-tests; making conclusions from data with statistical rigor.

7. Chapter 7: Comparing Two or More Groups – Paired and two-

sample t-tests; experimental design principles for comparative

studies; one-way ANOVA for multi-level factor comparison;


assumptions (normality, equal variance); industrial example of

comparing process means.

8. Chapter 8: Correlation and Regression Analysis – Scatter plots

and correlation coefficients; linear regression modeling; least

squares method; interpreting regression output (slope, R², p-

values); multiple linear regression basics; case studies predicting

chemical process outcomes.

9. Chapter 9: Design of Experiments (DOE) – Fundamentals –

Planning experiments (factorial design concepts); factors, levels,

and responses; full factorial designs at two levels; randomized run

order and replication; analysis of variance (ANOVA) in DOE;

main effects and interaction effects; example of a 2×2 factorial

experiment in a pilot plant.

10. Chapter 10: Advanced Experimental Design – RSM and

Optimization – Response Surface Methodology (RSM) for

process optimization; Central Composite and Box-Behnken

designs; optimization of multiple factors; use of Design-Expert


software for RSM; 3D surface plots and contour plots for response

visualization; industrial case optimizing a chemical reaction yield.

11. Chapter 11: Big Data Analytics in Chemical Engineering

– Introduction to big data (volume, variety, velocity); data sources

in chemical plants (sensors, IoT, labs); data mining and machine

learning applications; process data analysis at scale; case study of

big data improving efficiency; global trends and Nigerian industry

perspectives on big data adoption.

12. Chapter 12: Cloud Computing and Industry 4.0

Applications – Cloud computing basics and benefits for

engineering; IIoT (Industrial Internet of Things) and cloud

integration; collaborative cloud-based tools in chemical

engineering (remote monitoring, virtual labs); cybersecurity and

data management considerations; example of cloud-based

monitoring in a Nigerian oilfield.

13. Chapter 13: Communicating Statistical Results –

Effective communication of data and conclusions; visual

presentation best practices; technical report writing in engineering;


interpreting and reporting statistical findings clearly; global

standards and Nigerian context in technical communication;

chapter summary with tips for clear writing.

Chapter 1: Introduction to Statistics in Chemical Engineering

Learning Objectives: By the end of this chapter, students should be

able to: (1) explain why statistical methods are essential in chemical

engineering practice; (2) identify different types of data and levels of

measurement; (3) understand the concepts of accuracy and reliability of

measurements in engineering experiments; and (4) outline the scope of

statistical techniques covered in this course.

Statistics plays a critical role in chemical engineering by enabling

engineers to analyze data, draw evidence-based conclusions, and make

informed decisions in the face of variability. Chemical processes often

involve complex systems with inherent fluctuations (e.g. temperature,

pressure, feed composition), and statistical analysis helps in


understanding these variations to ensure optimal and safe operation.

Why study statistics as a chemical engineer? In practice, engineers

use statistics to design experiments for product development, monitor

process quality, develop empirical models of processes, and assess the

uncertainty in measurements and results. For example, when scaling up

a reactor from laboratory to plant, a chemical engineer must use

statistical data analysis to ensure that the scale-up does not introduce

unanticipated variability in yield or safety parameters.

Types of Data and Measurement Levels: Statistical analysis begins

with understanding the nature of data. Data can be broadly classified as

qualitative (categorical) or quantitative (numerical). In a chemical

engineering context, qualitative data might include categories like

"pass/fail" for a quality inspection or catalyst type A vs. B, whereas

quantitative data include measurements such as temperature (°C),

pressure (bar), concentration (mol/L), or pH. Quantitative data further

subdivide into discrete (countable values like number of defective

bottles) and continuous (measurable on a continuum, like reaction time


or density). It is also important to recognize the level of measurement

of data: nominal (categories without order, e.g. reactor ID), ordinal

(ranked categories, e.g. catalyst activity rated as high/medium/low),

interval (numeric scales without a true zero, e.g. temperature in Celsius

where 0 is arbitrary), and ratio (numeric scales with a meaningful zero,

e.g. absolute pressure or volume). Understanding data types and

measurement levels guides the choice of statistical methods and

graphical representations for analysis.

Role of Statistics in the Engineering Method: Chemical engineering

problems are typically approached via the engineering method: define

the problem, collect data, analyze and interpret data, and make decisions

or design solutions. Statistics is indispensable in the data collection and

analysis stages. For instance, when developing a new polymer, an

engineer will design experiments (using statistical DOE principles) to

test the effect of factors like temperature, catalyst, and time on polymer

properties. The collected data must be analyzed statistically to determine

which factors are significant and what the optimal conditions are, all
while accounting for experimental error. Statistics provides tools for

distinguishing real effects from random noise.

Accuracy and Reliability of Measurements: A foundational concept

for chemical engineers is appreciating how accurate and reliable their

measurements are. Measurements form the basis of all data analysis – if

measurements are flawed, conclusions will be too. Accuracy refers to

how close a measurement is to the true value, while precision refers to

the consistency of repeated measurements. In a chemical engineering lab

or plant, many sources of error can affect measurements: instrument

calibration errors, environmental fluctuations, or operator errors. For

example, consider measuring the flow rate of a liquid through a pipe – if

the flow meter is not calibrated correctly, it might consistently read 5%

high (a systematic error affecting accuracy). On the other hand, if the

flow fluctuates and the meter responds slowly, repeated readings might

scatter (affecting precision). In later chapters, we will learn statistical

tools to quantify measurement variation and assess data reliability

(Chapter 4 will delve deeper into measurement accuracy analysis).


Course Scope and Outcomes: This textbook aligns with the TCH 206

course outcomes for Statistics for Chemical Engineers. Broadly, the

course will cover statistical techniques used in data analysis, linking

measurements with statistical analysis to evaluate accuracy, as well as

classical inference methods and modern topics. Students will learn how

to construct effective data visualizations and summaries (Chapter 2),

perform statistical inference such as confidence intervals and hypothesis

tests (Chapters 5–6), apply regression and correlation to model

relationships (Chapter 8), and design experiments for process

optimization (Chapters 9–10). In addition, contemporary topics like big

data analytics and cloud computing in the chemical industry will be

introduced (Chapters 11–12) to highlight the growing importance of data

science skills for engineers in the era of Industry 4.0. Students will also

gain experience with statistical software (MINITAB for general data

analysis and Design-Expert for experimental design) to perform data

analysis tasks efficiently. Ultimately, the goal is to enable future

chemical engineers to confidently analyze data and draw valid

conclusions – whether it’s understanding if a new process increases


yield significantly, or communicating the results of a pilot plant trial to

management with solid statistical backing.

Chapter Summary: In summary, statistics provides chemical engineers

with a powerful toolkit to deal with variability in processes and

experiments. Key takeaways from this introductory chapter are the

importance of statistical thinking in engineering, familiarity with data

types (qualitative vs. quantitative) and measurement levels, and an

appreciation for the accuracy and precision of data. These concepts lay

the groundwork for all subsequent topics. In the next chapter, we will

begin our deep dive by looking at how to describe and visualize data –

the first step in any statistical analysis.

End-of-Chapter Exercises:

1. Conceptual question: List three examples of problems in

chemical engineering where statistical analysis would be helpful.

For each, briefly describe why variability is a concern (e.g.


monitoring a production process for quality control, comparing

performance of two catalyst formulations, etc.).

2. Data types: Identify the type of data for each of the following and

its level of measurement: (a) pH of a solution, (b) Batch ID

number, (c) “Pass/Fail” result of a pressure test, (d) catalyst

concentration in % by weight, (e) ranking of chemical reactor

safety as high/medium/low.

3. Accuracy vs. precision: Suppose a temperature sensor in a reactor

is known to have a calibration offset of +2°C (reads 2°C higher

than actual). You take five readings in a steady-state system and

get 100.5, 100.7, 100.6, 100.8, 100.5 °C. Describe the accuracy

and precision of this sensor based on these readings. How would

you correct for the offset?

4. Case study reflection: A Nigerian chemical plant manager says,

“We’ve operated this process for 20 years; I can control it by

experience without statistics.” How would you respond in defense

of using statistical methods? Mention at least two statistical tools


that could improve process understanding or efficiency even for an

experienced engineer.

5. Software exploration (ungraded): If you have access to a

statistics software like MINITAB, load any sample dataset (or

create a small dataset of 10–20 points, e.g. measurements of a

quantity). Use the software to generate basic descriptive statistics

(mean, standard deviation) and a simple plot. This will give you

familiarity with the interface, which will be built upon in later

chapters.

Chapter 2: Data Visualization and Descriptive Statistics

Learning Objectives: By the end of this chapter, students will be able

to: (1) organize raw data into meaningful frequency distributions; (2)

construct appropriate graphical displays of data (such as histograms, bar

charts, box plots, and scatter plots) and understand their role in data

analysis; (3) calculate and interpret descriptive statistics (mean, median,


mode, range, variance, standard deviation) for a dataset; and (4)

summarize and interpret basic industry data to extract key insights.

Once data are collected in a chemical engineering context, the first step

is often to summarize and visualize them. Data visualization is critical

because it provides insight at a glance, revealing patterns, trends, or

anomalies that may not be obvious from raw data. As one learning

outcome of this course emphasizes, chemical engineers must be able to

construct graphical displays and recognize their importance in analysis.

This chapter covers common methods of organizing data and the core

plots and statistics used to describe data sets.

2.1 Organizing Data: Frequency Distributions and Histograms

A frequency distribution is a table that displays how data are

distributed across various intervals or categories. For example, imagine

an engineer measures the viscosity of 50 polymer samples in a quality

lab. Listing all 50 values provides little insight, but grouping them (e.g.

count how many samples fall into viscosity ranges 50–60 Pa·s, 60–70
Pa·s, etc.) creates a clearer picture of the data’s spread. From frequency

distributions, we often create histograms: bar charts showing frequency

(or relative frequency) on the y-axis against the data value intervals on

the x-axis.

Example: Suppose a water treatment plant in Nigeria measures the

turbidity (cloudiness) of water (in NTU units) in 100 batches. The data

range from 0 to 5 NTU. We could divide 0–5 NTU into 10 equal bins of

width 0.5 NTU and count how many batches fall in each bin. Plotting

these counts yields a histogram that might show, for instance, that most

batches have turbidity between 1.0–2.0 NTU, with a few batches having

higher turbidity (potential outliers). Such visualization immediately flags

whether the majority of water batches meet a turbidity target (say ≤ 1.5

NTU) or if there is a long tail of higher turbidity needing investigation.
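Python Sketch – Binning the Turbidity Data: For readers using a scripting tool alongside MINITAB, the binning above can be reproduced as follows. The turbidity values here are simulated stand-ins, since the actual batch data are not listed:

import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-in for the 100 turbidity measurements (NTU);
# real plant data would be loaded from a file or data historian instead.
rng = np.random.default_rng(seed=1)
turbidity = np.clip(rng.gamma(shape=4.0, scale=0.4, size=100), 0, 5)

bins = np.arange(0.0, 5.5, 0.5)            # ten equal bins of width 0.5 NTU
counts, edges = np.histogram(turbidity, bins=bins)
for lo, hi, n in zip(edges[:-1], edges[1:], counts):
    print(f"{lo:.1f}-{hi:.1f} NTU: {n} batches")

plt.hist(turbidity, bins=bins, edgecolor="black")
plt.axvline(1.5, color="red", linestyle="--", label="Target ≤ 1.5 NTU")
plt.xlabel("Turbidity (NTU)")
plt.ylabel("Frequency of Batches")
plt.legend()
plt.show()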

Interpreting Histograms: A histogram gives a sense of the data’s

distribution shape – whether it is symmetric, skewed to the left or right,

unimodal or multimodal. Chemical process data often approximate a

normal distribution (bell curve) if only common random variation is


present. However, processes can also produce skewed distributions (e.g.

if a measurement has a natural lower bound at 0 but can vary high, the

data might be right-skewed). Outliers, if present, will appear as isolated

bars far from the bulk of the data. Recognizing these patterns is

important. For instance, a right-skewed distribution of impurity

concentration might suggest occasional high spikes due to disturbances,

whereas a bimodal distribution of product purity might indicate two

distinct operating modes or shifts between catalysts.

Constructing Histograms (Manual vs. Software): Manually, one

chooses bin ranges and counts frequencies. In practice, statistical

software like MINITAB greatly simplifies this – you can input data and

use menu options to create histograms instantly. The choice of bin width

can affect the appearance (too few bins may hide detail; too many may

show noise). MINITAB’s default algorithms often choose a reasonable

bin size, but an engineer should adjust it if needed to better reveal the

underlying pattern.
Software Tutorial – Creating a Histogram in MINITAB: To

reinforce the concept, let's walk through using MINITAB to create a

histogram. Suppose we have the 50 polymer viscosity measurements

mentioned earlier. In MINITAB:

• Enter or import the viscosity data into one column (e.g., C1) in the worksheet.
• Go to the Graph menu and choose Histogram > Simple.
• Select the viscosity data column for the graph and click OK.

MINITAB will output a histogram of the viscosity data. The

software will automatically provide a frequency count for each bin

and display the distribution shape. Always label the axes clearly

(e.g. "Viscosity (Pa·s)" on x-axis, "Frequency of Samples" on y-

axis) – clear labeling is essential for good statistical

communication.

Figure 2.1: MINITAB Graphical Summary output for an industrial

dataset (cap removal torque for shampoo bottles). The histogram

(center) shows the distribution of torque measurements, and summary


statistics are displayed on the side. In this case, 68 bottle caps were

tested; the mean removal torque is about 21.27 Nm, which is higher

than the target of 18 Nm (indicated by the reference line). The wide

spread and the 95% confidence interval (19.71 to 22.82 Nm) suggest

variability and a systematic bias above the target.

The embedded Figure 2.1 illustrates how a graphical summary in

MINITAB can combine both visualization and statistics. This example

(based on a case of shampoo bottle cap torque) demonstrates an

engineer’s initial analysis of data: the histogram reveals many bottles

require torque well above the target 18 Nm to open, and the statistics

confirm the mean is significantly above 18 Nm. Such a display

immediately tells us there’s a potential quality issue – caps are on too

tight on average, which might make them hard for customers to open.

We will revisit the statistical implications (e.g. performing a one-sample

t-test against the target in Chapter 6), but even at the descriptive stage,

the visualization and summary inform engineering decisions (perhaps

the capping machine needs adjustment to lower the torque).


2.2 Other Common Plots: Box Plots, Scatter Plots, and Time-Series

Plots

Besides histograms, other plots are extremely useful in chemical

engineering data analysis:

• Box Plot (Box-and-Whisker Plot): A compact display of

distribution based on quartiles. A box plot shows the median,

interquartile range (IQR), and potential outliers. It’s especially

useful for comparing distributions between multiple groups. For

example, an engineer might compare the distribution of polymer

viscosity for three different catalyst formulations using side-by-

side box plots. If one formulation has a much wider IQR or many

outliers, it indicates higher variability or inconsistent quality. Box

plots are great for quickly conveying differences in medians and

variability across categories.

• Scatter Plot: A scatter plot displays pairs of data points (X,Y) and

is fundamental for investigating relationships between two

variables. In chemical engineering, scatter plots help identify


correlations (Chapter 8 will delve deeper into correlation and

regression). For instance, plotting reactor temperature (X) vs.

conversion percentage (Y) across several runs might show an

upward trend (suggesting higher temperature yields higher

conversion) or no clear pattern (indicating temperature might not

significantly affect conversion within the tested range). Scatter

plots often include a fitted line to indicate the trend (this enters

regression territory). We preview here that if a scatter plot shows a

roughly linear trend with points tightly clustered about a line, the

correlation is strong (positive or negative slope indicates

direction). If points are widely scattered without form, correlation

is weak or nonexistent.

• Time-Series Plot (Run Chart): When data are collected over time

(e.g. hourly temperature readings from a distillation column),

plotting the value against time order reveals trends, cycles, or

drifts. A time-series plot could uncover, for example, a periodic

oscillation in reactor pressure (perhaps correlating with a daily

ambient temperature cycle) or a drift upward in impurity


concentration over a campaign (maybe due to catalyst aging). Such

plots are often the first step in process monitoring and control (and

relate to the field of statistical process control, not covered in depth

here but important in quality assurance).

Case Study (Data Visualization in Action): A global chemical company,

say Dow Chemical, might monitor a polymerization reactor. Key

variables like temperature, pressure, and monomer conversion are

recorded every minute. Engineers use time-series plots to ensure the

reactor operates steadily. In one instance, a time plot of pressure showed

an upward drift over several hours. By visualizing this, engineers

identified a fouling issue in a vent line before it reached a critical

pressure. They intervened early, avoiding a shutdown. Additionally,

scatter plots of historical data might be used – plotting polymer

molecular weight vs. reaction time revealed two clusters corresponding

to two catalyst batches, indicating a catalyst quality issue. This mix of

histograms, scatter plots, and time plots allowed engineers to diagnose

and communicate issues effectively to the team.


2.3 Descriptive Statistics: Measures of Central Tendency and

Variability

While plots give a visual summary, descriptive statistics provide

numeric summaries of the data. The most common measures are:

• Mean (Arithmetic Average): Sum of all data values divided by

the count. The mean provides a measure of central tendency – e.g.

the average yield of a chemical process over 10 runs might be

85%. It is sensitive to outliers; one extremely low or high value

can skew the mean.

• Median: The middle value when data are sorted (or average of two

middle values for even count). The median is a robust measure of

central tendency, less affected by outliers. For skewed

distributions, the median can be more representative than the

mean. For example, if measuring the time to failure of pump seals

where most last around 12 months but one failed at 1 month, the

mean might be much lower than the median. The median conveys

the “typical” value even in skewed scenarios.


• Mode: The most frequently occurring value or range. In a

continuous data set it may not be very meaningful unless the

distribution has a clear peak or repeated readings. However, for

categorical data (e.g. most common defect type in a batch of

products), the mode is useful.

• Range: The difference between the maximum and minimum. It

gives the total spread. However, range uses only extreme values

and thus can be misleading if those extremes are outliers.

• Variance and Standard Deviation: Variance is the average of

squared deviations from the mean. Standard deviation (SD) is the

square root of variance, bringing it back to the original unit. The

SD is perhaps the most important measure of variability – it tells

us, on average, how far data points lie from the mean. A small SD

relative to the mean implies data are tightly clustered (high

consistency), whereas a large SD indicates widespread data (high

variability). For example, two reactors might have the same

average conversion of 80%, but if Reactor A has SD of 2% and


Reactor B has SD of 8%, Reactor A’s performance is much more

consistent (predictable) than Reactor B’s.

Calculation Example: Ten samples of a specialty chemical are titrated to

determine purity (%). The results are: 91, 89, 90, 95, 88, 90, 91, 87, 92,

89 (% purity). We can compute descriptive stats:

• Mean = (91+89+...+89) / 10 = 90.2% purity.
• Sorted values: 87, 88, 89, 89, 90, 90, 91, 91, 92, 95; Median = (90 + 90)/2 = 90%.
• Mode = 89, 90, and 91 (each appears twice; a dataset can have multiple modes).
• Range = 95 – 87 = 8% purity.

• Variance (sample variance) = sum of squared deviations from the mean / (n–1). Here the squared deviations sum to 45.6, so variance = 45.6/9 ≈ 5.07 (percent²), and SD ≈ √5.07 ≈ 2.25%. Interpretation: the purity tends to vary about ±2.25% around the mean of 90.2%. So a typical batch is between ~88.0% and ~92.5% purity (within one SD of the mean), assuming roughly normal variability.
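Python Sketch – Checking the Hand Calculations: The statistics module in Python's standard library reproduces these values directly:

import statistics

purity = [91, 89, 90, 95, 88, 90, 91, 87, 92, 89]   # % purity, ten titrations

mean = statistics.mean(purity)             # 90.2
median = statistics.median(purity)         # 90.0
modes = statistics.multimode(purity)       # [91, 89, 90] - three values tie
data_range = max(purity) - min(purity)     # 8
s2 = statistics.variance(purity)           # sample variance, ~5.07
s = statistics.stdev(purity)               # sample SD, ~2.25

print(f"mean={mean}, median={median}, modes={sorted(modes)}")
print(f"range={data_range}, variance={s2:.2f}, SD={s:.2f}")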
Interpreting Variability in Context: In chemical engineering,

understanding variability is crucial. Consider if the purity spec minimum

is 90%. In the above example, the mean is slightly above 90%, but the

SD of ~2.3% indicates a substantial chance of falling below spec on any
given batch (in fact, 4 of 10 samples were below 90%). Merely reporting

the mean would be misleading; the variability tells the full story that the

process may not be reliably hitting the purity target. Later chapters on

hypothesis testing will formalize how we can assess if the process mean

truly differs from 90% and what proportion might fall out of spec, but

the descriptive stats already alert the engineer to an issue. Perhaps

process adjustments or tighter control are needed to reduce variability.

Skewness and Kurtosis (Briefly): Some descriptive measures go

beyond spread and center to characterize distribution shape. Skewness

indicates asymmetry (right-skew vs left-skew); kurtosis indicates how

heavy-tailed or peaked a distribution is relative to normal. These are

more advanced descriptors and are often included in software outputs

(MINITAB’s graphical summary, for example, lists them). Typically,


for process monitoring, an engineer might note if skewness is

significantly non-zero, as it might violate assumptions of normality for

certain statistical models. In our discussions, we will primarily focus on

practical implications (e.g. presence of skewness might suggest a data

transformation is needed for regression).

2.4 Using MINITAB for Descriptive Statistics

MINITAB can quickly compute all the above statistics and more. Using

the purity data example:

• Enter the 10 purity values in a column.
• Use Stat > Basic Statistics > Display Descriptive Statistics. Select the column and MINITAB will output count, mean, median, minimum, maximum, SD, etc.
• Alternatively, use Assistant > Graphical Analysis > Graphical Summary to get a combination of stats and plots (as shown in Figure 2.1 for the torque case).


Interpreting MINITAB output is straightforward because it labels each

statistic. One valuable piece in the output is the 95% confidence interval

for the mean (which foreshadows Chapter 5 content). In the torque

example in Figure 2.1, the 95% CI for mean torque was roughly 19.71 to

22.82 Nm. This interval provides a range in which we are 95% confident

the true mean lies, giving a sense of estimation precision. We will

formally cover confidence intervals soon, but it’s worth noting how

descriptive and inferential stats blur together in output – a graphical

summary gives a sneak peek of inferential thinking (estimating the true

mean from sample data).

Case Study – Visualizing Nigerian Industry Data: Let’s consider a

Nigerian context. A paint manufacturing company in Lagos collects data

on the viscosity of paint batches (a critical quality parameter). They

compile a month’s data of daily measurements. Using descriptive tools:

a histogram might reveal the viscosity is mostly around target but

occasionally spikes high (potentially due to pigment aggregation on

some days). A time plot might show a pattern that viscosity tends to
increase towards the end of production runs (perhaps as solvent

evaporates). A box plot comparing batches from two different

production lines might show one line has consistently higher variability.

By presenting these plots and stats at a quality meeting, the engineers

communicate clearly where the process stands. Management can easily

see from a box plot which line is more consistent, or from a histogram

whether most products meet the spec. In sum, effective visualization and

descriptive statistics turn raw numbers into actionable understanding.

Chapter Summary: In this chapter, we learned how to condense and

visualize data for insight. Good graphs and summaries are the

foundation of data analysis – often allowing engineers to spot issues and

formulate hypotheses even before formal tests are applied. Key points

include the use of histograms for distribution shape, box plots for

comparing groups, scatter plots for relationships, and numeric

descriptors like mean and standard deviation to quantify the data’s

center and spread. In the next chapter, we will build on this by

introducing probability theory and theoretical distributions, which


underpin the interpretation of our descriptive measures (for example,

understanding why data often follow a bell-curve and how we quantify

“rare” outlier events statistically).

End-of-Chapter Exercises:

1. Interpreting a Histogram: You are given the following frequency

distribution of impurity levels (%) in 50 batches of a chemical

product:

Impurity 0–1%: 5 batches; 1–2%: 20 batches; 2–3%: 15 batches;

3–4%: 8 batches; 4–5%: 2 batches.

(a) Sketch the histogram for this data. (b) Describe the shape (is it

skewed?). (c) If the specification limit for impurity is 3%,

approximately what fraction of batches are out of spec?

2. Descriptive Calculations: A catalyst testing lab measures catalyst

surface area (m²/g) for 8 samples: 120, 135, 130, 128, 140, 150,

132, 128. Calculate the mean, median, mode, range, and standard

deviation (you may do this by hand or use a calculator/software).


Comment on any difference between mean and median and what

that implies about skewness.

3. Plot Selection: For each scenario, state which type of plot is most

appropriate and why: (a) Comparing the distribution of octane

ratings from two different refineries; (b) Checking if there is a

relationship between ambient temperature and hourly production

rate in an outdoor processing unit; (c) Determining if a

measurement instrument’s readings drift over a 24-hour continuous

operation.

4. Using Software: Input the catalyst surface area data from exercise

2 into MINITAB or another tool. Generate a box plot and

descriptive statistics. Does the software output match your manual

calculations? Include a printout or description of the output (mean,

SD, etc.).

5. Case Study Discussion: A report on a Nigerian brewery’s

operations includes a statement: “The average fill volume of beer

bottles is 600 ml with a standard deviation of 5 ml.” Explain in

simple terms to a brewery manager what this means. If the legal


requirement is at least 590 ml in each bottle, what concerns might

you have based on those statistics (assume a roughly normal

distribution of fill volumes)?

Chapter 3: Probability Fundamentals and Distributions

Learning Objectives: After completing this chapter, students should be

able to: (1) explain basic probability concepts (outcomes, events,

probability axioms) and how they relate to engineering experiments; (2)

distinguish between discrete and continuous random variables and give

examples of each in chemical processes; (3) describe important

probability distributions (especially the normal distribution) and their

properties; and (4) use the normal distribution to calculate probabilities

(e.g. z-scores) relevant to chemical engineering scenarios.

Statistics rests on the foundation of probability theory. To make

inferences about data, we need to understand how data behave in a

random sense. For instance, if we say “there’s only a 5% chance that the
difference in catalyst performance happened by random chance,” we are

invoking probability concepts. This chapter introduces the fundamentals

of probability and common probability distributions that model real-

world phenomena in chemical engineering. Mastering these concepts is

essential for later chapters on statistical inference (confidence intervals,

hypothesis tests) where we quantify uncertainty.

3.1 Basic Probability Concepts

Experiments and Sample Space: In probability theory, an experiment

is any process that yields an outcome that cannot be predicted with

certainty. In an engineering context, this could be something like

performing a lab titration (outcome = measured concentration) or

running a batch process (outcome = whether it succeeds or fails to meet

quality). The sample space (Ω) is the set of all possible outcomes. For a

simple example, if we flip a coin to decide something in the lab, Ω =

{Heads, Tails}. If we run a distillation and consider “product is on-spec

or off-spec” as outcomes, Ω = {On-spec, Off-spec}.


Events and Probability: An event is a subset of the sample space –

something that might happen or a condition of outcomes. Probability (P)

is a numerical measure of how likely an event is, on a scale from 0

(impossible) to 1 (certain). Key properties include:

• P(Ω) = 1 (the probability that something in the sample space occurs is 1, i.e., some outcome must happen),
• If A and B are mutually exclusive events, P(A ∪ B) = P(A) + P(B) (addition rule for disjoint events),
• P(complement of A) = 1 – P(A).

In chemical engineering, probabilities might be based on long-run

frequencies or subjective assessments. For instance, if historically 2 out

of 100 batches are off-spec, one could estimate P(off-spec) ≈ 0.02 (a

relative frequency interpretation). Or an engineer might say “there’s a

90% chance the new design will pass safety tests” as a subjective

probability based on experience.


Conditional Probability and Independence: Often we are interested in

the probability of an event given some condition. For example,

P(product passes quality | new raw material supplier) – the probability

product is good given a specific supplier’s material was used.

Conditional probability is defined as P(A|B) = P(A ∩ B) / P(B),

provided P(B) > 0. Two events A and B are independent if P(A|B) =

P(A) (meaning B occurring has no effect on probability of A). In

engineering terms, independence could be something like assuming that

the probability of Pump A failing is independent of Pump B failing if

they operate separately (in reality, some events may not be strictly

independent due to common causes, but independence is a useful

modeling assumption).

Probability in Action (Chemical Engineering Example): Consider a

safety system with two independent pressure relief valves on a reactor.

Let Event A = "Valve 1 fails when needed", Event B = "Valve 2 fails

when needed". Suppose P(A) = 0.01, P(B) = 0.01 based on historical

reliability data. If valves act independently, the probability both fail (and
thus a dangerous overpressure occurs) is P(A ∩ B) = P(A) * P(B) =

0.0001 or 0.01%. Understanding such small probabilities is crucial for

risk assessment. If the events were not independent (maybe they fail due

to a common cause like a power loss), the calculation would differ.

3.2 Random Variables: Discrete and Continuous

A random variable (RV) is a numerical outcome of an experiment. We

denote random variables with uppercase letters (X, Y). For example, let

X = number of defective bottles in a batch of 100 – X is random because

each batch will have a different number of defectives.

• Discrete Random Variables: These take on a countable number

of distinct values (often integers). The example X (number of

defects) is discrete, range 0 to 100. Probability is characterized by

a probability mass function (pmf) P(X = x) for each possible x. A

common discrete distribution in engineering is the Binomial

distribution: e.g., X ~ Bin(n=100, p=0.05) could model defects if

each bottle has a 5% chance of being defective independently.


Another is the Poisson distribution, often used for counts of

events in continuous time/space (e.g., number of pump failures in a

year). If pump failures occur randomly with an average rate λ = 3

per year, then X ~ Poisson(3) might model the count of failures per

year.

• Continuous Random Variables: These can take any value in a

range or interval. Examples: temperature in a reactor, the

concentration of a chemical, time until a component fails. Since

continuous outcomes have an uncountable range, we talk about

probability density function (pdf) f(x) such that the probability X

lies in an interval (a to b) is the area under the pdf curve from a to

b. We don’t assign probabilities to exact values (P(X = x) = 0 for

continuous RVs). Instead we find P(a < X < b). Many continuous

variables in engineering are modeled by the Normal distribution

(which we discuss soon). Others include Uniform, Exponential

(e.g., time between rare events), etc.
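Python Sketch – Working with These Distributions: A brief illustration using scipy.stats (assuming SciPy is available):

from scipy import stats

# Discrete: X ~ Bin(n=100, p=0.05) -- defective bottles per batch of 100
X = stats.binom(n=100, p=0.05)
print(X.pmf(0))                 # P(exactly 0 defectives) ~ 0.0059
print(X.mean(), X.std())        # expected defectives = 5.0, SD ~ 2.18

# Discrete: pump failures per year, Y ~ Poisson(lambda = 3)
Y = stats.poisson(mu=3)
print(Y.pmf(0))                 # P(no failures in a year) = e**-3 ~ 0.0498

# Continuous: time to failure T ~ Exponential(mean = 100 h)
T = stats.expon(scale=100)
print(T.sf(150))                # P(T > 150 h) = e**-1.5 ~ 0.223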

3.3 The Normal Distribution and Standard Scores (z-values)


The Normal distribution – often called the Gaussian or “bell curve” –

is arguably the most important distribution in statistics. It is continuous,

symmetric, and characterized by two parameters: the mean (μ) and

variance (σ²) or standard deviation (σ). We denote it as X ~ N(μ, σ²).

The pdf of a normal is the familiar bell-shaped curve centered at μ.

Why is the normal distribution so prevalent? Thanks to the Central

Limit Theorem (CLT) (which we will formally meet in Chapter 5), the

sum or average of many independent small effects tends to be

approximately normally distributed, regardless of the distribution of

each effect. Many measurement errors and process variations are

aggregates of many small random influences (noise), so they end up

approximately normal. For example, the error in a flow meter reading

might come from sensor noise, electronic noise, slight pressure

fluctuations, etc. – sum of many tiny independent errors – yielding a

roughly normal error distribution. Similarly, properties like molecular

weight of a polymer might be roughly normally distributed in a stable

process (centered at some typical value with symmetric variability).


Key properties of the Normal: It’s symmetric about the mean μ.

Approximately 68% of values lie within ±1σ of μ, ~95% within ±2σ, and

~99.7% within ±3σ (this is the “68-95-99.7 rule”). This rule of thumb is

incredibly useful: if an engineer sees that ±3σ range on a control chart

covers all normal operation data, any point outside that range is

extremely unlikely under normal conditions (less than 0.3% chance) –

indicating something unusual has happened.

Standard Normal and z-scores: Any normal random variable X ~ N(μ,

σ²) can be transformed to a standard normal (mean 0, SD 1) by

computing a z-score:

z = (X − μ) / σ.

The z value tells us how many standard deviations X is above (+z) or

below (−z) its mean. Standard normal tables (or software) give

probabilities for Z (the standard normal variable). For example, P(Z <

1.645) ≈ 0.95, meaning 95% of the area under the standard normal curve

lies below 1.645. In context, if a quality measurement is normally


distributed and one wants the 95th percentile threshold, it’s roughly μ +

1.645σ.

Using z for Probabilities: Suppose reactor temperature is normally

distributed with μ = 500 K, σ = 5 K (assuming stable control). What is

the probability that a randomly selected time the temperature is above

510 K? We convert 510 to z: (510–500)/5 = 2.0. P(X > 510) = P(Z >

2.0). From standard normal knowledge, P(Z > 2.0) ≈ 0.0228 (2.28%). So

about 2.3% of the time, temperature exceeds 510 K. This could be

acceptable or not depending on safety limits. Engineers often calculate

such probabilities to assess risk or the likelihood of extreme events.

Conversely, to find a threshold (say the temperature that is exceeded

only 1% of time), find z for 99th percentile (~2.33), then threshold = μ +

2.33σ = 500 + 2.33*5 ≈ 511.65 K.
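Python Sketch – Normal Probabilities: The same lookups done with SciPy rather than a z-table:

from scipy.stats import norm

mu, sigma = 500.0, 5.0     # reactor temperature, K

# P(X > 510): the survival function avoids 1 - cdf round-off
p_above = norm.sf(510, loc=mu, scale=sigma)
print(p_above)             # ~0.0228, i.e. about 2.3% of the time

# Threshold exceeded only 1% of the time: the 99th percentile
threshold = norm.ppf(0.99, loc=mu, scale=sigma)
print(threshold)           # ~511.6 K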

Non-Normal Distributions: While normal is common, many

engineering variables are not normal. For instance, exponential

distribution might model the time between rare events (memoryless

property), Weibull distribution is often used for failure times in


reliability (with shapes that can model increasing or decreasing failure

rate), uniform distribution might describe a situation of complete

randomness between bounds, etc. The Binomial and Poisson we

discussed are discrete analogs for counts. We mention these because

when data deviate strongly from normal (e.g. highly skewed

distributions like reaction time that can’t go below 0 and have a long

tail), using the appropriate theoretical distribution leads to more accurate

probability calculations and more valid statistical inferences.

For example, in a chemical plant, the distribution of “time to the next

emergency shutdown” might be better modeled by an exponential or

Weibull rather than normal, because it’s a waiting time for a random

event. Or the distribution of impurity particle counts in a semiconductor

process might be Poisson (counts per wafer area). Recognizing the

context helps choose a model.

However, the normal distribution is so central that many statistical

methods assume normality of data or at least of sample means (justified

by CLT). Thus, engineers often apply transformations to data to induce


normality (e.g. taking the log of a right-skewed variable can make it more

symmetric and closer to normal).

Standard Scores in Practice: Z-scores are also used to standardize

measurements on different scales. For example, if one variable is

temperature (mean 500, SD 5) and another is pressure (mean 20 bar, SD

2 bar), we can compare a particular temperature and pressure reading in

terms of “how extreme” it is by z. A temperature of 510 K is z=2 (as

above), a pressure of 24 bar is (24-20)/2 = 2 as well. So both are +2σ

events in their own domains. Standardization is the basis of control

charts and multivariate analysis.

Example Problem: If the daily production volume of a fertilizer plant is

roughly normally distributed with mean 1000 tons and SD 50 tons

(based on historical data), what is the probability that tomorrow’s

production will exceed 1100 tons? Solution: z = (1100-1000)/50 = 2.0,

P(Z > 2.0) ≈ 0.0228 (2.3%). So there’s about a 1 in 44 chance of

exceeding 1100 tons. This could help in planning – e.g., ensuring storage
for such a high-production day is available or that extra feedstock might

be needed if such an event is not negligible.

3.4 From Probability to Statistics: Linking to Data Analysis

Understanding distributions allows engineers to make predictions and to

apply statistical tests properly. For instance, in Chapter 6, when we do

hypothesis tests, we will assume test statistics follow certain

distributions (t, F, χ² – which are related to the normal distribution).

Knowing the normal distribution’s properties justifies why we use z or t

tables.

Additionally, probability theory underlies simulation. If a chemical

engineer wants to simulate a process (e.g. Monte Carlo simulation of an

oil reservoir output under uncertain parameters), they will draw random

samples from assumed distributions of inputs (like permeability might

be normally distributed, porosity might follow beta distribution, etc.) to

propagate uncertainty. This chapter provides the groundwork for such

advanced applications.
Case Study (Probability in Quality Control): A Nigerian beverage

company monitors the fill volume of soft drink bottles. Historically, fills

are normally distributed. The company sets control limits at μ ± 3σ. If

the process is centered at 500 ml with σ = 5 ml, control limits are 485 ml

and 515 ml. Probability of a bottle being below 485 ml (underfilled) is

~0.15% if the process is in control (3σ event) – very low. However, if

they start seeing 1% of bottles underfilled, that's a red flag statistically

(since 1% >> 0.15% expected). Probability calculations help quantify

these intuitions. Another example: if regulations demand that at most 1

in 1000 bottles is underfilled below 490 ml, the company can calculate

needed mean and σ to achieve that (set P(X<490) = 0.001, find z ~ -

3.09, so (490 - μ)/σ = -3.09 => μ - 490 = 3.09σ, if target μ=500, then σ ~

(500-490)/3.09 ≈ 3.23 ml). This connects probability with real

engineering targets.
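Python Sketch – Solving for the Required σ: The inverse calculation is a one-liner with a normal quantile function (assuming SciPy):

from scipy.stats import norm

# Requirement: at most 1 bottle in 1000 below 490 ml, with target mean 500 ml
z = norm.ppf(0.001)                    # ~ -3.09
sigma_required = (500 - 490) / abs(z)
print(f"required sigma = {sigma_required:.2f} ml")   # ~3.24 ml (matches the ~3.23 above within rounding)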

Chapter Summary: In this chapter, we covered the language of

probability and the behavior of random variables. Key points are the

distinction between discrete and continuous variables and common


distributions used in engineering. The normal distribution was

highlighted due to its fundamental importance; understanding z-scores

and normal probabilities is crucial going forward. Now that we have

probability tools, we are prepared to delve into statistical inference. The

next chapter will introduce the idea of sampling distributions and the

Central Limit Theorem in more detail, bridging the gap between

probability and the statistics of sample data.

End-of-Chapter Exercises:

1. Basic Probability: A chemical plant has 3 independent safety

systems that can prevent an overflow. Each has a 2% failure

probability on demand. (a) What is the probability all three fail

(and an overflow occurs)? (b) If these systems were not

independent (e.g., all share a power source that fails 1% of the time

which would disable all), qualitatively how would that affect the

probability of failure compared to part (a)?

2. Discrete Distribution: Defects in sheets of glass occur with an

average rate of 0.2 defects per square meter. Assume a Poisson


distribution. (a) What is the probability that a 5 m² glass sheet has

zero defects? (b) What is the expected number of defects on a 10

m² glass sheet?

3. Continuous Distribution: The time (in hours) between

breakdowns of a processing unit follows an exponential

distribution with mean 100 hours. (a) What is the probability the

unit runs at least 150 hours without breakdown? (b) If the unit has

run 100 hours without failure, what is the probability it lasts

another 50 hours (memoryless property)?

4. Normal Distribution Application: The purity of a product is

normally distributed with mean 98.0% and standard deviation

0.5%. (a) Approximately what percentage of batches have purity

above 99%? (b) The specification is 96.5% minimum. What

percentage of batches fall below spec? (c) If a batch is below spec

(<96.5%), how many standard deviations below the mean is it

(calculate the z-score)?

5. Z-Score Practice: A certain reaction yield in a pilot plant is

N(μ=75%, σ=5%). (a) What yield value corresponds to the 90th


percentile (i.e., exceeded by only 10% of runs)? (b) If we observe a

yield of 60% in one run, compute its z-score and interpret what

that implies (how unusual is this run?).

6. Critical Thinking: In a Nigerian refinery, the sulfur content in

fuel (ppm) is strictly regulated. The refinery claims their process

outputs fuel with sulfur ~ N(μ=45 ppm, σ=5 ppm). The legal limit

is 60 ppm. If that claim is true, what is the probability a random

batch exceeds the limit? If an inspector tests 4 batches, what's the

probability all 4 pass the limit? (Assume each batch’s sulfur is

independent and follows the distribution.)

Chapter 4: Measurement Accuracy and Data Quality

Learning Objectives: Students will learn to: (1) distinguish between

accuracy and precision in measurements and why both are critical in

chemical engineering; (2) evaluate the reliability of a set of

measurements using statistical tools (e.g. calculating variance due to

measurement error); (3) understand how to perform basic instrument


calibration and analyze the data (regression for calibration curves); and

(4) design simple experiments to estimate measurement error (like

repeatability and reproducibility tests, possibly a Gage R&R study

concept).

In chemical engineering, data originate from measurements – whether in

the lab (measuring concentration, temperature, particle size, etc.) or in

the plant (flow rates, pressures, sensor readings). Thus, the integrity of

any statistical analysis hinges on the quality of these measurements. This

chapter focuses on the statistical aspects of measurement accuracy and

reliability, linking back to the idea in Chapter 1 that engineers must

appreciate measurement accuracy. We will discuss how to quantify

measurement error, how to calibrate instruments, and how to improve

data quality through careful experimental design.

4.1 Accuracy vs. Precision and Sources of Measurement Error


• Accuracy refers to how close a measurement is to the true or accepted reference value. An accurate instrument has little systematic error (bias).
• Precision refers to the consistency or repeatability of measurements – if you measure the same thing multiple times, how much do the readings vary? A precise instrument has small random error (scatter).

It’s possible to have one without the other: for example, a scale that is

not zeroed could consistently read 5 grams heavy (poor accuracy, good

precision if the scatter is low). A dartboard analogy is often

used: tight grouping (precise) but off-center (inaccurate) vs. widely

scattered around the center (low precision, potentially unbiased on

average).

In practice, accuracy is addressed by calibration (adjusting

measurements to correct bias), while precision is improved by better

instrument design or measuring technique (reducing noise).


Sources of Error:

• Systematic errors (affect accuracy): calibration errors, consistent instrument drift, environmental biases (e.g. temperature affecting the instrument). These cause measurements to be consistently off in one direction.
• Random errors (affect precision): noise in sensor readings, human reading error (judgment), fluctuations in the quantity being measured, etc. These cause scatter in both directions.

For example, measuring pH with a meter: If the probe is not calibrated,

all readings might be 0.2 units high (systematic). If the meter’s

electronic noise causes readings to fluctuate ±0.05, that’s random error.

4.2 Statistical Analysis of Measurement Data: Mean and Variance of

Repeated Measurements

When evaluating a measurement system, a simple approach is to take

repeated measurements of a stable quantity and analyze them

statistically:
• Calculate the mean of the readings – this should approximate the true value if there is no systematic bias. If the mean is far from a known reference, that indicates inaccuracy.
• Calculate the standard deviation of the readings – this indicates precision (often called repeatability when taken by the same operator, on the same instrument, under the same conditions).

Example: A thermocouple is used to measure boiling water (true

temperature ~100°C at 1 atm). It is immersed multiple times yielding:

99.1, 99.5, 99.0, 99.3, 99.4°C. The mean is 99.26°C, indicating a slight

bias (~0.74°C low) – quite good accuracy. The SD is about 0.2°C,

showing good precision (the readings are tight). If instead we got

readings like 95, 105, 98, 102, 100°C, the mean ~100°C (unbiased) but

the readings swing widely (SD ≈ 3.8°C), indicating poor precision.
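Python Sketch – Summarizing Repeated Readings: The accuracy/precision summary above can be reproduced for any set of repeat measurements:

import statistics

readings = [99.1, 99.5, 99.0, 99.3, 99.4]    # boiling-water checks, °C
true_value = 100.0

mean = statistics.mean(readings)             # 99.26 °C
bias = mean - true_value                     # -0.74 °C (systematic error)
sd = statistics.stdev(readings)              # ~0.21 °C (repeatability)
print(f"mean={mean:.2f} °C, bias={bias:+.2f} °C, SD={sd:.2f} °C")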

Gauge Repeatability and Reproducibility (R&R): In quality

engineering, a common approach to assess measurement systems is a

Gauge R&R study. It involves measuring some parts multiple times by

different operators and instruments. The variability is then decomposed:


how much comes from repeatability (same operator, same device

variability) and how much from reproducibility (differences between

operators/devices). While a full Gage R&R is beyond our scope,

conceptually:

• If two different operators measure the same sample and get systematically different results, there's a reproducibility issue (maybe different technique or device calibration).
• If the same operator measuring the same sample repeatedly shows variation, that's a repeatability (precision) issue.

We can treat measurement variance as another component in the total

observed variance. If a process has true variance σ_process² but our

measurement adds variance σ_meas², the observed variance =

σ_process² + σ_meas² (assuming measurement error is independent of

process variation). This implies if measurement error is high, it masks

the real process variation. Engineers strive for σ_meas << σ_process

(measurement system much more precise than the inherent variability


being studied). A rule of thumb in industry is measurement system

should contribute <10% of total variation for quality measurements.
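Python Sketch – Additivity of Variances: A small simulation illustrates the relation σ_obs² = σ_process² + σ_meas² for independent errors (the σ values below are assumed purely for illustration):

import numpy as np

rng = np.random.default_rng(seed=7)
n = 100_000

sigma_process, sigma_meas = 2.0, 0.5          # assumed illustrative values
true_values = 100 + sigma_process * rng.standard_normal(n)
observed = true_values + sigma_meas * rng.standard_normal(n)

print(np.var(observed))                        # ~ 2.0**2 + 0.5**2 = 4.25
print(sigma_process**2 + sigma_meas**2)        # 4.25 exactly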

4.3 Calibration and Regression for Instrument Accuracy

Calibration is aligning instrument output with known standards.

Statistically, calibration often uses linear regression (discussed more in

Chapter 8) to adjust readings. For instance, suppose we have a flow

meter that outputs a voltage that is supposed to correlate to flow rate.

We run known flows (from a calibrated reference) and record the

meter’s voltage:

• Reference flow: 0, 50, 100 L/min; Meter voltage: 0.01, 2.51, 4.98 volts (just an example).

Plotting voltage vs flow, we expect a line. We can fit a linear regression:

flow = a + b*(voltage). That equation is then used to convert future

voltage readings to flow values. The calibration ensures accuracy (the

regression corrects any bias or scale error). The regression’s R² and

residuals tell us how well the meter follows a linear pattern (residual
scatter indicates precision of the meter, systematic deviation from line

indicates non-linearity or biases).
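Python Sketch – Fitting the Calibration Line: A least-squares fit of the flow-meter data above using NumPy (the 3.20 V reading at the end is hypothetical):

import numpy as np

# Calibration data from the example: known flows vs. meter voltage
flow = np.array([0.0, 50.0, 100.0])        # L/min (reference standard)
voltage = np.array([0.01, 2.51, 4.98])     # meter output, V

# Fit flow = a + b*voltage by least squares (Chapter 8 covers regression)
b, a = np.polyfit(voltage, flow, deg=1)
print(f"flow = {a:.2f} + {b:.2f} * voltage")

# Residuals show how well the meter follows a straight line
residuals = flow - (a + b * voltage)
print(residuals)

# Convert a future meter reading to a flow estimate
print(a + b * 3.20)                        # e.g. a hypothetical 3.20 V reading

With only three calibration points the fit is exact in practice only if the meter is truly linear; the residuals are the check.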

Calibration Example: A pH meter is calibrated with buffer solutions of

pH 4.00, 7.00, 10.00. The meter reads 4.1, 7.1, 10.0 respectively without

calibration. We can shift the meter (offset -0.1) and adjust slope if

needed. In this case, readings suggest a consistent +0.1 bias at pH4 and

pH7, but at pH10 the bias is 0, indicating maybe a slight slope error. A

two-point calibration typically adjusts offset and slope such that meter

reads exactly the standard values. The remaining deviation at a third

point (pH10) indicates calibration quality. Statistically, if we regressed

“true pH = α + β*(meter reading)”, we’d find α and β to correct the

meter. Ideally β ≈ 1 and α ≈ 0 for a perfect instrument.

Regression Lab Task: Calibrate a thermocouple against a precision

thermometer across 0–100°C: record pairs of readings at various points,

fit a line, and compute the residual error. Use that line for corrected

readings. After calibration, test at a mid-point to verify improved

accuracy.
4.4 Data Quality: Outliers and Missing Data

Measurement data may contain outliers – readings that are far off the

expected range. Outliers can result from momentary glitches (electrical

spike, human error in reading, contamination in sample analysis).

Statistically identifying outliers can be done by seeing if a data point lies

beyond, say, 3 standard deviations from the mean or using Grubbs’ test,

etc. Engineers often face a choice: investigate and possibly discard

outliers (if they are proven erroneous) or include them if they might

indicate real extreme events. Good practice: never discard an outlier

without cause – first, ensure it's not a real phenomenon. For example, if

one batch’s impurity is 5σ higher than others, was there a measurement

error or did something truly go wrong in that batch?

Missing data is another quality issue – e.g. a sensor went offline for a

day. One must decide how to handle missing points (interpolate, use last

value, or analyze with methods that handle missing data). While this

veers into data management more than pure stats, it’s part of ensuring

the dataset used for analysis is representative and clean.


Improving Data Quality: Some strategies:

 Conduct repeat measurements and use the average to reduce

random error. By averaging n independent measurements, the

standard error reduces by √n. (E.g., take triplicate samples for

HPLC analysis and average results to get more precise estimate).

 Use better instruments or maintain instruments (regular calibration

schedule, routine maintenance to prevent drift).

 Control environmental factors during measurement – e.g. measure

viscosity in a temperature-controlled lab to avoid ambient

temperature influencing results.

 Training for operators to ensure consistent measurement technique

(reduces variability between people).

Case Example (Nigerian Lab): In a Nigerian oil laboratory, technicians

measure the API gravity of crude oil samples. Suppose two different

devices are used (digital density meter vs. hydrometer) and results

sometimes differ by ~0.5 API. A small Gauge R&R study is done: Each

of 5 oil samples is measured by both methods by two technicians.


Analysis shows the hydrometer readings tend to be 0.3 API higher on

average (a systematic bias) and technicians differ within ±0.1 API (small

random differences). The lab decides to always use the digital meter for

custody transfer measurements (as it’s more consistent) and use a

regression to adjust hydrometer readings if needed (calibrate hydrometer

to match digital meter scale). This improves overall data quality for

reporting and avoids disputes over measurements.

4.5 Application: Design an Experiment to Quantify Measurement

Uncertainty

Suppose we want to quantify the measurement uncertainty of a new gas

chromatograph (GC) for measuring benzene concentration in water. We

could design an experiment:

 Take a homogeneous water sample and spike it with a known

benzene level (say 5 mg/L).

 Have the GC measure this sample 10 times independently (or over

several days).
 The standard deviation of these 10 readings gives the instrument’s

precision at that level.

 Repeat for a different concentration (like 1 mg/L and 10 mg/L) to

see if precision varies with concentration (common in instruments

– relative error might be constant percentage).

 Also run a known standard sample to check accuracy (compare

mean reading to the true known concentration).

 If possible, use another method as a reference (like a calibrated

standard method) to measure the same sample and compare.

This experiment yields statistical insight: e.g., “The GC has a precision

of 0.1 mg/L (2% RSD at 5 mg/L) and an accuracy within 0.05 mg/L

after calibration. Thus, results above detection limit can be trusted

within ±0.2 mg/L with 95% confidence.” That kind of statistical

characterization is valuable for environmental reporting or quality

control.
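If the repeat readings were available electronically, the precision and accuracy figures could be computed in a few lines of Python (a sketch; the ten readings below are invented for illustration):

import numpy as np

# Hypothetical 10 repeat GC readings of the 5 mg/L spiked sample (mg/L)
readings = np.array([5.03, 4.91, 5.10, 4.98, 5.05, 4.88, 5.12, 4.95, 5.02, 5.07])

mean = readings.mean()
sd = readings.std(ddof=1)      # repeatability (precision) at this level
rsd = 100 * sd / mean          # relative standard deviation, %
bias = mean - 5.0              # accuracy vs. the known spike level

print(f"mean={mean:.3f} mg/L, SD={sd:.3f}, RSD={rsd:.1f}%, bias={bias:+.3f} mg/L")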

Software Tip: MINITAB can aid measurement studies. For instance, it

has a Gage R&R analysis under Stat > Quality Tools, where you input
data from multiple operators and parts and it outputs variance

components. Or simple descriptive stats on repeated measurements and

control charts (a control chart of a stable process measurement

essentially tracks measurement variation over time).

Chapter Summary: In this chapter, we reinforced that no statistical

analysis can rise above the quality of the input data. We defined

accuracy and precision and showed how to assess them using statistics

(means and standard deviations of repeated measurements). Calibration

techniques using regression were introduced to correct bias. Engineers

must continuously ensure their data is reliable – through calibration,

repeated measurements, and data cleaning (outlier checks). As we

proceed to inferential statistics, remember that all our confidence in

results assumes the data truly represent what we think they do. A

significant test result means little if the measurements were significantly

biased or noisy. By controlling and quantifying measurement error, we

lay a solid foundation for trustworthy analysis.

End-of-Chapter Exercises:
1. Accuracy vs Precision: A pressure gauge on a reactor reads 5 psi

higher than the true pressure consistently, but has a very small

fluctuation (±0.1 psi). Another gauge reads on average correctly,

but fluctuates ±5 psi. Which gauge is more accurate? Which is

more precise? If you had to choose, which error (systematic or

random) is easier to correct and why?

2. Repeatability test: You weigh the same sample on a balance 7

times and get in grams: 10.12, 10.15, 10.13, 10.11, 10.14, 10.13,

10.12. Calculate the sample mean and standard deviation. If the

true mass is 10.00 g, what is the balance’s bias? Does the precision

(SD) seem acceptable (what % of the reading is it)?

3. Calibration curve: A spectrophotometer gives absorbance

readings that should linearly relate to concentration (Beer-Lambert

law). Known concentrations (ppm) vs absorbance: 0 -> 0.02, 5 ->

0.35, 10 -> 0.68, 15 -> 1.00. (a) Plot these and find the best-fit line

(absorbance = a + b*conc or conc = (absorbance - a)/b). (b) If an

unknown sample reads 0.50 absorbance, what concentration does

the calibration predict? (c) If the residual at 15 ppm was significant


(say actual absorbance was 1.05 vs 1.00 predicted), what might

that indicate (think linearity)?

4. Gage R&R concept: Two operators measure the thickness of a

plastic film using the same micrometer. Each measures the same

sample 3 times: Operator A gets (0.102, 0.105, 0.098 mm),

Operator B gets (0.110, 0.108, 0.109 mm). (a) Compute the mean

and SD for each operator’s readings. (b) Discuss differences: is

there a noticeable bias between operators? Who is more

repeatable? (c) What steps might you take to reduce any observed

differences (training, calibration of micrometer, etc.)?

5. Outlier handling: In a series of viscosity measurements (in cP) of

a sample: 45.2, 46.1, 44.8, 120.5, 45.0, 44.9 – one value is clearly

an outlier. (a) Statistically, how could you justify discarding the

outlier? (Calculate how many SDs away it is from the mean of

others, for instance.) (b) What non-statistical investigation should

accompany this (what might you check about that run)? (c) After

removing it, recalc the mean viscosity. How different is it from

including the outlier?


6. Practical lab task (for thought): Design a brief plan to test the

measurement variability of a pH meter. Include: how many

measurements, of what solutions, using how many operators, etc.,

to separate instrument repeatability from operator technique

variability.

Chapter 5: Sampling and Confidence Intervals

Learning Objectives: By the end of this chapter, students will be able

to: (1) explain the concept of a sampling distribution and the Central

Limit Theorem and why they are fundamental to statistical inference; (2)

construct and interpret confidence intervals for a population mean and

proportion based on sample data; (3) understand the influence of sample

size and confidence level on the width of confidence intervals; and (4)

draw appropriate conclusions from confidence interval results in

chemical engineering contexts (e.g., process parameters, quality

metrics).
Up to now, we have dealt with describing data and understanding

probability distributions in general. Now we shift toward statistical

inference – making educated statements about a population (or process)

based on a sample of data. A core concept is that any statistic computed

from a sample (like the sample mean) is itself a random variable, with its

own distribution (the sampling distribution). Confidence intervals (CIs)

are one of the main tools of inference, allowing us to estimate a

population parameter (like a true mean) with an indication of

uncertainty.

5.1 Populations, Samples, and the Central Limit Theorem

A population is the entire set of subjects or measurements of interest

(conceptually, often infinite or very large). A sample is a subset of data

drawn from the population, ideally at random. For example, consider a

production of 10,000 bottles of soda in a shift (population = 10,000 fill

volumes). We might measure 50 of them (sample) to infer things about

the whole batch.


Sampling Distribution: If we take a sample and compute a statistic (like the sample mean X̄), and if we could hypothetically repeat that sampling process many times, the distribution of X̄ values is the sampling distribution. Its mean is the population mean (so X̄ is an unbiased estimator of μ), and its variance is σ²/n (the population variance divided by the sample size n). This means larger samples yield a tighter distribution of the sample mean around the true mean.

The Central Limit Theorem (CLT) states that for a large sample size

n, the sampling distribution of X̄ will be approximately normal,

regardless of the shape of the population distribution (provided the

population has a finite variance and no extreme heavy tails). By "large

n", typically n ≥ 30 is often sufficient in practice, though if the

underlying distribution is very skewed, you might need more. CLT is a

cornerstone because it justifies using normal-based inference for means

in many situations.
Illustration: If an individual measurement of, say, catalyst pellet

diameter is not perfectly normal (maybe slightly skewed), the average

diameter of 50 pellets will be very close to normal by CLT. The mean of

those 50 will have much smaller variability than individual pellets,

specifically σ_X̄ = σ/√50. This tells

us that to reduce uncertainty in estimating the true mean diameter, one

can increase sample size.

Standard Error (SE): The standard deviation of a statistic (e.g., sample

mean) is called the standard error. For the sample mean,

SE(X̄) = σ/√n. In practice, σ is often unknown, so we estimate SE using the sample's standard deviation s: SE ≈ s/√n. The SE is crucial in determining

confidence intervals and test statistics.

5.2 Confidence Intervals for the Mean (Known Variance)

To introduce the concept, first assume we know the population standard

deviation σ (this is rarely true in practice, but it simplifies initial


understanding). If the population is normal or n is large (CLT), a

confidence interval for the true mean μ can be constructed around the

sample mean:

\bar{X} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}},

where z_{α/2} is the z-value cutting off an area of α/2 in the upper tail for a (1−α)·100% confidence interval. For a 95% CI, α = 0.05, so z_{0.025} ≈ 1.96. For a 90% CI, z_{0.05} ≈ 1.645, and for a 99% CI, z_{0.005} ≈ 2.576.

This formula comes from the reasoning that X̄ is approximately normal with mean μ and standard deviation σ/√n. We want an interval that has a 95% chance of covering μ, so we go about 1.96 SEs below and above the observed X̄.

Interpretation: If we say "95% confidence interval for μ is [L, U]", it

means that if we repeated the sampling many times, 95% of the intervals

constructed this way would contain the true μ. It does not mean there's a

95% probability that μ lies in [L, U] (μ is fixed, the interval is random).


But informally, one can think of it as a reasonable range of values for μ

given the data.

Example: A catalyst yields a mean conversion of 80% with known σ =

5% in small-scale trials (n=9 trials). We find X̄ = 80%. A

95% CI for the true mean conversion: 80 ± 1.96*(5/√9) = 80 ±

1.96*(1.667) = 80 ± 3.27, so [76.73%, 83.27%]. We are 95% confident

the true mean conversion under these conditions is ~76.7 to 83.3%. If a

desired target was 85%, this CI suggests the process likely falls short of

that target.

5.3 Confidence Intervals in Practice (Unknown Variance – t-

distribution)

Usually, σ is unknown. We then use the sample standard deviation s and

rely on the t-distribution instead of normal. The t-distribution (with df

= n-1 degrees of freedom) is wider than normal for small samples, to

account for extra uncertainty in estimating σ. A CI for μ becomes:


\bar{X} \pm t_{\alpha/2,\,df=n-1} \frac{s}{\sqrt{n}}.

For large n, t approximates z. For example, with n=10 (df=9), the 95% t

critical value is ~2.262 (compared to 1.96 for z). With n=30 (df=29), t

~2.045. As n → ∞, t →1.96.

Example: Suppose from 15 samples of a new chemical product’s purity,

we get X̄ = 99.2%, s = 0.5%. We want a 95% CI for true

purity. df=14, t(0.025,14) ≈ 2.145. So CI = 99.2 ± 2.145*(0.5/√15).

0.5/√15 = 0.129, times 2.145 gives ~0.277. So CI ≈ [98.92%, 99.48%].

We are fairly confident the average purity is between ~98.9 and 99.5%.

Notice how a small s and decent n yields a tight CI of width about

±0.28%.
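Such a CI is easy to verify outside MINITAB; a small Python/scipy sketch (using the purity example's summary statistics) is:

from math import sqrt
from scipy import stats

n, xbar, s = 15, 99.2, 0.5
t_crit = stats.t.ppf(0.975, df=n - 1)   # ≈ 2.145 for df = 14
half_width = t_crit * s / sqrt(n)       # ≈ 0.277
print(f"95% CI: [{xbar - half_width:.2f}, {xbar + half_width:.2f}]")
# [98.92, 99.48], matching the hand calculation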

The width of a CI depends on:

 Confidence level: higher confidence (99% vs 95%) → larger

critical value → wider interval (more cautious).


 Sample size: larger n → smaller standard error → narrower

interval (more precision).

 Data variability (s): more variability → wider interval.

Planning Sample Size: Sometimes an engineer may want to determine

n such that the CI will be a desired width. Roughly, for a given desired

margin m at 95% confidence, you’d set 1.96 * (σ/√n) = m, solve for n.

For example, if you want ±1% precision on a mean with σ ~5, need n ~

(1.96*5/1)^2 ≈ 96. (We often don't know σ initially, so use a prior

estimate or pilot study for planning.)
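This planning rule is easy to codify; a tiny Python helper (illustrative only) is:

from math import ceil

def n_for_margin(sigma, margin, z=1.96):
    # Smallest n such that z*sigma/sqrt(n) <= margin (95% confidence default)
    return ceil((z * sigma / margin) ** 2)

print(n_for_margin(sigma=5, margin=1))  # 97; the ~96 above is (9.8)^2 = 96.04 before rounding up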

5.4 Confidence Interval for a Proportion

In quality control, one might estimate a proportion (e.g., fraction of

defective products, or fraction of time a process is in a certain state). For

large n, if X ~ Binomial(n, p) (where p is true proportion of "success"),

the sample proportion p̂ = X/n has an approximately normal distribution with mean p and standard error √(p(1−p)/n). For a CI, one formula is:

\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}.

This is an approximate method (there are better ones for extreme p or

small n, like Wilson score interval).

Example: Out of 200 products, 8 failed quality tests. p̂ = 8/200 = 0.04 (a 4% defect rate). A 95% CI for the true defect rate p:

0.04 ± 1.96 * sqrt(0.04*0.96/200). The sqrt term = sqrt(0.0384/200) =

sqrt(0.000192) = 0.01385. Times 1.96 gives ~0.0271. So CI ≈ [0.04 -

0.027, 0.04 + 0.027] = [0.013, 0.067] or [1.3%, 6.7%]. We conclude the

defect rate is likely between about 1% and 7%. Note the uncertainty: the point estimate was 4%, but because of sampling error the true rate

could be a bit lower or higher. If a regulatory limit was, say, 5%, this CI

unfortunately includes 5% (so we couldn't confidently claim the defect

rate is below 5% yet). If we wanted a tighter interval, we’d need a larger

sample.
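A quick Python check of this interval (scipy supplies the z-value; the better Wilson interval mentioned above is available in statsmodels as proportion_confint(..., method='wilson'), if that package is installed):

from math import sqrt
from scipy import stats

x, n = 8, 200
p_hat = x / n
z = stats.norm.ppf(0.975)            # ≈ 1.96
se = sqrt(p_hat * (1 - p_hat) / n)   # ≈ 0.0139
print(f"95% CI: [{p_hat - z*se:.3f}, {p_hat + z*se:.3f}]")  # [0.013, 0.067]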

5.5 Practical Interpretation in Engineering Context


Confidence intervals are extremely useful for communicating

uncertainty of estimates:

 An engineer reporting the mean concentration of a pollutant in

wastewater might say "Mean = 5.2 mg/L, 95% CI: 4.5 to 5.9

mg/L". This tells regulatory agencies not just a point estimate but

the range of likely true values.

 In R&D, if a new catalyst’s average yield improvement is

estimated with a CI, one can see if zero improvement is outside the

interval or not (if the entire CI is above 0, that suggests statistically

significant improvement – connecting to hypothesis testing logic).

 Confidence intervals can be used to determine if a process change

has practical significance. For example, modifying a process

increased average throughput from 100 to 105 kg/hr, CI for

increase is [+2, +8] kg/hr. This is statistically significant (CI

doesn’t include 0) and also practically significant. But if the CI

was [0, +10], it’s borderline – the true increase could be trivial (0)

or large.
MINITAB Note: In MINITAB, when you use Stat > Basic Statistics >

1-sample t (for example), it will output not only the test result but also

the CI for the mean. It’s good practice to always look at the CI, as it

provides more information than a yes/no hypothesis test.

Case Study (Chem Eng Example): A Nigerian pharmaceutical plant

measures the potency of a drug in 10 tablets; results in assay percent of

label claim: 98, 101, 99, 100, 97, 103, 99, 101, 100, 98. X̄ = 99.6, s ≈ 2.0. 95% CI (df=9, t ≈ 2.262): 99.6 ± 2.262·(2/√10) = 99.6 ± 2.262·0.632 = 99.6 ± 1.43 => [98.17, 101.03]%. So they're 95%

confident true average potency is ~98.2–101.0%. The spec requires 95–

105%, so that’s comfortably within spec – no issue. If instead s were

larger or sample smaller such that CI was [94, 105]%, even if mean is

99.6, we wouldn’t be as confident all production is on target (the true

mean could be slightly below 95). That might prompt taking a larger

sample or investigating variability sources.

5.6 Weekly/Fortnightly Lab Assignments in Context


Recall from the course outline that students are to have regular

computer-lab assignments. In this chapter’s context, an example

assignment: Use MINITAB to generate random samples from a known

distribution and verify the Central Limit Theorem. For instance, simulate

1000 samples of size n=5 from an exponential distribution (which is

skewed) and observe the distribution of sample means – check that it

approaches normal. Or assign students to measure something around

them (like fill volumes of water bottles they have) and compute a CI,

interpreting it.
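The simulation half of such an assignment can also be prototyped in a few lines of Python (a sketch of the idea, not a substitute for the MINITAB exercise):

import numpy as np

rng = np.random.default_rng(42)

# 1000 samples of size n=5 from a skewed exponential distribution (true mean 1)
sample_means = rng.exponential(scale=1.0, size=(1000, 5)).mean(axis=1)

print(sample_means.mean())        # ≈ 1, since the sample mean is unbiased
print(sample_means.std(ddof=1))   # ≈ 1/sqrt(5) ≈ 0.45, i.e. sigma/sqrt(n)
# A histogram of sample_means is visibly less skewed than the exponential
# itself; repeating with n=30 gives a distribution that looks very normal.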

Chapter Summary: In this chapter, we have established how to move

from describing a single sample to making statements about the broader

process it came from. The Central Limit Theorem assures us that sample

means tend to be normally distributed for large samples, enabling the

construction of confidence intervals. We derived CIs for means (using z

or t) and for proportions. Understanding and interpreting these intervals

is a fundamental skill – it teaches us to say “we’re not 100% sure, but

we have a range for the value with a certain confidence.” In engineering,


where decisions must often be made under uncertainty, the confidence

interval provides a quantitative handle on that uncertainty. In the next

chapter, we’ll continue our exploration of inference with hypothesis

testing, which is a complementary framework asking yes/no questions

with a controlled false-alarm rate.

End-of-Chapter Exercises:

1. Sampling Distribution: A process has a true mean μ = 50 and σ =

5. If you take samples of size n=25, what is the distribution of the

sample mean (give mean and standard deviation)? According to

CLT, what is the approximate probability that a given sample mean

will fall between 48 and 52?

2. Confidence Interval Calculation: A polymer’s tensile strength

(MPa) is measured in 8 specimens: 55, 60, 57, 59, 58, 61, 56, 60.

Compute the sample mean and standard deviation. Assume

roughly normal data. (a) Construct a 95% CI for the true mean

strength. (b) Interpret this interval in context of the polymer’s


guaranteed minimum strength of 54 MPa (does it appear the

guarantee is met on average?).

3. Proportion CI: In a quality audit, 16 out of 200 sampled packages

were found leaking. Calculate a 90% confidence interval for the

true proportion of leaking packages. Based on this, if the company

claims “at most 5% of packages leak,” is that claim plausible?

4. Changing Confidence Level: Using the tensile strength data from

question 2, also compute a 90% CI and a 99% CI for the mean.

Compare the widths of these intervals to the 95% CI. Explain the

trade-off between confidence level and precision.

5. MINITAB Exercise: Generate 50 random values in MINITAB

from a Uniform(0,1) distribution (Calc > Random Data). Treat

those as a “population”. Now use MINITAB's resampling tools, or manually take, say, 5 random points from those 50, to simulate a sample. Compute the sample mean and a 95% CI (since σ is unknown, use t). Repeat this sampling a few times (MINITAB can automate this via bootstrapping). Do most of your CIs


contain the true mean of the uniform (which is 0.5)? What does

this demonstrate?

6. Critical Thinking: A chemical engineer says, “We don’t need

confidence intervals; we have a lot of data, so the sample mean is

basically the truth.” Discuss why even with a lot of data, reporting

an interval is valuable. What if the engineer’s data were highly

variable? How does sample size help and what caution remains?

Chapter 6: Hypothesis Testing and Statistical Inference

Learning Objectives: Students should be able to: (1) formulate null and

alternative hypotheses for common engineering scenarios (comparing

means, checking a claim, etc.); (2) perform basic hypothesis tests (one-

sample z/t-test for mean, tests for proportions) and calculate test

statistics and p-values; (3) interpret the results of a hypothesis test in

context, including understanding the meaning of p-value and the risk of

Type I/II errors; and (4) evaluate the assumptions behind tests
(normality, independence) and use software (MINITAB) to carry out

tests, interpreting the output.

Hypothesis testing is a formal framework for making decisions or

judgments about a population parameter based on sample data. Where

confidence intervals gave us a range of plausible values, hypothesis tests

ask a yes/no question: for example, "Is the mean at least 5?" or "Are two

processes different in yield?" and provide a significance level for the

decision. This chapter introduces the concepts of null and alternative

hypotheses, test statistics, significance (p-values), and errors in testing.

6.1 Hypotheses: Null and Alternative

A null hypothesis (H₀) is a statement of no effect or no difference, set

up as the default or status quo claim. The alternative hypothesis (H₁ or

Hₐ) is what we seek evidence for – typically indicating some effect,

difference or change. We always assume H₀ is true unless data give

strong evidence against it.

Examples:
 H₀: μ = 50 (the process mean equals 50 units), H₁: μ ≠ 50 (two-

sided alternative, the mean is different from 50).

 H₀: μ ≤ 100, H₁: μ > 100 (one-sided alternative, testing if mean

exceeds 100).

 H₀: p = 0.05 (5% defect rate historically), H₁: p < 0.05

(improvement: defects lower than 5%).

We choose H₀ and H₁ before seeing data ideally, reflecting the question

at hand. The null often represents "no change," "no improvement," or a

claimed value. The alternative represents what we suspect or want to

test.

6.2 Test Statistics and Decision Rule

A test statistic is a function of sample data that measures how far the

data deviate from H₀. For a mean test (with known σ or large n), we use:

z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}},

where μ₀ is the hypothesized mean under H₀. If |z| is large, it indicates

the sample mean is far from μ₀ in SD units – evidence against H₀.


If σ is unknown and n is moderate, use:

t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}},

with df = n-1.

For testing a proportion:

z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}},

assuming n large enough.

The decision rule can be the critical value approach: compare the test

statistic to a threshold based on chosen significance level α. For

example, two-sided test at α=0.05: reject H₀ if |z| > 1.96 (for z-test) or |t|

> t_{α/2,df}.

Alternatively (and more commonly now) the p-value approach:

compute the p-value, which is the probability of observing a test statistic

as extreme as (or more extreme than) what we got, assuming H₀ is true. If p-

value ≤ α, reject H₀ (evidence is significant at level α). If p-value > α,

fail to reject H₀.


Interpretation of p-value: It quantifies how compatible the observed data are with H₀. A small p (like 0.01) means the data we got

would be very unlikely if H₀ were true, so we suspect H₀ is false.

Conversely, a large p (0.5) means data are quite compatible with H₀.

6.3 Significance Level, Type I and II Errors, and Power

 Significance level (α): The chosen probability of Type I error

(false positive), i.e., rejecting H₀ when it is actually true. Common

α = 0.05 or 5%. This means we tolerate a 5% chance of alarm

when nothing is wrong. If consequences of false alarm are serious,

we may use α = 0.01; if missing a real effect is worse, maybe α =

0.1 is chosen.

 Type I Error: Rejecting H₀ when H₀ is true (false alarm).

Probability = α by test design.

 Type II Error: Failing to reject H₀ when H₁ is true (missed

detection). Probability = β (not directly set by test, depends on

sample size, effect size, α).


 Power = 1 - β: Probability of correctly rejecting H₀ when a

specific alternative is true. For example, the power of a test to

detect that μ is actually 5 units above μ₀ might be 80% with a

certain n. We generally want high power (≥80%) for important

effects.

There is a trade-off: for fixed n, lowering α (being more stringent) often

raises β (harder to detect a true effect). Increasing n reduces both errors

(power goes up).

Engineers often use α=0.05 by convention, but if, say, a plant safety

decision is based on a test, one might choose α=0.01 to be extra sure

before calling something safe (minimize false positives in concluding

safety).

6.4 One-Sample Tests: z-test and t-test

One-sample z-test (rarely used in practice unless σ known or n

large): e.g., testing H₀: μ = μ₀. Calculate z as above and compare to

N(0,1).
One-sample t-test: More common. Use when σ unknown and sample

from ~normal population (or n sufficiently large for CLT). MINITAB’s

1-sample t handles this. We specify H₀ value and get a t and p.

Example: A machine is supposed to fill 500 mL on average. We take 10

samples: mean = 492 mL, s = 15 mL. H₀: μ=500, H₁: μ≠500. t = (492-

500)/(15/√10) = -8/(4.743) = -1.686. df=9. The two-tailed p-value = P(|

T9| > 1.686). From t-table or software, p ~ 0.128. At α=0.05, p > 0.05,

so we fail to reject H₀. We don't have strong evidence the mean is

different from 500; the shortfall of 8 mL could be due to chance with

this sample size. However, note: 8 mL difference might be practically

important. Perhaps we need more data (power might be low with n=10).

If the spec is tight, one might consider the risk of Type II error here.

(We’ll discuss interpretation beyond p-value: maybe a CI approach from

Chapter 5 would show 95% CI ≈ [481, 503], which includes 500, hence

consistent with test result).
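For reference, the test statistic and p-value can be reproduced from the summary statistics alone with scipy (a sketch; with raw data you could call stats.ttest_1samp directly):

from math import sqrt
from scipy import stats

n, xbar, s, mu0 = 10, 492.0, 15.0, 500.0   # filling example

t = (xbar - mu0) / (s / sqrt(n))
p = 2 * stats.t.sf(abs(t), df=n - 1)       # two-tailed p-value
print(f"t = {t:.3f}, p = {p:.3f}")         # t ≈ -1.687, p ≈ 0.13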

Paired t-test (a special one-sample test on differences): If we measure

something before and after on the same unit (paired data, e.g. catalyst
activity before/after regeneration on each catalyst sample), we compute

differences and do a one-sample t on those differences (H₀: mean

difference = 0). We mention here because it's a common scenario:

pairing eliminates variation between subjects. For example, if we have 5

catalysts, measure activity, regenerate, measure again, we can test if

average change > 0 via paired t.

6.5 P-value and Decision in Context

It’s crucial to interpret results in context:

 "Reject H₀ at α=0.05" means we found statistically significant

evidence for H₁ at 95% confidence level.

 "Fail to reject H₀" does not prove H₀ true; it just means not

enough evidence against it. Perhaps the effect is too small to detect

with given data, or variability is too high.

Case Study: A Nigerian water treatment facility claims its effluent BOD

(biochemical oxygen demand) is at most 20 mg/L. Regulators take 6

samples: mean = 22 mg/L, s = 3. H₀: μ = 20 vs H₁: μ > 20 (one-tailed,


since regulators worry if it's higher). t = (22-20)/(3/√6) = 2/1.225 =

1.633, df=5. For α=0.05 one-tailed, critical t5 = 2.015, or p = P(T5 >

1.633) ~0.078. Not below 0.05, so fail to reject H₀ at 5% (no significant

violation). At 10% significance, we would reject (p < 0.10). This

illustrates how chosen α matters. The regulator might decide to collect

more data (increase n) to get a clearer decision, given p ~0.078 is

suggestive but not conclusive at 5%. The potential risk: maybe true

mean is slightly above 20 and a small sample didn't catch it with 95%

confidence.

6.6 Using MINITAB for Hypothesis Tests

MINITAB makes hypothesis testing straightforward:

 For one-sample tests: Stat > Basic Statistics > 1-sample t... (or 1-

sample z if σ known). You input the column and H₀ value, choose

test type (≠, <, > alternative).

 The output will show the sample mean, n, SD, the t (or z) statistic,

degrees of freedom (for t), and the p-value.


 It might say something like: "Test of μ = 500 vs ≠ 500: T = -1.686,

p = 0.128". You then interpret: since p 0.128 > 0.05, not

significant; if α was 0.1, then it's borderline significant.

MINITAB also can produce a confidence interval in the same output.

It’s good to examine that – a p-value tells if there's evidence of

difference, but the CI tells how big the difference might be if it exists.

For the filling example, p = 0.128 (not significant), but the CI is roughly [481, 503] mL, which suggests that even though we can't reject a 500 mL mean, the actual mean could be as low as ~481 mL, which might be practically concerning if

underfilling. Hypothesis test alone might miss that nuance. Thus,

reporting both is often recommended.

Error Considerations: If we repeat tests, a 5% significance means on

average 1 in 20 tests on true nulls will give a false positive. Engineers

need to be cautious of multiple comparisons (if you test 5 different

quality metrics, the chance one falsely flags is >5%). There are methods

to adjust for multiple tests (Bonferroni, etc.), but that’s advanced. We


mention so that one doesn’t think "just test everything at 5% and be

surprised if one lights up by chance."

6.7 Example: Power and Sample Size

If an engineer wants to ensure detecting a certain deviation with high

probability, they do a power analysis. For example, if true mean is 495

vs H₀:500, what’s the chance our filling test with n=10, α=0.05 would

catch it? It might be low. Roughly, power = P(reject | μ=495). We

compute the rejection threshold t = ±2.262 (two-tailed, df=9). The expected t under this alternative is (495−500)/(15/√10) = −5/4.743 = −1.054, which is nowhere near −2.262, so a typical sample would not trigger rejection; to get the power we integrate the test statistic's distribution under the alternative, which works out to only about 15–20%. That calculation is tedious by hand, but MINITAB has a Power and Sample Size tool to do it properly. If power is low, we increase n to achieve the desired power (80%, etc.).
Engineers often face this: "How many samples do I need to be confident

in detecting a 1% difference?" These questions require balancing

statistical and practical significance.
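For those curious, the exact power of the filling-volume test can be computed from the noncentral t distribution; the scipy sketch below (setup numbers from Section 6.7) performs the integration the text alludes to:

from math import sqrt
from scipy import stats

n, sigma, alpha, delta = 10, 15.0, 0.05, -5.0   # true mean 495 vs H0: 500

df = n - 1
t_crit = stats.t.ppf(1 - alpha / 2, df)         # ≈ 2.262
nc = delta / (sigma / sqrt(n))                  # noncentrality ≈ -1.054

# Power = P(|T| > t_crit) when T is noncentral t with this noncentrality
power = stats.nct.cdf(-t_crit, df, nc) + stats.nct.sf(t_crit, df, nc)
print(f"power ≈ {power:.2f}")                   # roughly 0.16 – clearly underpowered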

6.8 Recap with Industrial/Nigerian Example

Let's revisit Kenneth Dagde’s Day 8-9 topics – hypothesis testing

introduction, then power and significance. Imagine an industrial

example: A cement plant in Nigeria changed a grinding aid additive,

hoping to increase average cement strength. Historically μ=42 MPa.

New sample of 5 bags yields mean 44 MPa, s=3. H₀: μ=42 vs H₁: μ >

42. t = (44-42)/(3/√5) = 2/(1.342) = 1.49, df=4, one-tail p ~0.10. Not

significant at 0.05. Possibly with n=5, our test is underpowered to detect

a ~2 MPa increase. If we require evidence at 95% confidence, we might

do more trials (maybe 15-20 bags) to confirm. Alternatively, we might

consider the cost of being wrong: if the additive is cheap and unlikely to

harm, one might proceed provisionally. But statistically, we can't claim

with 95% confidence that strength improved.


One must also be wary of assumptions: t-tests assume data is

approximately normal. If sample size is small (n<30), this matters. If

data are skewed or have outliers, non-parametric tests (like sign test,

Wilcoxon) might be used as alternatives. However, we won't dive deep

here – but always plot your data to check nothing weird.

Chapter Summary: We covered how to set up and interpret hypothesis

tests. The p-value tells us if the observed effect is statistically significant

(unlikely under the null hypothesis). We differentiate between statistical

significance and practical significance – a tiny difference can be highly

significant with huge n, and a large difference can be non-significant

with tiny n. Chemical engineers should use hypothesis tests as tools to

make data-driven decisions (like accepting a new process or determining

if a change had an effect) while also using engineering judgment on

what differences matter. In the next chapter, we will extend hypothesis

testing to comparing two groups (two-sample tests) and ANOVA for

several groups, which are very common in experiments and quality

control.
End-of-Chapter Exercises:

1. Formulating H₀/H₁: For each scenario, state appropriate null and

alternative hypotheses and whether the test is one-tailed or two-

tailed:

a. A catalyst vendor claims their catalyst increases yield. Current

yield is 75%. We test if the new catalyst’s mean yield is greater

than 75%.

b. A regulation says sulfur content must be 50 ppm. We want to

check if our fuel meets this (we worry if it's not equal to 50).

c. Historically 10% of products were defective. After a process

improvement, we hope the defect rate is lower.

2. Calculating Test Statistic: A sample of n=16 from a normal

process has X̄ = 105 and s = 8. We test H₀: μ = 100 vs

H₁: μ ≠ 100 at α=0.05. (a) Compute the t statistic and degrees of

freedom. (b) What is the approximate critical t for α=0.05 (two-

tail, df=15)? (c) Based on these, would you reject H₀? (d)

Calculate or estimate the p-value.


3. Interpreting p-value: If a p-value comes out as 0.003 in a test,

explain in a sentence what that means regarding the data and H₀. If

α was 0.01, what is the decision? If α was 0.001?

4. Type I/II Conceptual: In the context of exercise 1a (catalyst

yield): describe a Type I error and a Type II error in plain language

and the consequence of each. Which error might be more serious

for the company (adopting a new catalyst that actually doesn’t

improve yield vs. not adopting one that actually would improve

yield)? How could you reduce the chances of a Type II error?

5. Using Software: (If available) In MINITAB or another tool, input

the data from exercise 2 (or generate similar data). Perform a 1-

sample t test for H₀: μ=100. Report the output: sample mean, SE

mean, t, and p-value. Does it match your manual calculation?

6. Power check: Suppose in exercise 2 that the true mean is actually

105 (as observed) and true σ is 8. We had n=16. Without complex

calculations, discuss if the test likely had decent power to detect a

5 unit difference. (Hint: if you use a rough effect size 5/8 = 0.625

SD, and n=16, consider that larger n or effect would increase


power). What sample size might be needed to have, say, 90%

power for detecting a 5 unit increase at α=0.05? (This may require

some trial or formula: one approach is use z since n large

approximation: 1.96 + 1.282 = 3.242 = (Δ/σ)*√n; plug Δ=5, σ=8,

solve for n.)

Chapter 7: Comparing Two or More Groups – t-Tests and ANOVA

Learning Objectives: Students will be able to: (1) conduct and interpret

a two-sample t-test for comparing means from two independent groups

(including checking for equal vs. unequal variances); (2) understand the

basic idea of Analysis of Variance (ANOVA) for comparing more than

two means and interpret an ANOVA table; (3) recognize when to use

paired t-tests versus two-sample t-tests; and (4) relate these tests to

engineering situations like A/B testing of process conditions or

comparing multiple catalysts, including understanding the assumptions

involved.
In many chemical engineering investigations, we compare two or more

conditions or groups: e.g. performance of catalyst A vs B, output of

process before vs after an upgrade, yields under several temperature

settings. This chapter extends hypothesis testing to such comparisons.

We first tackle comparing two groups with t-tests, then one-way

ANOVA for multiple groups.

7.1 Two-Sample t-test (Independent Samples)

When we have two separate samples from two populations or

experimental conditions, and we want to test if their means differ, we

use a two-sample t-test. Typical hypotheses: H₀: μ₁ = μ₂ (no

difference), H₁: μ₁ ≠ μ₂ (or one-sided if expecting a direction).

Assumptions: Ideally both samples are from (approximately) normal

distributions. If sample sizes are reasonably large (n₁,n₂ ≥ 30), CLT

helps. Also assume samples are independent of each other. There are

two versions:
 Equal variances assumed (pooled t-test): If we can assume σ₁ =

σ₂, we pool the variances for a more stable estimate.

 Unequal variances (Welch’s t-test): Does not assume equal σ,

uses a degrees of freedom formula (often non-integer df) and

separate variances.

If sample standard deviations differ a lot or sample sizes differ, Welch’s

is safer. Most software (including MINITAB) can test for equal

variances (e.g. Levene’s test) or just use Welch by default.

Test statistic:

t = \frac{\bar{X}_1 - \bar{X}_2 - \Delta_0}{S_{\text{diff}}},

where Δ₀ is the hypothesized difference (often 0), and S_{\text{diff}} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}. If variances are assumed equal, S_diff uses the pooled s².


Degrees of freedom: if pooled, df = n₁+n₂ - 2. If not pooled, use Welch-

Satterthwaite formula (software computes; df could be fractional).

Example: Catalyst A vs Catalyst B: 8 runs each. A: X̄_A = 85%, s_A = 5%. B: X̄_B = 80%, s_B = 6%. H₀: μ_A = μ_B, H₁: μ_A > μ_B (one-tailed, expecting A better). Pooled approach: s_p = √((7·25 + 7·36)/(8+8−2)) = √((175+252)/14) = √30.5 = 5.52%. t = (85−80)/(5.52·√(1/8+1/8)) = 5/(5.52·0.5) = 5/2.76 = 1.811, df = 14. For one-tailed α=0.05, the critical t₁₄ ≈ 1.761. Since t = 1.811 exceeds that, p ≈ 0.045, so we reject H₀ and conclude Catalyst A has a significantly higher mean yield than B at the 5% level. (The two-tailed p would be ~0.09, not

sig if two-tailed test). Interpretation: evidence suggests A is better than

B. However, difference is 5%, which may or may not be practically huge

depending on context, but likely meaningful in yield terms.
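The same summary statistics drop straight into scipy's ttest_ind_from_stats (a sketch; scipy reports a two-sided p, which we halve because the observed difference lies in the hypothesized direction):

from scipy import stats

t, p_two_sided = stats.ttest_ind_from_stats(
    mean1=85, std1=5, nobs1=8,     # Catalyst A
    mean2=80, std2=6, nobs2=8,     # Catalyst B
    equal_var=True)                # pooled t-test

print(f"t = {t:.3f}, one-tailed p = {p_two_sided / 2:.3f}")  # t ≈ 1.81, p ≈ 0.046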

If variances were very different, we could use Welch’s. Typically, we

might check if 5² vs 6² are similar. Levene’s test might say p > 0.05,
okay to assume equal. If one stdev was 5 and the other 15, then

definitely use Welch.

MINITAB usage: Stat > Basic Statistics > 2-sample t. Choose whether

to assume equal variance (there’s a checkbox). It will output means,

difference, SE diff, t, df, p-value. Also confidence interval for

difference.

The pooled two-sample t-test is just the two-group special case of ANOVA; both methods give the same conclusion (in fact F = t²).

7.2 Paired t-test (Revisited)

We discussed paired t in Chapter 6, but to contrast: use paired t when

samples are not independent – each data point in group1 pairs naturally

with one in group2 (like measurements on same unit). In pairing, you

reduce noise from unit-to-unit variation. For example, testing two

formulations on the same engine in two runs (one with additive, one

without) – differences per engine are computed, then one-sample t on

differences.
Always decide pairing vs independent based on experiment design. If

pairing is appropriate but you analyze as independent, you lose power

because you ignore the pairing that removes variability. If you wrongly

pair unrelated samples, that's also incorrect.

7.3 One-Way ANOVA: Comparing More than Two Means

When you have k groups (k ≥ 3), doing multiple t-tests is inefficient and

increases Type I error risk (multiple comparisons). Analysis of

Variance (ANOVA) provides a single overall test for H₀: all k means

are equal vs H₁: at least one mean differs.

The idea:

 Between-group variability: How much do group means vary

around overall mean?

 Within-group variability: How much do data vary inside each

group (noise)?

If between is large relative to within, it suggests real differences between

means. The test statistic is an F-ratio:


F = \frac{\text{Variance between groups}}{\text{Variance within groups}},

which follows an F distribution with df1 = k-1 (between), df2 = N-k

(within), where N is total observations.

ANOVA table includes:

 SS_between (SSA), df = k-1.

 SS_within (SSE), df = N-k.

 MS_between = SS_between/(k-1), MS_within = SS_within/(N-

k).

 F = MS_between / MS_within.

 p-value for F with (k-1, N-k) df.

If p < α, reject H₀ (not all means equal). But ANOVA doesn’t tell which

differ; for that, do post-hoc tests (Tukey, etc.), or planned comparisons

(t-tests with Bonferroni adjustments). We will mostly focus on

understanding the ANOVA output and when to use it.


Example: Three solvents are tested for reaction yield (3 runs each).

Suppose yields:

Solvent1: 70, 75, 73 (mean 72.7, s ~2.5)

Solvent2: 80, 78, 82 (mean 80, s ~2)

Solvent3: 77, 76, 79 (mean 77.3, s ~1.5)

Overall mean ~76.7. Between-group variation: means 72.7, 80.0, 77.3

vary around it – SS_between = ∑ n_i·(mean_i − overall mean)². With each n_i = 3:

(72.7−76.7)²·3 + (80.0−76.7)²·3 + (77.3−76.7)²·3 = 16·3 + 10.89·3 + 0.36·3 = 48 + 32.67 + 1.08 = 81.75.

df_between = 2, MS_between = 40.875.

Within-group: sum of squared deviations within each group: e.g. group 1 variance ≈ 2.5² = 6.25, times (n₁−1) = 2 gives 12.5; similarly for the others. Approximate SSE = 2.5²·2 + 2²·2 + 1.5²·2 = 12.5 + 8 + 4.5 = 25.

df_within=N-k=9-3=6, MS_within ~4.17. Then F = 40.875/4.17 = 9.8.

df(2,6), check F critical ~5.14 for 0.05, so p maybe ~0.01. Indeed F

large, indicates a significant difference among means. Likely solvent2


(mean 80) is higher than solvent1 (72.7). We’d do Tukey’s test to

confirm which differences are significant, but clearly S2 vs S1 is big

~7.3 difference, S3 in middle might not differ from either significantly

depending on thresholds.
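As a cross-check on the hand calculation, scipy's one-way ANOVA on the nine yields gives essentially the same F and p (a sketch):

from scipy import stats

solvent1 = [70, 75, 73]
solvent2 = [80, 78, 82]
solvent3 = [77, 76, 79]

F, p = stats.f_oneway(solvent1, solvent2, solvent3)
print(f"F = {F:.2f}, p = {p:.4f}")   # F ≈ 9.8 on (2, 6) df, p ≈ 0.013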

ANOVA assumptions: each group ~ normal, equal variances across

groups (ANOVA is somewhat robust if slight differences, but major

variance differences or outliers can distort F test). There are tests for

equal variances (Bartlett’s, Levene’s) – if violated, one can use Welch’s

ANOVA or transform data.

Software: MINITAB *Stat > ANOVA > One-way (when data are in

separate columns or as factor-level). It outputs an ANOVA table with

source (Factor, Error) and SS, df, MS, F, p. Also, many packages give

group means and possibly letter groupings from Tukey (like groups A, B

indicating which are significantly different).


From LO: Day 12 covers planning experiments and one-way ANOVA.

So this fits – students should know how to plan multiple conditions and

analyze via ANOVA.

7.4 Engineering Examples for t-test and ANOVA

 Two-sample t-test example (industrial): Comparing a new

filtration unit’s efficiency vs the old unit. H₀: same mean. If p <

0.05, new unit significantly improves efficiency.

 Paired test example: Emissions measured from an engine on

regular fuel vs biodiesel for each of 10 vehicles. Each vehicle acts

as its own control -> paired t on emission difference (H₀: mean

diff=0).

 ANOVA example: A Nigerian brewery tests 4 different

fermentation temperatures to maximize alcohol content (with 5

pilot fermenters at each temp). After fermentation, measure alcohol

%. Use one-way ANOVA to see if temperature affects mean

alcohol%. If F is significant, perhaps do Tukey to find optimal


temp (like maybe 20°C vs 25°C vs 30°C vs 35°C – often one finds

an optimum region).

 ANOVA in DOE context: This is essentially what design of

experiments use – if factors have multiple levels, ANOVA is used

to see factor significance. Chapter 9 will expand on factorial

design ANOVA, but one-way ANOVA is conceptually the starting

point (one factor with k levels).

7.5 Communicating Results

Reporting results might be like: “The two-sample t-test shows a

statistically significant increase in mean conversion with Catalyst A

(mean 85%) compared to Catalyst B (80%), p=0.045, at 95% confidence

level.” Or for ANOVA: “ANOVA results (F(2,12) = 7.5, p = 0.008)

indicate a significant effect of temperature on yield. Post-hoc analysis

reveals the yield at 50°C (mean 90%) is significantly higher than at 40°C

(mean 82%, p<0.01), whereas 45°C (88%) is not significantly different

from 50°C or 40°C at α=0.05.”


This gives both statistical and practical context.

Chapter Summary: This chapter introduced the tools for comparing

means across groups – a vital part of analyzing experiments and process

changes. We learned that:

 Two-sample t-tests allow comparison of two independent samples

(or paired for dependent samples).

 ANOVA generalizes this to multiple groups, using F-test to avoid

inflating Type I error from multiple t-tests.

 We should check assumptions (normality, equal variances) to

validate our tests or use appropriate alternatives.

 Significant results tell us a difference exists, and engineers then

consider if it's practically meaningful.

These methods are the bridge to experimental design analysis. In the

next chapter, we will go deeper into designed experiments with multiple

factors, where ANOVA becomes even more central (with factorial


designs, interaction effects, etc.), and we will also see how to use

software like Design-Expert for DOE analysis.

End-of-Chapter Exercises:

1. Two-sample vs Paired: You want to test a new antiscalant

chemical in cooling towers. You measure scale deposition rate in

10 towers for a month without the chemical, then in the same 10

towers for the next month with the chemical. (a) Should you use a

paired test or two-sample test? Why? (b) What are H₀ and H₁? (c)

If the average reduction in scale is 15% with p = 0.03 from the test,

interpret this result.

2. Manual two-sample t: Sample1: n1=12, X̄₁ = 50, s1=4; Sample2: n2=10, X̄₂ = 46, s2=5. Test H₀:

μ1=μ2 vs H₁: μ1≠μ2 at α=0.05. (a) Compute the t statistic (use

Welch’s formula for S_diff). (b) Approximate the degrees of

freedom (using formula or a rule of thumb; many textbooks give df

≈ min(n1-1,n2-1) as a rough lower bound). (c) Determine if p is


roughly <0.05 or not. (d) Would pooled t likely differ much here

(check ratio of variances)?

3. ANOVA concept: Three machines produce a part. You sample 10

parts from each and measure length. State H₀ and H₁ for ANOVA.

If p-value from one-way ANOVA is 0.001, what does that

conclude? Does it tell you which machine differs? How might you

find out?

4. ANOVA calc (small): Given groups data: A: [5,6,7], B: [5,5,6], C:

[7,8,9]. (a) Compute the group means. (b) Compute the overall mean. (c) Sketch or conceptually

compute SS_between and SS_within (you can do it by steps as

shown). (d) Determine df and MS for each. (e) Compute F. (f)

Based on F (and knowledge that F crit ~5.14 for df2,6 at 0.05), is

there a significant difference?

5. Levene’s test: Explain in simple terms why having very different

variances in different groups could be a problem for ANOVA or t-

tests. If you found variances significantly different (p<0.05 in

Levene’s test), what are two possible approaches to proceed?


(Hint: one is using a different test like Welch’s ANOVA, another

is transforming data like using log scale).

6. Use of software: Suppose you have the catalyst yield data:

Catalyst A yields = [83,85,88,79,90], Catalyst B yields =

[80,78,85,82,75]. Use a software’s two-sample t (or manually

compute) to answer: (a) What is the difference in means and 95%

CI for it? (b) What is the p-value? (c) Does it assume equal

variances or not? If using MINITAB, also note if an F-test for

equal variances is given and its result.

7. Practical interpretation: A researcher reports “F(3,20)=2.35,

p=0.095” for a four-group comparison. What does that mean in

terms of statistical conclusion at α=0.05? And α=0.10? What might

the researcher consider doing (in terms of sample size or

experiment design) if differences were practically important but

not quite significant?


Chapter 8: Experimental Design – Fundamentals of Factorial

Experiments and ANOVA Applications

Learning Objectives: Students will: (1) understand the principles of

experimental design – in particular factorial designs (varying multiple

factors simultaneously); (2) know how to set up a full factorial

experiment and analyze it using ANOVA to determine main effects and

interactions; (3) gain exposure to the concept of blocking and

randomization in experiments; (4) use software (Design-Expert or

MINITAB) to design and analyze a factorial experiment, including

generating an ANOVA table and interpreting model coefficients.

(This chapter corresponds to planning and executing an experimental

program and analyzing it – a key learning outcome for the course. It

also aligns with the idea of weekly lab assignments where students

practice DOE in software.)

8.1 Why Design Experiments?


In chemical engineering, often we want to study the effect of several

factors (e.g. temperature, pressure, concentration) on an outcome (yield,

purity, conversion). Rather than vary one factor at a time (OFAT),

Design of Experiments (DOE) provides a structured, efficient

approach. DOE allows us to see interaction effects and get more

information from fewer runs.

Example scenario: You have a reactor and want to see how temperature

(high/low) and catalyst type (A/B) affect conversion. A full factorial

design with 2 factors at 2 levels each requires 2×2 = 4 experiments (plus

repeats possibly). That will show if, say, high temperature is better, A vs

B which is better, and if the effect of catalyst depends on temperature

(interaction).

Key design principles:

 Randomization: Run experiments in random order to avoid time-

related biases (like ambient conditions changing).

 Replication: Repeat runs to estimate experimental error (noise).


 Blocking: If some nuisance factor (like day or operator) might

affect results, structure experiment to block it (so each day runs all

conditions perhaps, then analyze day-to-day variation separately).

8.2 Full Factorial Designs and Notation

A full factorial design at two levels for k factors is 2^k runs (not

counting replicates or center points). At two levels, factors often coded

as -1 = low, +1 = high (coded units). For example, a 2² design:

 Run1: A=-1,B=-1 (low A, low B)

 Run2: A=+1,B=-1

 Run3: A=-1,B=+1

 Run4: A=+1,B=+1

These combinations are often visualized in a matrix or a cube for 3

factors (like Fig. 3.2 in NIST reference – a cube for 2³ design with

8 runs).

We can have 3-level factors, but that increases runs (3^k). Often we

stick to 2-level (for screening factors for significance) or fractional


designs to reduce runs (Chapter 10 might mention fractional, but let's

keep to full factorial here).

The advantage is we can estimate:

 Main effects: effect of changing one factor from low to high

(averaged over other factors).

 Interaction effects: e.g. AB interaction = does the effect of A

depend on level of B? Statistically, an interaction is present if the

difference in response between A high vs low is different when B

is high versus low.

ANOVA in DOE: We build a model:

y = \beta_0 + \beta_A x_A + \beta_B x_B + \beta_{AB} x_A x_B + \text{error}.

For coded factors, β_A is half the effect of A (half the difference between its two level means). β_{AB} captures the interaction (a difference in differences).


ANOVA can partition sums of squares into contributions for A, B, AB,

and error (if replicates exist). The p-value for each indicates if that effect

is significant.

Example (2² with replication): Suppose we run 4 combinations twice

each:

Low A, Low B yields: 50, 52

High A, Low B: 55, 54

Low A, High B: 60, 59

High A, High B: 70, 68

We can see trends: Increasing A (low→high) at low B: ~51→54.5 (effect +3.5). At high B: ~59.5→69 (effect +9.5). So A's effect seems bigger when B is high – an interaction.

Compute averages:

(Formally, the A effect = avg(high A) − avg(low A) = (54.5+69)/2 − (51+59.5)/2; with equal replication it is easiest to average the raw runs directly:)

Low A overall mean = (50+52+60+59)/4 = 55.25.


High A overall mean = (55+54+70+68)/4 = 61.75.

So main effect A ~ +6.5 increase.

B effect:

Low B mean = (50+52+55+54)/4 = 52.75.

High B mean = (60+59+70+68)/4 = 64.25.

B effect ~ +11.5 increase from low to high.

Interaction AB:

If no interaction, the effect of A at low B (which was +2.5) would equal

effect of A at high B (+9.5) – clearly not equal. We compute AB

interaction effect as (observed A effect difference between B levels)/2

presumably = (9.5-2.5)/2 = 3.5 (depending on contrast coding). But

conceptually, AB is significant if differences like that are big relative to

noise.

We would then run an ANOVA (pure-error df = 4 cells × (2−1) = 4 from the replicates) to confirm significance. Likely B is significant (the bigger effect), A probably significant, and AB possibly significant too.
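These effect calculations are mechanical enough to script; the Python sketch below recomputes them from the cell means of the example (illustrative only):

import numpy as np

# Cell means of the replicated 2x2 data above (two runs per cell)
ll, hl = np.mean([50, 52]), np.mean([55, 54])   # low B: low A, high A
lh, hh = np.mean([60, 59]), np.mean([70, 68])   # high B: low A, high A

A_effect = (hl + hh) / 2 - (ll + lh) / 2    # +6.5
B_effect = (lh + hh) / 2 - (ll + hl) / 2    # +11.5
AB_effect = ((hh - lh) - (hl - ll)) / 2     # (9.5 - 3.5)/2 = +3.0

print(A_effect, B_effect, AB_effect)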


Design-Expert/MINITAB usage: Software can generate runs (with

random order) and then analyze.

Design-Expert specifically is built for DOE:

 You specify factors (names, levels), choose design (full factorial,

etc.), it gives run order.

 After experiments, you input results and use Analyze to fit a model.

It gives ANOVA table, showing which factors have significant p

(often highlighting in bold), model R², etc. It also can show effect

plots (pareto chart of effects, interaction plots, etc.).

MINITAB:

 Use Stat > DOE > Factorial > Create Factorial Design (choose

factors, replicates, blocks).

 Then Stat > DOE > Factorial > Analyze Factorial Design, select

terms (A, B, interactions) to include, and get output including an

ANOVA and coefficients.


We encourage students to try a simple DOE in software as a tutorial

exercise, e.g. a 2³ with hypothetical data, to see how results are

displayed.

8.3 Planning an Experiment (Example with 2 Factors)

Case Study: A Nigerian paint manufacturer wants to improve drying

time of paint. Factors: Temperature (25°C vs 35°C) and Drying Agent

additive (No vs Yes). Full factorial 2², duplicate runs. They randomize

run order. After collecting drying time data, they do ANOVA:

 If factor Temp p<<0.05 (say high temp dramatically lowers drying

time) and additive p ~0.01 (additive also lowers time), and

interaction p ~0.3 (no strong interaction), then main effects are

additive. Interpretation: Both higher temp and additive

independently speed up drying, and their effects roughly add (no

unexpected synergy or conflict).

 If an interaction were significant, e.g. maybe additive works only

at higher temp, they'd see that in interaction plot (lines crossing).


 Based on stats, they decide to implement high temp + additive

(provided any practical constraints like cost or equipment allow).

Important LO: "make appropriate conclusions based on experimental

results; plan and execute experimental program". That implies students

should be able to interpret an ANOVA and identify which factors matter

and then pick optimal conditions accordingly.

8.4 One-Way vs Factorial ANOVA

One-way ANOVA is one factor with multiple levels (like testing 5

catalysts, single factor "catalyst" with 5 levels). Factorial ANOVA

(Two-way or more) includes interactions. If we replicate each cell, we

can also separate interaction and error.

ANOVA Table for Factorial Example: Taking our 2² example, the

table might look like:

Source   df                                  SS     MS       F      p
A        1                                   SSA    SSA/1    F_A    p_A
B        1                                   SSB    SSB/1    F_B    p_B
A*B      1                                   SSAB   SSAB/1   F_AB   p_AB
Error    N − (effects + 1); e.g. 8 runs −    SSE    MSE
         (3 effects + 1) = 4 df
Total    N − 1 = 7                           SST

Each SS can be computed via contrasts of the ±1-coded runs (e.g. SS_A = (contrast_A)²/N, where contrast_A = Σ(coded A)·y over all N runs), but we won't need that level of detail here.

The p-values tell significance. If the model has significant terms, one

might drop non-significant terms and refit (hierarchically – usually if

interaction not sig, drop it and re-estimate main effects with more df for

error).

Link to Communication: Students should present DOE results in reports or presentations (LO7: communicate results in a number of ways). Typical DOE result communication includes:


 Graphical: interaction plots, main effect plots (Design-Expert can

plot these).

 Numerical: "ANOVA showed factor A (p=0.002) and factor B

(p=0.0005) significantly affect yield, AB interaction (p=0.4) not

significant. Thus, factors act mostly independently. The high levels

of both factors gave highest yield (95%), compared to ~80% at low

settings. We recommend using high A and high B."

8.5 Lab Task: DOE in Software

A suggested tutorial: Use Design-Expert (or MINITAB) to create a 2³

design (3 factors at 2 levels, perhaps Temperature, Pressure,

Concentration in a reactor). Provide some hypothetical or actual data.

Then:

 Show how to input it, get ANOVA results, and identify which

factors or interactions are significant.


 Possibly illustrate a normal probability plot of effects (common in DOE analysis), which helps spot significant effects visually: effects that fall off the straight line through the bulk of the points are likely significant (see the sketch after this list).
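A minimal matplotlib/scipy sketch of such a plot for an unreplicated 2³ follows; the effect estimates below are hypothetical, chosen so that A and C stand out:

```python
# Normal probability plot of effects for an unreplicated 2^3 design.
# Inactive effects should fall on a straight line through the origin;
# active effects fall off it. Effect values are hypothetical.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

effects = {"A": 11.2, "B": -1.0, "C": 6.9, "AB": 0.6,
           "AC": -0.8, "BC": 1.3, "ABC": 0.4}

names = list(effects)
vals = np.array([effects[n] for n in names])
order = np.argsort(vals)

# Theoretical normal quantiles for the 7 ordered effects
probs = (np.arange(1, len(vals) + 1) - 0.5) / len(vals)
quantiles = stats.norm.ppf(probs)

plt.scatter(quantiles, vals[order])
for q, v, n in zip(quantiles, vals[order], np.array(names)[order]):
    plt.annotate(n, (q, v))
plt.xlabel("Normal quantile")
plt.ylabel("Effect estimate")
plt.title("Normal plot of effects (2^3, unreplicated)")
plt.show()
```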

Screenshot Example: We might include a screenshot of Design-

Expert’s ANOVA or an interaction plot.

Figure 8.1: 3D response surface from a Design-Expert RSM example,

illustrating how two factors (e.g., time and temperature) affect a

response (conversion). While this comes from a response surface design (the next chapter's topic), in a 2-factor factorial scenario an interaction can

sometimes be visualized as surfaces that are not parallel planes. Here,

the curvature indicates interaction and quadratic effects. In factorial

ANOVA, significant interactions would be identified by statistical tests

rather than smooth surfaces.

(The figure above comes from RSM, where a 3D plot is available. For a 2-level factorial, since the response is measured only at the corner points, we usually visualize interactions with interaction plots (lines). We show an RSM surface here as a visual aid, acknowledging that RSM is covered formally in the next chapter.)

8.6 Measuring Experimental Error and Lack of Fit

If replicates are done, we get an estimate of pure error (the variation observed when the same conditions are repeated), and ANOVA uses it as the denominator in the F-tests. With no replicates, error cannot be separated from the effects; one must either assume that certain effects (usually higher-order interactions) are negligible and pool them into an error estimate (risky), or add center points, etc. This is advanced material, likely outside the scope of an undergraduate introduction beyond encouraging replication.

Lack of fit vs pure error: Particularly in RSM designs, one can test whether a model (such as a linear model) fits well or whether there is unexplained curvature. A 2-level factorial cannot detect curvature from the corner runs alone, but adding center runs allows a lack-of-fit test. This may be too advanced here, but to mention it briefly: if center points are included and their mean differs significantly from what the linear model predicts (p < 0.05 in the LOF test), that signals curvature, pushing toward a response surface design (the next chapter's topic).
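As an illustration of the idea, here is a minimal sketch of the standard single-degree-of-freedom curvature test for a 2² design augmented with replicated center points; all response values are hypothetical:

```python
# Curvature check using center points in a 2^2 design.
# Responses are hypothetical: 4 factorial corner runs + 5 center runs.
import numpy as np
from scipy import stats

y_factorial = np.array([78, 85, 80, 90])   # corner-point responses
y_center = np.array([86, 88, 85, 87, 86])  # center-point replicates

nF, nC = len(y_factorial), len(y_center)
yF_bar, yC_bar = y_factorial.mean(), y_center.mean()

# Single-df sum of squares for curvature (standard DOE formula)
ss_curv = nF * nC * (yF_bar - yC_bar) ** 2 / (nF + nC)
# Pure error estimated from the replicated center points
ss_pe = np.sum((y_center - yC_bar) ** 2)
ms_pe = ss_pe / (nC - 1)

F = ss_curv / ms_pe
p = stats.f.sf(F, 1, nC - 1)
print(f"F = {F:.2f}, p = {p:.3f}  (p < 0.05 suggests curvature)")
```

With these numbers, F ≈ 17 on (1, 4) df (p ≈ 0.015): the center-point mean sits significantly above the plane through the corner runs, signalling curvature and motivating a response surface design.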

8.7 Example from Nigerian Industry

Suppose a petrochemical plant wants to maximize distillation throughput by adjusting reflux rate (low/high) and reboiler heat input (low/high). They run a 2² factorial with replicates. ANOVA shows a significant interaction: increasing reflux helps only when reboiler heat input is also high; otherwise it has little effect. The best operating point is therefore both factors at the high level (the combination outperforms either change alone), and they implement that in the plant.
Another scenario: A lab in a Nigerian university tests how pH (6 vs 8)

and stirring speed (100 vs 300 rpm) affect biodiesel yield from palm oil.

They run the DOE and find that stirring speed has a large effect, pH has little effect, and there is no interaction. Conclusion: focus on optimizing mixing.

Chapter Summary: We have introduced the structured approach of

factorial experiments and how ANOVA is used to interpret them,

identifying which factors significantly impact an outcome and whether


factors interact. Students should now appreciate why planning an

experiment with multiple factors is more efficient than changing one

factor at a time, and how statistical analysis of the results gives objective

conclusions. In practice, DOE is a powerful tool for process

development and optimization, and software tools like Design-Expert

can greatly aid in both design and analysis (with visual aids like 3D plots

to help interpret results).

In the next chapter, we will extend these ideas to more advanced designs

(such as response surface methodology for optimizing a process via

quadratic models) and discuss more complex scenarios (such as screening many factors with fewer runs via fractional factorials, or handling factors with more than two levels).

End-of-Chapter Exercises:

1. Factorial planning: You have 3 factors to test (Catalyst type,

Temperature level, and Solvent type). Each can be at 2 levels. (a)

How many runs are in a full factorial 2³? (b) Why is it generally
better to do this factorial design than to keep one factor constant

and vary others one by one (consider interactions)? (c) If resources

only allow 8 runs (which is 2³ exactly) with no replicates, what

assumption are you making about error or about the significance of

effects?

2. Identifying interactions: Consider a 2² experiment with factors X

and Y. The results (avg responses) are: Low X, Low Y = 10; High

X, Low Y = 15; Low X, High Y = 20; High X, High Y = 30. (a)

Calculate the effect of X at low Y and at high Y. (b) Is there an

interaction? (c) Plot an interaction plot (sketch X on x-axis with

two lines for Y levels).

3. ANOVA table understanding: In a 2² with 3 replicates (so 12

runs), how many degrees of freedom for error do you get? If an

ANOVA produced F-statistics for factor A as F=10 (p=0.005),

factor B as F=0.5 (p=0.49), and interaction AB as F=0.2 (p=0.66),

which terms are significant? What would you conclude about

factor B and the interaction?


4. Software DOE: Use a software or manual calculation to analyze

the following factorial data (2 factors A and B at 2 levels, single

replicate):

Run (A,B): (-,-)=50; (+,-)=55; (-,+)=52; (+,+)=60.

(a) Compute main effects of A and B. (b) Compute the interaction effect (hint: AB effect = ([(+,+) + (−,−)] − [(+,−) + (−,+)]) / 2).

(c) Which effect seems largest? (d) If you had an estimate of error

with s_e = 2, would the largest effect be significant (say t =

effect/(s_e*0.5) > 2)?

5. Practical DOE design: Outline a small experiment (either real or

hypothetical) in your field where you would use a factorial design.

Describe factors and levels, and what response you measure. How

would randomization be done? After the experiment, assume you

got some results - describe qualitatively how you'd determine

which factors matter (what would you look at in the data or

analysis).

6. Linking with previous chapters: Why is randomization important

in DOE in terms of statistical assumptions? (Hint: if runs weren’t


randomized, how could that violate independence or introduce

bias?) Also, when we replicate runs and do ANOVA, which

previous concepts are used to assess if differences are real or just

due to variability? (Hint: think hypothesis testing – ANOVA’s F-

test is essentially testing H₀: all means equal.)

Chapter 9: Advanced Experimental Design – Response Surface

Methodology and Optimization

(Chapter 9 and beyond introduce more advanced topics, including RSM, optimization, and big data analytics. Due to length, we provide only a brief outline here, and note that the content continues in the full textbook...)

(The textbook would continue with chapters on response surface designs

(Box-Behnken, Central Composite), discussion of big data analytics in

chemical engineering (Chapter 11), cloud computing applications

(Chapter 12), and finally communication of statistical findings (Chapter


13), each with theory, examples, and exercises, following the

comprehensive approach outlined.)
