Quality and Lean: statistical thinking
Paolo Carbone, Antonio Moschitta
1
Goals
Quality of design intermixed with quality of
performance
Introduce practical statistical techniques to
address quality related issues
Techniques common to ISO 9001, Six Sigma,
lean approaches
2
Statistical thinking
Quality and statistics are not separable!
Bell Systems first used statistics in 1903
Became popular starting in 1920
3
Key point
Processes/products(services) produce data
Quality managers/six sigma experts use powerful
statistical techniques to monitor processes
To account for possible variability, random variables are
used to model the data
Processes are considered stable (in control) if
variability and central tendency are stable over time
Quality is inversely related to variability!
4
Deming’s viewpoint
Variation exists in all systems. If stable, it can be
predicted
Variation is in the process, not necessarily induced by
people
Deming: ‘Numerical goals are meaningless’; they
generate fear
Managers are responsible for the system
5
Analyzing data
• Production of goods/services generates lots
of data
• Set value goals for your analysis
• Use parsimony principle (Occam’s razor)
• Need to know methods
• We concentrate on: estimation problems,
hypothesis testing, ANOVA and control
charts
6
Statistics
Descriptive (set of methods to appreciate and
summarize data behavior)
Inferential/mathematical statistics (methods to make
inferences and predict a system's future behavior)
7
Example of descriptive
methods
central tendency
indicators
dispersion
indicators
graphical methods
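As a minimal sketch of the first two families above (central tendency and dispersion indicators), the following uses only Python's standard library. The data values and the helper name `describe` are hypothetical and chosen purely for illustration.

```python
import statistics

# Hypothetical measurements (illustrative values, not taken from the slides).
data = [9.8, 10.1, 10.0, 10.3, 9.9, 10.2, 10.0, 9.7]

def describe(values):
    """Return basic central-tendency and dispersion indicators."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std": statistics.stdev(values),   # sample standard deviation (divides by N - 1)
        "range": max(values) - min(values),
    }

summary = describe(data)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

A histogram or box plot of the same data would be the natural graphical counterpart.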
8
More examples
9
Basics - review
• Random variables:
• Cumulative distribution function (CDF)
• Properties:
10
Basics - review
• Probability density function:
11
Basics - review
• Expectation operator
• If
12
Basics - review
• Statistical Moments
• Central moments
13
Basics - review
• Other important moments
• Variance
• Standard deviation
14
Gaussian rv
• PDF:
15
Gaussian rv
• CDF
Other important pdf’s:
- Poisson
- binomial
- exponential
- uniform
16
Basics - review
• Why is Gaussian PDF so important for
practical purposes?
• Central limit theorem (CLT)
• In practice: many independent concurrent
causes with comparable intensities may result
in overall Gaussian behavior
• e.g. tolerance in products, measurement
errors
• Measurement errors? Current practices tend
to avoid using these terms
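The CLT statement above can be checked numerically. The sketch below sums independent Uniform(0,1) values (an arbitrary choice of non-Gaussian cause), standardizes the sum, and compares the fraction of results within one standard deviation against the Gaussian value of about 0.683. The function names, seed, and trial count are assumptions made for this illustration.

```python
import random
from statistics import NormalDist

def standardized_sum(n_terms, rng):
    """Sum n_terms independent Uniform(0,1) values and standardize the result."""
    total = sum(rng.random() for _ in range(n_terms))
    mean = n_terms * 0.5                 # mean of the sum
    std = (n_terms / 12.0) ** 0.5        # std of the sum (variance of U(0,1) is 1/12)
    return (total - mean) / std

def fraction_within_one_sigma(n_terms, n_trials=20000, seed=0):
    """Empirical probability that the standardized sum lies in [-1, 1]."""
    rng = random.Random(seed)
    hits = sum(abs(standardized_sum(n_terms, rng)) <= 1.0 for _ in range(n_trials))
    return hits / n_trials

gaussian_reference = NormalDist().cdf(1.0) - NormalDist().cdf(-1.0)
print("Gaussian reference:", round(gaussian_reference, 4))
for n_terms in (1, 2, 12):
    print(n_terms, "terms:", fraction_within_one_sigma(n_terms))
```

With a single term the fraction is far from the Gaussian value; with a dozen comparable terms it is already close.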
17
Practical issues
• Possible problems:
• Are experimental results the outcome of a
Gaussian rv?
• If the data do originate from a Gaussian
distribution, what are their mean value and
standard deviation?
• Process must first be characterized under
nominal behavior, controlled/monitored for
verification of operations and analyzed for
improvements
18
Methods
• Probability theory vs statistical inference
• Deductive vs inferential approach
• What is of interest for us? Both approaches
• Cannot do statistical inference without
knowing probability theory (PT)
• Cannot control/improve processes without
applying both approaches
• Assume you know basic PT
19
Data analysis for quality control:
4 tools/techniques
• Estimators
• Hypothesis testing
• ANOVA
• Control charts
20
Basics of statistics
• Two classes of typical problems:
• Estimation - Hypothesis testing
• Parametric vs non-parametric
• Estimation example (Gaussian rv, unknown
mean value):
• How do we estimate 𝜃?
21
Basics of statistics
• Search and find a statistic (a way to select and
combine observations) that provides us with
the ‘best’ approximation
• Best?
• Which criteria?
• Common: unbiasedness (correctness), minimum variance,
robustness, computational complexity
• Example: mean estimator as first value in an
ordered sequence
• Is this a reasonable choice?
22
An example
• Estimator is a rv!
• Unbiasedness (correctness):
• Estimator Variance:
• Minimum variance of unbiased estimators...
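As a worked illustration of these properties, assume (this is an assumption, since the estimator used on the slide is not recoverable here) that the estimator is the sample mean of N independent observations x_i, each with mean θ and variance σ². Then:

```latex
\hat{\theta} = \frac{1}{N}\sum_{i=1}^{N} x_i
\qquad\Rightarrow\qquad
E[\hat{\theta}] = \frac{1}{N}\sum_{i=1}^{N} E[x_i] = \theta ,
\qquad
\operatorname{var}[\hat{\theta}] = \frac{1}{N^{2}}\sum_{i=1}^{N} \operatorname{var}[x_i] = \frac{\sigma^{2}}{N}
```

The estimator is therefore unbiased and its variance shrinks as 1/N. For Gaussian data, σ²/N coincides with the Cramér-Rao lower bound, so no unbiased estimator of the mean has smaller variance.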
23
Statistical efficiency
• How well are we using information in our
data?
24
Hypothesis testing
• An example: you must devise a procedure to
accept/reject lots of material
provided by an external supplier
• Three approaches:
• verify all products
• do not verify any product
• verify a subset of the products
• Which implications?
25
Risks
• Producer’s risk: rejecting conforming products
• Consumer’s risk: accepting nonconforming
products
• Example: bipolar transistors (BJT) are sold to
our company. Requirement: mean current
gain = 100
• We know that each BJT’s current gain is a rv
26
Test
• Set of hypotheses
• H0: null hypothesis, H1: alternative hypothesis
• Decide on the basis of a ‘decision rule’
• Suggestions?
• Take a mean estimator
• If the estimate is above a certain threshold, decide
for H0; otherwise decide for H1.
27
Errors
• Producer’s and consumer’s risks
• Define
• We have
28
Results
If S is the discrimination threshold
• that is the quantile of level alpha of Gaussian
rv with zero mean and unity variance
• Quantile $q_\alpha$ such that $F(q_\alpha) = \alpha$
29
Test design
• The threshold becomes:
• Similarly:
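A hedged numerical sketch of this kind of threshold design follows. It assumes a one-tailed test on the mean of N Gaussian observations with known standard deviation, H0: θ = θ0 against H1: θ = θ1 < θ0, with H0 retained when the sample mean is above the threshold S. The values σ = 10, N = 25 and θ1 = 95 are hypothetical; only θ0 = 100 comes from the BJT example.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # zero mean, unit variance

def threshold(theta0, sigma, n, alpha):
    """Threshold S such that P(sample mean < S | H0) = alpha."""
    q_alpha = STD_NORMAL.inv_cdf(alpha)      # quantile of level alpha (negative for alpha < 0.5)
    return theta0 + q_alpha * sigma / n ** 0.5

def type_two_error(theta1, sigma, n, s):
    """beta = P(sample mean >= S | H1), i.e. accepting H0 when H1 holds."""
    return 1.0 - STD_NORMAL.cdf((s - theta1) / (sigma / n ** 0.5))

# Hypothetical design values (sigma, n and theta1 are assumptions).
S = threshold(theta0=100.0, sigma=10.0, n=25, alpha=0.05)
beta = type_two_error(theta1=95.0, sigma=10.0, n=25, s=S)
print(f"S = {S:.3f}, beta = {beta:.3f}")
```

Here α plays the role of the producer's risk and β that of the consumer's risk.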
30
See it
31
Test design
• Set of hypotheses
• Statistics
• Region of acceptance/rejection
• Decision rule
• If H0 is rejected, the test is significant
32
Test parameters
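• Results depend on the test parameters, including $N$, $\alpha$ and $\beta$
• If N is fixed, when $\alpha$ increases $\beta$ decreases,
and vice versa
• If N increases, both $\alpha$ and $\beta$ can decrease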
• Is it reasonable/possible to increase N
arbitrarily?
33
The OCC
• Operating characteristic curve:
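One way to tabulate an operating characteristic curve is sketched below, under the assumption of a one-tailed test on a Gaussian mean with known standard deviation in which H0 is accepted when the sample mean exceeds the threshold. The function name `oc_curve` and all numerical values except θ0 = 100 are hypothetical.

```python
from statistics import NormalDist

def oc_curve(theta_values, theta0, sigma, n, alpha):
    """Return (theta, P(accept H0 | true mean = theta)) pairs."""
    std = NormalDist()
    se = sigma / n ** 0.5                        # standard deviation of the sample mean
    s = theta0 + std.inv_cdf(alpha) * se         # acceptance threshold
    return [(theta, 1.0 - std.cdf((s - theta) / se)) for theta in theta_values]

# Hypothetical design: sigma = 10, N = 25, alpha = 0.05.
curve = oc_curve([90, 92, 94, 96, 98, 100], theta0=100.0, sigma=10.0, n=25, alpha=0.05)
for theta, p_accept in curve:
    print(f"theta = {theta:5.1f}  P(accept H0) = {p_accept:.3f}")
```

At θ = θ0 the curve equals 1 − α; away from θ0 it gives the type-II error probability for that true mean.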
34
Types of tests
• Several possibilities, which implications?
• One-tailed versus two-tailed tests
35
Two-tailed test
• Involved pdf’s
36
Decision rule
• Need two thresholds:
• total type-I error probability:
• We obtain:
37
Operating Characteristic curve
• OCC in this case:
• or
• where:
38
Unknown variance
• Gaussian distribution, unknown variance,
inference on the mean, two tailed test:
• Estimate standard deviation as:
39
Unknown variance: test design
• Statistics:
• If H0 is true, t is a Student-t rv with N-1 df:
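A small sketch of this test, assuming the usual statistic t = (sample mean − θ0)/(s/√N) with s computed using N − 1 in the denominator. The gain values are invented for illustration, and the critical value 2.262 is the tabulated Student-t quantile for 9 degrees of freedom at a two-tailed significance level of 0.05.

```python
import math

def t_statistic(sample, theta0):
    """Student-t statistic for inference on the mean with unknown variance."""
    n = len(sample)
    mean = sum(sample) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))  # sample std
    return (mean - theta0) / (s / math.sqrt(n))

# Hypothetical current-gain measurements (N = 10), not from the slides.
gains = [101, 104, 99, 103, 105, 100, 102, 106, 98, 102]
T_CRITICAL = 2.262  # tabulated: 9 df, two-tailed alpha = 0.05

t = t_statistic(gains, theta0=100)
reject_h0 = abs(t) > T_CRITICAL
print(f"t = {t:.3f}, reject H0: {reject_h0}")
```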
40
p-value
• if the test is significant, nothing is said about how much
to trust the result. We need additional information
• p-value is the smallest significance level at
which H0 can be rejected
• alternatively: observed significance level
• the smaller the p-value, the stronger the
evidence against H0
• often reported to leave results interpretation
to the report reader
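As an illustration, the p-value of a two-tailed test on a Gaussian mean with known standard deviation can be computed from the standardized statistic z. This is a sketch under that assumption; the function name is hypothetical.

```python
from statistics import NormalDist

def two_tailed_p_value(z):
    """Smallest significance level at which H0 is rejected, given statistic z."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

for z in (1.0, 1.96, 3.0):
    print(f"z = {z:4.2f}  p-value = {two_tailed_p_value(z):.4f}")
```

A statistic of z = 1.96 sits right at the conventional 5% level, while z = 3 gives a p-value below 0.3%.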
41
Probability plot
• Technique to test data pdf
• Non-parametric hypothesis test
• Based on the quantile-quantile plot
• Example with Gaussian quantiles of level p:
• zp versus zp provides a straight line graph
42
Gaussian probability test
• what if one axis is an estimated zp?
• Graph is a straight line only if data follows the
pdf used to obtain the quantile on the other
axis (e.g. Gaussian)
• How to estimate quantiles from data?
• order statistic
43
Gaussian probability plot
• The associated quantile level is
• Theoretical quantile is:
• Procedure:
• collect N-sample record
• order data
• associate to each datum:
• build a sequence of points,
• plot the sequence on a Cartesian plot
44
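The procedure can be sketched as follows. The quantile level assigned to the i-th ordered datum is assumed here to be p_i = (i − 0.5)/N, which is one common convention; the formula used on the slide is not recoverable, and other choices exist. The function name is hypothetical.

```python
from statistics import NormalDist

def normal_probability_points(sample):
    """Return (theoretical Gaussian quantile, ordered datum) pairs."""
    ordered = sorted(sample)                 # order statistics
    n = len(ordered)
    std = NormalDist()
    points = []
    for i, x in enumerate(ordered, start=1):
        p = (i - 0.5) / n                    # assumed quantile level for the i-th datum
        points.append((std.inv_cdf(p), x))   # theoretical quantile z_p vs. estimated quantile
    return points

# Tiny illustrative record; plotting these points should give a near-straight line
# when the data are Gaussian.
points = normal_probability_points([3.0, 1.0, 2.0])
for z, x in points:
    print(f"z_p = {z:+.4f}  x = {x}")
```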
An example
• Approximately linear: data are approximately normal
45
Gaussian paper
• Evaluation of the theoretical quantiles requires computational
resources
• Possible to overcome this limit by using
normal paper, only need
• same graph as before, same results
46
Other pdfs
• Approach possible for major useful pdfs: Gaussian with arbitrary
mean and standard deviation, exponential, ...
• Predefined probability paper:
• Gaussian, same as before
• Exponential, different paper
• Same procedure: need only
47
ANOVA
• prove/disprove equality of mean values of
several populations
• gauge repeatability/reproducibility problem.
When taking measurements, variability may
be due to:
• repeatability conditions: same operator
carries out experiments under similar
conditions (variability due to fluctuations in
temperature/humidity, instruments,
measurand, ...)
48
ANOVA
• Reproducibility conditions: same experiments,
different conditions (laboratory, operators,
instruments)
• Is the total observed variability due to
repeatability conditions, reproducibility conditions, or
both?
• ANOVA can provide an answer
49
Data collection
• factor level: determines the reproducibility
context. For each level many experimental
repetitions are done
50
Data modeling
• Observations can be modeled as:
51
Fixed effects
• levels are fixed in number
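A minimal one-way, fixed-effects ANOVA sketch is given below. It assumes the standard model in which each observation is a common mean plus a level effect plus a random error, and it computes the F statistic as the ratio of between-level to within-level mean squares. The data (three operators, four repetitions each) are invented, and 4.26 is the tabulated F quantile for (2, 9) degrees of freedom at the 0.05 level.

```python
def one_way_anova_f(groups):
    """F statistic = (between-level mean square) / (within-level mean square)."""
    k = len(groups)                                   # number of factor levels
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)                 # reproducibility-related variability
    ms_within = ss_within / (n_total - k)             # repeatability-related variability
    return ms_between / ms_within

# Hypothetical measurements: one list per operator (factor level).
groups = [
    [10.1, 10.3, 9.9, 10.1],
    [10.4, 10.6, 10.5, 10.5],
    [9.8, 10.0, 9.9, 9.9],
]
F_CRITICAL = 4.26  # tabulated: (2, 9) df, alpha = 0.05
f_value = one_way_anova_f(groups)
print(f"F = {f_value:.2f}, means differ: {f_value > F_CRITICAL}")
```

An F value well above the critical value suggests that the level means are not all equal, that is, the operators contribute to the observed variability.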
52
Example
53
ANOVA results
54
Control charts
• Managing processes provides a large amount of data
• Use these data to make decisions about the process
status
• Many critical-to-quality parameters can be modeled
using random variables
• Decide whether the process is ‘in control’ or ‘out of
control’
• When ‘in control’, mean value and variance are
stationary, stable at predefined values
• When ‘out of control’, special causes of variability
occur
55
Control charts
• Owing to the Central Limit Theorem (CLT)
random variables are often assumed
Gaussian
56
Control charts
• Two phases:
• Characterize process, is it in-control? Does it
show natural tendencies?
• Decide upon
57
Example
• Gaussian rv,
• control mean value over time
• Using
• Two thresholds: LCL, UCL
58
Control chart
• As a result
59
Risks
• Sequential hypothesis testing
• Consumer’s and producer’s risks
• s = number of points plotted on the
chart before a type-I error occurs
• ARL = average run length, the expected value of s
• Type-I: $\mathrm{ARL}_0 = 1/\alpha$
60
Risks
• Similarly, type-II: $\mathrm{ARL}_1 = 1/(1-\beta)$
• Western Electric rules: more sensitive, but they
increase the type-I error probability
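The two run lengths can be evaluated numerically. The sketch below assumes a chart on a Gaussian statistic with symmetric k-sigma limits and, for the out-of-control case, a sustained shift of the mean expressed in units of the process standard deviation. Function and parameter names are hypothetical.

```python
from statistics import NormalDist

STD = NormalDist()

def arl0(k_sigma=3.0):
    """In-control average run length: 1 / alpha."""
    alpha = 2.0 * (1.0 - STD.cdf(k_sigma))       # probability of a false alarm per point
    return 1.0 / alpha

def arl1(shift_in_sigma, n=1, k_sigma=3.0):
    """Out-of-control average run length: 1 / (1 - beta), for subgroup size n."""
    d = shift_in_sigma * n ** 0.5                # shift in units of the statistic's std
    beta = STD.cdf(k_sigma - d) - STD.cdf(-k_sigma - d)   # point still inside the limits
    return 1.0 / (1.0 - beta)

print(f"ARL0 (3-sigma limits)      = {arl0():.1f}")
print(f"ARL1 (1-sigma shift, n=1) = {arl1(1.0):.1f}")
print(f"ARL1 (1-sigma shift, n=4) = {arl1(1.0, n=4):.1f}")
```

With 3-sigma limits a false alarm is expected roughly once every 370 points, and larger subgroups detect the same shift much sooner.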
61
Design of control chart
• rv parameters known or not known
• limits: ‘3-sigma’
• ‘probability’: using quantiles
• when parameters unknown, need estimators
62
Estimators: Gaussian case
• Preliminary analysis
63
Rational subgroup
• Data are grouped into smaller sets (subgroups)
so that variability within a subgroup is due only
to chance causes, while any difference between
subgroups may be due to the assignable causes
the chart has been designed to detect
• Example: 3 shifts in a day
64
Estimators
• Mean value(s)
• Standard deviation/variances (several
approaches)
• Based on range
65
Range-based estimation
• Properties
• Thus:
• Averaging the estimators Rm leads to a more
accurate estimator of
66
Range-based estimation
• NB: range-based estimators ignore the
information provided by samples in a rational
subgroup other than the minimum and the
maximum ones
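A sketch of range-based estimation follows, assuming the usual relation in which the expected range of a Gaussian subgroup of size n equals d2·σ, so that the average range divided by d2 estimates σ. The d2 values are the standard tabulated constants for n = 2 to 5; the data and function name are hypothetical.

```python
# Tabulated d2 constants (expected range of n standard Gaussian samples).
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}

def sigma_from_ranges(subgroups):
    """Estimate sigma as (average subgroup range) / d2."""
    n = len(subgroups[0])
    if any(len(g) != n for g in subgroups):
        raise ValueError("all rational subgroups must have the same size")
    ranges = [max(g) - min(g) for g in subgroups]   # only min and max are used
    r_bar = sum(ranges) / len(ranges)
    return r_bar / D2[n]

# Hypothetical data: two subgroups of size 5.
subgroups = [[1, 2, 3, 4, 5], [2, 2, 3, 5, 6]]
print(f"sigma estimate = {sigma_from_ranges(subgroups):.4f}")
```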
67
Improved efficiency
• Sample standard deviation in the m-th
subgroup:
• with
68
Correct estimator
• Remove bias:
• with
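The bias-removal constant is usually tabulated as c4 in control-chart tables. Assuming that is the constant meant here, it can also be computed directly from the gamma function for Gaussian data, as in this sketch.

```python
import math

def c4(n):
    """E[s] = c4(n) * sigma for the sample std s of n Gaussian observations."""
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2.0) / math.gamma((n - 1) / 2.0)

def unbiased_sigma(sample):
    """Sample standard deviation divided by c4 to remove the bias."""
    n = len(sample)
    mean = sum(sample) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    return s / c4(n)

for n in (2, 5, 10, 25):
    print(f"n = {n:2d}  c4 = {c4(n):.4f}")
```

The constant approaches 1 as n grows, so the correction matters mainly for small subgroups.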
69
Use all information
• Averaged estimator
• and
70
Alternatively
• Sample standard deviation applied to entire
matrix
• with
• ... however ... (between groups effects may
impact)
71
Variable control chart
• statistics:
• since statistics is
• 3-sigma limits
• Thus,
72
x-chart
• probability limits:
• if standard deviation unknown
73
Control chart
• Consequently
• and
• with tabulated
74
Range chart
• Used to monitor variability in the data
• statistics: range in rational subgroups
• Knowing that:
• limits are
• where
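A sketch of the resulting chart limits when the standard deviation is unknown, assuming the common tabulated-coefficient form: mean-chart limits at the grand mean ± A2 times the average range, and range-chart limits at D3 and D4 times the average range. The constants below are the standard table values for subgroups of size 5; the data and function name are hypothetical.

```python
# Tabulated coefficients for subgroup size n = 5.
A2, D3, D4 = 0.577, 0.0, 2.114

def xbar_r_limits(subgroups):
    """Return (LCL, center line, UCL) for the mean chart and the range chart."""
    if any(len(g) != 5 for g in subgroups):
        raise ValueError("coefficients in this sketch are valid for n = 5 only")
    means = [sum(g) / len(g) for g in subgroups]
    ranges = [max(g) - min(g) for g in subgroups]
    grand_mean = sum(means) / len(means)
    r_bar = sum(ranges) / len(ranges)
    return {
        "xbar": (grand_mean - A2 * r_bar, grand_mean, grand_mean + A2 * r_bar),
        "range": (D3 * r_bar, r_bar, D4 * r_bar),
    }

# Hypothetical data: two subgroups of size 5.
limits = xbar_r_limits([[1, 2, 3, 4, 5], [2, 2, 3, 5, 6]])
print("mean chart :", limits["xbar"])
print("range chart:", limits["range"])
```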
75
76
Example
• Given
• X-Chart limits:
• R-chart limits:
77
Charts
78
x-MR Control charts
• MR stands for Moving Range
• In some cases, the subgroup has size N=1
• Create fictitious ranges
• moving range defined as:
$MR_i = |x_i - x_{i-1}|, \quad i = 1, \ldots, M-1$
79
Control chart limits
• For the MR chart
• x-chart
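A sketch of the limits for individual observations, assuming the common design in which moving ranges of two consecutive values are used: the process standard deviation is estimated as the average moving range divided by d2 = 1.128, and the moving-range chart uses D3 = 0 and D4 = 3.267. The data and function name are hypothetical.

```python
D2_N2 = 1.128   # tabulated d2 for ranges of two observations
D4_N2 = 3.267   # tabulated D4 for ranges of two observations (D3 = 0)

def x_mr_limits(values):
    """Return (LCL, center line, UCL) for the individuals and moving-range charts."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    center = sum(values) / len(values)
    half_width = 3.0 * mr_bar / D2_N2            # 3-sigma limits with sigma = MR_bar / d2
    return {
        "x": (center - half_width, center, center + half_width),
        "mr": (0.0, mr_bar, D4_N2 * mr_bar),
    }

# Hypothetical individual measurements.
limits = x_mr_limits([10, 12, 11, 13, 12])
print("individuals chart :", limits["x"])
print("moving-range chart:", limits["mr"])
```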
80
Example
• Data in table represent the
atomic weight of a silver
sample measured at NIST
by means of a mass
spectrometer
• Design x-MR charts
81
Example
• MR-chart
• x-chart
82
Results
83
x̄-s Control charts
• When N >> 1, the range-based estimator is inefficient
• If the parameter to be monitored is $G(\theta, \sigma)$, then
84
If sigma unknown
• Estimator
• Limits
85
The -chart
• Limits
86
Example
A company provides electric energy and considers the frequency
at which energy is provided a critical parameter for determining
product quality. The value of 50 Hz represents the reference value.
Measurements are taken every 1.2 hours from 11 A.M. to 9.50 A.M. of the
following day, for N=8 consecutive days. Each subgroup is formed using
the N corresponding values measured at the same time of the day.
87
Charts
• -chart: and
• s-chart
88
Charts
89
Attribute charts
• Sometimes, parameters can be classified
using attributes (e.g. conforming/nonconforming)
• Want to monitor:
–probability of nonconformance/absolute number of
nonconforming entities
–number of defects in lots
90
p-chart and np-charts
• Number n of nonconforming products in N
tests
• Binomial rv with mean value $Np$ and variance
equal to $Np(1-p)$
• where p is the probability of nonconformance
in a single test.
91
p chart
• Statistics:
• mean value:
• Variance:
• When p is known (3-sigma limits):
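A sketch of the 3-sigma limits for the fraction nonconforming, assuming the binomial model: the fraction observed in N tests has mean p and variance p(1 − p)/N. Clipping the limits to the interval [0, 1] is a common practical convention rather than something stated on the slide. The function name and example values are hypothetical.

```python
import math

def p_chart_limits(p, n_tests):
    """Return (LCL, center line, UCL) for the fraction nonconforming."""
    sigma_p = math.sqrt(p * (1.0 - p) / n_tests)   # std of the observed fraction
    lcl = max(0.0, p - 3.0 * sigma_p)              # a fraction cannot be negative
    ucl = min(1.0, p + 3.0 * sigma_p)
    return lcl, p, ucl

# Hypothetical process: 10% nonconforming, subgroups of 100 tests.
lcl, center, ucl = p_chart_limits(p=0.1, n_tests=100)
print(f"LCL = {lcl:.3f}, CL = {center:.3f}, UCL = {ucl:.3f}")
```

Multiplying the three values by N gives the corresponding limits for the number of nonconforming items.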
92
np chart
• If you want to monitor the number of nonconforming
items, the statistic is x
• with
93
c,u control charts
• A product may be conforming
even if it contains defects
• subgroup = lot of N products = inspected unit
• distribution of defects in lot: Poisson rv
94
c,u control charts
• For a Poisson rv:
• statistic = number of defects in inspected unit
• limits in c chart
95
c control chart
• If c unknown, must be estimated
• average number of defects in several
inspected units, then
96
u control chart
• Sometimes we want to monitor the number of
defects per product, inspected in lots of N
products
• Statistic: Y=x/N, x=number of defects
• Expected value c/N and
• if u=c/N, limits:
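A sketch of 3-sigma limits for both defect charts, assuming the Poisson model in which the number of defects per inspected unit has mean and variance equal to c. The number of defects per product, Y = x/N, then has mean u = c/N and variance u/N. Setting negative lower limits to zero is a common convention; function names and values are hypothetical.

```python
import math

def c_chart_limits(c):
    """3-sigma limits for the number of defects per inspected unit (Poisson, mean c)."""
    half_width = 3.0 * math.sqrt(c)              # Poisson variance equals the mean
    return max(0.0, c - half_width), c, c + half_width

def u_chart_limits(c, n_products):
    """3-sigma limits for the number of defects per product, Y = x / N."""
    u = c / n_products
    half_width = 3.0 * math.sqrt(u / n_products)   # var(Y) = c / N**2 = u / N
    return max(0.0, u - half_width), u, u + half_width

# Hypothetical process: on average 4 defects per lot of 4 products.
print("c chart:", c_chart_limits(4.0))
print("u chart:", u_chart_limits(4.0, 4))
```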
97
Control chart design coefficients
98