Basic Statistics
1. Introduction to Statistics
2. Probability distributions
- Binomial distribution
- Poisson Distribution
- Normal distribution
3. Sampling distributions and Estimation.
1. The concept of Statistics
Why Statistics?
Through advances in electronics and computing,
today's society is inundated with vast amounts of data. In its
raw form, this data is of little use. With statistical
analysis, however, the data can be transformed into valuable
information. This information is vital for drawing
conclusions and making decisions.
"Statistical thinking will one day be as necessary for
efficient citizenship as the ability to read and write."
- H.G. Wells
The field of statistics can be broken into two major
areas: descriptive statistics and inferential
statistics.
Descriptive statistics: describes the
fundamental features of a set of data (population or
sample), such as the mean, median, and standard deviation.
Inferential statistics: deals with drawing
conclusions about a population based on information from a
sample drawn from that population.
Probability and Statistics
[Figure: diagram relating Population and Sample through Descriptive Statistics, Probability, and Inferential Statistics]
Data Collection
A decision can be no better than the data
upon which it was based.
Why do we need to collect data?
1. To identify and/or verify a problem.
2. To analyze a problem.
3. To understand, describe, or monitor a process.
4. To test a hypothesis.
5. To find a relationship between the inputs and outputs of a process.
Two kinds of Numerical Data
Continuous data: length, height, volume, etc.
Discrete data: number of defects, number of
failures, etc.
Population and Sample
Population is a set or collection of all possible
objects or individuals of interest.
Finite population, e.g., the employees of
Samsung Electro-Mechanics as of January 1, 2001.
Infinite population, e.g., MLCC chips coming off the
production line.
A sample is any subset of a
population.
A random sample of size n is a sample chosen
in such a way that every possible sample of size n has an
equal chance of being chosen (unbiased).
The true population parameters are rarely known,
so conclusions must be drawn from
sample statistics.
Characteristics of distribution
Statistical analysis detects the characteristics of a data distribution
and expresses those characteristics as numbers.
Central tendency (mean, median, mode)
- Shows the location around which the data are centered.
Variation (range, variance, standard deviation)
- Shows how widely the data are scattered around the arithmetic mean.
Shape
- Shows the direction in which the data are skewed.
Central tendency
Mode
Most frequently occurring value in a data set.
Median
The value at the 50% rank of a sorted set of values.
1) Odd number of observations: the middle value
2) Even number of observations: (sum of the two middle values)/2
Mean (arithmetic mean)
Population mean:
\mu = \frac{X_1 + X_2 + X_3 + \cdots + X_N}{N} = \frac{\sum_{i=1}^{N} X_i}{N}
Sample mean:
\bar{X} = \frac{X_1 + X_2 + X_3 + \cdots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n}
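The three measures of central tendency can be checked with a short Python sketch. The data values are hypothetical and chosen only for illustration; Python is used here as a convenient stand-in and is not part of the original material.

```python
import statistics

# Hypothetical measurements, chosen only for illustration.
data = [4.0, 7.0, 7.0, 9.0, 10.0, 12.0]

# Mode: the most frequently occurring value.
mode = statistics.mode(data)

# Median: with an even number of observations this is the
# average of the two middle values, (7 + 9) / 2.
median = statistics.median(data)

# Arithmetic mean: sum of all X_i divided by n.
mean = sum(data) / len(data)

print("mode:", mode, "median:", median, "mean:", round(mean, 4))
```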
Variability
Range
Numerical distance between the highest and the lowest
values in a data set.
Variance and Standard deviation
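Population variance:
\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}
Sample variance:
S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}
Population standard deviation:
\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}
Sample standard deviation:
S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}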
The arithmetic mean is expressed in the original units of the data, while the variance is expressed in squared units. Taking the square
root of the variance gives the standard deviation, which is back in the original units. In sample statistics, the variance loses 1
degree of freedom because the sample mean is estimated from the same data, so the sample variance uses n-1
as its divisor.
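The difference between the divisor N (population) and the divisor n-1 (sample) can be seen in a small Python sketch. The data are hypothetical, and the standard-library `statistics` functions are used as a stand-in for the hand formulas.

```python
import statistics

# Hypothetical measurements, chosen only for illustration.
data = [4.0, 7.0, 7.0, 9.0, 10.0, 12.0]
n = len(data)

# Range: distance between the highest and lowest values.
data_range = max(data) - min(data)

# Population formulas divide the sum of squared deviations by N.
pop_var = statistics.pvariance(data)
pop_sd = statistics.pstdev(data)

# Sample formulas divide the same sum by n - 1.
samp_var = statistics.variance(data)
samp_sd = statistics.stdev(data)

print("range:", data_range)
print("population variance:", round(pop_var, 4), "sd:", round(pop_sd, 4))
print("sample variance:", round(samp_var, 4), "sd:", round(samp_sd, 4))
```

Because both versions share the same sum of squared deviations, the sample variance is always the larger of the two by the factor n/(n-1).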
Comparison of symbols between parameters and statistics
Value                     Population parameter   Sample statistic
Number of elements        N                      n
Mean                      \mu                    \bar{X}
Variance                  \sigma^2               S^2
Standard deviation        \sigma                 S
Correlation coefficient   \rho                   r
Regression coefficient    \alpha, \beta          a, b
Error                     \varepsilon            e
2. Probability Distribution
The probability distribution is the major pillar of the bridge that allows us to make
inferences about a population based on information
obtained from a sample.
The probability distribution of a
discrete random variable is an assignment of
probabilities to each of the possible values that
the random variable can take on. Its
mathematical model is the probability mass function
(for a continuous random variable, the probability density function).
(1) Binomial distribution
Used for problems such as determining the probability of finding a given
number of defectives.
A binomial distribution needs to satisfy the following
conditions:
1) A sequence of n Bernoulli trials (only two possible
outcomes).
2) Trials are identical.
3) Trials are independent.
4) The probability of success is the same on every trial.
Example
<Problem>
In a certain diode manufacturing process, the defective rate is known to
be 1%. When the inspector takes a random sample of 50 every hour, what is
the probability of finding no more than 1 defective?
<Solution>
The solution is obtained by adding the probability of finding no defectives
and the probability of finding exactly one.
First, we find the probability of finding no defectives.
From the Minitab menu:
Calc>Probability Distributions>Binomial
This is where all the probability distributions can be found.
Probability of finding no defectives
[Screenshot: Binomial Distribution dialog, with callouts marking the number of random samples, the defective rate, and the number of defectives (none)]
Result in Session window
[Screenshot: Session window output, with callouts marking the defective rate of 1% and the number of random samples]
The probability of no defectives is 0.6050.
Next, the probability of one defective
In this case, we enter 1 as the number of defectives.
The result is 0.3056.
Total probability:
0.6050 + 0.3056 = 0.9106
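The same figures can be reproduced outside Minitab. The sketch below is a Python stand-in (not part of the original Minitab workflow) that evaluates the binomial formula P(X = x) = nCx p^x (1-p)^(n-x) for the diode example; the function name `binomial_pmf` is chosen here for illustration.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x defectives in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n = 50    # sample size taken every hour
p = 0.01  # defective rate of 1%

p0 = binomial_pmf(0, n, p)  # no defectives
p1 = binomial_pmf(1, n, p)  # exactly one defective
total = p0 + p1             # no more than one defective

print(f"P(X=0) = {p0:.4f}")
print(f"P(X=1) = {p1:.4f}")
print(f"P(X<=1) = {total:.4f}")
```

The three values agree with the Minitab results of 0.6050, 0.3056, and 0.9106 to four decimal places.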
Another way to calculate, using a worksheet
Prepare the following worksheet:
- Enter the numbers of defectives in C1 (named x).
- Prepare a column for the probabilities (named p).
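From the Minitab menu:
Calc>Probability Distributions>Binomial
[Screenshot: Binomial Distribution dialog, with a callout ("We use this") marking the option that takes the worksheet column as input]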
The result:
[Screenshot: worksheet with the probability of no defectives and the probability of one defective in column p]
The final answer is the sum of the two probabilities.
To find the cumulative probability in one step
Select the Cumulative Probability option in the dialog.
Understanding of Binomial Distribution
The binomial probability distribution is defined by
P(X = x) = {}_{n}C_{x} \, p^{x} (1-p)^{n-x}
where
{}_{n}C_{x} = \binom{n}{x} = \frac{n!}{x! \, (n-x)!}
The binomial distribution is used frequently in quality control. It is
an appropriate probability model for sampling from an infinitely large
population, where p represents the defective rate and x the number of
defectives in a sample of n.
The control chart for defectives is based on the binomial
distribution, using its mean np and variance np(1-p).
The property of binomial distribution
1) For p = 0.5, the probability distribution is always symmetric, even when n is small.
[Figure: Binomial distribution for n = 4, p = 1/2; vertical axis P(X), marked in sixteenths from 1/16 to 6/16]
2) As n increases, the probability distribution approaches symmetry even when p is not 0.5.
[Figure: Form of binomial distribution, n = 9, p = 1/3; vertical axis P(X) from 0.1 to 0.3, horizontal axis x = 0 to 9]
Expectation value, variance, and standard deviation of the binomial distribution (q = 1 - p)
Expectation value: \mu = E(X) = np
Variance: \sigma^2 = Var(X) = np(1-p) = npq
Standard deviation: \sigma = \sqrt{np(1-p)} = \sqrt{npq}
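As a check on the formulas E(X) = np and Var(X) = np(1-p), the following Python sketch computes the mean and variance directly from the probabilities for the two cases shown (n = 4, p = 1/2 and n = 9, p = 1/3). It is an illustrative stand-in; the helper names are chosen here and are not from the original material.

```python
from math import comb, sqrt

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def moments(n, p):
    """Mean and variance computed directly from the probabilities."""
    pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]
    mean = sum(x * px for x, px in enumerate(pmf))
    var = sum((x - mean) ** 2 * px for x, px in enumerate(pmf))
    return pmf, mean, var

# Case 1: n = 4, p = 1/2 (symmetric even though n is small).
pmf4, mean4, var4 = moments(4, 0.5)

# Case 2: n = 9, p = 1/3 (not symmetric, since p is not 0.5).
pmf9, mean9, var9 = moments(9, 1 / 3)

print("n=4:", [round(v, 4) for v in pmf4], mean4, var4)
print("n=9: mean", round(mean9, 4), "variance", round(var9, 4),
      "sd", round(sqrt(var9), 4))
```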
(2) Poisson distribution
A Poisson distribution describes counts of the form
"the number of occurrences per unit interval".
Occurrences: a defect, an electrical or mechanical
failure, an arrival, a call, etc.
Intervals: time, space, area, etc.
Example
<Problem>
Suppose that the number of wire-bonding defects per unit that
occur in a semiconductor device is Poisson distributed with
mean = 4. What is the probability that a randomly selected
semiconductor device will contain two or fewer wire-bonding
defects?
From the Minitab menu:
File>New>Minitab Worksheet
In the worksheet, make one
column for the number of defects (x)
and another column for the
cumulative probability (p).
Calc>Probability Distributions>Poisson
1. Select Cumulative.
2. Set Mean = 4.
3. Enter the defect number column as input and the
probability column as output.
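[Screenshot: worksheet results. Column p holds the probability of no defect (x = 0), the cumulative probability of 0 or 1 defects (x = 1), and the cumulative probability of 0, 1, or 2 defects (x = 2). The last value answers the problem.]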
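The cumulative probabilities for the wire-bonding example can also be computed by hand. The Python sketch below is a stand-in for the Minitab steps; it sums the Poisson probabilities e^(-m) m^x / x! with mean m = 4, and the function names are chosen here for illustration.

```python
from math import exp, factorial

def poisson_pmf(x, m):
    """Probability of exactly x occurrences when the mean is m."""
    return exp(-m) * m**x / factorial(x)

def poisson_cdf(x, m):
    """Probability of x or fewer occurrences (cumulative probability)."""
    return sum(poisson_pmf(k, m) for k in range(x + 1))

m = 4  # mean number of wire-bonding defects per unit

cumulative = {x: poisson_cdf(x, m) for x in (0, 1, 2)}
for x, prob in cumulative.items():
    print(f"P(X <= {x}) = {prob:.4f}")
```

The value for x = 2, about 0.2381, is the probability that a device contains two or fewer defects.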
Examples for Poisson Distribution
1. The number of speeding tickets issued in a certain county
per week
2. The number of disk drive failures per month for a particular
kind of disk drive
3. The number of calls arriving at an emergency dispatch
station per hour.
4. The number of flaws per square yard in a certain type of
fabric.
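Relationship with RTY
P(X = x) = \frac{e^{-m} m^{x}}{x!}
m: average number of occurrences
x: number of occurrences
When x = 0, the probability of zero defects is the rolled throughput yield (RTY), with m equal to the defects per unit (dpu):
RTY = e^{-dpu}
dpu = -\ln(RTY)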
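The two relations are inverses of each other, which a short Python sketch makes concrete. The dpu value of 0.1 is hypothetical, and the function names are chosen here for illustration.

```python
from math import exp, log

def rty_from_dpu(dpu):
    """Probability of zero defects: the Poisson probability at x = 0."""
    return exp(-dpu)

def dpu_from_rty(rty):
    """Defects per unit implied by a given yield."""
    return -log(rty)

dpu = 0.1  # hypothetical defects per unit
rty = rty_from_dpu(dpu)
recovered_dpu = dpu_from_rty(rty)

print(f"dpu = {dpu} gives RTY = {rty:.4f}")
print(f"RTY = {rty:.4f} gives dpu = {recovered_dpu:.4f}")
```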
(3) Normal distribution
The normal distribution is probably the most important
distribution in quality control and statistical analysis.
X ~ N(\mu, \sigma^2)
X: variable
N: normal distribution
\mu: mean
\sigma: standard deviation
A normal distribution is defined by its mean and
standard deviation.
The shape of normal distribution?
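Symmetric
Unimodal
Bell-shaped
[Figure: normal curve with the horizontal axis in units of \sigma; 68.3%, 95.5%, and 99.73% of the area lie within 1, 2, and 3 \sigma of the mean]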
What is Sigma?
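Sigma is the distance from the mean to the inflection point of the normal curve.
[Figure: normal curve with the horizontal axis in units of \sigma; areas of 68.3%, 95.5%, and 99.73% marked]
68.3% of the population values fall between the limits defined by the mean
plus and minus one sigma.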
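The percentages quoted for one, two, and three sigma can be verified numerically. The Python sketch below relies on the standard identity P(|Z| < k) = erf(k / sqrt(2)) for a normal variable; the function name `area_within` is chosen here for illustration.

```python
from math import erf, sqrt

def area_within(k):
    """Fraction of a normal population within k sigma of the mean."""
    return erf(k / sqrt(2))

areas = {k: area_within(k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(f"within {k} sigma: {100 * area:.2f}%")
```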
Probability density function
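The probability density function of the normal distribution is
defined by
f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)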
Shapes of Normal curve
[For different \mu and \sigma]
[Figure: pairs of normal curves for three cases]
- \mu_1 \neq \mu_2, \sigma_1 = \sigma_2
- \mu_1 = \mu_2, \sigma_1 \neq \sigma_2
- \mu_1 \neq \mu_2, \sigma_1 \neq \sigma_2
Standard Normal Distribution
Z = \frac{X - \mu}{\sigma}
This is used as a coordinate transformation.
Z follows a normal distribution with mean = 0 and
standard deviation = 1, written N(0, 1^2).
[Figure: standard normal curve; areas of 68.3%, 95.5%, and 99.73% marked]
Minitab application
Calc>Probability Distributions>Normal
[Screenshot: Normal Distribution dialog. Callouts: one option finds the area (probability) for a known x; another finds x for a known probability.]
Minitab treats the area to the left of x as the cumulative probability.
Normal distribution Example 1
<Problem> The tensile strength of a certain product is an
important quality characteristic. It is known that the strength is
normally distributed with mean = 40 and standard
deviation = 2, denoted N(40, 2^2).
When the customer wants a strength of at least 35, what is the
probability of customer satisfaction?
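Solution
[Figure: N(40, 2^2) curve with the known spec. at x = 35 and the mean at 40. The required probability is the area to the right of 35; the Minitab solution provides the area to the left of 35.]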
Calc>Probability Distributions>Normal
Select the cumulative probability option, then enter:
Mean: 40
Standard deviation: 2
X: 35
The area we want (the probability of customer satisfaction) is
1 - 0.0062 = 0.9938
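As a cross-check outside Minitab, the same left-sided area can be computed with Python's standard-library `statistics.NormalDist`. This is a stand-in for the Minitab dialog, not part of the original workflow.

```python
from statistics import NormalDist

# Tensile strength distributed as N(40, 2^2).
strength = NormalDist(mu=40, sigma=2)

# Cumulative probability: the area to the left of the spec. of 35.
left_area = strength.cdf(35)

# Customer satisfaction requires strength of at least 35,
# which is the area to the right of 35.
p_satisfied = 1 - left_area

print(f"area left of 35:  {left_area:.4f}")
print(f"area right of 35: {p_satisfied:.4f}")
```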
Example 2
It is known that a quality characteristic of a certain process
follows the standard normal distribution (mean = 0, standard deviation = 1). When the
defective rate is 1%, what is the sigma level?
<Solution> The problem is to find the value of z when the
cumulative probability is known. In Minitab, the inverse
cumulative probability is used.
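[Screenshot: Normal Distribution dialog with the inverse cumulative probability option selected and 1 - 0.01 = 0.99 entered as the input]
Result: Z is 2.33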
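The inverse cumulative probability can also be evaluated with Python's standard-library `statistics.NormalDist`, used here as a stand-in for the Minitab dialog.

```python
from statistics import NormalDist

defective_rate = 0.01

# Inverse cumulative probability of the standard normal distribution:
# the z value with 99% of the area to its left.
z = NormalDist(mu=0, sigma=1).inv_cdf(1 - defective_rate)

print(f"z = {z:.2f}")
```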
3. Sampling Distributions and Estimation
Question:
When we do not know the mean of the
population, we use a sample. But how
accurately does the sample mean represent
the population mean?
Standard Error of the Mean
Mean of the sample mean:
\mu_{\bar{X}} = \mu
Variance of the sample mean:
\sigma_{\bar{X}}^{2} = \frac{\sigma^{2}}{n}
Standard error of the mean:
\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}
Central Limit Theorem
For almost all populations, the sampling distribution of
the mean can be approximated closely by a normal
distribution, provided the sample size is sufficiently
large.
Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}
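A small simulation illustrates both the standard error of the mean and the central limit theorem. The sketch below is only an illustration under stated assumptions: the population is taken to be uniform on (0, 1), and the sample size and number of repetitions are hypothetical choices.

```python
import random
import statistics
from math import sqrt

random.seed(0)  # fixed seed so that the run is repeatable

n = 30              # size of each sample (hypothetical)
num_samples = 5000  # number of repeated samples (hypothetical)

# Assumed population: uniform on (0, 1), which is clearly not normal.
# Its mean is 0.5 and its standard deviation is sqrt(1/12).
mu = 0.5
sigma = sqrt(1 / 12)

# Draw many samples and record the mean of each one.
sample_means = [
    statistics.fmean([random.random() for _ in range(n)])
    for _ in range(num_samples)
]

observed_mean = statistics.fmean(sample_means)
observed_se = statistics.stdev(sample_means)
theoretical_se = sigma / sqrt(n)

print(f"mean of sample means: {observed_mean:.4f} (population mean {mu})")
print(f"spread of sample means: {observed_se:.4f} "
      f"(sigma/sqrt(n) = {theoretical_se:.4f})")
```

The sample means cluster around the population mean with a spread close to sigma/sqrt(n), even though the underlying population is not normal.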
Estimation
Estimate population parameters from a sample.
1) Point estimation
- Estimates the parameter with a single number.
2) Interval estimation
- Estimates the parameter with a confidence interval.
Confidence interval for population mean.
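1) Known standard deviation: use the normal distribution
Significance level \alpha = 0.05, confidence level: 95%, \alpha/2 = 0.025
[Figure: standard normal curve with an area of \alpha/2 = 0.025 in each tail, beyond -Z_{\alpha/2} and Z_{\alpha/2}]
P(L < \mu < U) = 1 - \alpha
P\left( -Z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} < Z_{\alpha/2} \right) = 1 - \alpha
\bar{X} - Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}
Z_{0.025} = 1.96, -Z_{0.025} = -1.96
This is the 100(1-\alpha)% confidence interval for \mu.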
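The known-sigma interval can be sketched in Python as follows. The function name `z_interval` and the input values (sample mean 100, sigma 10, n = 25) are hypothetical and serve only to show the arithmetic.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, alpha=0.05):
    """100(1 - alpha)% confidence interval for the mean, sigma known."""
    # Critical value with an area of alpha/2 in the upper tail.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

# Critical value for a 95% confidence level.
z_025 = NormalDist().inv_cdf(0.975)

# Hypothetical inputs, chosen only for illustration.
lower, upper = z_interval(xbar=100, sigma=10, n=25)

print(f"Z(0.025) = {z_025:.2f}")
print(f"95% interval: ({lower:.2f}, {upper:.2f})")
```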
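2) Unknown standard deviation: use the t-distribution
Significance level \alpha = 0.05, confidence level: 95%
[Figure: t-distribution curve with critical values -t_{\alpha/2} and t_{\alpha/2}]
P(L < \mu < U) = 1 - \alpha
P\left( -t_{\alpha/2} < \frac{\bar{X} - \mu}{S / \sqrt{n}} < t_{\alpha/2} \right) = 1 - \alpha
\bar{X} - t_{\alpha/2} \frac{S}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2} \frac{S}{\sqrt{n}}
This is the 100(1-\alpha)% confidence interval for \mu.
Note) The t-distribution has n-1 degrees of freedom, so the critical value is t_{\alpha/2, n-1}.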
Example
1. A random sample of 64 customers at a local
supermarket showed that their average shopping
time was 33 minutes with a sample standard
deviation of 16 minutes. Find a 90% confidence
interval for the true average shopping time.
2. A test on a random sample of 9 cigarettes yielded an
average nicotine content of 15.6 milligrams and a
standard deviation of 2.1 milligrams. Construct a 99%
confidence interval for the true but unknown average
nicotine content of this particular brand of cigarette.
Assume that nicotine content is normally distributed.
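One possible way to work the two examples is sketched below in Python. Two assumptions are made and should be read as such: for Example 1, the sample is large (n = 64), so the normal critical value is used as an approximation to the t value with 63 degrees of freedom; for Example 2, the critical value t(0.005, 8) = 3.355 is taken from a standard t-table, because the Python standard library has no t-distribution. The function name `mean_interval` is chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist

def mean_interval(xbar, s, n, critical):
    """Confidence interval: xbar +/- critical * s / sqrt(n)."""
    half_width = critical * s / sqrt(n)
    return xbar - half_width, xbar + half_width

# Example 1: n = 64, mean 33 minutes, s = 16 minutes, 90% confidence.
# Assumption: large sample, so the normal critical value approximates t.
z_05 = NormalDist().inv_cdf(0.95)
ex1 = mean_interval(33, 16, 64, z_05)

# Example 2: n = 9, mean 15.6 mg, s = 2.1 mg, 99% confidence.
# Assumption: t(0.005, 8 degrees of freedom) = 3.355 from a t-table.
T_005_8 = 3.355
ex2 = mean_interval(15.6, 2.1, 9, T_005_8)

print(f"Example 1: ({ex1[0]:.2f}, {ex1[1]:.2f}) minutes")
print(f"Example 2: ({ex2[0]:.2f}, {ex2[1]:.2f}) milligrams")
```

Using the exact t value in Example 1 would give a slightly wider interval than the normal approximation.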