Ms.
Priya Kothari
(Confidence intervals)
Statistical inference is the process of analyzing results and drawing conclusions from data
that are subject to random variation. It is also called inferential statistics.
Hypothesis testing and confidence intervals are two applications of statistical inference.
Statistical inference is a method of making decisions about the parameters of a population,
based on random sampling.
It helps to assess the relationship between the dependent and independent variables. The
purpose of statistical inference is to estimate the uncertainty, or sample-to-sample variation. It
allows us to provide a probable range of values for the true value of a quantity in the
population. The components used for making statistical inferences are:
Sample Size
Variability in the sample
Size of the observed differences
Statistical Inference Examples
An example of statistical inference is given below.
Question: A card is drawn from a shuffled pack of cards. This trial is repeated 400 times, and the number of times each suit is drawn is given below:
Suit                  Spades  Clubs  Hearts  Diamonds
No. of times drawn    90      100    120     90
When a card is drawn at random, what is the probability of getting
1. A diamond card
2. A black card
3. Any card except a spade
Solution:
By statistical inference,
Total number of trials = 400
i.e., 90 + 100 + 120 + 90 = 400
(1) The probability of getting a diamond card:
Number of trials in which a diamond card is drawn = 90
Therefore, P(diamond card) = 90/400 = 0.225
(2) The probability of getting a black card:
Number of trials in which a black card (spade or club) is drawn = 90 + 100 = 190
Therefore, P(black card) = 190/400 = 0.475
(3) The probability of getting any card except a spade:
Number of trials in which a card other than a spade is drawn = 100 + 120 + 90 = 310
Therefore, P(except spade) = 310/400 = 0.775
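The same relative-frequency calculation can be sketched in Python. This is only an illustration: the dictionary `counts` and the helper `empirical_probability` are names assumed here, not part of the original notes.

```python
from fractions import Fraction

# Observed counts from the 400 trials in the question above.
counts = {"spades": 90, "clubs": 100, "hearts": 120, "diamonds": 90}
total = sum(counts.values())

def empirical_probability(suits):
    """Relative frequency of drawing any of the given suits (hypothetical helper)."""
    return Fraction(sum(counts[s] for s in suits), total)

p_diamond = empirical_probability(["diamonds"])
p_black = empirical_probability(["spades", "clubs"])
# "Except spade" is the complement of drawing a spade.
p_not_spade = 1 - empirical_probability(["spades"])

print(float(p_diamond), float(p_black), float(p_not_spade))
```

Using the complement for part (3) gives the same value as adding the counts of the other three suits.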
Importance of Statistical Inference
Inferential statistics is important for examining data properly.
To reach an accurate conclusion, proper data analysis is needed to interpret the research results.
It is widely used to make predictions about future observations in different fields.
It helps us to make inferences about the data.
Statistical inference has a wide range of applications in different fields, such as:
Business Analysis
Artificial Intelligence
Financial Analysis
Fraud Detection
Machine Learning
Share Market
Pharmaceutical Sector
1. Point Estimation
When a single value is used as an estimate, the estimate is called a point estimate of the
population parameter. In other words, estimating a population parameter by a single
number is called point estimation.
For example
(i) The mean mark of a sample of 5 students randomly drawn from a class of
100 students is 55, and this is taken to be the mean mark of the entire class. This single value 55 is
a point estimate.
(ii) The average weight of a sample of 10 students randomly drawn from a class of
100 students is 50 kg, and this is taken to be the average weight of the entire class. This single value 50 is
a point estimate.
Note
The sample mean (x̄) is the sample statistic used as an estimate of the population mean (μ).
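As a minimal sketch of point estimation, the snippet below computes a sample mean. The five marks are hypothetical values, chosen only so that their mean is 55 as in example (i); the notes do not give the individual marks.

```python
from statistics import mean

# Hypothetical marks of 5 students sampled from a class of 100 (assumed data).
sample_marks = [48, 52, 55, 58, 62]

# The sample mean is the point estimate of the unknown class mean.
point_estimate = mean(sample_marks)
print(point_estimate)
```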
2. Interval Estimation
Generally, there are situations where point estimation is not desirable and we are interested in finding limits within
which the parameter would be expected to lie. Finding such limits is called interval estimation.
For example,
If T is a good estimator of θ with standard error s, then, making use of a general property of standard deviations, the
uncertainty in T, as an estimator of θ, can be expressed by statements like “We are about 95% certain that the
unknown θ will lie somewhere between T − 2s and T + 2s” or “We are almost sure that θ will lie in the interval (T − 3s,
T + 3s)”. Such intervals are called confidence intervals and are explained below.
Confidence interval
After obtaining the value of the statistic ‘t’ from a given sample, can we make some reasonable probability
statements about the unknown population parameter ‘θ’?
This question is answered by the technique of confidence intervals. Let us choose a small value of α, which is
known as the level of significance (1% or 5%), and determine two constants, say c1 and c2, such that P(c1 < θ < c2 | t) = 1 − α.
The quantities c1 and c2 so determined are known as the confidence limits, and the interval [c1, c2] within which the
unknown value of the population parameter is expected to lie is known as the confidence interval. (1 − α) is called the
confidence coefficient.
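Confidence Interval for the population mean for Large Samples (when σ is
known)
If we take repeated independent random samples of size n from a population with an
unknown mean but known standard deviation, then the probability that the true population
mean μ will fall in the following interval is (1 − α), i.e.
$$P\left(\bar{x} - Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$
So, the confidence interval for the population mean (μ), when the standard deviation (σ) is known, is
given by
$$\bar{x} \pm Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$
For the computation of confidence intervals and for testing of significance, the critical
values Zα at the different levels of significance are used; for example, the two-sided value Zα/2 is 1.96 at the 5% level and 1.645 at the 10% level.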
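The critical values can also be computed rather than looked up. The sketch below assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`; the helper name `z_critical` is an assumption made for this illustration.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value Z_{alpha/2} for a confidence level such as 0.95."""
    alpha = 1 - confidence
    # Leave alpha/2 in each tail of the standard normal distribution.
    return NormalDist().inv_cdf(1 - alpha / 2)

# Critical values for some common confidence levels.
for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: Z = {z_critical(level):.3f}")
```

The values for the 90% and 95% levels should agree, after rounding, with the 1.645 and 1.96 used in the worked examples of these notes.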
Example 8.11
A machine produces a component of a product with a standard deviation of 1.6 cm in length. A random sample of 64
components was selected from the output and this sample has a mean length of 90 cm. The customer will reject the part if
it is either less than 88 cm or more than 92 cm. Does the 95% confidence interval for the true mean length of all the
components produced ensure acceptance by the customer?
Solution:
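Here μ is the mean length of the components in the population.
Given: n = 64, x̄ = 90 cm, σ = 1.6 cm, and Zα/2 = 1.96 at the 5% level of significance, so σ/√n = 1.6/√64 = 0.2.
The formula for the confidence interval is
$$\bar{x} - Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$
Therefore, 90 − (1.96 × 0.2) ≤ μ ≤ 90 + (1.96 × 0.2)
(89.61 ≤ μ ≤ 90.39)
This implies that, with 95% confidence, the true value of the population mean length of the
components lies in the interval (89.61, 90.39). Since this interval lies within the customer's limits of 88 cm and 92 cm, we conclude that the 95%
confidence interval ensures acceptance of the component by the customer.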
Example 8.12
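A sample of 100 measurements of the breaking strength of cotton thread gave a mean of 7.4 gms and a standard deviation of 1.2
gms. Find 95% confidence limits for the mean breaking strength of cotton thread.
Solution:
Given: n = 100, x̄ = 7.4 gms, s = 1.2 gms, and Zα/2 = 1.96 at the 5% level of significance.
The 95% confidence limits for the population mean are
$$\bar{x} \pm Z_{\alpha/2}\,\frac{s}{\sqrt{n}} = 7.4 \pm 1.96 \times \frac{1.2}{\sqrt{100}} = 7.4 \pm 0.235$$
This implies that, with 95% confidence, the true value of the population mean breaking strength of
the cotton threads lies in the interval (7.165, 7.635).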
Example 8.13
The mean lifetime of a sample of 169 light bulbs manufactured by a company is found to be 1350 hours with a standard
deviation of 100 hours. Establish 90% confidence limits within which the mean lifetime of light bulbs is expected to lie.
Solution:
Given: n = 169, x̄ = 1350 hours, s = 100 hours. Since the level of significance is (100 − 90)% = 10%, α is 0.1; hence the
significant value at 10% is Zα/2 = 1.645.
Hence the 90% confidence limits for the population mean are
$$\bar{x} \pm Z_{\alpha/2}\,\frac{s}{\sqrt{n}} = 1350 \pm 1.645 \times \frac{100}{\sqrt{169}} = 1350 \pm 12.65$$
Hence the mean lifetime of light bulbs is expected to lie in the interval (1337.35,
1362.65).
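The three worked examples can be checked numerically with a short sketch. The function name `z_interval` is an assumption made here; the sample values and critical values are the ones stated in Examples 8.11 to 8.13.

```python
from math import sqrt

def z_interval(mean, sd, n, z):
    """Confidence limits mean -/+ z * sd / sqrt(n) for a large sample."""
    margin = z * sd / sqrt(n)
    return mean - margin, mean + margin

# (label, sample mean, standard deviation, sample size, critical value)
examples = [
    ("Example 8.11", 90, 1.6, 64, 1.96),
    ("Example 8.12", 7.4, 1.2, 100, 1.96),
    ("Example 8.13", 1350, 100, 169, 1.645),
]

for label, mean, sd, n, z in examples:
    lower, upper = z_interval(mean, sd, n, z)
    print(f"{label}: ({lower:.3f}, {upper:.3f})")

# Example 8.11: does the interval lie inside the customer's limits of 88 cm and 92 cm?
lower, upper = z_interval(90, 1.6, 64, 1.96)
print(88 <= lower and upper <= 92)
```

The critical value is passed in explicitly so that the sketch uses exactly the rounded table values from the examples.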
Hypothesis Testing
One of the important areas of statistical analysis is the testing of hypotheses.
Often, in real-life situations, we need to take decisions about the population on the basis of sample information.
Hypothesis testing is also referred to as “Statistical Decision Making”.
It employs statistical techniques to arrive at decisions in situations where there is an element of uncertainty, on the
basis of a sample whose size is fixed in advance.
Statistics thus helps us arrive at a criterion for such decisions. This procedure is known as testing of hypothesis and was initiated by
J. Neyman and E.S. Pearson.
For example, we may like to decide on the basis of sample data whether a new vaccine is effective in curing the cold, whether
a new training methodology is better than the existing one, whether a new fertilizer is more productive than the earlier
one, and so on.
Statistical Hypothesis
A statistical hypothesis is an assumption or statement about a population, which may or may not be true.
There are two types of statistical hypotheses:
(i) Null hypothesis (ii) Alternative hypothesis
The actual test begins by considering two hypotheses. They are called the null hypothesis and the alternative hypothesis.
These hypotheses contain opposing viewpoints.
The null hypothesis (H0): It is a statement of no difference between the variables; they are not related. This can often be
considered the status quo, and as a result, if you cannot accept the null hypothesis, some action is required.
The alternative hypothesis (Ha): It is a claim about the population
that is contradictory to H0 and is what we conclude when we reject H0. This is usually what the researcher is trying to prove.
Since the null and alternative hypotheses are contradictory, you must examine the evidence to decide whether you have enough
evidence to reject the null hypothesis or not. The evidence is in the form of sample data.
After you have determined which hypothesis the sample supports, you make a decision.
There are two options for a decision. They are "reject H0"
if the sample information favors the alternative hypothesis, or "do not reject H0" if the sample information is
insufficient to reject the null hypothesis.
Table 9.2.1: Mathematical Symbols Used in H0 and Ha
H0                                Ha
equal (=)                         not equal (≠) or greater than (>) or less than (<)
greater than or equal to (≥)      less than (<)
less than or equal to (≤)         more than (>)
H0: always has a symbol with an equal in it.
Ha: never has a symbol with an equal in it. The choice of symbol depends on the wording of the
hypothesis test. However, be aware that many researchers (including one of the co-authors in research work) use = in
the null hypothesis, even with > or < as the symbol in the alternative hypothesis. This practice is acceptable because
we only make the decision to reject or not reject the null hypothesis.
Exercise
A medical trial is conducted to test whether or not a new medicine reduces cholesterol
by 25%. State the null and alternative hypotheses.
Answer
• H0: The drug reduces cholesterol by 25%. p = 0.25
• Ha: The drug does not reduce cholesterol by 25%. p ≠ 0.25
Mean Median Mode Questions and Answers
1. Find the mean of the first 10 odd integers.
Solution:
First 10 odd integers: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19
Mean = Sum of the first 10 odd integers/Number of such integers
= (1 + 3 + 5 + 7 + 9 + 11 + 13 + 15 + 17 + 19)/10
= 100/10
= 10
Therefore, the mean of the first 10 odd integers is 10.
2. What is the median of the following data set?
32, 6, 21, 10, 8, 11, 12, 36, 17, 16, 15, 18, 40, 24, 21, 23, 24, 24, 29, 16, 32, 31, 10, 30, 35, 32, 18, 39, 12, 20
Solution:
The ascending order of the given data set is:
6, 8, 10, 10, 11, 12, 12, 15, 16, 16, 17, 18, 18, 20, 21, 21, 23, 24, 24, 24, 29, 30, 31, 32, 32, 32, 35, 36, 39, 40
Number of values in the data set = n = 30
n/2 = 30/2 = 15
15th data value = 21
(n/2) +1 = 16
16th data value = 21
Median = [(n/2)th observation + {(n/2)+1}th observation]/2
= (15th data value + 16th data value)/2
= (21 + 21)/2
= 21
3. Identify the mode for the following data set:
21, 19, 62, 21, 66, 28, 66, 48, 79, 59, 28, 62, 63, 63, 48, 66, 59, 66, 94, 79, 19, 94
Solution:
Let us write the given data set in ascending order as follows:
19, 19, 21, 21, 28, 28, 48, 48, 59, 59, 62, 62, 63, 63, 66, 66, 66, 66, 79, 79, 94, 94
Here, we can observe that the number 66 occurred the maximum number of times.
Thus, the mode of the given data set is 66.
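The three answers above can be cross-checked with Python's standard `statistics` module. This is a small sketch only; the variable names are assumptions made for the illustration.

```python
from statistics import mean, median, mode

# Question 1: the first 10 odd integers.
odd_integers = list(range(1, 20, 2))

# Question 2: data set for the median.
median_data = [32, 6, 21, 10, 8, 11, 12, 36, 17, 16, 15, 18, 40, 24, 21,
               23, 24, 24, 29, 16, 32, 31, 10, 30, 35, 32, 18, 39, 12, 20]

# Question 3: data set for the mode.
mode_data = [21, 19, 62, 21, 66, 28, 66, 48, 79, 59, 28,
             62, 63, 63, 48, 66, 59, 66, 94, 79, 19, 94]

print(mean(odd_integers))   # arithmetic mean
print(median(median_data))  # average of the 15th and 16th sorted values, since n = 30 is even
print(mode(mode_data))      # most frequent value
```

Note that `median` sorts the data internally, so the list does not need to be arranged in ascending order first.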
In probability and statistics, variance is the expected value of the squared deviation of a random variable from its mean
value. Informally, variance measures how far a set of (random) numbers
is spread out from its mean value.
The value of variance is equal to the square of the standard deviation, which is another central tool.
Variance is symbolically represented by σ², s², or Var(X).
The formula for variance is given by: