CH4020 Assignment 1

Solutions

Uploaded by

Daljeet

Question 2

A question commonly asked in GTC/DC meetings is whether you have put error bars in
your figures.

1. Represent any data of your choice in the form of histograms and box plots.

2. What are error bars?

3. How will you incorporate error bars in your data?

4. Explain how error bars may be plotted using different tools? Put up screenshots.

Answer:

1. Histogram and Box Plot Representation:


To represent data as a histogram and a box plot, consider a dataset of 20 random values,
taken here to be the marks scored out of 20 by a class of 20 students in a test. A
histogram (Figure 1) shows the frequency distribution of the data, while a box plot
(Figure 2) summarizes the distribution using the median, quartiles, and potential
outliers.
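
As a rough sketch of what the two plots summarize, the snippet below computes the histogram bin counts and the five-number summary behind a box plot. The `marks` list is made-up illustrative data, not the dataset used for the figures, and the helper names `histogram_counts` and `five_number_summary` are hypothetical.

```python
from collections import Counter
from statistics import median, quantiles

# Hypothetical marks out of 20 for a class of 20 students (made-up data).
marks = [12, 15, 9, 18, 14, 11, 16, 13, 17, 10,
         14, 15, 8, 19, 13, 12, 16, 14, 20, 11]


def histogram_counts(values, bin_width=5, n_bins=4):
    """Count values per bin; the last bin is closed so that 20 falls in 15-20."""
    counts = Counter(min(v // bin_width, n_bins - 1) for v in values)
    return [counts.get(i, 0) for i in range(n_bins)]


def five_number_summary(values):
    """Minimum, Q1, median, Q3 and maximum: the quantities a box plot draws."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median(values), q3, max(values)


counts = histogram_counts(marks)
summary = five_number_summary(marks)

# Text rendering of the histogram, one row per bin.
for i, c in enumerate(counts):
    print(f"{5 * i:2d}-{5 * i + 5:2d}: {'#' * c}")
print("min, Q1, median, Q3, max =", summary)
```

A plotting library would draw the same counts as bars and the same five numbers as the box and whiskers.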

Figure 1: Histogram

Figure 2: Boxplot

2. What are Error Bars?

Error bars are graphical representations of the variability of data. They indicate the
uncertainty or error in a reported measurement. Error bars can represent different
measures of error, such as:

• Standard deviation (SD)

• Standard error of the mean (SEM)

• Confidence intervals (CI)

They are typically drawn as vertical or horizontal lines on a plot, extending from a
data point to show the range within which the true value is likely to fall.

3. Incorporating Error Bars in Data:


To incorporate error bars (Figure 3), we first calculate the appropriate measure of
error for each data point (such as the standard deviation or the standard error of the
mean) from repeated measurements. These values then set the length of each error bar.

Figure 3: Error Bars
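
The calculation step can be sketched as follows. The replicate values and the helper name `error_bar_lengths` are illustrative assumptions; the 1.96 multiplier assumes a large-sample normal approximation, and for very small samples a t-multiplier would be the more careful choice.

```python
import math
from statistics import mean, stdev


def error_bar_lengths(samples, z=1.96):
    """Return the mean and three common error-bar half-lengths for one data point."""
    m = mean(samples)
    sd = stdev(samples)                    # sample standard deviation
    sem = sd / math.sqrt(len(samples))     # standard error of the mean
    return {"mean": m, "sd": sd, "sem": sem, "ci95": z * sem}


# Hypothetical repeated measurements for two data points (made-up values).
replicates = {"A": [4.0, 5.0, 6.0], "B": [7.5, 8.0, 8.5, 9.0]}
bars = {name: error_bar_lengths(vals) for name, vals in replicates.items()}

for name, b in bars.items():
    print(f"{name}: mean={b['mean']:.2f}  SD={b['sd']:.2f}  "
          f"SEM={b['sem']:.2f}  95% CI half-width={b['ci95']:.2f}")
```

Whichever measure is chosen, the list of half-lengths is what a plotting call such as Matplotlib's `errorbar` would receive as its `yerr` argument.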

4. Plotting Error Bars Using Different Tools:


Error bars can be plotted using various tools, such as:

• Python (Matplotlib): The ‘errorbar’ function in Matplotlib can be used to add
error bars to a plot (Figure 4).

Figure 4: Python (Matplotlib)

• MATLAB: The errorbar() function adds error bars to a data plot (Figure 5).

Figure 5: Matlab

Question 3
With respect to the standard normal distribution, find the following [5]:

1. P (−∞ < Z < 0)

2. P (−∞ < Z < 0.2)

3. P (Z > 0.2)

4. P (−1 < Z < 1)

5. P (−2 < Z < 2)

Answer:

1. 0.5

2. 0.57926

3. 0.42074

4. 0.68269

5. 0.9545
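
These table values can be cross-checked numerically. The sketch below builds the standard normal CDF from `math.erf`; the function name `phi` is an illustrative choice.

```python
import math


def phi(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


answers = {
    "P(-inf < Z < 0)": phi(0.0),
    "P(-inf < Z < 0.2)": phi(0.2),
    "P(Z > 0.2)": 1.0 - phi(0.2),
    "P(-1 < Z < 1)": phi(1.0) - phi(-1.0),
    "P(-2 < Z < 2)": phi(2.0) - phi(-2.0),
}

for label, value in answers.items():
    print(f"{label} = {value:.5f}")
```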

Question 4
In the annual book exhibition held in YMCA grounds, Chennai, over the past several years
during Pongal festival, there is a steady stream of visitors. They visit the various stalls and
exit the exhibition at different times. Depending on the path they take – a few bored ones
may bypass many stalls and exit practically instantaneously, some may follow the straight
path, some may keep circling, unwilling to leave, etc. Suppose the duration of their visit is
described in terms of a random variable T and the associated continuous probability density
function is given by
$$f(t) = \frac{1}{\tau} \exp\left(-\frac{t}{\tau}\right)$$
Here τ is a parameter of the distribution with units of time.

1. What are the expected bounds of the probability distribution function?

2. Show that the above function is a legitimate probability distribution function.

3. What is the mean duration of the visit to the book exhibition?

4. What is the variance of this distribution?

Answer:

1. The random variable T represents time, so it must be non-negative. This means t can
take any non-negative value, starting from 0 and extending to infinity.
At t = 0:
$$f(0) = \frac{1}{\tau} \exp\left(-\frac{0}{\tau}\right) = \frac{1}{\tau} \qquad (1)$$
As t → ∞:
$$\lim_{t \to \infty} f(t) = \lim_{t \to \infty} \frac{1}{\tau} \exp\left(-\frac{t}{\tau}\right) = 0 \qquad (2)$$

Since the given probability distribution function is a decreasing function of t, the
upper bound is $1/\tau$ and the lower bound is 0.

2. For f(t) to be a legitimate probability distribution function, it must satisfy the
following two conditions:

(a) Non-negativity: f(t) ≥ 0 for all t.
As shown in part 1 of the question, the lower bound is 0 and the upper bound is
$1/\tau$, hence the pdf is non-negative.

(b) Normalization: The total probability must integrate to 1 over the entire range
of possible values of t:
$$\int_{-\infty}^{\infty} f(t)\,dt = 1$$
Since t ≥ 0,
$$\int_{0}^{\infty} f(t)\,dt = \int_{0}^{\infty} \frac{1}{\tau} \exp\left(-\frac{t}{\tau}\right) dt$$
Let $u = t/\tau$, hence $du = dt/\tau$, or $dt = \tau\,du$.
Substituting into the integral:
$$\int_{0}^{\infty} \frac{1}{\tau} \exp\left(-\frac{t}{\tau}\right) dt = \int_{0}^{\infty} \exp(-u)\,du$$
The integral of exp(−u) from 0 to infinity is:
$$\int_{0}^{\infty} \exp(-u)\,du = \left[-\exp(-u)\right]_{0}^{\infty} = [0 - (-1)] = 1$$
Since the integral equals 1, the given function f(t) is a legitimate probability
distribution function.

3. The mean µ of the distribution is given by the expected value E[T]:
$$\mu = E[T] = \int_{0}^{\infty} t\,f(t)\,dt$$
Substituting f(t) into the integral:
$$\mu = \int_{0}^{\infty} t \cdot \frac{1}{\tau} \exp\left(-\frac{t}{\tau}\right) dt$$
Let $u = t/\tau$, so $t = \tau u$ and $dt = \tau\,du$:
$$\mu = \int_{0}^{\infty} \tau u \cdot \frac{1}{\tau} \exp(-u) \cdot \tau\,du = \tau \int_{0}^{\infty} u \exp(-u)\,du$$
The integral $\int_{0}^{\infty} u \exp(-u)\,du$ is a known standard result (the gamma
function Γ(2)) and equals 1.
Therefore,
$$\mu = \tau$$
The mean duration of the visit to the book exhibition is τ.

4. The variance $\sigma^2$ of the distribution is given by:
$$\sigma^2 = E[T^2] - (E[T])^2$$
First, calculate $E[T^2]$:
$$E[T^2] = \int_{0}^{\infty} t^2 f(t)\,dt$$
Substituting f(t) into the integral:
$$E[T^2] = \int_{0}^{\infty} t^2 \cdot \frac{1}{\tau} \exp\left(-\frac{t}{\tau}\right) dt$$
Let $u = t/\tau$, so $t = \tau u$ and $dt = \tau\,du$:
$$E[T^2] = \int_{0}^{\infty} \tau^2 u^2 \cdot \frac{1}{\tau} \exp(-u) \cdot \tau\,du = \tau^2 \int_{0}^{\infty} u^2 \exp(-u)\,du$$
The integral $\int_{0}^{\infty} u^2 \exp(-u)\,du$ is a known standard result (the gamma
function Γ(3)) and equals 2.
Therefore,
$$E[T^2] = 2\tau^2$$
Now, substitute into the variance formula:
$$\sigma^2 = E[T^2] - (E[T])^2 = 2\tau^2 - \tau^2 = \tau^2$$
The variance of this distribution is $\tau^2$.
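
The three results (unit area, mean τ, variance τ²) can be checked numerically. This is only a sanity check under stated assumptions: τ = 2 is an arbitrary illustrative value, the infinite upper limit is truncated at 50τ, and `simpson` and `exponential_moments` are hypothetical helper names.

```python
import math


def simpson(f, a, b, n=20000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def exponential_moments(tau, upper_factor=50):
    """Area, mean and variance of f(t) = exp(-t/tau)/tau, integrated on [0, 50*tau]."""
    def pdf(t):
        return math.exp(-t / tau) / tau

    upper = upper_factor * tau  # the tail beyond this point is negligible
    area = simpson(pdf, 0.0, upper)
    mean = simpson(lambda t: t * pdf(t), 0.0, upper)
    second_moment = simpson(lambda t: t * t * pdf(t), 0.0, upper)
    return area, mean, second_moment - mean ** 2


area, mean, variance = exponential_moments(tau=2.0)
print(f"area={area:.6f}  mean={mean:.6f}  variance={variance:.6f}")
```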

Question 5
1. The average grade on a mathematics test is 52, with a standard deviation of 5. If the
instructor assigns S’s to the highest 10%, and the grades follow a normal distribution,
what is the lowest grade that will be assigned an ‘S’ ?

2. A teacher decides that the top 10% of students should receive S’s and the next 15%
A’s. If the test scores are normally distributed with a mean of 70 and a standard
deviation of 10, find the scores that should be assigned S’s and A’s.

Answer:
1. The lowest grade that will be assigned an ‘S’ is the one corresponding to the 90th
percentile of a normal distribution with a mean of 52 and a standard deviation of
5. The z-score corresponding to the 90th percentile is approximately 1.28. Using the
z-score formula:
$$z = \frac{X - \mu}{\sigma}$$
Substituting the known values:
$$1.28 = \frac{X - 52}{5}$$
Solving for X:
$$X = 1.28 \cdot 5 + 52 = 6.4 + 52 = 58.4$$
Therefore, the lowest grade that will be assigned an ‘S’ is approximately 58.4.
2. We need to find the scores that correspond to the top 10% and the next 15%. The test
scores are normally distributed with a mean of 70 and a standard deviation of 10.
For the top 10% (S’s), the cutoff is the 90th percentile, for which the z-score is 1.28.
Using the z-score formula:
$$1.28 = \frac{X - 70}{10}$$
Solving for X:
$$X = 1.28 \cdot 10 + 70 = 12.8 + 70 = 82.8$$
For the next 15% (A’s), the lower cutoff is the 75th percentile (100% − 10% − 15%),
for which the z-score is approximately 0.67. Using the z-score formula:
$$0.67 = \frac{X - 70}{10}$$
Solving for X:
$$X = 0.67 \cdot 10 + 70 = 6.7 + 70 = 76.7$$
Therefore, the scores that should be assigned S’s are above 82.8, and the scores that
should be assigned A’s are between 76.7 and 82.8.
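
The percentile lookups can also be done without a table. A minimal sketch using the standard library's `statistics.NormalDist`; the helper name `cutoff` is illustrative, and the results differ from the hand calculation only through the rounding of the z-scores.

```python
from statistics import NormalDist


def cutoff(mu, sigma, top_fraction):
    """Score above which the given top fraction of a normal population lies."""
    return NormalDist(mu, sigma).inv_cdf(1.0 - top_fraction)


s_cut_part1 = cutoff(52, 5, 0.10)    # part 1: lowest 'S' grade
s_cut_part2 = cutoff(70, 10, 0.10)   # part 2: lowest 'S' score
a_cut_part2 = cutoff(70, 10, 0.25)   # part 2: lowest 'A' score (top 10% + next 15%)

print(f"{s_cut_part1:.2f} {s_cut_part2:.2f} {a_cut_part2:.2f}")
```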

Question 6
A probability density function is given such that:
$$f(x) = \begin{cases} Ax + b, & 0 \le x \le 1 \\ 0, & \text{elsewhere} \end{cases}$$
with the condition $P(0.25 < X < 0.5) = \frac{19}{80}$. Find the constants A and b, as
well as the mean value.

Answer:

1. Step 1: Apply the condition for a valid PDF

For f(x) to be a valid probability density function, it must satisfy the normalization
condition:
$$\int_{0}^{1} f(x)\,dx = 1$$
Substituting f(x) = Ax + b:
$$\int_{0}^{1} (Ax + b)\,dx = 1$$
Perform the integration:
$$\int_{0}^{1} (Ax + b)\,dx = \left[\frac{A}{2}x^2 + bx\right]_{0}^{1}$$
Substitute the limits:
$$\left(\frac{A}{2}(1^2) + b(1)\right) - \left(\frac{A}{2}(0^2) + b(0)\right) = 1$$
Simplify to obtain the first equation:
$$\frac{A}{2} + b = 1 \qquad \text{(Equation 1)}$$
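2. Step 2: Use the given probability $P(0.25 < X < 0.5) = \frac{19}{80}$

This probability is computed as:
$$P(0.25 < X < 0.5) = \int_{0.25}^{0.5} f(x)\,dx = \frac{19}{80}$$
Substitute f(x) = Ax + b:
$$\int_{0.25}^{0.5} (Ax + b)\,dx = \frac{19}{80}$$
Perform the integration:
$$\int_{0.25}^{0.5} (Ax + b)\,dx = \left[\frac{A}{2}x^2 + bx\right]_{0.25}^{0.5}$$
Substitute the limits:
$$\left(\frac{A}{2}(0.5^2) + b(0.5)\right) - \left(\frac{A}{2}(0.25^2) + b(0.25)\right) = \frac{19}{80}$$
Simplify:
$$\left(\frac{A}{8} + 0.5b\right) - \left(\frac{A}{32} + 0.25b\right) = \frac{3A}{32} + \frac{b}{4} = \frac{19}{80}$$
Multiplying through by 80:
$$7.5A + 20b = 19 \qquad \text{(Equation 2)}$$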

3. Step 3: Solve the system of equations

We now have two equations:
$$\frac{A}{2} + b = 1$$
$$7.5A + 20b = 19$$
Solve the first equation for b:
$$b = 1 - \frac{A}{2}$$
Substitute this into the second equation:
$$7.5A + 20\left(1 - \frac{A}{2}\right) = 19$$
Simplify:
$$7.5A + 20 - 10A = 19$$
Solve for A:
$$A = 0.4$$
Substitute A = 0.4 into the first equation to find b:
$$\frac{0.4}{2} + b = 1$$
$$0.2 + b = 1$$
$$b = 1 - 0.2 = 0.8$$
Thus, the constants are:
$$A = 0.4, \quad b = 0.8$$

4. Step 4: Find the mean value

The mean µ of a continuous random variable with PDF f(x) is given by:
$$\mu = \int_{0}^{1} x f(x)\,dx$$
Substitute f(x) = Ax + b = 0.4x + 0.8:
$$\mu = \int_{0}^{1} x(0.4x + 0.8)\,dx$$
Distribute x:
$$\mu = \int_{0}^{1} (0.4x^2 + 0.8x)\,dx$$
Now integrate term by term:
$$\mu = 0.4\int_{0}^{1} x^2\,dx + 0.8\int_{0}^{1} x\,dx$$
These integrals are standard:
$$\int_{0}^{1} x^2\,dx = \frac{1}{3}, \qquad \int_{0}^{1} x\,dx = \frac{1}{2}$$
Substitute the results:
$$\mu = 0.4 \cdot \frac{1}{3} + 0.8 \cdot \frac{1}{2}$$
Simplify:
$$\mu = \frac{0.4}{3} + \frac{0.8}{2} = \frac{0.4}{3} + 0.4$$
Thus, the mean value is:
$$\mu = \frac{1.6}{3} \approx 0.533$$
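
The two conditions form a 2×2 linear system, so the constants can be verified with exact rational arithmetic. A minimal sketch; `solve_linear_pdf` is a hypothetical helper that applies Cramer's rule to the normalization equation and the interval-probability equation.

```python
from fractions import Fraction as F


def solve_linear_pdf(lo, hi, prob):
    """Solve for A and b in f(x) = A*x + b on [0, 1] given P(lo < X < hi) = prob."""
    # Normalization:  (1/2) * A + 1 * b = 1
    a1, b1, c1 = F(1, 2), F(1), F(1)
    # Interval probability:  ((hi^2 - lo^2) / 2) * A + (hi - lo) * b = prob
    a2, b2, c2 = (hi ** 2 - lo ** 2) / 2, hi - lo, prob
    det = a1 * b2 - a2 * b1
    A = (c1 * b2 - b1 * c2) / det
    b = (a1 * c2 - a2 * c1) / det
    return A, b


A, b = solve_linear_pdf(F(1, 4), F(1, 2), F(19, 80))
mean = A / 3 + b / 2  # integral of x * (A*x + b) over [0, 1]

print(A, b, mean, float(mean))
```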

Question 7
A student waits for an electric vehicle in his Institute. He knows that the e-vehicle comes
every 15 minutes, but he doesn’t know when the next one will come. Let’s assume the vehicle
is as likely to come in any one instant as in any other within the next 15 minutes.

1. Is the student’s waiting time a continuous or discrete random variable? What are its
maximum and minimum values?

2. What is the appropriate probability distribution/density function?

3. What is the probability that the vehicle will come within the next 15 minutes?

4. If the student has to be in a location within the next 15 minutes and the total travel
time is 10 minutes, what is the probability that the student will make it on time?

Answer:

1. The student’s waiting time is a continuous random variable because it can take any
value within a given interval. Specifically, the waiting time T can be any value between
0 and 15 minutes.
The minimum value of T is 0 minutes (if the vehicle arrives immediately) and the
maximum value is 15 minutes (if the vehicle arrives just before the 15-minute mark).

2. Since the vehicle is equally likely to arrive at any moment within the 15-minute window,
the waiting time T follows a uniform distribution.
The probability density function (PDF) for a uniform distribution over the interval
[0, 15] is:
$$f(t) = \begin{cases} \frac{1}{15}, & 0 \le t \le 15 \\ 0, & \text{elsewhere} \end{cases}$$

3. The probability that the vehicle will come within the next 15 minutes is 1, because it
is guaranteed to arrive within this time frame.

4. If the student needs to be at a location within the next 15 minutes and the travel
time is 10 minutes, the student must catch the vehicle within the first 5 minutes of the
15-minute window to make it on time.
The student will be on time if the waiting time satisfies T ≤ 5.
The probability of this event is:
$$P(T \le 5) = \int_{0}^{5} f(t)\,dt$$
Substitute f(t):
$$P(T \le 5) = \int_{0}^{5} \frac{1}{15}\,dt = \frac{5}{15} = \frac{1}{3}$$
Therefore, the probability that the student will make it on time is $\frac{1}{3}$.
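
Parts 3 and 4 are both evaluations of the uniform CDF, which the short sketch below makes explicit. The function name `uniform_cdf` is illustrative.

```python
def uniform_cdf(t, low=0.0, high=15.0):
    """P(T <= t) for a waiting time uniformly distributed on [low, high] minutes."""
    if t <= low:
        return 0.0
    if t >= high:
        return 1.0
    return (t - low) / (high - low)


p_within_15 = uniform_cdf(15.0)        # part 3: vehicle arrives within 15 minutes
p_on_time = uniform_cdf(15.0 - 10.0)   # part 4: must board within the first 5 minutes

print(p_within_15, p_on_time)
```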

Question 8
The joint probability density function of random variables X and Y is described by
$$f_{XY}(x, y) = \begin{cases} e^{-x} e^{-y}, & \text{for } x > 0 \text{ and } y > 0 \\ 0, & \text{otherwise} \end{cases}$$
Is this a legitimate function? Find the probability {1 < X < 2 and 0 < Y < 2}.

Answer:

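1. Legitimacy of the Function:

To determine if $f_{XY}(x, y)$ is a legitimate joint probability density function, we
need to check that it is non-negative and that it satisfies the normalization condition.
Non-negativity holds because $e^{-x} e^{-y} > 0$ for all x and y. The normalization
condition is:
$$\int_{0}^{\infty} \int_{0}^{\infty} f_{XY}(x, y)\,dx\,dy = 1$$
Substitute $f_{XY}(x, y) = e^{-x} e^{-y}$:
$$\int_{0}^{\infty} \int_{0}^{\infty} e^{-x} e^{-y}\,dx\,dy$$
This can be separated into two integrals:
$$\left(\int_{0}^{\infty} e^{-x}\,dx\right) \left(\int_{0}^{\infty} e^{-y}\,dy\right)$$
Evaluate each integral:
$$\int_{0}^{\infty} e^{-x}\,dx = \left[-e^{-x}\right]_{0}^{\infty} = 1$$
$$\int_{0}^{\infty} e^{-y}\,dy = \left[-e^{-y}\right]_{0}^{\infty} = 1$$
Thus,
$$\int_{0}^{\infty} \int_{0}^{\infty} e^{-x} e^{-y}\,dx\,dy = 1 \times 1 = 1$$
Since the function is non-negative and the total integral is 1, $f_{XY}(x, y)$ is a
legitimate joint probability density function.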

2. Probability {1 < X < 2 and 0 < Y < 2}:

To find this probability, we integrate the joint density function over the specified
range:
$$P(1 < X < 2 \text{ and } 0 < Y < 2) = \int_{1}^{2} \int_{0}^{2} e^{-x} e^{-y}\,dy\,dx$$
Evaluate the inner integral first:
$$\int_{0}^{2} e^{-y}\,dy = \left[-e^{-y}\right]_{0}^{2} = 1 - e^{-2}$$
Now, substitute this result into the outer integral:
$$\int_{1}^{2} e^{-x} \left(1 - e^{-2}\right) dx$$
Take the constant factor outside the integral:
$$\left(1 - e^{-2}\right) \int_{1}^{2} e^{-x}\,dx$$
Evaluate the remaining integral:
$$\int_{1}^{2} e^{-x}\,dx = \left[-e^{-x}\right]_{1}^{2} = e^{-1} - e^{-2}$$
Combine everything:
$$P(1 < X < 2 \text{ and } 0 < Y < 2) = \left(1 - e^{-2}\right)\left(e^{-1} - e^{-2}\right) \approx 0.8647 \times 0.2325 \approx 0.201$$
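
Because the joint density factorizes, the probability of any rectangle is a product of two one-dimensional integrals. A small sketch of that evaluation; `rect_probability` is a hypothetical helper name.

```python
import math


def rect_probability(x1, x2, y1, y2):
    """P(x1 < X < x2, y1 < Y < y2) for the density exp(-x) * exp(-y) on x, y > 0."""
    px = math.exp(-x1) - math.exp(-x2)   # integral of exp(-x) from x1 to x2
    py = math.exp(-y1) - math.exp(-y2)   # integral of exp(-y) from y1 to y2
    return px * py


p = rect_probability(1.0, 2.0, 0.0, 2.0)
print(f"P(1 < X < 2, 0 < Y < 2) = {p:.4f}")
```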


 

Question 9
According to Chebyshev’s theorem, the probability that any random variable X will assume
a value within k standard deviations of the mean is at least $1 - \frac{1}{k^2}$, i.e.,
$$P(\mu - k\sigma < X < \mu + k\sigma) \ge 1 - \frac{1}{k^2}.$$
For a random variable following the normal distribution, is Chebyshev’s theorem valid?
Choose, for convenience, k = 2.

Answer:

1. Chebyshev’s Theorem:
Chebyshev’s theorem states that for any random variable X with mean µ and standard
deviation σ, the probability that X lies within k standard deviations of the mean is
at least $1 - \frac{1}{k^2}$. This theorem applies to every distribution with a finite
mean and variance, not just the normal distribution.

2. Application to Normal Distribution:


For a normal distribution, the probability of a random variable X falling within k
standard deviations of the mean is given by the cumulative distribution function (CDF)
of the normal distribution. Specifically, for k = 2:

P (µ − 2σ < X < µ + 2σ) = Φ(2) − Φ(−2),

where Φ is the CDF of the standard normal distribution.


Using standard normal distribution tables or a calculator:

Φ(2) ≈ 0.9772 and Φ(−2) ≈ 0.0228

Therefore:
$$P(\mu - 2\sigma < X < \mu + 2\sigma) = 0.9772 - 0.0228 = 0.9544$$
According to Chebyshev’s theorem:
$$P(\mu - 2\sigma < X < \mu + 2\sigma) \ge 1 - \frac{1}{2^2} = 1 - \frac{1}{4} = 0.75$$
Thus, Chebyshev’s theorem provides a lower bound of 0.75 for the probability, while
the actual probability for a normal distribution is approximately 0.9544.

3. Conclusion:
Yes, Chebyshev’s theorem is valid for the normal distribution: the actual probability
within k = 2 standard deviations of the mean (about 0.9544) is higher than the lower
bound of 0.75, so the inequality is satisfied. Because the theorem must hold for every
distribution, its bound is general and not tight for the normal distribution.
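
The same comparison can be repeated for other values of k. A brief sketch; the values k = 1.5 and k = 3 are added here purely for illustration, and the function names are hypothetical.

```python
import math


def normal_within(k):
    """P(|X - mu| < k*sigma) for a normal random variable."""
    return math.erf(k / math.sqrt(2.0))


def chebyshev_bound(k):
    """Chebyshev lower bound on P(|X - mu| < k*sigma)."""
    return 1.0 - 1.0 / k ** 2


rows = [(k, chebyshev_bound(k), normal_within(k)) for k in (1.5, 2.0, 3.0)]
for k, bound, actual in rows:
    print(f"k={k}: Chebyshev bound={bound:.4f}, normal probability={actual:.4f}")
```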

Question 10
Let the random variables $X_1$ and $X_2$ denote the length and width, respectively, of a
manufactured part. Assume that $X_1$ is normal with $E(X_1) = 2$ cm and standard deviation
0.1 cm, and that $X_2$ is normal with $E(X_2) = 5$ cm and standard deviation 0.2 cm. Also,
assume that $X_1$ and $X_2$ are independent. Determine the probability that the perimeter
exceeds 14.5 cm.

Answer:
The perimeter P of the part is given by:
$$P = 2(X_1 + X_2)$$
We want to find the probability that the perimeter exceeds 14.5 cm:
$$P(P > 14.5) = P(2(X_1 + X_2) > 14.5)$$
Dividing both sides of the inequality by 2:
$$P(X_1 + X_2 > 7.25)$$
We need to find the distribution of $X_1 + X_2$. Since $X_1$ and $X_2$ are independent
normal random variables, $X_1 + X_2$ is also normally distributed with:
$$E(X_1 + X_2) = E(X_1) + E(X_2) = 2 + 5 = 7$$
$$\mathrm{Var}(X_1 + X_2) = \mathrm{Var}(X_1) + \mathrm{Var}(X_2) = (0.1)^2 + (0.2)^2 = 0.01 + 0.04 = 0.05$$
Thus, the standard deviation of $X_1 + X_2$ is:
$$\sigma_{X_1 + X_2} = \sqrt{0.05} \approx 0.2236$$
So, $X_1 + X_2$ is normally distributed with mean 7 and standard deviation approximately
0.2236.
To find $P(X_1 + X_2 > 7.25)$, we standardize:
$$Z = \frac{(X_1 + X_2) - 7}{\sigma_{X_1 + X_2}}$$
We need:
$$P(X_1 + X_2 > 7.25) = P\left(Z > \frac{7.25 - 7}{0.2236}\right)$$
Calculate the Z-score:
$$Z = \frac{7.25 - 7}{0.2236} \approx 1.118$$
Now, use the standard normal distribution table or a calculator to find:
$$P(Z > 1.118) = 1 - \Phi(1.118)$$
Using the standard normal CDF:
$$\Phi(1.118) \approx 0.8682$$
Thus:
$$P(Z > 1.118) \approx 1 - 0.8682 = 0.1318$$
Therefore, the probability that the perimeter exceeds 14.5 cm is approximately 0.132
(about 0.1314 if the Z-score is rounded to 1.12 for a table lookup).
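
A quick numerical check of this result, written as a sketch with the standard library's `statistics.NormalDist`; the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Sum of independent normals: the means add and the variances add.
mean_sum = 2.0 + 5.0
sd_sum = math.sqrt(0.1 ** 2 + 0.2 ** 2)

# Perimeter = 2 * (X1 + X2) > 14.5  is the same event as  X1 + X2 > 7.25.
p_exceeds = 1.0 - NormalDist(mean_sum, sd_sum).cdf(14.5 / 2.0)

print(f"sd of X1 + X2 = {sd_sum:.4f},  P(perimeter > 14.5) = {p_exceeds:.4f}")
```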

Question 11
Let X and Y be independent, normal random variables with E(X) = 2, Var(X) = 5,
E(Y ) = 6, and Var(Y ) = 8. Determine the following:

1. E(3X + 2Y )

2. Var(3X + 2Y )

3. P (3X + 2Y < 18)

4. P (3X + 2Y < 28)

Answer:
Let Z = 3X + 2Y.

1. Expected value:
The expected value of Z is:
$$E(Z) = E(3X + 2Y) = 3E(X) + 2E(Y)$$
Substituting the given values:
$$E(Z) = 3 \cdot 2 + 2 \cdot 6 = 6 + 12 = 18$$
So:
$$E(3X + 2Y) = 18$$

2. Variance:
Since X and Y are independent, their covariance is zero. Therefore the variance of Z is:
$$\mathrm{Var}(Z) = \mathrm{Var}(3X + 2Y) = 3^2\,\mathrm{Var}(X) + 2^2\,\mathrm{Var}(Y) = 9 \cdot \mathrm{Var}(X) + 4 \cdot \mathrm{Var}(Y)$$
Substituting the given variances:
$$\mathrm{Var}(Z) = 9 \cdot 5 + 4 \cdot 8 = 45 + 32 = 77$$
So:
$$\mathrm{Var}(3X + 2Y) = 77$$

3. Probability P(3X + 2Y < 18):

Since X and Y are independent normal random variables, Z = 3X + 2Y is also normally
distributed, with mean 18 and variance 77:
$$Z \sim N(18, 77)$$
Standardize using $Z_0 = (Z - 18)/\sqrt{77}$, which follows the standard normal
distribution N(0, 1):
$$P(3X + 2Y < 18) = P\left(\frac{3X + 2Y - 18}{\sqrt{77}} < \frac{18 - 18}{\sqrt{77}}\right) = P(Z_0 < 0)$$
For a standard normal variable, $P(Z_0 < 0) = 0.5$:
$$P(3X + 2Y < 18) = 0.5$$

4. Probability P(3X + 2Y < 28):
Standardize in the same way:
$$P(3X + 2Y < 28) = P\left(\frac{3X + 2Y - 18}{\sqrt{77}} < \frac{28 - 18}{\sqrt{77}}\right) = P\left(Z_0 < \frac{10}{\sqrt{77}}\right)$$
Calculate the Z-score:
$$\frac{10}{\sqrt{77}} \approx 1.140$$
Using the standard normal CDF:
$$P(Z_0 < 1.140) \approx 0.8729$$
So:
$$P(3X + 2Y < 28) \approx 0.8729$$
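
All four parts follow from the rules for a linear combination of independent normals, which the sketch below applies directly. The helper name `linear_combination` and the name `dist` for the resulting distribution are illustrative.

```python
import math
from statistics import NormalDist


def linear_combination(a, mean_x, var_x, b, mean_y, var_y):
    """Mean and variance of a*X + b*Y for independent X and Y."""
    return a * mean_x + b * mean_y, a * a * var_x + b * b * var_y


mean_w, var_w = linear_combination(3, 2, 5, 2, 6, 8)
dist = NormalDist(mean_w, math.sqrt(var_w))  # distribution of 3X + 2Y

p_below_18 = dist.cdf(18)
p_below_28 = dist.cdf(28)

print(mean_w, var_w, round(p_below_18, 4), round(p_below_28, 4))
```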

Question 12
A tobacco company claims that the amount of nicotine in cigarettes is a random
variable with mean µ = 2.2 mg and standard deviation σ = 0.3 mg. However, the
sample mean nicotine content of 100 randomly chosen cigarettes was x̄ = 3.1 mg.
What is the approximate probability that the sample mean would have been as high
or higher than 3.1 mg if the company’s claim was true?

Answer:
To determine the probability that the sample mean would have been as high or higher
than 3.1 mg, we can use the Central Limit Theorem (CLT). The CLT states that the
distribution of the sample mean will be approximately normal if the sample size is
sufficiently large.
Given:

• Population mean (µ) = 2.2 mg


• Population standard deviation (σ) = 0.3 mg
• Sample size (n) = 100
• Sample mean (X̄) = 3.1 mg

The standard error of the mean is given by:
$$SEM = \frac{\sigma}{\sqrt{n}}$$
Substituting the values:
$$SEM = \frac{0.3}{\sqrt{100}} = \frac{0.3}{10} = 0.03$$
The Z-score for the sample mean can be calculated using the formula:
$$Z = \frac{\bar{X} - \mu}{SEM}$$
Substituting the values:
$$Z = \frac{3.1 - 2.2}{0.03} = \frac{0.9}{0.03} = 30$$
A Z-score of 30 is extremely large, which means the probability that the sample mean
would be as high or higher than 3.1 mg, assuming the company’s claim is true, is
extremely small.
For practical purposes, a Z-score of 30 corresponds to a probability so close to 0 that
it is effectively 0. This implies that the observed sample mean of 3.1 mg is highly
unlikely if the true mean is 2.2 mg, as claimed by the company.
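
To see how small this probability is, the upper tail can be evaluated with the complementary error function, which stays accurate far out in the tail. A minimal sketch; `upper_tail` is an illustrative name.

```python
import math


def upper_tail(z):
    """P(Z > z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


sem = 0.3 / math.sqrt(100)       # standard error of the mean
z_score = (3.1 - 2.2) / sem      # Z-score of the observed sample mean
p_tail = upper_tail(z_score)

print(f"SEM = {sem:.3f},  Z = {z_score:.1f},  P(Z > z) = {p_tail:.3e}")
```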

Question 13
In a scholarship programme of an Institute for the fourth year, students studying with
CGPA of over 8 receive a scholarship of Rs. 15,000. Students with CGPA between 7
and 8 receive Rs. 10,000. Students with CGPA between 6 and 7 receive a scholarship
of Rs. 5,000. The fourth-year programme of this Institute has 500 students, and their
grades are normally distributed with mean µ = 5.2 and standard deviation σ = 1.2.
What is the total cost to the Institute for providing these scholarships?

Answer:
To calculate the total cost to the Institute for providing scholarships, we need to
determine the proportion of students in each CGPA range and then compute the
corresponding total scholarship amount.
Given:

• Number of students: N = 500


• CGPA is normally distributed with mean µ = 5.2 and standard deviation σ = 1.2.

• Scholarship amounts:
– CGPA > 8: Rs. 15,000
– 7 < CGPA ≤ 8: Rs. 10,000
– 6 < CGPA ≤ 7: Rs. 5,000

To find the proportion of students in each CGPA range, we first standardize the CGPA
values using the Z-score formula:
$$Z = \frac{X - \mu}{\sigma}$$
• For CGPA = 8:
$$Z = \frac{8 - 5.2}{1.2} = \frac{2.8}{1.2} \approx 2.33$$
• For CGPA = 7:
$$Z = \frac{7 - 5.2}{1.2} = \frac{1.8}{1.2} = 1.5$$
• For CGPA = 6:
$$Z = \frac{6 - 5.2}{1.2} = \frac{0.8}{1.2} \approx 0.67$$
Using the standard normal distribution table:

• Proportion of students with CGPA > 8: P (Z > 2.33) ≈ 0.0099


• Proportion of students with CGPA > 7: P (Z > 1.5) ≈ 0.0668
• Proportion of students with CGPA > 6: P (Z > 0.67) ≈ 0.2514

So, the proportion of students in each range is:

• CGPA > 8: P (Z > 2.33) = 0.0099


• 7 < CGPA ≤ 8: P (1.5 < Z ≤ 2.33) = P (Z > 1.5) − P (Z > 2.33) = 0.0668 −
0.0099 = 0.0569
• 6 < CGPA ≤ 7: P (0.67 < Z ≤ 1.5) = P (Z > 0.67) − P (Z > 1.5) = 0.2514 −
0.0668 = 0.1846

Multiply the total number of students (500) by the proportions and round to whole
students:

• CGPA > 8: 500 × 0.0099 = 4.95 ≈ 5 students
• 7 < CGPA ≤ 8: 500 × 0.0569 = 28.45 ≈ 28 students
• 6 < CGPA ≤ 7: 500 × 0.1846 = 92.3 ≈ 92 students

Now, multiply the number of students by the respective scholarship amounts:

• CGPA > 8: 5 × 15,000 = Rs. 75,000
• 7 < CGPA ≤ 8: 28 × 10,000 = Rs. 280,000
• 6 < CGPA ≤ 7: 92 × 5,000 = Rs. 460,000

Total Cost:
The total cost to the Institute is:

75000 + 280000 + 460000 = 815000 Rs.

So, the total cost to the Institute for providing these scholarships is Rs. 815,000.
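
For comparison, the expected cost can also be computed directly from the normal CDF, without rounding the Z-scores to two decimals or the head counts to whole students. This is a sketch under those assumptions, so its result differs somewhat from the table-based figure above; `AWARDS` and `expected_cost` are hypothetical names.

```python
from statistics import NormalDist

# (lower CGPA bound, upper CGPA bound or None for "no upper limit", award in Rs.)
AWARDS = [(8.0, None, 15000), (7.0, 8.0, 10000), (6.0, 7.0, 5000)]


def expected_cost(mu, sigma, n_students, awards=AWARDS):
    """Expected total scholarship cost without rounding the head counts."""
    dist = NormalDist(mu, sigma)
    total = 0.0
    for low, high, amount in awards:
        upper = 1.0 if high is None else dist.cdf(high)
        total += n_students * (upper - dist.cdf(low)) * amount
    return total


cost = expected_cost(5.2, 1.2, 500)
print(f"Expected cost (unrounded head counts): Rs. {cost:,.0f}")
```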

Question 14
Concentrations of a toxic agent are measured at a plant exit pipe. Assume the
concentrations are normally distributed. From extensive plant data maintained over 40
years, the population mean may be taken as 41.2 g/L and the standard deviation as 0.90
g/L.

(a) What is the probability that the concentration in this effluent will be more than
42.3 g/L?
(b) There is a change in the process which theoretically should not affect the
population mean. Assume that the population standard deviation is unaltered. To check
that there is no change in the population mean, five samples are taken at the outlet,
and if the sample mean is more than 42.3 g/L then corrective action must be
taken.
i. What is the p-value expressed as a percentage associated with this test if
corrective action has to be taken?
ii. State the null and alternate hypotheses.

Answer:
Part (a): Probability that the concentration is more than 42.3 g/L
Given:

• Population mean, µ = 41.2 g/L
• Population standard deviation, σ = 0.90 g/L
• We are asked to find P(X > 42.3 g/L)

The Z-score is calculated using the formula:
$$Z = \frac{X - \mu}{\sigma}$$
Substitute the values:
$$Z = \frac{42.3 - 41.2}{0.90} = \frac{1.1}{0.90} \approx 1.222$$
Using the Z-table, the probability corresponding to Z > 1.222 is approximately:
$$P(Z > 1.222) \approx 0.111$$
Thus, the probability that the concentration is more than 42.3 g/L is approximately
11.1%.
Part (b): p-value associated with the test
For the hypothesis test:

• Null hypothesis H0 : The population mean concentration is µ = 41.2 g/L


• Alternative hypothesis H1 : The population mean concentration is greater than
41.2 g/L (i.e., µ > 41.2 g/L)

We are given:

• Sample size, n = 5
• Population standard deviation, σ = 0.90 g/L
• Sample mean threshold for action, X̄ = 42.3 g/L

The Z-score for the sample mean is calculated using the formula:
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
Substitute the values:
$$Z = \frac{42.3 - 41.2}{0.90 / \sqrt{5}} = \frac{1.1}{0.4025} \approx 2.733$$
Using the Z-table, the probability corresponding to Z > 2.733 is approximately:
$$P(Z > 2.733) \approx 0.0031$$
Thus, the p-value is approximately 0.31%.
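
Both tail probabilities in this question come from the same normal model, once with the population standard deviation and once with the standard error of a five-sample mean. A short sketch using `statistics.NormalDist`; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, threshold, n = 41.2, 0.90, 42.3, 5

# Part (a): a single concentration reading exceeds the threshold.
p_single = 1.0 - NormalDist(mu, sigma).cdf(threshold)

# Part (b): the mean of n = 5 samples exceeds the threshold when H0 is true.
standard_error = sigma / math.sqrt(n)
p_value = 1.0 - NormalDist(mu, standard_error).cdf(threshold)

print(f"P(X > 42.3) = {100 * p_single:.1f}%,  p-value = {100 * p_value:.2f}%")
```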

Question 15

Answer:
From the figure, the critical values of the T-distribution are approximately ±2.365.
The shaded areas in both tails represent a total significance level of α = 0.05, or 5%,
with 0.025 in each tail (since this is a two-tailed test).
To find the degrees of freedom (df), we can refer to a T-distribution table or use
statistical software to determine the degrees of freedom that correspond to a critical
value of 2.365 for a two-tailed test with α = 0.05.
The critical value of t = 2.365 corresponds to 7 degrees of freedom (for comparison,
9 degrees of freedom would give a critical value of about 2.262).
Thus, the degrees of freedom for this T-distribution are:

Degrees of Freedom = 7
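
The table lookup can be sketched as a reverse search through the two-tailed α = 0.05 column of a t-table. Only a few rows of the standard table are reproduced here, and the names `T_CRIT_TWO_TAILED_05` and `df_for_critical_value` are hypothetical.

```python
# Two-tailed critical values of Student's t at alpha = 0.05 (0.025 in each tail),
# copied from a standard t-table for a handful of degrees of freedom.
T_CRIT_TWO_TAILED_05 = {5: 2.571, 6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def df_for_critical_value(t_crit, table=T_CRIT_TWO_TAILED_05, tol=5e-4):
    """Return the degrees of freedom whose tabulated critical value matches t_crit."""
    matches = [df for df, value in table.items() if abs(value - t_crit) <= tol]
    return matches[0] if matches else None


df_found = df_for_critical_value(2.365)
print("Degrees of freedom:", df_found)
```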

Question 16
A plant is suspected of discharging harmful effluents above the stipulated limit of 200
mg/L into a nearby river. The plant denies this and shows results from sampling of
the river carried out by them.
However, the Court orders an independent testing agency to sample the effluent
concentrations. The plant lawyer further argues that his client’s results are more
accurate as his sample shows less standard deviation. The Court appoints a neutral
expert to give his recommendation.

(a) State the claims of the Plant and the Neutral Expert.
(b) What conclusions will be drawn from the neutral expert’s hypothesis testing? State
the hypotheses clearly.
(c) What conclusion will be drawn if the Plant tries to confuse the judge by invoking
the ≠ alternate hypothesis?

Detail                              Plant                                      Neutral Expert
Claim                               ?                                          ?
Sample size                         3                                          20
Mean concentration (mg/L)           195                                        205
Sample standard deviation (mg/L)    4                                          6
Sampling location(s)                points near mixing of river with the sea   random locations near the plant discharge

Answer:
a)

• Plant’s Claim: The plant claims that the effluent concentration is below the
stipulated limit of 200 mg/L. They present a mean concentration of 195 mg/L
based on a sample size of 3 measurements.
• Neutral Expert’s Claim: The neutral expert claims that the effluent concentration
exceeds the limit of 200 mg/L, presenting a mean concentration of 205 mg/L based on
a sample size of 20 measurements.

b)
To perform hypothesis testing, we define the following hypotheses:

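• Null Hypothesis (H0): The effluent concentration is within the acceptable limit, i.e.,
H0 : µ ≤ 200 mg/L
• Alternate Hypothesis (H1): The effluent concentration exceeds the acceptable
limit, i.e.,
H1 : µ > 200 mg/L

Using the neutral expert’s data (n = 20, mean 205 mg/L, s = 6 mg/L), the test statistic
is t = (205 − 200)/(6/√20) ≈ 3.73 with 19 degrees of freedom. This exceeds the one-tailed
critical value of about 1.729 at the 5% significance level (taken here as a conventional
choice), so the evidence supports rejecting the null hypothesis in favor of the alternate
hypothesis. The larger sample size and the sampling locations near the plant discharge
also make the expert’s results more reliable than the plant’s three samples, which were
taken where the river mixes with the sea. A smaller sample standard deviation indicates
precision, not accuracy.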
c)
If the plant attempts to use a two-tailed alternate hypothesis, i.e.,

H1 : µ ≠ 200 mg/L

the focus shifts to determining whether the effluent concentration is different from 200
mg/L, rather than exceeding it.
The plant might argue that since their sample mean of 195 mg/L is also different from
200 mg/L (but less than the limit), the court should not reject their claim. However,
the primary concern in this case is whether the concentration exceeds 200 mg/L, which
calls for a one-tailed test. Even under the two-tailed test, the neutral expert’s
statistic (t ≈ 3.73, 19 degrees of freedom) exceeds the two-tailed 5% critical value
of about 2.093, so µ = 200 mg/L is still rejected, and the sample mean lies above the
limit.
Since the neutral expert’s larger sample shows a mean concentration above the limit,
the correct conclusion is to reject the plant’s claim and accept that the effluent
concentration is harmful.
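
The strength of the two data sets can be compared through their one-sample t statistics against the 200 mg/L limit. A minimal sketch; `one_sample_t` is a hypothetical helper, and critical values would still be read from a t-table.

```python
import math


def one_sample_t(sample_mean, sample_sd, n, mu0):
    """t statistic for testing a sample mean against the reference value mu0."""
    return (sample_mean - mu0) / (sample_sd / math.sqrt(n))


t_expert = one_sample_t(205, 6, 20, 200)   # 19 degrees of freedom
t_plant = one_sample_t(195, 4, 3, 200)     # 2 degrees of freedom

print(f"expert: t = {t_expert:.3f} (df = 19),  plant: t = {t_plant:.3f} (df = 2)")
```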

Question 17
Fill up the Table 1 given below by stating which probability function you will use to
describe the sample mean distribution X̄. Give the full form of the statistic you will
use with mean and appropriate standard deviation in each case.

Sl. No.   Probability distribution   Sample size   Population standard       Probability distribution
          of parent population                     deviation (σ) known       used for X̄
1         Normal                     5             Yes
2         Normal                     330           No
3         Normal                     5             No
4         Not normal                 45            No
5         Not normal                 25            No

Answer:

Sl. No.   Parent Pop. Dist.   Sample Size   σ Known   Dist. for X̄
1         Normal              5             Yes       Normal Distribution
2         Normal              330           No        Normal (CLT)
3         Normal              5             No        t-Distribution
4         Not Normal          45            No        t-Distribution
5         Not Normal          25            No        Insufficient info

Table 1: Information for Question 17

Case 1: Normal Parent Population, Small Sample Size (n = 5), σ Known

Since the parent population is normally distributed and the population standard
deviation (σ) is known, the sampling distribution of the sample mean X̄ will also follow a
Normal Distribution, even for a small sample size.

Case 2: Normal Parent Population, Large Sample Size (n = 330), σ Unknown

In this case, the population is normal, but the population standard deviation (σ) is
unknown. However, since the sample size is large (n = 330), the sample standard
deviation s estimates σ closely, and the distribution of the sample mean X̄ can still be
approximated as a Normal Distribution (the large-sample, Central Limit Theorem
regime), even without knowledge of σ.

Case 3: Normal Parent Population, Small Sample Size (n = 5), σ Unknown

When the parent population is normal but the sample size is small and the population
standard deviation (σ) is unknown, the sampling distribution of the sample mean X̄
follows a t-Distribution with n − 1 = 4 degrees of freedom. This is because the
t-distribution accounts for the additional uncertainty introduced by estimating σ from
the sample.

Case 4: Non-Normal Parent Population, Moderate Sample Size (n = 45), σ Unknown

For a non-normal parent population with a moderate sample size (n = 45) and an
unknown population standard deviation (σ), we can still use the t-Distribution to
approximate the sampling distribution of X̄, as the sample size is sufficiently large to
provide a reasonable approximation.

Case 5: Non-Normal Parent Population, Small Sample Size (n = 25), σ Unknown

In this case, the parent population is non-normal, the sample size is small
(n = 25), and σ is unknown. Without more information about the distribution, we
cannot confidently use the normal or t-distribution to approximate the distribution of
X̄. In this scenario, non-parametric methods or further data would be required for
analysis.

Question 18
An astronomer measures the distance of a distant star from the Earth. However, due
to atmospheric disturbances, any measurement will not yield the exact distance, d.
As a result, the astronomer has decided to make a series of measurements and then use
their average value as an estimate of the actual distance. If the astronomer believes that
the values of successive measurements are independent random variables with a mean
d and a standard deviation of 2 light-years, how many measurements are needed to
be at least 95% sure that her estimate is accurate to within ±0.5 light-years?

Answer: We can use the formula for the confidence interval for the mean, treating the
average of the measurements as approximately normally distributed:
$$\bar{X} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$
where:

• $\bar{X}$ is the average of the measurements, which estimates the true mean µ = d,
• σ is the standard deviation (here, σ = 2 light-years),
• n is the number of measurements,
• $Z_{\alpha/2}$ is the critical value for a 95% confidence interval (which is
approximately 1.96).

We are given that the margin of error must be within ±0.5 light-years. Hence, the
margin of error formula becomes:
$$\text{Margin of error} = Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$
Substitute the known values:
$$0.5 = 1.96 \times \frac{2}{\sqrt{n}}$$
Now, solve for n:
$$\frac{0.5}{1.96} = \frac{2}{\sqrt{n}}$$
$$\sqrt{n} = \frac{2 \times 1.96}{0.5} = 7.84$$
$$n = (7.84)^2 \approx 61.47$$
Since n must be an integer, we round up to n = 62.

Thus, the astronomer needs to make at least 62 measurements to be 95% sure that the
estimate is accurate to within ±0.5 light-years.
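
The same calculation can be wrapped in a small function that rounds up automatically. A sketch assuming the normal approximation used above; `required_sample_size` is an illustrative name.

```python
import math
from statistics import NormalDist


def required_sample_size(sigma, margin, confidence=0.95):
    """Smallest n for which z * sigma / sqrt(n) does not exceed the margin."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # about 1.96 for 95%
    return math.ceil((z * sigma / margin) ** 2)


n_required = required_sample_size(sigma=2.0, margin=0.5)
print("Measurements needed:", n_required)
```

Halving the margin to ±0.25 light-years roughly quadruples the required number of measurements, because n scales with the inverse square of the margin.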

Question 19
Performances of two schools, one in a city and another in a small town, are compared.
The generally held view is that the average performance in the city school is 5% higher
than the town school. Equal samples of size 10 are collected from both schools. The
sample means of city and town schools were 75% and 68% respectively. The sample
standard deviations for the city and town schools were 5% and 8% respectively. Assume
populations of city and town schools are normal.
A) Is the generally held view correct based on the evidence collected?
Based on this problem statement, answer the following:
B) The statistical test used will involve hypothesis testing of means using
(a) Standard normal distribution
(b) T-distribution

(c) F-distribution
(d) Chi-square distribution
C) The appropriate hypotheses (null and alternate) are
Null Hypothesis H0 : ,
Alternate Hypothesis H1 :
D) The overall standard deviation used in the test will be
E) The appropriate statistics value will be
F) The degrees of freedom, if applicable, for this test will be
G) The p-value for this test will be

Answer:
A) Is the generally held view correct?
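We perform a hypothesis test to determine whether the observed difference in sample
means is consistent with the claimed 5% difference. Parts B) to G) carry out the test,
and the conclusion is stated at the end.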
B) The statistical test used will involve:
Since the sample sizes are small (n = 10 for each group) and the population standard
deviations are unknown, we use the t-distribution. Thus, the answer is:

(b) T-distribution

C) Hypotheses
The null and alternate hypotheses are as follows:

• Null Hypothesis H0 : The average performance of the city school is 5% higher
than that of the town school (the generally held view). Mathematically:

$$H_0 : \mu_{\text{city}} - \mu_{\text{town}} = 5$$

• Alternate Hypothesis H1 : The average performance of the city school is more
than 5% higher than that of the town school. Mathematically:

$$H_1 : \mu_{\text{city}} - \mu_{\text{town}} > 5$$

D) Standard Deviation
The standard error (SE) of the difference between the two means is calculated as:
$$SE = \sqrt{\frac{s_{\text{city}}^2}{n_{\text{city}}} + \frac{s_{\text{town}}^2}{n_{\text{town}}}}$$

Substituting the given values:


$$SE = \sqrt{\frac{5^2}{10} + \frac{8^2}{10}} = \sqrt{\frac{25}{10} + \frac{64}{10}} = \sqrt{2.5 + 6.4} = \sqrt{8.9} \approx 2.983$$

Thus, the overall standard deviation used in the test is approximately 2.983.
E) Test Statistic
The test statistic t for the difference in sample means is:

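$$t = \frac{(\bar{x}_{\text{city}} - \bar{x}_{\text{town}}) - \Delta_0}{SE}$$

where $\Delta_0 = 5$ is the difference in means under the null hypothesis.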
Substituting the given values:

$$t = \frac{(75 - 68) - 5}{2.983} = \frac{7 - 5}{2.983} = \frac{2}{2.983} \approx 0.67$$
Thus, the test statistic value is approximately 0.67.
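
The standard error and test statistic can be reproduced from the summary statistics alone. The following is a minimal sketch; the helper name `welch_t` and the argument `delta0` (the hypothesized difference, 5 here) are our own labels.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2, delta0=0.0):
    """Return (SE, t) for a two-sample t-test without pooling the variances."""
    # Standard error of the difference between the two sample means.
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    # Distance of the observed difference from the hypothesized one, in SE units.
    t = ((mean1 - mean2) - delta0) / se
    return se, t


se, t = welch_t(75, 5, 10, 68, 8, 10, delta0=5)
print(round(se, 3), round(t, 2))  # 2.983 0.67
```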
F) Degrees of Freedom
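The degrees of freedom for a two-sample t-test with unequal variances is calculated as:

$$df = \frac{\left(\frac{s_{\text{city}}^2}{n_{\text{city}}} + \frac{s_{\text{town}}^2}{n_{\text{town}}}\right)^2}{\frac{\left(s_{\text{city}}^2 / n_{\text{city}}\right)^2}{n_{\text{city}} - 1} + \frac{\left(s_{\text{town}}^2 / n_{\text{town}}\right)^2}{n_{\text{town}} - 1}}$$
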

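Substituting the given values:

$$df = \frac{\left(\frac{25}{10} + \frac{64}{10}\right)^2}{\frac{(25/10)^2}{9} + \frac{(64/10)^2}{9}} = \frac{(2.5 + 6.4)^2}{\frac{2.5^2}{9} + \frac{6.4^2}{9}} = \frac{8.9^2}{\frac{6.25}{9} + \frac{40.96}{9}} = \frac{79.21}{5.246} \approx 15.10$$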

Thus, the degrees of freedom is approximately 15.
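
The degrees-of-freedom formula is easy to mistype by hand, so a quick numerical check is useful. This is a sketch of the Welch-Satterthwaite approximation used above; the function name `welch_df` is our own.

```python
def welch_df(sd1, n1, sd2, n2):
    """Welch-Satterthwaite degrees of freedom for unequal variances."""
    v1 = sd1**2 / n1  # variance of the first sample mean
    v2 = sd2**2 / n2  # variance of the second sample mean
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


df = welch_df(5, 10, 8, 10)
print(round(df, 2))  # 15.1
```

When both groups have the same size and the same standard deviation, the formula reduces to n1 + n2 - 2, the degrees of freedom of the pooled test.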


G) P-value
Using a t-distribution table or calculator, we find the p-value for a one-tailed t-test
with t = 0.67 and df = 15. The p-value is approximately 0.26.
Conclusion
Since the p-value (0.26) is greater than the typical significance level (0.05), we fail to
reject the null hypothesis. The evidence does not show that the city school's average
performance exceeds the town school's by more than 5%, so the data are consistent
with the generally held view.
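
If a t-table is not at hand, the one-tailed p-value can be approximated numerically. The sketch below integrates the Student t density with Simpson's rule using only the standard library; the function names `t_pdf` and `t_upper_tail` and the step count are our own choices, and a statistics package would normally be used instead.

```python
import math


def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """P(T > t), computed as 0.5 minus the Simpson integral of the pdf on [0, t]."""
    if t < 0:
        # The density is symmetric about zero.
        return 1.0 - t_upper_tail(-t, df, steps)
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


p = t_upper_tail(0.67, 15)
print(round(p, 2))  # 0.26
```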

Question 20
From historical data, the steady-state yields of ammonia from an adiabatic reactor
supplied by XYZ company are normally distributed. This reactor, supplied by the
company, is operated in several plants around the world. The mean yield of ammonia
from a sample of 6 measurements taken at an Indian plant is 27%, and the sample
variance is 9.

(a) Can the Indian plant accept this yield to be possible if XYZ company guarantees
an average yield of 30% from its reactors?
(b) If the same yield is obtained from a sample size of 40, can the yield still be
considered acceptable?

Answer: Part (a)


Given data:

• Sample mean, x̄ = 27%


• Claimed population mean, µ = 30%
• Sample variance, s² = 9 ⟹ s = 3%
• Sample size, n = 6

We perform a t-test for the population mean since the sample size is small and the
population standard deviation is unknown.

$$H_0 : \mu = 30\%$$

$$H_1 : \mu \neq 30\%$$

This is a two-tailed test.


The formula for the t-statistic is:
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

Substitute the values:


$$t = \frac{27 - 30}{3 / \sqrt{6}} = \frac{-3}{3 / 2.449} = \frac{-3}{1.2247} \approx -2.45$$

df = n − 1 = 6 − 1 = 5

For a two-tailed test at a significance level of α = 0.05 and df = 5, the critical value
from the t-distribution table is approximately tα/2 = ±2.571.
Since the calculated t-value (|t| = 2.45) is less than the critical value (2.571), we fail
to reject the null hypothesis. There is insufficient evidence to conclude that the
yield is significantly different from 30%. Thus, the Indian plant can accept the yield
as possible.
Part (b)
We now check if the yield is still acceptable if the sample size is increased to 40.
Given data:

• Sample size, n = 40

• Sample mean, x̄ = 27%
• Claimed population mean, µ = 30%
• Sample standard deviation, s = 3%

The formula for the t-statistic remains the same:


$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

Substitute the new sample size:


$$t = \frac{27 - 30}{3 / \sqrt{40}} = \frac{-3}{3 / 6.3246} = \frac{-3}{0.4743} \approx -6.32$$

df = 40 − 1 = 39

For a two-tailed test at a significance level of α = 0.05 and df = 39, the critical value
from the t-distribution table is approximately tα/2 = ±2.023.
Since the calculated t-value (|t| = 6.32) is much greater than the critical value (2.023),
we reject the null hypothesis. Thus, with a sample size of 40, the yield of 27% is
significantly different from 30%, and it cannot be considered acceptable.
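
Both parts of this question can be compared side by side in a few lines. This is a minimal sketch; the function name `one_sample_t` is our own, and the critical values are simply the table values quoted in the answer rather than computed ones.

```python
import math


def one_sample_t(xbar, mu0, s, n):
    """t-statistic for a one-sample test of the mean."""
    return (xbar - mu0) / (s / math.sqrt(n))


# Two-tailed 5% critical values for df = n - 1, as quoted from the t-table above.
critical = {6: 2.571, 40: 2.023}

for n, t_crit in critical.items():
    t = one_sample_t(27, 30, 3, n)
    decision = "reject H0" if abs(t) > t_crit else "fail to reject H0"
    print(f"n={n}: t={t:.2f}, critical=±{t_crit}, {decision}")
# n=6: t=-2.45, critical=±2.571, fail to reject H0
# n=40: t=-6.32, critical=±2.023, reject H0
```

The same 3% shortfall gives a larger |t| as n grows, because the standard error s/√n shrinks with the sample size.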
