CH4020 Assignment 1
A question commonly asked in GTC/DC meetings is whether you have put error bars in
your figures.
1. Represent any data of your choice in the form of histograms and box plots.
4. Explain how error bars may be plotted using different tools. Put up screenshots.
Answer:
Figure 1: Histogram
Figure 2: Boxplot
Error bars are typically drawn as vertical or horizontal lines on a plot, extending from a
data point to show the range within which the true value is likely to fall.
• Python (Matplotlib): The errorbar() function in Matplotlib (matplotlib.pyplot.errorbar)
can be used to add error bars to a plot (Figure 4).
• MATLAB: In MATLAB (Figure 5), the errorbar() function adds error bars to a data
plot.
Figure 5: Matlab
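Before any tool can draw error bars, the bar lengths have to be computed from replicate data. The sketch below does that with the Python standard library only; the replicate values are hypothetical, and using the standard error of the mean (s/√n) as the half-length is an assumption (standard deviations or confidence intervals are also common, and the caption should say which one is plotted).

```python
import math
import statistics

# Hypothetical replicate measurements: three repeats at each of four x-values.
replicates = {
    1.0: [2.1, 2.4, 2.2],
    2.0: [3.9, 4.3, 4.1],
    3.0: [6.2, 5.8, 6.1],
    4.0: [8.0, 8.4, 7.9],
}

x = sorted(replicates)
means = [statistics.mean(replicates[k]) for k in x]
# Sample standard deviation (n - 1 in the denominator) for each x-value
sds = [statistics.stdev(replicates[k]) for k in x]
# Standard error of the mean, s / sqrt(n): one common choice of error-bar half-length
sems = [sd / math.sqrt(len(replicates[k])) for sd, k in zip(sds, x)]

for xi, m, e in zip(x, means, sems):
    print(f"x = {xi}: {m:.3f} +/- {e:.3f}")
```

With Matplotlib installed, the same lists can be drawn with plt.errorbar(x, means, yerr=sems, fmt="o", capsize=4) after import matplotlib.pyplot as plt; in MATLAB the corresponding call is errorbar(x, means, sems).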
Question 3
With respect to the standard normal distribution, find the following [5]:
3. P(Z > 0.2)
Answer:
1. 0.5
2. 0.57926
3. 0.42074
4. 0.68269
5. 0.9545
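The table look-ups can be cross-checked with Python's standard-library statistics.NormalDist. Only sub-question 3 survives in the question text, so reading 0.68269 and 0.9545 as the probabilities of lying within ±1 and ±2 standard deviations is an assumption inferred from the answer values.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

p_lt_02 = Z.cdf(0.2)               # P(Z < 0.2)
p_gt_02 = 1 - Z.cdf(0.2)           # P(Z > 0.2), the complement
within_1 = Z.cdf(1) - Z.cdf(-1)    # P(-1 < Z < 1): assumed reading of answer 4
within_2 = Z.cdf(2) - Z.cdf(-2)    # P(-2 < Z < 2): assumed reading of answer 5

print(round(p_lt_02, 5), round(p_gt_02, 5), round(within_1, 5), round(within_2, 4))
# 0.57926 0.42074 0.68269 0.9545
```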
Question 4
In the annual book exhibition held in YMCA grounds, Chennai, over the past several years
during Pongal festival, there is a steady stream of visitors. They visit the various stalls and
exit the exhibition at different times. Depending on the path they take – a few bored ones
may bypass many stalls and exit practically instantaneously, some may follow the straight
path, some may keep circling, unwilling to leave, etc. Suppose the duration of their visit is
described in terms of a random variable T and the associated continuous probability density
function is given by
$$ f(t) = \frac{1}{\tau}\exp\left(-\frac{t}{\tau}\right) $$
Here τ is a parameter of the distribution with units of time.
Answer:
1. The random variable T represents time, so it must be non-negative. This means t can
take any non-negative value, starting from 0 and extending to infinity.
At t = 0:
$$ f(0) = \frac{1}{\tau}\exp\left(-\frac{0}{\tau}\right) = \frac{1}{\tau} \qquad (1) $$
As t → ∞:
$$ \lim_{t\to\infty} f(t) = \lim_{t\to\infty} \frac{1}{\tau}\exp\left(-\frac{t}{\tau}\right) = 0 \qquad (2) $$
Since f(t) is strictly decreasing for t ≥ 0, its upper bound is f(0) = 1/τ, attained at
t = 0, and its lower bound is 0, approached as t → ∞.
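2. For f(t) to be a legitimate probability density function, it must satisfy two conditions:
f(t) ≥ 0 for all t, which holds because τ > 0 and the exponential is always positive, and
the total area under f(t) must equal 1.
Since t ≥ 0,
$$ \int_0^\infty f(t)\,dt = \int_0^\infty \frac{1}{\tau}\exp\left(-\frac{t}{\tau}\right) dt $$
Let $u = t/\tau$, hence $du = dt/\tau$, or $dt = \tau\,du$.
Substituting into the integral:
$$ \int_0^\infty \frac{1}{\tau}\exp\left(-\frac{t}{\tau}\right) dt = \int_0^\infty \exp(-u)\,du = \left[-e^{-u}\right]_0^\infty = 1 $$
Since the integral equals 1, the given function f(t) is a legitimate probability density
function.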
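Therefore, using the same substitution u = t/τ and the standard results
$\int_0^\infty u e^{-u}\,du = 1$ and $\int_0^\infty u^2 e^{-u}\,du = 2$:
$$ \mu = E[T] = \int_0^\infty t f(t)\,dt = \tau $$
$$ E[T^2] = \int_0^\infty t^2 f(t)\,dt = 2\tau^2 $$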
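The normalization and moment results can be checked numerically. A minimal sketch, assuming an arbitrary illustrative value τ = 3 (the question keeps τ symbolic), truncating the infinite upper limit at 50τ, and using a hand-written composite Simpson's rule so that only the standard library is needed; the helper names are hypothetical.

```python
import math

def f(t, tau):
    """Exponential PDF f(t) = (1/tau) exp(-t/tau) for t >= 0."""
    return math.exp(-t / tau) / tau

def simpson(g, a, b, n=20_000):
    """Composite Simpson's rule for the integral of g over [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

tau = 3.0          # hypothetical parameter value (units of time)
upper = 50 * tau   # stand-in for infinity: the tail beyond 50*tau is of order e^-50

area = simpson(lambda t: f(t, tau), 0.0, upper)
mean = simpson(lambda t: t * f(t, tau), 0.0, upper)
second_moment = simpson(lambda t: t * t * f(t, tau), 0.0, upper)

print(f"area = {area:.6f}, E[T] = {mean:.6f}, E[T^2] = {second_moment:.6f}")
print(f"tau = {tau}, 2*tau^2 = {2 * tau**2}")
```

The three integrals should come out as 1, τ and 2τ² to within the quadrature error, matching the analytical results.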
Question 5
1. The average grade on a mathematics test is 52, with a standard deviation of 5. If the
instructor assigns S’s to the highest 10%, and the grades follow a normal distribution,
what is the lowest grade that will be assigned an ‘S’ ?
2. A teacher decides that the top 10% of students should receive S’s and the next 15%
A’s. If the test scores are normally distributed with a mean of 70 and a standard
deviation of 10, find the scores that should be assigned S’s and A’s.
Answer:
1. The lowest grade that will be assigned an ‘S’ is the one corresponding to the 90th
percentile of a normal distribution with a mean of 52 and a standard deviation of
5. The z-score corresponding to the 90th percentile is approximately 1.28. Using the
z-score formula:
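$$ z = \frac{X - \mu}{\sigma} $$
Substituting the known values:
$$ 1.28 = \frac{X - 52}{5} $$
Solving for X: X = 52 + 1.28 × 5 = 58.4. The lowest grade assigned an ‘S’ is therefore
about 58.4.
2. With a mean of 70 and a standard deviation of 10, the ‘S’ cutoff is again the 90th
percentile (z ≈ 1.28):
$$ 1.28 = \frac{X - 70}{10} $$
Solving for X: X = 70 + 12.8 = 82.8.
The ‘A’ cutoff is the 75th percentile, because the top 10% plus the next 15% make up the
top 25% (z ≈ 0.67):
$$ 0.67 = \frac{X - 70}{10} $$
Solving for X: X = 70 + 6.7 = 76.7.
Hence scores of about 82.8 and above receive S’s, and scores from about 76.7 up to 82.8
receive A’s.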
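The percentile look-ups in Question 5 can be reproduced with statistics.NormalDist.inv_cdf. It returns exact quantiles (z ≈ 1.2816 for the 90th percentile and z ≈ 0.6745 for the 75th) rather than the rounded table values 1.28 and 0.67, so the cutoffs below differ from the hand calculation by less than 0.05 marks.

```python
from statistics import NormalDist

# Part 1: mean 52, SD 5; 'S' goes to the top 10%, i.e. above the 90th percentile
s_cut_1 = NormalDist(mu=52, sigma=5).inv_cdf(0.90)

# Part 2: mean 70, SD 10; top 10% get 'S', next 15% get 'A'
grades = NormalDist(mu=70, sigma=10)
s_cut_2 = grades.inv_cdf(0.90)   # 90th percentile
a_cut_2 = grades.inv_cdf(0.75)   # 75th percentile (top 10% + next 15% = top 25%)

print(f"Part 1: S from {s_cut_1:.2f}")
print(f"Part 2: S from {s_cut_2:.2f}, A from {a_cut_2:.2f} up to {s_cut_2:.2f}")
```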
Question 6
A probability density function is given such that:
$$ f(x) = \begin{cases} Ax + b, & 0 \le x \le 1 \\ 0, & \text{elsewhere} \end{cases} $$
with the condition $P(0.25 < X < 0.5) = \frac{19}{80}$. Find the constants A and b, as well
as the mean value.
Answer:
1. Step 1: Use the normalization condition $\int_0^1 f(x)\,dx = 1$. Substituting f(x) = Ax + b:
$$ \int_0^1 (Ax + b)\,dx = 1 $$
Integrating gives the first equation:
$$ \frac{A}{2} + b = 1 \qquad \text{(Equation 1)} $$
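2. Step 2: Use the given probability $P(0.25 < X < 0.5) = \frac{19}{80}$.
This probability is computed as:
$$ P(0.25 < X < 0.5) = \int_{0.25}^{0.5} f(x)\,dx = \frac{19}{80} $$
Substitute f(x) = Ax + b:
$$ \int_{0.25}^{0.5} (Ax + b)\,dx = \frac{19}{80} $$
Perform the integration:
$$ \int_{0.25}^{0.5} (Ax + b)\,dx = \left[\frac{A}{2}x^2 + bx\right]_{0.25}^{0.5} $$
Simplify:
$$ \left(\frac{A}{8} + 0.5b\right) - \left(\frac{A}{32} + 0.25b\right) = \frac{19}{80} $$
Simplify further: this is $\frac{3A}{32} + 0.25b = \frac{19}{80}$, and multiplying by 80 gives the
second equation. Together with Equation 1:
$$ \frac{A}{2} + b = 1 $$
$$ 7.5A + 20b = 19 $$
From Equation 1:
$$ b = 1 - \frac{A}{2} $$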
Substitute this into the second equation:
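$$ 7.5A + 20\left(1 - \frac{A}{2}\right) = 19 $$
Simplify:
$$ 7.5A + 20 - 10A = 19 \;\Rightarrow\; -2.5A = -1 $$
Solve for A:
$$ A = 0.4 $$
Substituting A back into Equation 1:
$$ \frac{0.4}{2} + b = 1 \;\Rightarrow\; 0.2 + b = 1 \;\Rightarrow\; b = 1 - 0.2 = 0.8 $$
$$ A = 0.4, \quad b = 0.8 $$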
3. Step 3: The mean is $\mu = \int_0^1 x f(x)\,dx = \int_0^1 x(0.4x + 0.8)\,dx$. Distribute x:
$$ \mu = \int_0^1 (0.4x^2 + 0.8x)\,dx $$
These integrals are standard:
$$ \int_0^1 x^2\,dx = \frac{1}{3}, \qquad \int_0^1 x\,dx = \frac{1}{2} $$
Substitute the results:
$$ \mu = 0.4 \cdot \frac{1}{3} + 0.8 \cdot \frac{1}{2} $$
Simplify:
$$ \mu = \frac{1.6}{3} \approx 0.533 $$
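The two conditions in Question 6 form a 2×2 linear system in A and b. As a check, it can be solved exactly with Python's fractions module and Cramer's rule; the coefficients come from integrating Ax + b over [0, 1] and over [0.25, 0.5].

```python
from fractions import Fraction as F

# Normalisation:     (1/2) A + 1 * b   = 1
# Interval prob.:   (3/32) A + (1/4) b = 19/80   (integral of Ax + b over [0.25, 0.5])
a11, a12, r1 = F(1, 2), F(1), F(1)
a21, a22, r2 = F(3, 32), F(1, 4), F(19, 80)

# Cramer's rule for the 2x2 system
det = a11 * a22 - a12 * a21
A = (r1 * a22 - a12 * r2) / det
b = (a11 * r2 - r1 * a21) / det

# Mean: integral of x (A x + b) over [0, 1] = A/3 + b/2
mu = A / 3 + b / 2
print(A, b, mu, float(mu))
```

Exact arithmetic gives A = 2/5, b = 4/5 and µ = 8/15 ≈ 0.533. Because b > 0 and A > 0, f(x) is also non-negative on [0, 1], as a density must be.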
Question 7
A student waits for an electric vehicle in his Institute. He knows that the e-vehicle comes
every 15 minutes, but he doesn’t know when the next one will come. Let’s assume the vehicle
is as likely to come in any one instant as in any other within the next 15 minutes.
1. Is the student’s waiting time a continuous or discrete random variable? What are its
maximum and minimum values?
3. What is the probability that the vehicle will come within the next 15 minutes?
4. If the student has to be in a location within the next 15 minutes and the total travel
time is 10 minutes, what is the probability that the student will make it on time?
Answer:
1. The student’s waiting time is a continuous random variable because it can take any
value within a given interval. Specifically, the waiting time T can be any value between
0 and 15 minutes.
The minimum value of T is 0 minutes (if the vehicle arrives immediately) and the
maximum value is 15 minutes (if the vehicle arrives just before the 15-minute mark).
2. Since the vehicle is equally likely to arrive at any moment within the 15-minute window,
the waiting time T follows a uniform distribution.
The probability density function (PDF) for a uniform distribution over the interval
[0, 15] is:
$$ f(t) = \begin{cases} \frac{1}{15}, & 0 \le t \le 15 \\ 0, & \text{elsewhere} \end{cases} $$
3. The probability that the vehicle will come within the next 15 minutes is 1, because it
is guaranteed to arrive within this time frame.
4. If the student needs to be at a location within the next 15 minutes and the travel
time is 10 minutes, the student must catch the vehicle within the first 5 minutes of the
15-minute window to make it on time.
The student will be on time if the waiting time satisfies T ≤ 5. The probability of this
event is:
$$ P(T \le 5) = \int_0^5 f(t)\,dt $$
Substitute f(t):
$$ P(T \le 5) = \int_0^5 \frac{1}{15}\,dt = \frac{5}{15} = \frac{1}{3} $$
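A quick simulation makes the uniform-distribution argument concrete: draw many uniform waiting times on [0, 15] and count how often the vehicle arrives within the first 5 minutes. The seed and the number of trials are arbitrary choices for illustration.

```python
import random
from fractions import Fraction

WINDOW = 15.0   # minutes: the vehicle arrives somewhere in [0, 15]
DEADLINE = 5.0  # minutes: 15-minute deadline minus 10 minutes of travel

# Exact value from the uniform CDF: P(T <= 5) = 5 / 15
exact = Fraction(5, 15)

# Monte Carlo check: draw uniform waiting times and count the on-time cases
rng = random.Random(42)   # fixed seed so the run is reproducible
trials = 200_000
on_time = sum(rng.uniform(0.0, WINDOW) <= DEADLINE for _ in range(trials))
estimate = on_time / trials

print(exact, float(exact), estimate)
```

The simulated fraction should agree with 1/3 to about two decimal places for this many trials.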
Question 8
The joint probability density function of random variables X and Y is described by
$$ f_{XY}(x, y) = \begin{cases} e^{-x}e^{-y}, & x > 0 \text{ and } y > 0 \\ 0, & \text{otherwise} \end{cases} $$
Is this a legitimate function? Find the probability $P(1 < X < 2,\ 0 < Y < 2)$.
Answer:
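The function is non-negative everywhere, so it remains to check that it integrates to 1.
Substitute $f_{XY}(x, y) = e^{-x}e^{-y}$:
$$ \int_0^\infty \int_0^\infty e^{-x}e^{-y}\,dx\,dy = \left(\int_0^\infty e^{-x}\,dx\right)\left(\int_0^\infty e^{-y}\,dy\right) $$
Each factor equals 1, for example
$$ \int_0^\infty e^{-y}\,dy = \left[-e^{-y}\right]_0^\infty = 1 $$
Thus,
$$ \int_0^\infty \int_0^\infty e^{-x}e^{-y}\,dx\,dy = 1 \times 1 = 1 $$
Since the total integral is 1, $f_{XY}(x, y)$ is a legitimate joint probability density function.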
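For the requested probability, integrate over 1 < x < 2 and 0 < y < 2:
$$ P(1 < X < 2,\ 0 < Y < 2) = \int_1^2 \int_0^2 e^{-x}e^{-y}\,dy\,dx $$
The inner integral is $\int_0^2 e^{-y}\,dy = 1 - e^{-2}$. Distribute $e^{-x}$:
$$ \int_1^2 e^{-x}\left(1 - e^{-2}\right)dx = \left(1 - e^{-2}\right)\int_1^2 e^{-x}\,dx $$
Evaluate the remaining integral:
$$ \int_1^2 e^{-x}\,dx = \left[-e^{-x}\right]_1^2 = e^{-1} - e^{-2} $$
Combine everything:
$$ P(1 < X < 2,\ 0 < Y < 2) = \left(1 - e^{-2}\right)\left(e^{-1} - e^{-2}\right) $$
Simplify:
$$ P \approx 0.8647 \times 0.2325 \approx 0.2011 $$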
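The closed form (1 − e^{-2})(e^{-1} − e^{-2}) can be checked against a direct numerical double integral. This sketch uses a plain midpoint rule over the rectangle 1 < x < 2, 0 < y < 2; the grid sizes are arbitrary and only need to be fine enough for four-decimal agreement.

```python
import math

def f_xy(x, y):
    """Joint PDF e^{-x} e^{-y} on x > 0, y > 0 (zero elsewhere)."""
    return math.exp(-x) * math.exp(-y) if x > 0 and y > 0 else 0.0

# Closed form: (1 - e^-2) * (e^-1 - e^-2)
exact = (1 - math.exp(-2)) * (math.exp(-1) - math.exp(-2))

# Midpoint-rule double integral over 1 < x < 2, 0 < y < 2
nx, ny = 200, 400
hx, hy = 1.0 / nx, 2.0 / ny
numeric = sum(
    f_xy(1 + (i + 0.5) * hx, (j + 0.5) * hy)
    for i in range(nx)
    for j in range(ny)
) * hx * hy

print(f"closed form = {exact:.6f}, midpoint rule = {numeric:.6f}")
```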
Question 9
According to Chebyshev’s theorem, the probability that any random variable X will assume
a value within k standard deviations of the mean is at least $1 - \frac{1}{k^2}$, i.e.,
$$ P(\mu - k\sigma < X < \mu + k\sigma) \ge 1 - \frac{1}{k^2}. $$
For the random variable following the normal distribution, is Chebyshev’s theorem valid?
Choose for convenience, k = 2.
Answer:
1. Chebyshev’s Theorem:
Chebyshev’s theorem states that for any random variable X with mean µ and standard
deviation σ, the probability that X lies within k standard deviations of the mean is
at least $1 - \frac{1}{k^2}$. This theorem applies to all types of distributions, not just normal
distributions.
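2. Comparison for k = 2:
For a normal distribution, the standard normal table gives Φ(2) ≈ 0.9772 and
Φ(−2) ≈ 0.0228, therefore:
$$ P(\mu - 2\sigma < X < \mu + 2\sigma) = 0.9772 - 0.0228 = 0.9544 $$
Chebyshev’s theorem guarantees only
$$ P(\mu - 2\sigma < X < \mu + 2\sigma) \ge 1 - \frac{1}{2^2} = 1 - \frac{1}{4} = 0.75 $$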
Thus, Chebyshev’s theorem provides a lower bound of 0.75 for the probability, while
the actual probability for a normal distribution is approximately 0.9544.
3. Conclusion:
Yes, Chebyshev’s theorem is valid for any distribution, including the normal distribution.
However, the theorem provides a more general bound that is not necessarily
tight for the normal distribution. For a normal distribution, the actual probability
within k = 2 standard deviations of the mean is higher than the bound provided by
Chebyshev’s theorem.
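The gap between the distribution-free bound and the exact normal probability can be tabulated for a few values of k. A short sketch with statistics.NormalDist; the choice of k values is illustrative.

```python
from statistics import NormalDist

Z = NormalDist()
results = {}

for k in (1.5, 2, 3):
    chebyshev_bound = 1 - 1 / k**2        # holds for any distribution
    normal_exact = Z.cdf(k) - Z.cdf(-k)   # exact for a normal distribution
    results[k] = (chebyshev_bound, normal_exact)
    print(f"k = {k}: Chebyshev >= {chebyshev_bound:.4f}, normal = {normal_exact:.4f}")
```

In every row the normal probability sits above the Chebyshev bound, which is what "valid but not tight" means here.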
Question 10
Let the random variables X1 and X2 denote the length and width, respectively, of a
manufactured part. Assume that X1 is normal with E(X1) = 2 cm and standard deviation 0.1
cm, and that X2 is normal with E(X2) = 5 cm and standard deviation 0.2 cm. Also, assume
that X1 and X2 are independent. Determine the probability that the perimeter exceeds 14.5
cm.
Answer:
The perimeter P of the part is given by:
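$$ P = 2(X_1 + X_2) $$
We want to find the probability that the perimeter exceeds 14.5 cm, which is equivalent to
$X_1 + X_2 > 7.25$ cm.
Because X1 and X2 are independent normal variables, their sum is normal with
$$ E(X_1 + X_2) = 2 + 5 = 7 \text{ cm}, \qquad \mathrm{Var}(X_1 + X_2) = 0.1^2 + 0.2^2 = 0.05 \text{ cm}^2 $$
$$ \sigma_{X_1+X_2} = \sqrt{0.05} \approx 0.2236 \text{ cm} $$
So, X1 + X2 is normally distributed with mean 7 and standard deviation approximately
0.2236.
To find $P(X_1 + X_2 > 7.25)$, we standardize this:
$$ Z = \frac{(X_1 + X_2) - 7}{\sigma_{X_1+X_2}} $$
We need:
$$ P(X_1 + X_2 > 7.25) = P\left(Z > \frac{7.25 - 7}{0.2236}\right) $$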
Calculate the Z-score:
$$ Z = \frac{7.25 - 7}{0.2236} \approx 1.118 $$
Now, use the standard normal distribution table or a calculator to find:
$$ \Phi(1.118) \approx 0.8682 $$
Thus:
$$ P(\text{perimeter} > 14.5) = 1 - 0.8682 \approx 0.132 $$
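The perimeter calculation can be reproduced in a few lines with statistics.NormalDist. Small differences from a printed-table answer come from rounding z before the look-up; here z = 0.25/√0.05 ≈ 1.118 is used unrounded.

```python
import math
from statistics import NormalDist

mean_sum = 2.0 + 5.0                      # E(X1 + X2) in cm
sd_sum = math.sqrt(0.1**2 + 0.2**2)       # independence: variances add
half_perimeter = 14.5 / 2                 # perimeter > 14.5  <=>  X1 + X2 > 7.25

z = (half_perimeter - mean_sum) / sd_sum
p_exceed = 1 - NormalDist(mean_sum, sd_sum).cdf(half_perimeter)

print(f"sd = {sd_sum:.4f}, z = {z:.4f}, P(perimeter > 14.5) = {p_exceed:.4f}")
```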
Question 11
Let X and Y be independent, normal random variables with E(X) = 2, Var(X) = 5,
E(Y ) = 6, and Var(Y ) = 8. Determine the following:
1. E(3X + 2Y )
2. Var(3X + 2Y )
Answer:
Let Z = 3X + 2Y .
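1. Expected value:
By linearity of expectation, the expected value of Z is:
$$ E(Z) = 3E(X) + 2E(Y) = 3 \cdot 2 + 2 \cdot 6 = 6 + 12 = 18 $$
So:
$$ E(3X + 2Y) = 18 $$
2. Variance:
Because X and Y are independent, the variance of Z is:
$$ \mathrm{Var}(Z) = 3^2\,\mathrm{Var}(X) + 2^2\,\mathrm{Var}(Y) = 9 \cdot 5 + 4 \cdot 8 = 45 + 32 = 77 $$
So:
$$ \mathrm{Var}(3X + 2Y) = 77 $$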
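4. Probability P(3X + 2Y < 28):
Because 3X + 2Y is a linear combination of independent normal variables, it is itself
normal with mean 18 and variance 77. Standardize this to find:
$$ P(3X + 2Y < 28) = P\left(\frac{3X + 2Y - 18}{\sqrt{77}} < \frac{28 - 18}{\sqrt{77}}\right) = \Phi\left(\frac{10}{\sqrt{77}}\right) $$
Calculate the z-score:
$$ \frac{10}{\sqrt{77}} \approx 1.140 $$
Using the standard normal CDF:
$$ \Phi(1.140) \approx 0.873 $$
So:
$$ P(3X + 2Y < 28) \approx 0.873 $$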
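The linear-combination rules used in Question 11 can be checked with statistics.NormalDist. To avoid a clash with the standard normal variable, the combination 3X + 2Y is called W in this sketch (a hypothetical name, not from the question).

```python
import math
from statistics import NormalDist

a, b = 3, 2
mean_x, var_x = 2, 5
mean_y, var_y = 6, 8

mean_w = a * mean_x + b * mean_y               # linearity of expectation
var_w = a**2 * var_x + b**2 * var_y            # independence: no covariance term
w = NormalDist(mean_w, math.sqrt(var_w))       # W = 3X + 2Y is itself normal

p_below_28 = w.cdf(28)
print(mean_w, var_w, round(p_below_28, 4))
```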
Question 12
A tobacco company claims that the amount of nicotine in cigarettes is a random
variable with mean µ = 2.2 mg and standard deviation σ = 0.3 mg. However, the
sample mean nicotine content of 100 randomly chosen cigarettes was x̄ = 3.1 mg.
What is the approximate probability that the sample mean would have been as high
or higher than 3.1 mg if the company’s claim was true?
Answer:
To determine the probability that the sample mean would have been as high or higher
than 3.1 mg, we can use the Central Limit Theorem (CLT). The CLT states that the
distribution of the sample mean will be approximately normal if the sample size is
sufficiently large.
Given: µ = 2.2 mg, σ = 0.3 mg, n = 100, x̄ = 3.1 mg.
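The standard error of the mean is given by:
$$ SEM = \frac{\sigma}{\sqrt{n}} $$
Substituting the values:
$$ SEM = \frac{0.3}{\sqrt{100}} = \frac{0.3}{10} = 0.03 \text{ mg} $$
The Z-score for the sample mean can be calculated using the formula:
$$ Z = \frac{\bar{X} - \mu}{SEM} $$
Substituting the values:
$$ Z = \frac{3.1 - 2.2}{0.03} = 30 $$
The probability $P(\bar{X} \ge 3.1) = P(Z \ge 30)$ is practically zero (far below $10^{-100}$), so a
sample mean this high would be essentially impossible if the company’s claim were true.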
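A tail this far out cannot be evaluated as 1 − Φ(z) in floating point, because Φ(30) rounds to exactly 1.0. The sketch below uses math.erfc instead, via the identity P(Z ≥ z) = erfc(z/√2)/2, which keeps precision deep in the tail.

```python
import math

mu, sigma = 2.2, 0.3    # company's claimed mean and SD (mg)
n, xbar = 100, 3.1      # sample size and observed sample mean (mg)

sem = sigma / math.sqrt(n)
z = (xbar - mu) / sem
# Upper-tail probability P(Z >= z) via the complementary error function,
# which stays accurate far into the tail where 1 - cdf(z) would round to 0.
p_upper = 0.5 * math.erfc(z / math.sqrt(2))

print(f"SEM = {sem:.3f} mg, z = {z:.1f}, P(Xbar >= 3.1) = {p_upper:.3e}")
```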
Question 13
In a scholarship programme of an Institute for the fourth year, students studying with
CGPA of over 8 receive a scholarship of Rs. 15,000. Students with CGPA between 7
and 8 receive Rs. 10,000. Students with CGPA between 6 and 7 receive a scholarship
of Rs. 5,000. The fourth-year programme of this Institute has 500 students, and their
grades are normally distributed with mean µ = 5.2 and standard deviation σ = 1.2.
What is the total cost to the Institute for providing these scholarships?
Answer:
To calculate the total cost to the Institute for providing scholarships, we need to
determine the proportion of students in each CGPA range and then compute the
corresponding total scholarship amount.
Given: n = 500 students, µ = 5.2, σ = 1.2.
• Scholarship amounts:
– CGPA > 8: Rs. 15,000
– 7 < CGPA ≤ 8: Rs. 10,000
– 6 < CGPA ≤ 7: Rs. 5,000
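To find the proportion of students in each CGPA range, we first standardize the CGPA
values using the Z-score formula:
$$ Z = \frac{X - \mu}{\sigma} $$
• For CGPA = 8:
$$ Z = \frac{8 - 5.2}{1.2} = \frac{2.8}{1.2} \approx 2.33 $$
• For CGPA = 7:
$$ Z = \frac{7 - 5.2}{1.2} = \frac{1.8}{1.2} = 1.5 $$
• For CGPA = 6:
$$ Z = \frac{6 - 5.2}{1.2} = \frac{0.8}{1.2} \approx 0.67 $$
Using the standard normal distribution table:
• P(CGPA > 8) = P(Z > 2.33) ≈ 1 − 0.9901 = 0.0099, i.e. about 500 × 0.0099 ≈ 5 students
• P(7 < CGPA ≤ 8) = P(1.5 < Z ≤ 2.33) ≈ 0.9901 − 0.9332 = 0.0569, i.e. about 28 students
• P(6 < CGPA ≤ 7) = P(0.67 < Z ≤ 1.5) ≈ 0.9332 − 0.7486 = 0.1846, i.e. about 92 students
The corresponding scholarship costs are: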
• For CGPA > 8: 5 × 15000 = 75000 Rs.
• For CGPA 7 < CGPA ≤ 8: 28 × 10000 = 280000 Rs.
• For CGPA 6 < CGPA ≤ 7: 92 × 5000 = 460000 Rs.
Total Cost:
The total cost to the Institute is:
$$ 75{,}000 + 280{,}000 + 460{,}000 = 815{,}000 \text{ Rs.} $$
So, the total cost to the Institute for providing these scholarships is Rs. 815,000.
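The scholarship estimate can be reproduced in two ways: once mimicking the hand calculation (z rounded to two decimals, head counts rounded to whole students) and once with exact normal probabilities and no rounding. The helper name z_rounded is hypothetical. The unrounded expected cost comes out a few thousand rupees higher, around Rs. 823,000, purely because of rounding in the table-based route.

```python
from statistics import NormalDist

N_STUDENTS = 500
cgpa = NormalDist(mu=5.2, sigma=1.2)
std = NormalDist()

# (lower CGPA, upper CGPA, award in Rs.); None means no upper limit
bands = [(8.0, None, 15_000), (7.0, 8.0, 10_000), (6.0, 7.0, 5_000)]

def z_rounded(x):
    """Table-style z-score, rounded to two decimals as in a printed table."""
    return round((x - cgpa.mean) / cgpa.stdev, 2)

cost_table = 0       # reproduces the hand calculation (rounded z, whole students)
cost_expected = 0.0  # exact normal probabilities, no rounding
for lower, upper, award in bands:
    p_table = (1.0 if upper is None else std.cdf(z_rounded(upper))) - std.cdf(z_rounded(lower))
    p_exact = (1.0 if upper is None else cgpa.cdf(upper)) - cgpa.cdf(lower)
    cost_table += round(N_STUDENTS * p_table) * award
    cost_expected += N_STUDENTS * p_exact * award

print(cost_table, round(cost_expected))
```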
Question 14
Concentrations of a toxic agent are measured at a plant exit pipe. Assume the
concentrations are normally distributed. From extensive plant data maintained over 40
years, the population mean may be taken as 41.2 g/L and standard deviation is 0.90
g/L.
(a) What is the probability that the concentration in this effluent will be more than
42.3 g/L?
(b) There is a change in the process which theoretically should not affect the
population mean. Assume that population standard deviation is unaltered. To check
that there is no change in population mean, five samples are taken at the outlet
and if the sample mean is more than 42.3 g/L then corrective action must be
taken.
i. What is the p-value expressed as a percentage associated with this test if
corrective action has to be taken?
ii. State the null and alternate hypotheses.
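Answer:
Part (a): Probability that the concentration is more than 42.3 g/L
Given: µ = 41.2 g/L, σ = 0.90 g/L, X = 42.3 g/L, and
$$ Z = \frac{X - \mu}{\sigma} $$
Substitute the values:
$$ Z = \frac{42.3 - 41.2}{0.90} \approx 1.22, \qquad P(X > 42.3) = P(Z > 1.22) \approx 1 - 0.8888 = 0.1112 $$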
Thus, the probability that the concentration is more than 42.3 g/L is approximately
11.1%.
Part (b): p-value associated with the test
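For the hypothesis test (this also answers part ii):
Null hypothesis H0: µ = 41.2 g/L (the process change has not altered the mean)
Alternate hypothesis H1: µ > 41.2 g/L (the mean has increased, so corrective action is needed)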
We are given:
• Sample size, n = 5
• Population standard deviation, σ = 0.90 g/L
• Sample mean threshold for action, X̄ = 42.3 g/L
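The Z-score for the sample mean is calculated using the formula:
$$ Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{42.3 - 41.2}{0.90/\sqrt{5}} = \frac{1.1}{0.4025} \approx 2.733 $$
The p-value is the upper-tail probability
$$ P(Z > 2.733) \approx 0.0031 $$
i.e. about 0.31%. A five-sample mean above 42.3 g/L would therefore occur only about 0.3%
of the time if the population mean were unchanged.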
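Both parts of Question 14 use the same normal model; only the standard deviation changes from σ (one measurement) to σ/√n (mean of n measurements). A short sketch with statistics.NormalDist, treating part (b) as the one-sided test H1: µ > 41.2 g/L.

```python
import math
from statistics import NormalDist

mu, sigma = 41.2, 0.90   # g/L, long-run plant data
limit = 42.3             # g/L, action threshold
n = 5                    # samples after the process change

# (a) a single measurement exceeding the limit
p_single = 1 - NormalDist(mu, sigma).cdf(limit)

# (b) one-sided p-value for a 5-sample mean at the threshold (H1: mean > 41.2)
se = sigma / math.sqrt(n)
z = (limit - mu) / se
p_value = 1 - NormalDist().cdf(z)

print(f"(a) {100 * p_single:.1f}%   (b) z = {z:.3f}, p = {100 * p_value:.2f}%")
```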
Question 15
Answer:
From the figure, the critical values of the T-distribution are approximately ±2.365.
The shaded areas in both tails represent a total significance level of α = 0.05, or 5%,
with 0.025 in each tail (since this is a two-tailed test).
To find the degrees of freedom (df), we can refer to a T-distribution table or use
statistical software to determine the degrees of freedom that correspond to a critical
value of 2.365 for a two-tailed test with α = 0.05.
From a t-table, the two-tailed critical value t = 2.365 at α = 0.05 corresponds to 7
degrees of freedom (for comparison, the value at 9 degrees of freedom is 2.262).
Thus, the degrees of freedom for this T-distribution are:
Degrees of Freedom = 7
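The t-table look-up can be reproduced without SciPy. This sketch (all helper names are hypothetical) integrates the Student's t density with Simpson's rule to get the upper-tail probability, then bisects for the two-tailed critical value at α = 0.05 over a range of degrees of freedom.

```python
import math

def t_pdf(x, df):
    """Student's t density with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(x, df, n=1000):
    """P(T > x) for x >= 0: 0.5 minus the Simpson-rule integral of the pdf over [0, x]."""
    h = x / n
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_critical(df, alpha=0.05):
    """Two-tailed critical value: the x with P(T > x) = alpha / 2, found by bisection."""
    lo, hi = 0.0, 10.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha / 2:
            lo = mid    # tail still too large: the critical value lies further right
        else:
            hi = mid
    return (lo + hi) / 2

critical = {df: t_critical(df) for df in range(5, 11)}
for df, c in critical.items():
    print(f"df = {df:2d}: t_crit = {c:.3f}")
```

In the printed table, 2.365 appears at df = 7, while df = 9 gives 2.262.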
Question 16
A plant is suspected of discharging harmful effluents above the stipulated limit of 200
mg/L into a nearby river. The plant denies this and shows results from sampling of
the river carried out by them.
However, the Court orders an independent testing agency to sample the effluent
concentrations. The plant lawyer further argues that his client’s results are more accurate
as his sample shows less standard deviation. The Court appoints a neutral expert to
give his recommendation.
(a) State the claims of the Plant and the Neutral Expert.
(b) What conclusions will be drawn by the neutral expert hypothesis testing? State
the hypotheses clearly.
(c) What conclusion will be drawn if the Plant tries to confuse the judge by invoking
the ≠ alternate hypothesis?
Answer:
a)
• Plant’s Claim: The plant claims that the effluent concentration is below the
stipulated limit of 200 mg/L. They present a mean concentration of 195 mg/L
based on a sample size of 3 measurements.
• Neutral Expert’s Claim: The neutral expert claims that the effluent concentration
exceeds the limit of 200 mg/L, presenting a mean concentration of 205
mg/L based on a sample size of 20 measurements.
b)
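To perform hypothesis testing, we define the following hypotheses:
Null hypothesis H0: µ ≤ 200 mg/L (the plant’s claim: the limit is not exceeded)
Alternate hypothesis H1: µ > 200 mg/L (the effluent exceeds the limit)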
Given that the neutral expert reports a mean of 205 mg/L from a sample size of 20,
this evidence suggests rejecting the null hypothesis in favor of the alternate hypothesis.
The larger sample size also increases the reliability of the expert’s results compared to
the plant’s sample size of 3.
c)
If the plant attempts to use a two-tailed alternate hypothesis, i.e.,
$$ H_1: \mu \neq 200 \text{ mg/L} $$
the focus shifts to determining whether the effluent concentration is different from 200
mg/L, rather than exceeding it.
The plant might argue that since their sample mean of 195 mg/L is also different from
200 mg/L (but less than the limit), the court should not reject their claim. However,
the primary concern in this case is whether the concentration exceeds 200 mg/L (one-
tailed test).
Since the neutral expert’s larger sample shows a mean concentration above the limit,
the correct conclusion is to reject the plant’s claim and conclude that the effluent
concentration exceeds the stipulated limit, i.e. it is harmful.
Question 17
Fill up the Table 1 given below by stating which probability function you will use to
describe the sample mean distribution X̄. Give the full form of the statistic you will
use with mean and appropriate standard deviation in each case.
Answer:
In this case, the population is normal, but the population standard deviation (σ) is
unknown. However, since the sample size is large (n = 330), by the Central Limit
Theorem the distribution of the sample mean X̄ can still be approximated by a Normal
distribution, with the sample standard deviation s used in place of σ:
$Z = \frac{\bar{X} - \mu}{s/\sqrt{n}}$.
Question 18
An astronomer measures the distance of a distant star from the Earth. However, due
to atmospheric disturbances, any measurements will not yield the exact distance, d.
As a result, the astronomer has decided to make a series of measurements and then use
their average value as an estimate of the actual distance. If the astronomer believes that
the values of successive measurements are independent random variables with a mean
d and a standard deviation of 2 light-years, how many measurements are needed to
be at least 95% sure that her estimate is accurate to within ±0.5 light-years?
Answer: We can use the formula for the confidence interval for the mean of a normal
distribution:
$$ \mu \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} $$
where:
• σ is the standard deviation (here, σ = 2 light years),
• n is the number of measurements,
• Zα/2 is the critical value for a 95% confidence interval (which is approximately
1.96).
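We are given that the margin of error must be within ±0.5 light-years. Hence, the
margin of error formula becomes:
$$ \text{Margin of error} = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} $$
Substitute the known values:
$$ 0.5 = 1.96 \times \frac{2}{\sqrt{n}} $$
Now, solve for n:
$$ \frac{0.5}{1.96} = \frac{2}{\sqrt{n}} \;\Rightarrow\; \sqrt{n} = \frac{2 \times 1.96}{0.5} = 7.84 $$
$$ n = (7.84)^2 = 61.47 $$
Since n must be a whole number, round up: at least 62 measurements are needed.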
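The sample-size calculation n = (z·σ/E)² can be scripted directly. The sketch uses the exact two-sided 95% quantile from statistics.NormalDist (about 1.95996 rather than 1.96) and rounds up with math.ceil, since a partial measurement is not possible.

```python
import math
from statistics import NormalDist

sigma = 2.0        # light-years, SD of one measurement
margin = 0.5       # light-years, required accuracy
confidence = 0.95

z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # two-sided critical value, ~1.96
n_exact = (z * sigma / margin) ** 2
n_required = math.ceil(n_exact)   # whole measurements, rounded up

print(f"z = {z:.4f}, n = {n_exact:.2f} -> {n_required} measurements")
```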
Question 19
Performances of two schools, one in a city and another in a small town, are compared.
The generally held view is that the average performance in the city school is 5% higher
than the town school. Equal samples of size 10 are collected from both schools. The
sample means of city and town schools were 75% and 68% respectively. The sample
standard deviations for the city and town schools were 5% and 8% respectively. Assume
populations of city and town schools are normal.
A) Is the generally held view correct based on the evidence collected?
Based on this problem statement, answer the following:
B) The statistical test used will involve hypothesis testing of means using
(a) Standard normal distribution
(b) T-distribution
(c) F-distribution
(d) Chi-square distribution
C) The appropriate hypotheses (null and alternate) are
Null Hypothesis H0 : ,
Alternate Hypothesis H1 :
D) The overall standard deviation used in the test will be
E) The appropriate statistics value will be
F) The degrees of freedom, if applicable, for this test will be
G) The p-value for this test will be
Answer:
A) Is the generally held view correct?
We will perform a hypothesis test to determine if the difference in means is statistically
significant.
B) The statistical test used will involve:
Since the sample sizes are small (n = 10 for each group) and the population standard
deviations are unknown, we use the t-distribution. Thus, the answer is:
(b) T-distribution
C) Hypotheses
The null and alternate hypotheses are as follows:
H0 : µcity − µtown = 5
H1 : µcity − µtown ≠ 5
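D) Standard Deviation
The standard error (SE) of the difference between the two means is calculated as:
$$ SE = \sqrt{\frac{s_{city}^2}{n_{city}} + \frac{s_{town}^2}{n_{town}}} = \sqrt{\frac{5^2}{10} + \frac{8^2}{10}} = \sqrt{8.9} \approx 2.983 $$
With equal sample sizes this equals the pooled-variance standard error
$s_p\sqrt{1/n_{city} + 1/n_{town}}$, where $s_p^2 = (9 \cdot 25 + 9 \cdot 64)/18 = 44.5$.
Thus, the overall standard deviation used in the test is approximately 2.983.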
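E) Test Statistic
The test statistic t for the difference in sample means is:
$$ t = \frac{(\bar{x}_{city} - \bar{x}_{town}) - \Delta_0}{SE} $$
Substituting the given values ($\Delta_0 = 5$):
$$ t = \frac{(75 - 68) - 5}{2.983} \approx 0.670 $$
F) Degrees of Freedom
$$ df = (n_{city} - 1) + (n_{town} - 1) = 9 + 9 = 18 $$
G) p-value
For a two-tailed test with t ≈ 0.670 and df = 18, the p-value is about 0.51. Since this is
far above α = 0.05, we fail to reject H0: the evidence collected is consistent with the
generally held view that the city school performs 5% better.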
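The two-sample test in Question 19 can be reproduced with the standard library alone. This sketch uses the pooled degrees of freedom n_city + n_town − 2 = 18 and a hypothetical Simpson-rule helper for the t upper tail (SciPy's t distribution would normally be used instead).

```python
import math

def t_pdf(x, df):
    """Student's t density with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(x, df, n=1000):
    """P(T > x) for x >= 0: 0.5 minus the Simpson-rule integral of the pdf over [0, x]."""
    h = x / n
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

n_city = n_town = 10
mean_city, mean_town = 75.0, 68.0
sd_city, sd_town = 5.0, 8.0
delta0 = 5.0   # hypothesised difference under H0

se = math.sqrt(sd_city**2 / n_city + sd_town**2 / n_town)
t = ((mean_city - mean_town) - delta0) / se
df = (n_city - 1) + (n_town - 1)
p_two_sided = 2 * t_upper_tail(abs(t), df)

print(f"SE = {se:.3f}, t = {t:.3f}, df = {df}, p = {p_two_sided:.3f}")
```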
Question 20
From historical data, the steady-state yields of ammonia from an adiabatic reactor
supplied by XYZ company are normally distributed. This reactor, supplied by the
company, is operated in several plants around the world. The mean yield of ammonia
from a sample of 6 measurements taken at an Indian plant is 27%, and the sample
variance is 9.
(a) Can the Indian plant accept this yield to be possible if XYZ company guarantees
an average yield of 30% from its reactors?
(b) If the same yield is obtained from a sample size of 40, can the yield still be
considered acceptable?
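Answer:
Part (a)
We perform a t-test for the population mean since the sample size is small and the
population standard deviation is unknown.
H0 : µ = 30%
H1 : µ ≠ 30%
With sample standard deviation s = √9 = 3%, the test statistic is
$$ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{27 - 30}{3/\sqrt{6}} \approx -2.45 $$
df = n − 1 = 6 − 1 = 5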
For a two-tailed test at a significance level of α = 0.05 and df = 5, the critical value
from the t-distribution table is approximately tα/2 = ±2.571.
Since the calculated t-value (|t| = 2.45) is less than the critical value (2.571), we fail
to reject the null hypothesis. There is insufficient evidence to conclude that the
yield is significantly different from 30%. Thus, the Indian plant can accept the yield
as possible.
Part (b)
We now check if the yield is still acceptable if the sample size is increased to 40.
Given data:
• Sample size, n = 40
• Sample mean, x̄ = 27%
• Claimed population mean, µ = 30%
• Sample standard deviation, s = 3%
The test statistic is
$$ t = \frac{27 - 30}{3/\sqrt{40}} \approx -6.32 $$
df = 40 − 1 = 39
For a two-tailed test at a significance level of α = 0.05 and df = 39, the critical value
from the t-distribution table is approximately tα/2 = ±2.023.
Since the calculated t-value (|t| = 6.32) is much greater than the critical value (2.023),
we reject the null hypothesis. Thus, with a sample size of 40, the yield of 27% is
significantly different from 30%, and it cannot be considered acceptable.
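The sample-size effect in Question 20 is easy to see numerically: the same 3-point shortfall gives a t-statistic that grows with √n. This sketch computes both t-statistics and their two-tailed critical values at α = 0.05 using the standard library only; the Simpson-rule and bisection helpers are hypothetical stand-ins for a t-table.

```python
import math

def t_pdf(x, df):
    """Student's t density with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(x, df, n=1000):
    """P(T > x) for x >= 0 via Simpson's rule on [0, x]."""
    h = x / n
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_critical(df, alpha=0.05):
    """Two-tailed critical value, found by bisection on the upper tail."""
    lo, hi = 0.0, 10.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha / 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

xbar, mu0, s = 27.0, 30.0, 3.0   # % yield; s = sqrt(sample variance 9)

results = {}
for n in (6, 40):
    t = (xbar - mu0) / (s / math.sqrt(n))
    crit = t_critical(n - 1)
    results[n] = (t, crit, abs(t) > crit)
    print(f"n = {n}: t = {t:.3f}, critical = +/-{crit:.3f}, reject H0: {abs(t) > crit}")
```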