Solution

The document contains solutions to a mock test covering various statistical concepts, including data analysis processes, probability distributions, and hypothesis testing. It discusses the application of Poisson and binomial distributions, maximum likelihood estimation, and the interpretation of regression analysis results. Each solution is structured with calculations and explanations to support the findings.

Uploaded by

parviarora06
Copyright
© © All Rights Reserved

Mock Test-1

Solution 1:
i)
1. Developing objectives to be met by the results of the data analysis
2. Identifying the data items required for the analysis
3. Collecting the data
4. Processing and formatting the data
5. Cleaning the data
6. Exploratory data analysis
7. Modelling
8. Communicating the results
9. Monitoring the process (updating the data & repeating the process if required)
[4]

ii) B [1]

Explanation:
In classical statistics, μ is a fixed quantity and therefore cannot have a probability distribution
associated with it.

However, in Bayesian statistics, μ is treated as a random variable, so probability statements about it
can be made.

iii) A – III
B – IV
C – II
D – I
[2]
[7 Marks]
Solution 2:

i) The number of claims incurred by each policyholder follows a Poisson distribution with mean
0.03. Therefore X, the total number of claims for the 100 policyholders, follows X ~ Poi(3),
assuming the policyholders are independent.

Since the Poisson distribution takes only integer values, P(X < 6) = P(X ≤ 5).
Using the cumulative Poisson probability tables gives 0.91608.
[2]
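
The table lookup above can be cross-checked by summing the Poisson probability function directly. This is a minimal sketch; the helper name `poisson_cdf` is an illustrative choice, not part of the original solution.

```python
import math

def poisson_cdf(k: int, mean: float) -> float:
    """Return P(X <= k) for X ~ Poisson(mean) by summing the probability function."""
    return sum(math.exp(-mean) * mean**j / math.factorial(j) for j in range(k + 1))

# 100 policyholders, each with mean 0.03 claims, give a total mean of 3.
p_less_than_6 = poisson_cdf(5, 100 * 0.03)
print(round(p_less_than_6, 5))
```

The printed value should agree with the tabulated figure to the five decimal places shown in the tables.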

ii) X counts the number of trials up to and including the 4th success, so X follows a Type 1
negative binomial distribution with k = 4 and p = 0.4:

$P(X = x) = \binom{x-1}{3} 0.4^4 \, 0.6^{x-4}$, x = 4, 5, 6, …

So P(X < 7) = P(X = 4) + P(X = 5) + P(X = 6)

$P(X = 4) = \binom{3}{3} 0.4^4 = 0.0256$

Now using the iterative formula $P(X = x) = \frac{x-1}{x-4} \, q \, P(X = x-1)$, with q = 1 − p = 0.6:

$P(X = 5) = \frac{4}{1} \times 0.6 \times 0.0256 = 0.06144$

$P(X = 6) = \frac{5}{2} \times 0.6 \times 0.06144 = 0.09216$

Hence, P(X < 7) = 0.0256 + 0.06144 + 0.09216 = 0.1792


[2]
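
The recursion used in part ii) can be sketched in a few lines. The function name `neg_binomial_probs` is hypothetical; the recursion itself is the one quoted in the solution.

```python
def neg_binomial_probs(k: int, p: float, x_max: int) -> dict:
    """P(X = x) for x = k..x_max, where X counts trials up to the k-th success."""
    q = 1.0 - p
    probs = {k: p**k}  # X = k means the first k trials are all successes
    for x in range(k + 1, x_max + 1):
        # Recursion from the solution: P(X=x) = (x-1)/(x-k) * q * P(X=x-1)
        probs[x] = (x - 1) / (x - k) * q * probs[x - 1]
    return probs

probs = neg_binomial_probs(k=4, p=0.4, x_max=6)
p_less_than_7 = sum(probs.values())
print(probs, round(p_less_than_7, 4))
```
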
iii) Here X follows a binomial distribution with n = 1000 and p = 0.015.
Since n is large and p is small, the Poisson approximation can be used:

Bin(1000, 0.015) ≈ Poi(15)

Using the cumulative Poisson tables gives

P(X < 10) = P(X ≤ 9) = 0.06985 [2]
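
To see how close the approximation is, the exact binomial probability can be computed alongside the Poisson one. This is only a sketch for comparison; both helper names are illustrative and the solution itself relies on tables.

```python
import math

def binomial_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ Bin(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))

def poisson_cdf(k: int, mean: float) -> float:
    """P(X <= k) for X ~ Poisson(mean)."""
    return sum(math.exp(-mean) * mean**j / math.factorial(j) for j in range(k + 1))

exact = binomial_cdf(9, 1000, 0.015)
approx = poisson_cdf(9, 1000 * 0.015)
print(f"exact binomial: {exact:.5f}, Poisson approximation: {approx:.5f}")
```
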


[6 Marks]

Solution 3:

i) Correct Answer (B) [2]


$E(e^{tX}) = \frac{1}{\mu}\left(\frac{1}{\mu} - t\right)^{-1} \int_0^\infty e^{-z}\,dz$

$E(e^{tX}) = (1 - t\mu)^{-1}$

ii) The total time Y over the N policies is $Y = X_1 + X_2 + \dots + X_N$.

The MGF of Y is given by $E(e^{tY}) = E(e^{t\sum X_i}) = \prod_{i=1}^{N} E(e^{tX_i})$, since the $X_i$ are independent.

$M_Y(t) = (1 - t\mu)^{-N}$
[3]

iii) $M_Y(t)$ has the form of the MGF of a Gamma distribution.

Thus, the distribution is Gamma(N, 1/μ) [1]
[6 Marks]

Solution 4: Let X be the amount of fixed benefit health insurance claims and Y the amount of indemnity
based health insurance claim.

Then:
X ~ N(900, 100²) and Y ~ N(1400, 300²)

We require

P((Y1+Y2 + Y3 ) > (X1+X2 + X3+ X4) + 900)


= P((Y1+Y2 + Y3 ) - (X1+X2 + X3+ X4) >900)

So we need the distribution of (Y1+Y2 + Y3) - (X1+X2 + X3 + X4):

(Y1+Y2 + Y3) - (X1+X2 + X3 + X4) ~ N(3×1400 – 4×900, 3×300² + 4×100²)

i.e. (Y1+Y2 + Y3) - (X1+X2 + X3 + X4) ~ N(600, 310000)

Therefore

P((Y1+Y2 + Y3) - (X1+X2 + X3 + X4)>900)

$= P\left(Z > \frac{900 - 600}{\sqrt{310000}}\right) = P(Z > 0.54) = 1 - P(Z < 0.54) = 1 - 0.70540 = 0.2946$
[4 Marks]
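
The calculation in Solution 4 can be reproduced without tables using Python's standard library. This is a sketch that assumes, as the solution does, that all claim amounts are independent; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# D = (Y1 + Y2 + Y3) - (X1 + X2 + X3 + X4), with independent normal claims.
mean_d = 3 * 1400 - 4 * 900          # 600
var_d = 3 * 300**2 + 4 * 100**2      # 310000

prob = 1 - NormalDist(mean_d, sqrt(var_d)).cdf(900)
print(round(prob, 4))
```

Because the z-value is not rounded to 0.54 here, the result differs from the tabulated answer in the fourth decimal place.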

Solution 5:
i) A group of random variables is said to be independent and identically distributed if the variables
are independent of each other and follow the same probability distribution
[1]
ii)

a) As there are 5 coin tosses and the probability of each coin toss being either heads or tails is
0.5, the probability of this exact outcome is 0.5^5 = 0.03125 [1]

b) p-value is the probability of an observation at least as “extreme” as the actual observation.


Under the null hypothesis, the expected number of heads is 2.5, while the actual number of
heads is 4 (> 2.5). Thus, we need to calculate the probability of 4 or 5 heads.
Let the number of heads be X. Then X ~ Bin (5, 0.5)
Prob(X ≥ 4) = 1 – Prob(X ≤ 3) = 1 – 0.8125 (from tables) = 0.1875 ≈ 0.19
As the p-value of 0.19 exceeds 0.05, the null hypothesis cannot be rejected at the 5% significance level.
[3]
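
The one-sided p-value in part ii) b) can be computed directly from the binomial probability function. This is a minimal sketch; `binomial_upper_tail` is a hypothetical helper name.

```python
from math import comb

def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Bin(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

# 4 or more heads in 5 tosses of a fair coin.
p_value = binomial_upper_tail(4, 5, 0.5)
print(p_value)
```
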
iii)
a) The likelihood can be calculated as:

$L(\theta) = \prod_{i=1}^{n} f(x_i; \theta)$

which yields

$L(\theta) = \theta^4 (1 - \theta)$
[1]

b) C [3]
Explanation:
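Differentiating the log likelihood,

$\frac{\partial}{\partial\theta}\log L(\theta) = \frac{\partial}{\partial\theta}\left[4\log\theta + \log(1-\theta)\right] = \frac{4}{\theta} - \frac{1}{1-\theta}$

Equating to 0,

$\frac{4}{\theta} - \frac{1}{1-\theta} = 0 \Rightarrow \theta = 4 - 4\theta \Rightarrow \theta = \frac{4}{5} = 0.8$

Checking for maximum:

$\frac{\partial^2}{\partial\theta^2}\log L(\theta) = \frac{\partial}{\partial\theta}\left[\frac{4}{\theta} - \frac{1}{1-\theta}\right] = -\frac{4}{\theta^2} - \frac{1}{(1-\theta)^2}$

Substituting θ = 0.8, this works out to -31.25, which is negative. Thus, θ = 0.8 represents the
maximum.
The MLE of θ is therefore 0.8.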
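
As a numerical cross-check of the calculus, the log likelihood 4 log θ + log(1 − θ) can be maximised over a grid. This is a sketch only; the grid resolution of 0.001 is an arbitrary choice.

```python
import math

def log_likelihood(theta: float) -> float:
    """Log of L(theta) = theta^4 * (1 - theta)."""
    return 4 * math.log(theta) + math.log(1 - theta)

# Evaluate on a grid strictly inside (0, 1) and pick the maximiser.
grid = [i / 1000 for i in range(1, 1000)]
theta_hat = max(grid, key=log_likelihood)

# Second derivative of the log likelihood at the maximiser.
second_derivative = -4 / theta_hat**2 - 1 / (1 - theta_hat) ** 2
print(theta_hat, round(second_derivative, 2))
```
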

c) Prior expected value of θ = 0.2*(0.1 + 0.3 + 0.5 + 0.7 + 0.9) = 0.5 [1]

d) As the prior distribution is uniform across nearly the entire possible range of θ (0 to 1), it
indicates that we have no knowledge (or very little knowledge) about the value of θ.
[1]

e) C [2]
Explanation:
The posterior expected value would lie somewhere between the prior expected value and the
MLE (observed test statistic). Here, the prior EV is 0.5 and the MLE is 0.8.
Thus, the posterior EV would lie between 0.5 and 0.8 – i.e., it would be greater than 0.5.

f) C [2]
Explanation:
The posterior EV would lie between the prior EV and the MLE.
The prior EV is 0.5.
The MLE is simply the proportion of coin tosses that result in “heads” – in this case, also 0.5 (8
tosses, 4 heads, 4 tails).
Since both the prior EV and the MLE are 0.5, the posterior EV must also be exactly 0.5.
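
The reasoning in parts e) and f) can be checked by computing the posterior means directly. The sketch below assumes the prior puts equal weight 0.2 on θ = 0.1, 0.3, 0.5, 0.7, 0.9 (inferred from the prior mean calculation in part c)), and that the data are 4 heads and 1 tail for part e) and 4 heads and 4 tails for part f). The function name is illustrative.

```python
def posterior_mean(prior_values, heads: int, tails: int) -> float:
    """Posterior mean of theta under an equally weighted discrete prior."""
    # Posterior weight is proportional to the likelihood theta^heads * (1-theta)^tails.
    weights = [t**heads * (1 - t) ** tails for t in prior_values]
    return sum(t * w for t, w in zip(prior_values, weights)) / sum(weights)

prior_values = [0.1, 0.3, 0.5, 0.7, 0.9]
mean_e = posterior_mean(prior_values, heads=4, tails=1)  # part e)
mean_f = posterior_mean(prior_values, heads=4, tails=4)  # part f)
print(round(mean_e, 3), round(mean_f, 3))
```

Under these assumptions the part e) value falls between the prior mean of 0.5 and the MLE of 0.8, and the part f) value is 0.5 by symmetry.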

g) B [1]

h) D [1]

iv)
a) No – the chi-squared test is generally not considered reliable when any expected value is
less than 5. If we try to form a contingency table (as is done in the next sub-question) based on 8
coin tosses, the expected value in each cell would be less than 5.
[1]

b) Number of degrees of freedom = (rows – 1 ) * (columns – 1) = 1 * 1 = 1


Expected values in each cell would be 5 [= row total * column total / table total]

Thus, the squared difference between the observed and expected value in each cell is (observed
value – 5)^2, i.e. 4, 4, 4, 4.
The χ² statistic is therefore 4 × 4/5 = 3.2.
For 1 df, the 5% critical value of χ² is 3.841, which is higher than the statistic of 3.2 calculated above.

Thus, there is insufficient evidence to reject the null hypothesis (i.e., that each coin toss is
independent of the preceding toss) at the 5% level.
[3]
[21 Marks]
Solution 6:
i) The marginal density is

$f_Y(y) = 3\int_0^\infty e^{-x} e^{-3y}\,dx = 3e^{-3y}\int_0^\infty e^{-x}\,dx = 3e^{-3y}$ [1]

ii) The conditional distribution function P(Y ≤ y | Y > 4) is

$P(Y \le y \mid Y > 4) = \frac{P(Y \le y,\, Y > 4)}{P(Y > 4)} = \frac{P(4 < Y \le y)}{P(Y > 4)} = \frac{F_Y(y) - F_Y(4)}{P(Y > 4)}$, y > 4

Therefore, differentiating with respect to y,

$f(y \mid Y > 4) = \frac{f_Y(y)}{P(Y > 4)} = \frac{3e^{-3y}}{e^{-12}} = 3e^{12-3y}$, y > 4 [2]

iii) The correct option is (C) [2]


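The conditional expectation is given by

$E[Y \mid Y > 4] = \int_4^\infty y\, f(y \mid Y > 4)\,dy = \int_4^\infty 3y e^{12-3y}\,dy$

Substituting t = y − 4,

$E[Y \mid Y > 4] = \int_0^\infty 3(t + 4)e^{-3t}\,dt = \int_0^\infty 3t e^{-3t}\,dt + \int_0^\infty 12 e^{-3t}\,dt = \frac{1}{3} + 4 = \frac{13}{3} \approx 4.33$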
[5 Marks]
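
The two integrals in part iii) evaluate to 1/3 and 4, giving 13/3. A simple midpoint-rule integration confirms this numerically. This is a sketch: the truncation point of 20 and the step count are arbitrary choices, and `conditional_mean` is a hypothetical helper.

```python
import math

def conditional_mean(rate: float, threshold: float,
                     upper: float = 20.0, steps: int = 100_000) -> float:
    """Midpoint-rule approximation of E[Y | Y > threshold] for Y ~ Exp(rate).

    Integrates (t + threshold) * rate * exp(-rate * t) over t in [0, upper].
    """
    width = upper / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * width
        total += (t + threshold) * rate * math.exp(-rate * t) * width
    return total

value = conditional_mean(rate=3.0, threshold=4.0)
print(round(value, 4))
```
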

Solution 7:

i) The graph appears to show an approximately linear relationship. However, it does appear to have a
slight curve and this would warrant closer inspection of the model to see if it is appropriate to the
data. [1]

ii) Least squares estimates:


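Obtaining the estimates of α and β with $Y = \ln \mu_X$:

$S_{XX} = \sum X^2 - n\bar{X}^2 = 15540 - 10\left(\frac{390}{10}\right)^2 = 330$

$S_{XY} = \sum XY - n\bar{X}\bar{Y} = -2726.66 - 10\left(\frac{390}{10}\right)\left(\frac{-70.47}{10}\right) = 21.67$

$\hat{\beta} = \frac{S_{XY}}{S_{XX}} = \frac{21.67}{330} = 0.0657$

$\hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X} = \frac{-70.47}{10} - 0.0657 \times \frac{390}{10} = -9.61$

Therefore, we obtain
$B = e^{\hat{\alpha}} = 0.000067$
$C = e^{\hat{\beta}} = 1.07$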
[3]
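
The least squares estimates can be reproduced from the summary statistics quoted in the solution (n = 10, ΣX = 390, ΣY = −70.47, ΣX² = 15540, ΣXY = −2726.66). The variable names below are illustrative.

```python
import math

n = 10
sum_x, sum_y = 390.0, -70.47
sum_x2, sum_xy = 15540.0, -2726.66

x_bar, y_bar = sum_x / n, sum_y / n
s_xx = sum_x2 - n * x_bar**2
s_xy = sum_xy - n * x_bar * y_bar

beta_hat = s_xy / s_xx                 # slope on the log scale
alpha_hat = y_bar - beta_hat * x_bar   # intercept on the log scale

# Back-transform to the parameters B and C.
B, C = math.exp(alpha_hat), math.exp(beta_hat)
print(round(s_xx, 2), round(s_xy, 2), round(beta_hat, 4), round(alpha_hat, 2))
print(B, round(C, 2))
```
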

iii) $r = \frac{S_{XY}}{\sqrt{S_{XX} S_{YY}}} = \frac{21.67}{\sqrt{330 \times 1.45}} = 0.990645$

The correlation coefficient shows a strong positive relationship between the variables force of
mortality and age. The positive value of the regression slope parameter β̂ is consistent with this
positive correlation.
[2]
iv) The coefficient of determination is given by

$R^2 = \frac{S_{XY}^2}{S_{XX} S_{YY}} = \frac{21.67^2}{330 \times 1.45} = 98.14\%$

where $S_{YY} = \sum Y^2 - n\bar{Y}^2 = 1.45$.

This says that 98.14% of the variation in the data is explained by the model, which indicates
an extremely good fit.
[2]
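
Both the correlation coefficient and the coefficient of determination follow from the three sums of squares quoted in the solution (SXX = 330, SXY = 21.67, SYY = 1.45). A short sketch of the arithmetic:

```python
import math

s_xx, s_xy, s_yy = 330.0, 21.67, 1.45

r = s_xy / math.sqrt(s_xx * s_yy)      # sample correlation coefficient
r_squared = s_xy**2 / (s_xx * s_yy)    # coefficient of determination

print(round(r, 4), f"{r_squared:.2%}")
```

For simple linear regression, R² is the square of r, which the two lines above make explicit.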

v) The completed table of residuals using $\hat{e}_i = y_i - \hat{y}_i$ is:

Age, X           30     32     34      36      38      40      42     44     46     48
Residual, ê_i    0.079  0.028  -0.004  -0.045  -0.087  -0.058  0.001  0.009  0.048  0.036

Age 34: -7.38 – (-9.61 + 0.0657×34) = -0.004
Age 42: -6.85 – (-9.61 + 0.0657×42) = 0.001
Age 48: -6.42 – (-9.61 + 0.0657×48) = 0.036

The residuals should be patternless when plotted against X. However, a clear pattern exists
(positive, then negative, then positive again), which indicates that the linear model may not be a good fit. [3]

vi) The variance of the mean predicted response is:

$\left\{\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{S_{XX}}\right\}\hat{\sigma}^2 = \left\{\frac{1}{10} + \frac{(45 - 39)^2}{330}\right\} \times 0.0034 = 0.00071$

where $\hat{\sigma}^2 = \frac{1}{8}\left(1.45 - \frac{21.67^2}{330}\right) = 0.0034$

The estimate is $Y = \ln \mu_{45} = -9.61 + 0.0657 \times 45 = -6.65$

Using the $t_8$ distribution, a 95% confidence interval for $Y = \ln \mu_{45}$ is

$-6.65 \pm 2.306\sqrt{0.00071} = (-6.71, -6.59)$

The corresponding 95% confidence interval for $\mu_{45}$ is (0.001219, 0.001374)


[4]
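
The interval in part vi) can be recomputed as follows. This sketch takes the fitted coefficients and the t critical value 2.306 from the solution as given rather than deriving them, and carries unrounded intermediate values, so the final digits differ slightly from the hand calculation.

```python
import math

n, x_bar = 10, 39.0
s_xx, s_xy, s_yy = 330.0, 21.67, 1.45
alpha_hat, beta_hat = -9.61, 0.0657   # rounded estimates from part ii)
t_crit = 2.306                        # upper 2.5% point of t with 8 df, from tables
x0 = 45

sigma2_hat = (s_yy - s_xy**2 / s_xx) / (n - 2)
var_mean = (1 / n + (x0 - x_bar) ** 2 / s_xx) * sigma2_hat

y0 = alpha_hat + beta_hat * x0
half_width = t_crit * math.sqrt(var_mean)
lower, upper = y0 - half_width, y0 + half_width

# Back-transform the interval for ln(mu_45) to an interval for mu_45.
mu_lower, mu_upper = math.exp(lower), math.exp(upper)
print(round(lower, 2), round(upper, 2), mu_lower, mu_upper)
```
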
vii) The width of the interval is affected only by the variance of the mean predicted response, which
depends on the value of $(X_0 - \bar{X})^2$. This term will now be smaller, as the new value $X_0 = 41$ is closer
to $\bar{X}$ than $X_0 = 45$. Therefore the interval will be narrower. [2]
[17 Marks]
Solution 8:
i)
Period 1:
E(X) = (10+30)/2 = 20
SXX = (10-20)^2 + (30-20)^2 = (-10)^2 + (10)^2 = 200

E(Y) = (10+20)/2 = 15
SYY = (10-15)^2 + (20-15)^2 = (-5)^2 + 5^2 = 50

SXY = (-10 * -5) + (10 * 5) = 100

Correlation = SXY / √ (SXX * SYY) = 100 / √ (200 * 50) = 1

Period 2:
E(X) = (10+30)/2 = 20
SXX = 200, as above

E(Y) = (10+15)/2 = 12.5


SYY = (10-12.5)^2 + (15-12.5)^2 = (-2.5)^2 + 2.5^2 = 12.5

SXY = (-10 * -2.5) + (10 * 2.5) = 50


Correlation = 50 / √ (200 * 12.5) = 1
[5]

ii) In the period 1 base, the figures of X & Y are: (10, 10); (30, 20); (20, 4); (60, 6).

E(X) = (10 + 30 + 20 + 60) / 4 = 30


SXX = (-20)^2 + 0^ 2 + (-10)^2 + 30^2 = 1400

E(Y) = (10 + 20 + 4 + 6)/4 = 10


SYY = 0^2 + 10^2 + (-6)^2 + (-4)^2 = 152

SXY = (-20 * 0) + (0 * 10) + (-10 * -6) + (30 * -4) = -60


Correlation = -60 / √(1400*152) = -0.13
[4]
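
The three correlations in parts i) and ii) can be verified with a small helper that follows the same SXX, SYY, SXY steps. The function name `correlation` is an illustrative choice; the data points are those quoted in the solution.

```python
import math

def correlation(xs, ys) -> float:
    """Sample correlation coefficient SXY / sqrt(SXX * SYY)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xy / math.sqrt(s_xx * s_yy)

period_1 = correlation([10, 30], [10, 20])
period_2 = correlation([10, 30], [10, 15])
combined = correlation([10, 30, 20, 60], [10, 20, 4, 6])
print(period_1, period_2, round(combined, 2))
```
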

iii) In part (i), the correlation between X and Y was calculated separately for each period, and they
appeared to be perfectly positively correlated.
However, on combining the periods in part (ii), X and Y turn out to be (weakly) negatively correlated.
Thus, while the friend’s assumption of strong positive correlation may be valid within some periods,
overall the correlation between the two indices / industries X & Y appears to be very weak and
negative. As such, the portfolio is diversified.
[2]
[11 Marks]

Solution 9:

i) Sample mean = (16.4 + 17.3 + 16.7) / 3 = 16.8


Prior mean = A/B = 15/1 = 15 (formula from Tables)
Credibility factor Z = 3/(3+1) = 0.75
Credibility estimate = Z * 16.8 + (1-Z) * 15 = 16.35
[3]

ii) The variance of the gamma distribution is mean / B. Reducing the B parameter while
keeping the mean constant increases the variance, reflecting greater uncertainty. [1]

iii) Revised credibility factor Z = 3/3.2 = 0.94 (approx.)
Revised credibility estimate = 0.94 * 16.8 + 0.06* 15 = 16.69 [1]
[5 Marks]
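
The credibility calculations in parts i) and iii) can be sketched as below. The revised prior parameters A = 3 and B = 0.2 are an assumption inferred from the revised factor Z = 3/3.2 and the unchanged prior mean of 15; the function name is hypothetical.

```python
def credibility_estimate(sample, prior_a: float, prior_b: float):
    """Return (Z, estimate) with Z = n / (n + B) and prior mean A / B."""
    n = len(sample)
    z = n / (n + prior_b)
    sample_mean = sum(sample) / n
    prior_mean = prior_a / prior_b
    return z, z * sample_mean + (1 - z) * prior_mean

observations = [16.4, 17.3, 16.7]
z_1, estimate_1 = credibility_estimate(observations, prior_a=15, prior_b=1)
# Assumed revised prior: same mean of 15, smaller B (inferred, not stated).
z_2, estimate_2 = credibility_estimate(observations, prior_a=3, prior_b=0.2)
print(z_1, round(estimate_1, 2), z_2, round(estimate_2, 2))
```

With Z left unrounded at 0.9375, the revised estimate is 16.6875, which agrees with the solution's 16.69 after rounding.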

Solution 10:

i) B
Explanation:
Likelihood is the probability of the exact outcome observed. As x policies have resulted
in a claim and n − x have not, and the policies are independent, the probability is given by
the product $q^x (1 - q)^{n-x}$.

The $\binom{n}{x}$ factor is not relevant here because, for each policy, we know whether there has been
a claim or not. [1]

ii) Let X be the number of claims. Then X ~ Bin(n, q), with mean nq and variance nq(1 − q).

By the central limit theorem, $\hat{q} = X/n$ approximately follows $N(\mu, S/n)$, where μ and S are estimated by:

μ = x/n = 3 / 10,000 = 0.3 per mille = 3 × 10^-4

S = $\hat{q}(1 - \hat{q})$
= 3 × 10^-4 × 0.9997 = 3 × 10^-4 (approx.)

$1.96\sqrt{\frac{S}{n}} = 1.96\sqrt{\frac{3 \times 10^{-4}}{10{,}000}} = 0.34$ per mille (approx.)

The 95% confidence interval is $\hat{q} \pm 1.96\sqrt{S/n}$.
Plugging in the values calculated above, and noting that q can’t be negative,
95% confidence interval: (0 per mille, 0.64 per mille) [4]
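
The interval can be reproduced numerically. This is a sketch of the same normal approximation, with the lower limit truncated at zero as in the solution; variable names are illustrative.

```python
import math

n, claims = 10_000, 3
q_hat = claims / n

# Normal approximation: q_hat +/- 1.96 * sqrt(q_hat * (1 - q_hat) / n)
half_width = 1.96 * math.sqrt(q_hat * (1 - q_hat) / n)
lower = max(0.0, q_hat - half_width)   # a probability cannot be negative
upper = q_hat + half_width

print(f"({lower * 1000:.2f} per mille, {upper * 1000:.2f} per mille)")
```
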

iii) As 0.2 per mille falls within the 95% confidence interval, there is insufficient evidence to
reject the null hypothesis that q = 0.2 per mille at the 5% significance level. [1]
[6 Marks]

Solution 11:

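i) The PF of Z is

$f(z) = \binom{n}{z} \mu^z (1 - \mu)^{n-z}$

The PF of Y can be obtained by replacing z with ny:

$f(y) = \binom{n}{ny} \mu^{ny} (1 - \mu)^{n-ny}$

This can be written as:

$f(y) = \exp\left\{\ln\binom{n}{ny} + ny\ln\mu + n\ln(1 - \mu) - ny\ln(1 - \mu)\right\}$

$= \exp\left\{ny\ln\left(\frac{\mu}{1-\mu}\right) + n\ln(1 - \mu) + \ln\binom{n}{ny}\right\}$

$= \exp\left\{\frac{y\ln\left(\frac{\mu}{1-\mu}\right) + \ln(1 - \mu)}{1/n} + \ln\binom{n}{ny}\right\}$

Comparing this to the general form of the exponential family of distributions,
$f(y) = \exp\left\{\frac{y\theta - b(\theta)}{a(\varphi)} + c(y, \varphi)\right\}$:

$\theta = \ln\left(\frac{\mu}{1-\mu}\right)$. Rearranging this gives $\mu = \frac{e^\theta}{1 + e^\theta}$

$b(\theta) = -\ln(1 - \mu) = -\ln\left(1 - \frac{e^\theta}{1 + e^\theta}\right) = -\ln\left(\frac{1}{1 + e^\theta}\right) = \ln(1 + e^\theta)$

$\varphi = n$

$a(\varphi) = \frac{1}{\varphi}$

$c(y, \varphi) = \ln\binom{n}{ny} = \ln\binom{\varphi}{\varphi y}$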
[4]

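ii) Using the properties of the exponential family of distributions:

$E(Y) = b'(\theta) = \frac{d}{d\theta}\ln(1 + e^\theta) = \frac{e^\theta}{1 + e^\theta} = \mu$

$V(Y) = a(\varphi)\, b''(\theta) = \frac{e^\theta(1 + e^\theta) - e^\theta e^\theta}{n(1 + e^\theta)^2} = \frac{e^\theta}{n(1 + e^\theta)^2}$

Substituting $\theta = \ln\left(\frac{\mu}{1-\mu}\right)$, so that $e^\theta = \frac{\mu}{1-\mu}$:

$V(Y) = \frac{\frac{\mu}{1-\mu}}{n\left(1 + \frac{\mu}{1-\mu}\right)^2} = \frac{\mu}{n(1-\mu)}(1 - \mu)^2 = \frac{\mu(1 - \mu)}{n}$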
[3]

iii) Using the model output, we can see that

$\beta_1 > 2 \times \text{standard error}(\beta_1)$

i.e. 0.5459 > 2 × 0.08352 = 0.16704

Since $\beta_1$ exceeds twice its standard error, it can be concluded that the parameter $\beta_1$ for the
variable “no. of assignment” is significant in the model.
[2]

iv) Using the binomial canonical link function,

$\eta(\mu) = \ln\left(\frac{\mu}{1-\mu}\right) = \alpha_i + \beta_1 N + \beta_2 S$

So for $\alpha_Y = -1.501$, $\beta_1 = 0.5459$, $\beta_2 = 0.0251$ and N = 4, S = 65:

$\ln\left(\frac{\mu}{1-\mu}\right) = -1.501 + 0.5459 \times 4 + 0.0251 \times 65 = 2.3141$

$\mu = (1 + e^{-2.3141})^{-1} = 91\%$

Hence the probability of a student passing in the given scenario is 91%.


[3]
[12 Marks]
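
The final step of Solution 11 inverts the logit link. A minimal sketch of that calculation follows; the function names are illustrative, and the coefficients are the ones quoted from the model output.

```python
import math

def linear_predictor(alpha: float, beta_1: float, beta_2: float,
                     n_value: float, s_value: float) -> float:
    """eta = alpha + beta_1 * N + beta_2 * S."""
    return alpha + beta_1 * n_value + beta_2 * s_value

def inverse_logit(eta: float) -> float:
    """Invert eta = ln(mu / (1 - mu)) to recover mu."""
    return 1.0 / (1.0 + math.exp(-eta))

eta = linear_predictor(-1.501, 0.5459, 0.0251, n_value=4, s_value=65)
mu = inverse_logit(eta)
print(round(eta, 4), f"{mu:.0%}")
```
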

*******************
