Sample final exam questions.
Important: When a question asks for a numerical expression, your answer should be an expression involving only numbers and operations on numbers. You do not have to simplify such an expression, but variables should not appear. For example, this is a numerical expression:
$$\frac{27.85 + 67.3/9}{891 + \sqrt{147.21 + 93.5}}$$
and this is not:
$$\sqrt{\frac{7.85 + 67.3/9}{891x + \sqrt{b \cdot 147.21 + 93.5}}}$$
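Problems (1)-(19) concern a random sample from the Gamma distribution, where one of the Gamma parameters is known. As a reminder, the Gamma distribution has pdf of the form
$$f(x \mid \alpha, \lambda) = \begin{cases} \dfrac{\lambda^\alpha}{\Gamma(\alpha)} x^{\alpha-1} e^{-\lambda x} & \text{for } x > 0 \\ 0 & \text{otherwise.} \end{cases}$$
This distribution has moments $\mu_1 = \alpha/\lambda$ and $\mu_2 = \alpha(\alpha+1)/\lambda^2$.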
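Assume we have iid observations $X_1, \ldots, X_n$ whose distribution is Gamma with $\alpha$ known to be 2 and with $\lambda$ unknown, so that the pdf has the form
$$f(x \mid \lambda) = \begin{cases} \lambda^2 x e^{-\lambda x} & \text{for } x > 0 \\ 0 & \text{otherwise,} \end{cases}$$
where $\lambda > 0$ is unknown. It will help to know that $E[X_i] = 2/\lambda$ and $E[X_i^2] = 6/\lambda^2$.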
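One way to sanity-check the stated moments is a quick simulation. This is a minimal sketch under assumptions of our own: the rate value `lam = 3` and the number of draws are arbitrary, and note that Python's `random.gammavariate(alpha, beta)` takes a *scale* `beta`, which corresponds to $1/\lambda$ in the parametrization above.

```python
import random

def gamma2_moments(lam, n_draws=100_000, seed=1):
    """Monte Carlo estimates of E[X] and E[X^2] for X ~ Gamma(alpha=2, rate=lam)."""
    rng = random.Random(seed)
    # gammavariate uses a SCALE parameter, so pass 1/lam to get rate lam.
    draws = [rng.gammavariate(2.0, 1.0 / lam) for _ in range(n_draws)]
    m1 = sum(draws) / n_draws
    m2 = sum(x * x for x in draws) / n_draws
    return m1, m2

lam = 3.0  # arbitrary illustrative value
m1, m2 = gamma2_moments(lam)
print(f"E[X]   ~ {m1:.4f}   (2/lam   = {2 / lam:.4f})")
print(f"E[X^2] ~ {m2:.4f}   (6/lam^2 = {6 / lam**2:.4f})")
```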
(1) What is the expected value of $\bar{X}$ in terms of $\lambda$?
(2) What is $\mathrm{Var}(X_i)$ in terms of $\lambda$?
(3) What is $\mathrm{Var}(\bar{X})$ in terms of $\lambda$ and $n$?
(4) Give the delta method approximation to the bias of $\hat\theta = \log(\bar{X}/2)$ as an estimator of the quantity $\theta = \log(2/\lambda)$.
(5) Give the delta method approximation to the variance of $\hat\theta$ in the previous question.
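The delta method recipe can be checked by simulation. The sketch below applies it to $g(\bar X) = \log \bar X$ as an estimator of $\log(2/\lambda)$, an illustrative choice on our part (the same recipe works for any smooth $g$): bias $\approx g''(\mu)\sigma^2/(2n)$ and variance $\approx g'(\mu)^2\sigma^2/n$, with $\mu = 2/\lambda$ and $\sigma^2 = 2/\lambda^2$. The values `lam=1.5`, `n=50`, and the number of replications are arbitrary assumptions.

```python
import math
import random
import statistics

def delta_method_check(lam, n, reps=20_000, seed=2):
    """Compare simulated bias/variance of log(Xbar) with delta-method approximations.

    X_1..X_n iid Gamma(2, rate lam): mu = 2/lam, sigma^2 = Var(X_i) = 2/lam^2.
    For g(x) = log(x): g'(mu) = 1/mu and g''(mu) = -1/mu^2.
    """
    mu, sigma2 = 2 / lam, 2 / lam**2
    approx_bias = -sigma2 / (2 * n * mu**2)   # g''(mu) * sigma^2 / (2n)
    approx_var = sigma2 / (n * mu**2)         # g'(mu)^2 * sigma^2 / n
    rng = random.Random(seed)
    # A sum of n iid Gamma(2, lam) is Gamma(2n, lam), so Xbar ~ Gamma(2n, rate n*lam).
    # Drawing Xbar directly avoids simulating every X_i.
    est = [math.log(rng.gammavariate(2 * n, 1 / (n * lam))) for _ in range(reps)]
    sim_bias = statistics.fmean(est) - math.log(mu)
    sim_var = statistics.variance(est)
    return approx_bias, sim_bias, approx_var, sim_var

ab, sb, av, sv = delta_method_check(lam=1.5, n=50)
print(f"bias:     delta {ab:.5f}  simulated {sb:.5f}")
print(f"variance: delta {av:.5f}  simulated {sv:.5f}")
```

For this model the approximations reduce to $-1/(4n)$ and $1/(2n)$, free of $\lambda$, which is why the output does not change much if `lam` is varied.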
(6) Let $B$ denote your bias approximation in (4) and let $V$ denote your variance approximation in (5), and assume these are correct. Give an approximation to the mean squared error of $\hat\theta$ in terms of $B$ and $V$.
(7) Which term in (6) dominates as $n$ goes to infinity?
Circle one: the bias term/the variance term
(8) Give a method of moments estimator of $\lambda$.
(9) Write down the log-likelihood function.
(10) What is the maximum likelihood estimator of $\lambda$?
(11) Calculate the Fisher information $I(\lambda) = \mathrm{Var}\left[\frac{\partial}{\partial\lambda} \log f(X \mid \lambda)\right]$ for this situation.
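Setting $\bar X = E[X_i] = 2/\lambda$, or setting the derivative of $\ell(\lambda) = 2n\log\lambda + \sum\log X_i - \lambda\sum X_i$ to zero, both lead to $2/\bar X$, and the score is $2/\lambda - X$. The sketch below checks these numerically on simulated data; the true value `lam = 2.0` and the sample size are illustrative assumptions.

```python
import random
import statistics

def mle_lambda(xs):
    """MLE (and method-of-moments estimate) of lam for Gamma(2, rate lam): 2 / Xbar."""
    return 2 / statistics.fmean(xs)

def score(x, lam):
    """Derivative in lam of log f(x|lam) = 2 log(lam) + log(x) - lam * x."""
    return 2 / lam - x

lam = 2.0  # illustrative true value
rng = random.Random(3)
xs = [rng.gammavariate(2.0, 1 / lam) for _ in range(50_000)]  # scale = 1/lam

lam_hat = mle_lambda(xs)
# Fisher information is the variance of the score; analytically Var(X) = 2 / lam^2.
fisher_mc = statistics.variance([score(x, lam) for x in xs])
print(f"MLE: {lam_hat:.4f} (true {lam})")
print(f"I(lam) ~ {fisher_mc:.4f}, 2/lam^2 = {2 / lam**2:.4f}")
```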
(12) Assume you have correctly calculated the Fisher information $I(\lambda)$ in (11). What is the limiting distribution of $\sqrt{n}(\hat\lambda - \lambda)$ as $n \to \infty$, where $\hat\lambda$ is the maximum likelihood estimator? Be specific about all parameters of the distribution. Just giving the name of the distribution will not suffice.
(13) Give the form of an approximate 95% confidence interval for $\lambda$ based on the approximate distribution in (12). (Make sure that every term in your confidence interval is either known or estimated from the data.)
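A minimal sketch of a plug-in (Wald) interval, assuming the Fisher information $I(\lambda) = 2/\lambda^2$ for a single observation, so that $\hat\lambda \pm z_{0.975}\,\hat\lambda/\sqrt{2n}$. The coverage check simulates $\bar X$ directly from its exact $\mathrm{Gamma}(2n, \text{rate } n\lambda)$ distribution; `lam=1.0`, `n=100`, and `reps` are arbitrary choices.

```python
import math
import random
from statistics import NormalDist

def wald_ci(xbar, n, level=0.95):
    """Approximate CI for lam: lam_hat +/- z * lam_hat / sqrt(2n), using I(lam) = 2/lam^2."""
    lam_hat = 2 / xbar
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * lam_hat / math.sqrt(2 * n)
    return lam_hat - half, lam_hat + half

def coverage(lam, n, reps=4000, seed=4):
    """Fraction of simulated intervals that contain the true lam."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xbar = rng.gammavariate(2 * n, 1 / (n * lam))  # Xbar ~ Gamma(2n, rate n*lam)
        lo, hi = wald_ci(xbar, n)
        hits += lo <= lam <= hi
    return hits / reps

cov = coverage(lam=1.0, n=100)
print(f"empirical coverage: {cov:.3f}")
```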
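(14) Suppose $\lambda$ has a prior distribution with pdf of the form
$$\pi(\lambda \mid \nu) = \begin{cases} \nu e^{-\nu\lambda} & \text{for } \lambda > 0 \\ 0 & \text{otherwise,} \end{cases}$$
where $\nu$ is given. What is the posterior distribution of $\lambda$ given $X_1, \ldots, X_n$? If you can, be specific about the name of the distribution and its parameters.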
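Writing $\nu$ for the prior's given parameter (our reading of the garbled symbol), the exponential prior is a $\mathrm{Gamma}(1, \nu)$ density and the likelihood is proportional to $\lambda^{2n} e^{-\lambda \sum X_i}$, so the update is conjugate. A minimal sketch; the data and `nu` value in the usage line are made up for illustration.

```python
def posterior_gamma(xs, nu):
    """Conjugate update for the prior pi(lam | nu) = nu * exp(-nu * lam).

    The prior is Gamma(shape=1, rate=nu); the likelihood is proportional to
    lam^(2n) * exp(-lam * sum(xs)), so the posterior is
    Gamma(shape = 2n + 1, rate = nu + sum(xs)).
    """
    return 2 * len(xs) + 1, nu + sum(xs)

shape, rate = posterior_gamma([1.0, 2.0, 3.0], nu=0.5)  # hypothetical data
print(shape, rate, shape / rate)  # posterior shape, rate, and mean
```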
(15) Suppose we wish to test the hypothesis $H_0: \lambda = 1$ vs. $H_A: \lambda \neq 1$. Write down the generalized likelihood ratio test statistic $\Lambda$.
(16) In (15), do we reject $H_0$ for sufficiently small or sufficiently large values of the test statistic $\Lambda$? Circle one: sufficiently small/sufficiently large
(17) Under $H_0$, assuming $n$ large, what is the approximate distribution of $-2\log\Lambda$ in (15)?
(18) In (17), what is the approximate expected value of $-2\log\Lambda$? (Give a number!)
(19) Suppose the observed value of $-2\log\Lambda$ from the data is 5, and assume that under $H_0$ the approximate distribution of $-2\log\Lambda$ in (17) is continuous with pdf denoted by $f$. Express a p-value for the test in (15) as an integral involving $f$. (You do not need to have correctly identified this distribution in (17) to answer this.)
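A minimal sketch of the statistic $-2\log\Lambda = 2[\ell(\hat\lambda) - \ell(1)]$ for this model, with $\hat\lambda = 2/\bar X$. For the p-value it assumes the usual large-sample $\chi^2_1$ approximation (one free parameter under $H_A$, none under $H_0$) and uses $P(\chi^2_1 > s) = P(|Z| > \sqrt{s})$ for standard normal $Z$, so only the standard library is needed.

```python
import math
from statistics import NormalDist

def neg2_log_glr(xs, lam0=1.0):
    """-2 log Lambda for H0: lam = lam0 in the Gamma(2, rate lam) model."""
    n, s = len(xs), sum(xs)
    lam_hat = 2 * n / s  # = 2 / Xbar

    def loglik(lam):
        # sum(log x_i) is omitted: it is the same under H0 and HA and cancels.
        return 2 * n * math.log(lam) - lam * s

    return 2 * (loglik(lam_hat) - loglik(lam0))

def chi2_1_pvalue(stat):
    """Upper tail P(chi^2_1 > stat) = P(|Z| > sqrt(stat))."""
    return 2 * (1 - NormalDist().cdf(math.sqrt(stat)))

print(neg2_log_glr([1.0, 3.0]))  # Xbar = 2 gives lam_hat = 1 = lam0
print(chi2_1_pvalue(5))          # tail area beyond an observed value of 5
```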
Problems (20)-(29) concern the following situation. We have two independent samples $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$. The $X_i$ are iid $N(\mu_X, \sigma^2)$ and the $Y_i$ are iid $N(\mu_Y, \sigma^2)$, with parameters $\mu_X$, $\mu_Y$, and $\sigma$ unknown.
(20) What is the expected value of $\bar{X} - \bar{Y}$ in terms of the unknown parameters?
(21) Are $\bar{X}$ and $\bar{Y}$ independent? Circle one: Yes/No
(22) What is the variance of $\bar{X} - \bar{Y}$ in terms of the unknown parameters?
(23) Does $\bar{X} - \bar{Y}$ have a normal distribution? Circle one: Yes/No
(24) What is the distribution of $\sum_{i=1}^m (X_i - \bar{X})^2 / \sigma^2$?
(25) Are $\sum_{i=1}^m (X_i - \bar{X})^2$ and $\sum_{i=1}^n (Y_i - \bar{Y})^2$ independent? Circle one: Yes/No
(26) What is the distribution of
$$\frac{\sum_{i=1}^m (X_i - \bar{X})^2 + \sum_{i=1}^n (Y_i - \bar{Y})^2}{\sigma^2}?$$
(27) Are $\bar{X} - \bar{Y}$ and $\sum_{i=1}^m (X_i - \bar{X})^2 + \sum_{i=1}^n (Y_i - \bar{Y})^2$ independent? Circle one: Yes/No
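(28) Assuming $\mu_X = \mu_Y$, what is the distribution of the quantity
$$\frac{\bar{X} - \bar{Y}}{s_P\sqrt{1/m + 1/n}},$$
where
$$s_P = \sqrt{\frac{\sum_{i=1}^m (X_i - \bar{X})^2 + \sum_{i=1}^n (Y_i - \bar{Y})^2}{m + n - 2}}\,?$$
Be precise!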
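The pooled statistic in (28) can be computed directly from its definition. A minimal sketch; the two small samples in the usage line are made-up numbers, not data from this problem.

```python
import math
from statistics import fmean

def pooled_t(xs, ys):
    """Two-sample pooled t statistic and its degrees of freedom (equal variances)."""
    m, n = len(xs), len(ys)
    xbar, ybar = fmean(xs), fmean(ys)
    ss = sum((x - xbar) ** 2 for x in xs) + sum((y - ybar) ** 2 for y in ys)
    sp = math.sqrt(ss / (m + n - 2))  # pooled standard deviation s_P
    t = (xbar - ybar) / (sp * math.sqrt(1 / m + 1 / n))
    return t, m + n - 2

t, df = pooled_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0, 5.0])  # hypothetical samples
print(f"t = {t:.4f} on {df} df")
```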
(29) Suppose $m = 3$ and $n = 4$, and when we order the $X_i$ and $Y_j$ in the combined sample we find that the ordering is as follows:
$X_3 < X_1 < Y_3 < X_2 < Y_1 < Y_4 < Y_2$
Give an exact p-value for testing $H_0: \mu_X = \mu_Y$ vs. $H_A: \mu_X < \mu_Y$ using the Wilcoxon-Mann-Whitney rank sum test.
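Under $H_0$, every set of 3 ranks out of $\{1, \ldots, 7\}$ is equally likely to belong to the $X$'s, so an exact lower-tail p-value can be found by enumeration, with small $X$ rank sums counting as evidence for $\mu_X < \mu_Y$. A minimal sketch; the function name is our own.

```python
from fractions import Fraction
from itertools import combinations

def exact_rank_sum_pvalue(x_ranks, N):
    """Exact lower-tail p-value P(W <= w_obs) for the rank sum W of the X-sample."""
    m, w_obs = len(x_ranks), sum(x_ranks)
    subsets = list(combinations(range(1, N + 1), m))  # all equally likely under H0
    extreme = sum(1 for s in subsets if sum(s) <= w_obs)
    return Fraction(extreme, len(subsets))

# The ordering X3 < X1 < Y3 < X2 < Y1 < Y4 < Y2 gives the X's ranks 1, 2, 4.
p = exact_rank_sum_pvalue([1, 2, 4], N=7)
print(p, float(p))
```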
Problems (30)-(35) concern the following situation. A study is conducted in order to understand differences in responses to a certain drug used to reduce the size of liver tumors in rats. Genotypes at two distinct genetic loci, on different chromosomes, are thought to be relevant to response. The genotype of a rat is either aa, aA, or AA at the first locus, and bb, bB, or BB at the second. In the study, 2 rats having liver tumors are sampled from the population of rats with each possible genotype combination, and the percentage reduction ($Y$) in tumor size as a result of treatment is determined for each.
Here are the data and the summary of the corresponding ANOVA table (with some entries missing) obtained using the R commands:
AOV<-aov(Y~geno1+geno2+geno1*geno2)
summary(AOV)
obs#  geno1  geno2  Y
1     aa     bb     75.8
2     aa     bb     82.0
3     aa     bB     80.9
4     aa     bB     77.2
5     aa     BB     49.9
6     aa     BB     46.3
7     aA     bb     50.3
8     aA     bb     55.2
9     aA     bB     51.8
10    aA     bB     54.6
11    aA     BB     17.5
12    aA     BB     23.7
13    AA     bb     55.6
14    AA     bb     55.8
15    AA     bB     60.7
16    AA     bB     75.2
17    AA     BB     26.3
18    AA     BB     29.9
             Df  Sum Sq  Mean Sq  F value    Pr(>F)
geno1         2    2197     1098    55.05  8.96e-06
geno2         2    4230     2115   106.01  5.55e-07
geno1:geno2   4     100       25     1.25     0.357
Residuals            180
If the genotypes are numbered 1, 2, 3 for aa, aA, AA and for bb, bB, BB respectively, let $Y_{ijk}$ denote the reduction obtained for the $k$-th rat drawn from the population with genotype $i$ at the first locus and $j$ at the second locus.
(30) Write down a numerical expression (i.e. an expression involving only numbers and operations on them) for $\sum_{i=1}^3 \sum_{j=1}^3 \sum_{k=1}^2 (Y_{ijk} - \bar{Y}_{\cdot\cdot\cdot})^2$.
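(31) Assuming that the data follow the standard two-way ANOVA model with interactions
$$Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + e_{ijk}, \quad i = 1, 2, 3,\; j = 1, 2, 3,\; k = 1, 2,$$
where $\mu$, $\alpha_i$, $\beta_j$, and $\gamma_{ij}$ are unknown constants and the errors $e_{ijk}$ are iid $N(0, \sigma^2)$, what constraint do we typically assume the constants $\alpha_1$, $\alpha_2$, and $\alpha_3$ satisfy?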
(32) In the ANOVA table what are the missing values of Residual Df and Residual Mean
Sq?
Residual Df =
Residual Mean Sq =
(33) Give an unbiased estimator of $\sigma^2$.
(34) Use the ANOVA table to give a p-value for testing the null hypothesis of additivity of the effects of the two genes vs. the alternative hypothesis of non-additivity.
(35) Suppose instead we fit an ANOVA model without the $\gamma_{ij}$ terms, i.e. assume
$$Y_{ijk} = \mu + \alpha_i + \beta_j + e_{ijk}, \quad i = 1, 2, 3,\; j = 1, 2, 3,\; k = 1, 2.$$
How many degrees of freedom would we have for estimating $\sigma^2$? (Your answer should be a number.)
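The missing table entries follow from degrees-of-freedom bookkeeping: with 18 observations and one fitted mean per genotype cell (9 cells), the residual df is $18 - 9$, and dropping the interaction pools its 4 df into the residual. A minimal sketch using only the sums of squares printed in the table (which are rounded, so derived quantities are approximate).

```python
# Arithmetic from the two-way ANOVA table (3 x 3 genotype cells, 2 rats per cell).
levels_1, levels_2, reps = 3, 3, 2
N = levels_1 * levels_2 * reps                      # 18 observations
ss = {"geno1": 2197, "geno2": 4230, "geno1:geno2": 100, "Residuals": 180}

df_resid = N - levels_1 * levels_2                  # one mean per cell is estimated
ms_resid = ss["Residuals"] / df_resid               # residual mean square
df_inter = (levels_1 - 1) * (levels_2 - 1)          # interaction df = 4
f_interaction = (ss["geno1:geno2"] / df_inter) / ms_resid
total_ss = sum(ss.values())                         # total corrected sum of squares
# Additive model: residual df = N - (1 + (3-1) + (3-1)); interaction df is pooled in.
df_resid_additive = N - (1 + (levels_1 - 1) + (levels_2 - 1))

print(df_resid, ms_resid, f_interaction, total_ss, df_resid_additive)
```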
Problems (36)-(47) concern the following situation. Experiments are carried out to determine the concentration of a certain chemical additive that will produce the strongest possible material when that material is treated with the additive. In each of 11 experimental trials, the concentration of the additive is varied and the resulting material strength is determined. The data are as in the following table:
conc  strength
5     5.74
5.5   8.30
6     7.47
6.5   10.14
7     9.46
7.5   9.77
8     8.77
8.5   8.75
9     5.23
9.5   4.18
10    1.78
An additional column of squared concentrations called conc.sq is added to the dataset, and then a linear model of the form
$$\text{Strength}_i = \beta_0 + \beta_1\,\text{conc}_i + \beta_2\,\text{conc.sq}_i + e_i$$
is fitted to the data with the usual assumptions (in particular, the $e_i$ are independent with $e_i \sim N(0, \sigma^2)$), and the output is as given in the following table:
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -36.6280     5.6265  -6.510 0.000186 ***
conc         13.1614     ______   8.510 2.79e-05 ***
conc.sq      -0.9334     0.1027  -9.092 1.72e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.7518 on 8 degrees of freedom
Multiple R-squared:  0.936,  Adjusted R-squared:  0.92
F-statistic:  58.5 on 2 and 8 DF,  p-value: 1.678e-05
(36) Use the table to give a (numerical) unbiased estimate of $\beta_1$.
(37) Give a p-value for testing the null hypothesis that the relationship between concentration and expected strength is actually a linear one, vs. the alternative that the relationship
is nonlinear.
(38) Give a numerical prediction of the strength obtained when the concentration is 5.
(39) Give a numerical expression for the residual obtained when the concentration is 5.
(40) Give a numerical expression for the sum of squared residuals.
(41) Give a numerical value for the square of the correlation coefficient between the observed strengths and the fitted values.
(42) Give a numerical estimate of the concentration that gives rise to the highest value of expected strength.
(43) Is the answer in (42) unbiased? Circle one (YES/NO)
(44) What is the numerical value for the conc Std. Error that is missing from the table
summarizing the model fit?
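Several of these quantities are simple arithmetic on the printed output: the fitted value is $\hat\beta_0 + \hat\beta_1 c + \hat\beta_2 c^2$, the residual sum of squares is the squared residual standard error times the residual df, the fitted parabola peaks at $-\hat\beta_1/(2\hat\beta_2)$, and a t value is an estimate divided by its standard error. A minimal sketch using the rounded numbers copied from the output, so results are approximate.

```python
# Estimates and summaries copied from the fitted-model output (rounded values).
b0, b1, b2 = -36.6280, 13.1614, -0.9334
t_conc = 8.510
rse, df_resid = 0.7518, 8

def predict(conc):
    """Fitted mean strength from the quadratic model."""
    return b0 + b1 * conc + b2 * conc**2

fit_at_5 = predict(5)
resid_at_5 = 5.74 - fit_at_5     # observed strength at conc = 5 is 5.74
rss = rse**2 * df_resid          # RSE^2 = RSS / residual df
conc_at_max = -b1 / (2 * b2)     # vertex of a concave parabola (b2 < 0)
se_conc = b1 / t_conc            # t value = estimate / standard error

print(f"{fit_at_5:.3f} {resid_at_5:.3f} {rss:.4f} {conc_at_max:.3f} {se_conc:.4f}")
```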
(45) True or False. The expected value of $e_i^2$ is $\sigma^2$.
(46) True or False. The expected value of the square of the $i$-th residual, $\hat{e}_i^2$, is $\sigma^2$.
(47) Write down the design/model matrix for fitting the model.
(48) Generally speaking, for estimating $\sigma^2$ using an estimator $\hat\sigma^2$ with $\nu\hat\sigma^2/\sigma^2 \sim \chi^2_\nu$ for some number of degrees of freedom $\nu$, is it better to have less or more degrees of freedom?
Circle one answer: more/less
(49) True or False. If 5 confidence intervals are constructed for 5 different parameters, and each confidence interval has 99% coverage probability, then the chance that all of the intervals contain their respective true values is at least 95%.
(50) True or False. Based on an iid sample $X_1, \ldots, X_n$ from the $N(\mu, 1)$ distribution with $\mu$ unknown, consider the level $\alpha$ test of $H_0: \mu = 0$ vs. $H_A: \mu > 0$ that rejects $H_0$ provided $\bar{X} > z(\alpha)/\sqrt{n}$. The power of this test does not depend on the true value of $\mu$.
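Since $\bar X \sim N(\mu, 1/n)$, the rejection probability of this test is $1 - \Phi(z(\alpha) - \sqrt{n}\,\mu)$, with $z(\alpha)$ taken here as the upper-$\alpha$ point of $N(0,1)$. A minimal sketch that evaluates this power function; `n=25` and the $\mu$ values are arbitrary illustrative choices.

```python
import math
from statistics import NormalDist

def power(mu, n, alpha=0.05):
    """P(Xbar > z(alpha)/sqrt(n)) when X_i ~ N(mu, 1), i.e. 1 - Phi(z(alpha) - sqrt(n)*mu)."""
    z = NormalDist().inv_cdf(1 - alpha)  # upper-alpha point of N(0, 1)
    return 1 - NormalDist().cdf(z - math.sqrt(n) * mu)

for mu in (0.0, 0.2, 0.5):
    print(mu, round(power(mu, n=25), 3))
```

At $\mu = 0$ the function returns the level $\alpha$, and it increases with $\mu$.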