STAT340 Final exam (340-s23-final)
Points:
MC1-6 (/12)
MC7-9 (/6)
SA1-2 (/8)
SA3 (/4)
SA4-5 (/8)
SA6 (/4)
Total (/42)
First (given) name:
Write here:
Last (family) name:
Write here:
Rules:
+ You must show work for all computations (unless otherwise specified) to receive full credit.
+ You do NOT need to simplify any expressions you write down.
Multiple choice 2pts each
MC1,2
Let $X$ and $Y$ be independent random variables such that $E(X) =$ [value illegible in source], $E(Y) = 8$, $\mathrm{Var}(X) = 1$, $\mathrm{Var}(Y) = 2$. Which of the following values is closest to $E(3X - 2Y)$?
a. -10
b. -5
c. 0
d. 5
e. 10
Which of the following values is closest to $\mathrm{Var}(3X - 2Y)$?
a. -10
b. -5
c. 0
d. 5
e. 10
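A minimal Python sketch (Python rather than the course's R) of the linearity rules behind MC1,2: for independent $X$ and $Y$, $E(aX + bY) = aE(X) + bE(Y)$ and $\mathrm{Var}(aX + bY) = a^2\mathrm{Var}(X) + b^2\mathrm{Var}(Y)$. Because $E(X)$ is illegible in this copy, MEAN_X is a HYPOTHETICAL placeholder, and the Monte Carlo check assumes normal $X$ and $Y$ only for convenience. The function names are illustrative.

```python
import random
import statistics

# HYPOTHETICAL placeholder: E(X) is illegible in the source exam.
MEAN_X = 0.0
MEAN_Y, VAR_X, VAR_Y = 8.0, 1.0, 2.0


def linear_combo_moments(a, b, mean_x, mean_y, var_x, var_y):
    """Exact mean and variance of aX + bY for independent X and Y."""
    mean = a * mean_x + b * mean_y
    # Independence gives Cov(X, Y) = 0, so there is no cross term.
    var = a ** 2 * var_x + b ** 2 * var_y
    return mean, var


def simulate_combo(a, b, mean_x, mean_y, var_x, var_y, reps=100_000, seed=340):
    """Monte Carlo check; ASSUMES X and Y are normal, which the exam does not state."""
    rng = random.Random(seed)
    draws = [
        a * rng.gauss(mean_x, var_x ** 0.5) + b * rng.gauss(mean_y, var_y ** 0.5)
        for _ in range(reps)
    ]
    return statistics.fmean(draws), statistics.variance(draws)


exact_mean, exact_var = linear_combo_moments(3, -2, MEAN_X, MEAN_Y, VAR_X, VAR_Y)
sim_mean, sim_var = simulate_combo(3, -2, MEAN_X, MEAN_Y, VAR_X, VAR_Y)
print(exact_mean, exact_var, round(sim_mean, 2), round(sim_var, 2))
```

The variance does not involve $E(X)$ at all, so that half of the calculation is unaffected by the missing value.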
MC3,4
Let $X_1, X_2, \ldots, X_n$ be an independent and identically distributed sample such that $X_i$ has mean $\mu$ and variance $\sigma^2$. As $n$ increases, how does your sample variance (i.e. $\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$) tend to change?
a. Increases proportional to $n$
b. Increases proportional to $\sqrt{n}$
c. Does not tend to change
d. Decreases proportional to $\sqrt{n}$
e. Decreases proportional to $n$
As $n$ increases, how does the variance of the sample mean $\bar{X}$ tend to change?
a. Increases proportional to $n$
b. Increases proportional to $\sqrt{n}$
c. Does not tend to change
d. Decreases proportional to $\sqrt{n}$
e. Decreases proportional to $n$
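A small Python simulation sketch for MC3,4, assuming normal data (the questions only specify the mean $\mu$ and variance $\sigma^2$): it tracks the average sample variance and the variance of $\bar{X}$ across repeated samples as $n$ grows. The function name mc_sample_stats and its default arguments are illustrative.

```python
import random
import statistics


def mc_sample_stats(n, reps=2000, mu=0.0, sigma=1.0, seed=1):
    """Return (average sample variance, variance of the sample mean) over `reps` samples of size n.

    ASSUMES the X_i are normal; other distributions with variance sigma^2 behave similarly.
    """
    rng = random.Random(seed)
    sample_vars, sample_means = [], []
    for _ in range(reps):
        x = [rng.gauss(mu, sigma) for _ in range(n)]
        sample_vars.append(statistics.variance(x))  # uses the 1/(n-1) divisor
        sample_means.append(statistics.fmean(x))
    return statistics.fmean(sample_vars), statistics.variance(sample_means)


for n in (10, 40, 160):
    avg_s2, var_xbar = mc_sample_stats(n)
    # avg_s2 should stay near sigma^2 = 1, while n * var_xbar should stay near 1
    print(n, round(avg_s2, 3), round(var_xbar, 4), round(n * var_xbar, 3))
```

Quadrupling $n$ cuts the variance of $\bar{X}$ to roughly a quarter, consistent with $\mathrm{Var}(\bar{X}) = \sigma^2/n$, while the average sample variance hovers around $\sigma^2$.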
MC5
All else held equal, in a simple linear regression context, which of the following is true as $\sigma^2$ increases? Choose ALL that apply!
(i.e. the number of right choices is $\geq 1$)
a. § tends to increase
b. $SE(\hat{\beta}_1)$ tends to increase
c. RSE (residual standard error) tends to increase
d. $R^2$ tends to increase
e. Df (degrees of freedom) tends to increase
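A Python sketch for MC5 that fits a simple linear regression by ordinary least squares on simulated data and reports $SE(\hat{\beta}_1)$, RSE, $R^2$ and the residual degrees of freedom as the error standard deviation $\sigma$ grows. The true line, sample size and uniform predictor are HYPOTHETICAL choices; the same seed is reused so that only $\sigma$ changes between fits.

```python
import math
import random


def fit_slr(x, y):
    """Ordinary least squares for y = b0 + b1 * x, plus the usual summary quantities."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - ybar) ** 2 for yi in y)
    df = n - 2  # residual df depends only on n and the number of coefficients
    rse = math.sqrt(rss / df)  # residual standard error, an estimate of sigma
    return {"b1": b1, "se_b1": rse / math.sqrt(sxx), "rse": rse, "r2": 1 - rss / tss, "df": df}


def simulate_fit(sigma, n=200, beta0=1.0, beta1=2.0, seed=5):
    """HYPOTHETICAL setup: x ~ Unif(0, 10), errors ~ Normal(0, sigma^2)."""
    rng = random.Random(seed)
    x = [rng.uniform(0, 10) for _ in range(n)]
    y = [beta0 + beta1 * xi + rng.gauss(0, sigma) for xi in x]
    return fit_slr(x, y)


for sigma in (1, 3, 9):
    fit = simulate_fit(sigma)
    print(sigma, {k: round(v, 3) for k, v in fit.items()})
```

The printed dictionaries make it easy to compare each quantity across the three values of $\sigma$.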
MC6
Which of the following is NOT an assumption of a linear regression model?
a. The response variable has a linear relationship with the predictor variables.
b. The predictor variables are normally distributed.
c. The errors are normally distributed.
d. The errors have constant variance.
e. The errors are independent.
MC7
You decide to buy a lottery ticket every day until you win the jackpot. Assume you have an infinite line of credit (i.e. your credit card lets you buy an unlimited number of lottery tickets). Which of the following is the best random variable to use to model the number of tickets you end up buying?
a. Normal
b. Binomial
c. Poisson
d. Geometric
e. Exponential
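A Python sketch of the "buy until you win" process in MC7, assuming each ticket wins independently with the same probability $p$. The value $p = 0.05$ is purely HYPOTHETICAL (a real jackpot is far less likely) and is chosen so the simulation runs quickly; the count includes the winning ticket, so its mean is $1/p$.

```python
import random
import statistics


def tickets_until_win(p, rng):
    """Count tickets bought up to and including the first winning ticket."""
    count = 0
    while True:
        count += 1
        if rng.random() < p:  # this ticket wins with probability p, independently of the others
            return count


P_WIN = 0.05  # HYPOTHETICAL win probability per ticket
rng = random.Random(7)
draws = [tickets_until_win(P_WIN, rng) for _ in range(20_000)]
print(min(draws), round(statistics.fmean(draws), 2), 1 / P_WIN)
```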
MC8
[Figure: plot of the CDF $F$ of the random variable]
We generate observations from the random variable defined by the CDF $F$ above. We will generate a sequence of random variables $U_1, U_2, \ldots, U_n \sim \mathrm{Unif}(0, 1)$ (iid), and then let $X_i = F^{-1}(U_i)$. In which of the following intervals do you expect to see the largest number of observations $X_i$ fall?
a. $(-\infty, 2.5)$
b. $(2.5, 4.1)$
c. $(5, 9)$
d. $(9, \infty)$
e. Unable to determine
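Because the plotted CDF for MC8 is not recoverable from this copy, the Python sketch below uses a HYPOTHETICAL piecewise-linear CDF (the knot values are invented for illustration) to show inverse-transform sampling: each $X_i = F^{-1}(U_i)$, so the share of observations in an interval $(a, b)$ is about $F(b) - F(a)$, the rise of the CDF over that interval rather than its width.

```python
import bisect
import random

# HYPOTHETICAL CDF: linear between knots, strictly increasing, F(0) = 0 and F(12) = 1.
KNOTS_X = [0.0, 2.5, 4.1, 5.0, 9.0, 12.0]
KNOTS_F = [0.0, 0.1, 0.6, 0.65, 0.95, 1.0]


def cdf(x):
    """Evaluate the piecewise-linear CDF F(x)."""
    if x <= KNOTS_X[0]:
        return 0.0
    if x >= KNOTS_X[-1]:
        return 1.0
    i = bisect.bisect_right(KNOTS_X, x) - 1
    frac = (x - KNOTS_X[i]) / (KNOTS_X[i + 1] - KNOTS_X[i])
    return KNOTS_F[i] + frac * (KNOTS_F[i + 1] - KNOTS_F[i])


def inverse_cdf(u):
    """Evaluate F^{-1}(u) for u in [0, 1] by inverting the linear piece that contains u."""
    i = min(bisect.bisect_right(KNOTS_F, u) - 1, len(KNOTS_F) - 2)
    frac = (u - KNOTS_F[i]) / (KNOTS_F[i + 1] - KNOTS_F[i])
    return KNOTS_X[i] + frac * (KNOTS_X[i + 1] - KNOTS_X[i])


rng = random.Random(8)
xs = [inverse_cdf(rng.random()) for _ in range(100_000)]  # X_i = F^{-1}(U_i)

intervals = {
    "(-inf, 2.5)": (float("-inf"), 2.5),
    "(2.5, 4.1)": (2.5, 4.1),
    "(5, 9)": (5.0, 9.0),
    "(9, inf)": (9.0, float("inf")),
}
for label, (lo, hi) in intervals.items():
    observed = sum(lo < x < hi for x in xs) / len(xs)
    print(label, round(observed, 3), "expected", round(cdf(hi) - cdf(lo), 3))
```

Under this made-up CDF the steepest stretch, $(2.5, 4.1)$, collects the most draws even though $(5, 9)$ is wider; the same reasoning applies to whatever CDF the original figure showed.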
MC9
A statistics instructor gives each of the 160 students in a class a different random data set. Each data set is obtained by doing rnorm(n=28, mean=5, sd=4.5). The students are told that $\sigma = 4.5$, but they do NOT know $\mu = 5$. The students are tasked to use a Monte Carlo test to test the hypotheses $H_0: \mu = 5$ vs $H_1: \mu \neq 5$ at $\alpha = 0.05$. Which of the following is true?
a. Using as many Monte Carlo replications as computationally possible will lower the standard error of the point estimate.
b. Using as many Monte Carlo replications as computationally possible will improve the power of the test.
c. Using a t statistic instead of a z statistic increases the power since $s$ is based on the data.
d. We expect to see on average 8 students committing a type I error.
e. None of the above are true.
Short answer 4pts each
SA1
A fair coin is flipped 4 times. Assume each flip has no influence on any other flip. Let A and B denote the following two events:
+ A: There are 2 heads in total out of 4 flips.
+ B: The first flip is a head.
Answer each of the following, showing all work for full points. If it helps you, here's a list of every possible outcome:
HHHH, HHHT, HHTH, HHTT, HTHH, HTHT, HTTH, HTTT,
THHH, THHT, THTH, THTT, TTHH, TTHT, TTTH, TTTT.
a. What is P(A)?
b. What is P(B)?
c. What is P(A&B)? (i.e. what is the probability of both events occurring simultaneously?)
d. Are the events dependent or independent? Explain.
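A short Python check for SA1 that enumerates the 16 equally likely outcomes listed above (in the same order) and computes each probability as an exact fraction, favorable outcomes over total outcomes.

```python
from fractions import Fraction
from itertools import product

outcomes = ["".join(flips) for flips in product("HT", repeat=4)]  # HHHH, HHHT, ..., TTTT


def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


def A(o):
    return o.count("H") == 2  # exactly 2 heads in 4 flips


def B(o):
    return o[0] == "H"  # the first flip is a head


p_a, p_b = prob(A), prob(B)
p_ab = prob(lambda o: A(o) and B(o))
print(p_a, p_b, p_ab, p_ab == p_a * p_b)  # prints: 3/8 1/2 3/16 True
```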
SA2
Each statement below may or may not be correct. For each statement, identify if it's correct or incorrect and explain why. If it is incorrect, rewrite the statement to be correct.
a. 0.01 is always a better value to use for $\alpha$ than 0.05 because it gives a lower rate of false positives.
b. For a computed 95% confidence interval for $\mu$, there is a 95% chance that $\mu$ is contained in the interval.
c. You can decide whether or not to include an interaction term in a model by checking if the two predictors are correlated in the data.
d. If two events are independent, then they are also mutually exclusive.
SA3
We have a dataset of CO2 uptake levels in a certain grass species under different temperature conditions. Here are the variables:
+ uptake: numeric response measuring amount of CO2 uptake of each sample
+ Type: categorical predictor (levels: "Quebec", "Mississippi") denoting location where the sample was originally from
+ Treatment: categorical predictor (levels: "nonchilled", "chilled") denoting temperature treatment applied to the sample
+ conc: numeric predictor denoting level of ambient CO2 the sample was kept in.
Below is the output of a multiple linear regression fit and the diagnostic plots. You may reference this output as justification in your answers below, but PLEASE clearly state which numbers you are referring to. Please show all work for full points.
##                                   Estimate Std. Error t value Pr(>|t|)
## (Intercept)                      27.620528   1.627945  16.965  < 2e-16 ***
## TypeMississippi                  -9.380952   1.851185  -5.068 2.59e-06 ***
## Treatmentchilled                 -3.580952   1.851185  -1.934    0.056
## TypeMississippi:Treatmentchilled -6.557143   2.617972  -2.505   0.0143 *
## Residual standard error: 5.999 on 79 degrees of freedom
## Multiple R-squared: 0.7072, Adjusted R-squared: 0.6923
## F-statistic: 47.69 on 4 and 79 DF, p-value: < 2.2e-16
[Figure: diagnostic plots for the fitted regression model]
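For part a below, the normal-approximation interval is estimate $\pm 1.96 \times SE$. This Python sketch plugs in the estimate and standard error from the TypeMississippi:Treatmentchilled row of the output; the helper name normal_ci is illustrative.

```python
def normal_ci(estimate, std_error, z=1.96):
    """Normal-approximation confidence interval: estimate +/- z * SE."""
    margin = z * std_error
    return estimate - margin, estimate + margin


# Values copied from the interaction row of the regression output.
lo, hi = normal_ci(-6.557143, 2.617972)
print(round(lo, 3), round(hi, 3))  # prints: -11.688 -1.426
```

Both endpoints are below 0, so 0 is not a plausible value for the interaction coefficient at the 95% level.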
a. Construct a 95% confidence interval for the interaction term and interpret it. You may use the normal approximation to the t distribution due to large sample size (i.e. use 1.96 as your critical value).
b. What response would you expect to observe on average for a new sample of Mississippi type, nonchilled treatment, and with concentration 500? Write an expression for the answer (you do NOT need to simplify the expression to a single number!)
c. Approximately what proportion of the change in response is explained by the change in predictors?
d. Looking at the diagnostic plots, is there evidence of model assumption violations? Explain.
SA4
You are a scientist working for big pharma developing a test for a disease. Define the following variables:
+ Let $p$ denote the prevalence of the disease in the population of interest, i.e. P(disease)
+ Let $\alpha$ denote the false positive rate of the test, i.e. P(positive test | no disease) (this is also equivalent to $1 - \text{specificity}$)
+ Let $s$ denote the sensitivity of the test, i.e. P(positive test | disease) (this is also equivalent to $1 - \text{false negative rate}$)
Suppose you gather a sample of $n$ subjects for a study. Assume the sample is very representative of the overall population of interest. On average, how many subjects would you expect to be in each category below? (Write an expression giving the expected average COUNT of subjects in each category out of $n$ total subjects in the sample.)
a. True positives
b. True negatives
c. False positives
d. False negatives
SA5
This question is based on the manifest (i.e. list of passengers) of the Titanic. Here are the relevant columns:
+ survived: This is a categorical response, 1 indicating the passenger survived, 0 indicating the passenger died
+ Pclass: This is a categorical predictor indicating the class of the passenger's ticket (i.e. 1st class, 2nd class, 3rd class)
+ Sex: This is also a categorical predictor indicating the sex of the passenger
+ Age: This is a numerical predictor
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)  3.777013   0.401123   9.416  < 2e-16 ***
## Pclass2     -1.309799   0.278066  -4.710 2.47e-06 ***
## Pclass3     -2.580625   0.281442  -9.169  < 2e-16 ***
## Sexmale     -2.522781   0.207391 -12.164  < 2e-16 ***
## Age         -0.036985   0.007656  -4.831 1.36e-06 ***
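For parts b through d below, a Python sketch that reuses the printed estimates: the Wald interval estimate $\pm 1.96 \times SE$ for the Sexmale coefficient, the linear predictor for a 20-year-old male in 1st class, and the inverse logit $1/(1 + e^{-\eta})$ that turns log-odds into a probability. Treating 1st class and female as the reference levels is an inference from the row names Pclass2, Pclass3 and Sexmale.

```python
import math

# Estimates copied from the logistic regression output above.
COEF = {
    "(Intercept)": 3.777013,
    "Pclass2": -1.309799,
    "Pclass3": -2.580625,
    "Sexmale": -2.522781,
    "Age": -0.036985,
}
SE_SEXMALE = 0.207391


def inv_logit(eta):
    """Convert log-odds eta to a probability."""
    return 1 / (1 + math.exp(-eta))


# (b) 95% Wald interval for the Sexmale coefficient, on the log-odds scale.
male_ci = (COEF["Sexmale"] - 1.96 * SE_SEXMALE, COEF["Sexmale"] + 1.96 * SE_SEXMALE)

# (c) Log-odds for a 20-year-old male in 1st class; 1st class is the baseline, so no Pclass term.
log_odds = COEF["(Intercept)"] + COEF["Sexmale"] + COEF["Age"] * 20

# (d) Probability of survival.
p_survive = inv_logit(log_odds)
print([round(v, 3) for v in male_ci], round(log_odds, 4), round(p_survive, 3))
```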
a. Which of the predictors appear to be the most significant in this model? Give an interpretation of one of these coefficients.
b. Give a 95% confidence interval for the male coefficient and interpret the interval.
c. Suppose you are a 20-year-old male passenger in 1st class. Write an expression for your predicted log-odds of survival.
d. Convert the log-odds of survival from part c to a probability of survival.
SA6
Suppose you are given a vector data with $n = 100$ observations $X_i$ of a Poisson process (i.e. each observation is a count of the number of occurrences of some event in a fixed interval). Write a sequence of instructions using pseudo-code (i.e. a mix of some R code as well as some English descriptions is allowed) that will compute a 95% confidence interval for $\lambda$ using a Monte-Carlo based method.
Your pseudo-code should be specific enough so that someone who has a BASIC understanding of the R programming language and its functions but has NO formal training in statistics should be able to follow your instructions and produce the correct computation. Also, every step/command/instruction should at LEAST mention what specific R function to use. Any R expressions you write do NOT have to perfectly evaluate in an R console to receive full credit, but your response SHOULD show both a clear understanding of the statistical methodology as well as at least a BASIC understanding of R functions and syntax.
As an example, saying something like "compute a point estimate of lambda" would be considered too vague, but saying something like "use mean() to find the mean of data and store as lambda_hat" is acceptable. Also note: There are multiple possible solutions to this problem, and your number of steps may differ from someone else's number of steps even if you use the same method. This is completely ok, as long as both responses are clear, complete, and specific enough. You may also add additional annotation/commentary in each step to help clarify your thought process to the graders, such as in step 1 below, which has already been done for you to help get you started.
1. Use mean() to find the mean of data and store as lambda_hat. This is our sample estimate of $E(X) = \lambda$.
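One possible Monte Carlo approach for SA6 is a parametric bootstrap, sketched here in Python rather than R: simulate many data sets of size $n$ from Poisson(lambda_hat), recompute the mean of each, and take the 2.5% and 97.5% quantiles of those simulated means. This is an assumption about the intended method (the course may use a different variant); the Poisson sampler uses Knuth's multiplication method because the Python standard library has no Poisson generator, and the data vector is HYPOTHETICAL (simulated with $\lambda = 4$) since the exam's data are not given.

```python
import math
import random
import statistics


def rpois_one(lam, rng):
    """One Poisson(lam) draw via Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def mc_poisson_ci(data, reps=2000, seed=340):
    """95% parametric-bootstrap CI for lambda from means of data sets simulated at lambda_hat."""
    rng = random.Random(seed)
    n = len(data)
    lambda_hat = statistics.fmean(data)  # step 1: sample estimate of E(X) = lambda
    sim_means = [
        statistics.fmean([rpois_one(lambda_hat, rng) for _ in range(n)])  # like mean(rpois(n, lambda_hat))
        for _ in range(reps)
    ]
    cuts = statistics.quantiles(sim_means, n=40)  # 39 cut points at 2.5%, 5%, ..., 97.5%
    return lambda_hat, cuts[0], cuts[-1]


data_rng = random.Random(2023)
data = [rpois_one(4.0, data_rng) for _ in range(100)]  # HYPOTHETICAL stand-in for the exam's data
lambda_hat, lower, upper = mc_poisson_ci(data)
print(round(lambda_hat, 2), round(lower, 2), round(upper, 2))
```

In R the same steps would roughly map onto mean(), replicate() wrapped around mean(rpois(n, lambda_hat)), and quantile() with probabilities 0.025 and 0.975.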