Question Answer Comments
1 C Regression models are global, in that they assume the same model fits all the data. NN techniques look only locally at the data around the point of interest.
2 D RMSE and Adjusted R-squared are one-to-one functions of one another, so they are equivalent for choosing a model.
3 B One of the facts presented in class -- the equivalent p-values slide in the class 2 notes.
4 B The crossvalidated RMSE or R^2 could be used to choose between competing models. The crossvalidated RMSE is a legitimate measure of out-of-sample predictive strength.
5 E By the definition of the PRESS statistic.
6 C A line is unbounded, ranging from -infinity to +infinity, whereas proportions must lie between 0 and 1.
7 E This is the equivalent of the t-stat for an individual regression coefficient in regular multiple regression.
8 E The difference on the logit scale is 0.2. By default JMP will model the probability of "No". The odds of "not quit" for women are therefore exp(0.2) = 1.22 times as high, and the odds of "quit" are multiplied by 1/1.22 = 0.82, i.e. they are lower.
9 B Bonferroni: 0.05/50 = 0.001.
10 D This is when trees tend to require many splits to pick up the smooth relationship.
11 C The AUC is 0.55, only a little better than chance (0.5).
12 B Logit: -0.29361 + 75 * 0.00868322 = 0.3576. Probability = exp(0.3576)/(1 + exp(0.3576)) = 0.588
13 B exp(40 * 0.00868322) = 1.415
14 A A slope of 0 on the logit scale corresponds to an odds ratio of exp(0) = 1. As the confidence interval for the slope doesn't contain 0, the CI for the odds ratio doesn't contain 1.
15 B Using log(distance) has a lower AICc, so it is preferred.
16 C -2 * (ll(small) - ll(big)) = -2 * (-2346.828 - (-2269.1638)) = 155.33
17 A The group with the highest coefficient. The coefficient for "My first time" is -(0.622 + 0.1388 - 0.4577) = -0.3031, since the level coefficients sum to zero. So "Every year" is highest.
18 C The definition of interaction: The impact of X1 on Y depends on X2.
19 C FP = 694/(694 + 1120) = 0.383. FN = 666/(666 + 929) = 0.418.
20 D The step history shows you which variable was chosen first.
21 D From the negative signs on the categorical variable coefficients in the "current estimates", Chile is always in the lower group.
22 B From Bonferroni: 0.1/11 = 0.009, so B is closest.
23 A AIC is like using a p-value threshold of 0.16, so this p-value rule of 0.05 is stricter.
24 A We are likely to be over-optimistic in the quality of fit.
25 C The test has a high p-value (0.518), so it doesn't reject the null. There is therefore no evidence of lack of fit, so the model appears well calibrated.
26 B R-squared and RMSE are very close for the two modeling approaches.
27 B 5 -- one for each terminal node.
28 A Given the width of the confidence interval, the observed difference between crossvalidated and overall R-squared and SSE is not material.
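
As a check on the arithmetic in the comments for questions 8, 9, 12, 13, 16, 19 and 22, the calculations can be reproduced with a minimal Python sketch. The numbers are copied from the comments above; the variable names and the helper `logistic` are illustrative only and do not come from the course materials. The question 13 value is read here as the odds multiplier for a 40-unit increase in the predictor, which is an assumption because the question text is not shown.

```python
import math


def logistic(z):
    """Convert a logit (log-odds) value to a probability."""
    return math.exp(z) / (1 + math.exp(z))


# Q12: logit at a predictor value of 75, then the implied probability.
INTERCEPT = -0.29361
SLOPE = 0.00868322
logit_75 = INTERCEPT + 75 * SLOPE
prob_75 = logistic(logit_75)

# Q13: multiplicative change in the odds for a 40-unit increase
# (interpretation assumed, see the note above).
odds_ratio_40 = math.exp(40 * SLOPE)

# Q8: a logit difference of 0.2 as an odds ratio, and its reciprocal
# for the opposite outcome.
odds_not_quit = math.exp(0.2)
odds_quit = 1 / odds_not_quit

# Q16: likelihood ratio statistic from the two log-likelihoods.
ll_small = -2346.828
ll_big = -2269.1638
lr_stat = -2 * (ll_small - ll_big)

# Q9 and Q22: Bonferroni thresholds (overall level / number of tests).
bonferroni_q9 = 0.05 / 50
bonferroni_q22 = 0.1 / 11

# Q19: the two error rates from the counts quoted in the comment.
fp_rate = 694 / (694 + 1120)
fn_rate = 666 / (666 + 929)

print(f"Q8  odds ratios: {odds_not_quit:.2f}, {odds_quit:.2f}")
print(f"Q9  Bonferroni threshold: {bonferroni_q9:.3f}")
print(f"Q12 logit: {logit_75:.4f}, probability: {prob_75:.3f}")
print(f"Q13 odds ratio: {odds_ratio_40:.3f}")
print(f"Q16 LR statistic: {lr_stat:.2f}")
print(f"Q19 FP: {fp_rate:.3f}, FN: {fn_rate:.3f}")
print(f"Q22 Bonferroni threshold: {bonferroni_q22:.3f}")
```

Each printed value can be compared directly with the figure quoted in the corresponding comment.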