Biostatistics Quiz for Students
1. If the lower quartile is farther from the median than the upper quartile, then the
distribution is negatively skewed. True
2. A sample consists of fixed numbers whose values are usually unknown. False
3. Mid-range is a measure of dispersion and is most commonly affected by extreme values. False
4. A continuous random variable has a measurable probability associated with each value. False
5. The number of degrees of freedom for a paired t-test is n-2. False
6. The dependent variable for linear regression should be numeric. True
7. The variance and standard deviation are the best and most widely used measures of dispersion. True
8. The larger the sample size, the narrower the confidence interval and the more precise our estimate. True
9. A large p-value implies that the probability of the observed value occurring just by chance, when the null hypothesis is true, is low. False
10. If there are real differences among the group means, the between-groups variation will be larger than the within-groups variation. True
11. When the standard deviation is not known and the sample size is small, a t-test is used, whereas an F-test is used to compare two population variances. True
12. A follow-up time is right censored if we know that the event of interest took place at an unknown time prior to the actual observed time. False
Part II Matching
23. In an unpaired samples t-test with sample sizes n1= 11 and n2= 11, the value of tabulated
t should be obtained for:
(a) To collect sample data and use them to formulate hypotheses about a population
(b) To draw conclusion about populations and then collect sample data to support the
conclusions
26. Given that IQ scores are approximately normally distributed with a mean of 100 and a standard deviation of 15, the proportion of people with IQs above 130 is:
a. 95% b. 68% c. 5% d. 2.5%
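As a numerical check on question 26, here is a minimal sketch assuming IQ follows exactly N(100, 15²). It uses Python's standard-library `statistics.NormalDist`; 130 is two SDs above the mean, and the upper tail comes out at about 2.3%, so 2.5% (d) is the closest option (5% is roughly the two-tailed figure for |z| > 2).

```python
from statistics import NormalDist

# IQ scores assumed to follow N(mean=100, sd=15)
iq = NormalDist(mu=100, sigma=15)

# P(IQ > 130) = 1 - P(IQ <= 130); 130 is two SDs above the mean
p_above_130 = 1 - iq.cdf(130)
print(round(p_above_130, 4))  # 0.0228
```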
27. Suppose you conduct a significance test for the population proportion and your p-value
is 0.184. Given a 0.10 level of significance, which of the following should be your
conclusion?
28. Loosely speaking, what does the Central Limit Theorem say?
(b) Measures of central tendency should always be computed with and without outliers.
(a) It describes all the non-sampling errors that occur when sampling from a population.
(b) It describes how a statistic's value will change from sample to sample.
30. One example we looked at in class involved HIV positive rates among rural Chinese
farmers. One of the variables studied was education level, which was recorded as
"illiterate," "primary," and "secondary." What type of variable is education level in this
example?
31. A researcher divided subjects into two groups according to gender and then selected
members from each group for her sample. What sampling method was the researcher
using?
33. Assume that the population on which we want to conduct research consists of patients who are following ART in Gondar. From the information we have, we assume these patients have similar characteristics with respect to the study variable. In order to select a sample from all 5000 patients who are on follow-up, which sampling technique is more appropriate?
34. If most of the measurements in a data set are of approximately similar magnitude except for a few measurements that are quite a bit larger, how would the mean and median of the data set compare, and what shape would a histogram of the data set have?
a. The mean would be smaller than the median and the histogram would be skewed with a
long left tail.
b. The mean would be larger than the median and the histogram would be skewed with a
long right tail.
c. The mean would be larger than the median and the graph would be skewed with a long
left tail.
d. The mean would be smaller than the median and the histogram would be skewed with a
long right tail.
e. The mean would be equal to the median and the histogram would be symmetrical.
35. Many professional schools require applicants to take a standardized test. Suppose that
1000 students write the test, and you find that your mark of 63 (out of 100) was the 73rd
percentile. This means:
a. At least 73% of the people got 63 or better. b. At least 270 people got 73 or better.
c. At least 270 people got 63 or better. d. At least 27% of the people got 73 or
worse.
36. If a null hypothesis is rejected at the 0.05 level of significance for a two-tailed test, you
37. The relationship between the level of significance, the degrees of freedom and the tabulated value of t from the t-distribution is:
A. The value of t has a direct relation with the degrees of freedom and an inverse relation with the level of significance (α)
B. When we increase the degrees of freedom and keep the level of significance (α) constant, the value of t will decrease
C. The value of t has a direct relation with both the degrees of freedom and the level of significance (α)
D. When we increase the level of significance (α) and keep the degrees of freedom constant, the value of t will decrease
a. The sample mean is more sensitive to extreme values than the median.
b. The sample range is more sensitive to extreme values than the standard deviation.
c. The sample standard deviation is a measure of spread around the sample mean.
d. The sample standard deviation is a measure of central tendency around the median.
39. Which one of the following is true about positively skewed distribution?
A. Mean is smaller than median B. Majority of scores is at the right end of the curve
C. Median is greater than mode D. Few extreme scores are scattered at the left end
A. failing to reject a true null hypothesis B. failing to reject a false null hypothesis
41. Which one of the following statements is true about a 95% confidence interval for the population proportion?
B. Is more likely to contain the population proportion than the 99% confidence interval.
d. Can be used to give an indication of whether the sample proportion is a precise estimate
of the population proportion.
42. A study was conducted to check whether there is an association between asthma and smoking. At the end, the p-value for smoking was found to be 0.03. What conclusion should the researcher draw?
a. There is no association between smoking and asthma
d. We can’t say anything from the given information unless additional information is given
43. The following are survival times in months recorded for six tumor-bearing rats being
observed after radiation therapy: 1, 4, 1.7, 2.3, 2.9 and 3.2. If the observed value of 3.2
months is mistakenly recorded as 32 months what will be the effect on the summary
statistics of this study?
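For question 43, recomputing the summary statistics with and without the recording error shows the expected pattern: the median is unchanged, while the mean, standard deviation and range all increase sharply. A minimal sketch using the standard `statistics` module, with the survival times exactly as given:

```python
from statistics import mean, median, stdev

def summarize(data):
    """Return mean, median, sample SD and range for a list of numbers."""
    return {
        "mean": mean(data),
        "median": median(data),
        "sd": stdev(data),
        "range": max(data) - min(data),
    }

correct = [1, 4, 1.7, 2.3, 2.9, 3.2]   # survival times (months) as given
wrong = [1, 4, 1.7, 2.3, 2.9, 32]      # 3.2 mistakenly recorded as 32

for label, data in (("correct", correct), ("with error", wrong)):
    stats = summarize(data)
    print(label, {k: round(v, 2) for k, v in stats.items()})
```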
44. For constructing a grouped frequency distribution table, which of the following is not true?
A. The classes must be mutually exclusive B. The number of classes should not be
too many
C. The intervals of the classes can be open-ended D. The classes should be of equal width
A. Each unit in the sampling frame has an equal chance of being selected
B. The starting point should be chosen at random from the first sampling interval
46. In a study of the prevalence of HIV among adolescents in Ethiopia, a random sample of adolescents in Lideta Kifle Ketema was included. Which one of the following statements is correct?
47. Which of the following provides the best definition of a frequency when the term is
applied to a dataset?
a. The number of occurrences for a range of values that a variable takes in a data set
b. The number of occurrences for zero values that a variable takes in a data set
c. The number of occurrences for one, or a range of values that a variable takes in a data
set
d. The number of occurrences for the mean value that a variable takes in a data set
e. The number of occurrences of inappropriate values that a variable takes in a data set
A. 50% of the unranked scores B. 25% of the ranked scores C. 75% of the ranked scores
49. For a set of data that follow a normal distribution, what percentage of scores can one expect to find within one standard deviation on each side of the mean, that is, within a band two standard deviations wide?
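For question 49, the familiar 68-95 rule can be checked with the standard normal CDF. A minimal sketch using `statistics.NormalDist`: about 68% of a normal distribution lies within one SD of the mean, and about 95% within two.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1
within_1sd = z.cdf(1) - z.cdf(-1)
within_2sd = z.cdf(2) - z.cdf(-2)
print(f"{within_1sd:.1%}, {within_2sd:.1%}")  # 68.3%, 95.4%
```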
50. The standard error of the mean (SEM) is related to sample size, specifically:
c. As sample size increases, SEM becomes less stable d. As sample size increases, SEM
decreases
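The relationship in question 50 follows from SEM = s/√n. In the sketch below the SD of 10 is a hypothetical value held fixed while n grows; quadrupling the sample size halves the SEM.

```python
import math

def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean: SEM = s / sqrt(n)."""
    return sd / math.sqrt(n)

# Hypothetical SD of 10 held fixed while the sample size grows
for n in (25, 100, 400):
    print(n, standard_error(10, n))  # 2.0, 1.0, 0.5
```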
d. Comparison of a sample mean of zero to that of a population one over a time period
52. The two independent samples t-statistic is suitable in the following situation:
a. Comparison of two independent sample means where the samples are <30
b. Comparison of two independent sample means where the samples are >30 or normally
distributed
c. Comparison of two independent sample means where the samples are exponentially
distributed
53. The p value (two sided) associated with the two independent samples t statistic, assumes
the following:
c. Mean of sample one is less than that of sample two D. Mean of sample one is greater
than that of sample two
54. When carrying out a Wilcoxon matched-pairs test on a small dataset (i.e. n < 50), what method of p-value computation is the most appropriate?
55. The one parameter model in simple linear regression attempts to?
c. Fit the data within the 95% CI limit, by transforming the x values
55. The coefficient of determination can be interpreted a number of ways. Which of the
following is one of them?
DBP (mmHg)   Number of men
<65          60
65-74        270
75-84        540
85-94        420
95-104       150
105-114      45
>115         15
56. Those with DBP of 95 mmHg or above were considered to be hypertensive. Thus, hypertensive men fell:
A. above the 86th percentile B. below the 14th percentile C. below the 86th percentile D. none
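For question 56, the cumulative frequencies of the DBP table settle the answer. This sketch simply transcribes the counts (the dictionary keys mirror the table's class labels): 210 of 1500 men have DBP of 95 mmHg or more, i.e. the top 14%, so they lie above the 86th percentile.

```python
# Frequency table for questions 56-57 (DBP in mmHg, men)
counts = {"<65": 60, "65-74": 270, "75-84": 540, "85-94": 420,
          "95-104": 150, "105-114": 45, ">115": 15}

total = sum(counts.values())                                         # 1500
hypertensive = counts["95-104"] + counts["105-114"] + counts[">115"]  # DBP >= 95
print(total, hypertensive / total)  # 1500 0.14
```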
57. The distribution of DBP for 800 females in this population is nearly symmetric with the
same standard deviation as that for the males. The mean for females was 79 mmHg. Thus
we may conclude:
A. the median will be the same for both sexes B. the proportion that is hypertensive is
the same for both sexes
C. The variability of DBP is higher for females D. The variability of DBP is smaller for females E. None
59. Suppose we compare two random samples taken from the California all-discharge database: Sample A is a random sample of 100 discharges and Sample B is a random sample of 2,000 discharges. What can be said about the standard error of the mean length of stay in Sample A (SEA) relative to that in Sample B (SEB)?
d) Not enough information given to determine relationship between the two standard
errors.
c. There is more than one independent variable d. There are multiple dependent and
independent variables
61. Statistical power is affected by several factors. Which of the following is false?
a. Effect size, increasing it, increases power b. Sample size, increasing it, increases
power
c. Type one error (α), increasing it, increases power d. Type two errors (β), increasing it,
increases power
a. Is used to analyze survival data when individuals in the study are followed for varying
lengths of time.
c. Always assumes that the relative hazard for a particular variable is constant at all times
d. Uses the log rank statistic to compare two survival curves
e. Relies on the assumption that the explanatory variables (covariates) in the model are
normally distributed.
64. Simple linear regression has a number of sample data assumptions, what are they?
A fisheries researcher wishes to test for a difference in mean weights of a single species of fish caught by fishermen in three different lakes in Nova Scotia. The significance level for the test will be 0.05. Complete the following partial ANOVA table and use it to answer questions 65 to 69.
Source            Df    SS      MS    F
Between groups          17.03
Within groups     9
Total                   31.23
c. µ1 = µ2 = µ3 d. µ1 = µ2 = µ3 = 0 e. None of these.
66. The value of FDATA for this test is: a. 8.52 b. 5.39 c. 2.00 d. 0.1854
67. The value of FCRIT for this test is: a. 3.5874 b. 3.8625 c. 3.9824 d. 4.2565
68. If you pooled all the individuals from all three lakes into a single group, they would
have a standard deviation of: a. 1.257 b. 1.580 c. 3.767 d. 14.19
69. What is the appropriate interpretation of this test?
a. Reject H0: All three fish populations have different mean weights.
b. Reject H0: Exactly two of the three fish populations have the same means.
c. Reject H0: At least one of the fish populations differs from the others in terms of their
mean weight.
d. Fail to reject H0: The mean weights of the fish in these three populations are the same
e. Fail to reject H0: There is insufficient evidence for differences in mean weights of the fish from these three populations.
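The partial table for questions 65 to 69 can be completed as below. This is a sketch that assumes, as the answer options imply, that 17.03 is the between-groups sum of squares and 9 is the within-groups df (three lakes, 12 fish in total). It gives F ≈ 5.40 (option b, 5.39, up to rounding), which exceeds the tabulated F(0.05; 2, 9) ≈ 4.26 (option d), so H0 is rejected (option c of question 69); √MS_within ≈ 1.26 corresponds to option a of question 68.

```python
import math

# Assumption: 17.03 is SS_between and 9 is df_within (inferred from the options)
k = 3                                   # three lakes
ss_between = 17.03
ss_total = 31.23
df_within = 9

df_between = k - 1                      # 2
ss_within = ss_total - ss_between       # 14.20
ms_between = ss_between / df_between    # 8.515
ms_within = ss_within / df_within       # about 1.578
f_data = ms_between / ms_within         # about 5.40
pooled_sd = math.sqrt(ms_within)        # about 1.256

print(round(f_data, 2), round(pooled_sd, 3))
```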
70. Having two sets of data, we can compare their scattering as follows:
A. For approximately equal average values, the one with a higher standard deviation is
more scattered
B. For approximately equal standard deviation values, the one with a higher average is
more scattered
C. For approximately equal standard deviation values, the one with a lower average is
more scattered
D. If both the averages and standard deviations differ much between the series, we can
compare scattering using the coefficient of variation E. all
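Option D of question 70 refers to the coefficient of variation (CV = SD/mean), which expresses spread on a relative scale. The two series below are hypothetical: they have identical SDs but different means, so the one with the lower mean has the larger CV, which is the situation described in option C.

```python
from statistics import mean, stdev

def coefficient_of_variation(data):
    """CV = sample SD / mean: a unit-free measure of relative spread."""
    return stdev(data) / mean(data)

series_a = [150, 160, 170, 180, 190]   # hypothetical, mean 170
series_b = [50, 60, 70, 80, 90]        # hypothetical, mean 70, same SD as series_a

print(round(stdev(series_a), 2), round(stdev(series_b), 2))   # equal SDs
print(round(coefficient_of_variation(series_a), 3),
      round(coefficient_of_variation(series_b), 3))           # 0.093 0.226
```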
71. The relationship between the number of beers consumed (x) and blood alcohol content (y) was studied in 16 male college students using least squares regression. The following regression equation was obtained: ŷ = -0.0127 + 0.0180x. This equation implies that:
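For question 71, the slope (0.0180) is the estimated change in blood alcohol content per additional beer; the negative intercept (a predicted BAC below zero at x = 0) should not be interpreted literally. A small sketch of the fitted line:

```python
def predicted_bac(beers: float) -> float:
    """Fitted line from question 71: y-hat = -0.0127 + 0.0180 * x."""
    return -0.0127 + 0.0180 * beers

# Each extra beer raises the predicted BAC by the slope, 0.0180
print(round(predicted_bac(5), 4))                     # 0.0773
print(round(predicted_bac(6) - predicted_bac(5), 4))  # 0.018
```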
C. the two correlations are equally strong, since 1.0 - 0.7 = 0.3
B. The predictions it makes will be correct on average, but we will not be certain of the
RMSE
A) Always denote figures B) May be a single fact C) Are only mathematically correct
76. All of the following are true about measures of fertility except:
A. The numerator of both CBR and GFR is the number of live births in a year
B. NRR is always lower than GRR and less than half of TFR
D. GRR, like TFR, assumes a hypothetical cohort of women who pass from birth through the reproductive period without experiencing mortality E. GRR measures the production of female births
A. Is any process which generates a well-defined outcome B. Is any subset of an experiment
C. Is the conduct of any random experiment D. Is the set of all possible outcomes of an experiment
E. Is an experiment which, when repeated any number of times under the same conditions, gives the same results
78. All are true about the Poisson probability distribution except:
A. It is always right skewed B. It is nearly symmetric when the mean is small
C. The mean, variance and lambda are equal D. Events happen randomly and independently in time at a constant rate
E. It has no theoretical maximum value, but the probabilities tend toward zero
80. Suppose that X has a Poisson distribution with parameter λ = 4.7. Which of the following is true?
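The answer options for question 80 are missing, but the defining property it tests can be checked numerically: for a Poisson variable the mean and the variance both equal λ. A sketch summing the probability mass function for λ = 4.7, truncated at k = 59, where the remaining tail is negligible:

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 4.7
ks = range(60)  # terms beyond k = 59 are negligible for lam = 4.7
mean_x = sum(k * poisson_pmf(k, lam) for k in ks)
var_x = sum((k - mean_x) ** 2 * poisson_pmf(k, lam) for k in ks)
print(round(mean_x, 4), round(var_x, 4))  # 4.7 4.7
```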
81. Which one is not true about the normal probability distribution?
A. Is the same as the study unit B. Is the unit on which information is collected
D. is the list of all the units in population from which a sample is to be picked
A. Has only a right tail B. The χ² curve gets more bell-shaped C. Observations are measured independently of each other
D. Degrees of freedom equal (R-1)(C-1) E. The expected frequency must be at least five
F. All
85. All are true about t-distribution test except
D. n should be less than 30 and t less than 5 E. All are true except A
D. The hypothesis has an intervention effect E. A type I (α) error is made when HA is false but accepted F. C and D
87. All are true about the probability sampling methods except
B. Unlike simple random sampling, systematic sampling can be conducted without a sampling frame
D. Like multistage random sampling, cluster sampling involves picking a random sample of all units in the clusters
E. The main disadvantage of stratified sampling is that it requires more administrative effort than simple random sampling
88. If Var(2X + 3) = 16, then SD(3X) is equal to: A. 4 B. 6 C. 2 D. 8 E. 6√2
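Question 88 uses the rules Var(aX + b) = a²Var(X) and SD(aX) = |a|·SD(X); the additive constant does not affect spread. A short sketch of the arithmetic, which gives 6 (option B):

```python
import math

# Var(aX + b) = a^2 * Var(X); the constant b does not affect spread
var_2x_plus_3 = 16
var_x = var_2x_plus_3 / 2**2        # 16 / 4 = 4
sd_3x = abs(3) * math.sqrt(var_x)   # SD(aX) = |a| * SD(X) = 3 * 2
print(sd_3x)  # 6.0
```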
B. Non-overlapping intervals that cover the entire range of data values
C. Consists of one or more variables D. Is inferior to frequency polygons for comparing two or more sets of data
E. Is important in depicting the shape and location of central tendency F. B and E
90. All are not true about the interval scale of measurement except:
A) All mathematical operations are possible B) Comparison is possible C) There is a true zero point
A. Only tells the association between categories B. If RR < 1, then the risk is preventable
D. The odds ratio is calculated from cross-sectional, case-control and retrospective cohort studies
92. If the mean, median and mode of a distribution are 5, 6, 7 respectively, then the
distribution is:
93. Which of the following measures of variability is not dependent on the exact value of
each observation?
94. The median of a frequency distribution is found graphically with the help of:
94. The mean deviation about the median for the data 340, 150, 210, 240, 300, 310 and 320 is:
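For the mean deviation question above (the second question numbered 94), the median of the seven values is 300 and the absolute deviations from it sum to 370, giving 370/7 ≈ 52.86. A sketch:

```python
from statistics import median

data = [340, 150, 210, 240, 300, 310, 320]
m = median(data)                                      # 300 (middle of 7 sorted values)
mean_dev = sum(abs(x - m) for x in data) / len(data)  # average absolute deviation
print(m, round(mean_dev, 2))  # 300 52.86
```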
95. The classes in which the lower limit or the upper limit is not specified are known as:
a) Open end classes b) Close end classes c) Inclusive classes d) Exclusive classes
Answer questions 96 and 97 based on the following data which were extracted from the
annual death records of three adjacent districts (1998 Eth. Cal)
Name of district X Y Z
c) The crude death rate of district Z was less than the crude death rate of district Y d) All
of the above
97. Which of the following best explains the proportion of deaths due to malaria in District Y?
a) The proportionate mortality ratio was 30%. b) The cause specific mortality rate was
30%.
c) The fatality rate was about 30%. d) District Y had seen the highest infant mortality
rate.
C. Dispersion => spread of the values of the variable D. Location => average value of the variable E. All
99. Of the 140 children, 20 lived in owner occupied houses, 70 lived in council houses and
50 lived in private rented accommodation. Appropriate graphical presentation will be:
a. Simple bar chart b. multiple bar chart c. Component bar chart d. Histogram e.
b and c
100. Which of the following is NOT a possible value of the correlation coefficient?
A. The line goes through more points than any other possible line, straight or curved
B. The line goes through more points than any other possible straight line.
C. The same number of points is below and above the regression line.
A. displays residuals of the explanatory variable versus residuals of the response variable.
E. displays the explanatory variable on the x axis versus the response variable on the y axis.
Answer questions 103-108 based on the following. A variety of summary statistics were collected for a small sample (n = 10) of bivariate data, where the dependent variable was y and the independent variable was x.
SSE = 505.98
104. The least squares estimate of b1 equals a. 0.923 b. 1.991 c. -1.991 d. -0.923
105. The least squares estimate of b0 equals a. 0.923 b. 1.991 c. -1.991 d. -0.923
106. The sum of squares due to regression (SSR) is a. 1434 b. 505.98 c. 50.598 d. 928.02
108. The point estimate of y when x = 0.55 is a. 0.17205 b. 2.018 c. 1.0905 d. -2.018 e. -0.17205
(c) Measure the degree to which two variables are linearly associated
(d) Obtain the expected value of the independent random variable for a given value of the dependent variable
110. When the possible outcomes of an experiment are equally likely to occur, we apply:
(a) Relative probability (b) Subjective probability (c) Conditional probability (d) Classical probability
A. the null hypothesis H0 is rejected if p <0.05 B. the null hypothesis H0 is rejected if p>
0.05
C. the alternate hypothesis H1 is rejected if p> 0.05 D. the null hypothesis H0 is accepted
if p <0.05
C. the probabilities of the two outcomes can change from one trial to the next
114. A variable that takes on the values of 0 or 1 and is used to incorporate the effect of
qualitative variables in a regression model is called
115. The ANOVA procedure is a statistical approach for determining whether or not
a. the means of two samples are equal b. the means of two or more samples are equal
c. the means of more than two samples are equal d. the means of two or more populations are equal
a. making inferences about a single population variance b. testing for goodness of fit
c. testing for the independence of two variables d. All of these alternatives are
correct.
A. Parametric tests are distribution-based, whereas nonparametric tests are used for arbitrary (unspecified) distributions.
C. Complete information about the population can be obtained from a parametric test but not from a nonparametric test.
E. none
Answer question 118 based on the following table describing leukemia survival by X.
C. The 95% CI of exp(β) is between 1.353 and 45.832 D. The fitted equation is logit p(x) = -1.946 + 2.046x E. All
119. In a simple random sample (SRS) of n = 100 X residents, 38 said that they had attended a football game this year. Which value is closest to the margin of error for a 95 percent confidence interval for the proportion of X residents who have attended a game this year?
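For question 119, the usual large-sample margin of error for a proportion is z·√(p̂(1 - p̂)/n). A sketch with p̂ = 38/100 and the 95% normal quantile gives roughly 0.095:

```python
import math
from statistics import NormalDist

n, x = 100, 38
p_hat = x / n                      # sample proportion, 0.38
z = NormalDist().inv_cdf(0.975)    # about 1.96 for 95% confidence
margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
print(round(margin, 3))  # 0.095
```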
120. The mean of a binomial distribution is 10 and the number of trials is 30. The probability of failure of an event is: A. 0.333 B. 0.666 C. 0.9 D. 0.25
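Question 120 follows from the binomial mean np: p = 10/30 = 1/3, so q = 1 - p ≈ 0.667 (listed as 0.666 in option B). A sketch:

```python
binom_mean, n_trials = 10, 30
p = binom_mean / n_trials   # binomial mean = n * p, so p = 1/3
q = 1 - p                   # probability of failure
print(round(q, 3))  # 0.667
```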
A. bar charts are not used for time series data B. histograms are used to display
discrete data
C. bar charts are based on area under the curve D. histograms do not have spaces
between consecutive columns
125. Which property of a good estimator refers to an estimator that uses information from all observations in the sample? A. Sufficiency B. Efficiency C. Consistency D. Unbiasedness E. All
126. All of the following are possible values of probability except: a. 0.99 b. 0.1
c. -0.77 d. 0
A. Greatly affected by extreme values B. Easily computed from open-ended intervals
C. More variation among individual values D. Less precise estimation E. All
130. The total fertility rate=5 and the ratio of male to female =500:1, then the GRR is equal
to
Answer questions 132 and 133 based on the following data. Suppose that in a certain malarious area, past experience indicates that the probability that a person with a high-grade fever will be positive for malaria is 0.7. Consider three randomly selected patients in the same area.
132. The probability that at most two patients are positive for malaria is: a. 0.441 b. 0.343 c. 0.657 d. 0.189 e. 0.027
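For question 132, the number of malaria-positive patients among three is binomial with n = 3 and p = 0.7, so P(X ≤ 2) = 1 - P(X = 3) = 1 - 0.7³ = 0.657. A sketch summing the binomial probabilities directly (using `math.comb`, available from Python 3.8):

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for a binomial(n, p) random variable."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 3, 0.7
at_most_two = sum(binom_pmf(k, n, p) for k in range(0, 3))  # k = 0, 1, 2
print(round(at_most_two, 3))  # 0.657
```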
166. A researcher is interested in the travel time of UOG students to college. A group of 50 students is interviewed. Their mean travel time is 16.7 minutes. For this study, the mean of 16.7 minutes is an example of a(n): A. Parameter B. Statistic C. Population D. Sample
167. A sports psychologist was interested in the effects of a six-week imagery intervention on an athlete’s ability to execute a sport-specific skill such as penalty taking in football. How might you define the imagery variable?
168. A population has a mean of μ=35 and a standard deviation of σ=5. After 3 points are
added to every score in the population, what are the new values for the mean and standard
deviation?
A. μ=35 and σ=5 B. μ=35 and σ=8 C. μ=38 and σ=5 D. μ=38 and σ=8
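Question 168 illustrates that adding a constant shifts the mean but leaves the standard deviation unchanged. The four scores below are hypothetical, chosen so that μ = 35 and σ = 5 as in the question:

```python
from statistics import mean, pstdev

scores = [30, 30, 40, 40]            # hypothetical population: mu = 35, sigma = 5
shifted = [x + 3 for x in scores]    # add 3 points to every score

print(mean(scores), pstdev(scores))    # 35 5.0
print(mean(shifted), pstdev(shifted))  # 38 5.0
```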
169. A research report summarizes the results of a t-test by stating: t(35)=5.2, p<0.05.
Which of the following is a correct interpretation of this report?
A. The H0 was not rejected and the probability of a Type I error is less than .05.
B. The H0 was not rejected and the probability of a Type II error is less than .05.
C. The H0 was rejected and the probability of a Type I error is less than .05.
D. The H0 was rejected and the probability of a Type II error is less than .05
170. Which of the following is true about a 95% confidence interval of the mean of a given
sample?
A. 95 out of 100 sample means will fall within the limits of the confidence interval.
B. There is a 95% chance that the population mean will fall within the limits of the
confidence interval.
C. 95 out of 100 population means will fall within the limits of the confidence interval.
D. There is a .05 probability that the population mean falls within the limits of the
confidence interval.
171. In an independent t-test output of SPSS, the Levene’s test result is p = .006. What can
we infer from this number?
172. Which of the following statements is the most accurate description for the concept of
standard deviation?
A. The total distance from the smallest score to the highest score.
B. The square root of the total distance from the smallest score to the highest score.
C. The squared average distance between all scores and the mean.
175. For the data set 4, 4, 4, 4, 4 and 4, which of the following is true?
E. none
176. The following are percentages of fat found in 5 samples of each of two brands of baby
food:
Which of the following procedures is appropriate to test the hypothesis of equal average fat content in the two brands of baby food?
177. A process by which we estimate the value of a dependent variable on the basis of one or more independent variables is called: (a) Correlation (b) Regression (c) Residual (d) Slope
178. The dividing point between the region where the null hypothesis is rejected and the region where it is not rejected is called the: (a) Critical region (b) Critical value (c) Acceptance region (d) Significant region
179. Suppose we had a much simpler regression, obtained with the command >>> regress
(gdp85 ~ gdp60)
180. An absolute measure of dispersion which expresses variation in the same units as the
original data is the:
181. How does the computation of a sample variance differ from the computation of a
population variance?
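For question 181: the population variance divides the sum of squared deviations by N, while the sample variance divides by n - 1, because deviations are measured from the sample mean rather than μ and would otherwise underestimate the spread. The standard `statistics` module exposes both; the data below are hypothetical:

```python
from statistics import pvariance, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical values, mean 5, sum of squares 32

# Population variance: sum of squared deviations / N      -> 32 / 8
# Sample variance:     sum of squared deviations / (n - 1) -> 32 / 7
print(pvariance(data), variance(data))
```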
182. The algebraic sum of the deviations of a set of n values from their mean is: a) 0 b) n - 1 c) n d) n + 1 e) none
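The result behind question 182 is that deviations from the mean always sum to zero. A one-line check on hypothetical data:

```python
from statistics import mean

values = [3, 8, 10, 15, 24]        # hypothetical data, mean 12
m = mean(values)
print(sum(x - m for x in values))  # 0
```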
Answer question 183 based on the following table, in which six paintings were ranked by two judges.
Painting   Judge 1   Judge 2
A          2         2
B          1         3
C          4         4
D          5         6
E          6         5
F          3         1
183. The Spearman’s rank correlation coefficient is A. 0.71 B. 0.29 C. 0.42 D. 0.58
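For question 183, with no tied ranks Spearman's coefficient is 1 - 6Σd²/(n(n² - 1)). The rank differences for paintings A-F give Σd² = 10, so ρ = 1 - 60/210 ≈ 0.71. A sketch:

```python
def spearman_rho(rank_x, rank_y):
    """Spearman's rho via 1 - 6*sum(d^2) / (n*(n^2 - 1)); valid when there are no ties."""
    n = len(rank_x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n**2 - 1))

judge1 = [2, 1, 4, 5, 6, 3]   # ranks for paintings A-F
judge2 = [2, 3, 4, 6, 5, 1]
print(round(spearman_rho(judge1, judge2), 2))  # 0.71
```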
a. The p-value provides insight on how clinically important the study result is.
d. Assessing multiple comparisons will not change the value of each individually calculated
p-value.
185. A _____________ is a suitable MCT (measure of central tendency) when the data pertain to rates and time.
186. What factors affect the power of a test, and what is the effect of each?
187. What are the criteria that we use to look into the validity of a chi-squared test? When
do you use correction for continuity in chi-squared test?
189. Why do we sometimes need to transform data before doing further analysis? List the different techniques of data transformation and the conditions on the data under which each is used.
190. What are the differences between paired and independent samples?
191. A cross sectional survey was carried out among women of age group 20-60 yrs. to
determine whether there is an association between history of multiple sexual partners and
cervical cancer. Can you conclude from the survey results shown below that there is no
association between the two?
D. Make a conclusion
192. Based on the following two SPSS outputs from a study investigating whether there is a difference in dietary sodium level between men and women, answer the questions below.
e) The mean sodium intake for males is 3073.7 (95% CI 3006.6 to 3140.9) and for females 2692.1 (95% CI 2634.7 to 2729.7). Is there a difference in sodium levels between males and females? Explain your answer.
f) Using the SPSS output below, describe the relationship between sex and sodium intake?
g) As a result of the SPSS output, do you accept or reject your null hypothesis?
193. Answer the questions based on the following data. In a cancer drug trial, 37 patients were randomized to the treatment group and 32 patients to the control group. Their survival times (until death) were measured in months, and some observations are censored. (Variables: group, sex, and age)
194. A study was conducted to investigate the possible cause of gastroenteritis outbreak
following a lunch served in a high school cafeteria. Among the 225 students who ate the
sandwiches, 109 became ill. While, among the 38 students who did not eat the sandwiches, 4
became ill.
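The sub-questions for question 194 are not shown, but the natural measures for an outbreak like this are the attack rate in each group and their ratio (the relative risk). A sketch, assuming that is what is being asked:

```python
ate_ill, ate_total = 109, 225
not_ate_ill, not_ate_total = 4, 38

risk_exposed = ate_ill / ate_total            # attack rate among sandwich eaters
risk_unexposed = not_ate_ill / not_ate_total  # attack rate among non-eaters
relative_risk = risk_exposed / risk_unexposed
print(round(risk_exposed, 3), round(risk_unexposed, 3), round(relative_risk, 2))  # 0.484 0.105 4.6
```

Students who ate the sandwiches were about 4.6 times as likely to become ill as those who did not.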
195. A survey is planned to determine what proportion of the medical students have
regularly chewed khat. If no estimate of p is available and a pilot sample cannot be drawn,
what sample size would be required if a 95% confidence is desired, and d=0.04 is to be
used?
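For question 195, with no prior estimate the conservative choice is p = 0.5 (which maximizes p(1 - p)), and n = z²p(1 - p)/d². A sketch rounding up to the next whole subject:

```python
import math
from statistics import NormalDist

z = NormalDist().inv_cdf(0.975)   # about 1.96 for 95% confidence
p = 0.5                           # no prior estimate: p = 0.5 maximizes p(1 - p)
d = 0.04                          # desired absolute precision
n = math.ceil(z**2 * p * (1 - p) / d**2)
print(n)  # 601
```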
196. A physical therapist wished to estimate, with 99% confidence, the mean maximal strength of a particular muscle in a certain group of individuals. He assumes that strength scores are normally distributed with a variance of 144. A sample of 15 subjects who participated in the experiment yielded a mean of 84.3. What is the 99% confidence interval for the population mean strength?
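For question 196, σ is known (σ² = 144, so σ = 12), so a z-based interval x̄ ± z·σ/√n applies. The sketch uses the 99% level named at the start of the question; changing `confidence` to 0.95 gives the 95% interval instead.

```python
import math
from statistics import NormalDist

x_bar, pop_variance, n = 84.3, 144, 15
sigma = math.sqrt(pop_variance)                     # 12, known population SD
confidence = 0.99                                   # level stated in the stem
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 2.576
half_width = z * sigma / math.sqrt(n)
lower, upper = x_bar - half_width, x_bar + half_width
print(round(lower, 1), round(upper, 1))  # 76.3 92.3
```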
197. Answer the questions based on the following data: culture and Gonodectin (GD) test results for 240 urethral discharge specimens.
Negative 8 48 56
198. Suppose that a cohort study of 400 smokers and 600 non-smokers documented the
incidence of hypertension over a period of 10 years. The following table summarizes the
data at the end of the study period:
                   Hypertension
                   Yes    No     Total
Smoking   Yes      120    280    400
          No       30     570    600
Total              150    850    1000
Based on the above information, calculate and interpret the following measures of
association:
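The specific measures requested in question 198 are not listed, so this sketch computes the usual cohort measures from the 2×2 counts as an assumption: the risk in each group, the relative risk, the risk difference (attributable risk) and the odds ratio.

```python
# 2x2 cohort table from question 198 (rows: smoking, columns: hypertension)
a, b = 120, 280   # smokers: hypertension yes / no
c, d = 30, 570    # non-smokers: hypertension yes / no

risk_smokers = a / (a + b)                        # 120 / 400 = 0.30
risk_nonsmokers = c / (c + d)                     # 30 / 600 = 0.05
relative_risk = risk_smokers / risk_nonsmokers
risk_difference = risk_smokers - risk_nonsmokers  # attributable risk
odds_ratio = (a * d) / (b * c)

print(round(relative_risk, 2), round(risk_difference, 2), round(odds_ratio, 2))  # 6.0 0.25 8.14
```

Smokers had about six times the 10-year risk of hypertension of non-smokers (RR = 6), an excess of 25 cases per 100 smokers; the odds ratio (about 8.1) is larger than the RR here because the outcome is not rare.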
199. The following table is reproduced from a paper comparing the health and lifestyle of
people who live in traditional houses with those who live in improved ‘Habitat’ housing in
Malawi. Use the table to answer the following questions.
a) What statistical test would have been most appropriate for looking at the association
between education and type of housing?
b) Why do you think the authors have given the median instead of the mean as a summary
of average area of land?
c) What should the authors have given with the median to describe the variability in land
ownership?
d) What test would have been used to obtain the P value for comparing land ownership
between those with habitat and traditional houses?
e) Calculate the odds ratio for the effect of type of housing upon whether people have had
an illness in the past 4 weeks. Show your working
f) The authors continued on to do a multivariate analysis to look at the effect of type of
housing on illness after adjustment for confounders such as water source. The adjusted
odds ratio for type of housing was 0.55 (95% confidence interval 0.34 to 0.75). Interpret
this result i.e. what does this mean?
200. The following table shows the association between urban or rural site of residence and
having trichuris infection in children in Jimma, using CROSSTABS in SPSS. Use it to
answer the following questions.
a) What is the exposure variable in this study?
d) Which hypothesis (or statistical) test would you use to compare the percentage with
Trichuris infection in urban and rural children?
e) From the SPSS output, what is the P value for this association?
g) From the SPSS output, what are the odds ratio and the 95% confidence interval for
Trichuris infection in rural compared to urban children?