Introduction to Biostatistics
Batch: 2020
1. a) What is Biostatistics?
Biostatistics:
Biostatistics is the branch of statistics that applies statistical methods and principles to
problems and data in biology, medicine, and public health.
It involves the design of biological experiments, collection and analysis of data, and
interpretation of results.
Example:
If a researcher wants to know whether a new vaccine reduces the incidence of
disease, biostatistics helps in designing the trial, collecting data from participants, and analyzing
results to conclude effectiveness.
1. b) Write the statistical symbol of sample mean, population mean, and correlation coefficient.
Answer:
Sample Mean (x̄): Represents the average of sample data.
Symbol: x̄
Example: Mean of sample data 2, 4, 6 is (2+4+6)/3 = 4.
Population Mean (μ): Represents the average of entire population data.
Symbol: μ
Example: If the entire population glucose levels are known, their average is μ.
Correlation Coefficient (r): Measures strength and direction of linear relationship between two
variables.
Symbol: r
Example: If height and weight have r = 0.8, it indicates a strong positive relation.
1. c) Importance of studying biostatistics in public health
Answer: See solution in 2021-1(c)
2. a) What is Sampling? Classify sampling methods.
Answer: See solution in 2021-7(a)
2. b) Compare Stratified and Cluster Sampling
Answer: See solution in 2023-1(b)
See solution in 2022-4(b)
2. c) Procedure of Simple Random Sampling
Answer:
1) Define the target population.
2) Prepare a complete list (sampling frame).
3) Assign numbers to each unit.
4) Use random method (lottery or random number table).
5) Select desired sample size.
6) Ensure every unit has equal chance.
7) Record selected samples.
8) Proceed with data collection.
9) Avoid selection bias.
10) Verify randomness.
11) Repeat if needed.
12) Document process.
13) Ethical clearance.
14) Obtain consent.
15) Analyze results.
Example: Drawing 50 names from 500 using a computer randomizer.
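The computer-randomizer example can be sketched in Python as follows. This is a minimal illustration, not a prescribed procedure: the sampling frame of 500 labels and the seed value are assumptions made up for the sketch.

```python
import random

# Hypothetical sampling frame: 500 numbered units standing in for real names.
sampling_frame = [f"Person {i}" for i in range(1, 501)]

# A fixed seed (an arbitrary choice) makes the draw reproducible,
# which helps when documenting the selection process.
rng = random.Random(2020)

# random.sample draws without replacement, so no unit is picked twice
# and every unit has the same chance of being selected.
sample = rng.sample(sampling_frame, k=50)

print(len(sample))   # number of selected units
print(sample[:5])    # first few selected units
```

Recording the seed alongside the sampling frame lets someone else repeat the same draw and verify it.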
3. a) Define Null hypothesis and alternative Hypothesis.
Answer: See solution in 2023-8(c)
Null Hypothesis (H₀): A statement that there is no effect or no difference. It is the hypothesis
that a researcher tries to disprove.
Example: H₀: The new drug has no effect on blood pressure.
Alternative Hypothesis (H₁ or Ha): A statement that indicates the presence of an effect or
difference.
Example: H₁: The new drug lowers blood pressure.
3. b) Write down steps of Hypothesis test.
Answer:
i. State H₀ and H₁: Example: H₀: μ = 100, H₁: μ ≠ 100
ii. Select significance level (α): Commonly 0.05
iii. Choose test statistic: t-test, z-test, etc.
iv. Compute test statistic value: Based on sample data.
v. Find critical value or p-value: From statistical tables.
vi. Compare test statistic to critical value:
For a two-tailed test, if |test statistic| > critical value (equivalently, p-value < α) → reject H₀.
vii. Make decision: Reject or fail to reject H₀.
viii. Interpret result: Draw conclusion in context.
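The steps above can be traced with a one-sample, two-tailed z-test of H₀: μ = 100. This is only a sketch: the sample mean (104), the known population SD (15) and the sample size (36) are hypothetical numbers chosen for illustration, and the function name is not from any library.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """One-sample, two-tailed z-test; sigma is assumed to be known."""
    std_normal = NormalDist()                        # standard normal, mean 0, SD 1
    z = (sample_mean - mu0) / (sigma / sqrt(n))      # step iv: test statistic
    critical = std_normal.inv_cdf(1 - alpha / 2)     # step v: critical value
    p_value = 2 * (1 - std_normal.cdf(abs(z)))       # step v: two-tailed p-value
    reject = abs(z) > critical                       # steps vi-vii: decision
    return z, critical, p_value, reject

# Hypothetical data: sample mean 104, sigma 15, n = 36, H0: mu = 100.
z, critical, p_value, reject = two_sided_z_test(104, 100, 15, 36)
print(round(z, 2), round(critical, 2), round(p_value, 3), reject)
```

With these assumed numbers z = 1.6, which is below the critical value of about 1.96, so H₀ is not rejected at α = 0.05.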
3. c) What do you mean by Type I and Type II error? Explain with examples.
Answer:
Type I Error (α): Rejecting H₀ when it is true.
Example: Concluding a drug works when it doesn’t.
Type II Error (β): Failing to reject H₀ when it is false.
Example: Missing that the drug is effective.
Type | Definition | Example
Type I Error (False Positive) | Rejecting a true H₀ | Saying the drug works when it doesn’t
Type II Error (False Negative) | Failing to reject a false H₀ | Missing a real drug effect
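A small simulation can make the meaning of α concrete: when H₀ is true, a test at α = 0.05 should wrongly reject in roughly 5% of repeated studies. The sketch below assumes a normal population with mean 100 and known SD 15 (hypothetical values) and uses a two-tailed z-test; the function name and seed are illustrative.

```python
import random
from math import sqrt
from statistics import NormalDist

def type_i_error_rate(trials=2000, n=25, mu0=100.0, sigma=15.0, alpha=0.05, seed=1):
    """Fraction of simulated studies that reject H0 even though H0 is true."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # Data are generated with the true mean equal to mu0, so H0 holds.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        sample_mean = sum(sample) / n
        z = (sample_mean - mu0) / (sigma / sqrt(n))
        if abs(z) > critical:
            rejections += 1      # a false positive (Type I error)
    return rejections / trials

rate = type_i_error_rate()
print(rate)   # expected to be close to alpha = 0.05
```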
4. a) What is sample size determination?
Answer:
Sample size determination is the process of calculating the number of observations needed in a
study to ensure valid and reliable results. It balances accuracy, cost, and feasibility.
Example: If you want 95% confidence and 5% margin of error in a survey, you compute the
sample size accordingly.
4. b) Factors considered during sample size calculation
Answer:
1) Significance level (α)
2) Power of study (1-β)
3) Effect size
4) Population size
5) Variability (standard deviation)
6) Precision (margin of error)
7) Design effect
8) Sampling method
9) Response rate
10) Dropout rate
11) Availability of subjects
12) Ethical considerations
13) Budget constraints
14) Study duration
15) Regulatory requirements
4. c) Sample size calculation
Given:
Proportion (p) = 0.50
Z = 1.96
Accuracy (d) = 0.05
Formula:
n = Z² × p × (1 − p) / d²
Substitute values:
n = (1.96)² × 0.5 × (1 − 0.5) / (0.05)²
= 0.9604 / 0.0025
= 384.16
Answer: Minimum sample size = 385 (rounded up).
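The same arithmetic can be checked with a short Python sketch. The function name is an illustrative choice; the inputs are the values given in the question.

```python
from math import ceil

def sample_size_for_proportion(p, z, d):
    """Return (exact n, n rounded up) for estimating a proportion."""
    n = (z ** 2) * p * (1 - p) / (d ** 2)
    # Round up: a fraction of a subject cannot be recruited, and rounding
    # down would give slightly less precision than requested.
    return n, ceil(n)

n_exact, n_required = sample_size_for_proportion(p=0.50, z=1.96, d=0.05)
print(round(n_exact, 2), n_required)
```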
5. a) What is sampling distribution? Write characteristics.
Answer:
Sampling distribution is the probability distribution of a statistic (e.g., mean) obtained from
repeated samples.
Characteristics :
1) Based on random samples.
2) Centered around population parameter.
3) Spread depends on sample size.
4) Larger samples → smaller spread.
5) Standard deviation called standard error.
6) Often normal (by Central Limit Theorem).
7) Used in hypothesis testing.
8) Helps compute confidence intervals.
9) Reflects sampling variability.
10) Basis for p-values.
11) Allows standardization (z-scores).
12) Shapes vary with statistic.
13) Becomes narrower with larger samples.
14) Used in estimation.
15) Crucial in inferential statistics.
5. b) Assumptions for Central Limit Theorem (CLT)
Answer:
1) Samples are independent.
2) Samples are random.
3) Sample size is sufficiently large (n ≥ 30 usually).
4) Population variance is finite.
5) Data can come from any distribution (for large n).
6) No extreme outliers.
7) If sample is small, population must be normal.
8) Observations are identically distributed.
9) Mean and variance are defined.
10) CLT applies to sums and averages.
11) Used for proportions if np ≥ 5 and n(1-p) ≥ 5.
12) Sampling without replacement needs a finite population correction when the sample is a large share of the population.
13) Applicable to repeated sampling.
14) Basis for parametric tests.
15) Ensures the sampling distribution of the mean is approximately normal.
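The Central Limit Theorem can be illustrated by simulation. The sketch below assumes a strongly skewed population (exponential with mean 1 and SD 1, chosen only for illustration) and draws many samples of size 30. The sample means should centre on the population mean with a spread close to σ/√n.

```python
import random
from math import sqrt
from statistics import mean, pstdev

rng = random.Random(42)      # arbitrary seed for reproducibility
n = 30                       # size of each sample
num_samples = 5000           # number of repeated samples

# Exponential population with rate 1: mean = 1, SD = 1, right-skewed.
sample_means = [
    sum(rng.expovariate(1.0) for _ in range(n)) / n
    for _ in range(num_samples)
]

centre = mean(sample_means)      # should be close to the population mean, 1
spread = pstdev(sample_means)    # should be close to sigma / sqrt(n)
expected_se = 1.0 / sqrt(n)

print(round(centre, 3), round(spread, 3), round(expected_se, 3))
```

Plotting a histogram of `sample_means` would show a roughly bell-shaped curve even though the population itself is skewed.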
5. c) Standard error calculation
Given:
Mean = 5.1 mmol/L
SD = 2.3 mmol/L
Sample size (n) = 16
Formula:
SE = SD / √n
SE = 2.3 / √16
= 2.3 / 4
= 0.575 mmol/L
Answer: Standard error = 0.575 mmol/L
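A minimal Python check of this calculation, using the values given in the question (the function name is illustrative):

```python
from math import sqrt

def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / sqrt(n)

se = standard_error(sd=2.3, n=16)
print(round(se, 3))   # in mmol/L
```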
6. a) Define probability with example.
Probability:
Probability is the measure of the likelihood that an event will occur, expressed as a number
between 0 and 1. When all outcomes are equally likely:
Probability = Favorable outcomes / Total outcomes
Example:
If a die is rolled, the probability of getting a 4 is 1/6 since there is 1
favorable outcome out of 6.
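The die example can be written as a count of favorable outcomes over total outcomes. The sketch assumes a fair six-sided die, and the helper name `probability` is illustrative.

```python
from fractions import Fraction

# Sample space of a fair six-sided die (all faces equally likely).
sample_space = [1, 2, 3, 4, 5, 6]

def probability(event, outcomes):
    """Favorable outcomes divided by total outcomes, as an exact fraction."""
    favorable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favorable, len(outcomes))

p_four = probability(lambda face: face == 4, sample_space)
print(p_four)
```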
6. b) Properties of probability (15 Points)
Answer:
1) Range from 0 to 1.
2) Probability of impossible event = 0.
3) Probability of certain event = 1.
4) Sum of probabilities of all possible outcomes = 1.
5) Additive for mutually exclusive events.
6) Multiplicative for independent events.
7) Conditional probability defined.
8) Complementary probability: P(A') = 1 - P(A).
9) P(A ∪ B) = P(A) + P(B) - P(A ∩ B).
10) P(A ∩ B) ≤ P(A), P(B).
11) For independent A, B: P(A ∩ B) = P(A) × P(B).
12) Law of total probability applies.
13) Bayes' theorem links conditional probabilities.
14) Can be represented graphically.
15) Used in risk estimation.
6. c) Rules of probability
Answer:
1) Addition Rule:
For mutually exclusive events A and B:
P(A or B) = P(A) + P(B)
2) Multiplication Rule:
For independent events A and B:
P(A and B) = P(A) × P(B)
3) Complementary Rule:
P(not A) = 1 − P(A)
4) Conditional Probability:
P(A | B) = P(A ∩ B) / P(B)
5) Bayes’ Theorem:
P(A | B) = P(B | A) × P(A) / P(B)
Example: If P(A) = 0.3 and P(B) = 0.2 (independent), then
P(A ∩ B) = 0.3 × 0.2 = 0.06
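The rules can be checked numerically with the same values, P(A) = 0.3 and P(B) = 0.2, assuming independence. The sketch uses exact fractions to avoid rounding noise. Independent events with non-zero probabilities are not mutually exclusive, so it applies the general addition rule P(A ∪ B) = P(A) + P(B) − P(A ∩ B) rather than the simple sum.

```python
from fractions import Fraction

p_a = Fraction(3, 10)   # P(A) = 0.3
p_b = Fraction(2, 10)   # P(B) = 0.2

p_a_and_b = p_a * p_b                    # multiplication rule (independence)
p_a_or_b = p_a + p_b - p_a_and_b         # general addition rule
p_not_a = 1 - p_a                        # complementary rule
p_a_given_b = p_a_and_b / p_b            # conditional probability
p_b_given_a = p_a_given_b * p_b / p_a    # Bayes' theorem, solved for P(B | A)

print(float(p_a_and_b), float(p_a_or_b), float(p_not_a))
print(float(p_a_given_b), float(p_b_given_a))
```

Because A and B are independent, P(A | B) comes out equal to P(A), and P(B | A) equal to P(B).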
7. a) What are the measures of central tendency?
Answer:
Measures of central tendency are statistical tools used to describe the center or average of a
data set.
The three main measures are:
Mean (Average): Sum of all values divided by the number of values.
Example: For data 2, 4, 6 → Mean = (2+4+6)/3 = 4.
Median: The middle value when data is ordered.
Example: For data 2, 4, 6 → Median = 4.
Mode: The value that occurs most frequently.
Example: In data 2, 2, 3 → Mode = 2.
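These three examples can be reproduced with Python's standard `statistics` module:

```python
from statistics import mean, median, mode

mean_value = mean([2, 4, 6])       # sum of values / number of values
median_value = median([2, 4, 6])   # middle value of the ordered data
mode_value = mode([2, 2, 3])       # most frequent value

print(mean_value, median_value, mode_value)
```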
7. b) Strength and weakness of mean and median
Ans: See solution in 2022-6(c)
7. c) Calculate mean, median and mode: 4, 8, 7, 2, 5, 2, 9, 11
Ans: See solution in 2023-7(c)
8. a) What are the measures of dispersion?
Ans: See solution in 2023-7 (a)
8. b) Differentiate standard deviation and standard error
Point | Standard Deviation (SD) | Standard Error (SE)
1. Definition | Measures data spread | Measures precision of sample mean
2. Formula | SD = √[Σ(x − x̄)² / (n − 1)] | SE = SD / √n
3. Represents | Variability in data | Variability in sample mean
4. Sample size effect | No change | Decreases with larger sample
7. Graphically shows | Spread | Precision
8. Related to confidence interval | No | Yes
9. Use in hypothesis testing | Indirect | Direct
10. Increases with variability | Yes | Yes
11. Calculated from | Sample data | SD and sample size
12. High value meaning | Data widely spread | Estimate is imprecise
13. Low value meaning | Data clustered | Estimate is precise
14. Common in | Descriptive stats | Inferential stats
15. Example | Glucose SD 2 mmol/L | SE 0.5 mmol/L
8. c) Calculate SD: 2, 7, 5, 6, 4, 2, 6, 3, 6, 9
Step 1: Find Mean
x̄ = (2+7+5+6+4+2+6+3+6+9) / 10
= 50 / 10
= 5
Step 2: Find squared deviations
(2−5)² = 9, (7−5)² = 4, (5−5)² = 0, (6−5)² = 1, (4−5)² = 1,
(2−5)² = 9, (6−5)² = 1, (3−5)² = 4, (6−5)² = 1, (9−5)² = 16
Step 3: Sum of squared deviations
9+4+0+1+1+9+1+4+1+16
= 46
Step 4: Variance
46 / (10 − 1)
= 46 / 9
= 5.111
Step 5: SD
√5.111
= 2.26
Answer: SD ≈ 2.26
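The five steps can be followed in Python and cross-checked against `statistics.stdev`, which also uses the sample (n − 1) denominator. Variable names are illustrative.

```python
from statistics import mean, stdev

data = [2, 7, 5, 6, 4, 2, 6, 3, 6, 9]

x_bar = mean(data)                                # Step 1: mean
squared_devs = [(x - x_bar) ** 2 for x in data]   # Step 2: squared deviations
sum_sq = sum(squared_devs)                        # Step 3: sum of squares
sample_variance = sum_sq / (len(data) - 1)        # Step 4: variance, n - 1
sd = sample_variance ** 0.5                       # Step 5: standard deviation

print(x_bar, sum_sq, round(sample_variance, 3), round(sd, 2))
print(round(stdev(data), 2))                      # library cross-check
```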
9. a) Points to consider in selecting statistical test
1) Type of variable (categorical, continuous)
2) Number of groups
3) Sample size
4) Distribution (normal or not)
5) Paired or unpaired data
6) Level of measurement (nominal, ordinal, interval)
7) Homogeneity of variance
8) Research question
9) Independence of observations
10) Directionality (one-tailed or two-tailed)
11) Parametric or non-parametric
12) Ethical considerations
13) Effect size
14) Presence of confounding variables
15) Software availability
9. b) Three differences between parametric and non-parametric tests
Ans: See solution in 2021-6(b)
9. c) Assumptions of t-test (15 Points)
Ans: See solution in 2021-6(c)
10. Write short notes
a) Pie diagram
1. Definition of Pie diagram:
A pie diagram (or pie chart) is a circular graph divided into sectors, each representing a
proportion of the whole. It visually displays data in percentages or fractions.
2. Features:
Circle = 100%
Each sector = category proportion
Popular in public health for showing disease prevalence
3. Example:
In a survey:
40% had diabetes
30% had hypertension
30% had both
A pie chart would show three sectors proportional to 40%, 30%, and 30%.
4. Strengths:
i. Easy to understand
ii. Good for categorical data
iii. Visually impactful
5. Weaknesses:
i. Not precise
ii. Hard with many categories
iii. Cannot show trends
b) Confidence interval
Ans. See solution in 2021-10(a)
c) Odds ratio
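1. Definition of Odds ratio (OR):
The odds ratio (OR) is the odds of disease in the exposed group compared to the odds of
disease in the unexposed group. It is commonly used in case-control studies.
OR = odds of disease in exposed group / odds of disease in unexposed group
2. Formula:
OR = (a/c)/(b/d)
= (a×d)/(b×c)
where a = cases exposed,
b = controls exposed,
c = cases unexposed,
d = controls unexposed; all are cell frequencies in a 2×2 table.
Here a/c is the odds of exposure among cases and b/d is the odds of exposure among controls; their ratio
equals the disease odds ratio (a/b)/(c/d), since both simplify to (a×d)/(b×c).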
3. Interpretation:
OR = 1 → no association
OR > 1 → positive association
OR < 1 → negative association
4. Example:
Example 1: In a study of smoking and lung cancer:
40 cases smoked (a), 10 cases didn’t (c)
20 controls smoked (b), 30 didn’t (d)
OR=(40×30)/(20×10)
=1200/200
=6
Interpretation: Smokers have 6 times the odds of lung cancer compared with non-smokers.
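The cross-product formula can be checked in Python with the cell counts from the smoking example. The function name and argument order are illustrative choices that follow the a, b, c, d labels used above.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table.

    a = cases exposed,   b = controls exposed,
    c = cases unexposed, d = controls unexposed.
    """
    return (a * d) / (b * c)

# Smoking and lung cancer example: a = 40, b = 20, c = 10, d = 30.
or_value = odds_ratio(a=40, b=20, c=10, d=30)
print(or_value)
```

The function would raise a ZeroDivisionError if b or c were zero; handling such tables is outside the scope of this sketch.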