2022 Biostatistics

The document provides an introduction to biostatistics, covering key concepts such as data, information, and various methods of data collection. It explains biostatistics, inferential statistics, variables, sampling methods, and measures of central tendency, along with their applications in medical science. Additionally, it discusses hypothesis testing, graphical presentation of data, estimation, and confidence intervals.

Uploaded by

sahadat.nstu2015
Copyright
© All Rights Reserved

Introduction to bio-statistics

MPH1401
Batch -2022.

1.a) What is data?


Answer:
Data:
i. Data is a collection of facts, figures, observations, or measurements obtained
during research or investigation.
ii. In statistics, data is the raw material that forms the basis of analysis.
iii. Data refers to raw facts and figures that are unprocessed and unorganized, which can be used
for reasoning, discussion, or calculation.
Example:
The weights of 10 patients measured in kilograms (56 kg, 62 kg, 59 kg, etc.) are data.
Blood pressure readings collected from patients are also data.

1.b) What is the difference between data and information?


Answer: Here are the key points comparing data and information:
Point | Data | Information
1. Definition | Raw facts and figures | Processed, organized data
2. Nature | Unprocessed | Processed
3. Meaning | Has no meaning until analyzed | Carries meaning
4. Example | Blood pressure readings (120/80, 130/85), patient records, test results | Average BP in a population is 125/82, hospital mortality rate, diagnosis
5. Usefulness | Less useful directly | Useful for decision making
6. Form | Numbers, text, or symbols | Structured form
7. Dependency | Independent | Dependent on data
8. Processing | Not processed | Obtained after processing
9. Decision making | Cannot support decisions directly | Basis for decisions
10. Storage | Stored in raw form | Stored as reports, summaries
11. Accuracy | May contain errors | Generally refined
12. Collection | Through surveys, experiments | Through analysis
13. Presentation | Tables, lists | Graphs, reports

1.c) Narrate different methods of data collection.

Answer:
Data collection methods in biostatistics include:

1. Surveys: Using questionnaires to collect information.
Example: Community health survey.

2. Interviews: Face-to-face or telephonic conversation.
Example: Interviewing patients about symptoms.

3. Observation: Watching and recording events.
Example: Observing handwashing practices.

4. Experiments: Manipulating variables and recording results.
Example: Clinical drug trials.

5. Registers/Records: Using existing hospital or lab records.
Example: Reviewing patient admission logs.

6. Census: Collecting data from every unit in the population.
Example: National population census.

7. Sampling: Collecting data from a portion of the population.
Example: Surveying 500 people in a city.

8. Focus Groups: Group discussions to collect qualitative data.
Example: Discussing barriers to vaccination.

9. Telephonic surveys: Data collected over phone calls.

10. Web-based surveys: Online questionnaires.
Example: Google Forms survey.

11. Case Studies: In-depth study of a single case or subject.
Example: Rare disease patient history.

12. Field experiments: Conducting studies in real-world settings.

13. Longitudinal studies: Collecting data over a long period.

14. Cross-sectional studies: Collecting data at a single point in time.

15. Secondary data collection: Using existing published data.
Example: WHO health reports.

2.a) What is biostatistics?


Answer:
Biostatistics:
i. Biostatistics is the branch of statistics that applies statistical methods to biological,
medical, and public health problems.
ii. "Biostatistics involves the application of statistical techniques to analyze and interpret
data in the fields of medicine, biology, and health sciences."
iii. Example:
Analyzing whether a new drug lowers blood pressure better than existing
drugs using statistical tests.

2.b) Briefly describe inferential statistics.

Inferential statistics:
i. Inferential statistics is the part of statistics that allows us to make predictions,
generalizations, or decisions about a population based on a sample.
ii. "Inferential statistics involves using sample data to make inferences about a larger
population."
iii. Example:
Testing whether a vaccine is effective by analyzing a sample of 1000 people and
applying results to the general population.

2.c) Explain the different uses of biostatistics in medical science.


Answer: Here are uses of biostatistics in medical science:
1) Clinical trials analysis.
2) Drug efficacy evaluation.
3) Epidemiological studies.
4) Public health policy planning.
5) Disease surveillance.
6) Identifying risk factors.
7) Hospital management.
8) Quality assurance in healthcare.
9) Genetics and genomics analysis.
10) Diagnostic test evaluation.
11) Predicting disease outbreaks.
12) Health economics studies.
13) Bioinformatics.
14) Medical research publications.
15) Determining sample sizes.

3.a) What is a variable?


Variable:
i. A variable is any characteristic or attribute that can take different values among
subjects in a study.
ii. "A variable is a measurable trait or characteristic that can vary from one individual to
another."
iii. Example:
Height, weight, blood pressure.

3.b) Write the characteristics of variables.


Characteristics of variables :
1) Vary among subjects.
2) Can be measured.
3) Classified as qualitative or quantitative.
4) Used for comparison.
5) Central to research.
6) Determines data type.
7) Can be dependent or independent.
8) May be continuous or discrete.
9) Subject to manipulation in experiments.
10) Basis for statistical analysis.
11) Can be observable.
12) Affects study outcomes.
13) Helps in hypothesis testing.
14) Recorded systematically.
15) Used to establish relationships.

3.c) Differentiate between quantitative and qualitative variables.

Point | Quantitative Variable | Qualitative Variable
1. Definition | Measurable numerical values | Descriptive categories
2. Example | Height, weight, BP reading | Gender, blood type, disease present/absent
3. Types | Discrete, continuous | Nominal, ordinal
4. Measurement | Numbers | Labels
5. Analysis | Mean, median | Frequencies
6. Graph | Histogram | Bar chart
7. Mathematical operations | Possible | Not possible
8. Scale | Ratio, interval | Nominal, ordinal
9. Use | Exact measurement | Classification
10. Units | Present | Absent
11. Variation | Measured | Counted
12. Statistical test | t-test | Chi-square
13. Level | Continuous scale | Categorical scale
14. Coding | Not necessary | Sometimes needed

4.a) What are sample and population?

Answer:
Population: The entire group of individuals or items under study.
Example: All diabetic patients in India.

Sample: A subset of the population selected for study.
Example: 500 diabetic patients selected from Delhi hospitals.

4.b) Compare stratified and cluster sampling.


Answer:
Point | Stratified Sampling | Cluster Sampling
1. Definition | Dividing population into strata | Dividing population into clusters
2. Selection | Units drawn randomly from each stratum | Whole clusters selected randomly
3. Purpose | Ensure representation | Cost-effective sampling
4. Example | Urban and rural groups | Villages as clusters
5. Homogeneity | Within strata | Between clusters
6. Heterogeneity | Between strata | Within clusters
7. Accuracy | High | Moderate
8. Cost | Higher | Lower
9. Complexity | More complex | Simpler
10. Bias | Less | More
11. Use case | Health survey | Immunization survey
12. Representation | Good | Limited
13. Statistical power | High | Moderate
14. Time | More time-consuming | Faster
15. Analysis | Stratification needed | Adjust for cluster effect
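
The selection rule in point 2 can be sketched in Python. This is only an illustration under stated assumptions: the population of 600 people, the `area` and `village` fields, and the helper names `stratified_sample` and `cluster_sample` are all hypothetical and do not come from the notes.

```python
import random

rng = random.Random(7)  # fixed seed so the illustration is reproducible

# Hypothetical population: 600 people, tagged with an area (stratum)
# and a village (cluster). All values are made up for illustration.
population = [
    {"id": i, "area": "urban" if i < 200 else "rural", "village": f"village_{i % 6}"}
    for i in range(600)
]

def stratified_sample(units, key, fraction):
    """Draw the same fraction at random from every stratum."""
    strata = {}
    for unit in units:
        strata.setdefault(unit[key], []).append(unit)
    chosen = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        chosen.extend(rng.sample(members, k))
    return chosen

def cluster_sample(units, key, num_clusters):
    """Pick whole clusters at random and keep every unit inside them."""
    clusters = sorted({unit[key] for unit in units})
    picked = set(rng.sample(clusters, num_clusters))
    return [unit for unit in units if unit[key] in picked]

strat = stratified_sample(population, "area", 0.10)
clust = cluster_sample(population, "village", 2)
print(len(strat), "units from strata:", sorted({u["area"] for u in strat}))
print(len(clust), "units from clusters:", sorted({u["village"] for u in clust}))
```

Every stratum appears in the stratified sample, while the cluster sample contains only the selected villages, which is why cluster sampling is cheaper but less representative.
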

4.c) Write the usefulness of simple random sampling.
Answer:
Simple random sampling is useful for the following reasons:
1) Every unit has an equal chance of selection.
2) Minimizes selection bias.
3) Simple to understand.
4) Easy to implement with random tables.
5) Statistical analysis is straightforward.
6) Results are generalizable.
7) Unbiased estimator.
8) Basis for inferential statistics.
9) Suitable for large populations.
10) Applicable in clinical trials.
11) Randomization ensures fairness.
12) Helps in accurate confidence intervals.
13) Useful for pilot studies.
14) Allows for probability theory application.
15) Less risk of systematic errors.
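
A minimal sketch of simple random sampling, assuming a hypothetical sampling frame of 1000 patient IDs (the IDs, the seed, and the sample size of 50 are illustrative choices, not part of the notes). `random.sample` draws without replacement, so every unit has the same chance of selection.

```python
import random

rng = random.Random(2022)  # fixed seed so the draw can be repeated

# Hypothetical sampling frame: a complete list of the population units.
population = [f"patient_{i:04d}" for i in range(1, 1001)]

# Draw 50 units without replacement; each unit is equally likely to be chosen.
sample = rng.sample(population, k=50)

print(len(sample), "units selected, for example:", sample[:3])
```
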

5.a) What is standard normal distribution?


i. The standard normal distribution is a normal distribution with a mean of 0 and
standard deviation of 1.
ii. "A standard normal distribution is a symmetric bell-shaped distribution where data is
standardized into Z-scores."
iii. Formula for the Z-score:

Z = (X - mean) / standard deviation
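
The Z-score formula can be checked with a short Python sketch. The five weights below are hypothetical, and the population standard deviation (`statistics.pstdev`) is used as an assumption; with sample data one might use `statistics.stdev` instead.

```python
import statistics

weights_kg = [56, 62, 59, 65, 58]  # hypothetical patient weights

mean = statistics.fmean(weights_kg)   # 60.0
sd = statistics.pstdev(weights_kg)    # population standard deviation

# Z = (X - mean) / standard deviation, applied to every value
z_scores = [(x - mean) / sd for x in weights_kg]

for x, z in zip(weights_kg, z_scores):
    print(f"{x} kg -> Z = {z:+.2f}")
```

After standardization the Z-scores have mean 0 and standard deviation 1, which matches the definition of the standard normal scale.
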

5.b) Write the characteristics of normal distribution.

The characteristics of normal distribution.


1) Bell-shaped curve.
2) Symmetrical around the mean.
3) Mean = median = mode.
4) Total area under the curve = 1.
5) About 68% of data lie within ±1 SD.
6) About 95% lie within ±2 SD.
7) About 99.7% lie within ±3 SD.
8) Tails extend infinitely.
9) Unimodal distribution.
10) Skewness is zero.
11) Kurtosis is 3 (mesokurtic).
12) Applicable to much biological data.
13) Basis for many tests.
14) Z-scores standardize data.
15) Means of sufficiently large samples are approximately normally distributed.
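
The 68%, 95%, and 99.7% figures in points 5 to 7 can be verified with Python's standard library. This is a small check, not part of the original notes; `share_within` is a hypothetical helper name.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)  # standard normal distribution

def share_within(k):
    """Proportion of the distribution lying within ±k standard deviations."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within ±{k} SD: {share_within(k):.4f}")
```
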

5.c) Differentiate between histogram and bar chart.

Point | Histogram | Bar Chart
1. Data type | Quantitative | Qualitative
2. Bars | Touch each other | Separate
3. X-axis | Continuous | Categorical
4. Use | Frequency distribution | Compare categories
5. Example | Blood pressure data | Gender count
6. Shape | Adjacent bars over a continuous scale | Discrete bars
7. Scale | Numerical | Nominal/ordinal
8. Axis | Both axes numerical | Only Y is numerical
9. Grouping | Data grouped in intervals | Data grouped by categories
10. Purpose | Show distribution | Show comparison
11. Gap | No gap | Gap present
12. Statistical use | Show normality | Show frequencies
13. Data interval | Needed | Not needed
14. Frequency | Shown by bar area | Shown by bar height
15. Application | Medical data | Survey results

6.a) What is a measure of central tendency?


Measure of central tendency:
i. "A measure of central tendency is a statistical measure that identifies the center or
typical value in a data set."
ii. Types:
• Mean (average)
• Median (middle value)
• Mode (most frequent value)
Example:
Heights: 150, 160, 170 cm
Mean = 160 cm
Median = 160 cm
Mode = none, because no value is repeated.
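
The three measures can be computed with Python's `statistics` module. The heights below are hypothetical and include a repeated value so that a mode exists; `statistics.multimode` (Python 3.8+) returns every most-frequent value as a list.

```python
import statistics

heights_cm = [150, 160, 160, 170, 180]  # hypothetical data with one repeated value

mean_height = statistics.mean(heights_cm)      # arithmetic average
median_height = statistics.median(heights_cm)  # middle value of the sorted data
modes = statistics.multimode(heights_cm)       # most frequent value(s)

print("Mean:", mean_height)      # 164
print("Median:", median_height)  # 160
print("Mode(s):", modes)         # [160]
```
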

6.b) Write a situation where you would prefer median rather than mode.
Answer:

Situation:
When data is skewed or has extreme outliers.
Example:
Income data in a population, where a few people earn extremely high salaries: the median gives
a better central value than the mode.
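
A rough illustration of this situation, using made-up incomes (in thousands) with one extreme value. When no income repeats, every value ties as a mode, so the mode says nothing about the center, while the median stays close to the typical income.

```python
import statistics

# Hypothetical monthly incomes in thousands; 900 is an extreme outlier.
incomes = [20, 22, 25, 27, 30, 31, 35, 40, 900]

median_income = statistics.median(incomes)  # robust to the outlier
mean_income = statistics.fmean(incomes)     # pulled upward by the outlier
modes = statistics.multimode(incomes)       # every value ties when none repeats

print("Median:", median_income)
print("Mean:", round(mean_income, 1))
print("Number of tied modes:", len(modes))
```
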

6.c) Write the strengths and weaknesses of mode and median.
Answer:

Strengths:

Mode | Median
Easy to understand | Not affected by outliers
Can be used for categorical data | Useful for skewed data
Identifies the most common item | Divides data into two halves
Example: mode of disease occurrence | Example: median survival time

Weaknesses:

Mode | Median
May not be unique | Ignores the size of extreme values
Not suited to further mathematical treatment | Not suited to further mathematical treatment
Unstable in small samples | Requires sorting, which is tedious for large data sets
Multiple modes can confuse analysis | Does not show variability

7.a) What is alternate hypothesis?


Answer:
i. Alternate hypothesis:
"An alternate hypothesis (denoted as H₁ or Ha) is a statement in hypothesis testing that
suggests a statistical difference or effect exists between groups or variables."
ii. In simpler terms, it is the claim the researcher seeks evidence for.
iii. It is the opposite of the null hypothesis (H₀), which states no effect or difference.
iv. Example:
H₀: Drug A has no effect on blood pressure.
H₁: Drug A lowers blood pressure.

7.b) What are the steps in hypothesis testing?

Steps in Hypothesis Testing:

1) State the hypotheses (null and alternative).
2) Set the significance level (α), commonly 0.05.
3) Choose the appropriate test (e.g., t-test, z-test).
4) Calculate the test statistic.
5) Find the p-value or compare the test statistic with the critical value.
6) Make a decision:
If p-value < α → Reject H₀
If p-value ≥ α → Fail to reject H₀
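
The steps above can be sketched as a two-sided one-sample z-test in Python. This is an illustration under assumptions: the population SD is treated as known, and the numbers (null mean 120 mmHg, sample mean 124 mmHg, SD 10, n = 25) and the function name `one_sample_z_test` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(sample_mean, null_mean, sigma, n, alpha=0.05):
    """Two-sided z-test for a mean, assuming the population SD (sigma) is known."""
    # Step 4: test statistic
    z = (sample_mean - null_mean) / (sigma / sqrt(n))
    # Step 5: two-sided p-value from the standard normal distribution
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Step 6: decision rule
    decision = "Reject H0" if p_value < alpha else "Fail to reject H0"
    return z, p_value, decision

# Steps 1-3: H0: mean systolic BP = 120 mmHg; alpha = 0.05; z-test
z, p_value, decision = one_sample_z_test(sample_mean=124, null_mean=120, sigma=10, n=25)
print(f"z = {z:.2f}, p = {p_value:.4f}, decision: {decision}")
```

With these made-up numbers z is 2.0 and the p-value is a little under 0.05, so H₀ is rejected at the 5% level. A one-sided alternative such as "Drug A lowers blood pressure" would use only one tail of the distribution.
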

7.c) Write the purpose of hypothesis testing in scientific research.
Answer:
1) Validate research assumptions.
2) Test effectiveness of treatments.
3) Make scientific decisions.
4) Generalize from sample to population.
5) Quantify uncertainty.
6) Control Type I and Type II errors.
7) Guide public health policy.
8) Improve clinical practices.
9) Provide evidence-based conclusions.
10) Support or reject claims.
11) Assess relationship between variables.
12) Help assess cause and effect (together with sound study design).
13) Aid in funding approval.
14) Benchmark against existing standards.
15) Enhance reproducibility in science.

8.a) What is graphical presentation of data?


"Graphical presentation is the visual display of data using charts, plots, and
diagrams to facilitate understanding and comparison."
Example:
Bar charts showing disease prevalence by gender.

8.b) Methods of presenting qualitative and quantitative data.


Answer:
Qualitative Data | Quantitative Data
1. Bar chart | Histogram
2. Pie chart | Box plot
3. Frequency table | Scatter plot
4. Pictogram | Line graph
5. Word cloud (modern) | Stem-and-leaf plot
6. Cross-tabulation table | Frequency polygon
7. Stacked bar chart | Dot plot
8. Pareto chart | Cumulative frequency curve

8.c) Discuss graphical methods of data display. Draw stem-and-leaf diagram.


Answer:

Graphical methods include:


i. Bar chart (for categories)
ii. Histogram (for intervals)
iii. Pie chart (proportion display)
iv. Line graph (trend over time)
v. Scatter plot (relationship)
vi. Box plot (spread and outliers)
vii. Frequency polygon (distribution)
viii. Stem-and-leaf plot (raw data display)

Stem-and-Leaf Diagram
Data: 6, 24, 7, 9, 15, 10, 31, 6, 23, 7, 13, 11, 4
Ordered Data: 4, 6, 6, 7, 7, 9, 10, 11, 13, 15, 23, 24, 31
Stem | Leaf
0 | 4 6 6 7 7 9
1 | 0 1 3 5
2 | 3 4
3 | 1

Stem = tens place; Leaf = units place.
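
The same diagram can be produced with a short Python sketch. It assumes non-negative whole numbers and uses the tens digit as the stem, as above; `stem_and_leaf` is a hypothetical helper name, and stems with no data are simply skipped.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return (stem, leaves) rows, with stem = tens place and leaf = units place."""
    groups = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        groups[stem].append(leaf)
    return [(stem, groups[stem]) for stem in sorted(groups)]

data = [6, 24, 7, 9, 15, 10, 31, 6, 23, 7, 13, 11, 4]
rows = stem_and_leaf(data)

print("Stem | Leaf")
for stem, leaves in rows:
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```
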

9.a) What is estimation and parameter?


Answer:
Estimation: Process of inferring population parameters from sample statistics.
Parameter: A numerical value that describes a characteristic of the population.
Example:
Sample mean is used to estimate population mean (parameter).

9.b) Briefly describe confidence interval.


Confidence interval (CI):
"A confidence interval (CI) is a range of values derived from sample data, within which
we are fairly certain (usually 95%) the true population parameter lies."
Example:
If the sample mean BP = 120 mmHg with a 95% CI of 115-125 mmHg, we are 95% confident that
the true population mean is between 115 and 125 mmHg.
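
A minimal sketch of how such an interval might be computed, assuming a large sample so that the normal (z) approximation applies. The summary values (mean 120 mmHg, SD 15 mmHg, n = 36) and the function name `mean_confidence_interval` are hypothetical; for small samples a t-based interval would be more appropriate.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(sample_mean, sample_sd, n, confidence=0.95):
    """Normal-approximation CI for a mean: mean ± z × SD / √n."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * sample_sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin

lower, upper = mean_confidence_interval(sample_mean=120, sample_sd=15, n=36)
print(f"95% CI: {lower:.1f} to {upper:.1f} mmHg")
```

With these made-up values the interval is roughly 115.1 to 124.9 mmHg, close to the 115-125 mmHg range in the example.
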

9.c) Interpret p-value as per level of significance.


Answer:
The p-value is the probability of obtaining results at least as extreme as those observed,
assuming the null hypothesis (H₀) is true.
Level of significance (α): Usually 0.05.

P-value | Interpretation
P < 0.05 | Statistically significant (reject H₀)
P ≥ 0.05 | Not significant (fail to reject H₀)

Example:
P = 0.03 → significant → evidence of an effect.
P = 0.08 → not significant → insufficient evidence of an effect.

10. Write Short Notes


a) Correlation (Short Note)

1. Definition:
"Correlation is a statistical technique that measures the degree and direction of
relationship between two variables."

2. Types:
i. Positive correlation: Both increase (e.g., height and weight).
ii. Negative correlation: One increases, the other decreases (e.g., exercise and weight).
iii. Zero correlation: No relationship.

3. Coefficient (r):
Ranges from -1 to +1.
• +1 → perfect positive
• -1 → perfect negative
• 0 → no correlation

4. Methods:
• Pearson’s correlation (for continuous data)
• Spearman’s rank correlation (for ordinal data)

5. Uses in Medical Science:
i. Correlation between smoking and lung cancer.
ii. Correlation between BMI and blood pressure.

6. Graphical display:
Scatter plot.

7. Importance:
• Helps in identifying related variables.
• Basis for further regression analysis.

8. Example:
Study finds r = 0.8 between cholesterol and heart disease risk → strong positive
correlation.
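
Pearson's coefficient r can be computed directly from its definition. The sketch below uses made-up BMI and systolic blood pressure values; `pearson_r` is a hypothetical helper name, and the data are for illustration only.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation: covariance of x and y divided by the product of their spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

bmi = [20, 22, 25, 27, 30, 33]                # hypothetical values
systolic_bp = [110, 115, 120, 126, 135, 140]  # hypothetical values

r = pearson_r(bmi, systolic_bp)
print(f"r = {r:.3f}")  # close to +1: strong positive correlation
```
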

b) Linear Regression (Short Note)


1. Definition:
"Linear regression is a statistical method that models the relationship between a
dependent variable and one or more independent variables by fitting a linear equation."

2. Equation:
Y = a + bX
Where
Y = dependent variable,
X = independent variable,
a = intercept,
b = slope.

3. Types:
i. Simple linear regression (one X variable)
ii. Multiple linear regression (multiple X variables)

4. Assumptions:
• Linear relationship
• Independence
• Homoscedasticity
• Normal distribution of residuals

5. Uses:
• Predicting outcomes
• Identifying risk factors
• Controlling confounding variables

6. Medical Example:
Regression to predict blood glucose based on age, weight, and physical activity.

7. Example:
Predicting blood pressure (Y) based on weight (X).
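
A minimal least-squares sketch for the equation Y = a + bX, using made-up weight and blood pressure values that happen to lie exactly on a line (real data would scatter around it). `fit_line` is a hypothetical helper name.

```python
def fit_line(x, y):
    """Ordinary least squares for simple linear regression: returns (a, b) in Y = a + bX."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
        (xi - mean_x) ** 2 for xi in x
    )
    a = mean_y - b * mean_x  # the fitted line passes through (mean_x, mean_y)
    return a, b

weight_kg = [50, 60, 70, 80, 90]         # X, hypothetical
systolic_bp = [110, 115, 120, 125, 130]  # Y, hypothetical

a, b = fit_line(weight_kg, systolic_bp)
predicted = a + b * 75  # predicted BP for a 75 kg person
print(f"Y = {a:.1f} + {b:.2f}X; predicted BP at 75 kg = {predicted:.1f}")
```

Here the slope b is the estimated change in blood pressure per extra kilogram, and the intercept a is the fitted value at X = 0.
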

c) Central Limit Theorem (Short Note)


1. Definition:
"Central Limit Theorem (CLT) states that the sampling distribution of the sample mean
approaches a normal distribution as the sample size becomes large, regardless of the shape
of the population distribution."

2. Key Points:
• Rule of thumb: works well when sample size n > 30.
• Mean of sampling distribution = population mean (μ).
• Standard deviation of sampling distribution (standard error) = population SD / √n.

3. Importance:
• Basis for inferential statistics.
• Allows use of Z-tests and t-tests.

4. Applications:
• Estimating confidence intervals.
• Hypothesis testing.

5. Medical Relevance:
• Allows generalization from sample clinical trials to entire patient populations.

6. Example:
Measuring average height in each of 100 samples: even if the height distribution is skewed,
the sample means will be approximately normally distributed when each sample is large enough.
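
The key points can be checked with a small simulation. This sketch assumes a skewed exponential population with mean 1 and SD 1; the sample size of 50, the 2000 repetitions, and the seed are arbitrary illustrative choices.

```python
import random
import statistics

random.seed(42)  # fixed seed so the simulation is reproducible

POP_MEAN = 1.0   # exponential distribution with rate 1: mean 1, SD 1 (skewed)
POP_SD = 1.0
n = 50           # size of each sample
num_samples = 2000

# Draw many samples from the skewed population and record each sample mean.
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
expected_se = POP_SD / n ** 0.5  # population SD / sqrt(n)

# For a normal distribution, about 95% of values fall within 1.96 SE of the mean.
share_within = sum(
    abs(m - POP_MEAN) <= 1.96 * expected_se for m in sample_means
) / num_samples

print(f"mean of sample means: {mean_of_means:.3f} (population mean {POP_MEAN})")
print(f"SD of sample means:   {sd_of_means:.3f} (expected {expected_se:.3f})")
print(f"share within 1.96 SE: {share_within:.3f}")
```

Even though the population is strongly skewed, the sample means cluster around the population mean with spread close to population SD / √n.
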
