Biostatistics and Research Methodology
Type-I Error (α): Occurs when a true null hypothesis is rejected. It is also known as a
false positive.
Type-II Error (β): Occurs when a false null hypothesis is not rejected (i.e., it is wrongly
retained). It is known as a false negative.
Definition: It measures the average deviation of individual values from the mean in a
dataset.
Formula:
4. In a distribution, the Mode and Mean are 25 and 28 respectively. Find the
value of Median.
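No worked answer is given for this question in the notes. One standard approach, assuming the distribution is moderately skewed so that Karl Pearson's empirical relation between mean, median, and mode applies, is sketched here:

```latex
\begin{aligned}
\text{Mode} &= 3\,\text{Median} - 2\,\text{Mean} \\
25 &= 3\,\text{Median} - 2(28) \\
\text{Median} &= \frac{25 + 56}{3} = 27
\end{aligned}
```

The relation is only an approximation, so the result should be read as an estimate of the median rather than an exact value.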
Equally likely events are those which have the same probability of occurring.
Example: Getting any number from 1 to 6 on a fair die.
A pie chart is a circular statistical graphic divided into slices to illustrate numerical
proportions. Each slice represents a category's contribution to the total.
Where:
n = number of trials,
x = number of successes,
p = probability of success.
10. Give the mode for the following data set: 200, 205, 205, 201, 199, 195, 202,
205, 205, 207 with justification.
Mode = 205, because it appears 4 times, which is more frequent than any other
number.
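The justification above can be checked by counting how often each value occurs. This is a minimal sketch using only the Python standard library; the variable names are chosen for illustration.

```python
from collections import Counter

# Data set from the question above.
data = [200, 205, 205, 201, 199, 195, 202, 205, 205, 207]

# Count the occurrences of every value and take the most frequent one.
counts = Counter(data)
mode_value, mode_count = counts.most_common(1)[0]

print(mode_value, mode_count)  # 205 4
```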
Chi-square test
Mann-Whitney U test
Wilcoxon Signed-Rank test
Kruskal-Wallis test
LSD is a post-hoc test used in ANOVA to compare pairwise differences between means. It
determines which group means are significantly different after finding an overall significant
difference.
SPSS
R
GraphPad Prism
SAS
Microsoft Excel
18. Write about Pie Chart.
A pie chart represents categorical data in the form of slices of a circle, where each slice
indicates the proportion of each category in the total dataset. It is useful for visualizing
percentage distribution.
Importance: Plagiarism violates academic ethics and can lead to legal consequences
and loss of credibility in research.
Types:
o Direct Plagiarism
o Self-Plagiarism
o Mosaic Plagiarism
o Accidental Plagiarism
Types of ANOVA:
o One-way ANOVA
o Two-way ANOVA
o Repeated Measures ANOVA
Use: To compare means of three or more groups to determine if at least one group
is significantly different.
5-MARKS
1. Write a brief note on Student t-test.
The Student’s t-test is a parametric statistical test used to determine whether there is a
significant difference between the means of two groups. It was developed by William Sealy
Gosset, who published it under the pseudonym "Student" in 1908.
Assumptions of t-test:
The Least Significant Difference (LSD) is a statistical test used in Analysis of Variance
(ANOVA) to compare two group means when more than two groups are analyzed. It helps
determine which specific group means are significantly different from each other after
finding a significant F-ratio in ANOVA.
1. Post-Hoc Test: LSD is a post-hoc (after the fact) method used only if the overall
ANOVA indicates a statistically significant difference among group means.
2. Pairwise Comparisons: LSD compares each pair of treatment means to assess
whether the difference between them is statistically significant.
3. Assumptions: The data must meet the assumptions of ANOVA:
Normal distribution of residuals.
Homogeneity of variances.
Independent observations.
t(α/2, df_error) = critical value from the t-distribution at the chosen level of significance α, with the error degrees of freedom from ANOVA.
MSE = Mean Square Error from ANOVA.
n = number of observations per group (assuming equal group sizes).
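The formula these symbols belong to is missing from the notes. The usual form for equal group sizes is LSD = t(α/2, df_error) × √(2 × MSE / n), and a pair of means is declared different when the absolute difference between them exceeds the LSD. The sketch below applies it to hypothetical values (the means, MSE, and n are invented for illustration, and the critical t is read from a t-table rather than computed).

```python
import math
from itertools import combinations

def least_significant_difference(t_crit, mse, n):
    """LSD = t_crit * sqrt(2 * MSE / n), assuming equal group sizes n."""
    return t_crit * math.sqrt(2.0 * mse / n)

# Hypothetical illustration values (not taken from the notes).
t_crit = 2.179          # two-tailed t at alpha = 0.05 with 12 error df (t-table)
mse = 4.0               # Mean Square Error from the ANOVA table
n = 5                   # observations per group
group_means = {"A": 20.0, "B": 23.5, "C": 21.0}

lsd = least_significant_difference(t_crit, mse, n)

# Compare every pair of group means against the LSD.
significant_pairs = [
    (g1, g2)
    for g1, g2 in combinations(group_means, 2)
    if abs(group_means[g1] - group_means[g2]) > lsd
]

print(round(lsd, 3), significant_pairs)
```

With these assumed numbers only the A versus B difference (3.5) exceeds the LSD of about 2.76.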
Limitations:
Increases risk of Type I error (false positives) when many comparisons are made.
Should only be used if ANOVA F-test is significant.
3. Write a note on the different properties of a Normal Distribution Curve.
The Normal Distribution Curve, also known as the Gaussian distribution, is one of the most
important probability distributions in statistics and biostatistics. It is widely used in
pharmaceutical research, clinical trials, and population studies because many biological
variables (like blood pressure, height, weight, etc.) approximately follow a normal
distribution pattern.
1. Symmetrical Shape:
o The curve is perfectly symmetrical about the mean.
o This means that the values on the left side of the mean are a mirror image of
the values on the right side.
o Therefore, the mean = median = mode.
2. Bell-shaped Curve:
o The shape of the curve is bell-like, which means most of the data points
cluster around the central peak (mean), and the frequency decreases as we
move away from the mean.
o The curve approaches the X-axis asymptotically (i.e., it never touches the
axis).
3. Area under the Curve:
o The total area under the normal distribution curve is equal to 1 (or 100%),
representing the entire population or data set.
o This area corresponds to the probability distribution of the variable.
4. Empirical Rule (68-95-99.7 Rule):
o The spread of data in a normal distribution follows a specific pattern:
68.27% of data lies within ±1 standard deviation (σ) from the mean.
95.45% lies within ±2σ.
99.73% lies within ±3σ.
o This helps in identifying outliers and understanding data variability.
5. Tails of the Curve:
o The tails of the curve extend indefinitely in both directions and never touch
the X-axis.
o This suggests that extreme values (very high or very low) are possible but
have a very low probability.
6. Unimodal:
o The curve has a single peak, indicating that there is only one mode, or one
most frequent value.
7. Mean and Standard Deviation Define the Curve:
o The mean (μ) determines the location (center) of the curve.
o The standard deviation (σ) determines the spread (width) of the curve.
o A larger standard deviation results in a wider and flatter curve, whereas a
smaller σ gives a taller and narrower curve.
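The percentages in the empirical rule (property 4) can be verified numerically from the cumulative distribution function of the standard normal curve. A minimal sketch using Python's standard library:

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

def coverage(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within ±{k}σ: {coverage(k) * 100:.2f}%")
```

Because the curve is symmetrical, the same areas apply to any normal distribution once values are expressed as distances from the mean in units of σ.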
4. Write a note on ANOVA and write the F-ratio table for One-Way ANOVA.
Definition:
ANOVA (Analysis of Variance) is a statistical method used to compare the means of three or
more groups to determine whether there is any statistically significant difference among
them. It helps test hypotheses about population means based on sample data.
Where:
k = number of groups
N = total number of observations
SSb = sum of squares between groups
SSw = sum of squares within groups
SSt = total sum of squares
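The F-ratio table itself did not survive in these notes. In the standard one-way layout, using the symbols defined above, the rows are: Between groups (SS = SSb, df = k − 1, MS = SSb / (k − 1), F = MSb / MSw); Within groups (SS = SSw, df = N − k, MS = SSw / (N − k)); Total (SS = SSt, df = N − 1). The sketch below computes these entries for three small hypothetical groups; the data are invented for illustration.

```python
def one_way_anova(groups):
    """Return the entries of a one-way ANOVA table for a list of groups."""
    k = len(groups)                              # number of groups
    n_total = sum(len(g) for g in groups)        # total number of observations (N)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-group and within-group sums of squares.
    ss_b = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_w = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    df_b, df_w = k - 1, n_total - k
    ms_b, ms_w = ss_b / df_b, ss_w / df_w
    return {
        "SSb": ss_b, "SSw": ss_w, "SSt": ss_b + ss_w,
        "df_b": df_b, "df_w": df_w, "df_t": n_total - 1,
        "MSb": ms_b, "MSw": ms_w, "F": ms_b / ms_w,
    }

# Hypothetical data: three groups of three observations each.
table = one_way_anova([[4, 5, 6], [6, 7, 8], [8, 9, 10]])
print(table)
```

The computed F is then compared with the critical F value for (k − 1, N − k) degrees of freedom at the chosen level of significance.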
5. Write short notes on:
Example: A clinical trial where a new antibiotic is tested for its efficacy in treating bacterial
infections compared to a standard antibiotic.
1. Participant Observation: The researcher observes and may even participate in the
daily life of the group being studied.
2. In-Depth Interviews: Used to gather detailed information about participants’ beliefs,
practices, and experiences.
3. Natural Setting: Research is conducted in the real-world setting of the participants,
not in a laboratory.
4. Descriptive and Narrative: Emphasis is on providing rich, detailed descriptions rather
than statistical analysis.
Example: A researcher lives in a tribal village to study traditional healing practices and how
villagers use medicinal plants for treating diseases.
6. Write notes on Karl Pearson method of Correlation Analysis.
Correlation is a statistical technique that measures the degree and direction of a linear
relationship between two quantitative variables. When two variables are correlated, it
means that changes in one variable are associated with changes in another.
Karl Pearson's Method of Correlation: Karl Pearson's correlation coefficient (denoted as r) is
the most widely used method to measure the strength and direction of a linear relationship
between two continuous variables.
Formula:
Where:
X=[6,2,8,4,10]
Y=[7,9,5,6,8]
✅Regression of Y on X
✅Regression of X on Y
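The formulas and the worked solution for this part are missing from the notes. Assuming the X and Y values listed above are five paired observations, the sketch below computes Pearson's r and the two regression lines from deviations about the means, using r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² × Σ(y − ȳ)²), b_yx = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², and b_xy = Σ(x − x̄)(y − ȳ) / Σ(y − ȳ)². The variable names are chosen for illustration.

```python
import math

X = [6, 2, 8, 4, 10]
Y = [7, 9, 5, 6, 8]

n = len(X)
mean_x = sum(X) / n
mean_y = sum(Y) / n

# Sums of products and squares of deviations from the means.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
s_xx = sum((x - mean_x) ** 2 for x in X)
s_yy = sum((y - mean_y) ** 2 for y in Y)

# Karl Pearson's correlation coefficient.
r = s_xy / math.sqrt(s_xx * s_yy)

# Regression of Y on X: Y = a_yx + b_yx * X
b_yx = s_xy / s_xx
a_yx = mean_y - b_yx * mean_x

# Regression of X on Y: X = a_xy + b_xy * Y
b_xy = s_xy / s_yy
a_xy = mean_x - b_xy * mean_y

print(f"r = {r:.2f}")
print(f"Y on X: Y = {a_yx:.2f} + ({b_yx:.2f}) X")
print(f"X on Y: X = {a_xy:.2f} + ({b_xy:.2f}) Y")
```

For these data the means are 6 and 7, giving r = −0.3, Y = 7.9 − 0.15X and X = 10.2 − 0.6Y. As a check, the product of the two regression coefficients (0.09) equals r².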
8. Write a note on the advantages of 2² factorial design.
A 2² factorial design is a type of experimental design used in research where two factors
(independent variables) are studied, and each factor has two levels (usually "high" and
"low" or "present" and "absent"). In the notation 2², the base is the number of levels and
the exponent is the number of factors, so the design has 2 × 2 = 4 experimental conditions.
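The four experimental conditions can be listed by taking every combination of the two levels of each factor. A minimal sketch, with hypothetical factor names (polymer concentration and stirring speed are invented examples, not from the notes):

```python
from itertools import product

# Two hypothetical factors, each at two levels.
factors = {
    "polymer concentration": ("low", "high"),
    "stirring speed": ("low", "high"),
}

# Every combination of levels is one experimental run.
runs = list(product(*factors.values()))

for run_number, levels in enumerate(runs, start=1):
    settings = ", ".join(f"{name} = {level}" for name, level in zip(factors, levels))
    print(f"Run {run_number}: {settings}")
```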
In Biostatistics, the term population refers to the entire group of individuals, items, or
events having one or more common characteristics, from which data can be collected for
analysis.
1. It is the complete set of observations or outcomes that a researcher is interested in
studying.
2. For example, all patients suffering from diabetes in India form a population if a study is
being conducted on diabetes control.
There are two types of population:
1. Target Population – The entire group about which information is desired.
2. Accessible Population – The portion of the target population that is available for study.
Correlation is a statistical tool used to measure and describe the strength and direction of
a relationship between two quantitative variables. It does not imply causation, but simply
whether and how strongly pairs of variables are related.
1. Scatter Diagram:
o A graphical method.
o Points are plotted on a graph; their pattern indicates the type and strength of
correlation.
2. Karl Pearson’s Method (Pearson’s r):
o Measures linear correlation.
o Assumes data is normally distributed.
3. Spearman’s Rank Correlation:
o A non-parametric method.
o Used when data is ordinal or not normally distributed.
It is applicable in experiments or studies where the outcomes can be categorized into two
mutually exclusive events (e.g., yes/no, success/failure, cured/not cured).
Where:
Mean (μ) = n × p
Variance (σ²) = n × p × q, where q = 1 − p is the probability of failure.
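The probability formula for the binomial distribution is missing from the notes. The standard form is P(X = x) = C(n, x) × p^x × q^(n − x), where q = 1 − p. The sketch below evaluates it for a hypothetical case (10 patients, each with an assumed 0.3 probability of being cured) and checks the mean and variance expressions given above.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial distribution with n trials and success probability p."""
    q = 1.0 - p
    return comb(n, x) * p ** x * q ** (n - x)

# Hypothetical example: 10 patients, probability of cure 0.3 for each.
n, p = 10, 0.3
probabilities = [binomial_pmf(x, n, p) for x in range(n + 1)]

mean = n * p                 # μ = n × p
variance = n * p * (1 - p)   # σ² = n × p × q

print(f"P(X = 3) = {binomial_pmf(3, n, p):.4f}")
print(f"mean = {mean:.1f}, variance = {variance:.1f}")
```

The probabilities over x = 0 to n add up to 1, which is a quick sanity check on any hand calculation.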
✅Applications in Pharmacy:
Standard Deviation (SD) is a measure of the spread or dispersion of a set of data from its
mean. It shows how much variation exists from the average (mean) value.
✅Formula:
✅Characteristics:
Always non-negative.
A low SD indicates that the data points are close to the mean.
A high SD indicates greater variability.
✅Applications in Pharmacy:
Standard deviation quantifies variability: A low SD indicates that data points are
close to the mean, whereas a high SD shows they are spread out over a wider range.
It is an essential tool in biostatistics to assess the reliability and consistency of
results.
Used to measure error margins, design confidence intervals, and perform
hypothesis testing.
Helps characterize the spread of approximately normally distributed biological data, like
blood pressure, cholesterol levels, or drug concentration levels in a sample.
13. Discuss testing of hypothesis.
1. Formulate Hypotheses (H₀ and H₁): Clearly define the null and alternative hypothesis
based on the research question.
2. Select the Significance Level (α):
o Common values: 0.05 (5%), 0.01 (1%).
o It represents the probability of rejecting H₀ when it is actually true (Type I error).
3. Choose the Appropriate Test: Depending on the data type and distribution, choose
the correct statistical test: t-test, z-test, chi-square test, ANOVA, etc.
4. Compute the Test Statistic:
Use the selected test formula to calculate the test statistic using the sample data.
5. Determine the Critical Value or p-value:
o Compare the test statistic with a critical value from standard tables
o Or calculate the p-value (the probability, assuming H₀ is true, of obtaining a result at
least as extreme as the one observed)
6. Make a Decision:
o If p ≤ α, reject H₀ (evidence supports H₁)
o If p > α, fail to reject H₀ (insufficient evidence to support H₁)
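The six steps above can be sketched with a two-sided one-sample z-test, which applies when the population standard deviation is known. All numbers here are hypothetical (a labelled tablet content of 500 mg, a known σ of 10 mg, and a sample of 25 tablets averaging 504 mg); they are invented for illustration.

```python
import math
from statistics import NormalDist

# Step 1: H0: mean content = 500 mg; H1: mean content != 500 mg.
mu_0 = 500.0
# Step 2: significance level.
alpha = 0.05
# Step 3: z-test chosen because sigma is assumed known (hypothetical values).
sigma = 10.0
n = 25
sample_mean = 504.0

# Step 4: test statistic.
standard_error = sigma / math.sqrt(n)
z = (sample_mean - mu_0) / standard_error

# Step 5: two-sided p-value from the standard normal distribution.
p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))

# Step 6: decision.
reject_h0 = p_value <= alpha

print(f"z = {z:.2f}, p = {p_value:.4f}, reject H0: {reject_h0}")
```

Here z = 2.00 and p is about 0.0455, which is just below α = 0.05, so H₀ is rejected. At α = 0.01 the same data would not lead to rejection, which shows why α must be fixed before the analysis.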
1. Range:
o Difference between the highest and lowest value.
o Simple but affected by extreme values.
o Formula: Range = Max − Min
2. Interquartile Range (IQR):
o Spread of the middle 50% of the data.
o Formula: IQR = Q3 − Q1
o Useful in skewed data.
3. Variance:
o Average of the squared deviations from the mean.
o Formula: σ² = Σ(x − μ)² / N for a population; s² = Σ(x − x̄)² / (n − 1) for a sample.
o Units are squared, so less intuitive.
4. Standard Deviation (SD):
o Square root of variance.
o Provides dispersion in original units.
o The most commonly used measure.
5. Coefficient of Variation (CV):
o Expresses SD as a percentage of the mean.
o Useful for comparing variability between datasets.
o Formula: CV = (SD / Mean) × 100
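The five measures above can be computed side by side for a small data set. This sketch uses hypothetical values and Python's standard library. It uses the population forms of variance and SD (dividing by N); the sample forms divide by n − 1 instead. Quartile conventions differ between textbooks and software, so the IQR from `statistics.quantiles` (default method) may not match a hand calculation exactly.

```python
import statistics

# Hypothetical data set (not from the notes).
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)                # Range = Max − Min

q1, _median, q3 = statistics.quantiles(data, n=4) # quartiles (default "exclusive" method)
iqr = q3 - q1                                     # IQR = Q3 − Q1

mean = statistics.mean(data)
variance = statistics.pvariance(data)             # average squared deviation from the mean
sd = statistics.pstdev(data)                      # square root of the variance
cv = sd / mean * 100                              # CV = (SD / Mean) × 100

sample_variance = statistics.variance(data)       # divides by n − 1 instead of N

print(data_range, iqr, variance, sd, cv)
```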
✅Importance in Pharmacy:
A factorial design is an experimental setup used in research where two or more factors
(independent variables) are studied simultaneously to observe their individual as well as
combined effects on the dependent variable. It is widely used in pharmaceutical and clinical
research to understand the interaction between different treatments or interventions.
Definition: A factorial design is a type of experimental design that includes
more than one independent variable (factor), and each factor has two or more levels. All
possible combinations of factor levels are tested.
Advantages:
Disadvantages:
Key Features:
Advantages:
Disadvantages:
Regression analysis is a powerful statistical tool used to examine the relationship between
a dependent variable (outcome) and one or more independent variables (predictors). It
helps in understanding how the value of the dependent variable changes when any one of
the independent variables is varied, while the others are held fixed.
Key Components:
Minitab is a powerful statistical software package used for data analysis, particularly in the
fields of biostatistics, pharmaceutical research, quality control, and Six Sigma projects. It
provides tools for statistical analysis, graphical representation, regression, hypothesis
testing, control charts, and more. It is widely used in academic research, pharmaceutical
industries, and healthcare settings to aid in data-driven decision-making.
Definition: The Wilcoxon Rank Sum Test is a non-parametric statistical test used to compare
two independent groups when the data is not normally distributed. It is an alternative to
the independent sample t-test.
Purpose:
1. To test whether two independent samples come from the same distribution.
2. Used when the sample sizes are small and data is ordinal or not normally distributed.
Procedure:
1. Combine the data from both groups and rank them from smallest to largest.
2. Assign ranks; if there are tied values, assign the average rank.
3. Calculate the sum of ranks for each group.
4. Use the smaller of the two rank sums as the test statistic.
5. Compare the test statistic to critical values or use the p-value for decision-making.
Interpretation:
1. If the calculated statistic (expressed as the equivalent Mann-Whitney U) is less than or
equal to the critical value from the U distribution table, the null hypothesis is rejected.
2. Null Hypothesis (H₀): The two groups have the same distribution.
3. Alternative Hypothesis (H₁): The distributions of the two groups differ.
Example Application:
Comparing the pain relief scores between two different analgesic drugs when data is non-
normally distributed.
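The ranking procedure described above can be sketched as follows. The pain relief scores are hypothetical, and the helper `average_ranks` is written for this illustration. The sketch also converts each rank sum to the equivalent Mann-Whitney U using U = R − n(n + 1) / 2, which is the form most critical-value tables use.

```python
def average_ranks(values):
    """Rank values from smallest to largest, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1          # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks

# Hypothetical pain relief scores for two analgesic drugs.
drug_a = [2, 4, 4, 5]
drug_b = [4, 6, 7, 8, 9]

ranks = average_ranks(drug_a + drug_b)       # rank the combined data
rank_sum_a = sum(ranks[:len(drug_a)])
rank_sum_b = sum(ranks[len(drug_a):])

# Equivalent Mann-Whitney U statistics.
u_a = rank_sum_a - len(drug_a) * (len(drug_a) + 1) / 2
u_b = rank_sum_b - len(drug_b) * (len(drug_b) + 1) / 2
u = min(u_a, u_b)

print(rank_sum_a, rank_sum_b, u)
```

The three scores of 4 share ranks 2, 3, and 4, so each receives rank 3. The resulting U is then compared with the tabulated critical value for the two sample sizes.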
Q.1 (B) 2. Kruskal-Wallis Test:
Procedure:
1. Combine all group data and rank them together.
2. Assign average ranks to tied observations.
3. Calculate the sum of ranks for each group.
4. Compute the Kruskal-Wallis statistic using the formula.
5. Compare with chi-square distribution to determine significance.
Formula:
Where:
Interpretation:
Post-hoc Test:
If the Kruskal-Wallis test is significant, pairwise comparisons using Dunn’s test or similar
post-hoc tests are performed to identify which groups differ.
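The formula for the Kruskal-Wallis statistic is missing from these notes. The standard form, without a correction for ties, is H = [12 / (N(N + 1))] × Σ(Rᵢ² / nᵢ) − 3(N + 1), where Rᵢ is the rank sum and nᵢ the size of group i, and N is the total number of observations. The sketch below applies it to hypothetical data for three groups; the helper names and values are invented for illustration.

```python
def average_ranks(values):
    """Rank values from smallest to largest, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1          # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction applied)."""
    combined = [x for g in groups for x in g]
    ranks = average_ranks(combined)
    n_total = len(combined)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n_total * (n_total + 1)) * total - 3.0 * (n_total + 1)

# Hypothetical measurements for three treatment groups.
groups = [[12, 15, 14], [18, 20, 19], [25, 23, 27]]
h = kruskal_wallis_h(groups)

chi_square_critical = 5.991   # chi-square table value for df = k − 1 = 2 at alpha = 0.05
print(round(h, 2), h > chi_square_critical)
```

Here the rank sums are 6, 15, and 24, giving H = 7.2, which exceeds 5.991. With groups this small the chi-square comparison is only approximate, so exact tables are preferable in practice.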
Example Application:
Introduction:
1. Advancement of Knowledge:
o Research helps in discovering new facts, updating existing theories, and filling
knowledge gaps.
o It allows scientists to explore unknown phenomena and develop better
understanding in areas such as pharmacokinetics, drug delivery, and
therapeutics.
2. Development of New Drugs and Therapies:
o Essential for discovering novel drug molecules, combination therapies, and
advanced formulations.
o Ensures safety and efficacy through pre-clinical and clinical trials.
3. Evidence-Based Practice:
o Pharmacists and healthcare professionals rely on research to implement
treatments that are proven to be effective.
o It ensures rational use of medicines and improves patient outcomes.
4. Problem Solving:
o Research identifies and solves problems in drug manufacturing, storage,
administration, and side effects.
o It helps in overcoming drug resistance and finding cost-effective alternatives.
5. Policy Formulation:
o Governments and healthcare bodies use research data to draft health
policies, regulations, and guidelines.
o Examples include vaccine rollouts, drug scheduling, and pricing policies.
6. Improving Quality of Life:
o Research contributes to innovations like controlled release systems, targeted
delivery, personalized medicine, etc., which directly enhance patient comfort
and recovery.
7. Academic and Industrial Growth:
o Promotes scientific temper among students and professionals.
o Drives pharmaceutical industry growth through innovation and patent
generation.
Conclusion:
Introduction:
Plagiarism is the act of using someone else’s work, ideas, or expressions without giving
proper credit, presenting them as one’s own. It is considered a serious ethical and academic
offense in research and education.
Types of Plagiarism:
1. Direct Plagiarism:
o Copying text verbatim from a source without citation.
2. Self-Plagiarism:
o Reusing one’s previously published work without acknowledgment.
3. Mosaic Plagiarism:
o Mixing copied phrases with original work without proper citation.
4. Accidental Plagiarism:
o Forgetting to cite sources or using incorrect citation due to lack of
knowledge.
Consequences of Plagiarism:
1. Academic Penalties:
o Students may face disqualification, failed grades, or expulsion.
2. Legal Repercussions:
o Copyright infringement can lead to lawsuits and monetary fines.
3. Loss of Credibility:
o Researchers found guilty of plagiarism lose trust, reputation, and professional
standing.
4. Retraction of Published Work:
o Journals may retract plagiarized papers, damaging both the author's and
institution's reputation.
Preventive Measures:
1. Proper Citation:
o Use standard referencing styles like APA, MLA, Vancouver, etc.
2. Use of Plagiarism Checkers:
o Software like Turnitin, Grammarly, or iThenticate helps detect unoriginal
content.
3. Paraphrasing with Understanding:
o Understand the source material and write in your own words.
4. Awareness and Training:
o Educating students and researchers about ethics in writing and publication.
Q3. Write Short Notes on: a) Histogram b) Cubic Graph c) Contour Plot Graph
a) Histogram
Definition:
A histogram is a graphical representation of the distribution of a dataset. It is a type of bar
chart that shows the frequency of data points within specified intervals, called bins.
Characteristics:
Purpose:
Example:
A histogram showing the number of students scoring within various score ranges in a test
(e.g., 0–10, 11–20, etc.)
Applications in Pharmacy:
b) Cubic Graph
Definition:
A cubic graph refers to a graph that represents a cubic function, generally in the form:
y = ax³ + bx² + cx + d (with a ≠ 0)
Characteristics:
Purpose:
c) Contour Plot Graph
Definition: A contour plot (sometimes written as "counter plot") is a two-dimensional graph
that shows three-dimensional data using contour lines. Each contour line connects points of
equal value.
Applications in Pharmacy:
Example:
A contour plot showing the effect of pH and temperature on the solubility of a drug.
Q.4 To study the performance of three different formulations of a drug at three different
water temperatures, the following dissolution readings were obtained with the equipment:
Water Temperature   Formulation A   Formulation B   Formulation C
Cold Water          47              45              50
Warm Water          39              42              52
Hot Water           44              36              48
Perform a two-way ANOVA at the 5% level of significance (Given: F0.05 (2,4) = 6.94).
Ans- 1. Introduction:
Two-Way ANOVA (Analysis of Variance) is used to determine whether each of two
independent variables (factors) has a statistically significant effect on the mean of a
dependent variable.
In this question:
We aim to analyze whether the formulation type and water temperature have a significant
effect on the drug’s dissolution.
2. Data Table:
SSE = TSS − SSR − SSC = 213.56 − 33.56 − 130.89 = 49.11
Total df = N - 1 = 9 - 1 = 8
Row (Water Temp) df = r - 1 = 3 - 1 = 2
Column (Formulations) df = c - 1 = 3 - 1 = 2
Error df = (r - 1)(c - 1) = 2 × 2 = 4
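5. Mean Squares (MS):
MSR (Water Temp) = SSR / df = 33.56 / 2 = 16.78
MSC (Formulations) = SSC / df = 130.89 / 2 = 65.44
MSE (Error) = SSE / df = 49.11 / 4 = 12.28
6. F-Ratios:
F (Water Temp) = MSR / MSE = 16.78 / 12.28 = 1.37
F (Formulations) = MSC / MSE = 65.44 / 12.28 = 5.33
Given:
F0.05 (2,4) = 6.94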
8. Conclusion:
At 5% level of significance, neither the water temperature nor the formulation has a
statistically significant effect on the dissolution rate of the drug.
Both F-values are less than the critical value of 6.94. Thus, we fail to reject the null
hypothesis for both factors.
Source              SS       df   MS      F
Water Temperature   33.56    2    16.78   1.37
Formulations        130.89   2    65.44   5.33
Error               49.11    4    12.28
Total               213.56   8
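The whole calculation can be reproduced from the readings in the question using the correction-factor method (CF = G² / N, where G is the grand total). This is a minimal sketch for checking the arithmetic; the variable names are chosen for illustration, and the critical value is the one given in the question.

```python
# Dissolution readings: rows are water temperatures, columns are formulations A, B, C.
readings = {
    "Cold Water": [47, 45, 50],
    "Warm Water": [39, 42, 52],
    "Hot Water": [44, 36, 48],
}
rows = list(readings.values())
r, c = len(rows), len(rows[0])
n_total = r * c

grand_total = sum(sum(row) for row in rows)
correction_factor = grand_total ** 2 / n_total

# Sums of squares.
tss = sum(x * x for row in rows for x in row) - correction_factor
ssr = sum(sum(row) ** 2 for row in rows) / c - correction_factor          # water temperature
column_totals = [sum(row[j] for row in rows) for j in range(c)]
ssc = sum(t ** 2 for t in column_totals) / r - correction_factor          # formulations
sse = tss - ssr - ssc

# Mean squares and F-ratios.
msr = ssr / (r - 1)
msc = ssc / (c - 1)
mse = sse / ((r - 1) * (c - 1))
f_rows = msr / mse
f_columns = msc / mse

f_critical = 6.94   # F0.05 (2,4), given in the question

print(f"TSS={tss:.2f} SSR={ssr:.2f} SSC={ssc:.2f} SSE={sse:.2f}")
print(f"F(temperature)={f_rows:.2f} F(formulation)={f_columns:.2f}")
print("significant:", f_rows > f_critical, f_columns > f_critical)
```

The sums of squares agree with the values used above (213.56, 33.56, 130.89, 49.11), and both F-ratios fall below 6.94.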
Q5. Explain in detail about measures of central tendency.
Measures of central tendency are statistical tools used to describe the center point or
typical value within a dataset. These measures provide a single value that attempts to
represent an entire distribution of data. The three most commonly used measures of central
tendency are the mean, median, and mode. Each of these provides a different type of
information and is used based on the nature of the data and the purpose of the analysis.
The mean, often referred to as the arithmetic average, is calculated by summing all the
values in a dataset and then dividing the total by the number of values. It is the most
commonly used measure and is appropriate when the data is normally distributed.
However, the mean is highly sensitive to outliers or extreme values, which can distort the
true representation of the dataset. For instance, in pharmaceutical studies where one
patient may have an abnormally high response to a drug, the mean may not reflect the
response of the majority.
The median is the middle value of a dataset when it is arranged in ascending or descending
order. If the number of observations is odd, the median is the central value, whereas if the
number is even, it is the average of the two middle values. The median is particularly useful
when the data is skewed or contains outliers, as it is far less affected by extreme values
than the mean. For
example, in a study of income levels or recovery times where some values may be
disproportionately high or low, the median gives a better central measure.
The mode is the value that occurs most frequently in a dataset. Unlike the mean and
median, the mode can be used for both numerical and categorical data. It is useful in
identifying the most common observation in a dataset. In pharmaceutical research, the
mode can be used to determine the most frequently observed side effect of a drug or the
most common dosage prescribed.
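The effect of an outlier on the three measures can be seen in a small example. The recovery times below are hypothetical, with one patient taking far longer than the rest.

```python
import statistics

# Hypothetical recovery times in days; 40 is an extreme value.
recovery_days = [5, 6, 6, 7, 8, 40]

mean = statistics.mean(recovery_days)       # sum of values / number of values
median = statistics.median(recovery_days)   # average of the two middle values (even n)
mode = statistics.mode(recovery_days)       # most frequent value

print(mean, median, mode)  # 12 6.5 6
```

The single extreme value pulls the mean up to 12 days, while the median (6.5) and mode (6) stay close to what most patients experienced.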
The process of sampling begins with the identification of the target population, which is the
entire group of individuals or elements about whom information is sought. A sample is then
selected from this population in such a way that it accurately reflects the characteristics of
the population. The accuracy and reliability of the study outcomes largely depend on how
well the sample represents the population.
There are two main types of sampling: probability sampling and non-probability sampling.
In probability sampling, every member of the population has a known, non-zero chance of
being selected (an equal chance in the case of simple random sampling). This category
includes techniques such as simple random sampling, stratified
sampling, systematic sampling, and cluster sampling. These methods minimize bias and
allow for generalization of results to the population. For instance, in stratified sampling, the
population is divided into subgroups (strata) such as age or gender, and samples are drawn
from each stratum, ensuring representation of all segments.
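Simple random sampling and proportional stratified sampling can be sketched as follows. The population of 90 patient records, the strata, and the 10% sampling fraction are all hypothetical, and the random seed is fixed only so the run is repeatable.

```python
import random

rng = random.Random(42)   # fixed seed for a repeatable illustration

# Hypothetical population: 90 patients, 30 female and 60 male.
population = [{"id": i, "sex": "F" if i % 3 == 0 else "M"} for i in range(1, 91)]

# Simple random sampling: every patient has the same chance of selection.
simple_sample = rng.sample(population, 9)

# Stratified sampling: divide into strata, then sample the same fraction from each.
strata = {}
for person in population:
    strata.setdefault(person["sex"], []).append(person)

sampling_fraction = 0.1
stratified_sample = []
for members in strata.values():
    stratum_size = round(len(members) * sampling_fraction)
    stratified_sample.extend(rng.sample(members, stratum_size))

females = sum(1 for person in stratified_sample if person["sex"] == "F")
males = len(stratified_sample) - females
print(len(simple_sample), females, males)
```

The stratified sample always contains 3 females and 6 males, matching the population proportions, whereas the composition of the simple random sample varies from draw to draw.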
On the other hand, non-probability sampling does not offer each member of the population
a known chance of selection. It includes methods like convenience sampling, judgmental
sampling, quota sampling, and snowball sampling. Although easier and faster to conduct,
these methods are more prone to bias and may not yield results that are generalizable to
the population.
In pharmacy, sampling is widely used in clinical trials, drug efficacy studies, quality control
processes, and surveys. By using appropriate sampling techniques, researchers can ensure
that their findings are statistically valid and reflective of the broader population.
Q7. Explain about report writing and presentation of data.
Report writing and data presentation are essential components of the research process,
enabling researchers to communicate their findings clearly and effectively to various
stakeholders such as scientists, clinicians, policymakers, and the public. A well-structured
report not only documents the methodology and outcomes of a study but also provides
insights, interpretations, and recommendations based on the data collected.
Report writing in biostatistics follows a systematic format. It typically begins with a title that
reflects the core subject of the research, followed by an abstract summarizing the
objectives, methods, results, and conclusions of the study in a concise manner. The
introduction section provides the background and significance of the study, defining the
research problem and objectives. This is followed by the materials and methods section,
which describes the study design, sampling techniques, tools used for data collection, and
statistical methods applied.
The results section presents the findings in a clear and objective manner, often using tables,
graphs, and charts for better comprehension. It avoids interpretation, which is reserved for
the discussion section, where the results are analyzed in the context of existing literature
and hypotheses. The conclusion summarizes the key findings and their implications. The
report ends with references to previous research studies and appendices that may include
raw data, questionnaires, or additional material.
In pharmaceutical and healthcare research, clear report writing and accurate data
presentation ensure that the findings are credible, reproducible, and useful in decision-
making. Poorly written reports or misrepresented data can lead to misinterpretation and
potential harm, especially in clinical settings.
Q8. Explain in detail about designing the methodology.
Designing the methodology of a research study is one of the most critical stages in the
research process. It involves the detailed planning and structuring of the entire study in a
way that ensures the validity, reliability, and reproducibility of the findings. A well-designed
methodology helps in answering the research questions accurately and systematically while
minimizing bias and errors.
The first step in methodology design is to clearly define the objectives and hypotheses of
the study. The objectives should be specific, measurable, and achievable within the scope of
the study. Once the objectives are clear, the next step is to select an appropriate study
design. Depending on the nature of the research, this may include observational designs
(such as cross-sectional, cohort, or case-control studies) or experimental designs (such as
randomized controlled trials or laboratory experiments).
The selection of the study population is another crucial element. The population should be
clearly defined, and criteria for inclusion and exclusion must be specified to ensure
consistency. The method of sampling should also be determined, with attention paid to
whether probability or non-probability sampling techniques are more appropriate. The
sample size must be calculated using statistical formulas based on the desired confidence
level, margin of error, and expected variability in the data.
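As an illustration of such a calculation, one common formula for estimating a population mean is n = (z × σ / E)², where z is the standard normal value for the chosen confidence level, σ is the expected standard deviation, and E is the margin of error. The sketch below uses hypothetical inputs; other study designs (comparing groups, estimating proportions) need different formulas.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(confidence, sigma, margin_of_error):
    """Smallest n with z * sigma / sqrt(n) <= margin_of_error (two-sided interval)."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return math.ceil((z * sigma / margin_of_error) ** 2)

# Hypothetical inputs: 95% confidence, expected SD of 10 units, margin of error of 2 units.
n_required = sample_size_for_mean(confidence=0.95, sigma=10.0, margin_of_error=2.0)
print(n_required)  # 97
```

The result is always rounded up, since rounding down would give a margin of error slightly larger than the one requested.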
Data collection methods must then be outlined. This includes deciding on the tools and
instruments to be used, such as questionnaires, laboratory tests, interviews, or electronic
data collection methods. The reliability and validity of these tools must be established to
ensure they measure what they are intended to.
Identifying and defining the variables involved in the study is equally important. The
independent variables (those manipulated or categorized), dependent variables (outcomes
measured), and any confounding variables (that might interfere with results) should be
clearly described. Controlling for confounders is essential to establish a cause-effect
relationship.
The methodology should also address ethical considerations, especially in studies involving
human subjects. This includes obtaining informed consent, maintaining confidentiality, and
seeking approval from an Institutional Ethics Committee (IEC). In clinical trials, following
Good Clinical Practice (GCP) guidelines is mandatory.
Finally, the data analysis plan must be established, detailing the statistical tests to be used
and the software that will aid in analysis. This includes specifying whether data will be
analyzed using descriptive statistics, inferential statistics, or both. A well-thought-out
timeline and budget plan should also be included in the methodology to ensure proper
resource allocation and adherence to deadlines.
In conclusion, designing the methodology is
the backbone of any scientific study. It requires careful planning and attention to detail to
ensure that the results obtained are accurate, valid, and applicable. In pharmaceutical
research, an appropriately designed methodology can mean the difference between a
successful drug development program and one that fails due to flawed data.