1 BIOSTATISTICS & RESEARCH METHODOLOGY
Biostatistics & Research Methodology
VIII SEM B. Pharm
Short Answers: 2 marks
1. Multiple regression.
Ans:
In multiple linear regression several independent variables are used to predict one
dependent variable.
The goal is to estimate one variable (the dependent variable) based on several other
variables (the independent variables).
Multiple linear regression is often used in social research as well as in market
research.
Example:
Several independent variables predict dependent variable
Simple linear regression:
Does the weekly working time have an influence on the salary of employees?
Multiple linear regression:
Do the weekly working time and the age of employees have an influence on the salary of
employees?
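The salary example can be sketched in code. This is a minimal illustration, not a prescribed method: the data are hypothetical (salary is generated exactly as 200 + 15 × hours + 5 × age), and the helper names `solve` and `fit_multiple_regression` are invented for this sketch. It fits the model by ordinary least squares using the normal equations.

```python
# Hypothetical data: weekly working hours, age (years) and salary of six employees.
# Salary is generated exactly as 200 + 15*hours + 5*age, so the fit should recover
# these coefficients.
hours = [30, 35, 40, 45, 38, 42]
age = [25, 40, 31, 52, 45, 28]
salary = [200 + 15 * h + 5 * a for h, a in zip(hours, age)]


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_multiple_regression(predictors, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 + ... via the normal equations."""
    rows = [[1.0] + list(values) for values in zip(*predictors)]  # add intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


b0, b1, b2 = fit_multiple_regression([hours, age], salary)
print(f"salary = {b0:.2f} + {b1:.2f} * hours + {b2:.2f} * age")
```

Dropping `age` from the list of predictors turns the same function into a simple linear regression with one independent variable.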
2. One tailed and two tailed tests.
Ans:
VIII SEM B.PHARM PES COLLEGE OF PHARMACY
One-Tailed Hypothesis Tests
One-tailed hypothesis tests are also known as one-directional or one-sided
tests because we test for an effect in only one direction.
In a one-tailed test, the region of rejection lies entirely in either the left or the
right tail of the sampling distribution.
H1: μ < μ0 or H1: μ > μ0
Two-Tailed Hypothesis Tests
A two-tailed test rejects the null hypothesis if the sample mean is
significantly higher or lower than the hypothesized mean.
In a two-tailed test, the region of rejection is split between both tails of the
sampling distribution.
A significance test in which the alternative hypothesis is non-directional (has two
ends) is called a two-tailed test.
H1: μ ≠ μ0
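As a rough illustration of how the choice of tails changes the conclusion, the sketch below computes both p-values for the same test statistic. The value z = 1.80 and α = 0.05 are hypothetical, and a standard normal sampling distribution is assumed.

```python
from statistics import NormalDist

z = 1.80        # hypothetical observed test statistic
alpha = 0.05    # assumed significance level
std_normal = NormalDist()

# One-tailed test (H1: mu > mu0): rejection region in the right tail only.
p_one_tailed = 1 - std_normal.cdf(z)

# Two-tailed test (H1: mu != mu0): rejection region split over both tails.
p_two_tailed = 2 * (1 - std_normal.cdf(abs(z)))

reject_one_tailed = p_one_tailed < alpha
reject_two_tailed = p_two_tailed < alpha
print(f"one-tailed p = {p_one_tailed:.4f}, reject H0: {reject_one_tailed}")
print(f"two-tailed p = {p_two_tailed:.4f}, reject H0: {reject_two_tailed}")
```

With these assumed numbers the one-tailed test rejects H0 while the two-tailed test does not, which is why the direction of the hypothesis should be fixed before the data are seen.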
3. Pharmaceutical examples for optimization techniques.
Ans:
The term “optimize” means “to make perfect”.
Optimization is defined as a process or methodology of making something (a design,
system, or decision) as perfect, as functional and as effective as possible.
A well-optimized process gives the best product.
Pharmaceutical Example:
In formulation development:
Make very small changes in the formulation repeatedly.
The results of the changes are statistically analyzed.
If there is an improvement, the same step is repeated until further change no longer
improves the product.
For example, a formulator can change the concentration of the binder until the
desired hardness is obtained.
4. Degrees of freedom.
Ans:
Degrees of Freedom:
Degrees of freedom is the number of independent observations of a variable.
The number of independent observations differs from one statistic to another.
Suppose we are asked to select any five observations. There is no restriction on
the selection of these observations. Hence the degrees of freedom are 5.
Suppose we want to select five observations whose sum is 100. Here four
observations can be selected freely but the fifth observation is automatically
selected by the restriction of total 100.
We are not free to select all the five observations, but our freedom is restricted
to the selection of only 4 observations. Thus the degree of freedom for selecting
‘N’ observations when one such restriction is given is N-1.
If two such restrictions are given, the degree of freedom will be N-2.
The degrees of freedom can be calculated to help ensure the statistical validity
of chi-square tests, t-tests etc.
5. Standard error of mean and its significance.
Ans:
The standard error of the mean (SEM) is a measure of the variability of the
mean.
SEM is more concerned with making inferences about the mean of a
distribution than with individual values.
SEM is also called a measure of sampling error.
The standard error of the mean gives the range of deviation from the mean within
which the means of an infinite number of large samples from the population would lie.
Statistically, the standard deviation of the mean values is equal to the standard
deviation calculated from the sample divided by the square root of N:
$SEM = \frac{SD}{\sqrt{N}}$
Significance:
As the sample size increases, the SEM decreases.
A larger sample size reduces the standard error, i.e. it reduces the variability of
the sample mean; the SD of the individual observations is not reduced by a larger sample.
Standard deviation of means is smaller than the S.D. of the individual data
points.
How sample size changes the standard error of the mean:
SD = 150, n = 135: $SEM = \frac{150}{\sqrt{135}} = 12.9$
SD = 150, n = 500: $SEM = \frac{150}{\sqrt{500}} = 6.71$
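The two calculations above can be reproduced with a few lines of code. This is only a sketch; the function name `standard_error` is chosen for illustration.

```python
from math import sqrt


def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of the sample size."""
    return sd / sqrt(n)


sem_135 = standard_error(150, 135)
sem_500 = standard_error(150, 500)
print(f"n = 135: SEM = {sem_135:.2f}")
print(f"n = 500: SEM = {sem_500:.2f}")
```

The SD is the same in both calls; only the larger n makes the SEM smaller.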
6. Two methods of sample size calculation in research study.
Ans:
To compute the sample size in a comparative research study, the α error, the β error,
Δ (the difference between groups to be detected) and σ (the standard deviation) must be specified.
1. The statistical formula for sample size determination for a simple comparative
research study is
$N = \left(\frac{\sigma}{\Delta}\right)^2 (Z_\alpha + Z_\beta)^2$
Percent of error | One-sided Zα | Two-sided Zα | Zβ
1% | 2.32 | 2.58 | 2.32
5% | 1.65 | 1.96 | 1.65
10% | 1.28 | 1.65 | 1.28
20% | 0.84 | 1.28 | 0.84
2. The sample size (n) can be calculated by using Equation
Assume there is a large population but that we do not know the variability in
the proportion that will adopt the practice; therefore, assume p=.5 (maximum
variability). Furthermore, suppose we desire a 95% confidence level and ±5%
precision. The resulting sample size is demonstrated
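Both approaches can be sketched in code. Two assumptions are made here and should be checked against the source text: the comparative formula is taken as N = (σ/Δ)² (Zα + Zβ)², and the equation for a proportion is taken to be the standard form n = Z² p (1 − p) / e², where e is the desired precision. The function names and the inputs σ = 7, Δ = 5 are illustrative only.

```python
from math import ceil
from statistics import NormalDist


def n_comparative(sigma, delta, z_alpha, z_beta):
    """Method 1 (assumed form): N = (sigma/delta)^2 * (Z_alpha + Z_beta)^2, rounded up."""
    return ceil((sigma / delta) ** 2 * (z_alpha + z_beta) ** 2)


def n_proportion(p, precision, confidence=0.95):
    """Method 2 (assumed form): n = Z^2 * p * (1 - p) / precision^2, rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided Z value
    return ceil(z ** 2 * p * (1 - p) / precision ** 2)


# Method 1: two-sided alpha = 5% (Z = 1.96), beta = 20% (Z = 0.84), hypothetical sigma and delta.
n1 = n_comparative(sigma=7, delta=5, z_alpha=1.96, z_beta=0.84)

# Method 2: the case described in the text (p = 0.5, 95% confidence, +/-5% precision).
n2 = n_proportion(p=0.5, precision=0.05)
print(n1, n2)
```

Under these assumptions the second call gives a sample size of 385, the figure usually quoted for maximum variability at 95% confidence and ±5% precision.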
7. Examples of application of regression models in stability testing.
Ans:
8. Wilcoxon Rank Sum test.
Ans:
This test is used to analyse study results when observations have been
obtained from two independent groups (parallel study design).
It is an alternative to the two independent sample t-test (unpaired t-test).
The Wilcoxon rank sum test, also known as the Mann–Whitney U test, is a nonparametric
test of the null hypothesis that randomly selected values X and Y from two populations
come from the same distribution.
The test compares two independent groups.
The Wilcoxon rank sum test is applicable if the data are at least ordinal (i.e., the
observations can be ordered).
Assumptions for the Mann–Whitney U test
1. The data are at least ordinal.
2. The independent variable consists of two independent, categorical groups.
3. Observations are independent.
4. The observations need not be normally distributed.
5. The test can be used when the sample size is small.
9. Normal distribution of data.
Ans:
Normal distribution of data, also known as the Gaussian distribution, is
a probability distribution that is symmetric about the mean, showing that data
near the mean are more frequent in occurrence than data far from the mean.
In graph form, normal distribution will appear as a bell curve.
There is a rule that, if sample sizes are large enough, the sampling distribution of
the mean will be approximately normally distributed. This is called the central limit
theorem. If we know that the sampling distribution is normally distributed, we can make
better inferences about the population from the sample.
The normal distribution has the following characteristics:
1. The mean, median, and mode all have the same value.
2. The curve is symmetric around the mean.
3. The excess kurtosis is zero (i.e. the kurtosis equals 3).
4. The tails of the distribution get closer and closer to the x-axis as the values
move away from the mean, but the tails never quite touch the x-axis.
5. The distribution is completely defined by the mean and standard deviation.
6. One standard deviation above and below the mean includes 68.27% of the
values in the population; two standard deviations above and below the mean
include 95.45% of the values, while three standard deviations include
99.73%.
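The coverage figures in the last characteristic can be checked numerically. The sketch below uses the standard normal distribution (mean 0, SD 1); because the percentages do not depend on the mean or SD, the same result holds for any normal distribution.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Fraction of the population lying within k standard deviations of the mean.
coverage = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, fraction in coverage.items():
    print(f"within {k} SD of the mean: {fraction:.2%}")
```

These are roughly 68%, 95% and 99.7%, the familiar rule of thumb for normally distributed data.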
10. Types of Observational study designs.
Ans:
Observational studies/design:
1. The population is observed without any interference by the investigator.
2. Individuals can be observed prospectively, retrospectively, or currently.
3. Treatment and exposures occur in a “noncontrolled” environment.
Why observational studies?
1. Cheaper
2. Faster
3. Can examine long-term effects
4. Hypothesis-generating
5. Sometimes, experimental studies are not ethical
Types of observational study designs:
1) Case-control studies
2) Cohort studies
3) Cross-sectional studies
11. Sample size calculation for confidence interval.
Ans:
The problem of estimating the number of samples needed to estimate the mean
with a known precision by means of the confidence interval is easily solved using
the formula for the confidence interval.
In a clinical study, a suitable sample size may be chosen to estimate the true
proportion of successes within certain specified limits.
For a two-sided confidence interval of width W with confidence coefficient “P” for a
proportion, the required sample size is
$N = \frac{Z_P^2 \, p \, q}{(W/2)^2}$, where q = 1 − p.
12. Power of a study.
Ans:
Power quantifies the ability of the study to find true differences of various values
of Δ.
Power = 1 − β is the probability of rejecting H0 when it is false.
The higher the power of the study, the greater the chance that the study will detect a
true positive result.
1. Hence it is in the interest of the sponsor to increase the power (decrease the chance
of a Type II error) while designing the study.
2. If the observed difference between the treatment groups is large enough for us to
reject the null hypothesis, we say that our results are statistically significant.
3. Statistical significance is measured by a quantity called the p-value.
4. The larger the β error, the weaker the power. Remember that β is the error
resulting from accepting H0 when H0 is false.
5. Therefore, 1 − β is the probability of rejecting H0 when H0 is false.
6. From an idealistic point of view, the power of a test should be calculated before
an experiment is conducted.
7. In addition to defining the properties of the test, power is used to help compute
the sample size.
8. The larger the sample size, the larger the power.
9. The larger the difference to be detected (Ha), the larger the power. A large
sample size will be needed in order to have strong power to detect a small
difference.
10. The larger the variability (s.d.), the weaker the power.
11. Power is a function of N, α, Δ and σ.
A simple way to compute the approximate power of a test is to use the formula
for sample size
$N = \left(\frac{\sigma}{\Delta}\right)^2 (Z_\alpha + Z_\beta)^2$
and to solve it for $Z_\beta$:
$Z_\beta = \frac{\Delta}{\sigma}\sqrt{N} - Z_\alpha$
Once $Z_\beta$ has been calculated, the probability determined directly from Table IV.2
is equal to the power, 1 − β.
In the problem discussed above with Δ = 5, σ = 7, N = 12 and $Z_\alpha$ = 1.96:
$Z_\beta = \frac{5}{7}\sqrt{12} - 1.96 = 0.51$
The area above Z = 0.51 is approximately 31% (β). The power is 1 − β. Therefore,
the power is approximately 69%.
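The worked example can be checked in code. This sketch assumes the approximate relation Zβ = (Δ/σ)√N − Zα and uses the standard normal cumulative distribution in place of the printed table; the function name `approximate_power` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def approximate_power(delta, sigma, n, z_alpha):
    """Approximate power from Z_beta = (delta / sigma) * sqrt(n) - Z_alpha."""
    z_beta = (delta / sigma) * sqrt(n) - z_alpha
    power = NormalDist().cdf(z_beta)  # area below Z_beta under the standard normal curve
    return z_beta, power


z_beta, power = approximate_power(delta=5, sigma=7, n=12, z_alpha=1.96)
print(f"Z_beta = {z_beta:.2f}, power = {power:.1%}")
```

Calling the function again with a larger `n` or a larger `delta` shows the power rising, in line with the statements in the list above.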
13. Pharmaceutical examples for data analysis using SPSS.
Ans:
Pharmaceutical Example:
Paired t-test:
Used when we have dependent samples – matched, paired or tied somehow
Repeated measures
Person | Painfree (time in sec) | Placebo | Difference
1 | 60 | 55 | 5
2 | 35 | 20 | 15
3 | 70 | 60 | 10
4 | 50 | 45 | 5
5 | 60 | 60 | 0
M | 55 | 48 | 7
SD | 13.23 | 16.81 | 5.70
Open SPSS
Open file “SPSS Examples” (same as before)
Go to:
“Analyze” then “Compare Means”
Choose “Paired samples t-test”
Choose the two IV conditions you are comparing. Put in “paired variables box.”
Paired Samples Test (Pair 1: PAINFREE - PLACEBO)
Paired Differences:
Mean = 7.0000
Std. Deviation = 5.7009
Std. Error Mean = 2.5495
95% Confidence Interval of the Difference: Lower = -7.86E-02, Upper = 14.0786
t = 2.746, df = 4, Sig. (2-tailed) = .052
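The SPSS figures can be cross-checked by hand. The sketch below recomputes the mean difference, its standard deviation, the standard error and the t statistic from the five pairs in the table; it does not compute the p-value, which requires the t distribution and is taken from the SPSS output.

```python
from math import sqrt
from statistics import mean, stdev

# Pain-free time (seconds) for the same five persons under drug and placebo.
painfree = [60, 35, 70, 50, 60]
placebo = [55, 20, 60, 45, 60]

diffs = [a - b for a, b in zip(painfree, placebo)]  # paired differences
mean_diff = mean(diffs)
sd_diff = stdev(diffs)                  # sample SD, n - 1 in the denominator
se_diff = sd_diff / sqrt(len(diffs))    # standard error of the mean difference
t_stat = mean_diff / se_diff
df = len(diffs) - 1

print(f"mean = {mean_diff}, SD = {sd_diff:.4f}, SE = {se_diff:.4f}")
print(f"t = {t_stat:.3f}, df = {df}")
```

Since the reported two-tailed significance (.052) is slightly above .05, the difference would not be declared significant at the 5% level.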
14. Factorial design.
Ans:
A factorial experiment is an experiment whose design consists of two or more
factors, each with different possible values or “levels”.
The factorial design (FD) technique was introduced by Fisher in 1926.
Factorial designs are applied in optimization techniques.
Factors can be quantitative (numerical) or qualitative (names rather than numbers,
such as Method 1 or Site B).
Factorial design depends on the independent variables (factors) for the development
of a new formulation.
Ex: Factorial designs: 2², 2³, 3², 3³
2² FD = 2 factors, 2 levels = 2×2 = 4 experimental conditions
2³ FD = 3 factors, 2 levels = 2×2×2 = 8 runs
2⁴ FD = 4 factors, 2 levels = 2×2×2×2 = 16 runs
2⁵ FD = 5 factors, 2 levels = 2×2×2×2×2 = 32 runs
3⁵ FD = 5 factors, 3 levels = 3×3×3×3×3 = 243 runs
A factorial design is one involving two or more factors in a single experiment.
So a 2² factorial has two factors, each at two levels, and a 2³ factorial has
three factors, each at two levels.
The four runs of a 2² design, with + and - denoting the high and low level of each factor:
++ +-
-+ --
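The run counts follow from listing every combination of levels. A minimal sketch, with the helper name `full_factorial` chosen for illustration:

```python
from itertools import product


def full_factorial(levels, n_factors):
    """All combinations of the given levels across n_factors factors."""
    return list(product(levels, repeat=n_factors))


# The four runs of a two-factor, two-level design in +/- notation.
runs_2x2 = full_factorial(("+", "-"), 2)
print(["".join(run) for run in runs_2x2])

# Number of runs = levels ** factors for the designs listed above.
run_counts = {
    (levels, factors): len(full_factorial(range(levels), factors))
    for levels, factors in [(2, 2), (2, 3), (2, 4), (2, 5), (3, 5)]
}
print(run_counts)
```

The number of runs grows quickly with the number of factors, which is why designs with many factors or levels become expensive to run in full.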
15. Degrees of freedom.
Ans: Refer previous question answer.
16. Report writing in research study.
Ans:
A research report is a written document, or an oral presentation based on a written
document, that communicates the purpose, scope, objective(s), hypotheses,
methodology, findings, limitations and, finally, recommendations of a research
project to others.
The following outline is the suggested format for writing the research report:
Title page
Summary of findings
Table of contents
List of tables
List of figures
Introduction
Background to the research problem
Objectives
Hypotheses
Methodology-Data collection
Sample and sampling method
Statistical or qualitative methods used for data analysis
Sample description
Findings
Results, interpretation and conclusions.
Any research report contains
1. a description of the methodology,
2. the results obtained, and
3. the recommendations made.
Types of Research Report:
There are two types of report:
– Technical report: suitable for a target audience of researchers, research managers
or other people familiar with and interested in technicalities such as research
design, sampling methods, statistical details, etc.
– Popular report: suitable for a more general audience interested mainly in the
research findings, as it is non-technical in nature. The writing style is designed to
facilitate easy and rapid reading and understanding of the research findings and
recommendations.
17. Assumptions in chi square test.
Ans:
1. Only frequency data can be used.
2. One or more categories.
3. Independent observations.
4. Adequate sample size (at least 10).
5. Simple random sample.
6. All observations must be used
18. Confidence interval.
Ans:
19. Difference between ANOVA and Student’s t-test.
20. Characteristics of Normal distribution data.
Ans:
Refer previous question answer.
21. Applications of nonparametric tests.
Ans:
These statistical tests serve as an alternative to parametric tests, which can
be employed only if the underlying data satisfy certain criteria and
assumptions.
These statistical methods are used when data sets are not normally
distributed or the data are skewed.
The main reasons to apply non-parametric tests are that the underlying data
do not meet the assumptions about the population, or that the analyzed data are
ordinal or nominal.
The main advantages of Non-parametric tests are as follows:
1. Non-parametric tests deliver accurate results even when the sample size is
small.
2. Non-parametric tests are suitable for all types of data such as nominal,
ordinal, interval or the data which has outliers.
3. These tests are more frequently applied than parametric tests when data are
not normally distributed.
4. These tests are easily understandable and have short calculations.
Limitations of Non-parametric tests:
1. Less efficient as compared to parametric tests.
2. The results may or may not provide an accurate answer because they are
distribution free.
The important non-parametric tests are
1. Sign test
2. Wilcoxon signed rank test
3. Wilcoxon rank sum test (Mann-Whitney U test)
4. Kruskal-Wallis test
5. Friedman test
6. Chi-square test
22. Chi-Square test.
Ans:
The chi-square test is applied to problems in which we study whether the
frequency with which a given event has occurred is significantly different
from the frequency expected theoretically.
The chi-square (χ²) test is used to determine whether there is a significant
difference between the expected frequencies and the observed frequencies
in one or more categories.
It asks: do the numbers of individuals or objects that fall in each category differ
significantly from the numbers we would expect?
The chi-square test is also used for comparing proportions. If we wish to test the
difference of proportions (in terms of percentages), we could use the chi-square test.
The test was first used in testing statistical hypotheses by Karl Pearson in the
year 1900. The symbol for chi-square is χ² (the Greek letter chi), not X².
It is defined as
$\chi^2 = \sum \frac{(O - E)^2}{E}$
Where
O is the observed frequency in each category
E is the expected frequency in the corresponding category
df is the degrees of freedom (n − 1)
χ² is chi-square
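A small worked case may make the definition concrete. The sketch assumes the usual form of the statistic, the sum of (O − E)²/E over all categories, and uses hypothetical data: a coin tossed 100 times giving 60 heads and 40 tails, against an expectation of 50 each. The critical value 3.841 is the standard tabulated value for α = 0.05 with 1 degree of freedom.

```python
def chi_square(observed, expected):
    """Chi-square statistic: sum over categories of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [60, 40]   # hypothetical counts of heads and tails
expected = [50, 50]   # counts expected for a fair coin

chi2 = chi_square(observed, expected)
df = len(observed) - 1          # n - 1 categories are free to vary
critical_value = 3.841          # tabulated chi-square for alpha = 0.05, df = 1
significant = chi2 > critical_value
print(f"chi-square = {chi2}, df = {df}, significant at 5%: {significant}")
```

Because the calculated value exceeds the tabulated one, the observed frequencies in this made-up example differ significantly from the expected ones.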
23. Power of a study.
Ans:
Refer previous question answer.
24. Confidence interval
Ans:
Refer previous question answer.
25. Probability.
Ans:
Probability is a number associated with an event that is intended to represent
its “likelihood,” “chance of occurring,” “degree of certainty,” and so on.
It is a numerical value of uncertainty. Probability is simply how likely something
is to happen.
Probability is concerned with numerical descriptions of how likely an event is to
occur, or how likely it is that a proposition is true.
Example:
1. There is an 80% chance that India will win the match.
2. There is a 10% chance that the price of gold will rise tomorrow.
3. There is a 50% chance that a third wave of coronavirus will spread in the
community.
Probability is the relative frequency (often expressed as a percentage) of an event.
26. Applications of SAS
Ans:
Applications of SAS (Statistical Analysis System):
Automated safety reporting
Producing patient profiles
Building clinical data management systems
Developing EDC(Electronic data capture) systems
Range-checking an entire database
Implementing CDISC models
Finding the right level of tolerance for clinical data acceptance
Randomization
27. Standard error of mean
Ans: Refer previous question answer.
28. Features of normal distribution pattern.
Ans:
The normal distribution has the following characteristics:
1. The mean, median, and mode all have the same value.
2. The curve is symmetric around the mean.
3. The excess kurtosis is zero (i.e. the kurtosis equals 3).
4. The tails of the distribution get closer and closer to the x-axis as the values
move away from the mean, but the tails never quite touch the x-axis.
5. The distribution is completely defined by the mean and standard deviation.
6. One standard deviation above and below the mean includes 68.27% of the
values in the population; two standard deviations above and below the mean
include 95.45% of the values, while three standard deviations include
99.73%.
29. Optimization techniques
Ans:
The term “optimize” means “to make perfect”.
Optimization can be defined as choosing the best element from some set of available
alternatives.
It is also defined as a process or methodology of making something (a design, system,
or decision) as perfect, as functional and as effective as possible.
A well-optimized process gives the best product.
Optimization is the secret of success in developing the best product.
Optimization is what distinguishes an excellent formulation from an ordinary
formulation.
Advantages:
1. Yields the “best solution” within the domain of study.
2. Requires fewer experiments to achieve an optimum formulation.
3. Makes it remarkably easier to trace and rectify a “problem”.
30. Report writing in research study.
Ans: Refer previous question answer
31. When is median more important than mean as a measure of central tendency?
Ans:
The median is preferred to the mean for representing the central value of a
data set when the data are not symmetrical (not normally distributed), for
instance in a “skewed” distribution. The median is less affected by outliers and
skewed data.
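A short numerical illustration, using hypothetical values with one extreme outlier:

```python
from statistics import mean, median

# Hypothetical measurements; the last value is an outlier.
values = [10, 12, 11, 13, 200]

data_mean = mean(values)      # pulled strongly towards the outlier
data_median = median(values)  # middle value of the sorted data

print(f"mean = {data_mean}, median = {data_median}")
```

Four of the five values lie between 10 and 13, so the median describes the typical observation far better than the mean does here.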
32. Degrees of freedom.
Ans: Refer previous question answer
33. 2² and 2³ designs.
Ans:
A factorial design is one involving two or more factors in a single experiment. So
a 2² factorial has two factors, each at two levels, and a 2³ factorial has
three factors, each at two levels.
Such an experiment allows the investigator to study the effect of each factor on
the response variable, as well as the effects of interactions between factors on
the response variable.
2² FD = 2 factors, 2 levels = 2×2 = 4 experimental conditions
2³ FD = 3 factors, 2 levels = 2×2×2 = 8 runs
Example:
A researcher carries out a TLC experiment in order to separate the drug
constituents and obtain spots with good resolution.
2X2 factorial Design:
Only 2 factors are considered
Only 2 levels are considered
2x2=4 Runs are possible
Factors (only 2 factors) and result:
Run | Chamber temperature (°C) | Ethanol in mobile phase (%) | Resolution
1 | 30 | 55 | ?
2 | 50 | 55 | ?
3 | 30 | 60 | ?
4 | 50 | 60 | ?
Levels: only two levels of each factor (30 and 50 °C; 55 and 60 % ethanol).
34. Power of a study.
Ans: Refer previous question answer.
35. Probability.
Ans: Refer previous question answer.
36. Applications of Student’s t-test.
Ans:
In inferential statistics, the t-test is applied to analyse clinical study
results when two groups or two sets of measurements are compared.
1. To compare the mean of a sample with the population mean.
2. To compare the mean of one sample with the mean of another independent
sample.
3. To compare the values (readings) of one sample on two occasions.
37. Standard error of mean.
Ans: Refer previous question answer.
38. One tailed and two tailed tests.
Ans: Refer previous question answer.
39. Applications of non-parametric tests.
Ans: Refer previous question answer.
40. Confidence interval.
Ans: Refer previous question answer.
41. Pharmaceutical examples of optimization techniques.
Ans: Refer previous question answer.
42. Characteristics of normal distribution.
Ans: Refer previous question answer.
43. Standard error of mean.
Ans: Refer previous question answer.
44. Histograms.
Ans:
This is a graphical method for displaying the shape of a frequency
distribution.
It is particularly useful when there is a large number of observations, and it is used
for quantitative data.
The histogram, sometimes loosely called a bar graph, is one of the most popular
ways of presenting and summarizing data.
The values of the variable (class intervals) are indicated on the horizontal line
(x-axis), while the frequency, i.e. the number of observations, is marked on the
vertical line (y-axis).
Class intervals for histograms should be of equal width.
When the intervals are of equal width, the height of the bar is proportional to
the frequency of observations in the interval.
Eight to twenty equally spaced intervals usually are sufficient to give a good
picture of the data distribution.
Example
A histogram is a visual presentation used for summarizing discrete
or continuous data.
45. Report writing in research study.
Ans: Refer previous question answer.
46. Wilcoxon Rank Sum test.
Ans: Refer previous question answer.
47. Differentiate between sample and population parameter.
Ans:
Population parameter | Sample statistic
A numerical feature of the entire population is called a parameter. | A statistic is a numerical value obtained from the sample observations.
The true value of a parameter is an unknown constant. | A statistic, such as the sample mean or standard deviation, can be calculated because it is obtained from the sample observations.
48. Power of study.
Ans: Refer previous question answer.
49. Descriptive and inferential statistics.
Ans:
Descriptive Statistics:
Descriptive statistics are used to present, organize, and summarize data.
Descriptive statistics are usually considered a very basic way to present
data.
Mean, median, mode, and standard deviation are all numerical descriptive
statistics.
Inferential Statistics:
Inferential statistics provide the ability to generalize the results/outcome of
a study from the sample to the appropriate population.
50. Classification of clinical study designs.
Clinical study design
  Analytical
    Experimental
      Randomized clinical trial
      Non-randomized clinical trial
    Non-experimental
      Cohort study
      Case-control study
      Cross-sectional study
  Descriptive
    Community survey
51. Factorial design.
Ans: Refer previous question answer.
52. Power of study
Ans: Refer previous question answer.
53. Confidence interval
Ans: Refer previous question answer.
54. Define blinding in clinical study.
Ans:
Treatment blinding or masking is an effective way to increase the objectivity of
the clinical research outcomes.
When the treatments are masked, the bias or expectations of the observer are
not likely to influence the measurements taken.
Blinding is intended to minimize the potential bias.
Treatment blinding is classified into single blinding, double blinding, triple
blinding and unblinding.
Types of blinding in a clinical study:
1. Single blind
The patients on the study are unaware of the treatment they receive.
2. Double blind
Both the patient and the investigator responsible for assessing the outcome,
are unaware of which treatment is being administered because investigators
can also be influenced by their expectations.
3. Triple blind study:
Some randomized controlled trials are considered triple-blinded, although
the meaning of this may vary according to the exact study design.
The most common meaning is that the subject, researcher and person
administering the treatment (often a pharmacist) are blinded.
Or patient, researcher and statistician are blinded.
4. Unblinding: Individuals may be unblinded only in the event of a serious or
alarming adverse event, and even then only if the unblinding is considered
by the investigator or the clinical monitor to be clinically necessary for the
optimum care of the patient.
55. Differentiate SD and SEM.
Ans:
Sl. No | SD | SEM
1 | The standard deviation is a measure of the spread of scores within a set of data. | The standard error of the mean (SEM) is a measure of the variability of the mean.
2 | The standard deviation is greater than the SEM. | The standard error of the mean is smaller than the SD.
3 | SD is more concerned with making inferences about how far each individual value is spread from the mean. | SEM is more concerned with making inferences about the mean of a distribution than with individual values.
Sample standard deviation: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
56. Difference between nominal and ordinal type of data.
Ans:
Sl No | Nominal data | Ordinal data
1 | Qualitative variables are measured on a nominal scale. Ex: hair color, eye color, religion, favorite movie, gender. | Ordinal data are categorical data measured in terms of order, rank or scale.
2 | They can be coded to appear numeric, but the numbers are meaningless, for example Male: 1, Female: 2. | Ordinal data only show the order of the values relative to each other; they do not give actual values that tell how much difference there is between the ranks.
57. Define scatter plots.
Ans:
A scatter plot reveals relationships or associations between two variables. A
scatter plot is a plot of the values of Y versus the corresponding values of X.
Vertical axis: variable Y, usually the response variable.
Horizontal axis: variable X, usually a variable that may be related to the
response.
Scatter plots can provide answers to the following questions:
1. Are variables X and Y related?
2. Are variables X and Y linearly related?
3. Are variables X and Y non-linearly related?
4. Does the variation in Y change depending on X?
5. Are there outliers?
Example: Linear relationship between X and Y variables.
58. p-value
Ans:
If the observed difference between the treatment groups is large enough
for us to reject the null hypothesis, we say that our results are
statistically significant. Statistical significance is measured by a quantity
called the p-value.
If the statistical test results in rejection of the null hypothesis, we say that
the difference is significant at the α level.
If P Value < α, then reject the null Hypothesis
If P value ≥ α, then do not reject the null hypothesis.
1. When p value > .10 → the observed difference is “not significant”
2. When p value ≤ .10 → the observed difference is “marginally significant”
3. When p value ≤ .05 → the observed difference is “significant”
4. When p value ≤ .01 → the observed difference is “highly significant”
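The decision rule and the descriptive labels above can be written as a small function. The function name `interpret_p_value` and the default α = 0.05 are assumptions made for this sketch.

```python
def interpret_p_value(p, alpha=0.05):
    """Return the decision about H0 and the descriptive label for a p-value."""
    decision = "reject H0" if p < alpha else "do not reject H0"
    if p <= 0.01:
        label = "highly significant"
    elif p <= 0.05:
        label = "significant"
    elif p <= 0.10:
        label = "marginally significant"
    else:
        label = "not significant"
    return decision, label


for p in (0.004, 0.03, 0.052, 0.40):
    print(p, interpret_p_value(p))
```

The labels are checked from the strictest threshold downwards so that each p-value receives only the strongest label it qualifies for.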
59. Mann Whitney U tests.
Ans:
The Mann–Whitney U test is used to analyse study results when
observations have been obtained from two independent groups (parallel
study design).
This test is an alternative to the two independent sample t-test (unpaired t-test).
The Mann–Whitney U test is a nonparametric test of the null hypothesis that
randomly selected values X and Y from two populations come from the same distribution.
The test compares two independent groups.
The Wilcoxon rank sum test is applicable if the data are at least ordinal (i.e.,
the observations can be ordered).
It tests the equality of the distributions (medians) of the two treatments.
Under the null hypothesis H0, the distributions of both populations are equal
(i.e. they both have the same median).
The alternative hypothesis H1 is that the distributions of both populations are
not equal.
Assumptions for the Mann–Whitney U test
1. The data are at least ordinal.
2. The independent variable consists of two independent, categorical groups.
3. Observations are independent.
4. The observations need not be normally distributed.
5. The test can be used when the sample size is small.
Calculation for Wilcoxon Rank Sum Test:
1. First, the observations from both groups are pooled and ranked.
2. Identical observations are given a rank equal to the average of the ranks they
would otherwise occupy.
3. In this procedure, the signs of the observations are taken into account for
ranking. For example, a value of -1 has a lower rank than 0.5, which has a
lower rank than 1. After ranking the pooled data, the observations are
returned to their respective treatment groups. The observations are then
replaced by their corresponding ranks.
4. The sum of the ranks of the smaller sample is the basis for the statistical test.
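The ranking steps can be sketched as follows. The two groups are hypothetical, and the sketch stops at the rank sums; comparing the rank sum with tabulated critical values is not shown. The function names are illustrative.

```python
def average_ranks(values):
    """Rank values from smallest to largest; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover all observations tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def rank_sums(group_a, group_b):
    """Pool and rank both groups, then return each group's rank sum and the
    rank sum of the smaller sample (the basis for the test)."""
    pooled = list(group_a) + list(group_b)
    ranks = average_ranks(pooled)
    sum_a = sum(ranks[:len(group_a)])
    sum_b = sum(ranks[len(group_a):])
    smaller = sum_a if len(group_a) <= len(group_b) else sum_b
    return sum_a, sum_b, smaller


treatment = [-1, 0.5, 3, 4]      # hypothetical observations, smaller sample
control = [0.5, 1, 2, 5, 6]      # hypothetical observations
sum_treatment, sum_control, test_basis = rank_sums(treatment, control)
print(sum_treatment, sum_control, test_basis)
```

The two tied values of 0.5 occupy positions 2 and 3 in the pooled data, so each receives rank 2.5, and the negative value -1 receives the lowest rank.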
60. Advantages of Design space
61. Explain one way analysis of variance.
Ans:
One way ANOVA:
The one-way ANOVA is used to determine whether there is a statistically significant
difference between the means of three or more independent groups (parallel study
design).
The typical null hypothesis is H0: µ1 = µ2 = µ3, where µ1 refers to the mean of
treatment 1 and so on. The alternative hypothesis H1 is that at least one treatment
mean differs from the others.
Assumptions
1. The data are normally distributed
2. The samples are independent
3. The data are of interval or ratio type
ANOVA formula
$F = \frac{BMS}{WMS}$
Where
F = ANOVA coefficient (F ratio)
BMS: mean sum of squares between treatments (also written MST)
WMS: mean sum of squares within treatments, i.e. due to error (also written MSE)
List of steps for ANOVA:
1. State the hypotheses
2. Calculate the degrees of freedom (d.f.)
3. Calculate the sums of squared deviations
4. Calculate the mean squares
5. Calculate the F statistic
6. Compare the calculated F value with the tabulated (critical) F value
7. State the conclusions
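As a rough illustration of steps 2 to 5, the F ratio can be computed by hand as below. The three groups are hypothetical data and the function name one_way_anova_f is an assumption made for this sketch; the between-groups and within-groups mean squares are the quantities that form the F ratio.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of independent groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Sum of squared deviations between groups and within groups.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k

    mean_square_between = ss_between / df_between
    mean_square_within = ss_within / df_within
    return mean_square_between / mean_square_within, df_between, df_within


# Hypothetical responses for three treatments.
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
f_value, df_b, df_w = one_way_anova_f(groups)
print("F =", f_value, "with d.f.", df_b, "and", df_w)
```

The calculated F would then be compared with the tabulated F for the same degrees of freedom at the chosen significance level.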
62. Classification of clinical study designs
Ans: Refer previous question answer.
63. Power of study.
Ans: Refer previous question answer.
64. Define coefficient of variation.
Ans:
Coefficient of variation/Relative measure of Dispersion/Relative standard
deviation:
The standard deviation is useful as a measure of variation within a given set
of data. When one wishes to compare the dispersion of two sets of data,
however, comparing the two standard deviations may lead to fallacious results.
In these situations, a measure of relative variation is needed rather than one
of absolute variation. The variability of the data may often be better described
as a relative variation rather than as an absolute variation such as that
represented by the standard deviation or range. One common way of
expressing the variability, which takes into account its relative magnitude, is
the ratio of the standard deviation to the mean. This ratio, often expressed
as a percentage, is called the coefficient of variation (C.V.) or relative
standard deviation (RSD).
This CV is useful in comparing the relative difference in variability between
two or more samples or determining which group has the largest relative
variability of values from the mean.
CV = (Standard deviation / Mean) × 100
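A small sketch of why the CV is useful when comparing two data sets: the two batches below are hypothetical and have the same standard deviation but very different means. The sample standard deviation (n - 1 divisor) is assumed here, since the notes do not say which form to use.

```python
import statistics


def coefficient_of_variation(values):
    """CV in percent: (sample standard deviation / mean) x 100."""
    return statistics.stdev(values) / statistics.mean(values) * 100


# Hypothetical data: both batches have a standard deviation of 2.
batch_a = [98, 100, 102]   # mean 100
batch_b = [8, 10, 12]      # mean 10

cv_a = coefficient_of_variation(batch_a)
cv_b = coefficient_of_variation(batch_b)
print("CV of batch A (%):", cv_a)
print("CV of batch B (%):", cv_b)
```

Although the absolute spread is identical, batch B is far more variable relative to its mean.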
65. For comparison of means between three or more distinct/independent groups,
which parametric and non-parametric tests can be used in inferential statistics?
Ans:
Parametric test: One way-Analysis of variance (ANOVA)
Non parametric test: Kruskal-Wallis test.
66. Sign test.
Ans:
The sign test is probably the simplest of the nonparametric tests. The sign
test is a test of the equality of the medians of two comparative groups.
This test is used for paired data with an underlying continuous distribution,
and can be applied to ranked or higher level data such as continuous interval
and ratio-type data.
The pairs are matched, and the differences of the measurements for each pair
are tabulated. The differences are then categorized only with regard to the sign
of the difference.
If positive and negative signs are observed to occur with approximately equal
frequency, we can conclude that the treatments have a similar median.
If either positive (+) or negative (-) signs predominate, there is evidence that
one treatment has a higher median than the other.
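The counting of signs described above can be sketched as follows. The paired measurements are hypothetical, the function name sign_test is made up for this example, and two choices are assumptions rather than something stated in these notes: zero differences are dropped, and a two-sided p value is taken from the binomial distribution with probability 0.5.

```python
from math import comb


def sign_test(first, second):
    """Return (n_positive, n_negative, two_sided_p) for paired data."""
    diffs = [b - a for a, b in zip(first, second)]
    n_pos = sum(d > 0 for d in diffs)
    n_neg = sum(d < 0 for d in diffs)
    n = n_pos + n_neg              # assumption: ties (zero differences) are dropped
    k = min(n_pos, n_neg)
    # Probability of k or fewer of the rarer sign under Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return n_pos, n_neg, min(1.0, 2 * tail)


# Hypothetical paired measurements for two treatments on the same subjects.
treatment_1 = [10, 12, 9, 11, 13, 10, 12, 14]
treatment_2 = [12, 14, 10, 13, 15, 9, 14, 16]

n_pos, n_neg, p_value = sign_test(treatment_1, treatment_2)
print("positive signs:", n_pos, "negative signs:", n_neg, "p =", p_value)
```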
67. Pearson’s Correlation
Ans:
Correlation helps to determine whether there is an association between
two variables and also indicate the strength of the association.
Two variables are said to be correlated if a change in one variable is
accompanied by a change in the other variable.
For example:
If we could predict the dissolution of a tablet based on tablet hardness, we
say that dissolution and hardness are correlated.
Pearson’s Correlation
The measure of the strength (or degree or magnitude) of the relationship is
the correlation coefficient, often referred to as the Pearson correlation
coefficient. It is often erroneously interpreted as a measure of linearity.
Pearson Correlation Coefficient
Correlation coefficient “r” allows us to state mathematically the relationship
that exists between two variables.
The correlation coefficient may range from -1.00 through 0.00 to +1.00.
If r is not equal to 0, some relationship exists.
The value of r is important in determining the strength of the relationship.
A value of +1 depicts a perfect positive linear relationship, indicating that as one
variable changes the other changes in the same direction.
Likewise, a value of -1 indicates a perfect negative linear relationship in which
as one variable changes the other changes in an inverse fashion.
If r is equal to 0, there is no correlation (no linear relationship).
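A minimal sketch of how r can be computed from paired data is given below. The hardness and dissolution values are hypothetical and deliberately chosen to lie on a straight line with a negative slope, so the result illustrates the r = -1 case described above.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient from sums of squares and cross-products."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical data: dissolution falls by the same amount for each unit of hardness.
hardness = [4, 5, 6, 7, 8]
dissolution = [90, 85, 80, 75, 70]

r = pearson_r(hardness, dissolution)
print("r =", r)
```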
68. Standard Error of Mean
Ans: Refer previous question answer.
69. Advantages of Data visualization methods
Ans:
Advantages:
1. A figure or graphical illustration is far more efficient in presenting the evidence
for a conclusion than a long statement in the text.
2. Graphical presentation helps to condense the bulky data.
3. Easy to understand by everybody.
4. Better analysis.
5. Quick action.
6. Identifying patterns.
7. Finding errors.
70. Central composite design:
Ans:
It is also called the Box-Wilson central composite design.
The CCD model is an integral part of response surface methodology (RSM), where it
is applied as an optimization tool.
The CCD involves 2^n factorial runs, 2n axial runs and nc center runs.
The total number of experimental runs is computed by the equation
N = 2^n + 2n + nc
where n is the number of independent variables (factors),
nc is the number of center points, and
N is the overall total of experimental runs.
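The run count can be checked with a short calculation. The sketch below simply adds the factorial runs (2 raised to the power n), the axial runs (2 times n) and the center runs; the function name and the example values of n and nc are hypothetical.

```python
def ccd_total_runs(n_factors, n_center):
    """Total runs in a central composite design: factorial + axial + center."""
    factorial_runs = 2 ** n_factors   # full two-level factorial part
    axial_runs = 2 * n_factors        # two axial (star) points per factor
    return factorial_runs + axial_runs + n_center


# Hypothetical designs.
runs_3_factors = ccd_total_runs(n_factors=3, n_center=6)
runs_2_factors = ccd_total_runs(n_factors=2, n_center=5)
print("3 factors, 6 center points:", runs_3_factors)
print("2 factors, 5 center points:", runs_2_factors)
```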
In CCD, ANOVA is used to determine the P value (statistical significance) of the
fitted model. A good CCD model should have a P value less than 0.05 (5%).
A P value < 0.05 indicates that the model is statistically significant, i.e., the
chosen factors have a real effect on the response.
71. Define bias in clinical study.
Ans:
It is a systematic error that leads to deviation of results from the true values.
Bias may occur in the way we select our patients, measure our outcomes or
analyze our data, which will lead to results that are inaccurate.
Types of bias: Bias may be classified as
selection bias, patient bias, measurement bias, investigator bias, analysis bias and
statistician bias.
72. Role of sample size in calculation of confidence interval
Ans: Refer previous question answer.
73. Characteristics of normal distribution
Ans: Refer previous question answer.
74. Advantages and disadvantages of pie charts.
Ans:
Advantages:
Construction of these charts is relatively simple.
Pie charts are effective for displaying the relative frequencies of a small number
of categories.
Disadvantages:
No more than 6 sectors should be used.
It is not always easy to differentiate two segments that are reasonably close in
size in pie charts.
75. Explain: Range, Interquartile range and Variance
Ans:
Range: The range is the difference between the highest and lowest scores in a data
set and is the simplest measure of spread. So we calculate range as:
Range = maximum value - minimum value
Interquartile range: The interquartile range (IQR) is a measure of variability,
based on dividing a data set into quartiles. Quartiles divide a rank-ordered data set
into four equal parts. The values that divide the parts are called the first, second,
and third quartiles; and they are denoted by Q1, Q2, and Q3, respectively.
Q1 is the "middle" value in the first half of the rank-ordered data set.
Q2 is the median value in the set.
Q3 is the "middle" value in the second half of the rank-ordered data set.
The interquartile range is equal to Q3 minus Q1.
For example, consider the following numbers: 1, 3, 4, 5, 5, 6, 7, 11. Q1 is the middle
value in the first half of the data set. Since there are an even number of data points
in the first half of the data set, the middle value is the average of the two middle
values; that is, Q1 = (3 + 4)/2 or Q1 = 3.5. Q3 is the middle value in the second half
of the data set. Again, since the second half of the data set has an even number of
observations, the middle value is the average of the two middle values; that is, Q3
= (6 + 7)/2 or Q3 = 6.5. The interquartile range is Q3 minus Q1, so IQR = 6.5 - 3.5 =
3.
Variance: The variance is a numerical value used to indicate how widely individuals
in a group vary. If individual observations vary greatly from the group mean, the
variance is big; and vice versa.
Variance is defined by the following formula: σ² = Σ (xi - µ)² / N
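The three measures can be checked on the example data set used above (1, 3, 4, 5, 5, 6, 7, 11). The sketch below follows the "middle value of each half" rule described in the text for Q1 and Q3 and uses the population form of the variance (division by N); the helper name quartiles_by_halves is made up for this illustration, and other quartile conventions exist that can give slightly different values.

```python
import statistics


def quartiles_by_halves(values):
    """Q1, Q2, Q3 using the median of the lower and upper halves of the sorted data."""
    data = sorted(values)
    half = len(data) // 2          # for an odd count, the middle value is excluded
    lower_half = data[:half]
    upper_half = data[-half:]
    return (statistics.median(lower_half),
            statistics.median(data),
            statistics.median(upper_half))


data = [1, 3, 4, 5, 5, 6, 7, 11]

data_range = max(data) - min(data)
q1, q2, q3 = quartiles_by_halves(data)
iqr = q3 - q1
variance = statistics.pvariance(data)   # sum of squared deviations divided by N

print("Range:", data_range)
print("Q1, Q2, Q3:", q1, q2, q3)
print("IQR:", iqr)
print("Variance:", variance)
```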
76. Standard Error of Mean
Ans: Refer previous question answer.
77. Control Space
78. Inclusion & exclusion criteria
Ans:
Inclusion & exclusion criteria
The factors that allow someone to participate in a clinical trial are called
inclusion criteria and those that disallow someone from participating are
called exclusion criteria.
Using inclusion and exclusion criteria is an important principle of medical
research that helps to produce reliable results.
These criteria are based on such factors as age, gender, the type and stage
of disease, previous treatment history and other medical conditions. Before
joining a clinical trial a participant must qualify for the study.
79. Define histogram
Ans: Refer previous question answer.
80. Define discrete and continuous variables.
Ans:
Discrete variables:
Variables such as number of children in a household are called discrete variables
since the possible scores are discrete points on the scale. For example, a household
could have three children or six children.
Continuous variables:
Variables such as "time to respond to a question" are continuous variables since
the scale is continuous and not made up of discrete steps. The response time could
be 1.64 seconds.
81. Pie charts.
Ans:
Pie chart or Sector diagram
Pie charts are a popular way of presenting categorical (discrete) data such as
blood groups, age groups, causes of mortality or social groups in a population.
The frequencies of the groups are shown as sectors of a circle.
The angle (in degrees) and the area of each sector are proportional to the
frequency of the group.
It shows comparative differences at a glance.
As a rule of thumb, no more than 6 sectors should be used.
Pie charts are effective for displaying the relative frequencies of a small
number of categories.
Disadvantages:
1. If pie charts are based on a small number of observations, it can be misleading
to label the pie slices with percentages. In this case, it is better to alert the
reader of the pie chart to the actual numbers involved.
2. Another problem with pie charts is that it is not always easy to differentiate two
segments that are reasonably close in size.
The circle (or pie) represents 100%, or all of the results.
For example, a pie chart can represent the visits to the physician by different
categories.
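Since the whole circle is 360 degrees and represents 100% of the results, the angle of each sector follows directly from its frequency. The sketch below works this out for a set of hypothetical blood group counts; the function name sector_angles is made up for this example.

```python
def sector_angles(frequencies):
    """Angle in degrees for each category: frequency x 360 / total frequency."""
    total = sum(frequencies.values())
    return {name: count * 360 / total for name, count in frequencies.items()}


# Hypothetical counts of blood groups in a sample of 100 people.
blood_groups = {"A": 25, "B": 30, "AB": 5, "O": 40}

angles = sector_angles(blood_groups)
for name, angle in angles.items():
    print(name, angle, "degrees")
```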
82. Types of correlation.
Ans:
Positive Correlation:
When two variables move in the same direction, the correlation between
these two variables is said to be a positive correlation.
When the value of one variable increases, the value of the other variable also
increases.
For example, the training and performance of employees in a company.
Negative Correlation:
In this type of correlation, the two variables move in the opposite direction.
When the value of a variable increases, the value of the other variable decreases.
For example, the relationship between price and demand.
Perfect Positive Correlation:
When a change in one variable X is accompanied by an equally proportional change
in the other variable Y in the same direction, these two variables are said
to have a Perfect Positive Correlation.
Perfectly Negative Correlation:
Between two variables X and Y, if a change in X is accompanied by a change in Y
in equal proportion but in the opposite direction, then this correlation is
called a Perfectly Negative Correlation.
Zero Correlation
When the two variables are independent and a change in one variable has no
effect on the other variable, the correlation between these two variables is known
as Zero Correlation.
83. What is Control Space
Ans:
84. Difference between ANOVA and student t-test.
Ans:
t-test | ANOVA
The t-test is a statistical method of analyzing data from designed experiments, whose objective is to compare two sample group means. | ANOVA is a statistical method of analyzing data from designed experiments, whose objective is to compare two or more sample group means.
In inferential statistics, the t-test is applied to evaluate clinical study results when two groups with two sets of measurements are compared. | ANOVA is used to compare mean values when more than two groups are involved in the clinical study.
The t-test is applied for small sample sizes (up to 30). | ANOVA is used for large sample sizes.
85. What factors qualify the mode to be the best measure of central tendency?
Ans:
The mode is the least used of the measures of central tendency, but it is the
only one that can be used with nominal (categorical) data. For this reason,
the mode is the best measure of central tendency (as it is the only one
appropriate to use) when dealing with nominal data.
Normally, the mode is used for categorical data where we wish to know
which is the most common category.
The mean and/or median are usually preferred when dealing with all other
types of data but this does not mean that mode is never used with these data
types.
86. Define α and β error.
Ans:
Type I error (or α error)
Reject H0 when it is true. This first type of error is called the Type I error.
Type I error results in incorrectly claiming that the test product is superior
to the reference, when in reality, there is no difference between the two
products.
Type I error is also called the public risk.
Type II error (or β error)
Fail to reject H0 when it is false. This second type of error is called the Type
II error.
Type II error results in incorrectly failing to reject the null hypothesis, when
in reality, the test product is indeed superior to the reference.
Type II error is also called the sponsor’s risk.
87. Degree of freedom.
Ans: Refer previous question answer
88. Classify observational and experimental studies.
Ans:
89. What is interventional study?
Ans:
In an intervention study, the subjects are selected from one population with a
particular characteristic present; then, immediately after baseline, the total study
group is split up into a group that receives the intervention and a group that does
not receive that intervention (control group). The comparison of the outcomes of
the two groups at the end of the study period is an evaluation of the intervention.
For instance, smokers can be divided into those who will be subject to a smoking-
cessation program and those who will not be motivated to stop smoking.
Interventions have the intention to improve the condition of an individual or a
group of individuals.
Examples
(a) to promote a healthier lifestyle (avoiding smoking, reducing alcohol drinking,
increasing physical activity, etc.),
(b) to prevent HIV transmission.
90. List the characteristics of observational studies.
Ans:
The population is observed without any interference by the investigator.
Individuals can be observed prospectively, retrospectively, or currently.
Treatment and exposures occur in a “noncontrolled” environment.
Why observational studies?
1. Cheaper
2. Faster
3. Can examine long-term effects
4. Hypothesis-generating
5. Sometimes, experimental studies are not ethical
91. Define coefficient of variation.
Ans: Refer previous question answer
92. Characteristics of normal distribution.
Ans: Refer previous question answer
93. Define semi logarithmic plots.
Ans:
Several important kinds of experiments in the pharmaceutical sciences result in
data such that the logarithm of the response (Y) is linearly related to an
independent variable, X.
The semi logarithmic plot is useful when the response (Y) is best depicted as
proportional changes relative to changes in X, or when the spread of Y is very large
and cannot be easily depicted on a rectilinear scale.
Semilog graph paper has the usual equal interval scale on the X axis and the
logarithmic scale on the Y axis.
For example, the distance between 1 and 10 will exactly equal the distance
between 10 and 100 on a logarithmic scale. In particular, first-order kinetic
processes, often apparent in drug degradation and pharmacokinetic systems, show
a linear relationship when log C is plotted versus time.
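The first-order case can be illustrated numerically. In the sketch below the initial concentration and rate constant are hypothetical values; concentrations are generated from C = C0 × e^(-kt), and a straight line is fitted to log10 C versus time. For first-order kinetics the slope of that line is -k/2.303, so the rate constant can be recovered from it.

```python
import math


def fit_line(x, y):
    """Least-squares slope and intercept of y versus x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return slope, mean_y - slope * mean_x


# Hypothetical first-order degradation: C0 = 100 units, k = 0.2 per hour.
c0, k = 100.0, 0.2
times = [0, 1, 2, 3, 4, 5]
conc = [c0 * math.exp(-k * t) for t in times]

# On a semi logarithmic plot the Y axis carries log C, which is linear in time.
log_conc = [math.log10(c) for c in conc]
slope, intercept = fit_line(times, log_conc)

k_estimated = -slope * math.log(10)   # slope = -k / 2.303
c0_estimated = 10 ** intercept
print("estimated k:", k_estimated)
print("estimated C0:", c0_estimated)
```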
94. Application of Post Hoc tests
Ans:
95. Type I and Type II errors in hypothesis testing.
Ans: Refer previous question answer
96. Design Space
Ans:
97. Degrees of freedom.
Ans: Refer previous question answer
98. Define surrogate & direct end point.
Ans:
Direct end point:
A direct endpoint is one that measures the therapeutic effect directly.
Surrogate end point:
A surrogate endpoint is one that is measured in place of the biologically
definitive or clinically most meaningful (direct) endpoint. It tracks the
progress or extent of the disease after direct end point is calculated.
Investigators choose a surrogate when the definitive endpoint is inaccessible
due to cost, time or difficulty of measurement. A good surrogate endpoint
can be measured relatively simply and without invasive procedures. Thus,
using a surrogate endpoint enables a more rapid evaluation of the
treatment effect, resulting in trials with a smaller sample size and leading to
accelerated approvals.
99. Relationship between sample size and power of the study.
Ans:
The larger the sample size in the clinical study, the larger is the power of the
study.
A large sample size will be needed in order to have strong power to detect a small
difference.
The smaller the sample size, the larger the sampling variability (standard error)
in the clinical study and the weaker the power.
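The relationship can be made concrete with an approximate power calculation. The sketch below is a simplification under stated assumptions: two parallel groups of equal size, a known common standard deviation, a two-sided test at α = 0.05, and a normal approximation that ignores the negligible opposite tail. The difference and standard deviation are hypothetical values, and approx_power is a name made up for this example.

```python
from statistics import NormalDist


def approx_power(n_per_group, difference, sd, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test with equal group sizes."""
    z = NormalDist()
    z_critical = z.inv_cdf(1 - alpha / 2)
    standard_error = sd * (2 / n_per_group) ** 0.5   # shrinks as n grows
    return z.cdf(abs(difference) / standard_error - z_critical)


# Hypothetical study: true difference of 5 units, standard deviation of 10 units.
power_n10 = approx_power(10, difference=5, sd=10)
power_n50 = approx_power(50, difference=5, sd=10)
power_n200 = approx_power(200, difference=5, sd=10)

for n, power in [(10, power_n10), (50, power_n50), (200, power_n200)]:
    print("n per group =", n, "approximate power =", round(power, 3))
```

With the same difference and standard deviation, the power rises as the number of subjects per group increases.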