MATHEMATICAL CONCEPTS IN
STATISTICAL ANALYSIS
An overview of key statistical concepts and applications.
PRESENTED BY: MOHAMMED SHAHDAB
CLASS: 3 'B'
BRANCH: ISE
DATE: 2 DECEMBER, 2024
TOPICS
1. Probability Distribution
2. Joint Probability Distribution
3. Markov Chain
4. Sampling Distribution
5. Standard Error and Level of Significance
6. Central Limit Theorem (CLT)
7. Confidence Intervals
8. Analysis of Variance (ANOVA)
9. Completely Randomized Design (CRD)
10. Randomized Block Design (RBD)
1. Probability Distribution
A probability distribution is a statistical function that
describes the likelihood of various outcomes in an
experiment. It provides the probabilities of the
occurrence of different possible values of a random
variable.
Types of Probability Distributions:
Discrete Probability Distribution: Deals with discrete
random variables (e.g., Binomial, Poisson distributions).
Continuous Probability Distribution: Deals with continuous
random variables (e.g., Normal, Exponential
distributions).
Example Problem:
Binomial Distribution
Suppose a coin is tossed 10 times, and we are interested in finding the
probability of getting exactly 6 heads.
Given:
Number of trials (n) = 10
Number of successes (k) = 6
Probability of success on a single trial (p) = 0.5
Probability of failure (q) = 1 - p = 0.5
The binomial probability is given by the formula:
P(X = k) = C(n, k) · p^k · (1 − p)^(n − k)
Solution:
P(X = 6) = C(10, 6) · (0.5)^6 · (0.5)^4 = 210 × (0.5)^10 ≈ 0.205
Thus, the probability of getting exactly 6 heads in 10 tosses is about 0.205.
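The calculation above can be checked with a short Python sketch using the standard library's `math.comb`:

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# 6 heads in 10 fair-coin tosses
prob = binomial_pmf(6, 10, 0.5)
print(round(prob, 3))  # 0.205
```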
2. Joint Probability Distribution
Joint probability distribution describes the probability of two or more random
variables occurring at the same time. The joint probability of two events is
denoted as P(A ∩ B).
Example Problem:
Let’s consider two dice rolled. What is the probability that the sum of the dice is
7, and one die shows a 4?
Total possible outcomes when two dice are rolled = 36 (6 faces on each die).
Favorable outcomes for a sum of 7 with one die showing 4:
(4, 3) and (3, 4)
Thus, there are two favorable outcomes:
P(sum is 7 and one die shows 4) = 2/36 = 1/18
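Enumerating all 36 ordered outcomes of the two dice confirms the joint probability:

```python
from fractions import Fraction

# All ordered (die1, die2) outcomes with sum 7 and a 4 showing.
favorable = [(a, b) for a in range(1, 7) for b in range(1, 7)
             if a + b == 7 and (a == 4 or b == 4)]
prob = Fraction(len(favorable), 36)
print(favorable, prob)  # [(3, 4), (4, 3)] 1/18
```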
3. Markov Chain
A Markov chain is a stochastic process where the future
state depends only on the current state and not on the
previous history. This is known as the memoryless
property.
Transition Matrix:
A Markov chain is defined by a transition matrix P,
where each entry gives the probability of
transitioning from one state to another.
Example Problem:
Consider a simple two-state Markov chain
with the following transition matrix:
P = [0.8  0.3]
    [0.2  0.7]
This matrix represents the probability of transitioning
between two states A and B. State A has a 0.8
probability of staying in state A and a 0.2 probability of
transitioning to state B, while state B has a 0.3 probability
of transitioning to state A and a 0.7 probability of staying
in state B.
We can use this matrix to predict future states by
multiplying the state vector by the transition matrix.
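A minimal sketch of this prediction step, using the column convention of the matrix above (columns index the current state, rows the next state):

```python
# Column-stochastic transition matrix for states (A, B):
# column = current state, row = next state.
P = [[0.8, 0.3],
     [0.2, 0.7]]

def step(P, v):
    """One step: new_v[i] = sum_j P[i][j] * v[j]."""
    return [sum(P[i][j] * v[j] for j in range(len(v))) for i in range(len(P))]

v = [1.0, 0.0]          # start in state A
for _ in range(3):
    v = step(P, v)
print([round(x, 3) for x in v])  # [0.65, 0.35]
```

Iterating further drives the vector toward the chain's stationary distribution.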
4. Sampling Distribution
A sampling distribution describes the distribution of a
sample statistic (e.g., sample mean, sample variance)
based on multiple samples from a population.
Key Concepts:
Central Limit Theorem: As the sample size increases, the
sampling distribution of the sample mean will approach
a normal distribution, regardless of the shape of the
population distribution.
Standard Error: The standard deviation of a sampling
distribution.
Example Problem:
Suppose we take a sample of 30 students from a population of students
with a mean height of 65 inches and a standard deviation of 4 inches.
What is the standard error of the mean?
Given:
Population mean μ = 65 inches
Population standard deviation σ = 4 inches
Sample size n = 30
The standard error (SE) is given by:
SE = σ/√n = 4/√30 ≈ 0.73 inches
Thus, the standard error of the sample mean is 0.73 inches.
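The same computation in Python:

```python
from math import sqrt

def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)

print(round(standard_error(4, 30), 2))  # 0.73
```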
5. Standard Error and Level of
Significance
The standard error (SE) measures the
variability of a sample statistic and is used
in hypothesis testing.
The level of significance (α) is the
threshold for rejecting the null hypothesis.
Common levels are 0.05 (5%) and 0.01
(1%).
Example Problem:
A new drug is being tested for effectiveness. The sample mean blood
pressure reduction is 10 mmHg with a sample standard deviation of 4
mmHg. The null hypothesis is that the drug has no effect, so the population
mean is 0 mmHg.
Test the hypothesis at the 5% significance level.
Given:
Sample mean x̄ = 10 mmHg
Population mean μ = 0 mmHg
Sample standard deviation s = 4 mmHg
Sample size n = 25
We calculate the test statistic using the formula for the t-statistic:
t = (x̄ − μ) / (s/√n) = (10 − 0) / (4/√25) = 10/0.8 = 12.5
Now, compare this value to the critical t-value from the t-distribution for 24
degrees of freedom (df = n - 1) at α = 0.05. The critical value for a one-
tailed test is approximately 1.711.
Since 12.5 > 1.711, we reject the null hypothesis and conclude that the
drug has a significant effect on blood pressure reduction.
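The t-statistic computation can be sketched as follows (the critical value 1.711 is taken from the text, not computed here):

```python
from math import sqrt

# One-sample t-statistic: t = (x̄ − μ0) / (s / √n)
xbar, mu0, s, n = 10, 0, 4, 25
t = (xbar - mu0) / (s / sqrt(n))
print(round(t, 1))  # 12.5

# One-tailed critical t-value for df = 24 at α = 0.05 (from a t-table)
t_crit = 1.711
print(t > t_crit)  # True → reject H0
```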
6. Central Limit Theorem (CLT)
The Central Limit Theorem states that the distribution of the sample mean
approaches a normal distribution as the sample size increases, regardless of
the shape of the population distribution.
Example Problem:
Suppose you have a population with a mean of 50 and a standard
deviation of 10. You take a sample of 25. What is the distribution of the
sample mean?
By the CLT, the sampling distribution of the sample mean will be
approximately normal with:
Mean μ_x̄ = μ = 50
Standard error SE = σ/√n = 10/√25 = 2
Thus, the sample mean distribution is normal with a mean of 50 and a
standard error of 2.
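A small simulation illustrates the CLT here; the uniform population below is an assumed stand-in, chosen so that it has mean 50 and standard deviation 10 but is clearly non-normal:

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)

# Uniform on (50 − 10√3, 50 + 10√3) has mean 50 and sd 10.
a, b = 50 - 10 * sqrt(3), 50 + 10 * sqrt(3)

# Draw many samples of size 25 and collect their means.
sample_means = [mean(random.uniform(a, b) for _ in range(25))
                for _ in range(20_000)]
print(round(mean(sample_means), 1))   # ≈ 50 (population mean)
print(round(stdev(sample_means), 1))  # ≈ 2  (σ/√n = 10/5)
```

Despite the flat population shape, the distribution of sample means is approximately normal with the predicted mean and standard error.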
7. Confidence Intervals
A confidence interval is a range of values that is likely to contain the population
parameter, based on the sample statistic.
Example Problem:
We want to estimate the mean weight of apples in a farm. A sample of 50
apples has a mean weight of 200 grams and a standard deviation of 15 grams.
Construct a 95% confidence interval for the population mean.
Given:
Sample mean x̄ = 200 grams
Sample standard deviation s = 15 grams
Sample size n = 50
The 95% confidence interval is:
x̄ ± 1.96 · s/√n = 200 ± 1.96 · 15/√50 = 200 ± 4.16
Thus, the 95% confidence interval is (195.84, 204.16).
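A quick check of the interval endpoints:

```python
from math import sqrt

xbar, s, n = 200, 15, 50
z = 1.96                       # 95% normal critical value
margin = z * s / sqrt(n)
lo, hi = xbar - margin, xbar + margin
print(round(lo, 2), round(hi, 2))  # 195.84 204.16
```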
8. Analysis of Variance (ANOVA)
ANOVA is a statistical method used to test differences between two or
more group means. It tests the null hypothesis that all group means are
equal.
Example Problem:
Three groups of students are given different study methods, and their test
scores are compared. The data is summarized as follows:
Group   Mean   Variance   Sample size
A       85     16         10
B       90     9          12
C       80     25         8
Perform an ANOVA test at α = 0.05.
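Since only summary statistics are given, the one-way ANOVA F-statistic can be computed directly from the group means, variances, and sample sizes:

```python
# One-way ANOVA from group summaries (mean, variance, n).
groups = [(85, 16, 10), (90, 9, 12), (80, 25, 8)]

N = sum(n for _, _, n in groups)
grand_mean = sum(m * n for m, _, n in groups) / N

ss_between = sum(n * (m - grand_mean) ** 2 for m, _, n in groups)
ss_within = sum((n - 1) * v for _, v, n in groups)
df_between, df_within = len(groups) - 1, N - len(groups)

F = (ss_between / df_between) / (ss_within / df_within)
print(round(F, 2))  # 15.72; F(0.05; 2, 27) ≈ 3.35, so reject H0
```

The computed F (about 15.7) exceeds the tabled critical value, so the group means are significantly different at the 5% level.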
9. Completely Randomized Design
(CRD)
Completely Randomized Design is an experimental design where each
treatment is assigned randomly to the subjects.
Example Problem:
We test three fertilizers on plant growth with 30 plants. The plants are
randomly assigned to one of three fertilizer groups. We use ANOVA to test if
there is a significant difference in growth across the three fertilizers.
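The random assignment step can be sketched as follows (plant labels and group names are illustrative):

```python
import random

random.seed(1)

# Completely randomized design: 30 plants shuffled, then split
# evenly into the three fertilizer groups.
plants = list(range(1, 31))
random.shuffle(plants)
groups = {f: plants[i * 10:(i + 1) * 10] for i, f in enumerate("ABC")}
for fert, members in groups.items():
    print(fert, sorted(members))
```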
10. Randomized Block Design (RBD)
Randomized Block Design (RBD) is a type of experimental design where the
experimental units are first grouped into blocks (groups that are similar to
each other), and within each block, treatments are randomly assigned. This
design helps control for variability within blocks, which can improve the
precision of the results.
In RBD, the main objective is to eliminate or reduce the variation caused by
extraneous variables (known as blocking factors) that are not of primary
interest but could affect the outcome. By grouping similar experimental
units together, you can isolate the effect of the treatment and reduce the
"noise" in your results.
Key Concepts:
1. Blocking:
Blocking is a technique where the variability among experimental units is
grouped into blocks that are more homogeneous or similar. For example, in
agricultural experiments, blocks might be based on soil type, lighting
conditions, or time of day. This allows for the reduction of variability
between blocks, focusing on the effect of the treatment within each block.
2. Randomization:
After grouping the experimental units into blocks, treatments are randomly
assigned to the experimental units within each block. This ensures that the
effects of treatments are independent and unbiased by any external factor
not controlled by blocking.
3. Treatment:
The different conditions or factors you are testing in the experiment, such as
different types of fertilizers, diets, or drugs, that are applied to the
experimental units within each block.
4. Replicates:
Replication refers to applying the treatments to multiple experimental units
within each block. Replicates increase the reliability of the conclusions
drawn from the experiment.
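The within-block randomization described above can be sketched as follows (the block count and treatment names are illustrative assumptions):

```python
import random

random.seed(2)

# Randomized block design sketch: 5 blocks of homogeneous units;
# within each block, the three treatments are assigned in an
# independent random order.
treatments = ["T1", "T2", "T3"]
for block in range(1, 6):
    order = random.sample(treatments, k=len(treatments))
    print(f"block {block}: {order}")
```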
THANK YOU