Hypothesis Testing Notes (EDA)

Hypothesis Testing

Overview

Hypothesis testing is a statistical method used to assess whether sample data provide sufficient evidence to reject a claim about a population parameter.

Hypothesis Testing Process

1. Formulate the null and alternative hypotheses: Define the null hypothesis (H0) and
alternative hypothesis (H1).

2. Choose a significance level (α): Typically 0.05.

3. Select a sample and calculate the test statistic: Calculate the sample mean, proportion, or
other relevant statistic.

4. Determine the critical region: Identify the region where the null hypothesis is rejected.

5. Calculate the p-value: Probability of observing the test statistic under H0.

6. Make a decision: Reject H0 if p-value < α or test statistic falls within the critical region.

7. Interpret the results: Discuss the implications of rejecting or failing to reject H0.

Types of Hypothesis Tests

1. Z-test: For large samples, comparing means or proportions.

2. T-test: For small samples, comparing means.

3. Chi-squared test: For categorical data, testing independence or goodness-of-fit.

4. ANOVA: Comparing means across multiple groups.

5. Regression analysis: Testing relationships between variables.

Common Hypothesis Testing Formulas

1. Z-score: Z = (X̄ - μ) / (σ / √n)

2. T-statistic: t = (X̄ - μ) / (s / √n)

3. Chi-squared statistic: χ² = Σ [(observed - expected)² / expected]

4. p-value: Calculated using statistical software or tables.
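As a quick illustration, these statistics and their p-values can be computed in base R. This is a minimal sketch with made-up summary numbers, not data from these notes:

xbar <- 52; mu0 <- 50; s <- 8; n <- 40       # hypothetical summary statistics
z_stat <- (xbar - mu0) / (s / sqrt(n))        # z-score (treats s as a known sigma)
2 * pnorm(-abs(z_stat))                       # two-sided p-value from the normal distribution
t_stat <- (xbar - mu0) / (s / sqrt(n))        # t-statistic (sigma unknown)
2 * pt(-abs(t_stat), df = n - 1)              # two-sided p-value from t(n - 1)
observed <- c(18, 22, 20)                     # hypothetical category counts
expected <- c(20, 20, 20)
chisq <- sum((observed - expected)^2 / expected)              # chi-squared statistic
pchisq(chisq, df = length(observed) - 1, lower.tail = FALSE)  # goodness-of-fit p-value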

Interpretation of Results

1. Reject H0: Significant difference/effect exists (p-value < α).

2. Fail to reject H0: No significant difference/effect (p-value ≥ α).

3. Type I error: Rejecting H0 when true (α).


4. Type II error: Failing to reject H0 when false (β).

Statistical Inference of Two Samples


Overview

Statistical inference for two samples involves comparing the characteristics of two groups to
determine if there's a significant difference between them.

Types of Two-Sample Tests

1. Independent Samples T-Test: Compares means of two independent groups.

2. Paired Samples T-Test: Compares means of two related groups (e.g., before-after).

3. Mann-Whitney U Test: Compares medians of two independent groups (non-parametric).

4. Wilcoxon Signed-Rank Test: Compares medians of two related groups (non-parametric).

5. Chi-Squared Test: Compares proportions of two independent groups.

Assumptions

1. Independence: Samples are randomly selected and independent.

2. Normality: Data follows a normal distribution (for parametric tests).

3. Equal Variances: Variances are equal across groups (for parametric tests).

Hypothesis Testing

1. Null Hypothesis (H0): No significant difference between groups.

2. Alternative Hypothesis (H1): Significant difference between groups.

3. Significance Level (α): Typically 0.05.

4. Test Statistic: Calculated value (e.g., t-statistic, z-score).

5. p-value: Probability of observing the test statistic under H0.

Interpretation

1. Reject H0: Significant difference between groups (p-value < α).

2. Fail to reject H0: No significant difference between groups (p-value ≥ α).

3. Type I Error: Rejecting H0 when true (α).

4. Type II Error: Failing to reject H0 when false (β).

Common Formulas

1. Independent Samples T-Test: t = (X̄1 - X̄2) / sqrt((s1²/n1) + (s2²/n2))

2. Paired Samples T-Test: t = X̄d / (sd / sqrt(n))

3. Mann-Whitney U Test: U = n1n2 + (n1(n1+1)/2) - R1
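In R these tests map onto built-in functions. A minimal sketch with two made-up samples (not data from these notes):

x1 <- c(5.1, 4.9, 6.2, 5.8, 5.5)    # hypothetical group 1
x2 <- c(4.2, 4.8, 5.0, 4.4, 4.6)    # hypothetical group 2
t.test(x1, x2)                       # independent-samples t-test (Welch by default)
t.test(x1, x2, var.equal = TRUE)     # pooled-variance version
t.test(x1, x2, paired = TRUE)        # paired t-test (observations must be paired)
wilcox.test(x1, x2)                  # Mann-Whitney U / Wilcoxon rank-sum test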

Problem Example
Coffee Shop Owner's Claim

A coffee shop owner claims that the average amount of coffee consumed per customer is 250ml.
A random sample of 36 customers shows an average consumption of 270ml with a standard
deviation of 50ml. Test the claim at a 5% significance level.

Given Values

1. Sample size (n): 36

2. Sample mean (X̄ ): 270ml

3. Sample standard deviation (s): 50ml

4. Population mean (μ): 250ml (claimed)

5. Significance level (α): 0.05

Hypothesis

1. H0: μ = 250 (null hypothesis)

2. H1: μ ≠ 250 (alternative hypothesis)

Calculation

1. Calculate the test statistic (t): t = (X̄ - μ) / (s / √n) = (270 - 250) / (50 / √36) = 20 / 8.33 ≈ 2.4

2. Determine the degrees of freedom: df = n - 1 = 36 - 1 = 35

3. Find the critical t-value: Using a t-distribution table or software, for α = 0.05 (two-tailed) and df = 35, t-critical ≈ ±2.030.

4. Calculate the p-value: Using software or a t-distribution table, the p-value for t ≈ 2.4 with df = 35 is ≈ 0.022.

Interpretation

Since the calculated t-value (2.4) exceeds the critical t-value (2.030), and the p-value (≈ 0.022) is
less than α (0.05), we:

1. Reject H0: The average amount of coffee consumed per customer is significantly
different from 250ml.

2. Conclude: The coffee shop owner's claim is incorrect. The actual average consumption is
likely higher than 250ml.

Software Output
Note that R's t.test() requires the raw observations, so it cannot be called with only a mean, standard deviation and sample size; the equivalent computation from the summary statistics is:

xbar <- 270; mu0 <- 250; s <- 50; n <- 36
t_stat <- (xbar - mu0) / (s / sqrt(n))                    # ≈ 2.4
p_value <- 2 * pt(-abs(t_stat), df = n - 1)               # ≈ 0.022
ci <- xbar + c(-1, 1) * qt(0.975, n - 1) * s / sqrt(n)    # 95% CI ≈ (253.1, 286.9)

This confirms the manual calculations.

One-Sided vs Two-Sided Hypothesis Tests


One-Sided Hypothesis (Directional Test)

1. Alternative hypothesis (H1) is directional: Specifies the direction of the difference or relationship.

2. Null hypothesis (H0) is opposite: States the absence of the specified effect or difference.

3. Tested in one direction: Only one tail of the distribution is considered.

4. Example: H0: μ ≤ 10, H1: μ > 10 (testing if the mean is greater than 10)

Two-Sided Hypothesis (Non-Directional Test)

1. Alternative hypothesis (H1) is non-directional: Doesn't specify the direction of the difference or relationship.

2. Null hypothesis (H0) states equality: States the absence of any difference or relationship.

3. Tested in both directions: Both tails of the distribution are considered.

4. Example: H0: μ = 10, H1: μ ≠ 10 (testing if the mean is different from 10)

Key Differences

1. Directionality: One-sided tests have a directional alternative hypothesis, while two-sided tests have a non-directional alternative hypothesis.

2. Null hypothesis: One-sided tests have a null hypothesis that specifies the absence of the
effect in one direction, while two-sided tests have a null hypothesis that specifies
equality.

3. Critical region: One-sided tests have a critical region in one tail, while two-sided tests
have critical regions in both tails.

4. p-value calculation: One-sided tests calculate the p-value using the area in one tail, while
two-sided tests calculate the p-value using the area in both tails.
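As a small illustration (the values are made up), the two conventions differ only in which tail area is reported; in R:

t_stat <- 2.1; df <- 24                 # illustrative values
pt(t_stat, df, lower.tail = FALSE)      # one-sided p-value (H1: mu > mu0)
2 * pt(-abs(t_stat), df)                # two-sided p-value (H1: mu != mu0)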

When to Use

1. One-sided test: Use when:

 You have a prior expectation about the direction of the effect.

 The alternative hypothesis is directional.

 You want to detect an increase or decrease.

2. Two-sided test: Use when:


 You don't have a prior expectation about the direction of the effect.

 The alternative hypothesis is non-directional.

 You want to detect any difference (increase or decrease).

Examples

1. One-sided test: Testing whether a new medicine increases blood pressure (H0: μ ≤ 120,
H1: μ > 120).

2. Two-sided test: Testing whether a new exercise program affects blood pressure (H0: μ =
120, H1: μ ≠ 120).

Common Mistakes

1. Incorrectly specifying the direction of the alternative hypothesis.

2. Failing to consider the directionality of the test.

3. Misinterpreting the results of a one-sided test.

Additional Problem Example


Battery Lifespan Claim

A company claims that the average lifespan of its batteries is at least 500 hours. A random
sample of 25 batteries has a mean lifespan of 520 hours with a standard deviation of 30 hours.
Test the claim at a 5% significance level.

Given Values

1. Sample size (n): 25

2. Sample mean (X̄ ): 520

3. Sample standard deviation (s): 30

4. Population mean (μ): 500 (claimed)

5. Significance level (α): 0.05

Hypotheses

1. H0: μ ≤ 500 (null hypothesis)

2. H1: μ > 500 (alternative hypothesis, one-sided)

Test Statistic

t = (X̄ - μ) / (s / √n)
t = (520 - 500) / (30 / √25)
t = 20 / 6
t ≈ 3.33

Degrees of Freedom

df = n - 1
df = 25 - 1
df = 24

Critical Region

One-sided test, α = 0.05


Critical t-value ≈ 1.711 (using t-distribution table)

p-value

p-value ≈ 0.0013 (using t-distribution table or software)

Decision

Reject H0: Since t ≈ 3.33 > 1.711 and p-value ≈ 0.0013 < α = 0.05.
Conclude: The data support the claim that the mean battery lifespan exceeds 500 hours.
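A quick check in base R (computed from the summary statistics, since t.test() needs raw observations):

t_stat <- (520 - 500) / (30 / sqrt(25))    # ≈ 3.33
qt(0.95, df = 24)                           # one-sided critical value ≈ 1.711
pt(t_stat, df = 24, lower.tail = FALSE)     # one-sided p-value ≈ 0.0013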

Adult Height Claim (One-Sample, Two-Sided Test)

A researcher claims that the average height of adults in a population is 175 cm.
A random sample of 36 adults has a mean height of 178 cm with a standard deviation of 8 cm.
Test the claim at a 5% significance level.

Given Values

Sample size (n): 36


Sample mean (X̄ ): 178 cm
Sample standard deviation (s): 8 cm
Population mean (μ): 175 cm (claimed)
Significance level (α): 0.05

Hypotheses

H0: μ = 175 (null hypothesis)


H1: μ ≠ 175 (alternative hypothesis, two-sided)

Test Statistic
t = (X̄ - μ) / (s / √n)
t = (178 - 175) / (8 / √36)
t = 3 / 1.333
t ≈ 2.25

Degrees of Freedom

df = n - 1
df = 36 - 1
df = 35

Critical Region

Two-sided test, α = 0.05


Critical t-values ≈ ±2.030 (using t-distribution table)

p-value

p-value ≈ 0.031 (using t-distribution table or software)

Decision

Reject H0: Since |t| ≈ 2.25 > 2.030 and p-value ≈ 0.031 < α = 0.05.
Conclude: The researcher's claim is not supported; the average height of adults is significantly
different from 175 cm.

Software Output

As above, t.test() needs the raw observations, so the equivalent computation from the summary statistics is:

xbar <- 178; mu0 <- 175; s <- 8; n <- 36
t_stat <- (xbar - mu0) / (s / sqrt(n))                    # ≈ 2.25
p_value <- 2 * pt(-abs(t_stat), df = n - 1)               # ≈ 0.031
ci <- xbar + c(-1, 1) * qt(0.975, n - 1) * s / sqrt(n)    # 95% CI ≈ (175.3, 180.7)

Interpretation

The test results indicate that the average height of adults is significantly different from 175 cm,
supporting the alternative hypothesis. This suggests that the researcher's claim may be incorrect.

Key Concepts and Formulas


Hypothesis Testing

Null Hypothesis (H0): μ = μ0 (population mean equals a known value)


Alternative Hypothesis (H1): μ ≠ μ0 (two-tailed), μ > μ0 (one-tailed, right), or μ < μ0 (one-tailed,
left)
Test Statistic: z = (x̄ - μ0) / (σ / √n) or t = (x̄ - μ0) / (s / √n)
p-value: Probability of observing the test statistic under H0
Critical Region: Range of values where H0 is rejected

Types of Tests

One-Sample Z-Test: Known population standard deviation (σ)


One-Sample T-Test: Unknown population standard deviation (s)
Two-Sample Z-Test: Comparing means of two independent samples (population variances known)
Two-Sample T-Test: Comparing means of two independent samples (population variances unknown)

Formulas

One-Sample Z-Test: z = (x̄ - μ0) / (σ / √n)


One-Sample T-Test: t = (x̄ - μ0) / (s / √n)
Two-Sample Z-Test: z = ((x̄1 - x̄2) - (μ1 - μ2)) / √((σ1²/n1) + (σ2²/n2))
Two-Sample T-Test: t = ((x̄1 - x̄2) - (μ1 - μ2)) / √((s1²/n1) + (s2²/n2))

Assumptions

Normality: Data follows a normal distribution


Independence: Observations are independent
Equal Variances: Variances are equal across groups (for two-sample tests)

Variance and Standard Deviation Testing


Hypothesis Testing

Null Hypothesis (H0): σ² = σ0² (population variance equals a known value)


Alternative Hypothesis (H1): σ² ≠ σ0² (two-tailed), σ² > σ0² (one-tailed, right), or σ² < σ0² (one-
tailed, left)
Test Statistic: χ² = (n - 1)s² / σ0²
p-value: Probability of observing the test statistic under H0
Critical Region: Range of values where H0 is rejected

Assumptions

Normality: Data follows a normal distribution


Independence: Observations are independent
Random Sampling: Sample is randomly selected

Test Procedure

Calculate sample variance (s²).


Choose significance level (α).
Determine degrees of freedom (df = n - 1).
Calculate test statistic (χ²).
Find p-value or critical χ²-value.
Make decision: Reject H0 if p-value < α or χ² falls outside the critical values (in the critical region).

Formulas

Sample Variance: s² = Σ(xi - x̄ )² / (n - 1)


Test Statistic: χ² = (n - 1)s² / σ0²
Standard Error: SE = √(s² / (2(n - 1)))

Types of Tests

One-Sample Chi-Square Test: Testing variance of one sample.


Two-Sample F-Test: Comparing variances of two independent samples.

Example Problems
Example Problem 1

Suppose we want to test whether the variance of exam scores is 100, given a sample of 25
students with a sample variance of 120.
H0: σ² = 100
H1: σ² ≠ 100
α = 0.05
df = 25 - 1 = 24
χ² = (24)(120) / 100 = 28.8
Critical χ²-values ≈ 12.401 and 39.364
p-value ≈ 0.45 (two-tailed)
Fail to reject H0: 28.8 lies between the critical values, so there is insufficient evidence that σ² ≠ 100.
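A quick check in base R (the two-tailed p-value doubles the smaller tail area):

chisq <- 24 * 120 / 100                             # 28.8
qchisq(c(0.025, 0.975), df = 24)                    # critical values ≈ 12.401, 39.364
2 * min(pchisq(chisq, df = 24),
        pchisq(chisq, df = 24, lower.tail = FALSE)) # two-tailed p-value ≈ 0.45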

Example Problem 2

A manufacturer claims that the variance of the weights of its bags of flour is 0.04 kg².
A random sample of 25 bags has a sample variance of 0.06 kg².
Test the claim at a 5% significance level.

Given Values

Sample size (n): 25


Sample variance (s²): 0.06 kg²
Population variance (σ0²): 0.04 kg² (claimed)
Significance level (α): 0.05

Hypotheses

H0: σ² = 0.04 kg² (null hypothesis)


H1: σ² ≠ 0.04 kg² (alternative hypothesis, two-tailed)

Test Statistic

χ² = (n - 1)s² / σ0²
χ² = (25 - 1)(0.06) / 0.04
χ² = 24(0.06) / 0.04
χ² ≈ 36

Critical Region

Two-tailed test, α = 0.05


Critical χ²-values ≈ 12.401 and 39.364 (using χ²-distribution table)

p-value

p-value ≈ 0.11 (two-tailed, using χ²-distribution table or software)

Decision

Fail to reject H0: χ² = 36 lies between the critical values 12.401 and 39.364, and the p-value ≈ 0.11
is greater than α = 0.05.
Conclude: There is insufficient evidence that the variance of the weights differs from 0.04 kg².

Interpretation
The test results do not show a significant difference from the claimed variance of 0.04 kg² at the
5% level, so the manufacturer's claim cannot be rejected, even though the sample variance
(0.06 kg²) is noticeably larger.

Testing on a Population Proportion


Hypothesis Testing

Null Hypothesis (H0): p = p0 (population proportion equals a known value)


Alternative Hypothesis (H1): p ≠ p0 (two-tailed), p > p0 (one-tailed, right), or p < p0 (one-tailed,
left)
Test Statistic: z = (p̂ - p0) / √(p0(1-p0)/n)
p-value: Probability of observing the test statistic under H0
Critical Region: Range of values where H0 is rejected

Assumptions

Random Sampling: Sample is randomly selected


Independence: Observations are independent
Large Sample Size: np0 ≥ 5 and n(1-p0) ≥ 5 (some texts require 10)

Test Procedure

Calculate sample proportion (p̂ ).


Choose significance level (α).
Determine test statistic (z).
Find p-value or critical z-value.
Make decision: Reject H0 if p-value < α or |z| > critical z-value (for a two-tailed test).

Formulas

Sample Proportion: p̂ = (Number of successes) / n


Test Statistic: z = (p̂ - p0) / √(p0(1-p0)/n)
Standard Error: SE = √(p0(1-p0)/n)

Types of Tests

One-Proportion Z-Test: Testing proportion of one population.


Two-Proportion Z-Test: Comparing proportions of two independent populations.

Example

Suppose we want to test whether the proportion of smokers in a population is 0.3, given a sample
of 100 individuals with 35 smokers.
H0: p = 0.3
H1: p ≠ 0.3
α = 0.05
p̂ = 35/100 = 0.35
z = (0.35 - 0.3) / √(0.3(1-0.3)/100) ≈ 1.09
p-value ≈ 0.28
Fail to reject H0: there is insufficient evidence that p differs from 0.3.
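A quick check in base R; with the continuity correction turned off, prop.test() reports the equivalent chi-squared statistic (the square of z):

z <- (0.35 - 0.3) / sqrt(0.3 * 0.7 / 100)     # ≈ 1.09
2 * pnorm(-abs(z))                            # two-sided p-value ≈ 0.28
prop.test(35, 100, p = 0.3, correct = FALSE)  # X-squared ≈ 1.19 = z^2, same p-value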

Problem Solving for Test on a Population Proportion


A company claims that 80% of its customers are satisfied with their service. A random sample of
200 customers found 154 satisfied customers. Test the claim at a 5% significance level.
Given Values: Sample size (n): 200, Number of successes (x): 154 (satisfied customers), Sample
proportion (p̂ ): 154/200 = 0.77, Population proportion (p0): 0.8 (claimed), Significance level (α):
0.05.
Hypotheses: H0: p = 0.8 (null hypothesis), H1: p ≠ 0.8 (alternative hypothesis, two-tailed).
Test Statistic: z = (p̂ - p0) / √(p0(1-p0)/n) → z = (0.77 - 0.8) / √(0.8(1-0.8)/200) → z ≈ -1.06.
Degrees of Freedom: Not applicable for a one-proportion z-test.
Critical Region: Two-tailed test, α = 0.05, Critical z-values ≈ ±1.96.
p-value: p-value ≈ 0.29 (using z-distribution table or software).
Decision: Fail to reject H0: Since |z| ≈ 1.06 < 1.96 and p-value ≈ 0.29 > α = 0.05. Conclude: The
sample does not provide sufficient evidence against the company's claim that 80% of its
customers are satisfied.

Statistical Inference for Two Samples


Hypothesis Testing

Two-Sample T-Test: Compare means of two independent samples.


Two-Sample Z-Test: Compare proportions of two independent samples.
Wilcoxon Rank-Sum Test / Mann-Whitney U Test (two names for the same test): Compare the distributions of two independent samples (non-parametric).

Confidence Intervals

Two-Sample T-Interval: Estimate difference between means.


Two-Sample Z-Interval: Estimate difference between proportions.

Assumptions

Independence: Samples are randomly selected and independent.


Normality: Data follows normal distribution (for parametric tests).
Equal Variances: Variances are equal across samples (for parametric tests).

Test Statistics

Two-Sample T-Test: t = (x̄1 - x̄2) / sqrt((s1²/n1) + (s2²/n2))


Two-Sample Z-Test: z = (p̂1 - p̂2) / sqrt((p̂1(1-p̂1)/n1) + (p̂2(1-p̂2)/n2)) (a pooled estimate p̂ is often used under H0)

Interpretation

p-value: Probability of observing test statistic under null hypothesis.


Confidence Interval: Range of values for population parameter.
Effect Size: Standardized difference between means (e.g., Cohen's d).

Common Tests

Paired T-Test: Compare means of paired samples.


Two-Sample Test of Proportions: Compare proportions of two independent samples.
Kruskal-Wallis Test: Compare distributions of three or more independent samples (non-parametric; an extension beyond two groups).

Example

Suppose we want to compare the average heights of males and females.


H0: μ1 = μ2 (null hypothesis)
H1: μ1 ≠ μ2 (alternative hypothesis)
α = 0.05
Sample sizes: n1 = 50 (males), n2 = 50 (females)
Sample means: x̄1 = 175.2 cm, x̄2 = 162.1 cm
Sample standard deviations: s1 = 5.5 cm, s2 = 4.8 cm
t ≈ 12.69
p-value < 0.0001
Reject H0, conclude μ1 ≠ μ2.

Problem

A researcher wants to compare the average exam scores of students from two different teaching
methods. Method A (traditional) and Method B (online). A random sample of 25 students from
each method yielded:
Method A (Traditional): Sample size (n1): 25, Sample mean (x̄ 1): 80, Sample standard deviation
(s1): 10
Method B (Online): Sample size (n2): 25, Sample mean (x̄ 2): 85, Sample standard deviation (s2):
12
Significance level (α): 0.05.
Hypotheses: H0: μ1 = μ2 (null hypothesis), H1: μ1 ≠ μ2 (alternative hypothesis, two-tailed).
Test Statistic: Two-Sample T-Test: t = (x̄1 - x̄2) / sqrt((s1²/n1) + (s2²/n2)) → t = (80 - 85) /
sqrt((10²/25) + (12²/25)) → t ≈ -1.60.
Degrees of Freedom: df = n1 + n2 - 2 → df = 25 + 25 - 2 → df = 48.
Critical Region: Two-tailed test, α = 0.05, Critical t-values ≈ ±2.011 (using t-distribution table).
p-value: p-value ≈ 0.12 (using t-distribution table or software).
Decision: Fail to reject H0: Since |t| ≈ 1.60 < 2.011 and p-value ≈ 0.12 > α = 0.05. Conclude: No
significant difference between the average exam scores of students under Method A and Method B.
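A quick check in base R, using the df = n1 + n2 - 2 convention from the problem:

se <- sqrt(10^2 / 25 + 12^2 / 25)    # ≈ 3.12
t_stat <- (80 - 85) / se             # ≈ -1.60
2 * pt(-abs(t_stat), df = 48)        # two-sided p-value ≈ 0.12
qt(0.975, df = 48)                   # critical value ≈ 2.011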

Inference on the Difference in Means of Two Normal Distributions

Hypothesis Testing

Null Hypothesis (H0): μ1 = μ2 (equal means)


Alternative Hypothesis (H1): μ1 ≠ μ2 (unequal means)
Test Statistic: Z = (x̄1 - x̄2) / sqrt((σ1^2 / n1) + (σ2^2 / n2))
Critical Region: Reject H0 if |Z| > Zα/2 (two-tailed test)

Confidence Interval

Confidence Interval: (x̄1 - x̄2) ± Zα/2 * sqrt((σ1^2 / n1) + (σ2^2 / n2))


Margin of Error: Zα/2 * sqrt((σ1^2 / n1) + (σ2^2 / n2))

Assumptions

Normality: Both populations are normally distributed.


Independence: Samples are independent.
Known Variances: Population variances (σ1^2, σ2^2) are known.

Example

Suppose we want to compare the mean heights of men and women.


Sample Data: Men (n1 = 100): x̄1 = 175.2 cm, σ1^2 = 10^2; Women (n2 = 100): x̄2 = 162.1 cm,
σ2^2 = 8^2.
Hypothesis Test: H0: μ1 = μ2, H1: μ1 ≠ μ2, Z = (175.2 - 162.1) / sqrt((10^2 / 100) + (8^2 / 100))
≈ 10.23, Reject H0 (p-value ≈ 0).
95% Confidence Interval: (175.2 - 162.1) ± 1.96 * sqrt((10^2 / 100) + (8^2 / 100)) ≈ (10.59, 15.61).
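A quick check in base R:

se <- sqrt(10^2 / 100 + 8^2 / 100)               # ≈ 1.28
z <- (175.2 - 162.1) / se                        # ≈ 10.23
2 * pnorm(-abs(z))                               # p-value ≈ 0
(175.2 - 162.1) + c(-1, 1) * qnorm(0.975) * se   # 95% CI ≈ (10.59, 15.61)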

Key Considerations

Sample Size: Ensure adequate sample sizes (n1, n2) for reliable inference.
Variance Homogeneity: Verify equal variances (σ1^2 = σ2^2) for pooled variance estimates.
Non-Parametric Alternatives: Consider non-parametric tests (e.g., Wilcoxon rank-sum test) for
non-normal data.


Example: Comparing Exam Scores

Suppose we want to compare the mean exam scores of two classes.


Sample 1 (Class A): x̄1 = 85, n1 = 50, σ1^2 = 100
Sample 2 (Class B): x̄2 = 80, n2 = 60, σ2^2 = 120
Hypothesis:
H0: μ1 = μ2
H1: μ1 ≠ μ2
α: 0.05

Solution Steps

Test statistic: Z = (85 - 80) / sqrt((100 / 50) + (120 / 60)) = 5 / 2 = 2.5


Critical region: Reject H0 if |Z| > 1.96
p-value: P(|Z| > 2.5) ≈ 0.012
Decision: Reject H0 (p-value < 0.05)

Confidence Interval

Margin of error: 1.96 * sqrt((100 / 50) + (120 / 60)) = 1.96 * 2 = 3.92


Confidence interval: (85 - 80) ± 3.92 ≈ (1.08, 8.92). The interval excludes 0, consistent with rejecting H0.

Inference on the Difference in Means


Hypothesis Testing

Null Hypothesis (H0): μ1 = μ2 (equal means)


Alternative Hypothesis (H1): μ1 ≠ μ2 (unequal means)
Test Statistic: t = (x̄1 - x̄2) / sqrt((s1^2 / n1) + (s2^2 / n2))
Degrees of Freedom: min(n1-1, n2-1) as a conservative choice, or the Welch–Satterthwaite approximation when variances are unequal

Confidence Interval

Confidence Interval: (x̄1 - x̄2) ± tα/2 * sqrt((s1^2 / n1) + (s2^2 / n2))


Margin of Error: tα/2 * sqrt((s1^2 / n1) + (s2^2 / n2))

Assumptions

Normality: Both populations are normally distributed.


Independence: Samples are independent.
Equal Variances: Population variances (σ1^2, σ2^2) are equal.

Types of Tests

Pooled Variance Test: Assumes equal variances.


Welch's Test: Does not assume equal variances.
t-Test: Suitable for small samples.

Example: Comparing Heights of Men and Women

Given Data

Men: x̄1 = 175.2 cm, n1 = 100, s1^2 = 10^2


Women: x̄2 = 162.1 cm, n2 = 100, s2^2 = 8^2
α = 0.05

Hypothesis

H0: μ1 = μ2 (equal means)


H1: μ1 ≠ μ2 (unequal means)

Solution: Pooled Variance Test

Pooled variance: sp^2 = ((n1-1)s1^2 + (n2-1)s2^2) / (n1+n2-2) = (99*100 + 99*64) / 198 = 82
Standard error: SE = sqrt(sp^2 * (1/n1 + 1/n2)) = sqrt(82 * (1/100 + 1/100)) ≈ 1.28
Test statistic: t = (x̄1 - x̄2) / SE = (175.2 - 162.1) / 1.28 ≈ 10.23
Degrees of freedom: df = n1 + n2 - 2 = 198
Critical value: t0.025,198 ≈ 1.97
p-value: P(|t| > 10.23) ≈ 0

Conclusion

Reject H0. Mean heights differ significantly (p < 0.05).

Confidence Interval

Margin of error: ME = t0.025,198 * SE ≈ 1.97 * 1.28 ≈ 2.52


95% CI: (175.2 - 162.1) ± 2.52 ≈ (10.58, 15.62)
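The same steps in base R:

sp2 <- (99 * 100 + 99 * 64) / 198     # pooled variance = 82
se  <- sqrt(sp2 * (1/100 + 1/100))    # ≈ 1.28
t_stat <- (175.2 - 162.1) / se        # ≈ 10.23
2 * pt(-abs(t_stat), df = 198)        # p-value ≈ 0
qt(0.975, df = 198)                   # critical value ≈ 1.97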

Simple Linear Regression


Definition and Goals

Definition: A statistical method to model the relationship between a dependent variable (y) and an
independent variable (x).
Equation: y = β0 + β1x + ε (ε = error term)
Goals: Predict y values, identify relationships, and estimate the coefficients (β0, β1).

Assumptions

Linearity, independence, homoscedasticity, and normality of errors (multicollinearity only becomes a concern with multiple predictors).

Correlation

Definition: Measures the strength and direction of a linear relationship between two variables.
Types: Pearson's r (parametric), Spearman's ρ (non-parametric), and Kendall's τ.
Interpretation: Values range from -1 (perfect negative correlation) to 1 (perfect positive
correlation).
Correlation coefficient (r): Measures strength and direction.

Differences Between Regression and Correlation

Purpose: Regression predicts y values, while correlation measures relationship strength.


Direction: Regression treats one variable as the predictor and one as the response (it does not by itself establish causality), whereas correlation is symmetric and measures association.
Equation: Regression provides a predictive model, whereas correlation provides a single coefficient.

Relationship Between Regression and Correlation

Key Concepts

Correlation coefficient (r): In simple linear regression, r² equals R-squared (the coefficient of determination), and r carries the sign of the slope.
R-squared: Measures the proportion of variability explained by the regression model.
Regression slope (β1): Related to the correlation coefficient by β1 = r(sy / sx).

Statistical Tests

t-test: Evaluates regression coefficients.


F-test: Assesses overall model significance.
p-value: Probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true.

Common Metrics

Mean Squared Error (MSE): Measures regression model accuracy.


Coefficient of Determination (R-squared): Evaluates model fit.
Root Mean Squared Error (RMSE): Measures model accuracy.

Problem Example: Bakery Sales Prediction


Data
A bakery wants to predict the number of bread loaves sold based on the number of hours
advertised (x).
Data:

Hours Advertised (x)    Bread Loaves Sold (y)
2                       100
4                       150
6                       200
8                       250
10                      300

Step-by-Step Solution

Calculate means: x̄ = (2+4+6+8+10)/5 = 6, ȳ = (100+150+200+250+300)/5 = 200


Calculate deviations:

x     y     x - x̄   y - ȳ   (x - x̄)(y - ȳ)   (x - x̄)²
2     100   -4       -100     400               16
4     150   -2       -50      100               4
6     200   0        0        0                 0
8     250   2        50       100               4
10    300   4        100      400               16

Calculate slope (β1): β1 = Σ[(x-6)(y-200)] / Σ(x-6)² = (400+100+0+100+400) / (16+4+0+4+16)
= 1000 / 40 = 25
Calculate intercept (β0): β0 = 200 - 25*6 = 200 - 150 = 50
Linear Regression Equation: y = 50 + 25x
Plot the data and fitted line on a scatter plot.
Interpret the result: Every additional hour advertised increases bread loaves sold by 25. The
bakery sells 50 loaves when no hours are advertised.
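The fit can be reproduced with R's lm(); because the five points are exactly linear, the least-squares estimates match the hand calculation exactly:

x <- c(2, 4, 6, 8, 10)
y <- c(100, 150, 200, 250, 300)
fit <- lm(y ~ x)          # least-squares fit
coef(fit)                 # intercept = 50, slope = 25
plot(x, y); abline(fit)   # scatter plot with fitted line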

Empirical Models
Definition

Empirical models are mathematical models based on observed data, experience and statistical
analysis rather than purely theoretical assumptions. They describe relationships between
variables, predict outcomes and estimate parameters.

Types of Empirical Models

Linear Regression: Models linear relationships between variables.


Non-Linear Regression: Models non-linear relationships using polynomial or logarithmic
functions.
Time Series Models: Analyze and forecast data with temporal dependencies (e.g., ARIMA,
Exponential Smoothing).
Machine Learning Models: Algorithms like decision trees, random forests, and neural networks.
Econometric Models: Study economic relationships and forecast economic indicators (e.g., GDP,
inflation).
Statistical Models: Hypothesis testing and confidence intervals (e.g., t-tests, ANOVA).

Characteristics

Data-driven: Derived from observational data.


Pragmatic: Focus on predictive accuracy rather than theoretical purity.
Flexible: Can accommodate non-linear relationships and interactions.
Interpretable: Provide insights into variable relationships.

Applications

Forecasting: Predict future values of economic indicators, sales or demand.


Policy Evaluation: Assess impact of policy interventions.
Risk Analysis: Estimate probability of adverse events.
Optimization: Identify optimal settings for system performance.
Data Mining: Discover hidden patterns and relationships.

Advantages and Limitations of Empirical Models

Advantages

Improved prediction: Better forecasting accuracy.


Practical insights: Inform decision-making.
Flexibility: Handle complex relationships.
Interpretability: Understand variable interactions.

Limitations

Data quality: Sensitive to data errors and biases.


Overfitting: Models may fit noise rather than underlying patterns.
Limited generalizability: Models may not apply outside the data range.
Assumptions: Require careful validation.

Common Empirical Modeling Techniques

Least Squares Estimation


Maximum Likelihood Estimation
Cross-Validation
Bootstrap Resampling
Feature Engineering

Real-World Examples of Empirical Models

Demand forecasting for supply chain optimization.


Credit risk modeling for loan approval.
Economic forecasting for fiscal policy.
Customer segmentation for targeted marketing.
Quality control in manufacturing.

Example: Predicting Coffee Shop Sales


Problem Statement

A coffee shop owner wants to predict daily sales based on advertising expenditure.

Data

Independent variable (x): Advertising expenditure ($1,000s)


Dependent variable (y): Daily sales ($1,000s)
Sample size: 10 weeks
Data:

Week   Advertising (x)   Sales (y)
1      2                 10
2      3                 12
3      4                 15
4      2                 9
5      5                 18
6      3                 11
7      4                 16
8      6                 20
9      5                 19
10     4                 17

Empirical Model: Linear Regression

y = β0 + β1x + ε
Estimated Model (least squares on the data above):
β0: Intercept ≈ 3.64
β1: Slope ≈ 2.91
Estimated Model Equation: y ≈ 3.64 + 2.91x

Interpretation

For every additional $1,000 spent on advertising, sales increase by about $2,910.


Predicted sales are about $3,640 with no advertising.
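A quick check with lm() on the ten weekly observations:

x <- c(2, 3, 4, 2, 5, 3, 4, 6, 5, 4)
y <- c(10, 12, 15, 9, 18, 11, 16, 20, 19, 17)
coef(lm(y ~ x))   # intercept ≈ 3.64, slope ≈ 2.91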

Limitations

Assumes linear relationship.


Ignores seasonality, competition and economic factors.
