Report

The document analyzes the relationship between genetic diversity and migration distance using linear regression. It finds a strong negative correlation, with diversity decreasing as distance increases. The regression model explains 84.65% of the variability in diversity. However, some assumptions of the model are not fully met, as variability is greater at higher distances and some outliers exist. The regression equation predicts that for every 1,000 km increase in distance, diversity decreases by about 0.008 units. It also provides a prediction for the diversity of an unsampled population that migrated 15,000 km, including 95% confidence and prediction intervals. An alternative ANOVA model to test for differences in mean diversity between regions is also described and its assumptions are checked, finding normality is met.

Uploaded by

Joaquin

Question 1.

Analyze the relationship between Diversity and Migration Distance using linear
regression.

The correlation between these two variables is obtained through the menu sequence Stat / Basic
Statistics / Correlation.

Table 1.

Correlation between Diversity and Migration Distance.

Pearson correlation  -0.920
P-value               0.000

There is a strong negative linear correlation between the Diversity and Migration Distance
variables (r = -0.920, p < .001): the genetic diversity measure tends to decrease as the migration
distance increases.

[Figure: Scatterplot for Distance and Diversity. Vertical axis: Diversity (0.50 to 0.80);
horizontal axis: Distance (0 to 30000). Note on the plot: Data from a study of human genetic diversity.]

Figure 1.
Scatterplot for migration distance and genetic diversity measure.

Table 2.

Regression Analysis

Regression Analysis: Diversity versus Distance

Analysis of Variance
Source          DF  Adj SS    Adj MS    F-Value  P-Value
Regression       1  0.112752  0.112752   242.66    0.000
  Distance       1  0.112752  0.112752   242.66    0.000
Error           44  0.020445  0.000465
  Lack-of-Fit   43  0.020375  0.000474     6.81    0.297
  Pure Error     1  0.000070  0.000070
Total           45  0.133197

Model Summary
S          R-sq    R-sq(adj)  R-sq(pred)
0.0215558  84.65%  84.30%     80.15%

Coefficients
Term      Coef       SE Coef   T-Value  P-Value  VIF
Constant   0.79861   0.00567    140.90    0.000
Distance  -0.000008  0.000000   -15.58    0.000  1.00

Regression Equation
Diversity = 0.79861 - 0.000008 Distance

Fits and Diagnostics for Unusual Observations
Obs  Diversity  Fit      Resid     Std Resid
  7  0.68880    0.62301   0.06579       3.24  R
  8  0.61590    0.59751   0.01839       0.93     X
  9  0.57060    0.58769  -0.01709      -0.88     X
 10  0.50180    0.58712  -0.08532      -4.40  R  X

R  Large residual
X  Unusual X

In Table 2 above, the coefficient of determination is R² = 0.8465.
Therefore, 84.65% of the total variability of diversity is explained by the fitted regression model.
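
The Model Summary values can be cross-checked against the sums of squares in the Analysis of Variance section of Table 2. The following is a minimal sketch that uses only the printed (rounded) table entries, so the results should agree with the reported figures only up to rounding; the variable names are illustrative.

```python
import math

# Values copied from the Analysis of Variance section of Table 2.
ss_regression = 0.112752
ss_error = 0.020445
ss_total = 0.133197
df_regression = 1
df_error = 44
df_total = 45

# Mean square error, recomputed from the unrounded ratio.
ms_error = ss_error / df_error

# R-sq: share of the total sum of squares explained by the regression.
r_sq = ss_regression / ss_total

# Adjusted R-sq: compares the error variance with the total variance.
r_sq_adj = 1 - ms_error / (ss_total / df_total)

# S: residual standard error.
s = math.sqrt(ms_error)

# F statistic for the regression.
f_value = (ss_regression / df_regression) / ms_error

print(f"R-sq      = {100 * r_sq:.2f}%")
print(f"R-sq(adj) = {100 * r_sq_adj:.2f}%")
print(f"S         = {s:.7f}")
print(f"F         = {f_value:.2f}")
```

The printed values can be compared directly with the Model Summary and the F-Value column of Table 2.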

a. Are there any problems with fitting and testing the relationship?

In Figure 1 above, the shape of the point cloud confirms the existence of a strong negative linear
relationship between the variables distance and genetic diversity measure. However, the variability of
diversity seems to be greater at high levels of distance, so the assumption of homoscedasticity does not
seem to be fully met.

Additionally, the Fits and Diagnostics for Unusual Observations section of Table 2 above lists four
observations. Two of them (observations 7 and 10) have large residuals, so they are potential outliers,
and three of them (observations 8, 9 and 10) are flagged as unusual X; that is, their distance values are
far from the rest and give them high leverage. Such observations can affect the quality of the estimated
regression model.

Finally, in the normal probability plot of the residuals shown in Figure 2 below, the residuals
at both ends depart from the line corresponding to the normal distribution, so the
assumption of normality of the residuals may be violated.

[Figure: Normal Probability Plot (response is Diversity). Vertical axis: Percent (1 to 99);
horizontal axis: Residual (-0.10 to 0.05).]

Figure 2.

Normal probability plot for residuals.

b. State the statistical model used and the Regression Equation

The statistical model corresponds to the following expression:

$\text{Diversity} = \beta_0 + \beta_1 \, \text{Distance} + u$

Where

β0: intercept parameter.

β1: slope parameter.

u: unobserved random error or disturbance term.

The regression equation is given by:

$\widehat{\text{Diversity}} = 0.79861 - 0.000008 \times \text{Distance}$

When the migration distance increases by 1,000 kilometers, the predicted genetic diversity measure
decreases by about 0.008 units.
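
The interpretation of the slope can be checked numerically. The sketch below uses the coefficients as printed in Table 2; because the printed slope is rounded to six decimals, it also backs out the slope implied by the fit that Table 3 reports at 15,000 km. The function name and the "implied slope" calculation are illustrative assumptions, not part of the original output.

```python
# Coefficients as printed in Table 2 (the slope is rounded to six decimals).
INTERCEPT = 0.79861
SLOPE_PRINTED = -0.000008  # diversity units per km

def predict_diversity(distance_km, slope=SLOPE_PRINTED):
    """Predicted diversity from the fitted line for a distance in km."""
    return INTERCEPT + slope * distance_km

# Change in predicted diversity for each additional 1,000 km.
change_per_1000_km = SLOPE_PRINTED * 1000

# Slope implied by the fit reported in Table 3 (0.683409 at 15,000 km).
slope_implied = (0.683409 - INTERCEPT) / 15000

print(f"Change per 1,000 km (printed slope): {change_per_1000_km:.4f}")
print(f"Prediction at 15,000 km (printed slope): {predict_diversity(15000):.5f}")
print(f"Implied unrounded slope: {slope_implied:.3e}")
print(f"Prediction at 15,000 km (implied slope): "
      f"{predict_diversity(15000, slope_implied):.6f}")
```

The gap between the two predictions at 15,000 km comes only from rounding the slope in the printed equation, which suggests using the software's own fit for predictions rather than the rounded equation.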

c. State the hypotheses tested and your conclusions on the significance of the regression equation.

When verifying the overall significance of the model, the following hypotheses are tested. With a
single explanatory variable, the overall test concerns only the slope:

Ho: β1 = 0 vs. H1: β1 ≠ 0

Table 2 above presents the results of the Analysis of Variance. This table summarizes the results
of the overall significance test of the regression model. We compare the p-value in the far-right column
of the table with the level of significance of the test, which has been set at 0.05. Based upon the ANOVA
results, we reject the null hypothesis (F = 242.66; df = 1, 44; p < .001). Migration distance explains a
significant part of the variation in the genetic diversity measure.

The Coefficients section of Table 2 above details the results of the individual significance
tests for the coefficients of the regression model. The following hypotheses are tested:

For constant, Ho: β0 = 0 vs. H1: β0 ≠ 0

Based upon the t-test results, the null hypothesis is rejected (t = 140.90, p < .001). The
regression coefficient associated with the constant is significantly different from zero and should be
included in the model to predict the genetic diversity measure.

For distance, Ho: β1 = 0 vs. H1: β1 ≠ 0

Based upon the t-test results, the null hypothesis is rejected (t = -15.58, p < .001). The
regression coefficient associated with the distance is significantly different from zero and should be
included in the model to predict the genetic diversity measure.
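
In a simple regression, the t-test for the slope and the overall F-test are equivalent, since F equals the square of the slope's t-value. The sketch below checks this with the printed values from Table 2, and also shows why the standard error of the slope displays as 0.000000. The slope used for that last step is backed out from the fit in Table 3, which is an assumption of this sketch rather than a value reported by the software.

```python
# Printed values from the Coefficients and Analysis of Variance sections of Table 2.
coef_constant = 0.79861
se_constant = 0.00567
t_distance = -15.58
f_value = 242.66

# t-value of the constant: coefficient divided by its standard error.
t_constant = coef_constant / se_constant

# In simple regression, the squared slope t-value equals the overall F.
t_squared = t_distance ** 2

# Slope backed out from the Table 3 fit (0.683409 at 15,000 km); assumption of this sketch.
slope_implied = (0.683409 - coef_constant) / 15000

# Standard error of the slope implied by the printed t-value.
se_slope = abs(slope_implied / t_distance)

print(f"t (constant)      = {t_constant:.2f}")
print(f"t^2 (distance)    = {t_squared:.2f}  vs  F = {f_value}")
print(f"SE (distance)     = {se_slope:.2e}")
print(f"SE to 6 decimals  = {se_slope:.6f}")
```

Small differences from the reported t = 140.90 and F = 242.66 are expected because the inputs are rounded.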

d. Predict diversity for a single unsampled population that would have had to migrate 15,000km
and provide an interval in which you are 95% confident it should fall.

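The prediction for diversity is detailed in Table 3 below. The point prediction of the model
for the diversity is 0.683409. The 95% confidence interval is (0.675002, 0.691817). The 95% prediction
interval is (0.639160, 0.727658). Because the question concerns a single unsampled population, the
prediction interval is the relevant one; the confidence interval applies to the mean diversity of all
populations at that distance.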

Table 3.

Confidence interval and prediction interval for diversity.

Prediction for Diversity

Regression Equation
Diversity = 0.79861 - 0.000008 Distance

Settings
Variable  Setting
Distance  15000

Prediction
Fit       SE Fit     95% CI                95% PI
0.683409  0.0041718  (0.675002, 0.691817)  (0.639160, 0.727658)
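
Both intervals in Table 3 can be reproduced from the fit, its standard error, and the residual standard error S from Table 2. The sketch below assumes a critical value of 2.0154 for the 0.975 quantile of the t distribution with 44 degrees of freedom; that constant is supplied by hand here and is not part of the original output.

```python
import math

# Values reported in Table 3 and in the Model Summary of Table 2.
fit = 0.683409
se_fit = 0.0041718
s = 0.0215558

# Assumed 0.975 quantile of the t distribution with 44 degrees of freedom.
T_CRIT = 2.0154

# Confidence interval for the mean response: uses only the standard error of the fit.
ci_half = T_CRIT * se_fit
ci = (fit - ci_half, fit + ci_half)

# Prediction interval for a single new observation: adds the residual variance.
pi_half = T_CRIT * math.sqrt(s ** 2 + se_fit ** 2)
pi = (fit - pi_half, fit + pi_half)

print(f"95% CI: ({ci[0]:.6f}, {ci[1]:.6f})")
print(f"95% PI: ({pi[0]:.6f}, {pi[1]:.6f})")
```

The prediction interval is wider because it accounts for the scatter of individual populations around the line, not only for the uncertainty in the line itself.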

Question 2. An alternative approach would be to test for differences in mean diversity between the
regions (Africa, Asia, et cetera) by ANOVA.

a. State the model you are fitting.

Following Montgomery (2013, p. 69), the following means model is fitted:

$y_{ij} = \mu_i + \epsilon_{ij}$

i = 1, 2, …, 6

j = 1, …, n_i

Where:

i: region.

j: index of the observation within region i.

y_ij: j-th observation of region i.

μ_i: mean of diversity for the i-th region.

ϵ_ij: random error component.

n_i: number of observations in region i.

It is necessary to emphasize that the number of observations taken within each region is different.
Therefore the design is unbalanced.

b. Check all the assumptions.

The assumptions of the model, as expressed by Montgomery (2013), are the following: “the
errors are normally and independently distributed with mean zero and constant but unknown variance”
(Montgomery, 2013, p. 80).

The assumption of normality is checked with the normal probability plot and the histogram of the
residuals presented in Figure 3 below. The normal probability plot is similar to the plot of
residuals presented in Figure 2. Additionally, the histogram of the residuals is approximately symmetric
and centered at zero. Therefore, the assumption of normality is considered to be met.

To verify the assumption of independence of the residuals, the residuals are plotted against the order
in which the data were collected. This is presented in the Versus Order plot, which is in Figure 3 below.
No systematic pattern is observed in the residuals, so there is no evidence against this
assumption.

To verify that the assumption of homoscedasticity is fulfilled, the Versus Fits plot is used, which
is shown in Figure 3 below. This plot presents a funnel pattern, which indicates that the
variance decreases as the fitted value increases; therefore, the constant variance assumption is not
fulfilled. Consequently, a transformation of the data should be considered.

[Figure: Residual Plots for Diversity, four panels. Normal Probability Plot (Percent vs. Residual);
Versus Fits (Residual vs. Fitted Value); Histogram (Frequency vs. Residual); Versus Order (Residual
vs. Observation Order).]

Figure 3.

Residual plots for diversity.

c. State the hypotheses you tested and your conclusions

The following hypotheses are tested:

Ho: µ1 = µ2 = µ3 = µ4 = µ5 = µ6 vs. H1: µi ≠ µj for at least one pair (i, j).

Based upon the ANOVA results shown in Table 4 below, we reject the null hypothesis (F = 32.67;
df = 5, 40; p < .001). The region means differ; that is, the region significantly affects the mean of the
genetic diversity measure.

Table 4.

Analysis of Variance and model summary.

Source  DF  Adj SS   Adj MS    F-Value  P-Value
Region   5  0.10700  0.021400    32.67    0.000
Error   40  0.02620  0.000655
Total   45  0.13320

Model Summary
S          R-sq    R-sq(adj)  R-sq(pred)
0.0255923  80.33%  77.87%     70.62%

d. Provide an appropriate summary of which means are different from which others.

The 95% confidence intervals for the mean diversity of each region are shown in Table 5.
These intervals give a first indication of which regions have different means. To do this, we observe
which confidence intervals overlap and which do not. Those intervals that do not overlap correspond to
regions with different mean diversity. This can be better appreciated in Figure 5. Additionally,
Tukey's multiple comparisons test, which is detailed in Table 6, indicates that the regions of Africa,
Middle East and Europe have similar means and form group A. In addition, the regions of Middle
East, Europe and Asia have similar means and form group B. The regions of Asia
and Oceania form group C and finally the Americas differ from all other regions (group D).

The highest mean diversities are associated with the populations of Africa, Europe and the
Middle East, which supports the idea that greater diversity corresponds to smaller distances from
Africa, while the lower means correspond to Asia, Oceania and the Americas, which are the regions
that are farthest from Africa.
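
The grouping letters reported in Table 6 can be turned into an explicit list of pairs that differ, using the rule stated under that table: two means are significantly different when they share no letter. The following is a small sketch of that rule; the dictionary simply transcribes the Grouping column, and the helper name is illustrative.

```python
from itertools import combinations

# Grouping letters transcribed from Table 6 (Tukey method, 95% confidence).
grouping = {
    "Africa": "A",
    "Middle_East": "AB",
    "Europe": "AB",
    "Asia": "BC",
    "Oceania": "C",
    "Americas": "D",
}

def differ(region_a, region_b):
    """Two regions differ significantly when they share no grouping letter."""
    return not (set(grouping[region_a]) & set(grouping[region_b]))

# All pairs of regions whose means are significantly different.
different_pairs = [
    (a, b) for a, b in combinations(grouping, 2) if differ(a, b)
]

for a, b in different_pairs:
    print(f"{a} differs from {b}")
```

Listing the pairs this way makes the overlap between groups explicit: for example, Asia belongs to both B and C, so it does not differ from Europe, the Middle East or Oceania, but it does differ from Africa.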

Table 5.

Descriptive statistics by region.

Region        N  Mean     StDev    95% CI
Africa        5  0.77608  0.00724  (0.75295, 0.79921)
Americas      5  0.5989   0.0688   (0.5758, 0.6220)
Asia         22  0.73097  0.01737  (0.71995, 0.74200)
Europe        8  0.75253  0.00334  (0.73424, 0.77081)
Middle_East   4  0.75737  0.00742  (0.73151, 0.78324)
Oceania       2  0.6836   0.0216   (0.6471, 0.7202)

Pooled StDev = 0.0255923

Table 6.

Mean comparison by region.

Tukey Pairwise Comparisons

Grouping Information Using the Tukey Method and 95% Confidence
Region        N  Mean     Grouping
Africa        5  0.77608  A
Middle_East   4  0.75737  A B
Europe        8  0.75253  A B
Asia         22  0.73097    B C
Oceania       2  0.6836       C
Americas      5  0.5989         D

Means that do not share a letter are significantly different.

[Figure: Interval Plot of Diversity vs Region, 95% CI for the Mean. Vertical axis: Diversity (0.60 to
0.80); horizontal axis: Region (Africa, Americas, Asia, Europe, Middle_East, Oceania). Note on the plot:
The pooled standard deviation is used to calculate the intervals.]

Figure 5.

Interval plot of diversity by region.
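
The intervals in Table 5 and Figure 5 are based on the pooled standard deviation, and they can be approximately reproduced from the per-region sample sizes, means and standard deviations. The sketch below assumes a critical value of 2.0211 for the 0.975 quantile of the t distribution with 40 degrees of freedom (supplied by hand, not part of the original output). It also reports the ratio of the largest to the smallest regional standard deviation as an informal check on the constant variance assumption.

```python
import math

# (N, Mean, StDev) for each region, transcribed from Table 5.
regions = {
    "Africa": (5, 0.77608, 0.00724),
    "Americas": (5, 0.5989, 0.0688),
    "Asia": (22, 0.73097, 0.01737),
    "Europe": (8, 0.75253, 0.00334),
    "Middle_East": (4, 0.75737, 0.00742),
    "Oceania": (2, 0.6836, 0.0216),
}

# Assumed 0.975 quantile of the t distribution with 40 degrees of freedom.
T_CRIT = 2.0211

# Error degrees of freedom: total observations minus number of regions.
df_error = sum(n for n, _, _ in regions.values()) - len(regions)

# Pooled variance: weighted average of the regional variances.
pooled_var = sum((n - 1) * sd ** 2 for n, _, sd in regions.values()) / df_error
pooled_sd = math.sqrt(pooled_var)

# 95% confidence interval for each regional mean, using the pooled SD.
intervals = {
    name: (mean - T_CRIT * pooled_sd / math.sqrt(n),
           mean + T_CRIT * pooled_sd / math.sqrt(n))
    for name, (n, mean, _) in regions.items()
}

# Informal homoscedasticity check: largest over smallest regional SD.
sds = [sd for _, _, sd in regions.values()]
sd_ratio = max(sds) / min(sds)

print(f"Error DF = {df_error}, pooled StDev = {pooled_sd:.5f}")
for name, (low, high) in intervals.items():
    print(f"{name:12s} ({low:.5f}, {high:.5f})")
print(f"Largest/smallest StDev ratio = {sd_ratio:.1f}")
```

Because the inputs are rounded, the recomputed pooled standard deviation matches the reported 0.0255923 only approximately. The large spread between the regional standard deviations is consistent with the funnel pattern noted in the Versus Fits plot.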

Question 3. Discuss which of regression or ANOVA is the better in this case. Are both valid and
equally useful in respect of the aim of the study?

In the first question, a linear regression model was constructed that included only one explanatory
variable and one response variable. The two variables are continuous, so the regression model was
the most adequate procedure. However, if the variable region were included in the regression model, the
conclusions would be similar to those of Question 2. The ANOVA was adequate to
construct a model with a continuous response variable and a qualitative explanatory variable, with the
advantage over the simple regression model in Question 1 that conclusions can be obtained regarding the
groups defined by the regions. The two models used are valid and useful for the aim of the
study.
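
As a descriptive aid to this discussion, the fit of the two models can be placed side by side using the sums of squares reported in Table 2 and Table 4. This is only a sketch: the two models are not nested, so the comparison below is not a formal test, and the dictionary layout is an illustrative choice.

```python
import math

# Sums of squares and degrees of freedom from Table 2 and Table 4.
models = {
    "Regression on Distance": {
        "df_model": 1, "ss_error": 0.020445, "df_error": 44, "ss_total": 0.133197,
    },
    "One-way ANOVA on Region": {
        "df_model": 5, "ss_error": 0.02620, "df_error": 40, "ss_total": 0.13320,
    },
}

summary = {}
for name, m in models.items():
    # Share of variability explained and residual standard error.
    r_sq = 1 - m["ss_error"] / m["ss_total"]
    s = math.sqrt(m["ss_error"] / m["df_error"])
    summary[name] = {"df_model": m["df_model"], "r_sq": r_sq, "s": s}

for name, row in summary.items():
    print(f"{name:24s} df = {row['df_model']}  "
          f"R-sq = {100 * row['r_sq']:.2f}%  S = {row['s']:.5f}")
```

On these figures, the regression explains somewhat more of the variability while using fewer model degrees of freedom, whereas the ANOVA is the model that supports statements about individual regions.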

Reference
Montgomery, D. C. (2013). Design and analysis of experiments (8th ed.). Hoboken: John Wiley & Sons, Inc.
