Simple Linear Regression
Excel
Girth Yield
50 480
45 375
62 500
78 650
55 440
40 400
52 468
57 513
45 408
66 540
RStudio
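library(ggplot2)
# Read the table copied from Excel; the first row holds the column names
Reg_data=read.table("clipboard",header=TRUE)
# Scatter plot of Girth against Yield (Girth spans 40 to 80, so the limits go on the y axis)
plot(Girth~Yield,data=Reg_data,xlab="Yield",ylab="Girth",ylim=c(40,80))
plot(Reg_data)
# Fit the simple linear regression of Girth on Yield and show the coefficients
lm1.fit=lm(Girth~Yield,data=Reg_data)
lm1.fit
# ANOVA table for the fitted model
summary(aov(lm1.fit))
# Four diagnostic plots in a 2 x 2 grid
par(mfrow=c(2,2))
plot(lm1.fit)
# Prediction intervals for the observed Yield values
pred=predict(lm1.fit,interval = "predict")
pred
# Return to a single panel and redraw the scatter plot so that abline() adds the line to it
par(mfrow=c(1,1))
plot(Girth~Yield,data=Reg_data,xlab="Yield",ylab="Girth",ylim=c(40,80))
abline(lm1.fit,col="Blue")
# The same scatter plot with the regression line, drawn with ggplot2
ggplot(Reg_data, aes(x = Yield, y = Girth)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(x = "Yield", y = "Girth", title = "Scatter Plot with Regression Line")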
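Reading from the clipboard only works right after the table has been copied from Excel. As a minimal sketch of a reproducible alternative, the ten observations can be typed in directly. The object names coefs, new_obs and pred_new, and the new value Yield = 600, are assumptions made for illustration only; they are not part of the original analysis.

```r
# The ten observations from the Excel table, entered directly
Reg_data <- data.frame(
  Girth = c(50, 45, 62, 78, 55, 40, 52, 57, 45, 66),
  Yield = c(480, 375, 500, 650, 440, 400, 468, 513, 408, 540)
)

# Same model as in the script: Girth regressed on Yield
lm1.fit <- lm(Girth ~ Yield, data = Reg_data)
coefs <- coef(lm1.fit)
print(round(coefs, 4))

# Prediction interval for one hypothetical new observation
# (Yield = 600 is an assumed value, chosen only to show the newdata argument)
new_obs <- data.frame(Yield = 600)
pred_new <- predict(lm1.fit, newdata = new_obs, interval = "prediction")
print(pred_new)
```

Supplying newdata also avoids the warning that predict() gives when prediction intervals are requested for the data the model was fitted on.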
RStudio Results
> Reg_data=read.table("clipboard",header=1)
> plot(Girth~Yield,data=Reg_data,xlab="Yield",ylab="Girth",xlim=c(40,80))
> plot(Reg_data)
>
> lm1.fit=lm(Girth~Yield,data=Reg_data)
> lm1.fit
Call:
lm(formula = Girth ~ Yield, data = Reg_data)
Coefficients:
(Intercept)        Yield
    -8.7523       0.1335
>
> summary(aov(lm1.fit))
            Df Sum Sq Mean Sq F value   Pr(>F)
Yield        1 1039.2  1039.2   67.71 3.56e-05 ***
Residuals    8  122.8    15.3
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> par(mfrow=c(2,2))
> plot(lm1.fit)
>
> pred=predict(lm1.fit,interval = "predict")
Warning message:
In predict.lm(lm1.fit, interval = "predict") :
predictions on current data refer to _future_ responses
> pred
        fit      lwr      upr
1  55.34721 45.87153 64.82288
2  41.32544 31.10463 51.54625
3  58.01802 48.50517 67.53087
4  78.04911 66.58164 89.51659
5  50.00558 40.42758 59.58358
6  44.66396 34.75591 54.57200
7  53.74472 44.26301 63.22642
8  59.75405 50.18566 69.32243
9  45.73228 35.90759 55.55697
10 63.35964 53.59914 73.12015
>
> abline(lm1.fit,col="Blue")
>
> ggplot(Reg_data, aes(x = Yield, y = Girth)) +
+ geom_point() +
+ geom_smooth(method = "lm", se = FALSE, color = "red") +
+ labs(x = "Yield", y = "Girth", title = "Scatter Plot with Regression Line")
`geom_smooth()` using formula = 'y ~ x'
Interpretation
[Figure: diagnostic plots from plot(lm1.fit) in a 2 x 2 grid. Panels: Residuals vs Fitted (Residuals against Fitted values), Q-Q Residuals (Standardized residuals against Theoretical Quantiles), Scale-Location (against Fitted values), Residuals vs Leverage (Standardized residuals against Leverage, with Cook's distance contours). Observations 1, 5 and 6 are labelled.]
According to the residual analysis, there is no reason to reject the model: the diagnostic plots show no
violation of the underlying assumptions.
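The visual reading of the diagnostic plots can be complemented with numerical checks. The following is a minimal sketch, assuming that a Shapiro-Wilk test and Cook's distances are acceptable supplements; neither is part of the original analysis, and the names res, sw and cd are introduced here for illustration. The data are re-entered so that the block runs on its own.

```r
# The ten observations from the Excel table
Reg_data <- data.frame(
  Girth = c(50, 45, 62, 78, 55, 40, 52, 57, 45, 66),
  Yield = c(480, 375, 500, 650, 440, 400, 468, 513, 408, 540)
)
lm1.fit <- lm(Girth ~ Yield, data = Reg_data)

# Residuals of a least-squares fit with an intercept sum to zero
res <- residuals(lm1.fit)
print(round(sum(res), 10))

# Shapiro-Wilk test of normality on the residuals;
# a large p-value gives no evidence against normality
sw <- shapiro.test(res)
print(sw)

# Cook's distance for each observation, as drawn in the Residuals vs Leverage panel
cd <- cooks.distance(lm1.fit)
print(round(cd, 3))
```

With only ten observations such a test has little power, so it supports the plots rather than replacing them.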
> summary(aov(lm1.fit))
            Df Sum Sq Mean Sq F value   Pr(>F)
Yield        1 1039.2  1039.2   67.71 3.56e-05 ***
Residuals    8  122.8    15.3
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
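According to these results, the p-value of Yield (3.56e-05) is less than 0.05; therefore, the Yield variable is
significant. The ANOVA table does not test the intercept; that test appears in the output of summary(lm1.fit).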
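The ANOVA table tests only the Yield term. As a sketch of how both coefficients could be examined, the coefficient table returned by summary() on the fitted model gives a t-test for the intercept as well as for the slope. In simple linear regression the squared t statistic of the slope equals the F value of the ANOVA table. The names coef_table, f_value, t_slope and r_squared are introduced here for illustration, and the data are re-entered so that the block is self-contained.

```r
# The ten observations from the Excel table
Reg_data <- data.frame(
  Girth = c(50, 45, 62, 78, 55, 40, 52, 57, 45, 66),
  Yield = c(480, 375, 500, 650, 440, 400, 468, 513, 408, 540)
)
lm1.fit <- lm(Girth ~ Yield, data = Reg_data)

# Estimate, standard error, t value and p-value for intercept and slope
coef_table <- summary(lm1.fit)$coefficients
print(coef_table)

# F value of the Yield term and the squared t statistic of the slope
f_value <- anova(lm1.fit)[["F value"]][1]
t_slope <- coef_table["Yield", "t value"]
print(c(F = f_value, t_squared = t_slope^2))

# Share of the variation in Girth explained by Yield
r_squared <- summary(lm1.fit)$r.squared
print(round(r_squared, 3))
```

The row for (Intercept) in coef_table shows whether the intercept differs significantly from zero, which the ANOVA table cannot show.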
Therefore, the regression model is as follows,