
community project

encouraging academics to share statistics support resources

All stcp resources are released under a Creative Commons licence

stcp-karadimitriou-regressionMultR

Associated resources: the dataset ‘Birthweight_reduced.csv’, the Multiple regression script file, and the Scatterplots, Correlation, Simple linear regression and Checking normality in R resources.

Multiple linear regression in R

Dependent variable: Continuous (scale)
Independent variables: Continuous (scale) or binary (e.g. yes/no)
Common Applications: Regression is used to (a) look for significant relationships between the dependent variable and the independent variables or (b) predict a value of one variable for given values of the others.
Data: The data set ‘Birthweight_reduced.csv’ contains details of 42 babies and their parents at birth. The dependent variable is birth weight (lbs) and the independent variables on this sheet are the gestational age of the baby at birth (in weeks) and variables relating to the mother (the mother’s height and weight, as well as whether or not she smokes).

Birthweight  Gestation  smoker  motherage  mnocig  mheight  mppwt
5.80         33         0       24         0       58       99
4.20         33         1       20         7       63       109
6.40         34         0       26         0       65       140
(smoker: smokes = 1; mppwt: weight of mother before pregnancy)

The Simple linear regression in R resource should be read before using this sheet.
Open the birthweight reduced dataset from a csv file, call it birthweightR, and then attach the data so that only the variable name is needed in commands.
birthweightR<-read.csv("D:\\Birthweight_reduced.csv",header=TRUE)   # change the path to where the file is saved
attach(birthweightR)

Tell R that ‘smoker’ is a factor and label its categories (0 = Non-smoker, 1 = Smoker).
smoker<-factor(smoker,c(0,1),labels=c('Non-smoker','Smoker'))

Assumptions for regression

All the assumptions for simple regression (with one independent variable) also apply to multiple regression, with one addition. If two of the independent variables are highly related, this leads to a problem called multicollinearity, which inflates the standard errors of the coefficients and makes the separate effects of the related variables hard to interpret. To investigate possible multicollinearity, first look at the correlation coefficients for each pair of continuous (scale) variables. Correlations of 0.8 or above suggest a strong relationship, and only one of the two variables is needed in the regression analysis.

© Sofia Maria Karadimitriou and Ellen Marshall Reviewer: Jim Bull

University of Sheffield University of Swansea
Multiple regression in R

First produce a table of Pearson’s correlation coefficients rounded to two decimal places:
round(cor(cbind(Birthweight,Gestation,mheight,mppwt)),2)

Gestational age has the strongest relationship with birthweight (r = 0.71), and the mother’s weight and height are moderately related to birthweight. Maternal weight and height are strongly related to each other (r = 0.69), but this is not above 0.8. The Variance Inflation Factor (VIF), available in R through the car package, is a further measure of multicollinearity: it assesses the relationship between each independent variable and all the other independent variables.
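To apply the 0.8 rule of thumb without scanning the correlation table by eye, a small helper can list the pairs whose absolute correlation reaches the cutoff. This is a minimal sketch; `high_cor_pairs` is a hypothetical function, not part of base R or of the original script file.

```r
# Hypothetical helper (not in base R or the original script): list variable
# pairs whose absolute correlation is at or above `cutoff`.
high_cor_pairs <- function(cor_mat, cutoff = 0.8) {
  # upper.tri() keeps each pair once and skips the diagonal (r = 1)
  hits <- abs(cor_mat) >= cutoff & upper.tri(cor_mat)
  idx <- which(hits, arr.ind = TRUE)
  data.frame(
    var1 = rownames(cor_mat)[idx[, "row"]],
    var2 = colnames(cor_mat)[idx[, "col"]],
    r    = cor_mat[idx],
    stringsAsFactors = FALSE
  )
}

# Usage with the attached birthweight data:
# high_cor_pairs(cor(cbind(Birthweight, Gestation, mheight, mppwt)))
```

An empty result means no pair reaches the cutoff, which is what the r = 0.69 between maternal weight and height would give here.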

Scatterplots should be produced for each independent variable against the dependent variable to see whether the relationship is linear (the scatter forms a rough line). Binary variables can be distinguished by different markers on the scatterplots, which helps to investigate patterns within groups.

To produce multiple scatterplots identifying smokers with red circles:

pairs(~Birthweight+Gestation+mheight+mppwt,main='Birth weight scatterplots',
      col=c('blue','red')[smoker],pch=c(4,1)[smoker])   # level 2 (Smoker) gets red and pch 1 (circle)
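The colour and symbol for each point are chosen by indexing a vector with the smoker factor, and R indexes by the factor’s integer level codes (1 = Non-smoker, 2 = Smoker), not by its labels. So for smokers to appear as red circles, 'red' and the circle symbol (pch = 1) must be the second element of each vector. A minimal check on a made-up four-value factor:

```r
# A factor indexes a vector by its integer level codes:
# level 1 ("Non-smoker") picks element 1, level 2 ("Smoker") picks element 2.
smoker_demo <- factor(c(0, 1, 1, 0), levels = c(0, 1),
                      labels = c("Non-smoker", "Smoker"))

as.integer(smoker_demo)                 # the level codes used for indexing
cols <- c("blue", "red")[smoker_demo]   # smokers -> "red"
pchs <- c(4, 1)[smoker_demo]            # smokers -> 1 (open circle), others -> 4 (cross)
data.frame(smoker_demo, cols, pchs)
```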

[Figure: Birth weight scatterplots, a scatterplot matrix of Birthweight, Gestation, mheight and mppwt, with smokers shown as red circles]

There are no non-linear patterns between any pair of variables. The relationship between gestational age and birthweight is the strongest, but there is also a moderate/strong relationship between two of the independent variables (the weight and height of the mother). The babies of smokers (shown using red circles) tend to be lighter at each gestational age.

Steps in R
Fit the regression model using the command lm(dependent ~ independent1 + independent2) and give it a name (reg2). Then request the regression output using summary().
reg2<-lm(Birthweight~Gestation+smoker+mppwt)
summary(reg2)

statstutor community project www.statstutor.ac.uk


Output
The summary() output contains the Coefficients table, which gives the coefficients for the regression equation (model) and a test of significance for each variable, followed by the R-squared values.

[Figure: summary(reg2) output, annotated: the p-value for gestation after controlling for smoking and weight of mother is p < 0.001; *** = highly significant]

The last column contains the p-value for each independent variable. The hypothesis being tested for each is that the coefficient (B) is 0 after controlling for the other variables. For example, the effects of gestational age and smoking are removed before assessing the relationship between the weight of the mother and the weight of the baby. A p-value < 0.05 provides evidence that the coefficient is different from 0. Gestational age (p < 0.001), smoker (p = 0.017) and the mother’s pre-pregnancy weight (p = 0.03) are all significant predictors of birthweight. If an independent variable is significant, describe its relationship with the dependent variable using the Estimate column.
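The t value and p-value in each row come from the test of "coefficient = 0": t is the estimate divided by its standard error, and p is the two-sided tail probability of a t distribution with the residual degrees of freedom (n minus the number of coefficients, so 42 - 4 = 38 for reg2). The sketch below reproduces those two columns by hand on simulated data, not the birthweight dataset; for the real model, `coef(summary(reg2))` returns the same kind of table.

```r
# Simulated data (NOT the birthweight dataset): y depends on x1 and x2, not x3
set.seed(1)
n <- 42
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$y <- 1 + 2 * sim$x1 + 0.5 * sim$x2 + rnorm(n)
fit <- lm(y ~ x1 + x2 + x3, data = sim)

tab <- coef(summary(fit))   # columns: Estimate, Std. Error, t value, Pr(>|t|)

# t statistic for H0: coefficient = 0
t_by_hand <- tab[, "Estimate"] / tab[, "Std. Error"]
# two-sided p-value from a t distribution with the residual degrees of freedom
p_by_hand <- 2 * pt(-abs(t_by_hand), df = df.residual(fit))

cbind(t_by_hand, p_by_hand, p_from_lm = tab[, "Pr(>|t|)"])
```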

The Estimate column in the coefficients table gives the coefficient for each independent variable in the regression model. The model is:
Birthweight (y) = -7.165 + 0.313*(Gestation) - 0.665*(Smoker) + 0.02*(mppwt)
For gestation, there is a 0.313 lb increase in birthweight for each extra week of gestation. For each extra pound (lb) a mother weighs before pregnancy, the baby’s weight increases by 0.02 lb. For a binary variable such as Smoker, coded as 0 and 1, the coefficient applies only to the group coded as 1: here smokers have babies who weigh 0.665 lb less than those of non-smokers. Each of these effects holds with the other variables in the model kept constant.
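As a worked example of reading the equation, the sketch below plugs values into it. It uses the rounded coefficients quoted above, so its answers can differ slightly from `predict(reg2, ...)`, which uses the unrounded estimates. `predict_bw` is a hypothetical helper, and its `smoker` argument is coded 0/1 as in the raw data.

```r
# Hypothetical helper: predicted birthweight (lb) from the rounded equation above.
# gestation in weeks, smoker = 1 for a smoker and 0 for a non-smoker,
# mppwt = mother's pre-pregnancy weight in lb.
predict_bw <- function(gestation, smoker, mppwt) {
  -7.165 + 0.313 * gestation - 0.665 * smoker + 0.02 * mppwt
}

predict_bw(40, smoker = 0, mppwt = 130)   # non-smoker
predict_bw(40, smoker = 1, mppwt = 130)   # smoker: 0.665 lb lighter
```

For a 40-week baby whose mother weighed 130 lb before pregnancy, this gives about 7.96 lb for a non-smoker and 7.29 lb for a smoker. With the fitted model itself, the equivalent call is `predict(reg2, newdata = data.frame(Gestation = 40, smoker = factor("Smoker", levels = c("Non-smoker", "Smoker")), mppwt = 130))`.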

The R² value never decreases when an independent variable is added, so it is better to use the adjusted R² value, especially when comparing models. The adjusted R² indicates that about 58% of the variation in birth weight can be explained by the model containing gestation, smoker and pre-pregnancy weight. This is quite high, so predictions from the regression equation should be fairly reliable.
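The adjustment penalises R² for the number of predictors p given the sample size n: adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), with n = 42 and p = 3 for this model. The sketch below (hypothetical helper `adj_r2`, simulated data rather than the birthweight dataset) checks the formula against `summary(lm(...))$adj.r.squared` and shows R² rising when two pure-noise predictors are added.

```r
# Adjusted R-squared from R-squared, sample size n and number of predictors p
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# Simulated data (NOT the birthweight dataset): y depends on x1 only;
# x2 and x3 are pure noise.
set.seed(2)
n <- 42
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$y <- 1 + sim$x1 + rnorm(n)

s1 <- summary(lm(y ~ x1, data = sim))
s3 <- summary(lm(y ~ x1 + x2 + x3, data = sim))

c(R2_one = s1$r.squared, R2_three = s3$r.squared)   # R2 cannot fall when predictors are added
c(by_formula = adj_r2(s3$r.squared, n, p = 3), from_lm = s3$adj.r.squared)
```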

Checking the assumptions

Plots will help check the assumptions of normality and homoscedasticity.
First produce a histogram of the standardised residuals to check the assumption of normality.
hist(rstandard(reg2),main='Histogram of residuals',xlab='Standardised residuals',ylab='Frequency')
Then plot the residuals against the fitted values to check the assumption of homoscedasticity.
plot(reg2, which = 1)


[Figure: left, Histogram of residuals (x-axis: Standardised residuals; y-axis: Frequency); right, Residuals vs Fitted (x-axis: Fitted values; y-axis: Residuals)]

The residuals are approximately normally distributed, so the assumption of normality has been met. We expect about 5% of standardised residuals to lie outside ±1.96. If there are more than this, or if there are extreme residuals outside ±3, run the regression with and without the extreme values to see whether the coefficients of the model change much.

There is no pattern in the scatter of the fitted values and residuals, and the vertical spread of the residuals stays roughly the same as the fitted values increase, so the assumption of homoscedasticity has been met.
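The ±1.96 and ±3 checks can also be done numerically rather than by eye. This sketch uses a hypothetical helper `check_std_resid` on simulated data with one planted outlier (not the birthweight dataset): it takes the standardised residuals from `rstandard()`, reports the share beyond ±1.96, and refits without any observations beyond ±3 so the coefficients can be compared. For the birthweight model, the call would be `check_std_resid(reg2)`.

```r
# Hypothetical helper: summarise standardised residuals of an lm() fit
check_std_resid <- function(fit) {
  r <- rstandard(fit)                              # standardised residuals
  list(
    prop_beyond_1.96 = mean(abs(r) > 1.96),        # expect roughly 0.05
    extreme_obs      = which(abs(r) > 3)           # positions of extreme residuals
  )
}

# Simulated data (NOT the birthweight dataset) with one planted outlier
set.seed(3)
n <- 42
sim <- data.frame(x = rnorm(n))
sim$y <- 2 + sim$x + rnorm(n)
sim$y[5] <- sim$y[5] + 10

fit <- lm(y ~ x, data = sim)
chk <- check_std_resid(fit)
chk

# Refit without the extreme observations and compare the coefficients
if (length(chk$extreme_obs) > 0) {
  fit_wo <- lm(y ~ x, data = sim[-chk$extreme_obs, ])
  rbind(with = coef(fit), without = coef(fit_wo))
}
```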
Collinearity statistics measure the relationship between multiple independent variables by giving a score for each independent variable. The tolerance is the proportion of variance in an independent variable that cannot be accounted for by the other independent variables, so very small values indicate that the variable is redundant. The VIF (variance inflation factor) is 1 / tolerance. VIF scores should be close to 1; under 5 is fine, and 10 or more suggests high collinearity, so the variable may not be needed. All the values in this analysis are close to 1.
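The tolerance and VIF definitions can be made concrete: regress each independent variable on all the others, take tolerance = 1 - R² from that auxiliary regression, and VIF = 1 / tolerance. The sketch below uses a hypothetical helper `manual_vif` on simulated predictors (not the birthweight data). For models where every term has one degree of freedom, as in reg2, it should agree with `vif(reg2)` from the car package.

```r
# Hypothetical helper: VIF for each column of a data frame of predictors
manual_vif <- function(predictors) {
  sapply(names(predictors), function(v) {
    others <- setdiff(names(predictors), v)
    # auxiliary regression of this predictor on all the others
    aux <- lm(reformulate(others, response = v), data = predictors)
    tol <- 1 - summary(aux)$r.squared              # tolerance
    1 / tol                                        # VIF
  })
}

# Simulated predictors (NOT the birthweight data): x2 is built to correlate with x1
set.seed(4)
n <- 42
x1 <- rnorm(n)
preds <- data.frame(x1 = x1, x2 = x1 + rnorm(n, sd = 0.5), x3 = rnorm(n))
round(manual_vif(preds), 2)

# With the attached birthweight data:
# manual_vif(data.frame(Gestation, smoker = as.numeric(smoker), mppwt))
```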

To calculate the Variance Inflation Factors, the car package must be loaded with library(car). If this command does not work, install the package first: select Packages --> Install package(s), then the UK (London) CRAN mirror, and choose car from the list. In RStudio, use Tools --> Install Packages. Alternatively, run install.packages("car") in the console.
library(car)
Calculate the VIF for each variable.
vif(reg2)

Note: Some institutions are using early versions of car in which the vif command does not work. If this is the case, try installing the usdm package instead and loading it with library(usdm).
You will need to put the independent variables into a data frame (cbind() converts the smoker factor to its numeric codes, which does not change the VIF).
independents<-data.frame(cbind(Gestation,smoker,mppwt))
Then request the VIF scores for the independent variables using the command
vif(independents)

Reporting regression
Multiple linear regression was carried out to investigate the relationship between birth weight (lbs) and gestational age at birth (weeks), the mother’s pre-pregnancy weight and whether she smokes. There was a significant relationship between gestation and birth weight (t = 5.926, p < 0.001), smoking and birth weight (t = -2.485, p = 0.017) and pre-pregnancy weight and birth weight (t = 2.261, p = 0.03). Birthweight increased by 0.313 lb for each extra week of gestation and by 0.02 lb for each extra pound (lb) of the mother’s pre-pregnancy weight, and smokers had babies who weighed 0.665 lb less than those of non-smokers, with the other variables held constant. The adjusted R² value was 0.58, so 58% of the variation in birth weight can be explained by the model containing gestation, pre-pregnancy weight and whether the mother smokes. The data met the assumptions of homogeneity of variance and linearity, and the residuals were approximately normally distributed.
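When writing up, the t and p values come straight from the Coefficients table. As a sketch, a hypothetical helper `format_terms` could assemble "t = ..., p = ..." strings from any lm() fit; it is shown on simulated data rather than the birthweight dataset. For reg2 the smoker row would be labelled smokerSmoker, R’s name for the Smoker level compared against Non-smoker.

```r
# Hypothetical helper: one "name (t = ..., p = ...)" string per predictor
format_terms <- function(fit, digits = 3) {
  tab <- coef(summary(fit))[-1, , drop = FALSE]    # drop the intercept row
  p <- tab[, "Pr(>|t|)"]
  p_txt <- ifelse(p < 0.001, "p < 0.001", paste("p =", round(p, digits)))
  paste0(rownames(tab), " (t = ", round(tab[, "t value"], digits), ", ", p_txt, ")")
}

# Simulated data (NOT the birthweight dataset)
set.seed(5)
n <- 42
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- 1 + 2 * sim$x1 + rnorm(n)
format_terms(lm(y ~ x1 + x2, data = sim))
```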
