Research made simple
Understanding and interpreting regression analysis
Evid Based Nurs: first published as 10.1136/ebnurs-2021-103425 on 8 September 2021. Downloaded from http://ebn.bmj.com/ on June 14, 2025 by guest.
Parveen Ali 1,2, Ahtisham Younas 3,4
1 School of Nursing and Midwifery, University of Sheffield, Sheffield, UK
2 Sheffield University Interpersonal Violence Research Group, The University of Sheffield SEAS, Sheffield, UK
3 Faculty of Nursing, Memorial University of Newfoundland, St. John's, Newfoundland and Labrador, Canada
4 Swat College of Nursing, Mingora, Swat, Pakistan
Correspondence to: Ahtisham Younas, Memorial University of Newfoundland, St. John's, NL A1C 5S7, Canada; ay6133@mun.ca
10.1136/ebnurs-2021-103425
Protected by copyright, including for uses related to text and data mining, AI training, and similar technologies.
116 Evid Based Nurs October 2021 | volume 24 | number 4 |

Introduction
A nurse educator is interested in finding out the academic and non-academic predictors of success in nursing students. Given the complexity of educational and clinical learning environments, she was able to list, relatively easily, various demographic, clinical and academic factors that might influence nursing students' success (age, gender, previous educational training, personal stressors, learning demands, motivation, assignment workload, etc). Nevertheless, not all of the identified factors will be plausible predictors of increased success. Therefore, she could use a powerful statistical procedure called regression analysis to identify whether the likelihood of increased success is influenced by factors such as age, stressors, learning demands, motivation and education.

What is regression?
Regression analysis allows for investigating the relationship between variables.1 Usually, the variables are labelled as dependent or independent. An independent variable is an input, driver or factor that has an impact on a dependent variable (which can also be called an outcome). For example, if we say that age affects students' academic performance, what are the independent and dependent variables? Here, age is the independent variable, and it has the potential to affect the outcome (dependent) variable, in this case, academic performance. Similarly, in the nurse educator's example, critical thinking is a dependent variable and age, experience and training are independent variables.

Purposes of regression analysis
Regression analysis has four primary purposes: description, estimation, prediction and control.1 2 By description, regression can explain the relationship between dependent and independent variables. Estimation means that the value of the dependent variable can be estimated from the observed values of the independent variables.2 Prediction means that regression can be used to forecast outcomes and changes in the dependent variable based on its relationships with the independent variables. Finally, regression enables researchers to control for the effect of one or more independent variables while investigating the relationship of another independent variable with the dependent variable.1

Types of regression analyses
There are commonly three types of regression analyses, namely, linear, logistic and multiple regression. The differences among these types are outlined in table 1 in terms of their purpose, the nature of the dependent and independent variables, underlying assumptions and the nature of the curve.1 3 A more detailed discussion of linear regression follows.

Table 1 Comparison of linear, logistic and multiple regression
Purpose
  Linear: Examines the relationship between one independent variable and one continuous dependent variable.
  Logistic: Calculates the likelihood of an event with a binary outcome (ie, yes or no).
  Multiple: An extension of simple linear regression; examines the relationship between more than one independent variable and a dependent variable simultaneously.
Nature of dependent and independent variables
  Linear: 1. Dependent variable should be continuous. 2. Independent variables could be at any level of measurement.
  Logistic: 1. Dependent variable should be categorical. 2. Independent variables could be at any level of measurement.
  Multiple: 1. Dependent variable should be continuous. 2. Independent variables could be at any level of measurement.
Assumptions
  Linear: 1. Assumes that the distribution of dependent data is normal or Gaussian. 2. Requires a linear relationship between dependent and independent variables.
  Logistic: 1. Assumes that the distribution of dependent data is binomial. 2. Does not require a linear relationship between dependent and independent variables. 3. The independent variables should not be correlated.
  Multiple: 1. Assumes that the distribution of dependent data is normal or Gaussian. 2. Requires a linear relationship between dependent and independent variables. 3. The independent variables should not be correlated; higher correlation among the independent variables may affect the relationship between the independent and dependent variables.
Nature of curve
  Linear: Uses a straight line.
  Logistic: Uses an S-curve.
  Multiple: Uses a straight line.
Example
  Linear: Examining the relationship between hours of training and levels of patient self-care, and predicting how long training should last for every unit increase in self-care levels.
  Logistic: Estimating the likelihood of developing pressure ulcers (dichotomous outcome: yes or no) due to longer hospital stay, number of times of positioning, BMI (body mass index) and age.
  Multiple: Examining the relationship between hours of training and patient self-care levels while controlling for other variables (eg, family support, duration of disease) that may affect the relationship.

Linear regression and interpretation
Linear regression analysis involves examining the relationship between one independent variable and one dependent variable. Statistically, the relationship between an independent variable (x) and a dependent variable (y) is expressed as:
$$y = \beta_0 + \beta_1 x + \varepsilon$$
In this equation, $\beta_0$ is the y intercept and refers to the estimated value of y when x is equal to 0. The coefficient $\beta_1$ is the regression coefficient and denotes the estimated change in the dependent variable for every one-unit increase in the independent variable. The symbol $\varepsilon$ is a random error component and signifies the imprecision of regression, indicating that, in actual practice, the independent variable cannot perfectly predict the change in the dependent variable.1
Multiple linear regression follows the same logic as simple linear regression, that is, $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon$, except that (a) in multiple regression, there is more than one independent variable and (b) there should be no collinearity among the independent variables.

Factors affecting regression
Linear and multiple regression analyses are affected by several factors, namely, sample size, missing data and the nature of the sample.2
►► A small sample may only reveal relationships among variables that are strong. Therefore, sample size should be chosen based on the number of independent variables and the expected strength of the relationships.
►► Many missing values in the data set may reduce the effective sample size. Therefore, all missing values should be adequately dealt with before conducting regression analyses.
►► Subsamples within the larger sample may mask the actual effect of the independent variables on the dependent variable. Therefore, if subsamples are predefined, a regression within each subsample could be used to detect true relationships. Otherwise, the analysis should be undertaken on the whole sample.

Example
Building on the nurse educator's research interest described at the beginning, let us consider a study by Ali and Naylor.4 They were interested in identifying the academic and non-academic factors that predict the academic success of nursing diploma students. This purpose is consistent with one of the purposes of regression analysis mentioned above (ie, prediction). Ali and Naylor's chosen academic independent variables were preadmission qualification, previous academic performance and school type, and the non-academic variables were age, gender, marital status and time gap. To achieve their purpose, they collected data from 628 nursing students aged 15–34 years. They used both linear and multiple regression analyses to identify the predictors of student success. For analysis, they examined the relationship of academic and non-academic variables across different years of study and noted that academic factors accounted for 36.6%, 44.3% and 50.4% of the variability in academic success of students in year 1, year 2 and year 3, respectively.4
Ali and Naylor presented the relationships among these variables using scatter plots, which are commonly used graphs for displaying data in regression analysis (see examples of various scatter plots in figure 1).4 In a scatter plot, how tightly the dots cluster denotes the strength of the relationship, whereas the direction of the cluster indicates whether the relationship between the variables is positive (ie, an increase in one variable is accompanied by an increase in the other) or negative (ie, an increase in one variable is accompanied by a decrease in the other).
Figure 1 An example of scatter plot for regression.
Table 2 presents the results of the regression analysis of academic and non-academic variables for year 4 students' success. The significant predictors of student success are those with a significant p value. For every significant predictor, the β value indicates the percentage increase in students' academic success with a one-unit increase in the variable. For categorical predictors (eg, school type), the β value is the estimated difference relative to the reference category shown in the table (eg, private compared with public schools).
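The linear model $y = \beta_0 + \beta_1 x + \varepsilon$ described under Linear regression and interpretation is usually estimated by ordinary least squares, which chooses the line that minimises the sum of squared residuals. The minimal sketch below, in plain Python with only the standard library, computes $\beta_0$, $\beta_1$ and $R^2$, the proportion of the variability in y accounted for by x (the kind of quantity behind statements such as "academic factors accounted for 36.6% of the variability"). The function name `fit_simple_ols` and the data, loosely framed after the training and self-care example in table 1, are hypothetical and invented for illustration.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Least-squares estimates for y = beta0 + beta1 * x + error.

    Returns (beta0, beta1, r_squared), where r_squared is the share of the
    variability in y that the fitted line accounts for.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length, with at least 2 points")
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no spread, so the slope is undefined")
    beta1 = sxy / sxx              # estimated change in y per one-unit increase in x
    beta0 = y_bar - beta1 * x_bar  # estimated y when x = 0
    # Residuals are the part of y the line cannot explain (the epsilon term).
    residuals = [yi - (beta0 + beta1 * xi) for xi, yi in zip(x, y)]
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    return beta0, beta1, r_squared


# Hypothetical, invented data: hours of training (x) and a self-care score (y).
hours = [1, 2, 3, 4, 5, 6]
self_care = [52, 55, 61, 62, 68, 70]
b0, b1, r2 = fit_simple_ols(hours, self_care)
print(f"y = {b0:.2f} + {b1:.2f}x, R^2 = {r2:.3f}")  # y = 48.33 + 3.71x, R^2 = 0.976
```

For simple linear regression, $R^2$ equals the square of Pearson's correlation coefficient between x and y, and $\beta_1$ has the same sign as that correlation, so the fit puts numbers on the strength (tight clustering) and direction (upward or downward trend) that a scatter plot shows visually.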
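Multiple regression adds further independent variables, and the requirement that they not be collinear can be seen directly in the arithmetic. The sketch below, in plain Python and hypothetical throughout (function names, variables and data are invented), fits a model with exactly two predictors from the centred normal equations. The determinant it divides by equals $s_{11}s_{22}(1 - r^2)$, where r is the correlation between the two predictors, so it approaches zero as they become collinear and their separate coefficients become unstable or impossible to estimate. It also reports the variance inflation factor (VIF), which in the two-predictor case is $1/(1 - r^2)$; this is a common collinearity diagnostic, although published cut-offs for a "high" VIF vary.

```python
from statistics import mean


def fit_two_predictor_ols(x1, x2, y):
    """Least-squares estimates for y = b0 + b1*x1 + b2*x2 + error.

    Uses centred sums of squares and cross-products (the 2 x 2 normal equations).
    """
    m1, m2, my = mean(x1), mean(x2), mean(y)
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    # det = s11 * s22 * (1 - r**2): it shrinks towards 0 as x1 and x2 become collinear.
    det = s11 * s22 - s12 ** 2
    if det <= 1e-12 * s11 * s22:
        raise ValueError("x1 and x2 are (almost) perfectly collinear")
    b1 = (s22 * s1y - s12 * s2y) / det  # effect of x1 with x2 held constant
    b2 = (s11 * s2y - s12 * s1y) / det  # effect of x2 with x1 held constant
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


def vif_two_predictors(x1, x2):
    """VIF of either predictor in a model with exactly two predictors: 1 / (1 - r**2)."""
    m1, m2 = mean(x1), mean(x2)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    r_squared = s12 ** 2 / (s11 * s22)  # squared Pearson correlation of x1 and x2
    return float("inf") if r_squared >= 1 else 1 / (1 - r_squared)


# Hypothetical predictors: hours of training (x1) and a family-support score (x2).
# The outcome is generated exactly from y = 40 + 3*x1 + 2*x2 so the fit can be checked.
training = [1, 2, 3, 4, 5, 6]
support = [2, 1, 4, 3, 6, 5]
outcome = [40 + 3 * a + 2 * b for a, b in zip(training, support)]

print(fit_two_predictor_ols(training, support, outcome))  # (40.0, 3.0, 2.0)
print(round(vif_two_predictors(training, support), 2))    # 3.19

# A predictor that is an exact multiple of another carries no separate information.
try:
    fit_two_predictor_ols(training, [2 * a for a in training], outcome)
except ValueError as err:
    print("Cannot fit:", err)
```

Holding x2 constant is what "controlling for" a variable means in the multiple regression example of table 1: here b1 = 3 is the change in the outcome per additional hour of training among cases with the same family-support score.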
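Logistic regression, as summarised in table 1, models the probability of a binary outcome through an S-shaped curve. The sketch below only evaluates that curve for given coefficients; it does not estimate them (statistical software usually does this by maximum likelihood). The pressure-ulcer predictors echo the example in table 1, but every coefficient is an invented, hypothetical value chosen to make the S shape visible, and the function name `logistic_probability` is likewise illustrative.

```python
import math


def logistic_probability(beta0, betas, xs):
    """P(outcome = 1) = 1 / (1 + exp(-(beta0 + sum of beta_j * x_j))).

    The linear predictor can take any value; the logistic (S-shaped) function
    squeezes it into a probability between 0 and 1.
    """
    if len(betas) != len(xs):
        raise ValueError("need one coefficient per predictor")
    eta = beta0 + sum(b * x for b, x in zip(betas, xs))
    # Two algebraically equal forms, chosen so that math.exp never overflows.
    if eta >= 0:
        return 1 / (1 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1 + z)


# Hypothetical coefficients, invented for illustration (not estimates from any study):
# intercept -4.0, +0.15 per extra day in hospital, -0.30 per daily repositioning.
B0, B_STAY, B_TURNS = -4.0, 0.15, -0.30

for days in (0, 10, 20, 30, 40):
    p = logistic_probability(B0, [B_STAY, B_TURNS], [days, 2])
    print(f"{days:>2} days in hospital, repositioned twice daily: P(ulcer) = {p:.2f}")

# exp(beta) is the odds ratio for a one-unit increase in that predictor.
print(f"Odds ratio per extra day in hospital: {math.exp(B_STAY):.2f}")
```

Printing the probabilities for increasing lengths of stay shows the slow start, steep middle and levelling off of the S-curve, in contrast with the straight line of linear regression. Exponentiating a logistic coefficient gives an odds ratio, the form in which logistic regression results are most often reported.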
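Each P value in table 2 is tied to the ratio of β to its standard error: the further that ratio lies from zero, the smaller the P value. The sketch below recomputes this ratio (a Wald z statistic) and a two-sided P value from the standard normal distribution for the β and SE values transcribed from table 2. The normal approximation is an assumption of this sketch: the article does not state which test or degrees of freedom were used, so recomputed values need not match the published ones (most rows land close, but the matric science row gives roughly 0.012 against the published 0.004). The dictionary name and the shortened row labels are hypothetical.

```python
import math

# (beta, SE) pairs transcribed from table 2; row labels are shortened.
TABLE_2 = {
    "Intermediate science (ref: matric arts)": (5.47, 1.054),
    "Intermediate arts/commerce (ref: matric arts)": (4.436, 1.214),
    "Matric science (ref: matric arts)": (2.041, 0.81),
    "Previous academic performance (per 1%)": (0.344, 0.057),
    "Private school (ref: public)": (5.086, 0.667),
    "Male (ref: female)": (1.398, 4.006),
    "Urban (ref: rural area)": (4.268, 3.675),
    "Gender x previous performance": (-0.0953, 0.065),
    "Domicile x previous performance": (-0.0512, 0.062),
}


def wald_z_and_p(beta, se):
    """Return z = beta / SE and a two-sided P value from the standard normal curve.

    Large-sample approximation only; it may differ from the published P values.
    """
    if se <= 0:
        raise ValueError("standard error must be positive")
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


for label, (beta, se) in TABLE_2.items():
    z, p = wald_z_and_p(beta, se)
    print(f"{label:<46} beta={beta:>8} z={z:6.2f} p~{p:.3f}")
```

Rows such as gender and place of domicile, whose β is small relative to a large SE, give z values near zero and correspondingly large P values, which is why they are not flagged as significant predictors.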
Table 2 Regression model for the final year students (N=343)
Variables | β | SE (β) | P value
Academic variables
Entry qualification (ref=matric arts)
  Intermediate science | 5.47 | 1.054 | <0.001**
  Intermediate arts/commerce | 4.436 | 1.214 | <0.001**
  Matric science | 2.041 | 0.81 | 0.004**
Previous academic performance (%) | 0.344 | 0.057 | <0.001**
School type (ref=public)
  Private | 5.086 | 0.667 | <0.001**
Non-academic variables
Gender (ref=female)
  Male | 1.398 | 4.006 | 0.727
Place of domicile (ref=rural area)
  Urban | 4.268 | 3.675 | 0.246
Interaction: gender and previous academic performance | −0.0953 | 0.065 | 0.146
Interaction: place of domicile and previous academic performance | −0.0512 | 0.062 | 0.401
Interaction between gender and previous academic performance=0.146.
Interaction between place of domicile and previous academic performance=0.401.
**p<0.01.

Conclusions
Regression analysis is a powerful and useful statistical procedure with many implications for nursing research. It enables researchers to describe, predict and estimate relationships and to draw plausible conclusions about the interrelated variables in any studied phenomenon. Regression also allows researchers to control for one or more variables when they are interested in examining the relationship among specific variables. The following key considerations may be useful for researchers undertaking regression analysis. While planning and conducting regression analysis, researchers should consider the type and number of dependent and independent variables as well as the nature and size of the sample. Choosing the wrong type of regression analysis, particularly with a small sample, may result in erroneous conclusions about the studied phenomenon.

Twitter Parveen Ali @parveenazamali and Ahtisham Younas @Ahtisham04
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests None declared.
Patient consent for publication Not required.
Provenance and peer review Commissioned; internally peer reviewed.
© Author(s) (or their employer(s)) 2021. No commercial re-use. See rights and permissions. Published by BMJ.
ORCID iDs
Parveen Ali http://orcid.org/0000-0002-7839-8130
Ahtisham Younas http://orcid.org/0000-0003-0157-5319

References
1 Montgomery DC, Peck EA, Vining GG. Introduction to linear regression analysis. John Wiley & Sons, 2012.
2 Schneider A, Hommel G, Blettner M. Linear regression analysis: part 14 of a series on evaluation of scientific publications. Dtsch Arztebl Int 2010;107:776.
3 Hilbe JM. Practical guide to logistic regression. CRC Press, 2016.
4 Ali PA, Naylor PB. Association between academic and non-academic variables and academic success of diploma nursing students in Pakistan. Nurse Educ Today 2010;30:157–62.