
Medical Statistics

Topic: Simple Linear Regression Analysis

Lecture by
Dr. Chindo Ibrahim Bisallah
MB.BS, MPH, MPA, PhD

Department of Community Medicine, Faculty of Health Sciences
Ibrahim Badamasi Babangida University, Lapai
REGRESSION ANALYSIS
Introduction
• Regression analysis is a statistical method used to establish a relationship between two or more variables.
• It can also be defined as a statistical method to model the relationship between a dependent variable and one or more independent variables.
• Purpose: prediction, inference, and understanding relationships.
Historical Background
Regression analysis has evolved significantly over time:
• 19th Century:
Francis Galton pioneered early correlation studies for understanding relationships between variables. He began the exploration of statistical relationships in heredity and human characteristics.
• Early 20th Century:
Karl Pearson and other statisticians formalized the concepts of correlation
and regression, establishing methods that are still in use today.
• Late 20th Century to Present:
With advances in computing power, regression methods have expanded
into complex multivariate and non-linear models, integrating into modern
machine learning techniques.
Basic terminology
• Dependent Variable (Outcome) -The outcome or response
variable that the model seeks to predict or explain
• Independent Variable (Predictor)- The predictor or
explanatory variable used to forecast or explain the
dependent variable
• Coefficient - A numerical value that quantifies the
relationship between an independent variable and the
dependent variable; it indicates the magnitude and direction
of the effect
• Intercept - The expected value of the dependent variable
when all independent variables are zero; essentially, it is the
starting point of the regression line.
Types of regression analysis
1. Linear Regression
Simple Linear Regression
Models the relationship between a single independent variable and a
dependent variable using a straight line.
Multiple Linear Regression
Extends simple linear regression to include multiple independent
variables.
2. Logistic Regression
Used when the dependent variable is categorical (often binary, e.g. smoker or non-smoker, disease or no disease, dead or alive). It estimates the probability of a certain class or event. Logistic regression is useful in analysing binary outcomes and identifying factors that influence the probability of an event occurring.
3. Polynomial Regression
A form of linear regression where the relationship between the independent
variable(s) and the dependent variable is modeled as an nth degree polynomial
4. Regularized Regression
Ridge Regression and Lasso Regression
Correlation vs Regression
1. The correlation coefficient is independent of the units of measurement.
2. The regression coefficients (slope and intercept) will change as the units of measurement change.
3. Furthermore, the regression of X on Y is not the same as the regression of Y on X: which variable is dependent and which is independent matters.
4. In contrast, the correlation of Y with X is the same as the correlation of X with Y.
5. The main difference is that regression looks at the change in one variable (the response, outcome, or dependent variable) that corresponds to a given change in the other (the explanatory, predictor, or independent variable).
6. The objective is to predict or estimate the value of the response associated with a fixed value of the explanatory variable.
7. Correlation analysis does not distinguish between the two variables.


Simple vs Multiple Linear Regression

• A linear regression model attempts to explain the relationship between two or more variables using a straight line.
• Simple linear regression seeks to predict an outcome (dependent) variable from a single independent variable.
• Multiple linear regression seeks to predict one outcome (dependent) variable from several independent variables.
• In both cases we are examining the dependence of one variable (the dependent variable) on the independent variable(s).
Simple vs Multiple Linear Regression (contd.)

• Two important steps in regression analysis involve examining a scatter plot of the two variables and calculating the correlation coefficient.
• The relationship is summarized by a regression equation consisting of a slope and an intercept.
• The equation for the regression line in simple regression is y = α + βx,
where α = intercept on the Y axis when x = 0
and β = slope (regression coefficient).
The slope represents the amount the dependent variable increases with each one-unit increase in the independent variable.
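This reading of the slope can be illustrated with a short numeric sketch. The values of α and β below are hypothetical, chosen only to show that the predicted y rises by exactly β for each one-unit increase in x:

```python
# Reading the regression line y = alpha + beta*x.
# alpha and beta here are hypothetical illustration values, not from the slides.

alpha = 92.8   # intercept: predicted y when x = 0 (hypothetical)
beta = 0.97    # slope: change in y per one-unit change in x (hypothetical)

def predict(x):
    """Predicted value of y on the regression line."""
    return alpha + beta * x

y40 = predict(40)
y41 = predict(41)
print(y40, y41, y41 - y40)  # the last value is the slope beta
```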
What to Expect in Regression Analysis
Derive Regression / Prediction Equation

Hypothesis Testing

Regression Slope
What to Expect
• Derive the regression / prediction equation
Comparable to the equation of a straight line:
y = α + βx
[Figure: a regression line plotted on X–Y axes; the line crosses the Y axis at the intercept α, and its slope is β = ΔY/ΔX]
The prediction equation is of the form y = α + βx, where
α is the intercept of the line, or the mean value of the response y when x is equal to 0, and
β is the slope, or the change in y that corresponds to a one-unit change in x.
Interpret the prediction equation
• The model represents a straight line
• The line y = α + βx, is called the regression line
• The parameters α and β are constants and are the coefficients
of the equation
• α is the intercept of the line, or the mean value of the response y when x is equal to 0
• β is the slope, or the change in y that corresponds to a one-unit change in x
• If β is positive, then the expected value of y increases as x increases
• If β is negative, the expected value of y decreases as x increases
Simple linear regression
• Simple linear regression is a method used to model the relationship
between two continuous variables—one dependent (response) variable
and one independent (predictor) variable.
The prediction equation: y = β₀ + β₁x + ε
• where:
• y: Dependent variable (the outcome you're trying to predict).
• x: Independent variable (the predictor).
• β₀ (Intercept): The expected value of y when x is 0.
• β₁ (Slope): The change in y for a one-unit change in x.
• ε: The error term accounting for the difference between the observed and
predicted values.
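The least-squares estimates of β₀ and β₁ can be computed directly from the data. A minimal sketch in plain Python (the function name and the toy data are our own illustration, not from the slides):

```python
# Least-squares fit for simple linear regression, y = b0 + b1*x.
# Formulas: b1 = Sxy / Sxx, b0 = mean(y) - b1 * mean(x).

def fit_simple_linear(x, y):
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)                      # sum of squares of x
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))   # cross-products
    b1 = sxy / sxx              # slope
    b0 = my - b1 * mx           # intercept
    return b0, b1

# Toy data lying exactly on y = 2 + 3x, so the fit should recover those values.
x = [1, 2, 3, 4, 5]
y = [5, 8, 11, 14, 17]
b0, b1 = fit_simple_linear(x, y)
print(b0, b1)  # intercept 2, slope 3
```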
Assumptions of Simple Linear Regression
1. Continuous data (data is interval or ratio).
2. The relationship between the two variables is linear, meaning the
data points on a scatterplot should generally form a straight line
3. No Significant Outliers in your data
4. The data you are analyzing needs to be normally distributed
5. Homoscedasticity (Equal Variances). The spread (variance) of one
variable should be roughly the same across the values of the other
variable
Evaluation of the Regression Model
• Now that the least squares regression line has been
determined, how well does the model actually fit the
observed data?
• One way to evaluate the fit of a model is to compute
the coefficient of determination
• R2 is the square of the Pearson correlation
coefficient
• R2 can be interpreted as the proportion of the
variability among the observed values of y
that is explained by the linear regression of y
on x
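Both views of R² can be checked numerically: as the square of the Pearson correlation coefficient, and as the proportion of total variability explained. A small sketch (function names and toy data are illustrative):

```python
# R^2 two equivalent ways for simple linear regression:
# (1) square of the Pearson correlation r, and (2) 1 - SSE/SST.

from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def r_squared(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained
    sst = sum((yi - my) ** 2 for yi in y)                          # total
    return 1 - sse / sst

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
print(pearson_r(x, y) ** 2, r_squared(x, y))  # the two values agree
```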
Components of Simple Linear Regression

1. Descriptive Component
• Regression equation: Ŷ = b₀ + b₁X₁
• Correlation coefficient: r
• Coefficient of determination: r²
2. Inferential Component (hypothesis testing)
- Regression model
- Slope
where b₀ = y-intercept and b₁ = slope
HYPOTHESIS TESTING

1. Regression Model (F ratio)

Source       SS    df     MS    F
Regression   SSR   p      MSR   MSR/MSE
Error        SSE   n-p-1  MSE
Total        SST   n-1
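The quantities in the ANOVA table can be computed from a fitted model. For simple regression, p = 1; a sketch (names and toy data are illustrative):

```python
# ANOVA quantities for simple linear regression (p = 1 predictor):
# SST = SSR + SSE, MSR = SSR/p, MSE = SSE/(n-p-1), F = MSR/MSE.

def anova_table(x, y):
    n, p = len(x), 1
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    yhat = [b0 + b1 * xi for xi in x]
    ssr = sum((yh - my) ** 2 for yh in yhat)                   # regression SS
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))       # error SS
    sst = sum((yi - my) ** 2 for yi in y)                      # total SS
    msr = ssr / p
    mse = sse / (n - p - 1)
    return ssr, sse, sst, msr / mse                            # F = MSR/MSE

ssr, sse, sst, f = anova_table([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(f)  # a large F suggests the regression explains real variation
```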

2. Slope t test
Regression Model
This is testing whether there's a relationship between the dependent
variable (Y) and an independent variable (X₁)
• Null hypothesis (H₀): Y = β₀ + eᵢ
• Alternative hypothesis (Hₐ): Y = β₀ + β₁X₁ + eᵢ
The alternative assumes that X₁ does influence Y, as indicated by the slope coefficient β₁.

This illustrates how hypothesis testing is used in regression to determine whether an independent variable (X₁) significantly predicts the dependent variable (Y), through analysis of the slope β₁.
Slope
This is a more specific test of whether the slope β₁ (the effect of X₁ on Y) is significantly different from zero.

• Null hypothesis (H₀): β₁ = 0 → no relationship between X₁ and Y.
• Alternative hypotheses (Hₐ):
β₁ ≠ 0: there is some relationship
β₁ > 0: a positive relationship
β₁ < 0: a negative relationship
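The slope test can be sketched numerically: the t statistic is the estimated slope divided by its standard error, with n − 2 degrees of freedom (names and toy data are illustrative):

```python
# t test for the slope: t = b1 / SE(b1), where SE(b1) = sqrt(MSE / Sxx);
# compare |t| against a t distribution with n - 2 degrees of freedom.

from math import sqrt

def slope_t(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)            # residual variance estimate
    se_b1 = sqrt(mse / sxx)        # standard error of the slope
    return b1 / se_b1              # t statistic, df = n - 2

t = slope_t([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(t)  # compare |t| with the critical t value at n - 2 df
```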
r = correlation coefficient
r² = coefficient of determination
r² = SS explained / SS total

The coefficient of determination r² is the proportion of the total variation that can be accounted for by the linear relationship between x and y.

r² = ... × 100 = ... %
Important steps in regression analysis using SPSS:
a. Conducting a Bivariate Simple Linear Regression Analysis
Select the Analyze menu, click Regression, then click Linear. Select systolic blood pressure, then click the arrow to move it to the Dependent box. Click weight, then click the arrow to move it to the Independent(s) box. Click Statistics, then click Descriptives. Make sure that Estimates and Model Fit are also selected. Click Continue, and then OK.

b. Scatterplot with Regression Line

The result of the regression analysis can be summarized using a scatterplot with a regression line.
Click Graph, then click Scatter. Click Simple, then click Define. Click systolic blood pressure and click the arrow to move it to the Y axis box. Click weight and click the arrow to move it to the X axis box. Click OK.
Once you have created a scatterplot showing the relationship between weight and systolic blood pressure, you can add a regression line by following these steps:
Double-click on the chart to select it for editing. Click Elements, then Fit Line at Total to open the Properties box, and select Linear.
c. Assumptions for the linear regression model – residual analysis

For the linear regression model to be valid, there are three assumptions to be checked on the residuals:
No outliers.
The data points must be independent.
The distribution of the residuals should be normal with mean = 0 and a constant variance.

i) Checking outliers
Select the Analyze menu, click Regression, then click Linear. Select systolic blood pressure, then click the arrow to move it to the Dependent box. Click weight, then click the arrow to move it to the Independent(s) box. Click Statistics, then tick the Casewise Diagnostics box...
Our interest is in the Standardised Residuals; make sure that the minimum and maximum values do not exceed ±3.
ii) Checking independence
Run the analyses as above, and tick the Durbin-Watson box after clicking Statistics.
iii) Checking the normality assumption of the residuals
Run the analyses as in (i), click on the Plots tab, and tick Histogram and Normal probability plot to check the normality assumption of the residuals.
iv) Checking for constant variance
Run the analyses as in (i), click on the Plots tab, and select *ZRESID (Regression Standardized Residual) into the Y box and *ZPRED (Regression Standardized Predicted Value) into the X box.
As long as the scatter of the points shows no clear pattern, we can conclude that the variance is constant.
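The outlier check above can also be sketched outside SPSS. This minimal plain-Python version computes standardized residuals and flags any with |z| > 3 (the function name and toy data are illustrative; the normality and variance checks would normally use plots, as in SPSS):

```python
# Residual checks mirroring the SPSS steps above:
# standardized residuals, outlier flags (|z| > 3), and the mean residual
# (which should be ~0 by construction for least squares).

from math import sqrt

def residual_checks(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    b0 = my - b1 * mx
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = sqrt(sum(e * e for e in resid) / (n - 2))     # residual std. deviation
    z = [e / s for e in resid]                        # standardized residuals
    outliers = [i for i, zi in enumerate(z) if abs(zi) > 3]
    mean_resid = sum(resid) / n
    return z, outliers, mean_resid

z, outliers, mean_resid = residual_checks([1, 2, 3, 4, 5, 6],
                                          [2.0, 4.1, 5.9, 8.2, 9.8, 12.1])
print(outliers, mean_resid)  # no flagged outliers; mean residual near zero
```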
Q. Hypothesis testing
We wish to test whether the sample value of 'r' is of sufficient magnitude that, in the population, SBP and Age are correlated.

We conduct the hypothesis test as follows:

1. Data:
2. Assumptions:
For each value of x there is a normally distributed subpopulation of y values;
for each value of y there is a normally distributed subpopulation of x values;
the joint distribution of x and y is a normal distribution called the bivariate normal distribution;
the subpopulations of y values all have the same variance;
the subpopulations of x values all have the same variance.
3. Hypotheses: H₀: ρ = 0 and Hₐ: ρ ≠ 0
4. Test statistic: when ρ = 0, it can be shown that the appropriate test statistic is t = r√(n − 2)/√(1 − r²)
5. Distribution of the test statistic: when H₀ is true and the assumptions are met, the test statistic follows a t distribution with n − 2 degrees of freedom.
6. Decision rule: let α = 0.05. If the computed t value is either greater than or equal to the tabulated t value (+...) with n − 2 degrees of freedom, or less than or equal to its negative (−...), we reject the null hypothesis.
7. Calculation of the t test statistic: our calculated t value is t =
8. P value: since our computed t value = '...', which is > or < '...' (the value from the table), we have for this test p < 0.05 (or p > 0.05, depending on the case).
9. Statistical decision: if the computed value of t exceeds the critical t value, we reject the null hypothesis. If the computed value of t does not exceed the critical t value, we do not reject the null hypothesis.
10. Conclusion: we conclude that, in the population, the two variables 'X' and 'Y' are linearly related.
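The test statistic from step 4 can be sketched directly; for simple regression it gives the same t value as the slope test (function name and toy data are illustrative):

```python
# Test statistic for H0: rho = 0:
# t = r * sqrt(n - 2) / sqrt(1 - r^2), compared with t at n - 2 df.

from math import sqrt

def corr_t(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / sqrt(sxx * syy)           # sample correlation coefficient
    t = r * sqrt(n - 2) / sqrt(1 - r * r)
    return r, t

r, t = corr_t([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(r, t)  # compare |t| with the critical t value at n - 2 df
```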
Using SPSS to draw a
scatterplot with a
regression line
• Graph  Scatter  Simple  x and y axis
variables  OK
• In the graph, double click then  Options
 reference line
Data of 30 students with their Ages and SBP for Regression Analysis

Age: 25, 30, 22, 45, 50, 35, 40, 28, 33, 38, 60, 55, 48, 42, 26, 31, 29, 37,
43, 34, 27, 36, 39, 32, 41, 46, 44, 52, 49, 24
SBP: 120, 122, 115, 135, 140, 128, 132, 118, 125, 130, 150, 145, 138,
134, 117, 123, 119, 129, 136, 127, 116, 126, 131, 124, 133, 137, 139,
142, 141, 114
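As a sketch of what the regression analysis of these data would produce, the listed Age and SBP values can be fitted directly (the coefficients below are computed from the data, not stated in the slides):

```python
# Regression of SBP on Age for the 30 students listed above.
age = [25, 30, 22, 45, 50, 35, 40, 28, 33, 38, 60, 55, 48, 42, 26,
       31, 29, 37, 43, 34, 27, 36, 39, 32, 41, 46, 44, 52, 49, 24]
sbp = [120, 122, 115, 135, 140, 128, 132, 118, 125, 130, 150, 145, 138,
       134, 117, 123, 119, 129, 136, 127, 116, 126, 131, 124, 133, 137,
       139, 142, 141, 114]

n = len(age)
mx, my = sum(age) / n, sum(sbp) / n
sxx = sum((a - mx) ** 2 for a in age)
sxy = sum((a - mx) * (b - my) for a, b in zip(age, sbp))
b1 = sxy / sxx               # slope: mmHg increase in SBP per year of age
b0 = my - b1 * mx            # intercept
sst = sum((b - my) ** 2 for b in sbp)
sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(age, sbp))
r2 = 1 - sse / sst           # coefficient of determination
print(f"SBP = {b0:.2f} + {b1:.3f} * Age, r^2 = {r2:.3f}")
```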
