
Predictive Analytics Basics: Linear
Regression & Logistic Regression

Minati Rath
Overview of Predictive Analytics
Predictive analytics uses statistical techniques to forecast future
outcomes based on historical data. It is crucial for informed
decision-making in business. Types of predictive models include
regression, classification, and time-series forecasting.
Introduction to Linear Regression
Linear Regression models the relationship between a dependent
variable and one or more independent variables.
The equation:
Y = β0 + β1X + ε.

β0 (intercept) represents the baseline level; β1 (slope) represents
the change in Y for each unit change in X.
Assumptions: Linearity, Independence, Homoscedasticity, and
Normality. These must be met for the model to produce reliable
results.
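To make this concrete, here is a minimal sketch (assuming Python with numpy and statsmodels, and simulated data; none of this comes from the slides) that fits Y = β0 + β1X + ε and prints the estimates:

    # Minimal sketch: fit Y = b0 + b1*X + e by OLS on simulated data.
    # Assumes numpy and statsmodels are available; the data are made up.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 10, size=100)            # one predictor
    Y = 2.0 + 3.5 * X + rng.normal(0, 1, 100)   # true b0 = 2.0, b1 = 3.5, plus noise

    X_design = sm.add_constant(X)               # adds the intercept column
    results = sm.OLS(Y, X_design).fit()

    print(results.params)                       # estimated [b0, b1]
    print(results.summary())                    # R-squared, F-statistic, AIC/BIC, ...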
Assumptions of Linear Regression
Check the linearity between the predictor variables and the response variable

1. Scatter Plots Between Predictors and Response: Create scatter plots of
each predictor variable against the response variable. You’re looking for a
linear relationship in these plots. If the points roughly form a straight line (or a
cloud of points that is centered around a line), it suggests a linear relationship.
Pairwise Scatter Plots: For multiple predictors, plot each predictor against
every other predictor. This helps identify multicollinearity and whether the
relationship between predictors is linear.

2. Residual Plots:
After fitting a linear regression model, plot the residuals (the differences
between observed and predicted values) against the predicted values or against
each predictor variable.
In a well-fitting linear model, residuals should be randomly scattered without
any clear pattern. If you notice patterns (like curves or trends), it might
indicate that the relationship is not purely linear.
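A minimal sketch of such a residual plot, assuming matplotlib and the fitted statsmodels results object `results` from the earlier sketch:

    # Sketch: residuals vs. fitted values for a fitted OLS result (`results`).
    # A patternless cloud centered on zero supports the linearity assumption;
    # curves or funnels suggest non-linearity or heteroscedasticity.
    import matplotlib.pyplot as plt

    plt.scatter(results.fittedvalues, results.resid, alpha=0.6)
    plt.axhline(0, color="red", linestyle="--")
    plt.xlabel("Predicted (fitted) values")
    plt.ylabel("Residuals")
    plt.title("Residuals vs. fitted values")
    plt.show()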
Component + Residual Plots (CERES plots):
CERES plots are used to visualize the relationship between a
predictor and the response variable while accounting for the
effects of other predictors in the model. They can help identify
non-linearity that might not be obvious from simple scatter plots.

Polynomial Regression or Transformation:
Fit a polynomial regression model (e.g., quadratic or cubic) and
compare it to the linear model. If the polynomial model provides
a significantly better fit, this may indicate that the relationship
between the predictor and response variable is non-linear.
Apply transformations to the predictors or the response variable
(like logarithmic or square root transformations) and check if the
linearity improves.
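One way to run this comparison, sketched below under the same simulated-data assumptions as before, is to fit a quadratic term alongside the linear one and compare AIC and adjusted R-squared across the two fits:

    # Sketch: compare a linear fit with a quadratic fit on the same data.
    # A clearly lower AIC (or higher adjusted R-squared) for the quadratic
    # model would suggest the relationship is not purely linear.
    import numpy as np
    import statsmodels.api as sm

    X_lin = sm.add_constant(X)                              # columns [1, X]
    X_quad = sm.add_constant(np.column_stack([X, X ** 2]))  # columns [1, X, X^2]

    fit_lin = sm.OLS(Y, X_lin).fit()
    fit_quad = sm.OLS(Y, X_quad).fit()

    print("linear    AIC:", fit_lin.aic, " adj R2:", fit_lin.rsquared_adj)
    print("quadratic AIC:", fit_quad.aic, " adj R2:", fit_quad.rsquared_adj)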
OLS Result Analysis for one X and Y
Model Overview
Dep. Variable: This is the dependent variable Y, the outcome
variable you're trying to predict or explain.
Model: The type of regression model used is OLS (Ordinary Least
Squares). Ordinary Least Squares is a widely used form of linear
regression in statistical analysis. In OLS, the goal is to find the linear
relationship between a dependent variable Y and one or more
independent variables X₁, X₂, …, Xₙ.
The model assumes that this relationship can be expressed as:
𝑌 = 𝛽₀ + 𝛽₁𝑋₁ + 𝛽₂𝑋₂ + ⋯ + 𝛽ₙ𝑋ₙ + 𝜖
Where: Y is the dependent variable,
X₁, X₂, …, Xₙ are the independent variables.
β0 is the intercept (constant term).
β1,β2,…,βn ​ are the coefficients for the independent variables.
ϵ is the error term (residual), representing the difference between the
observed and predicted values of Y.
Method: Least Squares
Least Squares is the method used to estimate the parameters (i.e., the
coefficients β0, β1, …, βn) in the OLS regression model.
The "least squares" method aims to find the line (or hyperplane in
higher dimensions) that minimizes the sum of the squared differences
between the observed values of Y and the values predicted by the
model.
Why Minimize the Sum of Squared Errors?
The method works by minimizing the sum of squared errors (SSE),
which is calculated as:
SSE = Σ(Yᵢ − Ŷᵢ)²
Where:
Yᵢ is the observed value of the dependent variable for the i-th
observation.
Ŷᵢ is the predicted value of Y for the i-th observation, calculated as:
Ŷᵢ = β₀ + β₁Xᵢ₁ + β₂Xᵢ₂ + ⋯ + βₙXᵢₙ.
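The sketch below (numpy only, reusing the simulated X and Y from the earlier sketch) solves this minimization directly through the normal equations, β̂ = (XᵀX)⁻¹XᵀY, and computes the resulting SSE:

    # Sketch: least-squares estimates via the normal equations, plus the SSE
    # that OLS minimizes. X and Y are the simulated arrays used earlier.
    import numpy as np

    Xd = np.column_stack([np.ones_like(X), X])       # design matrix [1, X]
    beta_hat = np.linalg.solve(Xd.T @ Xd, Xd.T @ Y)  # [b0_hat, b1_hat]

    residuals = Y - Xd @ beta_hat
    sse = np.sum(residuals ** 2)                     # sum of squared errors

    print("beta_hat:", beta_hat)
    print("SSE:", sse)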
R-squared and Adjusted R-squared

R-squared: R-squared is a statistical measure that indicates the
proportion of the variance in the dependent variable that is
predictable from the independent variable(s).
Interpretation: An R-squared value close to 1 (like 0.934 in this
case) indicates that a large proportion of the variance in the
dependent variable (Y) is explained by the independent variable(s)
(X). This suggests that the model fits the data well.

Adjusted R-squared: Adjusted R-squared adjusts the R-squared
value for the number of predictors in the model. It accounts for the
degrees of freedom associated with adding additional predictors.
R-squared and Adjusted R-squared
Interpretation: The Adjusted R-squared (0.933 in this case) is
slightly lower than the R-squared, which is common. This
adjustment is useful when comparing models with a different
number of predictors, as it penalizes the inclusion of unnecessary
predictors that do not improve the model significantly.
Summary:
R-squared provides a general measure of fit, while Adjusted R-
squared gives a more accurate measure when comparing models
with different numbers of predictors. Both values being high
indicates that the model explains a significant portion of the
variance in the outcome variable.
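Both measures can be reproduced by hand from the residuals. A sketch, reusing `sse` from the previous numpy snippet, with n observations and p predictors (excluding the intercept):

    # Sketch: R-squared and adjusted R-squared computed by hand.
    # R^2 = 1 - SSE/SST; adjusted R^2 penalizes additional predictors.
    import numpy as np

    n, p = len(Y), 1                      # p = number of predictors (no intercept)
    sst = np.sum((Y - Y.mean()) ** 2)     # total sum of squares
    r2 = 1 - sse / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

    print("R-squared:", r2, " Adjusted R-squared:", adj_r2)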
F-statistic and Prob(F-statistic)
F-statistic: The F-statistic is a measure used in regression analysis to
test the overall significance of the model. It compares the model with
no predictors (the intercept-only model) to the model with the
specified predictors.
Calculation: The F-statistic is calculated by dividing the mean square
of the model by the mean square of the residuals. Mathematically,
F = MS_model / MS_residual = (explained sum of squares / df_model) / (SSE / df_residual),
where SSE is the residual sum of squares defined earlier.
Interpretation: A higher F-statistic indicates that the model explains
a significant amount of the variation in the dependent variable
compared to the noise (residuals). In the example, an F-statistic of
1386 is very high, suggesting that the model is highly significant.
Prob(F-statistic) or p-value of F-statistic: The p-value associated
with the F-statistic (also known as the Prob(F-statistic)) tells us the
probability of observing an F-statistic as extreme as the one
calculated, assuming that the null hypothesis (that all the regression
coefficients are equal to zero) is true.
F-statistic and Prob(F-statistic)
Interpretation: A very small p-value (like 1.24e-59 in the example)
indicates that the null hypothesis can be rejected, meaning that the
model is statistically significant. In other words, there is an extremely
low probability that the relationship between the dependent and
independent variables is due to random chance.

Summary: The F-statistic measures how well the model fits the data
compared to a model with no predictors. The Prob(F-statistic) or p-
value assesses the statistical significance of this fit. A very low p-
value indicates that the model significantly improves the prediction of
the dependent variable compared to having no predictors.
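As a rough arithmetic check (assuming the example has one predictor and about 100 observations; the sample size is not shown on the slide but is consistent with the reported AIC and BIC), the F-statistic can also be recovered from R-squared:
F = (R² / p) / ((1 − R²) / (n − p − 1)) = (0.934 / 1) / (0.066 / 98) ≈ 1387,
which matches the reported 1386 up to rounding of R-squared.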
Log-Likelihood, AIC, and BIC
Log-Likelihood: -807.50
The log-likelihood measures how well the model fits the data. In
general, higher (less negative) values of the log-likelihood indicate a
better fit of the model to the data. However, the log-likelihood value
alone does not provide a clear picture of model performance without
comparing it to other models.
Interpretation: The log-likelihood value of -807.50 indicates the
likelihood of observing the data given the model. Without a
benchmark or comparison, it is challenging to interpret this value
alone.
AIC (Akaike Information Criterion): 1619
The AIC is used for model comparison and penalizes for the number
of parameters in the model. It helps to balance goodness-of-fit with
model complexity. Lower AIC values indicate a better model fit
relative to other models.
Log-Likelihood, AIC, and BIC
BIC (Bayesian Information Criterion): 1624
The BIC also balances goodness-of-fit with model complexity, but it
penalizes complexity more strongly than the AIC. Lower BIC values
suggest a better model fit, with a stronger penalty for additional
parameters compared to the AIC.
Interpretation: AIC: 1619 and BIC: 1624 are used to compare the
fit of different models. Both AIC and BIC penalize for model
complexity, with BIC imposing a stricter penalty.
1. Comparison: These criteria are particularly useful when
comparing multiple models. The model with the lowest AIC
and BIC values is generally preferred. If you have other models
or alternative specifications, you can compare their AIC and
BIC values to assess which model offers a better trade-off
between fit and complexity.
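As a check on the quoted numbers (assuming, as before, k = 2 estimated coefficients and roughly n = 100 observations; these are inferences, not values stated on the slide), both criteria follow directly from the log-likelihood:
AIC = 2k − 2·(log-likelihood) = 2(2) − 2(−807.50) = 1619
BIC = k·ln(n) − 2·(log-likelihood) = 2·ln(100) + 1615.00 ≈ 1624.2 ≈ 1624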
Coefficients: const and X
The coefficient for 'const' (-1751.8412) represents the intercept of the
regression line. It is the value of Y when all X variables are zero.
-1751.8412 indicates that when X is zero, the expected value of the
dependent variable Y is -1751.8412. This might be a theoretical
baseline value, but its practical significance depends on the context of
the data.
t-Statistic: -11.069 and p-Value: 0.000 suggest that the intercept is
significantly different from zero. The intercept is statistically
significant at any conventional significance level (e.g., 0.05, 0.01).
Confidence Interval: The 95% confidence interval for the intercept is
[-2065.906, -1437.776], indicating that we are 95% confident that the
true intercept value falls within this range.
The coefficient for 'X' (101.2787) represents the change in Y for a
one-unit change in X. This suggests a strong positive relationship
between 𝑋 and Y.
Coefficients: const and X
t-Statistic: 37.224 and p-Value: 0.000 indicate that the coefficient of X is
highly statistically significant. The relationship between X and Y is highly
significant at any conventional significance level.
Confidence Interval: The 95% confidence interval for the coefficient of 𝑋 is
[95.879, 106.678], indicating that we are 95% confident that the true
coefficient value falls within this range.
Both coefficients have very small p-values, indicating they are statistically
significant.
Overall Interpretation:
Significance: Both the intercept and the coefficient for X are highly
significant, with very low p-values, indicating that the relationship
between X and Y is statistically significant.
Effect Size: The coefficient for X is quite large (101.2787), suggesting
a substantial effect of X on Y.
Intercept: The negative intercept might be a point of theoretical
interest, especially if X is expected to be zero or close to zero in the
context of the model.
In summary, the model suggests a strong and significant positive
relationship between X and Y, and the intercept is also statistically
significant, although its practical significance may need further context.
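With statsmodels, all of these quantities come straight off the fitted results object; a minimal sketch (reusing `results` from the earlier simulated example, so the numbers will differ from the slide's):

    # Sketch: coefficients, t-statistics, p-values, and 95% confidence
    # intervals from a fitted statsmodels OLS results object.
    print(results.params)               # intercept and slope estimates
    print(results.tvalues)              # t-statistics
    print(results.pvalues)              # two-sided p-values
    print(results.conf_int(alpha=0.05)) # 95% CIs, one row per coefficient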
Omnibus
The Omnibus test is a statistical test that assesses whether the
residuals from the regression model are normally distributed. It
combines tests for skewness and kurtosis into a single test statistic.
The Omnibus test (10.614, Prob: 0.005) tests the skewness and kurtosis
of the residuals. A significant result suggests non-normality.
Result: The p-value of 0.005 is less than the common significance
level of 0.05. This indicates that there is evidence against the null
hypothesis that the residuals are normally distributed. In other
words, the residuals deviate significantly from normality.
Residual Statistics: Skew: 0.628, Kurtosis: 2.274
1. Skew: 0.628. Interpretation: Positive skewness indicates that the
residuals are skewed to the right: most residuals are relatively small,
with a longer tail of large positive residuals, producing an asymmetric
distribution of residuals.
Other Statistics
1. Kurtosis: Value: 2.274. Interpretation: Kurtosis measures the
"tailedness" of the distribution. A kurtosis value of 2.274 is below the
value of 3, which is the kurtosis of a normal distribution. This
suggests that the residuals have lighter tails than a normal
distribution, indicating fewer extreme outliers.
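Both statistics can be computed directly from the residuals; a sketch assuming scipy is available (note that the statsmodels summary reports plain kurtosis, where a normal distribution scores 3, hence fisher=False below):

    # Sketch: skewness and kurtosis of the residuals of a fitted OLS model.
    # fisher=False returns kurtosis on the scale where a normal distribution
    # has kurtosis 3, matching the convention in the summary table.
    from scipy.stats import kurtosis, skew

    resid = results.resid
    print("Skew:    ", skew(resid))
    print("Kurtosis:", kurtosis(resid, fisher=False))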
Actionable Steps:
•Residual Analysis: Further residual diagnostics should be
performed to understand the nature of the deviations from normality.
This includes checking for patterns in residuals versus fitted values,
and considering transformations of the dependent variable if needed.
•Model Refinement: Consider if any model refinements, such as
adding or removing predictors, applying transformations, or using
different modeling techniques, could address the issues with residual
normality.
Durbin-Watson Statistic:
Durbin-Watson: 0.116. The Durbin-Watson statistic tests for the presence
of autocorrelation in the residuals from a regression analysis. The value
ranges from 0 to 4, where: 2 indicates no autocorrelation. Less than 2
indicates positive autocorrelation. Greater than 2 indicates negative
autocorrelation.
Result: A Durbin-Watson statistic of 0.116 is much lower than 2,
suggesting strong positive autocorrelation in the residuals. This means that
the residuals are highly correlated with each other, which violates the
assumption of independence of residuals.
Jarque-Bera Test: Jarque-Bera (JB): 8.763, Prob(JB): 0.0125
The Jarque-Bera test assesses whether the residuals follow a normal
distribution by evaluating both skewness and kurtosis. The null hypothesis
is that the residuals are normally distributed.
Result: The p-value of 0.0125 is less than the common significance level of
0.05, indicating that the residuals significantly deviate from normality. This
confirms the earlier Omnibus test result, suggesting that the residuals are
not normally distributed.
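Both diagnostics are available as helper functions in statsmodels; a sketch, again using the fitted `results` object from the earlier simulated example:

    # Sketch: Durbin-Watson and Jarque-Bera statistics computed from residuals.
    from statsmodels.stats.stattools import durbin_watson, jarque_bera

    resid = results.resid
    print("Durbin-Watson:", durbin_watson(resid))

    jb_stat, jb_pvalue, jb_skew, jb_kurt = jarque_bera(resid)
    print("Jarque-Bera:", jb_stat, " Prob(JB):", jb_pvalue)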
Condition Number
Cond. No.: 117. The condition number assesses multicollinearity in
the model. A higher condition number indicates potential issues with
multicollinearity, which can destabilize the regression estimates.
Result: A condition number of 117 is relatively high. While it is not
excessively high, it suggests that there might be some
multicollinearity issues. Typically, condition numbers above 30 are
considered problematic, so a value of 117 indicates that
multicollinearity could be affecting the regression results.
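The reported value is essentially the condition number of the design matrix (including the constant column); a sketch of how to compute it yourself:

    # Sketch: condition number of the design matrix used in the regression.
    # np.linalg.cond returns the ratio of the largest to smallest singular
    # value, which is what the "Cond. No." line in the summary reflects.
    import numpy as np

    cond_no = np.linalg.cond(results.model.exog)
    print("Condition number:", cond_no)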
Summary and Recommendations:
1.Autocorrelation:
1. Issue: The Durbin-Watson statistic indicates strong positive
autocorrelation in the residuals.
2. Action: Investigate potential sources of autocorrelation, such
as omitted variables or incorrect model specification. Consider
using time series techniques or adding lagged variables if
applicable.
Summary and Recommendations:
2.Non-Normal Residuals:
1. Issue: Both the Omnibus and Jarque-Bera tests indicate
significant deviations from normality.
2. Action: Consider transforming the dependent variable or
applying robust standard errors. Re-examine model
assumptions and residual patterns.
3.Multicollinearity:
1. Issue: The high condition number suggests potential
multicollinearity.
2. Action: Evaluate the correlation between predictors, for example
with variance inflation factors (see the sketch at the end of this
slide). Consider techniques such as Principal Component Analysis
(PCA) or Ridge Regression to address multicollinearity.
Overall, the model diagnostics suggest that there are several issues to
address, including autocorrelation, non-normality of residuals, and
potential multicollinearity. Addressing these issues will improve the
reliability and validity of your regression analysis.
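One common first step before reaching for PCA or Ridge is to compute variance inflation factors for the predictors; a sketch assuming a design matrix with a constant column, such as `results.model.exog` from a multi-predictor fit:

    # Sketch: variance inflation factors (VIFs) for each predictor column.
    # VIFs well above roughly 5-10 are a common warning sign of
    # multicollinearity. Column 0 (the constant) is skipped.
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    exog = results.model.exog
    for i in range(1, exog.shape[1]):
        print(f"VIF for column {i}: {variance_inflation_factor(exog, i):.2f}")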
Summary of OLS Regression Results
The OLS regression results provide a detailed overview of the
relationship between the dependent variable (Y) and the
independent variable (X). The model is statistically significant,
and the high R-squared value indicates a good fit. However, care
must be taken to check for assumptions such as normality and
autocorrelation.

Least Squares is the method used to estimate the coefficients in
the OLS regression model. The method minimizes the sum of
squared differences between observed values of Y and predicted
values.
This method ensures the best linear unbiased estimators of the
coefficients under the assumptions of the OLS model.
OLS Result Analysis for 10 X Variables and Y
Analysis of OLS Regression Results
Summary of Key Statistics:
Dependent Variable: Y
Model: Ordinary Least Squares (OLS)
Number of Observations: 100
Degrees of Freedom (Residuals): 89
Degrees of Freedom (Model): 10
R-squared: 0.808
Adjusted R-squared: 0.787
F-statistic: 37.53
Prob (F-statistic): 1.06×10⁻²⁷
Log-Likelihood: -123.49
AIC: 269.0
BIC: 297.6
Interpretation:
1.Model Fit:
1. R-squared: 0.808 indicates that approximately 80.8% of the
variance in the dependent variable Y is explained by the
independent variables in the model. This is a relatively high R-
squared value, suggesting that the model has a good fit.
2. Adjusted R-squared: 0.787 adjusts for the number of predictors
in the model. This value is slightly lower than R-squared, but still
high, indicating that the predictors are collectively effective at
explaining the variance in Y.
2.F-statistic:
1. F-statistic: 37.53 is quite high, suggesting that the overall
regression model is statistically significant.
2. Prob (F-statistic): 1.06×10⁻²⁷ indicates an extremely low p-value,
much smaller than common significance levels (0.05 or 0.01). This
means the null hypothesis (that all coefficients are zero) is rejected,
confirming that the model as a whole is significant.
Coefficients and Significance:
const (Intercept): Coefficient is 1.1174 with a p-value of 0.011, indicating that the
intercept is significantly different from zero.
X1: Coefficient is 2.8052 with a p-value of 0.000, indicating a strong positive
relationship with Y.
X2: Coefficient is 5.4174 with a p-value of 0.000, also indicating a strong positive
relationship with Y.
X3: Coefficient is 0.5118 with a p-value of 0.105, which is not significant at the 0.05
level.
X4: Coefficient is 0.2176 with a p-value of 0.501, indicating no significant effect on Y.
X5: Coefficient is 0.1181 with a p-value of 0.697, indicating no significant effect on Y.
X6: Coefficient is -0.0087 with a p-value of 0.979, indicating no significant effect on
Y.
X7: Coefficient is -0.1771 with a p-value of 0.625, indicating no significant effect on
Y.
X8: Coefficient is 0.0283 with a p-value of 0.929, indicating no significant effect on Y.
X9: Coefficient is 0.0252 with a p-value of 0.940, indicating no significant effect on Y.
X10: Coefficient is 0.7142 with a p-value of 0.023, indicating a significant positive
relationship with Y.
Model Diagnostics:
Durbin-Watson: 1.689, which is close to 2, suggesting that there is no strong
autocorrelation in the residuals.
Omnibus Test: 0.424 with a p-value of 0.809 provides no evidence against
normality of the residuals.
Jarque-Bera Test: 0.336 with a p-value of 0.845, further supports the normality of
residuals.
Condition Number: 11.0, which is not extremely high, suggesting that
multicollinearity is not a major concern.

Conclusion:
The model has a good fit with an R-squared of 0.808 and a highly significant F-
statistic.
Among the predictors, X1, X2, and X10 have significant coefficients, while the
others (X3, X4, X5, X6, X7, X8, X9) do not show significant effects on Y at the
0.05 significance level.
Diagnostics indicate that the residuals are approximately normally distributed and
there is no strong evidence of autocorrelation or multicollinearity issues.
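A natural follow-up, sketched below under the assumption that the ten predictors sit in a pandas DataFrame `df` with columns 'X1' through 'X10' and the response in `y` (names chosen here for illustration, not taken from the slides), is to refit with only the significant predictors and compare AIC/BIC against the full model:

    # Sketch: refit using only the significant predictors (X1, X2, X10) and
    # compare information criteria with the full ten-predictor model.
    # `df` and `y` are illustrative placeholders, not objects from the slides.
    import statsmodels.api as sm

    full_exog = sm.add_constant(df[[f"X{i}" for i in range(1, 11)]])
    reduced_exog = sm.add_constant(df[["X1", "X2", "X10"]])

    full_fit = sm.OLS(y, full_exog).fit()
    reduced_fit = sm.OLS(y, reduced_exog).fit()

    print("full model    AIC:", full_fit.aic, " BIC:", full_fit.bic)
    print("reduced model AIC:", reduced_fit.aic, " BIC:", reduced_fit.bic)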
