Multiple Linear Regression

The dataset contains information on 18,207 footballers including variables like wage, international reputation, skill moves, strength, long shots and vision. After cleaning the data, a multiple linear regression model was developed with salary as the dependent variable and the other variables as independent variables. The data cleaning involved dealing with missing values, outliers, encoding variables and transforming variables. The regression model found all independent variables to be statistically significant predictors of salary.

Uploaded by

Agnes Nalutaaya

MAKERERE UNIVERSITY
COLLEGE OF BUSINESS AND MANAGEMENT SCIENCES
SCHOOL OF STATISTICS AND PLANNING
DEPARTMENT OF STATISTICS AND ACTUARIAL SCIENCES
DATA ANALYSIS 3 COURSEWORK REPORT

NAME                         STUDENT NO.   REG. NO.
NALUTAAYA AGNES LUNKUSE      1800700023    18/U/023
TUMWEBAZE CLARITY            1800700022    18/U/022
KAGGA IVAN CLIFF MAZZI       217005264     17/U/4354/PS
TUHAIRWE DUNCAN              1800723176    18/U/23176/PS
SSEKABIRA CAROL NANDAWULA    216004779     16/U/11552/PS
MUNEZERO BONIVENTURE         1800723165    18/U/23165/PS
ATWIINE GLORIA               1800741655    18/U/41655
TSIKHABI JOSHUA I MASAWI     1800714171    18/U/14171/PS

INTRODUCTION
The dataset was downloaded from https://www.kaggle.com/datasets and saved to the desktop.
Because the site provides the data as a CSV file, it was converted to an Excel workbook,
imported into Stata and then used in the analysis.
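As an aside, Stata can usually read a CSV file directly, which would make the Excel conversion optional. The sketch below is illustrative only: the two-row toy dataset and the temporary file are assumptions made for the example, not the Kaggle file, and it assumes Stata 14 or later for Unicode support.

```stata
* Toy data standing in for the Kaggle CSV (made-up values, not from the report)
clear
input str12 Name str8 Wage
"Player A" "€10K"
"Player B" "€2K"
end

* Write the toy data to a temporary CSV file, then read it back directly
tempfile toy
export delimited using "`toy'.csv", replace
import delimited using "`toy'.csv", clear varnames(1) case(preserve) encoding("utf-8")

* Wage arrives as a string because of the euro sign and the K suffix
describe
```

The case(preserve) option keeps variable names such as Wage capitalised, since import delimited lowercases names by default.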

It contains 18,207 observations (rows) and 80 features (variables). In this exercise we
examine how international reputation, skill moves, long shots, strength and vision
affect the wage each footballer in the dataset earns.

We propose the following multiple linear regression model:

Wage = β + α*InternationalReputation + κ*SkillMoves + γ*Strength + θ*LongShots + ρ*Vision

Where:

Wage is the dependent variable.

InternationalReputation, SkillMoves, Strength, LongShots and Vision are the independent variables.

β is the intercept.

α, κ, γ, θ and ρ are the coefficients that measure how strongly Wage depends on
InternationalReputation, SkillMoves, Strength, LongShots and Vision respectively.

PART ONE: DATA CLEANING

First, the codebook command was run; it gave a description of the data for
every variable in the dataset.

• Dealing with duplicates

A duplicates report was generated and there were no duplicates in the entire dataset. The
command and output were:
. duplicates report

Duplicates in terms of all variables

   copies   observations   surplus
        1          18207         0

• Cleaning variable Wage and generating Salary

The dependent variable Wage carried a euro currency symbol and a “K” suffix for thousands, so
Stata read it as a string variable and it had to be cleaned. The euro symbol was split off and
the remaining content was destringed, ignoring the “K” at the end.

The new variable was named salary. It was then multiplied by 1000 to convert the figures from
thousands of euros to euros. The command and output were:
. split Wage , parse(€) generate (wage_y)
variables created as string:
wage_y1 wage_y2

.
. destring wage_y2, generate(salary) ignore("K")
wage_y2: character K removed; salary generated as int
(241 missing values generated)

.
. replace salary = salary * 1000
variable salary was int now long
(17,966 real changes made)
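The same cleaning can also be sketched without split, by stripping both symbols with subinstr before destringing. This is a minimal sketch on three made-up rows, not the report's data; the variable wage_txt is a hypothetical helper name, and Stata 14 or later is assumed for the Unicode euro sign.

```stata
* Toy Wage strings (made-up values for illustration)
clear
input str10 Wage
"€565K"
"€405K"
"€1K"
end

* Remove the euro sign and the K suffix in one pass
generate str10 wage_txt = subinstr(subinstr(Wage, "€", "", .), "K", "", .)

* Convert the remaining digits to a number
destring wage_txt, generate(salary)

* K means thousands, so scale up to euros
replace salary = salary * 1000

list Wage salary
```

As in the log above, Stata promotes salary from int to long on its own when the scaled values no longer fit.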

The multiple linear regression model is therefore restated with salary as the dependent variable:

salary = β + α*InternationalReputation + κ*SkillMoves + γ*Strength + θ*LongShots + ρ*Vision

Where:

salary is the dependent variable.

InternationalReputation, SkillMoves, Strength, LongShots and Vision are the independent variables.

β is the intercept.

α, κ, γ, θ and ρ are the coefficients that measure how strongly salary depends on
InternationalReputation, SkillMoves, Strength, LongShots and Vision respectively.

• Dropping some variables and keeping some variables.

Of all the variables, 6 were needed for the model; together with Name and A this made 8. So the
keep command was used so that Stata would keep these 8 and drop the rest. The command was:
. keep A Name InternationalReputation SkillMoves Strength Vision LongShots salary

• Cleaning variable Name

The variable Name was encoded from string to a form that Stata could manipulate. The new
variable generated was Names. The command was:

. encode Name, generate(Names)

• Dealing with missing values in InternationalReputation and SkillMoves variables

To handle missing values in InternationalReputation and SkillMoves, which take the discrete
integer values 1, 2, 3, 4 and 5, the average of the ranks 1 to 5 was calculated in Stata.
The command and output were:

. display (1+2+3+4+5)/5
3

This average rank of 3 was then used to replace the missing values in the two variables.
The commands were:

. replace InternationalReputation = 3 if InternationalReputation == .

(48 real changes made)

. replace SkillMoves = 3 if SkillMoves == .

(48 real changes made)

• Dealing with missing values in variable salary

To handle missing values in salary, the mean of the non-missing values in the column was
calculated and used to replace the missing values. The command and output for the mean were:
. mean salary

Mean estimation Number of obs = 17,966

Mean Std. Err. [95% Conf. Interval]

salary 9861.85 165.0083 9538.418 10185.28

The command for replacing missing values in the salary variable was:
. replace salary = 9861.85 if salary == .
variable salary was long now double
(241 real changes made)
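One way to avoid retyping the mean by hand is to read it from the results that summarize stores. The sketch below assumes a four-row toy dataset with made-up values; r(mean) is the stored result that summarize leaves behind.

```stata
* Toy salaries with one missing value (made-up numbers)
clear
input salary
1000
3000
.
8000
end

* meanonly computes the mean without printing the summary table
summarize salary, meanonly

* Fill the gap with the stored mean instead of a typed constant
replace salary = r(mean) if missing(salary)

list salary
```

Using r(mean) keeps full precision and stays correct if the data change before the do-file is rerun.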

• Generating variable Time from variable A

The variable A was renamed to Time because the Durbin-Watson d statistic test for
autocorrelation requires a time variable. This is why the variable was kept at the earlier
keep step. The commands were:
. label variable A "time"

.
. rename A Time

• Dealing with missing values in variable Strength

The variable Strength had missing values. The mean of the column was calculated and used to
replace the missing values. The commands and output were:
. mean Strength

Mean estimation Number of obs = 18,159

Mean Std. Err. [95% Conf. Interval]

Strength 65.31197 .0931837 65.12932 65.49462

. replace Strength = 65.31197 if Strength == .

variable Strength was byte now float
(48 real changes made)

• Dealing with missing values in variable LongShots

The variable LongShots had missing values. The mean of the column was calculated and used to
replace the missing values. The commands and output were:
. mean LongShots

Mean estimation Number of obs = 18,159

Mean Std. Err. [95% Conf. Interval]

LongShots 47.10997 .1429296 46.82982 47.39013

. replace LongShots = 47.10997 if LongShots == .

variable LongShots was byte now float
(48 real changes made)

• Dealing with missing values in variable Vision

The variable Vision had missing values. The mean of the column was calculated and used to
replace the missing values. The commands and output were:
. mean Vision

Mean estimation Number of obs = 18,159

Mean Std. Err. [95% Conf. Interval]

Vision 53.4009 .104982 53.19513 53.60668

. replace Vision = 53.4009 if Vision == .

variable Vision was byte now float
(48 real changes made)
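The three mean replacements for Strength, LongShots and Vision follow the same pattern, so they could be written as one loop. This is a sketch on a made-up four-row dataset, not the report's do-file.

```stata
* Toy attribute scores with scattered missing values (made-up numbers)
clear
input Strength LongShots Vision
60 40 50
70 . 60
. 60 .
80 50 70
end

* Replace each variable's missing values with that variable's own mean
foreach v of varlist Strength LongShots Vision {
    quietly summarize `v', meanonly
    replace `v' = r(mean) if missing(`v')
}

list
```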

Conclusion:

The data was summarized after the cleaning procedure. The command and output were:
. summarize Time InternationalReputation SkillMoves Strength LongShots Vision salary Names

    Variable        Obs        Mean    Std. Dev.       Min        Max
        Time     18,207        9103    5256.053          0      18206
Internatio~n     18,207    1.118196    .4052307          1          5
  SkillMoves     18,207    2.362992    .7558765          1          5
    Strength     18,207    65.31197    12.54044         17         97
   LongShots     18,207    47.10997    19.23512          3         94
      Vision     18,207     53.4009    14.12822         10         94
      salary     18,207     9861.85     21970.4       1000     565000
       Names     18,207     8562.34     4944.38          1      17194

The total number of observations for all variables is 18,207.

• Time: The mean is 9103, standard deviation is 5256.053, minimum is 0 and maximum is
18206.
• InternationalReputation: The mean is 1.118196, standard deviation is 0.4052307,
minimum is 1 and maximum is 5.
• SkillMoves: The mean is 2.362992, standard deviation is 0.7558765, minimum is 1 and
maximum is 5.
• Strength: The mean is 65.31197, standard deviation is 12.54044, minimum is 17 and
maximum is 97.
• LongShots: The mean is 47.10997, standard deviation is 19.23512, minimum is 3 and
maximum is 94.
• Vision: The mean is 53.4009, standard deviation is 14.12822, minimum is 10 and
maximum is 94.
• salary: The mean is 9861.85, standard deviation is 21970.4, minimum is 1000 and
maximum is 565000.
• Names: A mean, standard deviation, minimum and maximum are reported, but we treat them as
meaningless because the variable holds names; the numbers are only the codes assigned when
it was encoded from string.

PART TWO: MULTIPLE LINEAR REGRESSION AND TESTING ASSUMPTIONS

Running the multiple linear regression model.

The command and output were:

. regress salary i.InternationalReputation i.SkillMoves Strength LongShots Vision

      Source           SS       df           MS      Number of obs   =    18,207
                                                     F(11, 18195)    =   1772.09
       Model   4.5453e+12       11   4.1321e+11      Prob > F        =    0.0000
    Residual   4.2427e+12   18,195    233177953      R-squared       =    0.5172
                                                     Adj R-squared   =    0.5169
       Total   8.7880e+12   18,206    482698402      Root MSE        =     15270

                 salary        Coef.   Std. Err.        t    P>|t|     [95% Conf. Interval]

InternationalReputation
                      2     19552.05    471.2686    41.49    0.000     18628.32    20475.78
                      3        56872    844.8184    67.32    0.000     55216.07    58527.92
                      4     158901.5    2168.295    73.28    0.000     154651.5    163151.6
                      5       286925    6338.772    45.27    0.000     274500.4    299349.6

             SkillMoves
                      2     -4862.77     481.343   -10.10    0.000    -5806.248   -3919.292
                      3    -3332.904    595.9187    -5.59    0.000    -4500.961   -2164.847
                      4     7805.529    819.2849     9.53    0.000     6199.654    9411.405
                      5     9032.809    2289.064     3.95    0.000     4546.028    13519.59

               Strength     177.3617     9.44834    18.77    0.000     158.8421    195.8814
              LongShots     52.97948     11.2732     4.70    0.000     30.88294    75.07603
                 Vision     173.3927    13.10883    13.23    0.000     147.6981    199.0872
                  _cons    -13400.04    807.4742   -16.60    0.000    -14982.77   -11817.31

For the categorical variables (InternationalReputation and SkillMoves), dummy variables were
generated through the i. prefix, with level 1 as the base category.
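To illustrate what the i. prefix does, the sketch below fits the same model twice on Stata's shipped auto dataset, once with i.rep78 and once with hand-made dummies from tabulate, generate(). The dataset and the variable names are stand-ins chosen for the example, not the footballer data.

```stata
* Shipped example data used as a stand-in for the footballer file
sysuse auto, clear

* Factor-variable notation: level 1 of rep78 is the omitted base category
regress price i.rep78 weight
scalar b_factor = _b[2.rep78]

* Manual route: one 0/1 dummy per level, then leave out the first
tabulate rep78, generate(rep)
regress price rep2 rep3 rep4 rep5 weight
scalar b_manual = _b[rep2]

* The two coefficients for level 2 should agree
display b_factor "  " b_manual
```

Each dummy coefficient is therefore a difference from the base level, holding the other regressors constant.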

Interpretation

The total number of observations is 18207.

The F statistic is 1772.09 and its p-value (Prob > F) is 0.0000, which is less than 0.05. This
means that not all coefficients of the variables in this linear regression model are zero.
The adjusted R^2 is 0.5169, which means that 51.69% of the variation in salary is explained by
InternationalReputation, SkillMoves, Strength, LongShots and Vision.

A change in InternationalReputation from 1 to 2 increases salary by 19552.05.

A change in InternationalReputation from 1 to 3 increases salary by 56872.

A change in InternationalReputation from 1 to 4 increases salary by 158901.5.

A change in InternationalReputation from 1 to 5 increases salary by 286925.

A change in SkillMoves from 1 to 2 decreases salary by 4862.77.

A change in SkillMoves from 1 to 3 decreases salary by 3332.904.

A change in SkillMoves from 1 to 4 increases salary by 7805.529.

A change in SkillMoves from 1 to 5 increases salary by 9032.809

A unit increase in Strength increases salary by 177.3617.

A unit increase in LongShots increases salary by 52.97948.

A unit increase in Vision increases salary by 173.3927.

All p values are 0.000 meaning that all variables are significant in this model.

The intercept is -13400.04.

The multiple linear model is:

salary = -13400.04 + 19552.05*InternationalReputation2 + 56872*InternationalReputation3 +
158901.5*InternationalReputation4 + 286925*InternationalReputation5 -
4862.77*SkillMoves2 - 3332.904*SkillMoves3 + 7805.529*SkillMoves4 +
9032.809*SkillMoves5 + 177.3617*Strength + 52.97948*LongShots + 173.3927*Vision
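As a worked example of reading the fitted equation, the sketch below predicts the salary of a hypothetical player (an assumption for illustration, not a row from the dataset) with InternationalReputation 2, SkillMoves 3, Strength 70, LongShots 60 and Vision 65. Only the dummies for the player's own levels are switched on.

```stata
* Coefficients are copied from the regression table above
* Hypothetical player: IR = 2, SkillMoves = 3, Strength = 70, LongShots = 60, Vision = 65
scalar yhat = -13400.04 + 19552.05 - 3332.904 + 177.3617*70 + 52.97948*60 + 173.3927*65

* Predicted salary in euros
display %12.2f yhat
```

The result is about 29,683.72 euros.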

TESTING ASSUMPTIONS

1) Your dependent variable must be measured at a continuous level/ scale.

We checked this assumption by tabulating and summarizing salary. The commands were:

. tab salary

. summarize salary

Variable Obs Mean Std. Dev. Min Max

salary 18,207 9861.85 21970.4 1000 565000

Conclusion: The variable Salary is continuous.

2) You have two or more independent variables measured at continuous or categorical level.

We checked this assumption by using multiple one-way frequency tables for
InternationalReputation, SkillMoves, Strength, LongShots and Vision using the command:

. tab1 InternationalReputation SkillMoves Strength LongShots Vision

From this we conclude that Strength, LongShots and Vision are continuous, while
InternationalReputation and SkillMoves are categorical.

Furthermore, on this assumption we did some summary statistics for the independent variables.

The command and output are:

. summarize InternationalReputation SkillMoves Strength LongShots Vision salary

Variable Obs Mean Std. Dev. Min Max

Internatio~n 18,207 1.118196 .4052307 1 5

SkillMoves 18,207 2.362992 .7558765 1 5
Strength 18,207 65.31197 12.54044 17 97
LongShots 18,207 47.10997 19.23512 3 94
Vision 18,207 53.4009 14.12822 10 94

salary 18,207 9861.85 21970.4 1000 565000

3) There needs to be a linear relationship between:

a) the dependent variable and independent variables and
b) the dependent variable and the independent variables collectively.

Testing assumption 3a): There needs to be a linear relationship between the dependent variable
and the independent variables.

A log transformation was applied to salary for all plots under this assumption. It compresses
the very large salaries so that the line of best fit sits closer to the bulk of the points.
The command was:

. generate logsalary = ln( salary)

Strength

It was tested by using a scatter plot of salary and Strength.

The command and output before the log transformation were:

. twoway (scatter salary Strength) (lfit salary Strength)

[Figure: scatter plot of salary against Strength with fitted values line]

The command and output after the log transformation were:

. twoway (scatter logsalary Strength) (lfit logsalary Strength)

[Figure: scatter plot of logsalary against Strength with fitted values line]

Conclusion: There is linearity between Strength and salary.

LongShots

It was tested by using a scatter plot of salary and LongShots.

The command and output before the log transformation were:

. twoway (scatter salary LongShots ) (lfit salary LongShots )

[Figure: scatter plot of salary against LongShots with fitted values line]

The command and output after the log transformation were:

. twoway (scatter logsalary LongShots ) (lfit logsalary LongShots )

[Figure: scatter plot of logsalary against LongShots with fitted values line]

Conclusion: There is linearity between Longshots and salary.

Vision

It was tested by using a scatter plot of salary and Vision.

The command and output before the log transformation were:

. twoway (scatter salary Vision ) (lfit salary Vision )

[Figure: scatter plot of salary against Vision with fitted values line]

The command and output after the log transformation were:

. twoway (scatter logsalary Vision ) (lfit logsalary Vision )

[Figure: scatter plot of logsalary against Vision with fitted values line]

Conclusion: There is linearity between Vision and salary.

Testing assumption 3b): There should be a linear relationship between the dependent
variable and the independent variables collectively.

This assumption is tested using partial regression plots, also called added variable plots. The
command produces an added variable plot for every variable in the multiple linear regression
model, including the dummy variables.

The command was:

. avplots

Conclusion: There is linearity between salary and all independent variables.

4) Your data must not show multicollinearity, which occurs when you have two or more
independent variables that are highly correlated.

This assumption was tested using variance inflation factors. The command and output were:

. estat vif

    Variable        VIF       1/VIF

Internatio~n
           2       1.12    0.894554
           3       1.07    0.933454
           4       1.03    0.975212
           5       1.03    0.967540
  SkillMoves
           2       4.51    0.221882
           3       6.43    0.155575
           4       2.51    0.398925
           5       1.14    0.875024
    Strength       1.10    0.912298
   LongShots       3.67    0.272388
      Vision       2.68    0.373396

    Mean VIF       2.39

Conclusion: Since all the variance inflation factors (VIF) lie between 1 and 10, the correlation
among the independent variables is at most moderate, so there is no problematic multicollinearity.
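For reference, the standard definition behind the VIF column can be written out and checked against one row of the table. Here R_j^2 denotes the R-squared from regressing predictor j on all the other predictors; the symbol is introduced for this note and does not appear in the Stata output.

```latex
\[
\mathrm{VIF}_j = \frac{1}{1 - R_j^2},
\qquad
\frac{1}{\mathrm{VIF}_j} = 1 - R_j^2 .
\]
% Worked check for the SkillMoves level 3 dummy, using the 1/VIF value in the table:
\[
1 - R_j^2 = 0.155575
\;\Rightarrow\;
R_j^2 = 0.844425,
\qquad
\mathrm{VIF}_j = \frac{1}{0.155575} \approx 6.43 .
\]
```

So roughly 84% of the variation in that dummy is explained by the other regressors, which is the largest value in the table but still below the cut-off of 10.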
5) There should be homoscedasticity

This assumption was tested using a residual versus fitted values plot (rvfplot). The command and
output were:

. rvfplot, yline(0)

[Figure: residuals against fitted values, with a reference line at 0]

Conclusion: Since the residuals spread out from the reference line at 0 rather than forming a
band of constant width, their variance is not constant, which means there is no homoscedasticity.
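A visual check like this can be complemented by a formal test. The sketch below is not part of the report's do-file; it uses Stata's shipped auto dataset as a stand-in to show the Breusch-Pagan test (estat hettest) and a refit with heteroscedasticity-robust standard errors.

```stata
* Shipped example data used as a stand-in for the footballer file
sysuse auto, clear
regress price weight mpg

* Breusch-Pagan / Cook-Weisberg test; a small p-value suggests heteroscedasticity
estat hettest

* Same coefficients, but standard errors that do not assume constant variance
regress price weight mpg, vce(robust)
```

The vce(robust) option changes only the standard errors and the tests built on them, not the coefficient estimates.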

6) There should be no autocorrelation

The assumption was tested using the Durbin-Watson d statistic. Since the test only applies to
time series data, the data were first declared to be time series and the test was then applied.
The command and output were:
. tsset Time
time variable: Time, 0 to 18206
delta: 1 unit

. estat dwatson

Durbin-Watson d-statistic( 12, 18207) = 1.421951

Conclusion: The Durbin-Watson statistic ranges between 0 and 4, and a value of 2 indicates no
autocorrelation. Here it is 1.421951, which is below 2, meaning there is positive autocorrelation.
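The reading of the statistic follows from its definition. In the sketch below, e_t is the residual for observation t in the declared Time order and \hat{\rho} is the lag-one residual autocorrelation; both symbols are introduced for this note, and the approximation holds for large n.

```latex
\[
d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}
\;\approx\; 2\,(1 - \hat{\rho}) .
\]
% Worked check with the value reported above:
\[
\hat{\rho} \approx 1 - \frac{d}{2} = 1 - \frac{1.421951}{2} \approx 0.289 .
\]
```

A value of d near 2 corresponds to \hat{\rho} near 0, values below 2 to positive autocorrelation and values above 2 to negative autocorrelation.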
7) The residuals should be normally distributed.

In order to test this assumption, studentized residuals were generated and then plotted in a
histogram with a normal density curve superimposed. The commands and output were:
. predict stres, rstudent

. histogram stres, normal
(bin=42, start=-21.936043, width=.95808297)

[Figure: histogram of studentized residuals (density) with a normal density curve]

Conclusion: The studentized residuals are normally distributed.
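A histogram can be supplemented with a quantile plot and a numerical test. The sketch below is an illustration on Stata's shipped auto dataset, not the report's data; qnorm and sktest are standard Stata commands, but their use here is an addition to the report's method.

```stata
* Shipped example data used as a stand-in for the footballer file
sysuse auto, clear
regress price weight mpg

* Studentized residuals, as in the report
predict stres, rstudent

* Points far from the diagonal indicate departures from normality
qnorm stres

* Skewness and kurtosis test; a small p-value argues against normality
sktest stres
```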

8) There should be no significant outliers, high leverage points and highly influential points

Outliers

To check for outliers a stem and leaf display was generated. Then the outliers below and above
were identified. Those below (studentized residual of -5 or less) are few, so they are all
listed. Those above (studentized residual of 9 or more) are many, so only the first 10 are
listed in this document; Stata will output all of them since the command is in the do-file.
The commands and output were:

i) For those below

. list InternationalReputation SkillMoves Strength LongShots Vision salary stres if stres <= -5

        Intern~n   SkillM~s   Strength   LongSh~s   Vision   salary       stres
  23.          5          1         80         16       70   130000   -12.34081
  42.          4          1         69         13       50    77000   -5.966961
  69.          4          4         67         86       86   100000    -5.60563
  77.          4          4         58         71       93    21000    -10.7845
 109.          4          2         86         56       48    57000   -7.300009
 110.          5          5         86         82       79    15000   -21.93604
 207.          4          4         86         79       74    55000   -8.657025
 222.          4          5         60         71       84    72000   -7.453211
 281.          4          4         68         66       86    67000   -7.738674
 315.          4          4         55         78       78    62000   -7.867957
 318.          4          1         63         11       53    60000   -7.052768
 319.          4          1         70         13       65    10000   -10.61177
 379.          4          4         92         90       75    25000   -10.77981
 548.          4          2         72         55       68    20000   -9.823951
 551.          4          3         75         77       81    11000   -10.79005
 553.          4          3         78         80       82    13000   -10.71431
 677.          3          4         66         71       81     1000   -5.238172

ii) For those above

. list InternationalReputation SkillMoves Strength LongShots Vision salary stres if stres >= 9 in 1/21

        Intern~n   SkillM~s   Strength   LongSh~s   Vision   salary       stres
   1.          5          4         59         94       94   565000    18.30344
   5.          4          4         75         91       94   355000    11.10393
   6.          4          4         66         80       89   340000     10.3053
   7.          4          4         58         82       92   420000    15.72673
   8.          5          3         83         85       84   455000    10.90395
   9.          4          3         83         59       63   380000    13.90365
  12.          4          3         73         92       86   355000    11.96119
  15.          3          2         76         69       79   225000    10.23313
  19.          3          1         79         10       69   240000     11.1949
  21.          4          3         77         54       87   315000    9.366471

Conclusion: There are outliers in the model.

Leverage points

In order to check for leverage points, they were predicted using the following command:

. predict leverage, leverage

Then the cut off for leverage points was calculated. The formula is (2k + 2)/n where k is the number
of variables used in the multiple linear regression model and n is the total number of observations.
The command and output were:
. display ((2*6)+2)/18207
.00076894

A list of observations whose leverage exceeds the cut-off of 0.00076894 was then generated. Only
the first 10 are shown in this document; Stata will output all of them since the command is in
the do-file. The command and output were:

. list InternationalReputation SkillMoves Strength LongShots Vision salary leverage if leverage > .00076894 in 1/10

        Intern~n   SkillM~s   Strength   LongSh~s   Vision   salary    leverage
   1.          5          4         59         94       94   565000    .1726412
   2.          5          5         79         93       82   405000    .1719912
   3.          5          5         49         82       87   290000    .1720172
   4.          4          1         64         12       68   260000      .02072
   5.          4          4         75         91       94   355000    .0202646
   6.          4          4         66         80       89   340000    .0202131
   7.          4          4         58         82       92   420000    .0202777
   8.          5          3         83         85       84   455000     .172036
   9.          4          3         83         59       63   380000    .0201346
  10.          3          1         78         12       70    94000    .0040192

Conclusion: There are leverage points in the model.
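The cut-off could also be computed from the results that regress stores, rather than typed in. The sketch below uses Stata's shipped auto dataset as a stand-in; lev and lev_cut are hypothetical names chosen for the example. Note that e(df_m) counts every estimated slope, so in the report's model it would be 11 (each dummy counts), giving a different cut-off from the one computed with k = 6.

```stata
* Shipped example data used as a stand-in for the footballer file
sysuse auto, clear
regress price weight mpg

* Leverage (hat) values for each observation
predict lev, leverage

* (2k + 2)/n with k = e(df_m) regressors and n = e(N) observations
scalar lev_cut = (2*e(df_m) + 2)/e(N)
display lev_cut

* Number of observations above the cut-off
count if lev > lev_cut & !missing(lev)
```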

Influential points

In order to check for influential points, DFITS (difference in fits) values were predicted using
the following command:

. predict dfits, dfits

Then the cut off for influential points was calculated. The formula is (2*sqrt(k/n)) where k is the
number of variables used in the multiple linear regression model and n is the total number of
observations. The command and output were:

. display 2*sqrt(6/18207)
.03630667

A list of observations whose DFITS value exceeds the cut-off of 0.03630667 was then generated.
Only the first 10 are shown in this document; Stata will output all of them since the command
is in the do-file. The command and output were:

. list InternationalReputation SkillMoves Strength LongShots Vision salary dfits if dfits>.03630667 in 1/11

        Intern~n   SkillM~s   Strength   LongSh~s   Vision   salary       dfits
   1.          5          4         59         94       94   565000    8.360995
   2.          5          5         79         93       82   405000    2.931811
   4.          4          1         64         12       68   260000    .8741187
   5.          4          4         75         91       94   355000    1.596951
   6.          4          4         66         80       89   340000    1.480169
   7.          4          4         58         82       92   420000     2.26254
   8.          5          3         83         85       84   455000    4.970357
   9.          4          3         83         59       63   380000    1.993044
  10.          3          1         78         12       70    94000     .099716
  11.          4          4         84         84       77   205000    .1810446

Conclusion: There are influential points in the model.
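DFITS values can be negative as well as positive, so a two-sided comparison on the absolute value is a common variant of the check above. The sketch below shows that variant on Stata's shipped auto dataset; it is an assumption about how one might extend the check, not the command used in the report, and d and dfits_cut are hypothetical names.

```stata
* Shipped example data used as a stand-in for the footballer file
sysuse auto, clear
regress price weight mpg

* DFITS for each observation
predict d, dfits

* 2*sqrt(k/n) with k = e(df_m) regressors and n = e(N) observations
scalar dfits_cut = 2*sqrt(e(df_m)/e(N))

* abs() also catches observations that pull the fit downwards
count if abs(d) > dfits_cut & !missing(d)
list make price d if abs(d) > dfits_cut & !missing(d)
```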

Treatment of unusual points like outliers.

In order to treat unusual points like outliers, a robust regression was used.

Robust regression is an iterative procedure that seeks to identify outliers and minimize their
impact on the coefficient estimates.

For this document, the robust regression output is shown without the iteration log. The full
output will appear in Stata since the command is in the do-file.

The command and output were:

. rreg salary i.InternationalReputation i.SkillMoves Strength LongShots Vision

Robust regression                               Number of obs   =    18,203
                                                F( 11, 18191)   =  22023.11
                                                Prob > F        =    0.0000

                 salary        Coef.   Std. Err.        t    P>|t|     [95% Conf. Interval]

InternationalReputation
                      2     6796.234    94.90796    71.61    0.000     6610.205    6982.262
                      3     48224.09    170.1606   283.40    0.000     47890.56    48557.62
                      4     156367.1    436.6914   358.07    0.000     155511.1    157223.1
                      5    -3.67e-10    2221.061    -0.00    1.000    -4353.488    4353.488

             SkillMoves
                      2    -1504.482    96.93651   -15.52    0.000    -1694.486   -1314.477
                      3     -229.601    120.0113    -1.91    0.056    -464.8344    5.632478
                      4     3018.416    165.0164    18.29    0.000     2694.968    3341.864
                      5     2914.861    467.8699     6.23    0.000     1997.792     3831.93

               Strength     58.55993    1.902957    30.77    0.000     54.82995     62.2899
              LongShots     23.77543     2.27064    10.47    0.000     19.32476     28.2261
                 Vision     33.79158    2.640441    12.80    0.000     28.61607    38.96709
                  _cons    -2412.282    162.6184   -14.83    0.000     -2731.03   -2093.535

Since some levels of the categorical variables are not significant in this robust fit
(InternationalReputation level 5 has p = 1.000 and SkillMoves level 3 has p = 0.056), one of the
categorical variables, SkillMoves, was dropped and the robust regression was run again.

The command and output were:

. rreg salary i.InternationalReputation Strength LongShots Vision

Robust regression                               Number of obs   =    18,202
                                                F(  7, 18194)   =  34959.35
                                                Prob > F        =    0.0000

                 salary        Coef.   Std. Err.        t    P>|t|     [95% Conf. Interval]

InternationalReputation
                      2     11701.48    95.58984   122.41    0.000     11514.11    11888.84
                      3      50277.6    171.8411   292.58    0.000     49940.78    50614.43
                      4     156413.4    445.3501   351.21    0.000     155540.5    157286.4
                      5     284990.4    3159.159    90.21    0.000     278798.2    291182.7

               Strength     47.29808     1.90894    24.78    0.000     43.55638    51.03979
              LongShots     16.49655    1.869364     8.82    0.000     12.83242    20.16068
                 Vision     51.34202    2.598845    19.76    0.000     46.24804      56.436
                  _cons    -3127.501    163.6803   -19.11    0.000    -3448.329   -2806.672
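To see which observations a robust regression has down-weighted, rreg can save its final weights with the genwt() option. The sketch below uses Stata's shipped auto dataset as a stand-in for the footballer data; w is a hypothetical variable name and the 0.5 threshold is an arbitrary choice for display.

```stata
* Shipped example data used as a stand-in for the footballer file
sysuse auto, clear

* Fit the robust regression and keep the final weights in w
rreg price weight mpg, genwt(w)

* Weights lie between 0 and 1; small weights mark observations treated as unusual
summarize w
list make price w if w < 0.5, noobs
```

Observations with low weights contribute little to the coefficient estimates, which is how the procedure limits the impact of outliers.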

Interpretation

There are 18,202 observations since some unusual points have been dropped.
The p-value of the F test (Prob > F) is 0.0000, which is less than 0.05, which means that not
all coefficients of the variables in this robust regression model are zero.

A change in InternationalReputation from 1 to 2 increases salary by 11701.48.

A change in InternationalReputation from 1 to 3 increases salary by 50277.6.

A change in InternationalReputation from 1 to 4 increases salary by 156413.4.

A change in InternationalReputation from 1 to 5 increases salary by 284990.4.

A unit increase in Strength increases salary by 47.29808.

A unit increase in LongShots increases salary by 16.49655.

A unit increase in Vision increases salary by 51.34202.

All p values are 0.000 meaning that all variables are significant in this model.

The intercept is -3127.501.

The robust model is:

salary = -3127.501 + 11701.48*InternationalReputation2 + 50277.6*InternationalReputation3 +
156413.4*InternationalReputation4 + 284990.4*InternationalReputation5 +
47.29808*Strength + 16.49655*LongShots + 51.34202*Vision

END

Business Analytics Data Analysis Decision Making 6th Edition Business Analytics Data Analysis Decision Making PDF Download
100% (1)
Business Analytics Data Analysis Decision Making 6th Edition Business Analytics Data Analysis Decision Making PDF Download
84 pages
CFA Mock 1 - Morning
100% (2)
CFA Mock 1 - Morning
25 pages
Valuing Wind Farm Assets April 2016
No ratings yet
Valuing Wind Farm Assets April 2016
26 pages
MBA Starting Salaries - Class Demonstration
No ratings yet
MBA Starting Salaries - Class Demonstration
26 pages
Proc Robust Reg
No ratings yet
Proc Robust Reg
56 pages
SOCY7706: Longitudinal Data Analysis Instructor: Natasha Sarkisian Two Wave Panel Data Analysis
No ratings yet
SOCY7706: Longitudinal Data Analysis Instructor: Natasha Sarkisian Two Wave Panel Data Analysis
12 pages
Assessment 2: Answer: Yes
No ratings yet
Assessment 2: Answer: Yes
6 pages
Applied Linear Regression
No ratings yet
Applied Linear Regression
9 pages
Multiple Imputation IN: Mplus
No ratings yet
Multiple Imputation IN: Mplus
19 pages
SPSS Tutorial: Scatter Plot, Regression, ANOVA
No ratings yet
SPSS Tutorial: Scatter Plot, Regression, ANOVA
43 pages
Forecasting One Mark
No ratings yet
Forecasting One Mark
45 pages
Department of Economics Problem Set
No ratings yet
Department of Economics Problem Set
5 pages
Stata Panel Data Analysis Summary
No ratings yet
Stata Panel Data Analysis Summary
45 pages
5103A1
No ratings yet
5103A1
6 pages
Multiple Linear Regression in Excel
No ratings yet
Multiple Linear Regression in Excel
19 pages
Analyse Econometrique Avec Stata 12 2
No ratings yet
Analyse Econometrique Avec Stata 12 2
414 pages
B.Tech IT Machine Learning Guide
No ratings yet
B.Tech IT Machine Learning Guide
135 pages
Stata Data Managment
No ratings yet
Stata Data Managment
79 pages
Statistical Modelling and Prediction of Compressive Strength of Concrete
No ratings yet
Statistical Modelling and Prediction of Compressive Strength of Concrete
7 pages
MGMT 469 Helpful Stata Commands
No ratings yet
MGMT 469 Helpful Stata Commands
8 pages
Lec 5 V 11
No ratings yet
Lec 5 V 11
44 pages
Oup 6
No ratings yet
Oup 6
48 pages
Topic - 9 PDF
No ratings yet
Topic - 9 PDF
12 pages
Panel Data Models Stata Program and Output PDF
100% (1)
Panel Data Models Stata Program and Output PDF
8 pages
Comparing Two Measurement Devices
No ratings yet
Comparing Two Measurement Devices
32 pages
Chapter 15
No ratings yet
Chapter 15
43 pages
Tutorial 5 - Solutions
No ratings yet
Tutorial 5 - Solutions
8 pages
Multiple Linear Regression Analysis Usin
No ratings yet
Multiple Linear Regression Analysis Usin
19 pages
Stat A Tutorial
No ratings yet
Stat A Tutorial
40 pages
Chap 6 MultipleLinearRegression Adjusted
No ratings yet
Chap 6 MultipleLinearRegression Adjusted
30 pages
FHMM1034 Chapter 5 Correlation and Regression (Student Version)
100% (2)
FHMM1034 Chapter 5 Correlation and Regression (Student Version)
49 pages
SSRN Id3990877
No ratings yet
SSRN Id3990877
8 pages
Ex - 2 - Data Transformation-1
No ratings yet
Ex - 2 - Data Transformation-1
3 pages
Survival Analysis Coursework
No ratings yet
Survival Analysis Coursework
11 pages
Experiment 3 & 4
No ratings yet
Experiment 3 & 4
15 pages
R Working Manuals Students
No ratings yet
R Working Manuals Students
11 pages
Regression Analysis Insights
No ratings yet
Regression Analysis Insights
21 pages
Harvard-Sup Chain Syl
No ratings yet
Harvard-Sup Chain Syl
23 pages
SDM - Task B - Group 1G - Movies
No ratings yet
SDM - Task B - Group 1G - Movies
11 pages
Section 2
No ratings yet
Section 2
22 pages
TaylorFit Regression Manual
No ratings yet
TaylorFit Regression Manual
15 pages
Fajar Yulianto DK - F0318050 - Tugas STATBIS A
No ratings yet
Fajar Yulianto DK - F0318050 - Tugas STATBIS A
21 pages
Understanding Time Series Analyisis 2021
100% (1)
Understanding Time Series Analyisis 2021
69 pages
Food Price Volatility & Welfare in Nigeria
No ratings yet
Food Price Volatility & Welfare in Nigeria
38 pages
KTL 31 10 2024
No ratings yet
KTL 31 10 2024
2 pages
Determinants of The Household Electricity Consumption: A Case Study of Delhi
No ratings yet
Determinants of The Household Electricity Consumption: A Case Study of Delhi
12 pages
Financial Management Coursework
100% (1)
Financial Management Coursework
11 pages
Introduction To The Course: Rob Reider
No ratings yet
Introduction To The Course: Rob Reider
36 pages
Introduction To Data Science
No ratings yet
Introduction To Data Science
45 pages
B211230053-03 SPV
No ratings yet
B211230053-03 SPV
4 pages
Measures of Correlation
No ratings yet
Measures of Correlation
5 pages
LC III Slides

We propose the following multiple linear regression model:

Wage = β + α*InternationalReputation + κ*SkillMoves + γ*Strength + θ*LongShots + ρ*Vision

Wage is the dependent variable.

InternationalReputation, SkillMoves, Strength, LongShots and Vision are the independent variables.

PART ONE: DATA CLEANING

• Dealing with duplicates

Duplicates in terms of all variables

copies observations surplus
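The column headings above (copies, observations, surplus) match the table that Stata's duplicates report command prints. The commands used by the authors are not shown in this extract, so the following is only a minimal sketch of how such a check is commonly run, assuming duplicates are defined across all variables as the text states:

```stata
* Report how many observations are exact copies across all variables
duplicates report

* Drop the surplus copies, keeping one observation from each duplicated group
duplicates drop
```

With no variable list, duplicates drop compares observations on every variable, which corresponds to "duplicates in terms of all variables".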

salary = β + α*InternationalReputation + κ*SkillMoves + γ*Strength + θ*LongShots + ρ*Vision

salary is the dependent variable.

InternationalReputation, SkillMoves, Strength, LongShots and Vision are the independent variables.

• Dropping the variables that are not needed and keeping those used in the model.

• Cleaning variable Name

. encode Name, generate(Names)

• Dealing with missing values in InternationalReputation and SkillMoves variables

To handle the missing values in the variables InternationalReputation and SkillMoves, they were replaced with the value 3:

. replace InternationalReputation = 3 if InternationalReputation == .

. replace SkillMoves = 3 if SkillMoves == .
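Before overwriting missing values it can help to see how many there are and how the observed values are distributed. A minimal sketch, assuming both variables are stored as numeric (which the replace commands above imply); these inspection commands are an illustration and are not taken from the report:

```stata
* Count missing values in each variable
misstable summarize InternationalReputation SkillMoves

* Tabulate each variable, showing missing values as their own category
tabulate InternationalReputation, missing
tabulate SkillMoves, missing
```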

• Dealing with missing values in variable salary

Mean estimation                  Number of obs = 17,966

                  Mean   Std. Err.   [95% Conf. Interval]
salary         9861.85    165.0083   9538.418    10185.28

• Generating variable Time from variable A

• Dealing with missing values in variable Strength

Mean estimation                  Number of obs = 18,159

                  Mean   Std. Err.   [95% Conf. Interval]
Strength      65.31197    .0931837   65.12932    65.49462

. replace Strength = 65.31197 if Strength == .

• Dealing with missing values in variable LongShots

Mean estimation                  Number of obs = 18,159

                  Mean   Std. Err.   [95% Conf. Interval]
LongShots     47.10997    .1429296   46.82982    47.39013

. replace LongShots = 47.10997 if LongShots == .

• Dealing with missing values in variable Vision

Mean estimation                  Number of obs = 18,159

                  Mean   Std. Err.   [95% Conf. Interval]
Vision         53.4009     .104982   53.19513    53.60668

. replace Vision = 53.4009 if Vision == .
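The three mean imputations above type the rounded mean by hand. As an alternative sketch (not the authors' commands), the same step can read the mean from the results that summarize stores in r(mean), which avoids copying numbers manually:

```stata
* Replace missing values of each variable with that variable's observed mean
foreach v of varlist Strength LongShots Vision {
    quietly summarize `v', meanonly          // stores the mean in r(mean)
    replace `v' = r(mean) if missing(`v')    // fill only the missing cells
}
```

The loop uses the full-precision mean rather than the seven-digit value shown in the output, so the imputed values may differ slightly in the last decimal places.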

Variable       Obs       Mean   Std. Dev.   Min     Max
Time        18,207       9103    5256.053     0   18206
Vision      18,207    53.4009    14.12822    10      94

The total number of observations for all variables is 18,207.

PART TWO: MULTIPLE LINEAR REGRESSION AND

The command and output were:

Source   SS   df   MS            Number of obs = 18,207

salary        Coef.   Std. Err.       t   P>|t|   [95% Conf. Interval]
Strength   177.3617     9.44834   18.77   0.000   158.8421    195.8814

The total number of observations is 18,207.
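The regression command itself does not appear in this extract. The interpretations that follow report a separate effect for each level of InternationalReputation and SkillMoves, which is consistent with those two variables entering as categorical (factor) variables. Under that assumption, a command of roughly this form would produce such output:

```stata
* i. treats the variable as categorical, with the lowest level (1) as the base
regress salary i.InternationalReputation i.SkillMoves Strength LongShots Vision
```

Strength, LongShots and Vision enter without a prefix and are therefore treated as continuous, each with a single slope coefficient.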

Holding the other variables constant, the estimated coefficients are interpreted as follows.

A change in InternationalReputation from 1 to 2 increases salary by 19552.05.

A change in InternationalReputation from 1 to 3 increases salary by 56872.

A change in InternationalReputation from 1 to 4 increases salary by 158901.5.

A change in InternationalReputation from 1 to 5 increases salary by 286925.

A change in SkillMoves from 2 to 1 decreases salary by 4862.77.

A change in SkillMoves from 3 to 1 decreases salary by 3332.904.

A change in SkillMoves from 1 to 4 increases salary by 7805.529.

A change in SkillMoves from 1 to 5 increases salary by 9032.809.

A unit increase in Strength increases salary by 177.3617.

A unit increase in LongShots increases salary by 52.97948.

A unit increase in Vision increases salary by 173.3927.

The intercept is -13400.04.

The multiple linear model is:

salary = -13400.04 + 19552.05*InternationalReputation2 + 56872*InternationalReputation3 + …
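To show how the fitted model turns into a prediction, the sketch below evaluates it for one hypothetical player. The player's values are illustrative assumptions, not data from the report: InternationalReputation = 2, SkillMoves = 1 (taken here to be the base level, so it adds nothing), Strength = 65, LongShots = 47 and Vision = 53. The coefficients are those reported above.

```stata
* Predicted salary for a hypothetical player:
*   intercept + InternationalReputation level 2 effect
*   + Strength, LongShots and Vision slopes times assumed values
display -13400.04 + 19552.05 + 177.3617*65 + 52.97948*47 + 173.3927*53
```

The terms sum to roughly 29,360, in the same units as salary.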

1) The dependent variable must be measured on a continuous scale.

Variable      Obs      Mean   Std. Dev.    Min      Max
salary     18,207   9861.85     21970.4   1000   565000
