Uploaded by

Sanjana Shah

Generalized linear regression models

Lecture 3

September 14, 2017


Generalized linear models (GLM)
• Generalized linear models (GLM) extend ordinary
regression to non-normal response distributions.

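• The response distribution comes from the exponential family of distributions
• A GLM has 3 components: the random component (the response distribution), the systematic component (the linear predictor), and the link function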
Why do we use GLMs?

• Linear regression assumes that the response is normally distributed
• GLMs enable us to analyze the linear relationship between predictor variables and the mean of the response variable when it is not reasonable to assume the data are normally distributed
Form
• The function g(·) is called the “link function”: it links E(Yi) to the set of explanatory variables (and their coefficient estimates)
• For example, if we let g(x) = x and Y is distributed as a normal random variable, then we are back to the linear model
• For a GLM, you generally have the flexibility to choose whatever link you desire
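
In symbols, the form described above can be sketched as follows. This assumes the coefficient notation used later in these notes for Poisson regression (β0, ..., βk with predictors X1, ..., Xk); the observation index i is added here for illustration.

```latex
g\bigl(E(Y_i)\bigr) = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_k X_{ik}
% Identity link, g(x) = x, with a normal response recovers the linear model:
E(Y_i) = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_k X_{ik}
```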
Random Component
• Conditionally normally distributed response with constant standard deviation: standard regression models
• Binary outcomes (Success or Failure): random component has a Binomial distribution and the model is called Logistic Regression
• Count data (number of events in a fixed area and/or length of time): random component has a Poisson distribution and the model is called Poisson Regression
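
As a rough illustration of how each random component is usually paired with a link, here is a minimal Python sketch. The identity and log links are named in these notes; the logit link is assumed for the binomial case because it is the link implied by the logistic formula used later. The names `LINKS`, `logit`, and `inv_logit` are hypothetical helpers, not part of any library.

```python
import math

def identity(mu):
    """Identity link: g(mu) = mu (standard linear regression)."""
    return mu

def logit(mu):
    """Logit link: g(mu) = ln(mu / (1 - mu)), for a probability 0 < mu < 1."""
    return math.log(mu / (1.0 - mu))

def inv_logit(eta):
    """Inverse of the logit link: maps any real eta back into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical lookup table: response family -> (link, inverse link).
LINKS = {
    "normal": (identity, identity),
    "binomial": (logit, inv_logit),
    "poisson": (math.log, math.exp),
}

# Applying a link and then its inverse should return the original mean.
for family, mean in [("normal", -2.5), ("binomial", 0.3), ("poisson", 4.0)]:
    link, inverse = LINKS[family]
    print(family, inverse(link(mean)))
```

The inverse link is what turns the linear predictor back into a mean on the response scale, which is why the binomial inverse stays between 0 and 1 and the Poisson inverse stays positive.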
Logistic Regression

• Logistic Regression: dichotomous response variable and numeric and/or categorical explanatory variable(s)
– Model the probability of a particular outcome as a function of the predictor variable(s)
– Probabilities are bounded between 0 and 1
• Distribution of responses: Binomial
Logistic regression

• Consider a binary response variable
• Variable with two outcomes: one outcome represented by a 1 and the other represented by a 0
• Examples:
– Does the person have a disease? Yes or No
– Outcome of a baseball game? Win or loss
Logistic Regression with 1 Predictor

• Response: Presence/Absence of characteristic
• Predictor: Numeric variable observed for each case
• Model: $p(x) \equiv$ Probability of presence at predictor level x

$$p(x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$$

• $\beta_1 = 0 \Rightarrow$ P(Presence) is the same at each level of x
• $\beta_1 > 0 \Rightarrow$ P(Presence) increases as x increases
• $\beta_1 < 0 \Rightarrow$ P(Presence) decreases as x increases
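
The three cases for the slope can be checked numerically. The sketch below evaluates the logistic model p(x) for illustrative coefficient values; the function name `presence_probability` and the numbers passed to it are assumptions made for this example.

```python
import math

def presence_probability(x, beta0, beta1):
    """p(x) = e^(beta0 + beta1*x) / (1 + e^(beta0 + beta1*x))."""
    eta = beta0 + beta1 * x
    return math.exp(eta) / (1.0 + math.exp(eta))

# Illustrative (made-up) slopes covering the three cases in the bullets above.
for beta1 in (0.0, 0.8, -0.8):
    probs = [presence_probability(x, -1.0, beta1) for x in range(5)]
    print(beta1, [round(p, 3) for p in probs])
```

With a zero slope every printed probability is identical; a positive slope gives an increasing sequence and a negative slope a decreasing one, always inside (0, 1).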
Logistic Regression with 1 Predictor

• $\beta_0$, $\beta_1$ are unknown parameters and must be estimated using statistical software such as R or STATA
• Main interest is in estimating and testing hypotheses regarding $\beta_1$
• Large-sample test (Wald Test):
• $H_0: \beta_1 = 0$ versus $H_A: \beta_1 \neq 0$
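
The notes point to R or STATA for estimation. Purely as an illustration of what such software does, the following pure-Python sketch fits the one-predictor model by Newton-Raphson and forms the Wald statistic z = estimate / standard error. The data are made up, all function names are hypothetical, and a real analysis should rely on a tested statistics package.

```python
import math

def score_and_information(b0, b1, x, y):
    """Gradient of the log-likelihood and the 2x2 information matrix entries."""
    g0 = g1 = i00 = i01 = i11 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))  # equals e^eta / (1 + e^eta)
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        i00 += w
        i01 += w * xi
        i11 += w * xi * xi
    return (g0, g1), (i00, i01, i11)

def fit_logistic(x, y, iterations=25):
    """Maximum likelihood estimates of (beta0, beta1) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        (g0, g1), (i00, i01, i11) = score_and_information(b0, b1, x, y)
        det = i00 * i11 - i01 * i01
        # Newton step: beta <- beta + I^{-1} * gradient (2x2 inverse by hand)
        b0 += (i11 * g0 - i01 * g1) / det
        b1 += (i00 * g1 - i01 * g0) / det
    return b0, b1

def wald_test_slope(x, y):
    """Wald z statistic and two-sided p-value for H0: beta1 = 0."""
    b0, b1 = fit_logistic(x, y)
    _, (i00, i01, i11) = score_and_information(b0, b1, x, y)
    se_b1 = math.sqrt(i00 / (i00 * i11 - i01 * i01))  # sqrt of [I^{-1}]_{11}
    z = b1 / se_b1
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # 2 * (1 - Phi(|z|))
    return b1, se_b1, z, p_value

# Made-up example data: x is a numeric predictor, y is presence (1) or absence (0).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 1, 0, 1]
print(wald_test_slope(x, y))
```

The p-value comes from the standard normal distribution, which is why this is described as a large-sample test; with only eight observations the approximation is rough.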
Odds Ratio

• Interpretation of Regression Coefficient ($\beta$):
– In linear regression, the slope coefficient is the change in the mean response as x increases by 1 unit
– In logistic regression, we can show that:

$$\frac{\text{odds}(x+1)}{\text{odds}(x)} = e^{\beta}$$

• Thus $e^{\beta}$ represents the change in the odds of the outcome (multiplicatively) by increasing x by 1 unit
• If $\beta = 0$, the odds and probability are the same at all x levels ($e^{\beta} = 1$)
• If $\beta > 0$, the odds and probability increase as x increases ($e^{\beta} > 1$)
• If $\beta < 0$, the odds and probability decrease as x increases ($e^{\beta} < 1$)
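
The "we can show that" step follows directly from the one-predictor model for p(x). A short derivation, writing β for the slope (the β1 of the earlier slides), β0 for the intercept, and defining odds(x) as p(x) / (1 - p(x)):

```latex
1 - p(x) = \frac{1}{1 + e^{\beta_0 + \beta x}}
\quad\Longrightarrow\quad
\text{odds}(x) = \frac{p(x)}{1 - p(x)} = e^{\beta_0 + \beta x}

\frac{\text{odds}(x+1)}{\text{odds}(x)}
  = \frac{e^{\beta_0 + \beta (x+1)}}{e^{\beta_0 + \beta x}}
  = e^{\beta}
```

The ratio does not depend on x, so a one-unit increase multiplies the odds by the same factor at every predictor level.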
Multiple Logistic Regression

• Extension to more than one predictor variable (either numeric or dummy variables).
• With k predictors, the model is written:

$$p = \frac{e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k}}{1 + e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k}}$$

• Adjusted odds ratio for raising $x_i$ by 1 unit, holding all other predictors constant:

$$OR_i = e^{\beta_i}$$
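
A small numerical check of the adjusted odds ratio, assuming made-up coefficients: raising one predictor by 1 unit while holding the others fixed should multiply the odds by e raised to that predictor's coefficient. The function names and values below are illustrative only.

```python
import math

def probability(betas, xs):
    """p for k predictors; betas[0] is the intercept, betas[1:] pair with xs."""
    eta = betas[0] + sum(b * x for b, x in zip(betas[1:], xs))
    return math.exp(eta) / (1.0 + math.exp(eta))

def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)

def adjusted_odds_ratio(betas, i):
    """OR_i = e^(beta_i) for predictor i (1-based, matching the slide)."""
    return math.exp(betas[i])

# Made-up coefficients for k = 2 predictors, e.g. one numeric and one dummy.
betas = [-1.5, 0.4, 0.9]
base = [2.0, 0.0]
raised = [3.0, 0.0]  # x1 raised by 1 unit, x2 held constant

ratio = odds(probability(betas, raised)) / odds(probability(betas, base))
print(ratio, adjusted_odds_ratio(betas, 1))
```

Both printed values agree up to floating-point error, and the same holds for any starting values of the predictors.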
Testing Regression Coefficients
• Testing the overall model:

$$H_0: \beta_1 = \cdots = \beta_k = 0$$
$$H_A: \text{Not all } \beta_i = 0$$
Poisson Regression

• Consider a count response variable
• Response variable is the number of occurrences in a given time frame
• Outcomes equal to 0, 1, 2, …
• Examples:
– Number of penalties during a football game
– Number of customers who shop at a store on a given day
– Number of car accidents at an intersection
Poisson Regression
• Generally used to model Count data
• Distribution: Poisson
• Link Function: the log link
• One particular feature: E(Y) = var(Y) = μ

$$g(\mu) = \ln(\mu) = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k$$
$$\mu(X_1, \ldots, X_k) = e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k}$$
• The X’s are variables that might affect the mean value
• Example: If the count variable is the number of visits to a museum in a given year, then the X’s can be variables such as income, admission price, parking fees, etc.
• Tests are conducted as in logistic regression
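
To make the log link and the mean-equals-variance feature concrete, here is a minimal Python sketch built on the museum example. The coefficients and predictor values are invented for illustration, and the function names are hypothetical.

```python
import math

def poisson_mean(betas, xs):
    """mu(X1..Xk) = e^(beta0 + beta1*X1 + ... + betak*Xk)."""
    return math.exp(betas[0] + sum(b * x for b, x in zip(betas[1:], xs)))

def poisson_pmf(k, mu):
    """P(Y = k) for a Poisson(mu) count, computed on the log scale for stability."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))

def mean_and_variance(mu, k_max=100):
    """Mean and variance from the pmf, truncating the sum at k_max."""
    mean = sum(k * poisson_pmf(k, mu) for k in range(k_max + 1))
    second_moment = sum(k * k * poisson_pmf(k, mu) for k in range(k_max + 1))
    return mean, second_moment - mean ** 2

# Made-up coefficients: intercept, income (in $10,000s), admission price (in $).
betas = [0.5, 0.2, -0.1]
mu = poisson_mean(betas, [3.0, 2.0])  # expected museum visits per year
print(mu, mean_and_variance(mu))
```

Because the model is linear on the log scale, raising one predictor by 1 unit multiplies the expected count by e raised to its coefficient, which parallels the odds-ratio interpretation in logistic regression.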
