VI.
Logistic Regression - Feedback
You achieved a score of 3.50 out of 5.00.
Your answers, as well as our explanations, are shown below.
To review the material and deepen your understanding of the course content, please
answer the review questions below, and hit submit at the bottom of the page when you're
done.
You are allowed to take/re-take these review quizzes multiple times, and each time you
will see a slightly different set of questions or answers. We will use only your highest
score, and strongly encourage you to continue re-taking each quiz until you get a 100%
score at least once. (Even after that, you can re-take it to review the content further, with
no risk of your final score being reduced.) To prevent rapid-fire guessing, the system
enforces a minimum of 10 minutes between each attempt.
Question 1
Suppose that you have trained a logistic regression classifier, and it outputs on a new
example $x$ a prediction $h_\theta(x) = 0.2$. This means (check all that apply):
Our estimate for $P(y=0 \mid x; \theta)$ is 0.8.
Our estimate for $P(y=1 \mid x; \theta)$ is 0.8.
Our estimate for $P(y=1 \mid x; \theta)$ is 0.2.
Our estimate for $P(y=0 \mid x; \theta)$ is 0.2.
Your answer | Score | Choice explanation
Our estimate for $P(y=0 \mid x; \theta)$ is 0.8. | 0.25 | Since we must have $P(y=0 \mid x; \theta) = 1 - P(y=1 \mid x; \theta)$, the former is $1 - 0.2 = 0.8$.
Our estimate for $P(y=1 \mid x; \theta)$ is 0.8. | 0.25 | $h_\theta(x)$ gives $P(y=1 \mid x; \theta)$, not $1 - P(y=1 \mid x; \theta)$.
Our estimate for $P(y=1 \mid x; \theta)$ is 0.2. | 0.25 | $h_\theta(x)$ is precisely $P(y=1 \mid x; \theta)$, so this estimate is 0.2.
Our estimate for $P(y=0 \mid x; \theta)$ is 0.2. | 0.25 | $h_\theta(x)$ is $P(y=1 \mid x; \theta)$, not $P(y=0 \mid x; \theta)$.
Total | 1.00 / 1.00
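To make the probability reading concrete, here is a minimal sketch (hypothetical parameter values and helper names, using NumPy) that computes $h_\theta(x)$ with the sigmoid and reports both class probabilities; an output near 0.2 means $P(y=1 \mid x; \theta) \approx 0.2$ and $P(y=0 \mid x; \theta) \approx 0.8$.

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def class_probabilities(theta, x):
    """Return (P(y=1|x;theta), P(y=0|x;theta)) for a logistic regression model."""
    h = sigmoid(theta @ x)          # h_theta(x) = g(theta^T x)
    return h, 1.0 - h

# Hypothetical parameters and input chosen so that h_theta(x) comes out near 0.2.
theta = np.array([-1.386, 0.0])
x = np.array([1.0, 5.0])            # x[0] = 1 is the intercept term
p1, p0 = class_probabilities(theta, x)
print(p1, p0)                       # roughly 0.2 and 0.8
```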
Question 2
Suppose you train a logistic classifier $h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2)$.
Suppose $\theta_0 = 6$, $\theta_1 = 0$, $\theta_2 = -1$. Which of the following figures represents the decision
boundary found by your classifier?
Your answer | Score | Choice explanation
(selected figure, not shown) | 0.00 | In this figure, we transition from negative to positive when $x_1$ goes from below 6 to above 6, but for the given values of $\theta$, the transition occurs when $x_2$ goes from below 6 to above 6.
Total | 0.00 / 1.00
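To check where the boundary lies, one can look directly at the sign of $\theta^T x$: the classifier predicts $y = 1$ exactly when $\theta^T x \geq 0$, which for these parameter values reduces to $6 - x_2 \geq 0$, i.e. the line $x_2 = 6$. A brief sketch with hypothetical test points:

```python
import numpy as np

theta = np.array([6.0, 0.0, -1.0])   # [theta_0, theta_1, theta_2] from the question

def predict(x1, x2):
    """Predict y = 1 exactly when theta^T x >= 0, i.e. when g(theta^T x) >= 0.5."""
    z = theta @ np.array([1.0, x1, x2])
    return int(z >= 0)

# The boundary 6 - x2 = 0 is the line x2 = 6, independent of x1.
print(predict(x1=0.0, x2=5.0))   # 1: below the line, predicted positive
print(predict(x1=0.0, x2=7.0))   # 0: above the line, predicted negative
```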
Question 3
Suppose you have the following training set, and fit a logistic regression
classifier $h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2)$.
[Figure: training examples plotted in the $(x_1, x_2)$ plane]
Which of the following are true? Check all that apply.
The positive and negative examples cannot be separated using a straight line. So,
gradient descent will fail to converge.
At the optimal value of $\theta$ (e.g., found by fminunc), we will have $J(\theta) \geq 0$.
Adding polynomial features (e.g., instead
using $h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \theta_4 x_1 x_2 + \theta_5 x_2^2)$) would
increase $J(\theta)$ because we are now summing over more terms.
$J(\theta)$ will be a convex function, so gradient descent should converge to the global
minimum.
Your answer | Score | Choice explanation
The positive and negative examples cannot be separated using a straight line. So, gradient descent will fail to converge. | 0.25 | While it is true they cannot be separated, gradient descent will still converge to the optimal fit. Some examples will remain misclassified at the optimum.
At the optimal value of $\theta$ (e.g., found by fminunc), we will have $J(\theta) \geq 0$. | 0.25 | The cost function $J(\theta)$ is always non-negative for logistic regression.
Adding polynomial features (e.g., instead using $h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \theta_4 x_1 x_2 + \theta_5 x_2^2)$) would increase $J(\theta)$ because we are now summing over more terms. | 0.25 | The summation in $J(\theta)$ is over examples, not features. Furthermore, the hypothesis will now be more accurate (or at least just as accurate) with new features, so the cost function will decrease.
$J(\theta)$ will be a convex function, so gradient descent should converge to the global minimum. | 0.25 | The cost function $J(\theta)$ is guaranteed to be convex for logistic regression.
Total | 1.00 / 1.00
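The non-negativity claim is easy to verify numerically. Below is a minimal sketch (hypothetical toy data, not the training set from the figure) that evaluates the logistic regression cost $J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\big[y^{(i)}\log h_\theta(x^{(i)}) + (1-y^{(i)})\log(1-h_\theta(x^{(i)}))\big]$; every term is non-negative, so $J(\theta)$ stays $\geq 0$ even at a good fit on non-separable data.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """Unregularized logistic regression cost J(theta) averaged over m examples."""
    h = sigmoid(X @ theta)
    m = len(y)
    return -(1.0 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

# Hypothetical non-separable toy set: labels alternate along a single feature.
X = np.array([[1, 0.5], [1, 1.0], [1, 1.5], [1, 2.0]])   # first column is the intercept
y = np.array([0, 1, 0, 1])
print(cost(np.zeros(2), X, y))            # log(2) ~ 0.693 at theta = 0
print(cost(np.array([-0.5, 0.5]), X, y))  # still positive; J(theta) is never negative
```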
Question 4
For logistic regression, the gradient is given by $\frac{\partial}{\partial \theta_j} J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \big(h_\theta(x^{(i)}) - y^{(i)}\big) x_j^{(i)}$. Which of these is a correct gradient descent update for logistic regression
with a learning rate of $\alpha$? Check all that apply.
$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \big(\theta^T x - y^{(i)}\big) x_j^{(i)}$ (simultaneously update for all $j$).
$\theta := \theta - \alpha \frac{1}{m} \sum_{i=1}^{m} \big(h_\theta(x^{(i)}) - y^{(i)}\big) x^{(i)}$.
$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \Big(\frac{1}{1 + e^{-\theta^T x^{(i)}}} - y^{(i)}\Big) x_j^{(i)}$ (simultaneously update for all $j$).
$\theta := \theta - \alpha \frac{1}{m} \sum_{i=1}^{m} \big(\theta^T x - y^{(i)}\big) x^{(i)}$.
Your answer | Score | Choice explanation
$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \big(\theta^T x - y^{(i)}\big) x_j^{(i)}$ (simultaneously update for all $j$). | 0.25 | This uses the linear regression hypothesis $\theta^T x$ instead of that for logistic regression.
$\theta := \theta - \alpha \frac{1}{m} \sum_{i=1}^{m} \big(h_\theta(x^{(i)}) - y^{(i)}\big) x^{(i)}$. | 0.25 | This is a vectorized version of the direct substitution of $\frac{\partial}{\partial \theta_j} J(\theta)$ into the gradient descent update.
$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \Big(\frac{1}{1 + e^{-\theta^T x^{(i)}}} - y^{(i)}\Big) x_j^{(i)}$ (simultaneously update for all $j$). | 0.25 | This substitutes the exact form of $h_\theta(x^{(i)})$ used by logistic regression into the gradient descent update.
$\theta := \theta - \alpha \frac{1}{m} \sum_{i=1}^{m} \big(\theta^T x - y^{(i)}\big) x^{(i)}$. | 0.25 | This vectorized version uses the linear regression hypothesis $\theta^T x$ instead of that for logistic regression.
Total | 1.00 / 1.00
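For reference, here is a minimal sketch of the vectorized update above, applied to hypothetical toy data (the names and data are illustrative, not from the course assignments):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_step(theta, X, y, alpha):
    """One simultaneous update: theta := theta - alpha * (1/m) * X^T (h_theta(X) - y)."""
    m = len(y)
    h = sigmoid(X @ theta)          # h_theta(x^(i)) for every example at once
    grad = (X.T @ (h - y)) / m      # (1/m) * sum_i (h_theta(x^(i)) - y^(i)) x^(i)
    return theta - alpha * grad

# Hypothetical data: m = 4 examples, parameters [theta_0, theta_1].
X = np.array([[1, 0.0], [1, 1.0], [1, 2.0], [1, 3.0]])
y = np.array([0, 0, 1, 1])
theta = np.zeros(2)
for _ in range(1000):
    theta = gradient_step(theta, X, y, alpha=0.1)
# This toy set is separable, so theta keeps growing slowly, but the decision
# boundary settles at x1 = 1.5 (theta_0 + 1.5 * theta_1 stays ~ 0 by symmetry).
print(theta)
```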
Question 5
Which of the following statements are true? Check all that apply.
The cost function $J(\theta)$ for logistic regression trained with $m \geq 1$ examples is
always greater than or equal to zero.
The sigmoid function $g(z) = \frac{1}{1 + e^{-z}}$ is never greater than one ($> 1$).
For logistic regression, sometimes gradient descent will converge to a local
minimum (and fail to find the global minimum). This is the reason we prefer more
advanced optimization algorithms such as fminunc (conjugate gradient/BFGS/L-BFGS/etc).
Linear regression always works well for classification if you classify by using a
threshold on the prediction made by linear regression.
Your answer | Score | Choice explanation
The cost function $J(\theta)$ for logistic regression trained with $m \geq 1$ examples is always greater than or equal to zero. | 0.00 | The cost for any example $x^{(i)}$ is always $\geq 0$ since it is the negative log of a quantity less than one. The cost function $J(\theta)$ is a summation over the cost for each example, so the cost function itself must be greater than or equal to zero.
The sigmoid function $g(z) = \frac{1}{1 + e^{-z}}$ is never greater than one ($> 1$). | 0.25 | The denominator ranges from $\infty$ down to 1 as $z$ grows, so the result is always in $(0, 1)$.
For logistic regression, sometimes gradient descent will converge to a local minimum (and fail to find the global minimum). This is the reason we prefer more advanced optimization algorithms such as fminunc (conjugate gradient/BFGS/L-BFGS/etc). | 0.00 | The cost function for logistic regression is convex, so gradient descent will always converge to the global minimum. We still might use a more advanced optimization algorithm since they can be faster and don't require you to select a learning rate.
Linear regression always works well for classification if you classify by using a threshold on the prediction made by linear regression. | 0.25 | As demonstrated in the lecture, linear regression often classifies poorly since its training procedure focuses on predicting real-valued outputs, not classification.
Total | 0.50 / 1.00
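One quick way to convince yourself of the sigmoid bound is to evaluate $g(z)$ over a wide range of $z$; a small check (using NumPy, with a moderate range so floating point does not round the results to the bounds):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# As z -> -inf the denominator 1 + e^(-z) -> inf, so g(z) -> 0;
# as z -> +inf the denominator -> 1, so g(z) -> 1. Neither bound is ever reached.
z = np.linspace(-30, 30, 10001)
g = sigmoid(z)
print(g.min(), g.max())              # both strictly inside (0, 1)
print(np.all((g > 0) & (g < 1)))     # True
```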