
Merged
32 changes: 21 additions & 11 deletions doc/modules/ensemble.rst
@@ -251,9 +251,9 @@ the transformation performs an implicit, non-parametric density estimation.
AdaBoost
========

The module :mod:`sklearn.ensemble` implements the popular boosting algorithm
known as AdaBoost. This algorithm was first introduced by Freund and Schapire
[FS1995]_ back in 1995.
The module :mod:`sklearn.ensemble.weight_boosting` implements the popular
boosting algorithm known as AdaBoost introduced in 1995 by Freund and
Schapire [FS1995]_.

The core principle of AdaBoost is to fit a sequence of weak learners (i.e.,
models that are only slightly better than random guessing, such as small
@@ -266,7 +266,7 @@ to each of the training samples. Initially, those weights are all set to
original data. For each successive iteration, the sample weights are
individually modified and the learning algorithm is reapplied to the reweighted
data. At a given step, those training examples that were incorrectly predicted
by the boosting model induced at the previous step have their weights increased,
by the boosted model induced at the previous step have their weights increased,
whereas the weights are decreased for those that were predicted correctly. As
iterations proceed, examples that are difficult to predict receive
ever-increasing influence. Each subsequent weak learner is thereby forced to
@@ -306,15 +306,25 @@ The number of weak learners is controlled by the parameter ``n_estimators``. The
the final combination. By default, weak learners are decision stumps. Different
weak learners can be specified through the ``base_estimator`` parameter.
The main parameters to tune to obtain good results are ``n_estimators`` and
the complexity of the base estimators (e.g., its depth ``max_depth`` in case
of decision trees).
the complexity of the base estimators (e.g., their depth ``max_depth`` or the
minimum required number of samples at a leaf ``min_samples_leaf`` in the case of
decision trees).
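
A minimal sketch of such a setup (the iris data and the particular parameter
values below are illustrative assumptions, not prescriptions) could look like::

    from sklearn.datasets import load_iris
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.tree import DecisionTreeClassifier

    iris = load_iris()

    # weak learner: a shallow tree whose complexity is limited both by its
    # depth and by the minimum number of samples required at a leaf
    weak_learner = DecisionTreeClassifier(max_depth=2, min_samples_leaf=5)

    # ``base_estimator`` and ``n_estimators`` as described above
    clf = AdaBoostClassifier(base_estimator=weak_learner, n_estimators=100)
    clf.fit(iris.data, iris.target)
    print(clf.score(iris.data, iris.target))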

.. topic:: Examples:

* :ref:`example_ensemble_plot_adaboost_hastie_10_2.py`
* :ref:`example_ensemble_plot_adaboost_multiclass.py`
* :ref:`example_ensemble_plot_adaboost_regression.py`
* :ref:`example_ensemble_plot_adaboost_twoclass.py`
* :ref:`example_ensemble_plot_adaboost_hastie_10_2.py` compares the
classification error of a decision stump, decision tree, and a boosted
decision stump using AdaBoost-SAMME and AdaBoost-SAMME.R.

* :ref:`example_ensemble_plot_adaboost_multiclass.py` shows the performance
of AdaBoost-SAMME and AdaBoost-SAMME.R on a multi-class problem.

* :ref:`example_ensemble_plot_adaboost_twoclass.py` shows the decision boundary
and decision function values for a non-linearly separable two-class problem
using AdaBoost-SAMME.

* :ref:`example_ensemble_plot_adaboost_regression.py` demonstrates regression
with the AdaBoost.R2 algorithm (a minimal usage sketch follows this list).
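
A bare-bones regression sketch in the same spirit (the noisy sine target and the
explicit ``loss='linear'`` choice are illustrative assumptions) could be::

    import numpy as np
    from sklearn.ensemble import AdaBoostRegressor
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.RandomState(0)
    X = np.sort(5 * rng.rand(80, 1), axis=0)
    y = np.sin(X).ravel() + 0.1 * rng.randn(80)

    # AdaBoost.R2 with regression trees as the weak learners
    regr = AdaBoostRegressor(DecisionTreeRegressor(max_depth=4),
                             n_estimators=300, loss='linear',
                             random_state=rng)
    regr.fit(X, y)
    y_pred = regr.predict(X)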

Member:
please, add the other two examples (hastie and multi-class) as well

Member:
seems like some file names have changed::

plot_adaboost_hastie_10_2.py
plot_adaboost_multiclass.py
plot_adaboost_regression.py
plot_adaboost_twoclass.py

.. topic:: References

@@ -324,7 +334,7 @@ of decision trees).
.. [ZZRH2009] J. Zhu, H. Zou, S. Rosset, T. Hastie. "Multi-class AdaBoost",
2009.

.. [D1997] H. Drucker. "Improving Regressor using Boosting Techniques", 1997.
.. [D1997] H. Drucker. "Improving Regressors using Boosting Techniques", 1997.

.. [HTF2009] T. Hastie, R. Tibshirani and J. Friedman, "Elements of
Statistical Learning Ed. 2", Springer, 2009.
Member:
Is there any way you could add a link to a pdf or a page giving a pdf (not behind a paywall) for the references. I know that at least some of these references are downloadable from the web (in particular the elements of statistical learning).

Member Author:
Do these work?

http://cns.bu.edu/~gsc/CN710/FreundSc95.pdf
http://www.stanford.edu/~hastie/Papers/samme.pdf

I cannot find an open link for the Drucker paper. I am in the CERN network and they must have access to all the journals since I am able to download them. Hmm... a copy may have appeared here ...

6 changes: 2 additions & 4 deletions examples/ensemble/plot_adaboost_regression.py
@@ -27,10 +27,8 @@

clf_1 = DecisionTreeRegressor(max_depth=4)

clf_2 = AdaBoostRegressor(
    DecisionTreeRegressor(max_depth=4),
    n_estimators=300,
    random_state=rng)
clf_2 = AdaBoostRegressor(DecisionTreeRegressor(max_depth=4),
                          n_estimators=300, random_state=rng)

clf_1.fit(X, y)
clf_2.fit(X, y)
10 changes: 5 additions & 5 deletions examples/ensemble/plot_adaboost_twoclass.py
@@ -37,10 +37,9 @@
y = np.concatenate((y1, - y2 + 1))

# Create and fit an AdaBoosted decision tree
bdt = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),
    algorithm="SAMME",
    n_estimators=200)
bdt = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                         algorithm="SAMME",
                         n_estimators=200)

bdt.fit(X, y)

@@ -68,7 +67,8 @@
    pl.scatter(X[idx, 0], X[idx, 1],
               c=c, cmap=pl.cm.Paired,
               label="Class %s" % n)
pl.axis("tight")
pl.xlim(x_min, x_max)
pl.ylim(y_min, y_max)
pl.legend(loc='upper right')
pl.xlabel("Decision Boundary")
