MRG: Multi-output decision trees #923

@@ -38,6 +38,8 @@ Some advantages of decision trees are:
  of variable. See :ref:`algorithms <tree_algorithms>` for more
  information.

- Able to handle multi-output problems.

- Uses a white box model. If a given situation is observable in a model,
  the explanation for the condition is easily explained by boolean logic.
  By contrast, in a black box model (e.g., in an artificial neural
@@ -49,6 +51,7 @@ Some advantages of decision trees are:
- Performs well even if its assumptions are somewhat violated by
  the true model from which the data were generated.

The disadvantages of decision trees include:

- Decision-tree learners can create over-complex trees that do not
@@ -78,6 +81,7 @@ The disadvantages of decision trees include:
  It is therefore recommended to balance the dataset prior to fitting
  with the decision tree.

.. _tree_classification:

Classification
@@ -87,8 +91,8 @@ Classification
classification on a dataset.

As with other classifiers, :class:`DecisionTreeClassifier` takes as input two
-arrays: an array X of size [n_samples, n_features] holding the training
-samples, and an array Y of integer values, size [n_samples], holding
+arrays: an array X of size ``[n_samples, n_features]`` holding the training
+samples, and an array Y of integer values, size ``[n_samples]``, holding
the class labels for the training samples::

    >>> from sklearn import tree
@@ -147,6 +151,7 @@ After being fitted, the model can then be used to predict new values::

* :ref:`example_tree_plot_iris.py`

.. _tree_regression:

Regression
@@ -177,6 +182,67 @@ instead of integer values::

* :ref:`example_tree_plot_tree_regression.py`

.. _tree_multioutput:

Multi-output problems
=====================

A multi-output problem is a supervised learning problem with several outputs
to predict, that is, when Y is a 2d array of size ``[n_samples, n_outputs]``.

When there is no correlation between the outputs, a very simple way to solve
this kind of problem is to build n independent models, i.e. one for each
output, and then to use those models to independently predict each of the n
outputs. However, because it is likely that the output values related to the
same input are themselves correlated, an often better way is to build a single
model capable of predicting all n outputs simultaneously. First, this requires
less training time, since only a single estimator is built. Second, the
generalization accuracy of the resulting estimator may often be increased.
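
As a minimal sketch of the two strategies (the sine/cosine targets below are
purely illustrative, and the single-estimator variant relies on the
multi-output support described below)::

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.RandomState(0)
    X = rng.rand(100, 1)
    # Two correlated outputs derived from the same input
    Y = np.hstack([np.sin(3 * X), np.cos(3 * X)])

    # Strategy 1: one independent model per output
    per_output = [DecisionTreeRegressor(max_depth=4).fit(X, Y[:, k])
                  for k in range(Y.shape[1])]
    pred_independent = np.column_stack([m.predict(X) for m in per_output])

    # Strategy 2: a single model predicting all outputs at once
    multi = DecisionTreeRegressor(max_depth=4).fit(X, Y)
    pred_multi = multi.predict(X)  # shape [n_samples, n_outputs]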

With regard to decision trees, this strategy can readily be used to support
multi-output problems. This requires the following changes:

- Store n output values in leaves, instead of 1;
- Use splitting criteria that compute the average reduction across all
  n outputs (sketched below).
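
A rough numpy-only sketch of the idea for a mean-squared-error criterion (the
helper functions here are illustrative, not the actual tree implementation)::

    import numpy as np

    def mse_impurity(Y_node):
        # Per-output variance of the targets in a node, averaged over outputs
        return np.mean(np.var(Y_node, axis=0))

    def split_improvement(Y_parent, left_mask):
        # Impurity decrease of a candidate split, averaged across all outputs
        Y_left, Y_right = Y_parent[left_mask], Y_parent[~left_mask]
        n = float(len(Y_parent))
        return (mse_impurity(Y_parent)
                - (len(Y_left) / n) * mse_impurity(Y_left)
                - (len(Y_right) / n) * mse_impurity(Y_right))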

This module offers support for multi-output problems by implementing this
strategy in both :class:`DecisionTreeClassifier` and
:class:`DecisionTreeRegressor`. If a decision tree is fit on an output array Y
of size ``[n_samples, n_outputs]``, then the resulting estimator will:

* Output n_output values upon ``predict``;

* Output a list of n_output arrays of class probabilities upon
  ``predict_proba`` (see the short usage sketch below).
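
On toy data (the inputs and labels below are purely illustrative), this looks
like::

    from sklearn.tree import DecisionTreeClassifier

    X = [[0, 0], [1, 1], [2, 2], [3, 3]]
    Y = [[0, 1], [1, 0], [1, 1], [0, 0]]  # two output labels per sample

    clf = DecisionTreeClassifier().fit(X, Y)

    clf.predict([[1, 1]])        # one row of n_outputs predicted labels
    clf.predict_proba([[1, 1]])  # list of n_outputs arrays of class probabilities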

The use of multi-output trees for regression is demonstrated in
:ref:`example_tree_plot_tree_regression_multioutput.py`. In this example, the
input X is a single real value and the outputs Y are the sine and cosine of X.

.. figure:: ../auto_examples/tree/images/plot_tree_regression_multioutput_1.png
   :target: ../auto_examples/tree/plot_tree_regression_multioutput.html
   :scale: 75
   :align: center

The use of multi-output trees for classification is demonstrated in
:ref:`example_ensemble_plot_forest_multioutput.py`. In this example, the inputs
X are the pixels of the upper half of faces and the outputs Y are the pixels of
the lower half of those faces.

.. figure:: ../auto_examples/ensemble/images/plot_forest_multioutput_1.png
   :target: ../auto_examples/ensemble/plot_forest_multioutput.html
   :scale: 75
   :align: center

.. topic:: Examples:

    * :ref:`example_tree_plot_tree_regression_multioutput.py`
    * :ref:`example_ensemble_plot_forest_multioutput.py`

Review comment: Rather than just linking here, please include an inline plot +
a small paragraph explaining what the inputs and the outputs are for this
example.

Reply: Done :)

.. _tree_complexity:

Complexity
@@ -228,6 +294,7 @@ slowing down the algorithm significantly.

Tips on practical use
=====================

* Decision trees tend to overfit on data with a large number of features.
  Getting the right ratio of samples to number of features is important, since
  a tree with few samples in a high-dimensional space is very likely to overfit.
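
  A quick way to see this effect (a self-contained sketch on purely synthetic
  data, not taken from the documentation examples)::

      import numpy as np
      from sklearn.tree import DecisionTreeClassifier

      rng = np.random.RandomState(0)
      # Few samples, many features, labels unrelated to the inputs
      X_train = rng.rand(50, 500)
      y_train = rng.randint(0, 2, 50)
      X_test = rng.rand(200, 500)
      y_test = rng.randint(0, 2, 200)

      clf = DecisionTreeClassifier().fit(X_train, y_train)
      train_score = clf.score(X_train, y_train)  # 1.0: the noise is memorised
      test_score = clf.score(X_test, y_test)     # close to chance level (~0.5)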

@@ -259,6 +326,7 @@ Tips on practical use

* All decision trees use Fortran-ordered ``np.float32`` arrays internally.
  If training data is not in this format, a copy of the dataset will be made.
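
  To avoid that copy on large datasets, the training data can be converted up
  front (a minimal sketch; ``np.asfortranarray`` leaves the array untouched if
  it is already in that layout)::

      import numpy as np

      X = np.random.rand(10000, 20)               # e.g. C-ordered float64 data
      X = np.asfortranarray(X, dtype=np.float32)  # Fortran-ordered float32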

.. _tree_algorithms:

Tree algorithms: ID3, C4.5, C5.0 and CART

@@ -297,6 +365,7 @@ scikit-learn uses an optimised version of the CART algorithm.

.. _ID3: http://en.wikipedia.org/wiki/ID3_algorithm
.. _CART: http://en.wikipedia.org/wiki/Predictive_analytics#Classification_and_regression_trees

.. _tree_mathematical_formulation:

Mathematical formulation
@@ -0,0 +1,70 @@
"""
=========================================
Face completion with multi-output forests
=========================================

This example shows the use of multi-output forests to complete images.
The goal is to predict the lower half of a face given its upper half.

The first row of images shows true faces. The second row illustrates
how the forest completes the lower half of those faces.

"""
print __doc__

import numpy as np
import pylab as pl

from sklearn.datasets import fetch_olivetti_faces
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.tree import DecisionTreeRegressor


# Load the faces datasets
data = fetch_olivetti_faces()
targets = data.target

data = data.images.reshape((len(data.images), -1))
train = data[targets < 30]
test = data[targets >= 30]  # Test on independent people
n_pixels = data.shape[1]

X_train = train[:, :int(0.5 * n_pixels)]  # Upper half of the faces
Y_train = train[:, int(0.5 * n_pixels):]  # Lower half of the faces
X_test = test[:, :int(0.5 * n_pixels)]
Y_test = test[:, int(0.5 * n_pixels):]

# Build a multi-output forest
forest = ExtraTreesRegressor(n_estimators=10,
                             max_features=32,
                             random_state=0)

forest.fit(X_train, Y_train)
Y_test_predict = forest.predict(X_test)

# Plot the completed faces
n_faces = 5
image_shape = (64, 64)

pl.figure(figsize=(2. * n_faces, 2.26 * 2))
pl.suptitle("Face completion with multi-output forests", size=16)

for i in xrange(1, 1 + n_faces):
    face_id = np.random.randint(X_test.shape[0])

    true_face = np.hstack((X_test[face_id], Y_test[face_id]))
    completed_face = np.hstack((X_test[face_id], Y_test_predict[face_id]))

    pl.subplot(2, n_faces, i)
    pl.axis("off")
    pl.imshow(true_face.reshape(image_shape),
              cmap=pl.cm.gray,
              interpolation="nearest")

    pl.subplot(2, n_faces, n_faces + i)
    pl.axis("off")
    pl.imshow(completed_face.reshape(image_shape),
              cmap=pl.cm.gray,
              interpolation="nearest")

pl.show()
@@ -0,0 +1,55 @@
"""
===================================================================
Multi-output Decision Tree Regression
===================================================================

Multi-output regression with :ref:`decision trees <tree>`: the decision tree
is used to predict simultaneously the noisy x and y observations of a circle
given a single underlying feature. As a result, it learns local linear
regressions approximating the circle.

We can see that if the maximum depth of the tree (controlled by the
`max_depth` parameter) is set too high, the decision trees learn overly fine
details of the training data and learn from the noise, i.e. they overfit.
"""
print __doc__

import numpy as np

# Create a random dataset
rng = np.random.RandomState(1)
X = np.sort(200 * rng.rand(100, 1) - 100, axis=0)
y = np.array([np.pi * np.sin(X).ravel(), np.pi * np.cos(X).ravel()]).T
y[::5, :] += (0.5 - rng.rand(20, 2))

# Fit regression model
from sklearn.tree import DecisionTreeRegressor

clf_1 = DecisionTreeRegressor(max_depth=2)
clf_2 = DecisionTreeRegressor(max_depth=5)
clf_3 = DecisionTreeRegressor(max_depth=8)
clf_1.fit(X, y)
clf_2.fit(X, y)
clf_3.fit(X, y)

# Predict
X_test = np.arange(-100.0, 100.0, 0.01)[:, np.newaxis]
y_1 = clf_1.predict(X_test)
y_2 = clf_2.predict(X_test)
y_3 = clf_3.predict(X_test)

# Plot the results
import pylab as pl

pl.figure()
pl.scatter(y[:, 0], y[:, 1], c="k", label="data")
pl.scatter(y_1[:, 0], y_1[:, 1], c="g", label="max_depth=2")
pl.scatter(y_2[:, 0], y_2[:, 1], c="r", label="max_depth=5")
pl.scatter(y_3[:, 0], y_3[:, 1], c="b", label="max_depth=8")
pl.xlim([-6, 6])
pl.ylim([-6, 6])
pl.xlabel("data")
pl.ylabel("target")
pl.title("Multi-output Decision Tree Regression")
pl.legend()
pl.show()

Review comment: Please add a link to the face completion example here.

Reply: Done.