MRG: Multi-output decision trees #923

@@ -38,6 +38,8 @@ Some advantages of decision trees are:
  of variable. See :ref:`algorithms <tree_algorithms>` for more
  information.

- Able to handle multi-output problems.

- Uses a white box model. If a given situation is observable in a model,
  the explanation for the condition is easily explained by boolean logic.
  By contrast, in a black box model (e.g., in an artificial neural
@@ -49,6 +51,7 @@ Some advantages of decision trees are:
- Performs well even if its assumptions are somewhat violated by
  the true model from which the data were generated.

The disadvantages of decision trees include:

- Decision-tree learners can create over-complex trees that do not
@@ -78,6 +81,7 @@ The disadvantages of decision trees include:
  It is therefore recommended to balance the dataset prior to fitting
  with the decision tree.

.. _tree_classification:

Classification
@@ -87,8 +91,8 @@ Classification
classification on a dataset.

As with other classifiers, :class:`DecisionTreeClassifier` takes as input two
-arrays: an array X of size [n_samples, n_features] holding the training
-samples, and an array Y of integer values, size [n_samples], holding
+arrays: an array X of size ``[n_samples, n_features]`` holding the training
+samples, and an array Y of integer values, size ``[n_samples]``, holding
the class labels for the training samples::

    >>> from sklearn import tree
@@ -147,6 +151,7 @@ After being fitted, the model can then be used to predict new values::

* :ref:`example_tree_plot_iris.py`

.. _tree_regression:

Regression
@@ -177,6 +182,67 @@ instead of integer values::

* :ref:`example_tree_plot_tree_regression.py`

.. _tree_multioutput:

Multi-output problems
=====================

A multi-output problem is a supervised learning problem with several outputs
to predict, that is, when Y is a 2d array of size ``[n_samples, n_outputs]``.

When there is no correlation between the outputs, a very simple way to solve
this kind of problem is to build n independent models, i.e. one for each
output, and then to use those models to independently predict each of the n
outputs. However, because it is likely that the output values related to the
same input are themselves correlated, an often better way is to build a single
model capable of predicting all n outputs simultaneously. First, this requires
less training time, since only a single estimator is built. Second, the
generalization accuracy of the resulting estimator may often be increased.
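
As a minimal sketch of the two strategies (the sine/cosine targets below are
purely illustrative, and the single-estimator variant relies on the
multi-output support described below)::

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.RandomState(0)
    X = rng.rand(100, 1)
    # Two correlated outputs derived from the same input
    Y = np.hstack([np.sin(3 * X), np.cos(3 * X)])

    # Strategy 1: one independent model per output
    per_output = [DecisionTreeRegressor(max_depth=4).fit(X, Y[:, k])
                  for k in range(Y.shape[1])]
    pred_independent = np.column_stack([m.predict(X) for m in per_output])

    # Strategy 2: a single model predicting all outputs at once
    multi = DecisionTreeRegressor(max_depth=4).fit(X, Y)
    pred_multi = multi.predict(X)  # shape [n_samples, n_outputs]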

With regard to decision trees, this strategy can readily be used to support
multi-output problems. This requires the following changes:

- Store n output values in leaves, instead of 1;
- Use splitting criteria that compute the average reduction across all
  n outputs (sketched below).
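
A rough numpy-only sketch of the idea for a mean-squared-error criterion (the
helper functions here are illustrative, not the actual tree implementation)::

    import numpy as np

    def mse_impurity(Y_node):
        # Per-output variance of the targets in a node, averaged over outputs
        return np.mean(np.var(Y_node, axis=0))

    def split_improvement(Y_parent, left_mask):
        # Impurity decrease of a candidate split, averaged across all outputs
        Y_left, Y_right = Y_parent[left_mask], Y_parent[~left_mask]
        n = float(len(Y_parent))
        return (mse_impurity(Y_parent)
                - (len(Y_left) / n) * mse_impurity(Y_left)
                - (len(Y_right) / n) * mse_impurity(Y_right))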

This module offers support for multi-output problems by implementing this
strategy in both :class:`DecisionTreeClassifier` and
:class:`DecisionTreeRegressor`. If a decision tree is fit on an output array Y
of size ``[n_samples, n_outputs]``, then the resulting estimator will:

* Output n_output values upon ``predict``;

* Output a list of n_output arrays of class probabilities upon
  ``predict_proba`` (see the short usage sketch below).
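
On toy data (the inputs and labels below are purely illustrative), this looks
like::

    from sklearn.tree import DecisionTreeClassifier

    X = [[0, 0], [1, 1], [2, 2], [3, 3]]
    Y = [[0, 1], [1, 0], [1, 1], [0, 0]]  # two output labels per sample

    clf = DecisionTreeClassifier().fit(X, Y)

    clf.predict([[1, 1]])        # one row of n_outputs predicted labels
    clf.predict_proba([[1, 1]])  # list of n_outputs arrays of class probabilities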

The use of multi-output trees for regression is demonstrated in
:ref:`example_tree_plot_tree_regression_multioutput.py`. In this example, the
input X is a single real value and the outputs Y are the sine and cosine of X.

.. figure:: ../auto_examples/tree/images/plot_tree_regression_multioutput_1.png
   :target: ../auto_examples/tree/plot_tree_regression_multioutput.html
   :scale: 75
   :align: center

The use of multi-output trees for classification is demonstrated in
:ref:`example_ensemble_plot_forest_multioutput.py`. In this example, the inputs
X are the pixels of the upper half of faces and the outputs Y are the pixels of
the lower half of those faces.

.. figure:: ../auto_examples/ensemble/images/plot_forest_multioutput_1.png
   :target: ../auto_examples/ensemble/plot_forest_multioutput.html
   :scale: 75
   :align: center

.. topic:: Examples:

    * :ref:`example_tree_plot_tree_regression_multioutput.py`
    * :ref:`example_ensemble_plot_forest_multioutput.py`

Review comment: Rather than just linking here, please include an inline plot +
a small paragraph explaining what the inputs and the outputs are for this
example.

Reply: Done :)

.. _tree_complexity:

Complexity
@@ -228,6 +294,7 @@ slowing down the algorithm significantly.

Tips on practical use
=====================

* Decision trees tend to overfit on data with a large number of features.
  Getting the right ratio of samples to number of features is important, since
  a tree with few samples in a high-dimensional space is very likely to overfit.
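
  A quick way to see this effect (a self-contained sketch on purely synthetic
  data, not taken from the documentation examples)::

      import numpy as np
      from sklearn.tree import DecisionTreeClassifier

      rng = np.random.RandomState(0)
      # Few samples, many features, labels unrelated to the inputs
      X_train = rng.rand(50, 500)
      y_train = rng.randint(0, 2, 50)
      X_test = rng.rand(200, 500)
      y_test = rng.randint(0, 2, 200)

      clf = DecisionTreeClassifier().fit(X_train, y_train)
      train_score = clf.score(X_train, y_train)  # 1.0: the noise is memorised
      test_score = clf.score(X_test, y_test)     # close to chance level (~0.5)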

@@ -259,6 +326,7 @@ Tips on practical use

* All decision trees use Fortran-ordered ``np.float32`` arrays internally.
  If training data is not in this format, a copy of the dataset will be made.
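
  To avoid that copy on large datasets, the training data can be converted up
  front (a minimal sketch; ``np.asfortranarray`` leaves the array untouched if
  it is already in that layout)::

      import numpy as np

      X = np.random.rand(10000, 20)               # e.g. C-ordered float64 data
      X = np.asfortranarray(X, dtype=np.float32)  # Fortran-ordered float32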

.. _tree_algorithms:

Tree algorithms: ID3, C4.5, C5.0 and CART

@@ -297,6 +365,7 @@ scikit-learn uses an optimised version of the CART algorithm.

.. _ID3: http://en.wikipedia.org/wiki/ID3_algorithm
.. _CART: http://en.wikipedia.org/wiki/Predictive_analytics#Classification_and_regression_trees

.. _tree_mathematical_formulation:

Mathematical formulation
@@ -0,0 +1,70 @@
"""
=========================================
Face completion with multi-output forests
=========================================

This example shows the use of multi-output forests to complete images.
The goal is to predict the lower half of a face given its upper half.

The first row of images shows true faces. The second row illustrates
how the forest completes the lower half of those faces.

"""
print __doc__

import numpy as np
import pylab as pl

from sklearn.datasets import fetch_olivetti_faces
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.tree import DecisionTreeRegressor


# Load the faces datasets
data = fetch_olivetti_faces()
targets = data.target

data = data.images.reshape((len(data.images), -1))
train = data[targets < 30]
test = data[targets >= 30]  # Test on independent people
n_pixels = data.shape[1]

X_train = train[:, :int(0.5 * n_pixels)]  # Upper half of the faces
Y_train = train[:, int(0.5 * n_pixels):]  # Lower half of the faces
X_test = test[:, :int(0.5 * n_pixels)]
Y_test = test[:, int(0.5 * n_pixels):]

# Build a multi-output forest
forest = ExtraTreesRegressor(n_estimators=10,
                             max_features=32,
                             random_state=0)

forest.fit(X_train, Y_train)
Y_test_predict = forest.predict(X_test)

# Plot the completed faces
n_faces = 5
image_shape = (64, 64)

pl.figure(figsize=(2. * n_faces, 2.26 * 2))
pl.suptitle("Face completion with multi-output forests", size=16)

for i in xrange(1, 1 + n_faces):
    face_id = np.random.randint(X_test.shape[0])

    true_face = np.hstack((X_test[face_id], Y_test[face_id]))
    completed_face = np.hstack((X_test[face_id], Y_test_predict[face_id]))

    pl.subplot(2, n_faces, i)
    pl.axis("off")
    pl.imshow(true_face.reshape(image_shape),
              cmap=pl.cm.gray,
              interpolation="nearest")

    pl.subplot(2, n_faces, n_faces + i)
    pl.axis("off")
    pl.imshow(completed_face.reshape(image_shape),
              cmap=pl.cm.gray,
              interpolation="nearest")

pl.show()
@@ -0,0 +1,55 @@
"""
===================================================================
Multi-output Decision Tree Regression
===================================================================

Multi-output regression with :ref:`decision trees <tree>`: the decision tree
is used to predict simultaneously the noisy x and y observations of a circle
given a single underlying feature. As a result, it learns local linear
regressions approximating the circle.

We can see that if the maximum depth of the tree (controlled by the
`max_depth` parameter) is set too high, the decision trees learn overly fine
details of the training data and learn from the noise, i.e. they overfit.
"""
print __doc__

import numpy as np

# Create a random dataset
rng = np.random.RandomState(1)
X = np.sort(200 * rng.rand(100, 1) - 100, axis=0)
y = np.array([np.pi * np.sin(X).ravel(), np.pi * np.cos(X).ravel()]).T
y[::5, :] += (0.5 - rng.rand(20, 2))

# Fit regression model
from sklearn.tree import DecisionTreeRegressor

clf_1 = DecisionTreeRegressor(max_depth=2)
clf_2 = DecisionTreeRegressor(max_depth=5)
clf_3 = DecisionTreeRegressor(max_depth=8)
clf_1.fit(X, y)
clf_2.fit(X, y)
clf_3.fit(X, y)

# Predict
X_test = np.arange(-100.0, 100.0, 0.01)[:, np.newaxis]
y_1 = clf_1.predict(X_test)
y_2 = clf_2.predict(X_test)
y_3 = clf_3.predict(X_test)

# Plot the results
import pylab as pl

pl.figure()
pl.scatter(y[:, 0], y[:, 1], c="k", label="data")
pl.scatter(y_1[:, 0], y_1[:, 1], c="g", label="max_depth=2")
pl.scatter(y_2[:, 0], y_2[:, 1], c="r", label="max_depth=5")
pl.scatter(y_3[:, 0], y_3[:, 1], c="b", label="max_depth=8")
pl.xlim([-6, 6])
pl.ylim([-6, 6])
pl.xlabel("data")
pl.ylabel("target")
pl.title("Multi-output Decision Tree Regression")
pl.legend()
pl.show()

Review comment: Please add a link to the face completion example here.

Reply: Done.