Commit ec5002e

add py-earth (https://github.com/jcrudy/py-earth) into sklearn/additive
2 parents 0e21f93 + 0c6b121 commit ec5002e

51 files changed

+74297
-0
lines changed

doc/images/hinge.png

22.3 KB

doc/images/piecewise_linear.png

21 KB

doc/modules/classes.rst

Lines changed: 20 additions & 0 deletions
@@ -1269,6 +1269,26 @@ Low-level methods
1269 1269
tree.export_graphviz
1270 1270

1271 1271

1272+
.. _earth_ref:
1273+
1274+
:mod:`sklearn.earth`: Earth
1275+
===========================
1276+
1277+
.. automodule:: sklearn.earth
1278+
:no-members:
1279+
:no-inherited-members:
1280+
1281+
**User guide:** See the :ref:`earth` section for further details.
1282+
1283+
.. currentmodule:: sklearn
1284+
1285+
.. autosummary::
1286+
:toctree: generated/
1287+
:template: class.rst
1288+
1289+
earth.EarthRegressor
1290+
1291+
1272 1292
.. _utils_ref:
1273 1293

1274 1294
:mod:`sklearn.utils`: Utilities

doc/modules/earth.rst

Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@
1+
.. _earth:
2+
3+
========================================
4+
Multivariate Adaptive Regression Splines
5+
========================================
6+
7+
.. currentmodule:: sklearn.earth
8+
9+
Multivariate adaptive regression splines, implemented by the :class:`EarthRegressor` class, is a method for supervised
10+
learning that is most commonly used for feature extraction and selection. ``EarthRegressor`` models can be thought of as linear models in a higher dimensional
11+
basis space. ``EarthRegressor`` automatically searches for interactions and non-linear relationships. Each term in an ``EarthRegressor`` model is a
12+
product of so-called "hinge functions". A hinge function is equal to its argument where that argument
13+
is greater than zero and is zero everywhere else.
14+
15+
.. math::
16+
\text{h}\left(x-t\right)=\left[x-t\right]_{+}=\begin{cases}
17+
x-t, & x>t\\
18+
0, & x\leq t
19+
\end{cases}
20+
21+
.. image:: ../images/hinge.png
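
The hinge function defined above can be sketched in a few lines of NumPy. This is an illustration only and not the code used inside ``EarthRegressor``; the names ``hinge`` and ``mirrored_hinge`` are hypothetical helpers introduced here.

```python
import numpy as np


def hinge(x, t=0.0):
    """Hypothetical helper: h(x - t) = max(x - t, 0), element-wise, with knot t."""
    return np.maximum(np.asarray(x, dtype=float) - t, 0.0)


def mirrored_hinge(x, t=0.0):
    """Hypothetical helper: h(t - x) = max(t - x, 0), the mirror image of hinge."""
    return np.maximum(t - np.asarray(x, dtype=float), 0.0)


x = np.array([-2.0, 0.0, 1.0, 3.0])
values = hinge(x, t=1.0)             # zero at and below the knot, x - 1 above it
mirrored = mirrored_hinge(x, t=1.0)  # 1 - x below the knot, zero at and above it

# A hinge and its mirror image together reproduce the absolute value |x - t|.
print(values + mirrored)
```

Summing a hinge and its mirror image at the same knot gives ``|x - t|``, which is why a v-shaped function needs only two hinge terms.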
22+
23+
An ``EarthRegressor`` model is a linear combination of basis functions, each of which is a product of one
24+
or more of the following:
25+
26+
1. A constant
27+
2. Linear functions of input variables
28+
3. Hinge functions of input variables
29+
30+
For example, a simple piecewise linear function in one variable can be expressed
31+
as a linear combination of two hinge functions and a constant (see below). During fitting, the ``EarthRegressor`` class
32+
automatically determines which variables and basis functions to use.
33+
The algorithm has two stages. First, the
34+
forward pass searches for terms that locally minimize squared error loss on the training set. Next, a pruning pass selects a subset of those
35+
terms that produces a locally minimal generalized cross-validation (GCV) score. The GCV
36+
score is not actually based on cross-validation, but rather is meant to approximate a true
37+
cross-validation score by penalizing model complexity. The final result is a set of basis functions
38+
that is nonlinear in the original feature space, may include interactions, and is likely to
39+
generalize well.
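
As a rough sketch of how such a complexity penalty works, the snippet below uses the GCV form given in The Elements of Statistical Learning (reference 7 in the bibliography): the residual sum of squares divided by ``(1 - M / N) ** 2``, where ``M = r + c * K`` for ``r`` basis functions and ``K`` knots, with ``c = 3`` suggested there. The function name ``gcv_score`` and its parameters are assumptions made for illustration; the score computed inside ``EarthRegressor`` may be scaled or parameterized differently.

```python
import numpy as np


def gcv_score(y, y_hat, n_terms, n_knots, penalty=3.0):
    """Hypothetical helper: GCV = RSS / (1 - M / N)**2 with M = n_terms + penalty * n_knots."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n_samples = y.shape[0]
    effective_params = n_terms + penalty * n_knots
    if effective_params >= n_samples:
        raise ValueError("effective number of parameters must be below the sample count")
    rss = np.sum((y - y_hat) ** 2)
    return rss / (1.0 - effective_params / n_samples) ** 2


# Two candidate models with identical training error but different complexity.
y = np.zeros(100)
y_hat = np.full(100, 0.1)
simple_score = gcv_score(y, y_hat, n_terms=3, n_knots=2)
complex_score = gcv_score(y, y_hat, n_terms=9, n_knots=8)
print(simple_score, complex_score)
```

With equal training error, the model with more terms and knots receives the larger (worse) score, so a pruning pass that minimizes GCV prefers the smaller model.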
40+
41+
42+
.. math::
43+
y=1-2\text{h}\left(1-x\right)+\frac{1}{2}\text{h}\left(x-1\right)
44+
45+
46+
.. image:: ../images/piecewise_linear.png
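
The piecewise linear function above can also be read as a linear model in a hinge basis. The sketch below builds the basis matrix by hand with NumPy, evaluates the formula, and recovers the coefficients by ordinary least squares. It assumes the knot at ``x = 1`` is already known; it does not perform the forward or pruning passes, and the names ``hinge`` and ``basis`` are hypothetical.

```python
import numpy as np


def hinge(z):
    """Hypothetical helper: max(z, 0), element-wise."""
    return np.maximum(z, 0.0)


x = np.linspace(-3.0, 5.0, 50)

# Basis matrix with columns: constant, h(1 - x), h(x - 1).
basis = np.column_stack([np.ones_like(x), hinge(1.0 - x), hinge(x - 1.0)])

# Coefficients taken from the formula y = 1 - 2 h(1 - x) + (1/2) h(x - 1).
true_coef = np.array([1.0, -2.0, 0.5])
y = basis @ true_coef

# With the knot fixed, fitting is ordinary least squares in the basis space.
fitted_coef, _, rank, _ = np.linalg.lstsq(basis, y, rcond=None)
print(fitted_coef)
```

Left of the knot the function has slope 2, right of the knot it has slope 1/2, and both pieces meet at ``y = 1`` when ``x = 1``.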
47+
48+
49+
A Simple EarthRegressor Example
50+
-------------------------------
51+
52+
.. literalinclude:: ../auto_examples/earth/plot_v_function.py
53+
:lines: 13-
54+
55+
56+
.. figure:: ../auto_examples/earth/images/plot_v_function_1.png
57+
:target: ../auto_examples/earth/plot_v_function.html
58+
:align: center
59+
:scale: 75%
60+
61+
62+
.. topic:: Bibliography:
63+
64+
1. Friedman, J. (1991). Multivariate adaptive regression splines. The Annals of Statistics,
65+
19(1), 1–67. http://www.jstor.org/stable/10.2307/2241837
66+
2. Stephen Milborrow. Derived from mda:mars by Trevor Hastie and Rob Tibshirani.
67+
(2012). earth: Multivariate Adaptive Regression Spline Models. R package
68+
version 3.2-3.
69+
3. Friedman, J. (1993). Fast MARS. Stanford University Department of Statistics, Technical Report No 110.
70+
http://statistics.stanford.edu/~ckirby/techreports/LCS/LCS%20110.pdf
71+
4. Friedman, J. (1991). Estimating functions of mixed ordinal and categorical variables using adaptive splines.
72+
Stanford University Department of Statistics, Technical Report No 108.
73+
http://statistics.stanford.edu/~ckirby/techreports/LCS/LCS%20108.pdf
74+
5. Stewart, G.W. Matrix Algorithms, Volume 1: Basic Decompositions. (1998). Society for Industrial and Applied
75+
Mathematics.
76+
6. Bjorck, A. Numerical Methods for Least Squares Problems. (1996). Society for Industrial and Applied
77+
Mathematics.
78+
7. Hastie, T., Tibshirani, R., & Friedman, J. The Elements of Statistical Learning (2nd Edition). (2009).
79+
Springer Series in Statistics.
80+
8. Golub, G., & Van Loan, C. Matrix Computations (3rd Edition). (1996). Johns Hopkins University Press.
81+
82+
83+
References 7, 2, 1, 3, and 4 contain discussions likely to be useful to users. References 1, 2, 6, 5,
84+
8, 3, and 4 are useful in understanding the implementation.

doc/modules/earth_bibliography.bib

Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
1+
@book{Bjorck1996,
2+
address = {Philadelphia},
3+
author = {Bjorck, Ake},
4+
isbn = {0898713609},
5+
publisher = {Society for Industrial and Applied Mathematics},
6+
title = {{Numerical Methods for Least Squares Problems}},
7+
year = {1996}
8+
}
9+
@techreport{Friedman1993,
10+
author = {Friedman, Jerome H.},
11+
institution = {Stanford University Department of Statistics},
12+
title = {{Technical Report No. 110: Fast MARS.}},
13+
url = {http://scholar.google.com/scholar?hl=en\&btnG=Search\&q=intitle:Fast+MARS\#0},
14+
year = {1993}
15+
}
16+
@techreport{Friedman1991a,
17+
author = {Friedman, JH},
18+
institution = {Stanford University Department of Statistics},
19+
publisher = {Stanford University Department of Statistics},
20+
title = {{Technical Report No. 108: Estimating functions of mixed ordinal and categorical variables using adaptive splines}},
21+
url = {http://scholar.google.com/scholar?hl=en\&btnG=Search\&q=intitle:Estimating+functions+of+mixed+ordinal+and+categorical+variables+using+adaptive+splines\#0},
22+
year = {1991}
23+
}
24+
@article{Friedman1991,
25+
author = {Friedman, JH},
26+
journal = {The annals of statistics},
27+
number = {1},
28+
pages = {1--67},
29+
title = {{Multivariate adaptive regression splines}},
30+
url = {http://www.jstor.org/stable/10.2307/2241837},
31+
volume = {19},
32+
year = {1991}
33+
}
34+
@book{Golub1996,
35+
author = {Golub, Gene and {Van Loan}, Charles},
36+
edition = {3},
37+
publisher = {Johns Hopkins University Press},
38+
title = {{Matrix Computations}},
39+
year = {1996}
40+
}
41+
@book{Hastie2009,
42+
address = {New York},
43+
author = {Hastie, Trevor and Tibshirani, Robert and Friedman, Jerome},
44+
edition = {2},
45+
publisher = {Springer Science+Business Media},
46+
title = {{Elements of Statistical Learning: Data Mining, Inference, and Prediction}},
47+
year = {2009}
48+
}
49+
@book{Stewart1998,
50+
address = {Philadelphia},
51+
author = {Stewart, G. W.},
52+
isbn = {0898714141},
53+
publisher = {Society for Industrial and Applied Mathematics},
54+
title = {{Matrix Algorithms Volume 1: Basic Decompositions}},
55+
year = {1998}
56+
}
57+
@misc{Millborrow2012,
58+
author = {Milborrow, Stephen},
59+
publisher = {CRAN},
60+
title = {{earth: Multivariate Adaptive Regression Spline Models}},
61+
url = {http://cran.r-project.org/web/packages/earth/index.html},
62+
year = {2012}
63+
}

doc/supervised_learning.rst

Lines changed: 1 addition & 0 deletions
@@ -22,4 +22,5 @@ Supervised learning
22 22
modules/feature_selection.rst
23 23
modules/label_propagation.rst
24 24
modules/isotonic.rst
25+
modules/earth.rst
25 26
modules/calibration.rst

examples/earth/README.txt

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
1+
.. _earth_examples:
2+
3+
Earth examples
4+
----------------
5+
6+
Examples concerning the :mod:`sklearn.earth` package.

examples/earth/plot_sine_wave.py

Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
1+
'''
2+
==============================================
3+
Fitting an EarthRegressor model to a sine wave
4+
==============================================
5+
6+
7+
In this example, a simple sine model is used to generate an artificial data set. An :class:`EarthRegressor` model
8+
is then fitted to that data set and the resulting predictions are plotted against the original data.
9+
10+
'''
11+
print(__doc__)
12+
13+
import numpy as np
14+
import pylab as pl
15+
from sklearn.earth import EarthRegressor
16+
17+
# Create some fake data
18+
np.random.seed(2)
19+
m = 10000
20+
n = 10
21+
X = 80 * np.random.uniform(size=(m, n)) - 40
22+
y = 100 * \
23+
np.abs(np.sin((X[:, 6]) / 10) - 4.0) + \
24+
20 * np.random.normal(size=m)
25+
26+
# Fit an EarthRegressor model
27+
model = EarthRegressor(max_degree=3, minspan_alpha=.5)
28+
model.fit(X, y)
29+
30+
# Print the model
31+
print(model.trace())
32+
print(model.summary())
33+
34+
# Plot the model
35+
pl.figure()
36+
y_hat = model.predict(X)
37+
pl.plot(X[:, 6], y, 'r.')
38+
pl.plot(X[:, 6], y_hat, 'b.')
39+
pl.show()

examples/earth/plot_v_function.py

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
1+
'''
2+
======================================================
3+
Fitting an EarthRegressor model to a v-shaped function
4+
======================================================
5+
6+
7+
In this example, a simple piecewise linear model is used to generate an artificial data set. An :class:`EarthRegressor` model
8+
is then fitted to that data set and the resulting predictions are plotted against the original data.
9+
10+
'''
11+
print(__doc__)
12+
13+
import numpy as np
14+
from sklearn.earth import EarthRegressor
15+
import pylab as pl
16+
17+
# Create some fake data
18+
np.random.seed(2)
19+
m = 1000
20+
n = 10
21+
X = 80 * np.random.uniform(size=(m, n)) - 40
22+
y = np.abs(X[:, 6] - 4.0) + 5 * np.random.normal(size=m)
23+
24+
# Fit an EarthRegressor model
25+
model = EarthRegressor(max_degree=1)
26+
model.fit(X, y)
27+
28+
# Print the model
29+
print(model.trace())
30+
print(model.summary())
31+
32+
# Plot the model
33+
y_hat = model.predict(X)
34+
pl.figure()
35+
pl.plot(X[:, 6], y, 'r.')
36+
pl.plot(X[:, 6], y_hat, 'b.')
37+
pl.show()

sklearn/additive/__init__.py

Whitespace-only changes.
