[MRG] RandomActivation #4703


Closed · wants to merge 34 commits

Commits (34)
cb63500
First version of RandomActivation
IssamLaradji May 11, 2015
d50aa32
added activation function support
IssamLaradji May 11, 2015
4846ed8
removed the identity function
IssamLaradji May 11, 2015
97fb1f8
renamed to randombasisfunction
IssamLaradji May 12, 2015
2347e5e
1) added example
IssamLaradji May 24, 2015
60f6e9b
added sanity checks.
IssamLaradji May 24, 2015
93f1b98
added the second example that illustrates neural network behavior wit…
IssamLaradji May 25, 2015
e235ccf
fixed relative import
IssamLaradji May 26, 2015
c3b2181
removed softmax; added doc and plotting code.
IssamLaradji Jun 7, 2015
bbfc4f7
moved randombasisfunction to the unsupervised section and made changes
IssamLaradji Jun 13, 2015
52a6d52
fixed typos and made use of validation curve
IssamLaradji Jun 13, 2015
8041c66
fixed typos and ordering of the docs
IssamLaradji Jun 13, 2015
0c6b699
fixed .rst typos
IssamLaradji Jun 14, 2015
33a553f
fixed a small typo in the .rst file
IssamLaradji Jun 14, 2015
689a407
fixed __init__ ordering and updated references and the "see other" se…
IssamLaradji Oct 27, 2015
2fd5b62
First version of RandomActivation
IssamLaradji May 11, 2015
607ffc9
added activation function support
IssamLaradji May 11, 2015
874698a
removed the identity function
IssamLaradji May 11, 2015
49cc3ca
renamed to randombasisfunction
IssamLaradji May 12, 2015
f24613a
1) added example
IssamLaradji May 24, 2015
ae9e0dc
added sanity checks.
IssamLaradji May 24, 2015
fa83d4f
added the second example that illustrates neural network behavior wit…
IssamLaradji May 25, 2015
f401be7
fixed relative import
IssamLaradji May 26, 2015
132e45c
removed softmax; added doc and plotting code.
IssamLaradji Jun 7, 2015
e8541b4
moved randombasisfunction to the unsupervised section and made changes
IssamLaradji Jun 13, 2015
894797e
fixed typos and made use of validation curve
IssamLaradji Jun 13, 2015
258c5f7
fixed typos and ordering of the docs
IssamLaradji Jun 13, 2015
09d92fc
fixed .rst typos
IssamLaradji Jun 14, 2015
089ee6b
fixed a small typo in the .rst file
IssamLaradji Jun 14, 2015
0931713
fixed __init__ ordering and updated references and the "see other" se…
IssamLaradji Oct 27, 2015
5f3ab9f
Combined the two plotting examples into one. Moved random NN to the u…
IssamLaradji Nov 4, 2015
2a25836
Merge remote-tracking branch 'origin/RandomActivation' into RandomAct…
IssamLaradji Nov 4, 2015
a104cd2
Moved random NN to the unsupervised section of the documentation.
IssamLaradji Nov 4, 2015
25444d9
using MLPs base instead. Updated __init__ to include RandomBasisFunct…
IssamLaradji Nov 4, 2015
126 changes: 126 additions & 0 deletions doc/modules/neural_networks_unsupervised.rst
@@ -7,6 +7,132 @@ Neural network models (unsupervised)
.. currentmodule:: sklearn.neural_network


.. _random_basis_function:

Random basis function
=====================

The random basis function is a mapping :math:`f(X): R^{n \times d} \rightarrow R^{n \times k}` that projects the matrix
:math:`X` into another feature space whose number of features :math:`k` can be smaller than, equal to,
or larger than the original number of features :math:`d`. The output matrix :math:`H` is
computed as follows:

.. math::

H = g(Xw + b)

where :math:`g(\cdot): R \rightarrow R` is the activation function, :math:`w`
is the weight matrix, and :math:`b` is the intercept vector.

:math:`w \in R^{d \times k}` and :math:`b \in R^{k}` are generated from a uniform
distribution scaled between two values, set by the user.


The example code below illustrates using this function::

>>> from sklearn.neural_network import RandomBasisFunction
>>> X = [[0, 0], [1, 1]]
>>> fe = RandomBasisFunction(random_state=1, n_outputs=2)
>>> fe.fit(X)
RandomBasisFunction(activation='tanh', intercept=True, n_outputs=2,
random_state=1, weight_scale='auto')
>>> fe.transform(X)
array([[-0.69896184, -0.76098975],
[-0.97981807, -0.73662692]])
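
For intuition, the mapping itself is a random linear projection followed by a
non-linearity. The NumPy sketch below mirrors the transform above; the uniform
bounds stand in for the scaling chosen by ``weight_scale`` and are an
illustrative assumption, so the exact numbers will differ::

    import numpy as np

    rng = np.random.RandomState(1)
    X = np.array([[0., 0.], [1., 1.]])    # n_samples x d
    d, k = X.shape[1], 2                  # k corresponds to n_outputs

    # random weights and intercepts drawn from a scaled uniform distribution
    scale = 1.0
    w = rng.uniform(-scale, scale, size=(d, k))
    b = rng.uniform(-scale, scale, size=k)

    H = np.tanh(X.dot(w) + b)             # H = g(Xw + b)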

This function can be used to initialize a single-hidden layer feedforward network.

Randomly weighted single-hidden layer feedforward network
=========================================================

The randomly weighted neural network (RW-NN) is a supervised learning algorithm
that trains a single-hidden layer feedforward network (SLFN) with the help of randomization.
It computes :math:`w1 \in R^{d \times k}`, :math:`w2 \in R^{k \times o}`, and
:math:`b \in R^{k}` such that:

.. math::

g(Xw1 + b)w2 \approx y

where :math:`g(\cdot): R \rightarrow R` is the activation function; :math:`w1 \in R^{d \times k}`
is the weight matrix between the input layer of the network and
the hidden layer; :math:`w2 \in R^{k \times o}` is the weight matrix between the hidden
layer of the network and the output layer; and :math:`b \in R^{k}` is the intercept vector
for the hidden layer. Figure 1 shows an example of such a network.

.. figure:: ../auto_examples/neural_networks/images/plot_slfn_001.png
:target: ../auto_examples/neural_networks/plot_slfn.html
:align: center
:scale: 100%

The algorithm takes the following steps:

* Generate the weight matrix :math:`w1 \in R^{d \times k}` and the intercept vector :math:`b \in R^{k}` with random values drawn from a uniform distribution;
* compute :math:`H = g(Xw1 + b)`; and
* solve for :math:`w2` using a linear model such as ridge regression, whose closed-form solution is :math:`w2 = (H^T H + (1 / C) I)^{-1} H^T y`, where :math:`C` is the regularization term.

:math:`k` is the number of hidden neurons; a larger :math:`k` gives the model more capacity to learn complex functions.
:math:`H`, the matrix of hidden-neuron activations, represents randomly weighted combinations of the training set features.
This technique approximates the solution obtained by training an SLFN with backpropagation. This is because,
unlike backpropagation, it does not propagate the error from solving :math:`w2` back to the previous layer.
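
These steps can be written out in a short NumPy sketch on a toy regression
problem; the uniform bounds and the value of :math:`C` below are illustrative
assumptions, not the defaults used by :class:`RandomBasisFunction`::

    import numpy as np

    rng = np.random.RandomState(0)
    n, d, k, C = 100, 5, 20, 1e3          # samples, features, hidden neurons, regularization

    X = rng.randn(n, d)
    y = X.dot(rng.randn(d))               # toy regression target

    # step 1: random hidden-layer weights and intercepts
    w1 = rng.uniform(-1, 1, size=(d, k))
    b = rng.uniform(-1, 1, size=k)

    # step 2: hidden activations
    H = np.tanh(X.dot(w1) + b)

    # step 3: closed-form ridge solution for the output weights
    w2 = np.linalg.solve(H.T.dot(H) + (1. / C) * np.eye(k), H.T.dot(y))

    y_pred = H.dot(w2)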

For classification, one can use a pipeline comprising the :class:`RandomBasisFunction` and :class:`RidgeClassifier` as
shown in the following example::

>>> from sklearn.neural_network import RandomBasisFunction
>>> from sklearn.linear_model import RidgeClassifier
>>> from sklearn.pipeline import make_pipeline

>>> X = [[0, 0], [1, 1]]
>>> y = [0, 1]

>>> reg = make_pipeline(RandomBasisFunction(random_state=1), RidgeClassifier(alpha=0))
>>> reg.fit(X, y)
Pipeline(steps=[('randombasisfunction', RandomBasisFunction(activation='tanh', intercept=True, n_outputs=10,
random_state=1, weight_scale='auto')), ('ridgeclassifier', RidgeClassifier(alpha=0, class_weight=None, copy_X=True, fit_intercept=True,
max_iter=None, normalize=False, solver='auto', tol=0.001))])

>>> reg.predict(X)
array([0, 1])

For regression, one can use a pipeline comprising the :class:`RandomBasisFunction` and :class:`Ridge` as
shown in the following example::

>>> from sklearn.neural_network import RandomBasisFunction
>>> from sklearn.linear_model import Ridge
>>> from sklearn.pipeline import make_pipeline

>>> X = [[0, 0], [1, 1]]
>>> y = [0.5, 0.2]

>>> reg = make_pipeline(RandomBasisFunction(random_state=1), Ridge(alpha=0))
>>> reg.fit(X, y)
Pipeline(steps=[('randombasisfunction', RandomBasisFunction(activation='tanh', intercept=True, n_outputs=10,
random_state=1, weight_scale='auto')), ('ridge', Ridge(alpha=0, copy_X=True, fit_intercept=True, max_iter=None,
normalize=False, solver='auto', tol=0.001))])

>>> reg.predict(X)
array([ 0.5, 0.2])

The example below shows how tuning some of the hyper-parameters of the pipeline affects the resulting
decision function:

* :ref:`example_neural_networks_plot_random_neural_network.py`
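
Beyond varying one parameter at a time, these hyper-parameters can also be
searched jointly with a grid search over the pipeline. The sketch below is
illustrative only: the parameter names follow the step names generated by
:func:`make_pipeline`, and the grid values are arbitrary::

    from sklearn.model_selection import GridSearchCV  # sklearn.grid_search in older releases
    from sklearn.linear_model import RidgeClassifier
    from sklearn.neural_network import RandomBasisFunction
    from sklearn.pipeline import make_pipeline

    pipe = make_pipeline(RandomBasisFunction(random_state=1), RidgeClassifier())
    param_grid = {
        'randombasisfunction__n_outputs': [10, 100, 500],
        'randombasisfunction__weight_scale': [0.1, 1., 10.],
        'ridgeclassifier__alpha': [1e-3, 1., 1e3],
    }
    search = GridSearchCV(pipe, param_grid, cv=3)
    # after calling search.fit(X, y), search.best_params_ holds the selected setting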



.. topic:: References:

* `"Understanding the difficulty of training deep feedforward neural networks."
<http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf>`_
Schmidt, Wouter F., Martin A. Kraaijveld, and Robert PW Duin.

* `"Feedforward neural networks with random weights."
<http://homepage.tudelft.nl/a9p19/papers/icpr_92_random.pdf>`_
Schmidt, Wouter F., Martin A. Kraaijveld, and Robert PW Duin.


.. _rbm:

Restricted Boltzmann machines
163 changes: 163 additions & 0 deletions examples/neural_networks/plot_random_neural_network.py
@@ -0,0 +1,163 @@
"""
===========================================
Effect of parameters in RandomBasisFunction
===========================================

This example generates plots that illustrate the impact of varying the RandomBasisFunction parameters on the decision
function of the random neural network model.

This generates three plots, each varying a single parameter: alpha, weight_scale,
and n_outputs, respectively.

If the model has high bias, which can lead to a high training error, then decreasing alpha,
increasing weight_scale, and/or increasing n_outputs reduces the bias and therefore the
underfitting. Similarly, if the model has high variance, which is when the training error
poorly approximates the testing error, then increasing alpha, decreasing weight_scale,
and/or decreasing n_outputs reduces the variance and therefore the overfitting.

One way to balance bias and variance when tuning these parameters is to test a range of
values using cross-validation, as done in this example.

"""
print(__doc__)


# Author: Issam H. Laradji
# License: BSD 3 clause

import numpy as np

from matplotlib import pyplot as plt
from matplotlib.colors import ListedColormap

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_moons, make_circles, make_classification
from sklearn.neural_network import RandomBasisFunction
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.utils.fixes import expit as logistic_sigmoid


# To be removed (no predict_proba in Ridge)
def predict_proba(clf, x):
return logistic_sigmoid(clf.predict(x))

h = .02 # step size in the mesh
rng = np.random.RandomState(1)

alpha_list = np.logspace(-4, 4, 5)
weight_scale_list = np.logspace(-2, 2, 5)
n_outputs_list = [2, 10, 100, 200, 500]



def plot(names, classifiers, title):
X, y = make_classification(n_features=2, n_redundant=0, n_informative=2,
random_state=rng, n_clusters_per_class=1)

linearly_separable = (X, y)

datasets = [make_moons(noise=1., random_state=rng),
make_circles(noise=0.2, factor=0.5, random_state=rng),
linearly_separable]

figure = plt.figure(figsize=(17, 9))
figure.suptitle(title)
i = 1
# iterate over datasets
for X, y in datasets:
# initialize standard scaler
scaler = StandardScaler()

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.4,
random_state=1)
# Compute the mean and standard deviation of each feature of the
# training set and scale the training set
X_train = scaler.fit_transform(X_train)

# Using the same mean and standard deviation, scale the testing set
X_test = scaler.transform(X_test)

x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
np.arange(y_min, y_max, h))

# just plot the dataset first
cm_bright = ListedColormap(['#FF0000', '#0000FF'])
ax = plt.subplot(len(datasets), len(classifiers) + 1, i)
# Plot the training points
ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=cm_bright)
# and testing points
ax.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=cm_bright,
alpha=0.6)
ax.set_xlim(xx.min(), xx.max())
ax.set_ylim(yy.min(), yy.max())
ax.set_xticks(())
ax.set_yticks(())
i += 1

# iterate over classifiers
for name, clf in zip(names, classifiers):
ax = plt.subplot(len(datasets), len(classifiers) + 1, i)
clf.fit(X_train, y_train)
score = clf.score(X_test, y_test)

# Plot the decision boundary.
Z = predict_proba(clf, np.c_[xx.ravel(), yy.ravel()])

# Put the result into a color plot
Z = Z.reshape(xx.shape)

ax.contourf(xx, yy, Z, cmap=plt.cm.RdBu, alpha=.8)

# Plot also the training points
ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=cm_bright)
# and testing points
ax.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=cm_bright,
alpha=0.6)

ax.set_xlim(xx.min(), xx.max())
ax.set_ylim(yy.min(), yy.max())
ax.set_xticks(())
ax.set_yticks(())
ax.set_title(name)
ax.text(xx.max() - .3, yy.min() + .3, ('%.2f' % score).lstrip('0'),
size=15, horizontalalignment='right')
i += 1

classifiers = []
names = []
for alpha in alpha_list:
clf = make_pipeline(RandomBasisFunction(weight_scale=1.), Ridge(alpha=alpha))
Member: Should random_state be fixed? Should RidgeCV be used? Why not RidgeClassifier[CV]?

Member: Sorry if this comes from ignorance, but should this not actually be using LogisticRegression?

Reply: @jnothman This kind of network (e.g. Extreme Learning Machines) usually uses linear/ridge regression so it can also take advantage of the closed-form solution. Ridge can be used for regression problems and RidgeClassifier for classification problems.

Member: @jnothman ridge was used in conjunction with an ad-hoc predict_proba at line 46 above.
From what I can see, LinearClassifierMixin still doesn't provide a generic predict_proba (and I'm not sure that it should). But RidgeClassifier maybe could.
I guess it's not an issue with this PR either way.

Member: Why is there a predict_proba anyhow? It should just be decision_function, right?
And then we can safely use ridge (if we want to allude to the ELM stuff). Not that ridge is actually faster.

Contributor Author: +1. Do you think this PR is ready for its final review? Sorry if I have overlooked some comments about the code.

Member: You didn't address the predict_proba comment.


classifiers.append(clf)
names.append("alpha = " + str(alpha))

title = "Effect of varying alpha for fixed weight_scale=1"
plot(names, classifiers, title)

classifiers = []
names = []
for weight_scale in weight_scale_list:
clf = make_pipeline(RandomBasisFunction(weight_scale=weight_scale), Ridge(alpha=1.))

classifiers.append(clf)
names.append("weight_scale = " + str(weight_scale))

title = "Effect of varying weight_scale for fixed alpha=1"
plot(names, classifiers, title)

classifiers = []
names = []
for n_outputs in n_outputs_list:
clf = make_pipeline(RandomBasisFunction(n_outputs=n_outputs), Ridge(alpha=1.))

classifiers.append(clf)
names.append("n_output = " + str(n_outputs))

title = "Effect of varying n_output in RandomBasisFunction"
plot(names, classifiers, title)

plt.show()
71 changes: 71 additions & 0 deletions examples/neural_networks/plot_random_nn_overfitting.py
@@ -0,0 +1,71 @@
"""
===========================================================================
Impact of increasing the number of hidden neurons in random neural networks
Member: I would remove "increasing". The title is already pretty long. Do you think it would make sense to merge this with the other example?

Contributor Author: Yeah, I think it's better to merge the two examples into one with the global title "Effect of parameters in RandomBasisFunction".

===========================================================================

This example illustrates how the random neural network behaves when the number
of hidden neurons is increased. A larger number of hidden neurons increases the
training score, but might reduce the testing score as a result of overfitting.

The example generates a plot showing how the training and testing scores change
with the number of hidden neurons on a small dataset.

"""
print(__doc__)


# Author: Issam H. Laradji
# License: BSD 3 clause

import numpy as np

from sklearn.neural_network import RandomBasisFunction
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.learning_curve import validation_curve

###############################################################################
# Generate sample data
n_samples_train, n_samples_test = 100, 50
n_features = 50

np.random.seed(0)

coef = np.random.randn(n_features)
X = np.random.randn(n_samples_train + n_samples_test, n_features)
y = np.dot(X, coef)

# Split train and test data
X_train, X_test = X[:n_samples_train], X[n_samples_train:]
y_train, y_test = y[:n_samples_train], y[n_samples_train:]

###############################################################################
# Compute train and test errors
n_hidden_list = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

rnn = make_pipeline(RandomBasisFunction(), Ridge(alpha=0))

train_scores, test_scores = validation_curve(rnn, X, y,
param_name="randombasisfunction__n_outputs",
param_range=n_hidden_list, scoring='r2')

train_scores_mean = np.mean(train_scores, axis=1)
test_scores_mean = np.mean(test_scores, axis=1)


###############################################################################
# Plot results functions

import pylab as pl

pl.plot(n_hidden_list, train_scores_mean, label='Train')
pl.plot(n_hidden_list, test_scores_mean, label='Test')

pl.legend(loc='lower left')
pl.title("Random neural network: training vs. testing scores")
pl.xlabel('Number of neurons in the hidden layer')
pl.ylabel('$R^2$ score')

pl.ylim([0.1, 1.01])

pl.show()