[MRG] RandomActivation #4703
@@ -0,0 +1,163 @@
""" | ||
=========================================== | ||
Effect of parameters in RandomBasisFunction | ||
=========================================== | ||
|
||
This example generates plots that illustrate the impact of varying the RandomBasisFunction parameters on the decision | ||
function of the random neural network model. | ||
|
||
This generates three plots, each corresponding to varying one single parameter. The plots correspond to varying the | ||
parameter alpha, weight_scale, and n_output, respectively. | ||
|
||
If there is high bias in the model, which can lead to a high training error, then decreasing alpha, | ||
increasing weight_scale, and/or increasing n_output decreases bias and therefore reduces underfitting. | ||
Similarly, if there is high variance in the model, which is when the training error poorly approximates the testing | ||
error, then increasing alpha, decreasing weight_scale, and/or decreasing n_output would decrease variance and therefore | ||
reduces overfitting. | ||
|
||
One way to find a balance between bias and variance when tuning these parameters is by | ||
testing a range of values using cross-validation as seen in this example. | ||
|
||
""" | ||
print(__doc__)


# Author: Issam H. Laradji
# License: BSD 3 clause

import numpy as np

from matplotlib import pyplot as plt
from matplotlib.colors import ListedColormap

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_moons, make_circles, make_classification
from sklearn.neural_network import RandomBasisFunction
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.utils.fixes import expit as logistic_sigmoid


# To be removed (no predict_proba in Ridge)
def predict_proba(clf, x):
    return logistic_sigmoid(clf.predict(x))

h = .02  # step size in the mesh
rng = np.random.RandomState(1)

alpha_list = np.logspace(-4, 4, 5)
weight_scale_list = np.logspace(-2, 2, 5)
n_outputs_list = [2, 10, 100, 200, 500]
def plot(names, classifiers, title):
    X, y = make_classification(n_features=2, n_redundant=0, n_informative=2,
                               random_state=rng, n_clusters_per_class=1)

    linearly_separable = (X, y)

    datasets = [make_moons(noise=1., random_state=rng),
                make_circles(noise=0.2, factor=0.5, random_state=rng),
                linearly_separable]

    figure = plt.figure(figsize=(17, 9))
    figure.suptitle(title)
    i = 1
    # iterate over datasets
    for X, y in datasets:
        # initialize standard scaler
        scaler = StandardScaler()

        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.4,
                                                            random_state=1)
        # Compute the mean and standard deviation of each feature of the
        # training set and scale the training set
        X_train = scaler.fit_transform(X_train)

        # Using the same mean and standard deviation, scale the testing set
        X_test = scaler.transform(X_test)

        x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
        y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
        xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                             np.arange(y_min, y_max, h))

        # just plot the dataset first
        cm_bright = ListedColormap(['#FF0000', '#0000FF'])
        ax = plt.subplot(len(datasets), len(classifiers) + 1, i)
        # Plot the training points
        ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=cm_bright)
        # and testing points
        ax.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=cm_bright,
                   alpha=0.6)
        ax.set_xlim(xx.min(), xx.max())
        ax.set_ylim(yy.min(), yy.max())
        ax.set_xticks(())
        ax.set_yticks(())
        i += 1

        # iterate over classifiers
        for name, clf in zip(names, classifiers):
            ax = plt.subplot(len(datasets), len(classifiers) + 1, i)
            clf.fit(X_train, y_train)
            score = clf.score(X_test, y_test)

            # Plot the decision boundary.
            Z = predict_proba(clf, np.c_[xx.ravel(), yy.ravel()])

            # Put the result into a color plot
            Z = Z.reshape(xx.shape)

            ax.contourf(xx, yy, Z, cmap=plt.cm.RdBu, alpha=.8)

            # Plot also the training points
            ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=cm_bright)
            # and testing points
            ax.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=cm_bright,
                       alpha=0.6)

            ax.set_xlim(xx.min(), xx.max())
            ax.set_ylim(yy.min(), yy.max())
            ax.set_xticks(())
            ax.set_yticks(())
            ax.set_title(name)
            ax.text(xx.max() - .3, yy.min() + .3, ('%.2f' % score).lstrip('0'),
                    size=15, horizontalalignment='right')
            i += 1
classifiers = []
names = []
for alpha in alpha_list:
    clf = make_pipeline(RandomBasisFunction(weight_scale=1.), Ridge(alpha=alpha))

    classifiers.append(clf)
    names.append("alpha = " + str(alpha))

title = "Effect of varying alpha for fixed weight_scale=1"
plot(names, classifiers, title)

classifiers = []
names = []
for weight_scale in weight_scale_list:
    clf = make_pipeline(RandomBasisFunction(weight_scale=weight_scale), Ridge(alpha=1.))

    classifiers.append(clf)
    names.append("weight_scale = " + str(weight_scale))

title = "Effect of varying weight_scale for fixed alpha=1"
plot(names, classifiers, title)

classifiers = []
names = []
for n_outputs in n_outputs_list:
    clf = make_pipeline(RandomBasisFunction(n_outputs=n_outputs), Ridge(alpha=1.))

    classifiers.append(clf)
    names.append("n_output = " + str(n_outputs))

title = "Effect of varying n_output in RandomBasisFunction"
plot(names, classifiers, title)

plt.show()
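RandomBasisFunction comes from this pull request and was never merged into a released scikit-learn, so the example above cannot be run against the published library. As a rough sketch of the idea it implements (an ELM-style hidden layer whose random weights are drawn once and never trained, followed by a linear readout), one can combine a hand-written transformer with Ridge. The class name `RandomLayer` and its parameters below are illustrative stand-ins, not the PR's actual API.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline


class RandomLayer(BaseEstimator, TransformerMixin):
    """Fixed random hidden layer with a tanh activation (illustrative only)."""

    def __init__(self, n_outputs=100, weight_scale=1.0, random_state=None):
        self.n_outputs = n_outputs
        self.weight_scale = weight_scale
        self.random_state = random_state

    def fit(self, X, y=None):
        rng = np.random.RandomState(self.random_state)
        # The projection weights are sampled once at fit time and never
        # updated; only the downstream linear model is trained.
        self.weights_ = rng.normal(scale=self.weight_scale,
                                   size=(X.shape[1], self.n_outputs))
        self.biases_ = rng.normal(scale=self.weight_scale,
                                  size=self.n_outputs)
        return self

    def transform(self, X):
        return np.tanh(np.dot(X, self.weights_) + self.biases_)


rng = np.random.RandomState(0)
X = rng.randn(200, 5)
y = X.sum(axis=1)  # a simple linear target

# Small weight_scale keeps tanh near its linear regime, so the random
# features can represent this linear target almost exactly.
model = make_pipeline(RandomLayer(n_outputs=100, weight_scale=0.1,
                                  random_state=0),
                      Ridge(alpha=1e-2))
model.fit(X, y)
print("training R^2:", round(model.score(X, y), 3))
```

Because only the readout is fit, training reduces to a single ridge solve over the hidden activations, which is the closed-form advantage the reviewers mention below.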
@@ -0,0 +1,71 @@
""" | ||
=========================================================================== | ||
Impact of increasing the number of hidden neurons in random neural networks | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would remove "increasing". The title is already pretty long. Do you think it would make sense to merge this with the other example? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yah, I think it's better to merge the two examples together into one example with a global title "Effect of parameters in RandomBasisFunction" |
||
=========================================================================== | ||
|
||
This illustrates how the random neural network behaves when increasing | ||
the number of hidden neurons. Larger number of hidden neurons increases | ||
training score, but might reduce the testing score as a result of overfitting. | ||
|
||
The example generates a plot showing the how training and testing scores change | ||
with the number of hidden neurons on a small dataset. | ||
|
||
""" | ||
print(__doc__)


# Author: Issam H. Laradji
# License: BSD 3 clause

import numpy as np

from sklearn.neural_network import RandomBasisFunction
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.learning_curve import validation_curve

###############################################################################
# Generate sample data
n_samples_train, n_samples_test = 100, 50
n_features = 50

np.random.seed(0)

coef = np.random.randn(n_features)
X = np.random.randn(n_samples_train + n_samples_test, n_features)
y = np.dot(X, coef)

# Split train and test data
X_train, X_test = X[:n_samples_train], X[n_samples_train:]
y_train, y_test = y[:n_samples_train], y[n_samples_train:]

###############################################################################
# Compute train and test errors
n_hidden_list = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

rnn = make_pipeline(RandomBasisFunction(), Ridge(alpha=0))

train_scores, test_scores = validation_curve(
    rnn, X, y, param_name="randombasisfunction__n_outputs",
    param_range=n_hidden_list, scoring='r2')

train_scores_mean = np.mean(train_scores, axis=1)
test_scores_mean = np.mean(test_scores, axis=1)
###############################################################################
# Plot results

import pylab as pl

pl.plot(n_hidden_list, train_scores_mean, label='Train')
pl.plot(n_hidden_list, test_scores_mean, label='Test')

pl.legend(loc='lower left')
pl.title("Random neural network training vs. testing scores")
pl.xlabel('number of neurons in the hidden layer')
pl.ylabel('$R^2$ score')

pl.ylim([0.1, 1.01])

pl.show()
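The string "randombasisfunction__n_outputs" above relies on the convention that `make_pipeline` names each step after its lowercased class name. Since RandomBasisFunction itself is not available in released scikit-learn, here is a minimal sketch of the same validation_curve pattern using only published estimators (PolynomialFeatures standing in as the feature-expansion step, and the modern `sklearn.model_selection` import path rather than the old `sklearn.learning_curve` one used in the diff):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import validation_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, size=(150, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.randn(150)

# make_pipeline names each step after its lowercased class name, so the
# Ridge step's alpha parameter is addressed as "ridge__alpha".
model = make_pipeline(PolynomialFeatures(degree=8), Ridge())

param_range = np.logspace(-6, 2, 5)
train_scores, test_scores = validation_curve(
    model, X, y, param_name="ridge__alpha", param_range=param_range,
    scoring="r2", cv=5)

# One row per parameter value, one column per cross-validation fold.
print(train_scores.shape, test_scores.shape)
```

Averaging each row over the folds (`np.mean(..., axis=1)`), as the example does, gives one train and one test curve to plot against the parameter range.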
Review comment: Should `random_state` be fixed? Should `RidgeCV` be used? Why not `RidgeClassifier[CV]`?

Review comment: Sorry if this comes from ignorance, but should this not actually be using `LogisticRegression`?

Review comment: @jnothman This kind of network (e.g. Extreme Learning Machines) usually uses linear/ridge regression so it can also take advantage of the closed-form solution. `Ridge` can be used for regression problems and `RidgeClassifier` for classification problems.

Review comment: @jnothman ridge was used in conjunction with an ad-hoc `predict_proba` at line 46 above. From what I can see, `LinearClassifierMixin` still doesn't provide a generic `predict_proba` (and I'm not sure that it should). But `RidgeClassifier` maybe could. I guess it's not an issue with this PR either way.

Review comment: Why is there a `predict_proba` anyhow? It should just be `decision_function`, right? And then we can safely use ridge (if we want to allude to the ELM stuff). Not that ridge is actually faster.

Review comment: +1. Do you think this PR is ready for its final review? Sorry if I have overlooked some comments about the code.

Review comment: You didn't address the `predict_proba` comment.
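The thread above is about whether to squash a ridge model's raw output through a sigmoid or just use `decision_function`. A small sketch of the difference, using only released scikit-learn estimators (`RidgeClassifier` and `LogisticRegression`; the data here is synthetic and only for illustration):

```python
import numpy as np
from scipy.special import expit
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, RidgeClassifier

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

ridge = RidgeClassifier().fit(X, y)
logreg = LogisticRegression().fit(X, y)

# RidgeClassifier exposes only a signed distance to the hyperplane ...
scores = ridge.decision_function(X)

# ... which the example's ad-hoc helper squashed into (0, 1) with a sigmoid.
pseudo_proba = expit(scores)

# LogisticRegression, by contrast, provides predict_proba directly.
proba = logreg.predict_proba(X)[:, 1]

# The sigmoid is strictly increasing through expit(0) == 0.5, so the
# thresholded labels agree whether one uses the raw scores or the
# squashed ones; for drawing a decision boundary, either works.
assert np.all((pseudo_proba > 0.5) == (scores > 0))
```

This is why the reviewer suggests `decision_function` is enough for the plots: the sigmoid only rescales the contour values without moving the boundary.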