[MRG] GSoC 2014: Standard Extreme Learning Machines #3306


IssamLaradji
Contributor

Finished implementing the standard extreme learning machines (ELMs). I am getting the following results with 550 hidden neurons on the digits dataset:

Training accuracy using the logistic activation function: 0.999444
Training accuracy using the tanh activation function: 1.000000

Fortunately, this algorithm is much easier to implement and debug than the multi-layer perceptron :).
I will push a test file soon.

@ogrisel, @larsmans
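
For reference, here is a minimal ELM-style sketch (plain NumPy with a Ridge readout, not the PR's ELMClassifier API) showing how this kind of training-accuracy measurement can be set up on the digits dataset with 550 random tanh hidden units:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import Ridge
from sklearn.preprocessing import LabelBinarizer

X, y = load_digits(return_X_y=True)
X = X / 16.0                              # scale pixels so tanh does not saturate
rng = np.random.RandomState(0)

n_hidden = 550
W = rng.uniform(-1, 1, size=(X.shape[1], n_hidden))   # random input weights
b = rng.uniform(-1, 1, size=n_hidden)                  # random biases
H = np.tanh(X @ W + b)                                 # hidden-layer activations

Y = LabelBinarizer().fit_transform(y)                  # one-hot targets
readout = Ridge(alpha=1e-3, fit_intercept=False).fit(H, Y)
y_pred = readout.predict(H).argmax(axis=1)
print("Training accuracy: %f" % np.mean(y_pred == y))
```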

@coveralls

Coverage Status

Coverage increased (+0.0%) when pulling e5e363d on IssamLaradji:Extreme-Learning-Machines into 68b0a28 on scikit-learn:master.

Training data, where n_samples is the number of samples
and n_features is the number of features.

y : numpy array of shape (n_samples)
Member

y should be an "array-like" and be validated as such.

Contributor Author

Thanks for bringing this up. I made the changes in multi-layer perceptron as well.

@IssamLaradji
Contributor Author

Hi, I am wondering what extreme learning machines should display in verbose mode. Any ideas?

Thanks

@IssamLaradji
Contributor Author

Travis is acting strange: it raises an error for test_multilabel_classification(), although on my local machine the test_multilabel_classification() method in test_elm runs correctly with 1000 different seeds. The pull request also passed locally after running make test on the whole library.

Is there a chance that Travis tests against library versions different from (or modified relative to) my local setup?

@arjoly
Member

arjoly commented Jun 30, 2014

It might be worth having a look at https://github.com/dclambert/Python-ELM.

@larsmans
Member

Training squared error loss would seem appropriate for verbose output. Not every estimator has verbose output, though (naive Bayes doesn't because it runs instantly on typical problem sizes).

@coveralls

Coverage Status

Coverage increased (+0.07%) when pulling 2be2941 on IssamLaradji:Extreme-Learning-Machines into 68b0a28 on scikit-learn:master.

@IssamLaradji
Contributor Author

Thanks, displaying the training error in verbose mode is a useful idea.

@ogrisel
Member

ogrisel commented Jul 1, 2014

However, Travis raises an error for test_multilabel_classification(). Is there a chance that Travis tests against library versions different from my local setup?

The versions of numpy / scipy used by the various Travis workers are given in the environment variables of each build. You can see the exact setup in:

@IssamLaradji
Contributor Author

@ogrisel thanks, I will dig deeper to see where multi-label classification is being affected.

@IssamLaradji
Contributor Author

Hi guys, I implemented weighted and regularized ELMs - here are their awesome results on an imbalanced dataset. :) :)

Non-Regularized ELMs (Large C): [plot: non_regularized_elm]

Regularized ELMs (Small C): [plot: regularized_elm]


# compute regularized output coefficients using eq. 3 in reference [1]
left_part = pinv2(
    safe_sparse_dot(H.T, H_tmp) + identity(self.n_hidden) / self.C)
Member

you should use the ridge implementation here.

Contributor Author

Hi @agramfort, isn't this technically ridge regression? I am minimizing the L2 norm of the coefficients in the objective function - as in the equation below. Or do you mean I should use the scikit-learn implementation of ridge? Thanks.

[image: weighted regularized ELM objective (l_elm)]

Member

this does not look like ridge, but you seem to compute

(H'H + (1/C) Id)^{-1} H'

and this is really a ridge solution, where H plays the role of X, y is y, and C = 1/alpha
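
A minimal sketch of this equivalence, assuming random tanh hidden units: the closed-form solution (H'H + (1/C) Id)^{-1} H'y matches scikit-learn's Ridge with alpha = 1/C fitted on the activations H, so the pseudo-inverse code above could reuse the existing solver.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X, y = rng.randn(200, 20), rng.randn(200)

n_hidden, C = 50, 10.0
W = rng.randn(20, n_hidden)               # random input-to-hidden weights
b = rng.randn(n_hidden)                   # random biases
H = np.tanh(X @ W + b)                    # hidden-layer activations

# closed form: beta = (H'H + (1/C) I)^{-1} H'y
beta_closed = np.linalg.solve(H.T @ H + np.eye(n_hidden) / C, H.T @ y)

# same solution via Ridge with alpha = 1/C and no intercept
beta_ridge = Ridge(alpha=1.0 / C, fit_intercept=False).fit(H, y).coef_
assert np.allclose(beta_closed, beta_ridge, atol=1e-6)
```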

Contributor Author

Sorry, the equation I gave is for weighted ELMs, as it contains the weight term W, which is not part of ridge. However, the implementation contains both versions - with W and without W.
The version without W computes the formula you mentioned, (H'H + (1/C) Id)^{-1} H'y.
Thanks.

Member

without W, then it is ridge

@IssamLaradji
Contributor Author

Pushed a lot of improvements.

  1. Added sequential ELM support - with partial_fit
  2. Added relevant tests for sequential ELM and weighted ELM

Created two examples.

  1. Weighted ELM plot
    [plot: plot_weighted]

  2. Training vs. testing with respect to hidden neurons
    [plot: plot_testing_training]

I will leave the documentation until the end - after I implement the remaining part (kernel support) and after the code has been reviewed. Thanks.

plot_decision_function(
    clf_weightless, axes[0], 'ELM(class_weight=None, C=10e5)')
plot_decision_function(
    clf_weight_auto, axes[1], 'ELM(class_weight=\'auto\', C=10e5)')
Member

rather than using ' use " to define the string: 'ELM(class_weight="auto", C=10e5)'

@IssamLaradji
Contributor Author

@agramfort thanks for your comments. I pushed the updated code.

@IssamLaradji
Contributor Author

Updates:

  1. ELM now uses ridge regression as an off-the-shelf solver to compute its solutions.
  2. Added support for kernels - linear, poly, rbf, sigmoid.
    Is there a way we could reuse the fast, efficient SVM kernel methods?
    Thanks.

@larsmans
Member

There are kernels in sklearn.metrics. The ones in sklearn.svm are buried deep down in the C++ code for LibSVM.
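
As a hedged sketch of how those kernels could be reused for the kernel variant (data and parameters here are purely illustrative): with a kernel matrix K in place of HH', the output coefficients are alpha = (K + (1/C) I)^{-1} y and predictions are K(X_new, X_train) @ alpha.

```python
import numpy as np
from sklearn.metrics.pairwise import pairwise_kernels

rng = np.random.RandomState(0)
X_train, y_train = rng.randn(100, 5), rng.randn(100)
X_test = rng.randn(10, 5)

C = 10.0
K = pairwise_kernels(X_train, metric='rbf', gamma=0.1)        # reuse sklearn kernels
alpha = np.linalg.solve(K + np.eye(len(K)) / C, y_train)      # (K + (1/C) I)^{-1} y
y_pred = pairwise_kernels(X_test, X_train, metric='rbf', gamma=0.1) @ alpha
```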

@amueller
Member

amueller commented May 7, 2015

It would be good to have an empirical case where partial_fit actually helps. The incremental fitting is what really makes this code non-trivial. If it is helpful, I'd say merge this with renaming / reassignment of credit, and later refactor it into Ridge.

If not, maybe just add a transformer and an example?

@mblondel
Member

mblondel commented May 7, 2015

For a stateless transformer, I presume the fit is mostly needed for input checking and setting the rng? The rng could be set in the first call to transform, although this might break your common tests. In any case, you can just call fit once outside of the for loop.

So, indeed, the incremental fitting is useful in the n_features < n_hidden < n_samples regime. But this is the usual out-of-core learning setting: your features are too big, so you need to build them on the fly and call partial_fit on small batches. @agramfort had an example using polynomial features in his PyData talk :)

If we really want to go the estimator way (rather than the transformer way), there is actually a more elegant and concise way to solve the problem using conjugate gradient with a LinearOperator. This technique can be used to solve the system of linear equations without ever materializing the transformed features of size n_samples x n_hidden. This is because conjugate gradient only needs to compute products between the n_hidden x n_hidden matrix and a vector. This should be like 10 lines of code. See https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/linear_model/ridge.py#L63 for an example of how this works.
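
A hedged sketch of that idea (function and variable names are illustrative, not the PR's code): solve (H'H + alpha I) beta = H'y with SciPy's conjugate gradient and a LinearOperator whose matvec streams over the data in batches, so the n_samples x n_hidden matrix H is never materialized.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def elm_cg_fit(X, y, W, b, alpha=1.0, batch_size=1000):
    n_samples = X.shape[0]
    n_hidden = W.shape[1]

    def hidden(X_batch):
        return np.tanh(X_batch @ W + b)            # activations for one batch

    def matvec(v):
        out = alpha * v                            # regularization term
        for start in range(0, n_samples, batch_size):
            H_batch = hidden(X[start:start + batch_size])
            out = out + H_batch.T @ (H_batch @ v)  # accumulate H'H v batch-wise
        return out

    rhs = np.zeros(n_hidden)                       # right-hand side H'y
    for start in range(0, n_samples, batch_size):
        H_batch = hidden(X[start:start + batch_size])
        rhs += H_batch.T @ y[start:start + batch_size]

    A = LinearOperator((n_hidden, n_hidden), matvec=matvec)
    beta, info = cg(A, rhs)
    return beta
```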

@amueller
Member

amueller commented May 7, 2015

Yeah, stateless transformers and the common tests don't work well together; they are currently manually excluded, and that is something we / I need to fix.
The problem with setting the random weights on the first transform is that this would break if people want to use a transformer object on two different datasets. n_features is usually inferred in fit, and if you use a different one in transform that is an error. I guess you could set it in transform unless it was set in fit, and if you explicitly want to use it on another dataset, you have to call fit. That is slightly magic, though.

@amueller
Member

amueller commented May 7, 2015

For better ways to solve the problem: well it could be that n_features is small enough that you could fit n_samples x n_features into ram, but not n_samples x n_hidden. Not sure what typical n_features and n_hidden are.

You are the expert in solving linear problems, I am certainly not, so if there are smarter ways to solve this, then we should go for them.
I didn't mentor this GSoC, I just heard multiple times "this just needs a final review".

@mblondel
Member

mblondel commented May 8, 2015

For better ways to solve the problem: well it could be that n_features is small enough that you could fit n_samples x n_features into ram, but not n_samples x n_hidden. Not sure what typical n_features and n_hidden are.

Sorry, when I was talking about generating features on the fly, I was referring to the features generated by the random projection + activation transformer. This is the same setting with polynomial features as well: your original features fit in memory but not the combination features obtained by PolynomialFeatures. But the principle is the same even if you start from your raw data, as long as the transformer used is stateless (e.g., FeatureHasher). In all cases, we loop over small batches of data, transform them and call partial_fit.
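
A minimal sketch of that pattern, with a toy batch generator standing in for the real data source: a stateless transformer (FeatureHasher here) builds the features for each small batch on the fly and the estimator is updated with partial_fit, so the full feature matrix is never built.

```python
import numpy as np
from sklearn.feature_extraction import FeatureHasher
from sklearn.linear_model import SGDClassifier

def iter_batches(n_batches=5, batch_size=100, seed=0):
    # toy out-of-core data source: yields (raw_samples, labels) chunks
    rng = np.random.RandomState(seed)
    words = ['spam', 'ham', 'eggs', 'foo', 'bar']
    for _ in range(n_batches):
        raw = [[str(w) for w in rng.choice(words, size=10)]
               for _ in range(batch_size)]
        y = rng.randint(0, 2, size=batch_size)
        yield raw, y

hasher = FeatureHasher(input_type='string')    # stateless: no fit needed
clf = SGDClassifier()

for raw_batch, y_batch in iter_batches():
    X_batch = hasher.transform(raw_batch)      # transform one batch at a time
    clf.partial_fit(X_batch, y_batch, classes=[0, 1])
```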

@GaelVaroquaux
Member

For partial fit pipelines, I would only support stateless transformers for the moment.

+1: let's tackle the simple things first.

@mblondel
Member

mblondel commented May 8, 2015 via email

@amueller
Member

amueller commented May 8, 2015

Did I say I have a plan? ;)
Three possible ways?

  1. detect (I have no idea how)
  2. annotate as stateless
  3. via the API: stateless transformers don't need to call fit. Then we don't need to call fit in partial_fit. This would also allow users to provide a stateful transformer fit on a subset of the data.

@amueller
Member

amueller commented May 8, 2015

Can we decide what to do with this PR first?
@IssamLaradji put a lot of work into it, and it has been sitting around for way too long. If we feel that the algorithm isn't suited for a classifier / regressor class, we should see what we can salvage and add transformers / examples etc.

@mblondel
Member

mblondel commented May 8, 2015

+1 for a transformer on my side. Instead of using a pipeline of two transformers as I initially suggested, we can maybe create just one transformer that does the random projection and applies the activation function. This should be fairly straightforward to implement. For the examples, showing how to do grid search with a pipeline would be nice. For the name of the transformer, maybe RandomActivationTransformer?
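
A hedged sketch of such a transformer (the name and parameters just follow the suggestions in this thread, not a final API): fix a random projection in fit, then apply it plus a nonlinearity in transform, and pipeline the result into a ridge readout.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.utils import check_random_state
from sklearn.utils.validation import check_array

class RandomActivationTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, n_hidden=100, weight_scale=1.0, random_state=None):
        self.n_hidden = n_hidden
        self.weight_scale = weight_scale
        self.random_state = random_state

    def fit(self, X, y=None):
        X = check_array(X)
        rng = check_random_state(self.random_state)
        # fixed random projection, drawn once in fit
        self.coef_ = rng.uniform(-self.weight_scale, self.weight_scale,
                                 size=(X.shape[1], self.n_hidden))
        self.intercept_ = rng.uniform(-self.weight_scale, self.weight_scale,
                                      size=self.n_hidden)
        return self

    def transform(self, X):
        X = check_array(X)
        return np.tanh(X @ self.coef_ + self.intercept_)

# usage as discussed: random features followed by a ridge readout
model = make_pipeline(RandomActivationTransformer(n_hidden=200, random_state=0),
                      Ridge(alpha=1e-2))
```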

@amueller
Member

amueller commented May 8, 2015

Do you think the iterative ridge regression here has value, or are there better ways to partial_fit ridge regression?

@mblondel
Member

mblondel commented May 8, 2015

The idea of accumulating the n_hidden x n_hidden matrix is nice, but this won't scale if n_hidden is large. If we turn this algorithm into a general partial_fit for ridge, it will crash when people try it on high-dimensional data like bags of words. We can add it and recommend not to use it when n_features is large. This would still be useful in settings where n_samples is huge but n_features is reasonably small. For large n_features, I guess one should use SGD's partial_fit.
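
A hedged sketch (assumed names) of the accumulation scheme under discussion: partial_fit keeps the running H'H and H'y statistics and re-solves the ridge system after each batch, so past batches never need to be stored; memory stays at O(n_hidden ** 2) regardless of n_samples, which is also why it does not scale to very large n_hidden.

```python
import numpy as np

class IncrementalRidgeSolver:
    def __init__(self, n_hidden, C=1.0):
        self.HtH = np.zeros((n_hidden, n_hidden))   # running H'H
        self.Hty = np.zeros(n_hidden)               # running H'y
        self.C = C
        self.n_hidden = n_hidden

    def partial_fit(self, H_batch, y_batch):
        self.HtH += H_batch.T @ H_batch
        self.Hty += H_batch.T @ y_batch
        # (H'H + (1/C) I) beta = H'y with the updated sufficient statistics
        self.beta_ = np.linalg.solve(
            self.HtH + np.eye(self.n_hidden) / self.C, self.Hty)
        return self
```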

@amueller
Member

amueller commented May 8, 2015

OK. So let's do the transformer? @IssamLaradji do you want to do that?
Or do you think you don't have time?

I'm not sure about RandomActivationTransformer. Maybe NonlinearProjection, though projection kind of implies mapping to a lower-dimensional space. NonlinearRandomFeatures? RandomFeatures?

@vene
Member

vene commented May 8, 2015

I don't like RandomFeatures, it's way too generic. From the name I'd expect it to simply ignore X and return random features. Out of all the names here, it seems to me like RandomActivation is the most specific (it best conveys what the object does). (I'd remove the Transformer suffix).

@jnothman
Member

jnothman commented May 9, 2015

aside @amueller re stateless transformers: for this purpose, transformers that depend only on the type or number of columns of the input should also be acceptable, just to make things tricky!


@IssamLaradji
Contributor Author

@amueller yeah, sure! I can do the transformer.

So I will open a new pull request for this.

Should the file containing the algorithm be under the scikit-learn main directory?
I mean, would it be something like from sklearn import RandomActivation?

Would the parameters be something like:

  • weight_scale, which sets the range of values for the uniform random sampling.
  • activation_function, which could be identity, relu, logistic and so on.

PS: I think ridge regression also has feature-wise batch support, which scales with n_features rather than n_samples.

@mblondel
Member

mblondel commented May 9, 2015

One possible place would be the preprocessing module.

@mblondel
Member

mblondel commented May 9, 2015

Actually how about putting it in the neural_network module?

@IssamLaradji
Contributor Author

sounds good.

@IssamLaradji
Contributor Author

Right, it's only used by neural network algorithms as far as I know, so having it in the neural_network module is better imo.

@IssamLaradji
Contributor Author

#4703 is a rough implementation of the RandomActivation algorithm.

@jnothman jnothman mentioned this pull request May 11, 2015
@amueller
Member

How about "RandomBasisFunction" as a name?

@ekerazha

The decoupled approach is the same as the one in https://github.com/dclambert/Python-ELM:
it had a "random_layer" that you could also pipeline before a Ridge regression.

Moreover, it also included a MELM-GRBF implementation.

@mblondel
Member

mblondel commented May 13, 2015 via email

@ProfFan

ProfFan commented Jan 8, 2016

Just FYI:
Recently Anton Akusok et al. have implemented ELM in Python with MAGMA-based acceleration, under the name hpelm (PyPI: https://pypi.python.org/pypi/hpelm).

@agramfort
Member

I don't think this will ever get merged. Closing. Feel free to reopen if you disagree.

@agramfort agramfort closed this Feb 25, 2019