PG-NMF performance is terrible #2537

Closed
larsmans opened this issue Oct 20, 2013 · 11 comments

@larsmans
Member

There's a benchmark script in benchmarks/bench_plot_nmf.py that compares our implementation of Lin's projected gradient algorithm against Lee & Seung's old gradient descent algorithm, which the script implements in about 20 lines of code. It turns out that the PG algorithm is completely incapable of beating that baseline, regardless of initialization and despite all the recent optimizations to it. In fact, the baseline is typically faster by a significant margin.
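
For reference, that baseline is the classic Lee & Seung multiplicative update scheme. A minimal sketch in the same spirit (not the benchmark's exact code; the function name, defaults, and the optional W0/H0 arguments are mine):

```python
import numpy as np

def nmf_multiplicative(X, n_components, W0=None, H0=None,
                       max_iter=200, eps=1e-9, random_state=0):
    """Lee & Seung-style multiplicative updates minimizing ||X - WH||_F."""
    rng = np.random.RandomState(random_state)
    n_samples, n_features = X.shape
    W = rng.rand(n_samples, n_components) if W0 is None else W0.astype(float)
    H = rng.rand(n_components, n_features) if H0 is None else H0.astype(float)
    for _ in range(max_iter):
        # H <- H * (W'X) / (W'W H); eps guards against division by zero
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        # W <- W * (XH') / (W HH')
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
    return W, H
```

Used as `W, H = nmf_multiplicative(X, 10)`; the reconstruction to compare against X is simply `W @ H`.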

One problem with ProjectedGradientNMF is that its default tolerance is much too small. When I loosen the tolerance in the benchmark, PG-NMF comes closer to the baseline and can beat it some of the time, but often enough the baseline is still faster than the fastest PG-NMF run.
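
A quick way to check the effect of the tolerance, assuming the ProjectedGradientNMF API of that era (the class and its reconstruction_err_ attribute have since been removed from scikit-learn; the tolerance values and data below are arbitrary):

```python
import time
import numpy as np
from sklearn.decomposition import ProjectedGradientNMF  # removed in later releases

X = np.abs(np.random.RandomState(0).randn(200, 100))

for tol in (1e-4, 1e-2):  # strict vs. loosened; the actual default may differ
    model = ProjectedGradientNMF(n_components=10, init='nndsvd', tol=tol,
                                 random_state=0)
    t0 = time.time()
    W = model.fit_transform(X)
    # reconstruction_err_ is the attribute name as I recall it from that release
    print("tol=%g: %.2fs, reconstruction error %.4f"
          % (tol, time.time() - t0, model.reconstruction_err_))
```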

(I'd have included the plot here, but it seems to be broken and doesn't display the Lee/Seung algorithm's timings.)

I've already tried rewriting the PG algorithm in Cython, leaving only the dot products to Python. That shaves about a quarter off the running time of fit_transform, which is not enough to make it really fast.

larsmans added a commit that referenced this issue Oct 20, 2013
Tolerance lowered to make it run in reasonable time; see #2537.
@larsmans
Member Author

In fact, on the topics extraction example the "naive" algorithm is competitive with the PG one as well, despite not being optimized. The Cython parts are needed to make PG run significantly faster.
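
For anyone who wants to reproduce that, here is a sketch roughly in the spirit of the topics extraction example, with the nmf_multiplicative sketch above standing in for the "naive" algorithm (the document subset, vocabulary size, and number of components are arbitrary choices of mine):

```python
import time
import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer

# A small TF-IDF matrix, roughly in the spirit of the topics extraction example.
docs = fetch_20newsgroups(remove=('headers', 'footers', 'quotes')).data[:1000]
X = TfidfVectorizer(max_features=1000, stop_words='english').fit_transform(docs)

t0 = time.time()
W, H = nmf_multiplicative(X.toarray(), n_components=10)  # sketch from above
print("naive multiplicative NMF: %.2fs, Frobenius loss %.3f"
      % (time.time() - t0, np.linalg.norm(X.toarray() - W @ H)))
```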

@ogrisel
Member

ogrisel commented Oct 20, 2013

Weird. I remember the gradient-based method used to be slower back when we decided not to include it. @vene do you agree? If so, we could try to rerun that benchmark on old versions of scikit-learn. Maybe we introduced a performance regression somewhere.

@larsmans
Member Author

The fact that the PG algorithm has a default outer iteration maximum of 200 and an inner NLS iteration maximum of 2000 might be part of the problem.

@larsmans
Member Author

PR with the Lee/Seung algo as an estimator coming up. Does SVD initialization make sense for that algo? It doesn't seem to speed it up.
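
One cheap way to probe that question with the multiplicative sketch above: start it from a truncated SVD with negative entries clipped away. This is only a crude stand-in for the NNDSVD scheme, and the sizes below are arbitrary:

```python
import numpy as np
from scipy.sparse.linalg import svds

def clipped_svd_init(X, n_components):
    # Truncated SVD with negative entries clipped to a small positive value,
    # so the multiplicative updates don't lock onto exact zeros.
    # This is only a crude stand-in for NNDSVD.
    U, S, Vt = svds(X, k=n_components)
    return np.clip(U * S, 1e-6, None), np.clip(Vt, 1e-6, None)

X = np.abs(np.random.RandomState(0).randn(300, 200))
W0, H0 = clipped_svd_init(X, 10)
W, H = nmf_multiplicative(X, 10, W0=W0, H0=H0)
print("loss, SVD-ish init: %.4f" % np.linalg.norm(X - W @ H))
W, H = nmf_multiplicative(X, 10)
print("loss, random init:  %.4f" % np.linalg.norm(X - W @ H))
```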

@larsmans
Member Author

#2540

@larsmans
Member Author

This is funny. I'm running the benchmark against the code in #2540 on an Intel CPU (previous tests were on an AMD one), and while GD NMF with random init is still much faster than PG-NMF with NNDSVD (w/ similar losses), PG with random init wins big time (w/ a higher loss, though). Here's the full output:

50 samples, 50 features
=======================
benchmarking nndsvd-nmf: 
/home/larsb/src/scikit-learn/sklearn/decomposition/nmf.py:678: UserWarning: Iteration limit reached during fit
  warnings.warn("Iteration limit reached during fit")
Frobenius loss: 1.26653
Took: 1.17s

benchmarking nndsvda-nmf: 
Frobenius loss: 1.30022
Took: 0.74s

benchmarking nndsvdar-nmf: 
Frobenius loss: 1.26733
Took: 1.43s

benchmarking random-nmf
Frobenius loss: 1.97622
Took: 0.01s

benchmarking alt-random-nmf
Frobenius loss: 1.36531
Took: 0.06s

50 samples, 275 features
=======================
benchmarking nndsvd-nmf: 
Frobenius loss: 1.87179
Took: 2.83s

benchmarking nndsvda-nmf: 
Frobenius loss: 1.87149
Took: 0.86s

benchmarking nndsvdar-nmf: 
Frobenius loss: 1.87186
Took: 2.85s

benchmarking random-nmf
Frobenius loss: 2.83175
Took: 0.02s

benchmarking alt-random-nmf
Frobenius loss: 1.92951
Took: 0.16s

50 samples, 500 features
=======================
benchmarking nndsvd-nmf: 
Frobenius loss: 1.96151
Took: 7.74s

benchmarking nndsvda-nmf: 
Frobenius loss: 1.95967
Took: 3.20s

benchmarking nndsvdar-nmf: 
Frobenius loss: 1.96127
Took: 8.30s

benchmarking random-nmf
Frobenius loss: 3.42233
Took: 0.03s

benchmarking alt-random-nmf
Frobenius loss: 2.00164
Took: 0.66s

275 samples, 50 features
=======================
benchmarking nndsvd-nmf: 
Frobenius loss: 1.86894
Took: 3.08s

benchmarking nndsvda-nmf: 
Frobenius loss: 1.86501
Took: 2.17s

benchmarking nndsvdar-nmf: 
Frobenius loss: 1.86778
Took: 3.05s

benchmarking random-nmf
Frobenius loss: 2.63858
Took: 0.02s

benchmarking alt-random-nmf
Frobenius loss: 1.92583
Took: 0.17s

275 samples, 275 features
=======================
benchmarking nndsvd-nmf: 
Frobenius loss: 3.21209
Took: 6.39s

benchmarking nndsvda-nmf: 
Frobenius loss: 3.21114
Took: 6.68s

benchmarking nndsvdar-nmf: 
Frobenius loss: 3.21206
Took: 6.01s

benchmarking random-nmf
Frobenius loss: 3.73968
Took: 0.07s

benchmarking alt-random-nmf
Frobenius loss: 3.25365
Took: 1.34s

275 samples, 500 features
=======================
benchmarking nndsvd-nmf: 
Frobenius loss: 3.33038
Took: 8.78s

benchmarking nndsvda-nmf: 
Frobenius loss: 3.32915
Took: 14.50s

benchmarking nndsvdar-nmf: 
Frobenius loss: 3.33014
Took: 9.39s

benchmarking random-nmf
Frobenius loss: 3.82807
Took: 0.08s

benchmarking alt-random-nmf
Frobenius loss: 3.37307
Took: 2.03s

500 samples, 50 features
=======================
benchmarking nndsvd-nmf: 
Frobenius loss: 1.96108
Took: 7.17s

benchmarking nndsvda-nmf: 
Frobenius loss: 1.95591
Took: 6.38s

benchmarking nndsvdar-nmf: 
Frobenius loss: 1.96156
Took: 6.61s

benchmarking random-nmf
Frobenius loss: 2.69132
Took: 0.08s

benchmarking alt-random-nmf
Frobenius loss: 2.03582
Took: 0.74s

500 samples, 275 features
=======================
benchmarking nndsvd-nmf: 
Frobenius loss: 3.33577
Took: 10.62s

benchmarking nndsvda-nmf: 
Frobenius loss: 3.33561
Took: 13.94s

benchmarking nndsvdar-nmf: 
Frobenius loss: 3.33573
Took: 10.92s

benchmarking random-nmf
Frobenius loss: 3.78364
Took: 0.13s

benchmarking alt-random-nmf
Frobenius loss: 3.37690
Took: 1.86s

500 samples, 500 features
=======================
benchmarking nndsvd-nmf: 
Frobenius loss: 3.53813
Took: 21.15s

benchmarking nndsvda-nmf: 
Frobenius loss: 3.53700
Took: 28.33s

benchmarking nndsvdar-nmf: 
Frobenius loss: 3.53766
Took: 22.97s

benchmarking random-nmf
Frobenius loss: 3.98320
Took: 0.07s

benchmarking alt-random-nmf
Frobenius loss: 3.57509
Took: 2.22s
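
For context when reading the losses above: I haven't checked how exactly bench_plot_nmf.py scales the number it reports, but a plain Frobenius reconstruction error would be:

```python
import numpy as np

def frobenius_loss(X, W, H):
    # Plain Frobenius norm of the residual; the benchmark may scale it differently.
    return np.sqrt(np.sum((X - W @ H) ** 2))
```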

@agramfort
Member

FYI, a colleague working with audio once told me that our SVD init for NMF was a bad idea... Can we try it on the faces and visually check the result?
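
A sketch of that visual check on the Olivetti faces, using today's sklearn.decomposition.NMF for illustration rather than the ProjectedGradientNMF of the time; the number of components and the figure layout are arbitrary:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_olivetti_faces
from sklearn.decomposition import NMF

faces = fetch_olivetti_faces(shuffle=True, random_state=0).data  # (400, 4096)

fig, axes = plt.subplots(2, 6, figsize=(10, 4))
for ax_row, init in zip(axes, ('nndsvd', 'random')):
    model = NMF(n_components=6, init=init, max_iter=200, random_state=0).fit(faces)
    for ax, comp in zip(ax_row, model.components_):
        ax.imshow(comp.reshape(64, 64), cmap=plt.cm.gray)
        ax.set_axis_off()
    ax_row[0].set_title("init=%s" % init, loc='left')
plt.show()
```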

@larsmans
Member Author

Yes we can, please do ;)

@agramfort
Member

> Yes we can, please do ;)

I tried that before too :)

@vene
Member

vene commented Oct 28, 2013

Similar results on a similar (more complete) benchmark on "real" data:

- Results on 20 newsgroups
- Results on faces

It looks like PG with random init messes up the convergence criterion. Ignoring regularization, multiplicative NMF seems to be a strong winner in this scenario.

@ogrisel
Member

ogrisel commented Oct 28, 2013

Thanks for the benchmarks @vene. On the dense face data, PG with a non-random init still seems interesting: even though it is much slower, it reaches better reconstructions on this kind of data. But maybe the convergence criterion is too lax for the multiplicative updates method.

On the text / sparse data on the other hand, multiplicative updates with random init is a clear winner.
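
If the multiplicative method's stopping rule really is too lax, one option is to stop on the relative decrease of the Frobenius loss instead of a fixed iteration budget. A sketch along those lines, building on the function above (the check interval and rtol are arbitrary):

```python
import numpy as np

def nmf_multiplicative_tol(X, n_components, rtol=1e-4, max_iter=1000,
                           eps=1e-9, random_state=0):
    # Same updates as the sketch above, but stop when the relative decrease
    # of the Frobenius loss between checks drops below rtol.
    rng = np.random.RandomState(random_state)
    W = rng.rand(X.shape[0], n_components)
    H = rng.rand(n_components, X.shape[1])
    prev = np.linalg.norm(X - W @ H)
    for it in range(max_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
        if it % 10 == 9:  # check the loss every 10 iterations
            loss = np.linalg.norm(X - W @ H)
            if (prev - loss) / max(prev, eps) < rtol:
                break
            prev = loss
    return W, H
```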
