PG-NMF performance is terrible #2537
Tolerance lowered to make it run in reasonable time; see #2537.
In fact, on the topic extraction example the "naive" algorithm is competitive with the PG one as well, despite not being optimized. The Cython parts are what is needed to make PG run significantly faster.
Weird. I remember the gradient-based method used to be slower back when we decided not to include it. @vene, do you agree? If so, we could try to rerun that benchmark against old versions of scikit-learn. Maybe we introduced a performance regression somewhere.
The fact that the PG algorithm has a default iteration maximum of 200 might be part of the problem.
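(A minimal sketch of what overriding those limits could look like on `ProjectedGradientNMF`; the quoted defaults are what I recall for the scikit-learn versions discussed in this thread, and the override values below are purely illustrative.)

```python
from sklearn.decomposition import ProjectedGradientNMF

# Defaults in the versions discussed here, as far as I recall:
# tol=1e-4, max_iter=200 outer iterations, nls_max_iter=2000 inner NLS iterations.
# The values below are arbitrary illustration, not a recommendation.
model = ProjectedGradientNMF(
    n_components=10,     # hypothetical rank
    init='nndsvd',       # SVD-based init discussed in this thread
    tol=1e-3,            # looser stopping tolerance than the default
    max_iter=100,        # fewer outer iterations
    nls_max_iter=500,    # cap the inner non-negative least squares solver
    random_state=0,
)
```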
PR with the Lee/Seung algorithm as an estimator coming up. Does SVD initialization make sense for that algorithm? It doesn't seem to speed it up.
This is funny. I'm running the benchmark against the code in #2540 on an Intel CPU (the previous tests were on an AMD one), and while GD NMF with random init is still much faster than PG-NMF with NNDSVD (with similar losses), PG with random init wins big time (with a higher loss, though). Here's the full output:
FYI a colleague working with audio once told me that our init with svd for |
Yes we can, please do ;)
I tried that before too :)
Similar results on a similar (more complete) benchmark on "real" data:
Thanks for the benchmarks @vene. On the dense face data, PG with a non-random init still seems interesting: even though it is much slower, it reaches better reconstructions on this kind of data. But maybe the convergence criterion is too lax for the multiplicative updates method. On the text / sparse data, on the other hand, multiplicative updates with random init is a clear winner.
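(As a concrete illustration of the speed/quality comparison being discussed, here is a hedged sketch of timing both inits and comparing Frobenius reconstruction errors; the toy data and parameter values are invented, whereas the benchmarks above were run on face and text data.)

```python
import time
import numpy as np
from sklearn.decomposition import ProjectedGradientNMF

# Toy non-negative data; purely illustrative, not the benchmark datasets.
X = np.abs(np.random.RandomState(0).randn(200, 100))

for init in ('random', 'nndsvd'):
    model = ProjectedGradientNMF(n_components=10, init=init, random_state=0)
    t0 = time.time()
    W = model.fit_transform(X)              # W: (n_samples, n_components)
    H = model.components_                   # H: (n_components, n_features)
    err = np.linalg.norm(X - np.dot(W, H))  # Frobenius reconstruction error
    print('%-7s  %.2fs  error=%.3f' % (init, time.time() - t0, err))
```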
There's a benchmark script in `benchmarks/bench_plot_nmf.py` that compares our implementation of Lin's projected gradient algorithm against Lee & Seung's old gradient descent (multiplicative updates) algorithm, which it implements in 20 lines of code. It turns out that the PG algorithm is completely incapable of beating that baseline, regardless of initialization and despite all the recent optimizations to it. In fact, the baseline is typically faster by a significant margin.

One problem with `ProjectedGradientNMF` is that its default tolerance is much too strict. When I relax the tolerance in the benchmark, PG-NMF comes closer to the baseline and can beat it some of the time, but often enough the baseline is still faster than the fastest PG-NMF run. (I'd have included the plot here, but it seems to be broken and doesn't display the Lee/Seung algorithm's timings.)
I've already tried rewriting the PG algorithm in Cython, leaving only the dot products to Python. That shaves off about a quarter of the running time of `fit_transform`, which is not enough to make it really fast.
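For reference, this is roughly what such a 20-line Lee & Seung multiplicative-update baseline looks like; a hedged NumPy sketch with an invented relative-improvement stopping rule, not the exact code from `benchmarks/bench_plot_nmf.py`:

```python
import numpy as np

def nmf_multiplicative(X, n_components, max_iter=200, tol=1e-4, random_state=0):
    """Lee & Seung multiplicative updates for the Frobenius objective (sketch)."""
    rng = np.random.RandomState(random_state)
    n_samples, n_features = X.shape
    W = rng.rand(n_samples, n_components)
    H = rng.rand(n_components, n_features)
    eps = 1e-10                          # avoid division by zero
    prev_err = None
    for _ in range(max_iter):
        # H <- H * (W^T X) / (W^T W H)
        H *= np.dot(W.T, X) / (np.dot(np.dot(W.T, W), H) + eps)
        # W <- W * (X H^T) / (W H H^T)
        W *= np.dot(X, H.T) / (np.dot(W, np.dot(H, H.T)) + eps)
        err = np.linalg.norm(X - np.dot(W, H))    # Frobenius loss
        if prev_err is not None and prev_err - err < tol * prev_err:
            break                        # relative improvement below tolerance
        prev_err = err
    return W, H
```

Calling `nmf_multiplicative(X, 10)` returns the factors `W` and `H`; everything reduces to a handful of matrix products, which is consistent with the observation above that the unoptimized baseline stays competitive.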