Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Conversation

@philkr
Copy link
Contributor

@philkr philkr commented Jan 5, 2016

It has bothered me for a while that ADAM is quite a bit slower than SGD. This PR speeds up the GPU implementation of ADAM (and if desired I can do the same for other solvers too).

I created a simple python benchmark for solvers, using random input data and a couple of large inner products: solver_bench.zip. Before this PR:

Benchmarking forward              ...      11.96 ms / it
Benchmarking backward             ...       0.32 ms / it
Benchmarking forward-backward     ...      15.98 ms / it
Benchmarking SGD-solver           ...      25.17 ms / it
Benchmarking Adam-solver          ...      40.01 ms / it
Benchmarking AdaDelta-solver      ...      50.09 ms / it
Benchmarking AdaGrad-solver       ...      35.00 ms / it
Benchmarking Nesterov-solver      ...      30.95 ms / it
Benchmarking RMSProp-solver       ...      36.77 ms / it

After this PR:

...
Benchmarking SGD-solver           ...      21.98 ms / it
Benchmarking Adam-solver          ...      23.97 ms / it
Benchmarking AdaDelta-solver      ...      24.00 ms / it
Benchmarking AdaGrad-solver       ...      21.98 ms / it
Benchmarking Nesterov-solver      ...      21.98 ms / it
Benchmarking RMSProp-solver       ...      21.99 ms / it

@philkr philkr added the JD label Jan 5, 2016
@jeffdonahue
Copy link
Contributor

Thanks @philkr, this is a nice simple speedup for GPU training, and passes the TestGradientBasedSolver checks indicating all the solvers should work as they did before besides the performance improvement.

Looks like Travis is failing due to some minor lint issues. Also, could you CamelCase the new GPU kernel names and remove the _device suffixes to match existing ones (e.g. adadelta_update_device -> AdaDeltaUpdate)? I can merge this after those fixes.

@philkr
Copy link
Contributor Author

philkr commented Jan 6, 2016

Lint and CamelCase should be fixed now, and also squashed into a single commit.

@philkr
Copy link
Contributor Author

philkr commented Jan 6, 2016

ok, SGD is now slightly faster too (about 10%). Still don't quite understand why this worked, but I guess it has to do with minimizing the number of kernel calls (2 before, 1 now). Ran all tests again and passed.

@jeffdonahue
Copy link
Contributor

Sweet, thanks again!

jeffdonahue added a commit that referenced this pull request Jan 6, 2016
Speeding up ADAM (and maybe other solvers too)
@jeffdonahue jeffdonahue merged commit b23e9b1 into BVLC:master Jan 6, 2016
@philkr philkr deleted the faster_solver branch January 15, 2016 18:26
@seanbell seanbell mentioned this pull request Mar 23, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants