@scarlehoff
Member

@scarlehoff scarlehoff commented Jun 1, 2021

As the title says, this enables using genrep=True when fitting models in parallel.

Since we already have a mechanism for generating more than one replica (the replica range), I've changed parallel_models: int to parallel_models: bool, and the replicas to fit are now given by the replica range. This way, running in parallel is the same as running sequentially in terms of seeds and data.
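
To illustrate why a replica range keeps seeds and data identical between sequential and parallel runs, here is a minimal NumPy sketch. It is not the n3fit code: the seeding rule and the names make_replica, make_replica_range, base_seed and cov_chol are assumptions made up for this example. The point is only that if each replica's seed depends on its replica number alone, generating a whole range up front yields the same pseudodata as generating replicas one by one.

```python
import numpy as np


def make_replica(central, cov_chol, base_seed, replica):
    """Pseudodata for one replica; the seed depends only on the replica number."""
    # Hypothetical seeding rule, chosen for illustration only.
    rng = np.random.default_rng([base_seed, replica])
    noise = rng.standard_normal(central.shape[0])
    return central + cov_chol @ noise


def make_replica_range(central, cov_chol, base_seed, first, last):
    """Stack the pseudodata for replicas first..last (inclusive)."""
    return np.stack(
        [make_replica(central, cov_chol, base_seed, r) for r in range(first, last + 1)]
    )


# Toy central values and a diagonal covariance matrix (assumed numbers).
central = np.array([1.0, 2.0, 3.0])
cov_chol = np.linalg.cholesky(np.diag([0.1, 0.2, 0.3]))

# Replicas 1..4 generated together, and replica 3 generated on its own.
batch = make_replica_range(central, cov_chol, base_seed=42, first=1, last=4)
one_by_one = make_replica(central, cov_chol, base_seed=42, replica=3)
```

Here batch has shape (n_replicas, n_data), and its third row matches one_by_one exactly, because no random state is shared between replicas.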

The process is:

  1. Generate all replicas before starting to fit (just as it was done before when using -r)
  2. If parallel_models: True then all replicas are exactly the same (same trvl, same fktable, same invcovmat)
  3. Create an output which is a stack of all the replicas (so the output is (n_replicas, n_data) instead of (1, n_data))
  4. Fit normally (I left the output in PR #1153, "Fit many replicas in parallel", as (1, n_data) so that this could be implemented easily and it worked out! :) )
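
The stacking in step 3 can be sketched with plain NumPy. This is an illustration under assumptions, not the implementation in model_trainer.py: the function name chi2_per_replica and the toy numbers are hypothetical. It shows that with a shared invcovmat, a stacked (n_replicas, n_data) output gives one chi2 per replica, equal to what a (1, n_data) output gives when the replicas are treated one at a time.

```python
import numpy as np


def chi2_per_replica(predictions, replicas, invcovmat):
    """One chi2 per replica; both inputs have shape (n_replicas, n_data)."""
    diff = predictions - replicas
    # For every replica r, contract diff[r] with the shared inverse covariance.
    return np.einsum("ri,ij,rj->r", diff, invcovmat, diff)


rng = np.random.default_rng(0)
n_replicas, n_data = 4, 3

# The same inverse covariance matrix is shared by all replicas (toy values).
invcovmat = np.linalg.inv(np.diag([0.1, 0.2, 0.3]))
replicas = rng.normal(size=(n_replicas, n_data))
predictions = rng.normal(size=(n_replicas, n_data))

# All replicas at once, with a stacked (n_replicas, n_data) output.
stacked = chi2_per_replica(predictions, replicas, invcovmat)

# The same replicas one at a time, each with a (1, n_data) output.
sequential = np.array(
    [
        chi2_per_replica(predictions[i : i + 1], replicas[i : i + 1], invcovmat)[0]
        for i in range(n_replicas)
    ]
)
```

Because the losses are independent per replica, summing them lets a single optimizer step update all the models without mixing their data.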

For reviewers: this is implemented in the first commit, in n3fit/src/n3fit/model_trainer.py. Most other changes rename variables to the plural form (like seed -> seeds) or fix typos I found (quite a few in checks.py).

Working on top of PR #1251 (which is smaller than this PR and the changes are even less impactful so please go to that one first)

A draft for now since I'd like to know how reproducible results are between running sequential / parallel / one by one and add that to the docs.

@scarlehoff scarlehoff marked this pull request as draft June 1, 2021 10:58
@scarlehoff scarlehoff added the run-fit-bot label (Starts fit bot from a PR.) Jun 1, 2021
@github-actions

github-actions bot commented Jun 1, 2021

Greetings from your nice fit 🤖 !
I have good news for you, I just finished my tasks:

Check the report carefully, and please buy me a ☕ , or better, a GPU 😉!

@scarlehoff scarlehoff removed the run-fit-bot label (Starts fit bot from a PR.) Jun 1, 2021
@scarlehoff
Member Author

It also works perfectly for full fits: https://vp.nnpdf.science/h2_Zz2ySRU6BnqaOjgGiOg==/

(All 100 replicas were done in ~3.5 hours on 2 GPUs, so about 5 minutes per replica per GPU, with 11-12 GB of memory each.)

@scarlehoff scarlehoff marked this pull request as ready for review June 14, 2021 08:28
@scarlehoff scarlehoff changed the title [WIP] Generate replicas when fitting models in parallel Generate replicas when fitting models in parallel Jun 14, 2021
@Radonirinaunimi
Member

I have had a look at this and played with it, and did not notice any issues. On a single RTX 2060 with 6 GB of memory, it took about 10 hours to perform the exact same fits, and the results are exactly the same as those @scarlehoff reported. The changes LGTM and the comments I added both here and in #1251 are very minor (nothing conceptual).

@scarlehoff scarlehoff force-pushed the use_n3pdf_interface_with_members branch from 99000fb to 67a22b2 Compare June 23, 2021 12:36
@scarlehoff scarlehoff force-pushed the n3fit_replica_range_parallel branch from a510d7c to b6426c9 Compare June 23, 2021 12:56
@scarrazza scarrazza merged commit 78e45f0 into use_n3pdf_interface_with_members Jun 23, 2021
@scarrazza scarrazza deleted the n3fit_replica_range_parallel branch June 23, 2021 18:06
@Zaharid Zaharid added the enhancement label (New feature or request) Oct 29, 2021