Changing functions which are still using NNPDF pseudodata #1424

scarlehoff · 2021-10-05T14:06:01Z

I'm currently removing the libNNPDF dependencies and decided to start with RandomGenerator which seems the lowest hanging fruit as we now have make_replica which the fit now uses.

However, some actions are still using NNPDF pseudodata, which is very bad because it means the pseudodata could be completely different from the python version. They do so by calling either pseudodata or MakeReplica.

These are:

n3fit_data_utils.py
chi2grids.py::computed_pseudorreplicas_chi2* @Zaharid
mc_gen.py::one_art_data_residuals
mc_gen.py::art_rep_generation
results.py::closure_pseudodata_replicas @Zaharid
~~- [ ] filter.py~~

Sadly, these functions were mainly developed or touched by people who have already left the collaboration, so please @Zaharid @siranipour if you could have a close look or give pointers (maybe some functions can be removed entirely?) it would be much appreciated.

Also, some of these don't seem to be used anywhere (computed_pseudorreplicas_chi2 for instance: I just tried removing it with no consequences for the tests...).

*That function clearly states "#TODO: Everythning about this function is horrible. We need to rewrite", and I would agree, but I think making it use the python pseudodata is more pressing.

Edit: The RandomGenerator cannot be completely taken out of vp until there is a python-only closure test. I've instead moved the import inside the appropriate function.

scarlehoff · 2021-10-05T14:29:32Z

Also, some of these are not used anywhere that I can see (I just tried removing computed_psedorreplicas_chi2 with no consequences in the tests) and I would prefer not to spend any time porting a function that is there just because someone forgot to remove it... so please let me know if you know whether some of these functions are still important.
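One way to double-check whether a function is still referenced is to scan the code for its name. The following is only a minimal sketch: the helper names and the toy sources are hypothetical, and it assumes that a dependency can show up as a plain name, an attribute, a function argument name or a string literal. Docstrings are string literals too, so it can report false positives.

```python
import ast


def names_used(source):
    """Collect identifiers, attribute names, argument names and string
    literals that appear in a piece of Python source.
    Names introduced by ``def`` are deliberately not collected."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            found.add(node.id)
        elif isinstance(node, ast.Attribute):
            found.add(node.attr)
        elif isinstance(node, ast.arg):
            found.add(node.arg)
        elif isinstance(node, ast.Constant) and isinstance(node.value, str):
            found.add(node.value)
    return found


def files_mentioning(sources, target):
    """Return the sorted file names that mention ``target`` (as a substring
    of any collected name) other than by defining it."""
    return sorted(
        fname
        for fname, text in sources.items()
        if any(target in name for name in names_used(text))
    )


# Toy stand-ins for real modules (hypothetical contents)
toy_sources = {
    "chi2grids.py": "def computed_psedorreplicas_chi2(x):\n    return x\n",
    "dataops.py": (
        "def chi2_table(fits, fits_computed_psedorreplicas_chi2):\n"
        "    return fits_computed_psedorreplicas_chi2\n"
    ),
    "results.py": "def closure_pseudodata_replicas():\n    return None\n",
}

print(files_mentioning(toy_sources, "computed_psedorreplicas_chi2"))  # ['dataops.py']
```

The substring match is deliberate: in the toy example the function is only reached through an argument whose name contains it, which a search for exact calls would miss.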

Zaharid · 2021-10-05T15:23:07Z

computed_psedorreplicas_chi2 was used in the alpha s determination, and would be good to have some version of it (especially one that is efficient). The code in mc_gen.py was used for various studies on systematics, which might be picked again some day.
I think closure_pseudodata_replicas can be deleted safely. AFAICT it was only used by a master student very long ago.

scarlehoff · 2021-10-05T22:12:06Z

> computed_psedorreplicas_chi2 was used in the alpha s determination, and would be good to have some version of it (especially one that is efficient).

Could you give me some snippet of code that uses it? Otherwise testing it will be a nightmare.

Zaharid · 2021-10-06T07:05:48Z

Uh, there were a bunch of typo "fixes" that appear to have broken the alpha_s runcards.
This one should be working:

https://vp.nnpdf.science/xCa611EyTBazh0PLu7Lp0g==/

Zaharid · 2021-10-06T07:15:24Z

I also see these were not updated to use groups...

Anyhow, here is a runcard that "works" after I make this change:

diff --git a/validphys2/src/validphys/paramfits/dataops.py b/validphys2/src/validphys/paramfits/dataops.py
index fed61f8b2..773898e48 100644
--- a/validphys2/src/validphys/paramfits/dataops.py
+++ b/validphys2/src/validphys/paramfits/dataops.py
@@ -71,12 +71,12 @@ def get_parabola(asvals, chi2vals):
 #TODO: Export the total here. Not having it is causing huge pain elsewhere.
 @table
 @check_fits_different
-def fits_matched_pseudoreplicas_chi2_table(fits, fits_computed_pseudoreplicas_chi2):
+def fits_matched_pseudoreplicas_chi2_table(fits, fits_computed_psedorreplicas_chi2):
     """Collect the chi^2 of the pseudoreplicas in the fits a single table,
     groped by nnfit_id.
     The columns come in two levels, fit name and (total chi², n).
     The indexes also come in two levels: nnfit_id and experiment name."""
-    return pd.concat(fits_computed_pseudoreplicas_chi2, axis=1, keys=map(str,fits))
+    return pd.concat(fits_computed_psedorreplicas_chi2, axis=1, keys=map(str,fits))

fits:
    - NNPDF31_nnlo_as_0117_uncorr_s2

meta:
   author: Zahari Kassabov
   title: Pseudorreplica raw data for the second batch of proton only fits at NLO
   keywords: [as]

use_t0: False

use_cuts: True

experiments:
    from_: fit

theoryid: 53

fitting:
    from_: fit

dataseed:
    from_: fitting

datacuts:
    from_: fit

pdf:
    from_: fit

template_text: |
   

    {@fits_matched_pseudorreplicas_chi2_table@}

actions_:
    - - report:
           main: True

scarlehoff · 2021-10-06T07:37:30Z

I'll fix the typos as I go. Thanks for the runcard, it seems to work.

I've changed the runcard to

experiments:
  - experiment: NMC
    datasets:
      - {dataset: NMC}
  - experiment: SLAC
    datasets:
      - {dataset: SLACP}

So it doesn't take forever.

scarlehoff · 2021-10-06T12:04:09Z

Since I don't need to care about pre-3.1 compatibility (#1405), and since we have a group mechanism that deprecates half of the function, I've decided to redo computed_pseudoreplicas_chi2 to work with n3fit with no regard for backwards compatibility.

However, there are some terrible considerations.

In order to generate the pseudodata I'm doing:

    import numpy as np
    from validphys.n3fit_data import replica_mcseed
    from validphys.pseudodata import make_replica

    # One pseudodata replica per fitted replica, each with its own seed
    all_data_replicas = []
    for replica in fitted_replica_indexes:
        value_of_mcseed = replica_mcseed(replica, mcseed, True)
        all_data_replicas.append(make_replica(dataset_inputs_loaded_cd_with_cuts, value_of_mcseed))
    # Transpose so that each column corresponds to one replica
    r_data = np.array(all_data_replicas).T

where the input is

def computed_pseudoreplicas_chi2(
        mcseed, 
        dataset_inputs_loaded_cd_with_cuts,
        fitted_replica_indexes,
        ...

It looks like I should be able to have make_replica directly as the input, without the loop, but I don't know how (I would need to tell it to take the replicas from fitted_replica_indexes, which is the part I'm failing to do).
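To make the pattern above concrete without needing validphys, here is a self-contained sketch of the same idea: one reproducible seed per replica, one pseudodata vector per seed, and a final transpose so that columns are replicas. The functions toy_replica_seed and toy_make_replica are hypothetical stand-ins (uncorrelated Gaussian noise from the standard library), not the real replica_mcseed and make_replica.

```python
import random


def toy_replica_seed(replica, mcseed):
    """Hypothetical stand-in for replica_mcseed: derive a reproducible
    per-replica seed from the global seed and the replica number."""
    return random.Random(mcseed + replica).randrange(2**31)


def toy_make_replica(central_values, sigmas, seed):
    """Hypothetical stand-in for make_replica: fluctuate each data point
    independently around its central value (no correlations)."""
    rng = random.Random(seed)
    return [cv + s * rng.gauss(0.0, 1.0) for cv, s in zip(central_values, sigmas)]


def pseudodata_for_replicas(central_values, sigmas, replica_indexes, mcseed):
    """Explicit loop over the replica indexes. Returns a list of rows with
    one row per data point and one column per replica."""
    replicas = [
        toy_make_replica(central_values, sigmas, toy_replica_seed(rep, mcseed))
        for rep in replica_indexes
    ]
    # Transpose from (nreplicas, ndata) to (ndata, nreplicas)
    return [list(row) for row in zip(*replicas)]


central = [1.0, 2.0, 3.0]
errors = [0.1, 0.2, 0.3]
r_data = pseudodata_for_replicas(central, errors, [1, 4, 7], mcseed=42)
```

Because the seed depends only on the replica number and the global seed, the pseudodata for replica 4 is the same whether it is generated alone or together with other replicas.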

So, rather than reviewers, for this PR it would be nice if someone else could deal with massaging the new/ported actions into something vp-palatable, otherwise I won't ever finish (in particular, things like the above would be much faster for you @Zaharid to do than for me to ask -> find out how -> debug -> ask again -> etc...).

Also, a working runcard (a modification of yours) that can be used:

fits:
    - 210629-n3fit-001

meta:
   author: juacrumar
   title: Pseudorreplica raw data chi2 for NNPDF4.0
   keywords: [as]

use_t0: False

use_cuts: True

dataset_inputs:
  - {dataset: NMCPD_dw_ite}
  - {dataset: D0WMASY, cfac: [QCD]}

theoryid: 200
genrep:
    from_: fit

fitting:
    from_: fit

mcseed:
    from_: fit

datacuts:
    from_: fit

pdf:
    from_: fit

template_text: |
   

    {@fits_matched_pseudoreplicas_chi2_table@}

actions_:
  - report(main=True)

scarlehoff · 2021-10-06T15:44:02Z

> The code in mc_gen.py was used for various studies on systematics, which might be picked again some day.

I would actually drop mc_gen.py since it has not been used in a long time and a few bugs have crept in:

Here only the first group is considered (the return is at the wrong indentation level):

nnpdf/validphys2/src/validphys/mc_gen.py, line 88 in 4c27de4:

    return real_data, art_replicas, normart_replicas, art_data

Here only the last one is (the variables are created inside the loop):

nnpdf/validphys2/src/validphys/mc_gen.py, line 223 in 4c27de4:

    residual = one_art_data-real_data[one_data_index]

Probably studies that were done with it (looking at the example runcards) were done with BIGEXP and so these bugs never made an appearance.
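The two bug patterns described above can be reproduced in a few lines. This is a toy illustration with hypothetical function names, not the actual mc_gen.py code, and it assumes the second bug amounts to an accumulator being re-created on every iteration. It also shows why a single group (such as BIGEXP) would hide both problems.

```python
def first_group_only(groups):
    """Bug pattern 1: the return sits inside the loop, so the function
    exits after processing the first group."""
    collected = []
    for group in groups:
        collected.extend(group)
        return collected  # wrong indentation level


def last_group_only(groups):
    """Bug pattern 2: the accumulator is re-created on every iteration,
    so only the last group survives."""
    for group in groups:
        collected = []  # should be created once, before the loop
        collected.extend(group)
    return collected


def all_groups(groups):
    """Intended behaviour: accumulate over every group."""
    collected = []
    for group in groups:
        collected.extend(group)
    return collected


several = [[1, 2], [3], [4, 5]]
print(first_group_only(several))  # [1, 2]
print(last_group_only(several))   # [4, 5]
print(all_groups(several))        # [1, 2, 3, 4, 5]

# With a single group the three versions agree, so the bugs stay hidden
single = [[1, 2, 3]]
print(first_group_only(single) == last_group_only(single) == all_groups(single))  # True
```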

Anyway, for the one that was actually used in tests, my version should work just the same with BIGEXP and will produce something that I think is reasonable when more than one group exists. I have removed the other one since 1) it is broken and 2) it's a functionality that is not included, so there is no point in recreating the action.

Zaharid · 2021-10-06T15:47:00Z

@scarlehoff agreed. Probably it is better to do these things from scratch anyway.

Zaharid · 2021-10-06T15:57:12Z

> Since I don't need to care about pre-3.1 compatibility (#1405) and that we have a group mechanism that deprecates half of the function I've decided to redo computed_pseudoreplicas_chi2 to work with n3fit with no regards for backwards compatibility.
>
> However, some terrible considerations
>
> In order to generate the pseuodata I'm doing:
>
>     from validphys.n3fit_data import replica_mcseed
>     from validphys.pseudodata import make_replica
>     all_data_replicas = []
>     for replica in fitted_replica_indexes:
>         value_of_mcseed = replica_mcseed(replica, mcseed, True)
>         all_data_replicas.append(make_replica(dataset_inputs_loaded_cd_with_cuts, value_of_mcseed))
>     r_data = np.array(all_data_replicas).T
>
> where the input is
>
>     def computed_pseudoreplicas_chi2(
>             mcseed,
>             dataset_inputs_loaded_cd_with_cuts,
>             fitted_replica_indexes,
>             ...
>
> It looks like I should be able to have make_replica directly as the input without the loop but don't know how (I would need to tell it to take the replicas from fitted_replica_indexes which is the part I'm failing to do).

I am not sure I understand the issue here: What is there to do other than perhaps refactoring the loop into its own provider?

scarlehoff · 2021-10-06T16:21:17Z

But the loop exists (it's make_replica collected over a list of replicas) and the list of replicas exists as well (fitted_replica_indexes)

What I don't know how to do is how to collect make_replica over fitted_replica_indexes telling it to understand it as "replicas".

In particular I've tried doing:

     fitted_make_replicas = collect('make_replica', ('fitted_replica_indexes',))

But I get:

[ERROR]: Bad configuration encountered:
A parameter is required: fitted_replica_indexes.

(even if instead of a list I make it output a NSList with replica as the key).

I would like to be able to do the same thing I've done for mc_gen, but for fitted_replica_indexes instead of replicas:

     make_replicas = collect('make_replica', ('replicas',))

I guess I could make fitted_replica_indexes as a provider? But given that I have that list in the input at computed_pseudoreplicas_chi2 I don't understand why I cannot use it at the make_replicas level as well.

Zaharid · 2021-10-06T16:37:58Z

You can only collect over something that is known at "compile time", but not at "run time".

Actions (i.e. functions in provider modules), such as fitted_replica_indexes, are executed at run time, when the graph is already known, and so can't be used to build the graph (which is what collect does, by adding a node to it for each namespace).

To do things at compile time we have production rules (i.e things defined in some Config class). These can e.g. output an NSList that can be collected over, but cannot use the output of actions.

The distinction between the two is fairly arbitrary, other than it is nice to have things that are slow and can be checked as actions so they can fail quickly or execute successfully.

In this case however I don't think we would gain much: The case where we want to work with single individual replicas is fairly niche so we can as well have the corresponding for loop in the code.
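The distinction can be illustrated with a deliberately simplified toy model. None of this is reportengine code: build_graph, run_graph and the other names are hypothetical, and the only point is that the list of namespaces has to exist before anything is executed.

```python
def build_graph(action, namespaces):
    """'Compile time': create one node per namespace. Nothing is executed
    here, so ``namespaces`` must already be a concrete list."""
    return [(action, ns) for ns in namespaces]


def run_graph(graph):
    """'Run time': execute every node of an already-built graph."""
    return [action(**ns) for action, ns in graph]


def produce_replica_namespaces(nreplicas):
    """Plays the role of a production rule: cheap, depends only on the
    configuration, and is available while the graph is being built."""
    return [{"replica": i} for i in range(1, nreplicas + 1)]


def toy_make_replica(replica):
    """Plays the role of an action: only executed at run time."""
    return f"pseudodata for replica {replica}"


# The node list is fixed before any action runs
graph = build_graph(toy_make_replica, produce_replica_namespaces(3))
results = run_graph(graph)
print(results[0])  # pseudodata for replica 1
```

In this picture, a list that is itself the output of an action only exists after run_graph has started, which is too late to decide how many nodes build_graph should create.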

scarlehoff · 2021-10-06T16:46:09Z

Well. There is a way, implementing an as_input() in MC PDFs with a "fitting replicas" or whatever key.

But if you are happy with the current form I am happy to leave it like this. It just looked "unvalidphy-sy" to me (so I thought it would look horrendous to everybody else).

I'll take out the silly comments, deal with filter.py and this will be ready from my side.

scarlehoff · 2021-10-07T10:45:48Z

Uh, there's no MakeClosure or similar in python? For some reason I was convinced it was done at some point? (/cc @siranipour)

As a compromise I've moved the import inside the only function that will ever use RandomGenerator, which in turn is only called by setupfit.
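For reference, the deferred-import pattern looks roughly like this. It is a sketch only: the standard-library random module stands in for the compiled RandomGenerator, and both function names are hypothetical.

```python
def closure_noise(seed, npoints):
    """Only this function needs the optional dependency, so the import
    lives here rather than at module level."""
    # ``random`` is a stand-in for the compiled RandomGenerator
    import random

    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(npoints)]


def unrelated_action(values):
    """Everything else in the module works without the dependency."""
    return sum(values) / len(values)


print(unrelated_action([1, 2, 3]))  # 2.0
```

With this layout the rest of the module can be imported and used even where the dependency is not installed. The import error, if any, only appears when closure_noise is actually called.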

siranipour · 2021-10-07T11:25:42Z

> But the loop exists (it's make_replica collected over a list of replicas) and the list of replicas exists as well (fitted_replica_indexes)
>
> What I don't know how to do is how to collect make_replica over fitted_replica_indexes telling it to understand it as "replicas".
>
> In particular I've tried doing:
>
>      fitted_make_replicas = collect('make_replica', ('fitted_replica_indexes',))

I needed this functionality a few PRs ago, can you try:

     fitted_make_replica = collect('make_replica', ('fitreplicas',))

which leverages the following production rule (nnpdf/validphys2/src/validphys/config.py, line 241 in 4c27de4):

     def produce_fitreplicas(self, fit):

With regards to the MakeClosure, I don't believe we ever did do this. Or at least I didn't, are they not in filter.py somewhere?

scarlehoff · 2021-10-07T12:37:23Z

Ah! Thank you! I can indeed use pdfreplicas in the way you mentioned. I knew the functionality was already there somewhere.

scarlehoff · 2021-10-07T12:40:48Z

@siranipour @Zaharid to the best of my knowledge all pseudodata is now python-generated everywhere for all vp. Let me know if I missed something.


Zaharid · 2021-10-21T17:12:24Z

Overall this looks fine to me. Especially the part where it removes a lot of code.

scarlehoff · 2021-11-10T18:30:14Z

Please approve and merge if this one does indeed look good.

siranipour

Nice to see a lot of the code I found confusing is now gone


siranipour

Happy to merge once the test passes

removing unused n3fit code

8584111

scarlehoff added the destroyingc++ label Oct 5, 2021

remove computed_psedorreplicas_chi2

f5ab2cd

free computed_pseudorreplicas_chi2 from NNPDF

4af0923

scarlehoff added 2 commits October 6, 2021 14:05

remove deprecated function in results.py

7480d7f

liberate mc_gen.py from libNNPDF

51e1c0c

add a warning in filter

a3b59a8

add fitted make replicas

08d0eb7

scarlehoff marked this pull request as ready for review October 7, 2021 12:37

scarlehoff requested review from Zaharid and siranipour October 7, 2021 12:39

siranipour reviewed Oct 7, 2021

validphys2/src/validphys/filters.py (outdated, resolved)

Update validphys2/src/validphys/filters.py

623a21f

Co-authored-by: siranipour <[email protected]>

Zaharid reviewed Oct 8, 2021

validphys2/examples/mc_gen_example.yaml (outdated, resolved)

use example resource

8dd917c

This was referenced Oct 12, 2021

Update all example resources #1432

Closed

Change ThPredictions to python predictions #1430

Merged

Zaharid reviewed Oct 21, 2021

validphys2/src/validphys/chi2grids.py (outdated, resolved)

scarlehoff and others added 3 commits October 22, 2021 13:56

remove unnecesary variable

b15dd54

Merge branch 'master' into removing_cpp_pseudoreplicas

41536c3

update test

bac91da

siranipour reviewed Nov 12, 2021

validphys2/src/validphys/config.py (resolved)

validphys2/src/validphys/filters.py (resolved)

validphys2/src/validphys/config.py (outdated, resolved)

validphys2/src/validphys/chi2grids.py (resolved)

remove arange

38bd7f4

siranipour approved these changes Nov 12, 2021


scarlehoff merged commit 117a9a2 into master Nov 12, 2021

scarlehoff deleted the removing_cpp_pseudoreplicas branch November 12, 2021 13:05

Zaharid added the Refactoring label May 12, 2022
