@scarlehoff
Member

@scarlehoff scarlehoff commented Oct 5, 2021

I'm currently removing the libNNPDF dependencies and decided to start with RandomGenerator which seems the lowest hanging fruit as we now have make_replica which the fit now uses.

However, some actions are still using the libNNPDF pseudodata, which is very bad because it means the pseudodata could well be 100% different from the Python version. They do so by calling either pseudodata or MakeReplica.

These are:

  • n3fit_data_utils.py
  • chi2grids.py::computed_pseudorreplicas_chi2* @Zaharid
  • mc_gen.py::one_art_data_residuals
  • mc_gen.py::art_rep_generation
  • results.py::closure_pseudodata_replicas @Zaharid
    - [ ] filter.py

Sadly, these functions have mainly been developed/touched by people who have already left the collaboration, so please @Zaharid @siranipour, if you could have a close look or give pointers (maybe some functions can be removed entirely?), it would be much appreciated.

Also, some of these don't seem to be used anywhere (computed_pseudorreplicas_chi2 for instance; I just tried removing it with no consequences for the test...).

*That function clearly states "#TODO: Everythning about this function is horrible. We need to rewrite", and I would agree, but I think making it use the Python pseudodata is more pressing.

Edit: The RandomGenerator cannot be completely taken out of vp until there is a Python-only closure test. I've instead moved the import inside the appropriate function.

@scarlehoff
Member Author

Also, some of these are not used anywhere that I can see (I just tried removing computed_psedorreplicas_chi2 with no consequences in the test), and I would prefer not to spend any time porting a function that is there just because someone forgot to remove it... so please let me know whether any of these functions are still important.

@Zaharid
Contributor

Zaharid commented Oct 5, 2021

computed_psedorreplicas_chi2 was used in the alpha s determination, and would be good to have some version of it (especially one that is efficient). The code in mc_gen.py was used for various studies on systematics, which might be picked again some day.
I think closure_pseudodata_replicas can be deleted safely. AFAICT it was only used by a master's student a very long time ago.

@scarlehoff
Member Author

computed_psedorreplicas_chi2 was used in the alpha s determination, and would be good to have some version of it (especially one that is efficient).

Could you give me some snippet of code that uses it? Otherwise testing it will be a nightmare.

@Zaharid
Contributor

Zaharid commented Oct 6, 2021

Uh, there were a bunch of typo "fixes" that appear to have broken the alpha_s runcards.
This one should be working:

https://vp.nnpdf.science/xCa611EyTBazh0PLu7Lp0g==/

@Zaharid
Contributor

Zaharid commented Oct 6, 2021

I also see these were not updated to use groups...

Anyhow, here is a runcard that "works" after I make this change:

diff --git a/validphys2/src/validphys/paramfits/dataops.py b/validphys2/src/validphys/paramfits/dataops.py
index fed61f8b2..773898e48 100644
--- a/validphys2/src/validphys/paramfits/dataops.py
+++ b/validphys2/src/validphys/paramfits/dataops.py
@@ -71,12 +71,12 @@ def get_parabola(asvals, chi2vals):
 #TODO: Export the total here. Not having it is causing huge pain elsewhere.
 @table
 @check_fits_different
-def fits_matched_pseudoreplicas_chi2_table(fits, fits_computed_pseudoreplicas_chi2):
+def fits_matched_pseudoreplicas_chi2_table(fits, fits_computed_psedorreplicas_chi2):
     """Collect the chi^2 of the pseudoreplicas in the fits a single table,
     groped by nnfit_id.
     The columns come in two levels, fit name and (total chi², n).
     The indexes also come in two levels: nnfit_id and experiment name."""
-    return pd.concat(fits_computed_pseudoreplicas_chi2, axis=1, keys=map(str,fits))
+    return pd.concat(fits_computed_psedorreplicas_chi2, axis=1, keys=map(str,fits))

fits:
    - NNPDF31_nnlo_as_0117_uncorr_s2

meta:
   author: Zahari Kassabov
   title: Pseudorreplica raw data for the second batch of proton only fits at NLO
   keywords: [as]

use_t0: False

use_cuts: True

experiments:
    from_: fit

theoryid: 53

fitting:
    from_: fit

dataseed:
    from_: fitting

datacuts:
    from_: fit

pdf:
    from_: fit

template_text: |
   

    {@fits_matched_pseudorreplicas_chi2_table@}

actions_:
    - - report:
           main: True

@scarlehoff
Member Author

scarlehoff commented Oct 6, 2021

I'll fix the typos as I go. Thanks for the runcard, it seems to work.

I've changed the runcard to

experiments:
  - experiment: NMC
    datasets:
      - {dataset: NMC}
  - experiment: SLAC
    datasets:
      - {dataset: SLACP}

So it doesn't take forever.

@scarlehoff
Member Author

scarlehoff commented Oct 6, 2021

Since I don't need to care about pre-3.1 compatibility (#1405), and since we now have a group mechanism that deprecates half of the function, I've decided to redo computed_pseudoreplicas_chi2 to work with n3fit with no regard for backwards compatibility.

However, there are some terrible considerations.

In order to generate the pseudodata I'm doing:

    from validphys.n3fit_data import replica_mcseed
    from validphys.pseudodata import make_replica
    all_data_replicas = []
    for replica in fitted_replica_indexes:
        value_of_mcseed = replica_mcseed(replica, mcseed, True)
        all_data_replicas.append(make_replica(dataset_inputs_loaded_cd_with_cuts, value_of_mcseed))
    r_data = np.array(all_data_replicas).T

where the input is

def computed_pseudoreplicas_chi2(
        mcseed, 
        dataset_inputs_loaded_cd_with_cuts,
        fitted_replica_indexes,
        ...

It looks like I should be able to have make_replica directly as the input, without the loop, but I don't know how (I would need to tell it to take the replicas from fitted_replica_indexes, which is the part I'm failing to do).
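
As a rough, self-contained illustration of the loop above (a sketch, not the validphys code): `toy_replica_seed` and `toy_make_replica` are hypothetical stand-ins for `replica_mcseed` and `make_replica`, and the Gaussian fluctuation is an assumption made only to have something to generate. The point is the pattern: one reproducible seed per replica, then the per-replica results stacked so that each column is a replica.

```python
import random


def toy_replica_seed(replica, mcseed):
    """Hypothetical stand-in: derive a reproducible seed for one replica."""
    return random.Random(f"{mcseed}-{replica}").getrandbits(32)


def toy_make_replica(central_values, sigma, seed):
    """Hypothetical stand-in: Gaussian fluctuations around the central values."""
    rng = random.Random(seed)
    return [rng.gauss(cv, sigma) for cv in central_values]


def toy_pseudodata(central_values, sigma, replica_indexes, mcseed):
    """Return pseudodata as rows of data points, one column per replica."""
    per_replica = [
        toy_make_replica(central_values, sigma, toy_replica_seed(rep, mcseed))
        for rep in replica_indexes
    ]
    # Transpose (nrep, ndata) -> (ndata, nrep), like np.array(...).T in the snippet above
    return [list(row) for row in zip(*per_replica)]


r_data = toy_pseudodata([1.0, 2.0, 3.0], 0.1, [1, 4, 7], mcseed=42)
```

Because the seed depends only on the replica index and `mcseed`, calling the function again with the same inputs reproduces the same pseudodata, which is the property needed to match the data each fitted replica saw.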

So, rather than reviewers, what would be nice for this PR is for someone else to deal with massaging the new/ported actions to be vp-palatable; otherwise I will never finish (in particular, things like the above would be much faster for you, @Zaharid, to do than for me to ask -> find out how -> debug -> ask again -> etc...).

Also, a working runcard (a modification of yours) that can be used:

fits:
    - 210629-n3fit-001

meta:
   author: juacrumar
   title: Pseudorreplica raw data chi2 for NNPDF4.0
   keywords: [as]

use_t0: False

use_cuts: True

dataset_inputs:
  - {dataset: NMCPD_dw_ite}
  - {dataset: D0WMASY, cfac: [QCD]}

theoryid: 200
genrep:
    from_: fit

fitting:
    from_: fit

mcseed:
    from_: fit

datacuts:
    from_: fit

pdf:
    from_: fit

template_text: |
   

    {@fits_matched_pseudoreplicas_chi2_table@}

actions_:
  - report(main=True)

@scarlehoff
Member Author

The code in mc_gen.py was used for various studies on systematics, which might be picked again some day.

I would actually drop mc_gen.py since it has not been used in a long time and a few bugs have crept in:

Probably studies that were done with it (looking at the example runcards) were done with BIGEXP and so these bugs never made an appearance.

Anyway, for the action that was actually used in the tests, my version should work just the same with BIGEXP and will produce something that I think is reasonable when more than one group exists. I have removed the other one since 1) it is broken and 2) it is functionality that is not included, so there is no point in recreating the action.

@Zaharid
Contributor

Zaharid commented Oct 6, 2021

@scarlehoff agreed. Probably it is better to do these things from scratch anyway.

@Zaharid
Contributor

Zaharid commented Oct 6, 2021

Since I don't need to care about pre-3.1 compatibility (#1405), and since we now have a group mechanism that deprecates half of the function, I've decided to redo computed_pseudoreplicas_chi2 to work with n3fit with no regard for backwards compatibility.

However, there are some terrible considerations.

In order to generate the pseudodata I'm doing:

    from validphys.n3fit_data import replica_mcseed
    from validphys.pseudodata import make_replica
    all_data_replicas = []
    for replica in fitted_replica_indexes:
        value_of_mcseed = replica_mcseed(replica, mcseed, True)
        all_data_replicas.append(make_replica(dataset_inputs_loaded_cd_with_cuts, value_of_mcseed))
    r_data = np.array(all_data_replicas).T

where the input is

def computed_pseudoreplicas_chi2(
        mcseed, 
        dataset_inputs_loaded_cd_with_cuts,
        fitted_replica_indexes,
        ...

It looks like I should be able to have make_replica directly as the input, without the loop, but I don't know how (I would need to tell it to take the replicas from fitted_replica_indexes, which is the part I'm failing to do).

I am not sure I understand the issue here: What is there to do other than perhaps refactoring the loop into its own provider?

@scarlehoff
Member Author

scarlehoff commented Oct 6, 2021

But the loop exists (it's make_replica collected over a list of replicas) and the list of replicas exists as well (fitted_replica_indexes)

What I don't know how to do is how to collect make_replica over fitted_replica_indexes telling it to understand it as "replicas".

In particular I've tried doing:

     fitted_make_replicas = collect('make_replica', ('fitted_replica_indexes',))

But I get:

[ERROR]: Bad configuration encountered:
A parameter is required: fitted_replica_indexes.

(even if, instead of a list, I make it output an NSList with replica as the key).

I would like to be able to do the same thing I've done for mc_gen, but for fitted_replica_indexes instead of replicas.

make_replicas = collect('make_replica', ('replicas',))

I guess I could make fitted_replica_indexes as a provider? But given that I have that list in the input at computed_pseudoreplicas_chi2 I don't understand why I cannot use it at the make_replicas level as well.

@Zaharid
Contributor

Zaharid commented Oct 6, 2021

You can only collect over something that is known at "compile time", but not at "run time".

Actions (i.e. functions in provider modules), such as fitted_replica_indexes, are executed at run time, when the graph is already known, and so can't be used to build the graph (which is what collect does, by adding a node to it for each namespace).

To do things at compile time we have production rules (i.e. things defined in some Config class). These can, e.g., output an NSList that can be collected over, but cannot use the output of actions.

The distinction between the two is fairly arbitrary, other than it is nice to have things that are slow and can be checked as actions so they can fail quickly or execute successfully.

In this case, however, I don't think we would gain much: the case where we want to work with single individual replicas is fairly niche, so we may as well have the corresponding for loop in the code.
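
The distinction can be illustrated with a toy two-phase runner. This is a sketch under stated assumptions, not reportengine's implementation: `build_graph` and `run_graph` are hypothetical names, and the functions below are toy versions that merely borrow the names used in this thread. It only shows why the list being collected over has to be available when the graph is built.

```python
def produce_replicas(config):
    """Toy production rule: uses only the configuration, so it can run at build time."""
    return [{"replica": i} for i in range(1, config["nreplicas"] + 1)]


def make_replica(replica, mcseed):
    """Toy action: executed only when the graph runs."""
    return f"pseudodata(replica={replica}, seed={mcseed})"


def build_graph(config):
    """Build time: one node per namespace, so the namespace list must already exist."""
    namespaces = produce_replicas(config)
    return [(make_replica, {**ns, "mcseed": config["mcseed"]}) for ns in namespaces]


def run_graph(graph):
    """Run time: the graph is fixed; executing actions cannot add nodes to it."""
    return [func(**kwargs) for func, kwargs in graph]


config = {"nreplicas": 3, "mcseed": 7}
graph = build_graph(config)
results = run_graph(graph)
```

In this toy, a list that only becomes known inside `run_graph` (the analogue of the output of an action such as fitted_replica_indexes) arrives too late to decide how many nodes `build_graph` creates.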

@scarlehoff
Member Author

Well. There is a way, implementing an as_input() in MC PDFs with a "fitting replicas" or whatever key.

But if you are happy with the current form I am happy to leave it like this. It just looked "unvalidphy-sy" to me (so I thought it would look horrendous to everybody else).

I'll take out the silly comments, deal with filter.py and this will be ready from my side.

@scarlehoff
Member Author

Uh, there's no MakeClosure or similar in python? For some reason I was convinced it was done at some point? (/cc @siranipour)

As a compromise I've moved the import inside the only function that will ever use RandomGenerator, which in turn is only called by setupfit.
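
The compromise amounts to a function-local import. A minimal sketch of the pattern, using the standard-library `decimal` module as a stand-in for the compiled dependency (the function name `legacy_closure_step` is hypothetical and does not exist in validphys):

```python
def legacy_closure_step(value):
    """Only this function needs the heavy dependency, so it imports it locally."""
    # Stand-in for the compiled dependency; imported on first call, not at module import
    import decimal

    return decimal.Decimal(value).quantize(decimal.Decimal("0.01"))


result = legacy_closure_step("3.14159")
```

With this layout, importing the enclosing module does not require the dependency to be installed; only code paths that actually call the function do.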

@siranipour
Contributor

But the loop exists (it's make_replica collected over a list of replicas) and the list of replicas exists as well (fitted_replica_indexes)

What I don't know how to do is how to collect make_replica over fitted_replica_indexes telling it to understand it as "replicas".

In particular I've tried doing:

     fitted_make_replicas = collect('make_replica', ('fitted_replica_indexes',))

I needed this functionality a few PRs ago, can you try:

fitted_make_replica = collect('make_replica', ('fitreplicas',))

which leverages the following production rule

def produce_fitreplicas(self, fit):
.

With regard to the MakeClosure, I don't believe we ever did this. Or at least I didn't; is it not in filter.py somewhere?

@scarlehoff
Member Author

Ah! Thank you! I can indeed use pdfreplicas in the way you mentioned. I knew the functionality was already there somewhere.

@scarlehoff scarlehoff marked this pull request as ready for review October 7, 2021 12:37
@scarlehoff
Member Author

scarlehoff commented Oct 7, 2021

@siranipour @Zaharid to the best of my knowledge all pseudodata is now python-generated everywhere for all vp. Let me know if I missed something.

@Zaharid
Contributor

Zaharid commented Oct 21, 2021

Overall this looks fine to me. Especially the part where it removes a lot of code.

@scarlehoff
Member Author

Please approve and merge if this one does indeed look good.

Contributor

@siranipour siranipour left a comment

Nice to see a lot of the code I found confusing is now gone

Contributor

@siranipour siranipour left a comment

Happy to merge once the test passes

@scarlehoff scarlehoff merged commit 117a9a2 into master Nov 12, 2021
@scarlehoff scarlehoff deleted the removing_cpp_pseudoreplicas branch November 12, 2021 13:05