Conversation

@jl-wynen
Member

Fixes #177

Mostly mimics the structure produced by #199 for NeXus files.

I tested it with data_dream_HF_mil_closed_alldets_1e9.csv, provided as part of the requirements page on Confluence. The older file used in the Scipp tutorial does not work, as it does not contain a detector-ID coordinate, which is required to split the data into the separate instrument components.

Here is what the loaded data looks like:

[Screenshot: data loaded by load_geant4_csv]

There are some open questions:

  • Can the files contain event weight? If the test file is meant to contain events, then the detector is illuminated completely uniformly.
  • Can we publish the test data? If not, how do we test the loader?
  • Should this combine module, segment, and counter into a 'subsegment' as in #199 (Add dream.load_nexus for latest NeXus files)?
  • Should we compute voxel positions from the event positions (average over events per voxel)? As it stands, there is no voxel coordinate.
  • The NeXus loader produces a detector_number coord. We cannot do this here. Is this ok?
  • The loader in this PR makes a position coord for each event because that is needed by the instrument view and coord transforms. Should we drop {x,y,z}_pos? Or should we not produce position?

@celinedurniak
Collaborator

Thanks for the implementation @jl-wynen .
Here are my answers to your final questions.

Can the files contain event weight? If the test file is meant to contain events, then the detector is illuminated completely uniformly.
Not with the current version of the file writer in GEANT4.
A voxel might be hit several times, but this will appear as a list of events whose coordinates fall within the boundaries of the detector voxels.

Can we publish the test data? If not, how do we test the loader?
The data can be published. But for testing, do you need a smaller file? I have one of 6.5 MB.

Should this combine module, segment, and counter into a 'subsegment' as in #199 (Add dream.load_nexus for latest NeXus files)?

Not at the moment.

Should we compute voxel positions from the event positions (average over events per voxel)? As it stands, there is no voxel coordinate.
This is not required at the moment. But once the detector_numbers are added (see reply below), it will be easy to map detected events to the corresponding detectors' voxels.

The NeXus loader produces a detector_number coord. We cannot do this here. Is this ok?
The CSV files contain metadata about wires, strips, counts. So, not having the detector number is not an issue. And once the mapping in the NeXus file has been approved by ECDC, detector_number could be added to the CSV loader.

The loader in this PR makes a position coord for each event because that is needed by the instrument view and coord transforms. Should we drop {x,y,z}_pos? Or should we not produce position?
I do not have any preferences. What is the convention used for the other instruments?



def _group(dg: sc.DataGroup) -> sc.DataGroup:
    return dg.group('counter', 'segment', 'module', 'strip', 'wire')
Member

I don't think this is the correct order? I thought "strip" should be innermost (indexing voxels along a wire). Not sure about the others either.

Member Author

I chose the order based on the respective dim-lengths where the longest is on the inside. I'm happy to change it if it makes more sense to follow a physical order. @celinedurniak ?

Member

I don't think wire is the longest actually, there are only 32 or so? Strips should be many more. Regardless, I think we should follow the logical order (I presume detector_number in NeXus files will also come in logical order)?

Collaborator

The number of strips and wires depends on the "detector bank"

  • Mantle: 256 strips, 32 wires
  • Endcap backward and forward: 16 strips, 16 wires.
  • HR: 32 strips, 16 wires

Strips are numbered along z axis. For the wires, it depends on the geometry (radially for the mantle and the endcaps). For the endcaps, there are also the SUMOs.
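The per-bank counts above can be captured in a small lookup table. This is only an illustrative sketch based on the numbers quoted in this comment; the names and helper are hypothetical, not part of the PR's API:

```python
# Strip/wire counts per DREAM detector bank, as quoted above.
# Names are illustrative, not identifiers from the PR.
BANK_DIMENSIONS = {
    'mantle': {'strip': 256, 'wire': 32},
    'endcap_backward': {'strip': 16, 'wire': 16},
    'endcap_forward': {'strip': 16, 'wire': 16},
    'high_resolution': {'strip': 32, 'wire': 16},
}


def voxels_per_unit(bank: str) -> int:
    """Strip/wire voxel combinations for one counter/segment/module of a bank."""
    dims = BANK_DIMENSIONS[bank]
    return dims['strip'] * dims['wire']
```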

def _load_raw_events(filename: Union[str, os.PathLike]) -> sc.DataArray:
    table = sc.io.load_csv(filename, sep='\t', header_parser='bracket', data_columns=[])
    table = table.rename_dims(row='event')
    return sc.DataArray(sc.ones(sizes=table.sizes), coords=table.coords)
Member

@celinedurniak Should we add variances for the weights (all ones)?

Member

@jl-wynen I think this is still missing?
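For background on why all-ones variances are the natural choice here: for counting data, each unit-weight event carries variance 1, so summing N events yields variance N and the usual sqrt(N) Poisson error. A plain-NumPy sketch of that propagation (not the scipp API used in the PR):

```python
import numpy as np

# Unit weights with unit variances, as proposed for the loaded events.
weights = np.ones(100)
variances = np.ones(100)

# Summing events propagates variances by addition:
total = weights.sum()             # N = 100 counts
total_variance = variances.sum()  # variance N
error = np.sqrt(total_variance)   # standard error sqrt(N)
```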

    detector_groups: sc.DataArray, detector_id_name: str, detector_id: sc.Variable
) -> Optional[sc.DataArray]:
    try:
        return detector_groups[detector_id_name, detector_id].value.copy()
Member

Is this copy necessary? Later you anyway group by component names, so another copy will be made.

Member Author

Nothing gets grouped after this. In the case of mantle and high res, the object returned here is returned to the user.
I can change it such that it doesn't copy the endcaps because they are concatenated anyway.

        if (det := _extract_detector(groups, detector_id_name, i)) is not None
    ]
    if endcaps_list:
        endcaps = sc.concat(endcaps_list, data.dim)
Member

concat (and thus copy) could be avoided by using bin instead of group above?

Member Author

How so?

Member

Instead of splitting the endcaps, and using concat to merge them again, based on:

MANTLE_DETECTOR_ID = sc.index(7)
HIGH_RES_DETECTOR_ID = sc.index(8)
ENDCAPS_DETECTOR_IDS = tuple(map(sc.index, (3, 4, 5, 6)))

Make bin edges as sc.array(dims=[detector_id_name], values=[3, 7, 8, 9], unit=None) or something like that? Then use groups = data.bin(edges).
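The edge-based grouping being suggested can be sketched with plain NumPy (scipp's bin does the equivalent with bin-edge coordinates): with right-open edges [3, 7, 8, 9], IDs 3-6 fall in the first bin, 7 in the second, and 8 in the third. Synthetic data, not the PR's code:

```python
import numpy as np

# Detector IDs per event; 3-6 are endcap SUMOs, 7 is mantle, 8 is high-res.
detector_id = np.array([3, 7, 5, 8, 4, 6, 7])
edges = np.array([3, 7, 8, 9])

# Right-open bins [3,7), [7,8), [8,9): bin 0 = endcaps, 1 = mantle, 2 = high-res.
bin_index = np.searchsorted(edges, detector_id, side='right') - 1
```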

Member

... but to be honest I do not know if that is really faster. So I'd say: leave it as it is, until we have evidence saying otherwise.

Comment on lines 67 to 68
endcap_forward = endcaps[endcaps.coords['z_pos'] > sc.scalar(0, unit='mm')]
endcap_backward = endcaps[endcaps.coords['z_pos'] < sc.scalar(0, unit='mm')]
Member

Boolean indexing might be quite slow, have you considered using bin?

Member Author

No. And I would want to have a benchmark to see if this is actually faster because it needs multiple passes over the events, too.

Member

Why does it need multiple passes?

Member Author

At first I thought so, because it needs to find the min and max. But we can just use ±inf as edges.
Even then, it still needs to bin and copy each bin.
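The trade-off being discussed can be seen in plain NumPy: a boolean-mask split makes one pass (and one copy) per mask, while a bin-based split sorts the events into bins and copies each bin. A minimal sketch of the mask approach used in the PR, on synthetic data rather than the PR's code:

```python
import numpy as np

# Synthetic z positions of endcap events (origin at the sample position).
z_pos = np.array([-120.0, 340.0, -55.0, 410.0, -300.0])

# One pass per mask, each producing a copy of the selected events:
forward = z_pos[z_pos > 0]   # endcap_forward
backward = z_pos[z_pos < 0]  # endcap_backward
```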

Comment on lines 73 to 80
    return {
        key: val
        for key, val in zip(
            ('mantle', 'high_resolution', 'endcap_forward', 'endcap_backward'),
            (mantle, high_res, endcap_forward, endcap_backward),
        )
        if val is not None
    }
Member

Seems simpler to just init the dict step by step above, avoiding, e.g., also the else for the endcap handling?

@jl-wynen jl-wynen marked this pull request as ready for review September 28, 2023 08:38
endcaps = endcaps.bin(
    z_pos=sc.array(
        dims=['z_pos'],
        values=[-np.inf, 0.0, np.inf],
Member

So this assumes that the origin is always in the middle of the detectors? i.e. the sample position. Is that always the case in Geant4?

Collaborator

In GEANT4, the origin is at the sample position

Comment on lines +50 to +53
'mantle',
'high_resolution',
'endcap_forward',
'endcap_backward',
Member

In a file that I got from @celinedurniak there were more entries than this: see #184 (comment)

Do we need to support both or is this now the new standard layout we will get?

Member Author

I think the SUMOs in that file correspond to the endcaps. Is this correct, @celinedurniak ?

Collaborator

@jl-wynen you are correct. SUMOs are sub-divisions of the endcap detectors (i.e., the concentric partial rings), numbered from 3 to 6.

@jl-wynen jl-wynen merged commit 7025e17 into main Oct 3, 2023
@jl-wynen jl-wynen deleted the load_dream_csv branch October 3, 2023 07:41

Development

Successfully merging this pull request may close these issues.

Load DREAM Geant4 csv files

5 participants