Conversation

@jl-wynen
Member

Fixes #177

Mostly mimics the structure produced by #199 for NeXus files.

I tested it with data_dream_HF_mil_closed_alldets_1e9.csv, provided as part of the requirements page on Confluence. The older file used in the Scipp tutorial does not work, as it does not contain a detector-ID coordinate, which is required to split the data into the separate instrument components.

Here is what the loaded data looks like:

[Screenshot: data loaded by load_geant4_csv]

There are some open questions:

  • Can the files contain event weight? If the test file is meant to contain events, then the detector is illuminated completely uniformly.
  • Can we publish the test data? If not, how do we test the loader?
  • Should this combine module, segment, and counter into a 'subsegment' as in #199 (Add dream.load_nexus for latest NeXus files)?
  • Should we compute voxel positions from the event positions (average over events per voxel)? As it stands, there is no voxel coordinate.
  • The NeXus loader produces a detector_number coord. We cannot do this here. Is this ok?
  • The loader in this PR makes a position coord for each event because that is needed by the instrument view and coord transforms. Should we drop {x,y,z}_pos? Or should we not produce position?

@celinedurniak
Collaborator

Thanks for the implementation @jl-wynen .
Here are my answers to your final questions.

Can the files contain event weight? If the test file is meant to contain events, then the detector is illuminated completely uniformly.
Not with the current version of the file writer in GEANT4.
A voxel might be hit several times, but this will appear as a list of events whose coordinates fall within the boundaries of the detector voxels.

Can we publish the test data? If not, how do we test the loader?
The data can be published. But for testing, do you need a smaller file? I have one of 6.5 MB.

Should this combine module, segment, and counter into a 'subsegment' as in #199 (Add dream.load_nexus for latest NeXus files)?

Not at the moment.

Should we compute voxel positions from the event positions (average over events per voxel)? As it stands, there is no voxel coordinate.
This is not required at the moment. But once the detector_numbers are added (see reply below), it will be easy to map detected events to the corresponding detectors' voxels.

The NeXus loader produces a detector_number coord. We cannot do this here. Is this ok?
The CSV files contain metadata about wires, strips, counts. So, not having the detector number is not an issue. And once the mapping in the NeXus file has been approved by ECDC, detector_number could be added to the CSV loader.

The loader in this PR makes a position coord for each event because that is needed by the instrument view and coord transforms. Should we drop {x,y,z}_pos? Or should we not produce position?
I do not have any preferences. What is the convention used for the other instruments?



def _group(dg: sc.DataGroup) -> sc.DataGroup:
    return dg.group('counter', 'segment', 'module', 'strip', 'wire')
Member

I don't think this is the correct order? I thought "strip" should be innermost (indexing voxels along a wire). Not sure about the others either.

Member Author

I chose the order based on the respective dim-lengths where the longest is on the inside. I'm happy to change it if it makes more sense to follow a physical order. @celinedurniak ?

Member

I don't think wire is the longest actually, there are only 32 or so? Strips should be many more. Regardless, I think we should follow the logical order (I presume detector_number in NeXus files will also come in logical order)?

Collaborator

The number of strips and wires depends on the "detector bank"

  • Mantle: 256 strips, 32 wires
  • Endcap backward and forward: 16 strips, 16 wires.
  • HR: 32 strips, 16 wires

Strips are numbered along z axis. For the wires, it depends on the geometry (radially for the mantle and the endcaps). For the endcaps, there are also the SUMOs.
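The per-bank counts above can be captured in a small lookup table. This is only an illustrative sketch based on the numbers quoted in this comment; the names and helper are hypothetical, not part of the PR's API:

```python
# Strip/wire counts per DREAM detector bank, as quoted above.
# Names are illustrative, not identifiers from the PR.
BANK_DIMENSIONS = {
    'mantle': {'strip': 256, 'wire': 32},
    'endcap_backward': {'strip': 16, 'wire': 16},
    'endcap_forward': {'strip': 16, 'wire': 16},
    'high_resolution': {'strip': 32, 'wire': 16},
}


def voxels_per_unit(bank: str) -> int:
    """Strip/wire voxel combinations for one counter/segment/module of a bank."""
    dims = BANK_DIMENSIONS[bank]
    return dims['strip'] * dims['wire']
```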

def _load_raw_events(filename: Union[str, os.PathLike]) -> sc.DataArray:
    table = sc.io.load_csv(filename, sep='\t', header_parser='bracket', data_columns=[])
    table = table.rename_dims(row='event')
    return sc.DataArray(sc.ones(sizes=table.sizes), coords=table.coords)
Member

@celinedurniak Should we add variances for the weights (all ones)?

Member

@jl-wynen I think this is still missing?
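For background on why all-ones variances are the natural choice here: for counting data, each unit-weight event carries variance 1, so summing N events yields variance N and the usual sqrt(N) Poisson error. A plain-NumPy sketch of that propagation (not the scipp API used in the PR):

```python
import numpy as np

# Unit weights with unit variances, as proposed for the loaded events.
weights = np.ones(100)
variances = np.ones(100)

# Summing events propagates variances by addition:
total = weights.sum()             # N = 100 counts
total_variance = variances.sum()  # variance N
error = np.sqrt(total_variance)   # standard error sqrt(N)
```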

    detector_groups: sc.DataArray, detector_id_name: str, detector_id: sc.Variable
) -> Optional[sc.DataArray]:
    try:
        return detector_groups[detector_id_name, detector_id].value.copy()
Member

Is this copy necessary? Later you anyway group by component names, so another copy will be made.

Member Author

Nothing gets grouped after this. In the case of mantle and high res, the object returned here is returned to the user.
I can change it such that it doesn't copy the endcaps because they are concatenated anyway.

        if (det := _extract_detector(groups, detector_id_name, i)) is not None
    ]
    if endcaps_list:
        endcaps = sc.concat(endcaps_list, data.dim)
Member

concat (and thus copy) could be avoided by using bin instead of group above?

Member Author

How so?

Member

Instead of splitting the endcaps, and using concat to merge them again, based on:

MANTLE_DETECTOR_ID = sc.index(7)
HIGH_RES_DETECTOR_ID = sc.index(8)
ENDCAPS_DETECTOR_IDS = tuple(map(sc.index, (3, 4, 5, 6)))

Make bin edges as sc.array(dims=[detector_id_name], values=[3, 7, 8, 9], unit=None) or something like that? Then use groups = data.bin(edges).
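The edge-based grouping being suggested can be sketched with plain NumPy (scipp's bin does the equivalent with bin-edge coordinates): with right-open edges [3, 7, 8, 9], IDs 3-6 fall in the first bin, 7 in the second, and 8 in the third. Synthetic data, not the PR's code:

```python
import numpy as np

# Detector IDs per event; 3-6 are endcap SUMOs, 7 is mantle, 8 is high-res.
detector_id = np.array([3, 7, 5, 8, 4, 6, 7])
edges = np.array([3, 7, 8, 9])

# Right-open bins [3,7), [7,8), [8,9): bin 0 = endcaps, 1 = mantle, 2 = high-res.
bin_index = np.searchsorted(edges, detector_id, side='right') - 1
```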

Member

... but to be honest I do not know if that is really faster. So I'd say: leave it as it is, until we have evidence saying otherwise.

Comment on lines 67 to 68
endcap_forward = endcaps[endcaps.coords['z_pos'] > sc.scalar(0, unit='mm')]
endcap_backward = endcaps[endcaps.coords['z_pos'] < sc.scalar(0, unit='mm')]
Member

Boolean indexing might be quite slow, have you considered using bin?

Member Author

No. And I would want to have a benchmark to see if this is actually faster because it needs multiple passes over the events, too.

Member

Why does it need multiple passes?

Member Author

At first I thought so, because it needs to find the min and max. But we can just use ±inf as edges.
Even then, it still needs to bin and copy each bin.
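The trade-off being discussed can be seen in plain NumPy: a boolean-mask split makes one pass (and one copy) per mask, while a bin-based split sorts the events into bins and copies each bin. A minimal sketch of the mask approach used in the PR, on synthetic data rather than the PR's code:

```python
import numpy as np

# Synthetic z positions of endcap events (origin at the sample position).
z_pos = np.array([-120.0, 340.0, -55.0, 410.0, -300.0])

# One pass per mask, each producing a copy of the selected events:
forward = z_pos[z_pos > 0]   # endcap_forward
backward = z_pos[z_pos < 0]  # endcap_backward
```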

Comment on lines 73 to 80
    return {
        key: val
        for key, val in zip(
            ('mantle', 'high_resolution', 'endcap_forward', 'endcap_backward'),
            (mantle, high_res, endcap_forward, endcap_backward),
        )
        if val is not None
    }
Member

Seems simpler to just init the dict step by step above, avoiding, e.g., also the else for the endcap handling?

@jl-wynen jl-wynen marked this pull request as ready for review September 28, 2023 08:38
endcaps = endcaps.bin(
    z_pos=sc.array(
        dims=['z_pos'],
        values=[-np.inf, 0.0, np.inf],
Member

So this assumes that the origin is always in the middle of the detectors? i.e. the sample position. Is that always the case in Geant4?

Collaborator

In GEANT4, the origin is at the sample position

Comment on lines +50 to +53
'mantle',
'high_resolution',
'endcap_forward',
'endcap_backward',
Member

In a file that I got from @celinedurniak there were more entries than this: see #184 (comment)

Do we need to support both or is this now the new standard layout we will get?

Member Author

I think the SUMOs in that file correspond to the endcaps. Is this correct, @celinedurniak ?

Collaborator

@jl-wynen you are correct. SUMOs are sub-divisions of the endcap detectors (i.e., the concentric partial rings), numbered from 3 to 6.

@jl-wynen jl-wynen merged commit 7025e17 into main Oct 3, 2023
@jl-wynen jl-wynen deleted the load_dream_csv branch October 3, 2023 07:41

Development

Successfully merging this pull request may close these issues.

Load DREAM Geant4 csv files

5 participants