Restructure data model #6
Codecov Report
❌ Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##             main       #6      +/-   ##
==========================================
- Coverage   93.22%   83.68%   -9.54%
==========================================
  Files           4        6       +2
  Lines         118      190      +72
==========================================
+ Hits          110      159      +49
- Misses          8       31      +23
@davidetorre99 awesome work here! Things are looking much cleaner and it's great to see you thinking about this stuff 🙂 below is nitpicky, only writing it out to get you thinking, up to you what you do with the info!
My high level nit would be that I'm not sure the separation is quite right within EtomoData

    class EtomoData(BaseModel):
        edf_metadata: EdfData
        tilt_angles: TiltAngleData
        transforms: TransformData

When modelling data you generally have two paradigms, array of structures or structure of arrays.
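To make the two paradigms concrete, here is a minimal sketch using plain dataclasses. The names `TiltImage`, `TiltSeries`, `aos_to_soa` and `soa_to_aos` are hypothetical and are not taken from alnfile or this PR; they only illustrate that both layouts hold the same information.

```python
from dataclasses import dataclass


# Array of structures (AoS): one record per tilt image.
@dataclass
class TiltImage:
    tilt_angle: float
    shift_x: float
    shift_y: float


# Structure of arrays (SoA): one list per parameter, indexed by image.
@dataclass
class TiltSeries:
    tilt_angles: list[float]
    shifts_x: list[float]
    shifts_y: list[float]


def aos_to_soa(images: list[TiltImage]) -> TiltSeries:
    """Gather each field of the per-image records into its own list."""
    return TiltSeries(
        tilt_angles=[im.tilt_angle for im in images],
        shifts_x=[im.shift_x for im in images],
        shifts_y=[im.shift_y for im in images],
    )


def soa_to_aos(series: TiltSeries) -> list[TiltImage]:
    """Rebuild one record per image from the parallel lists."""
    return [
        TiltImage(a, x, y)
        for a, x, y in zip(series.tilt_angles, series.shifts_x, series.shifts_y)
    ]


images = [TiltImage(-60.0, 1.5, -0.5), TiltImage(0.0, 0.0, 0.0)]
series = aos_to_soa(images)
```

The conversion is lossless in both directions, which is why the choice comes down to ergonomics.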
In AlnFile we went with AoS, modelling the per tilt parameters in https://github.com/teamtomo/alnfile/blob/cf7220a294b95376664cb064f0d7abc0bbc96606/src/alnfile/data_model/global_alignments.py#L8
I think I went with AoS in AlnFile because it more closely matched what you already had and it handled the global/local separation nicely...
In this case you've gone with structure of arrays but with an additional level in the hierarchy, and the boundaries between members of that structure feel a bit unnatural to me...
The choice between AoS and SoA is basically arbitrary for this use case - no perf concerns, just ergonomics... the file structure in IMOD (xf, tlt) means people think about things in terms of SoA so I think it's the right choice here. To maximize ergonomics I would suggest something like the flat SoA below - you could add properties for the padded xf/tlt etc if you liked

    class EtomoData(BaseModel):
        basename: str
        tilt_series_extension: str
        tilt_axis_angle: float | None = None
        excluded_views: Set[int]
        n_images: int
        xf: np.ndarray
        tlt: list[int]
        rawtlt: list[int]
        xtilt: list[int]

You could then move the class methods on the models that no longer exist into bare functions in imod_utils.py or something
Does this make sense/feel cleaner to you?
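A minimal sketch of that shape, assuming a stdlib dataclass as a stand-in for the pydantic `BaseModel` and a reduced set of fields. The property `included_views` and the function `parse_tlt` are hypothetical names for illustration, views are assumed to be 1-based, and tilt angles are stored as floats here.

```python
from dataclasses import dataclass, field


@dataclass
class EtomoData:
    # Flat structure of arrays: every member sits at the top level.
    basename: str
    n_images: int
    tlt: list[float]
    excluded_views: set[int] = field(default_factory=set)

    @property
    def included_views(self) -> list[int]:
        """Hypothetical derived property: 1-based views that are not excluded."""
        return [
            i for i in range(1, self.n_images + 1) if i not in self.excluded_views
        ]


def parse_tlt(text: str) -> list[float]:
    """Bare function: parse .tlt-style text with one tilt angle per line."""
    return [float(line) for line in text.splitlines() if line.strip()]


data = EtomoData(
    basename="ts_001",
    n_images=3,
    tlt=parse_tlt("-60.0\n0.0\n60.0\n"),
    excluded_views={2},
)
```

Parsing lives outside the model, so the model stays a plain container and derived views of the data are exposed as properties.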
Thank you so much, that was super helpful! I fully agree with your suggestion. I actually started out this way, but then I got a bit lost trying to keep the structure as close as possible to alnfile. This way looks much cleaner to me. Does this implementation reflect what you had in mind in your comment? Thanks again!
Makes total sense - super fast turnaround! New model looks great. A few last things to consider before merge (last round, I promise!)
Great! No worries at all, this feedback is super helpful! This last commit includes the following changes:
Does this align with what you were envisioning?
🥳🥳🥳 let's goooo!
@alisterburt, here I tried to make the data model explicit, as in alnfile, and added xf_to_array to output the xf as a (n, 2, 3) array (again, identical to alnfile).
I'd love to hear your thoughts!
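For readers unfamiliar with the (n, 2, 3) layout, the conversion can be sketched as below. This is not the implementation from this PR: the function name is borrowed from the description, but its signature is assumed, and it relies on the IMOD .xf convention of six values per line, `A11 A12 A21 A22 DX DY`.

```python
import numpy as np


def xf_to_array(xf: np.ndarray) -> np.ndarray:
    """Convert (n, 6) rows [A11, A12, A21, A22, DX, DY] into (n, 2, 3) matrices."""
    xf = np.asarray(xf, dtype=float).reshape(-1, 6)
    out = np.empty((xf.shape[0], 2, 3))
    out[:, 0, :] = xf[:, [0, 1, 4]]  # first row:  [A11, A12, DX]
    out[:, 1, :] = xf[:, [2, 3, 5]]  # second row: [A21, A22, DY]
    return out


# One transform: identity rotation with a shift of (5, -3).
rows = np.array([[1.0, 0.0, 0.0, 1.0, 5.0, -3.0]])
matrices = xf_to_array(rows)
```

Each 2x3 matrix holds the linear part in its first two columns and the shift in the last column.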