Restructure data model #6

davidetorre99 · 2025-10-21T16:37:30Z

@alisterburt, here I tried to make the data model explicit, as in alnfile, and added xf_to_array to output the xf as a (n, 2, 3) array (again, identical to alnfile).
I'd love to hear your thoughts!

import etomofiles

# Read etomo alignment data
df = etomofiles.read("/path/to/etomo/directory")

print(df.head())

import etomofiles

# Read alignment data
df = etomofiles.read("TS_001/")

# Get xf as numpy array:
xf_matrices = etomofiles.xf_to_array(df)

# Also works directly with files
xf_matrices = etomofiles.xf_to_array("TS_001/TS_001.xf")

# Choose row ordering convention
xf_xy = etomofiles.xf_to_array(df)  # default xy
# Each matrix is [[A11, A12, DX], [A21, A22, DY]]

xf_yx = etomofiles.xf_to_array(df, yx=True)  # yx
# Each matrix is [[A22, A21, DY], [A12, A11, DX]]
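
To make the two row orderings concrete, here is a minimal pure-Python sketch of how one line of an IMOD .xf file (six values: A11 A12 A21 A22 DX DY) could map onto the layouts described above. The helper name `xf_row_to_matrix` is hypothetical and is not part of etomofiles, which returns a NumPy array of shape (n, 2, 3) rather than nested lists.

```python
def xf_row_to_matrix(row, yx=False):
    """Turn one .xf line (A11 A12 A21 A22 DX DY) into a 2x3 nested list.

    Hypothetical helper for illustration only, not the etomofiles API.
    """
    a11, a12, a21, a22, dx, dy = row
    if yx:
        # yx convention: both the rows and the first two columns are swapped
        return [[a22, a21, dy], [a12, a11, dx]]
    # xy convention (default)
    return [[a11, a12, dx], [a21, a22, dy]]


line = "0.9998 -0.0175 0.0175 0.9998 12.5 -3.0"
row = [float(v) for v in line.split()]

xy = xf_row_to_matrix(row)
yx = xf_row_to_matrix(row, yx=True)
```

Swapping both rows and columns keeps the transform self-consistent when coordinates are ordered (y, x) instead of (x, y).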

codecov · 2025-10-21T16:40:31Z

Codecov Report

❌ Patch coverage is 83.06011% with 31 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.68%. Comparing base (1258413) to head (aa907c0).
⚠️ Report is 9 commits behind head on main.

Files with missing lines	Patch %	Lines
src/etomofiles/imod_utils.py	66.66%	28 Missing ⚠️
src/etomofiles/__init__.py	75.00%	2 Missing ⚠️
src/etomofiles/data_model/etomo_data.py	96.42%	1 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (1258413) and HEAD (aa907c0).

HEAD has 96 uploads less than BASE

Flag	BASE (1258413)	HEAD (aa907c0)
(none)	108	12

Additional details and impacted files

@@            Coverage Diff             @@
##             main       #6      +/-   ##
==========================================
- Coverage   93.22%   83.68%   -9.54%     
==========================================
  Files           4        6       +2     
  Lines         118      190      +72     
==========================================
+ Hits          110      159      +49     
- Misses          8       31      +23


alisterburt

@davidetorre99 awesome work here! Things are looking much cleaner and it's great to see you thinking about this stuff 🙂 below is nitpicky, only writing it out to get you thinking, up to you what you do with the info!

My high level nit would be that I'm not sure the separation is quite right within EtomoData

class EtomoData(BaseModel):
    edf_metadata: EdfData
    tilt_angles: TiltAngleData
    transforms: TransformData

When modelling data you generally have two paradigms, array of structures or structure of arrays.
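
As a rough illustration of the two paradigms, the sketch below stores the same per-tilt parameters both ways. The class and field names (`TiltImage`, `TiltSeries`, `shift_x`, and so on) are made up for this example and come from neither alnfile nor etomofiles.

```python
from dataclasses import dataclass


# Array of structures (AoS): one record per tilt image, collected in a list.
@dataclass
class TiltImage:
    tilt_angle: float
    shift_x: float
    shift_y: float


aos = [
    TiltImage(tilt_angle=-3.0, shift_x=1.5, shift_y=-0.5),
    TiltImage(tilt_angle=0.0, shift_x=0.0, shift_y=0.0),
    TiltImage(tilt_angle=3.0, shift_x=-1.0, shift_y=0.25),
]


# Structure of arrays (SoA): one container, one sequence per parameter.
@dataclass
class TiltSeries:
    tilt_angles: list[float]
    shifts_x: list[float]
    shifts_y: list[float]


soa = TiltSeries(
    tilt_angles=[img.tilt_angle for img in aos],
    shifts_x=[img.shift_x for img in aos],
    shifts_y=[img.shift_y for img in aos],
)

# The same value, reached per image (AoS) or per parameter (SoA).
assert aos[2].tilt_angle == soa.tilt_angles[2]
```

AoS reads naturally when you iterate over images; SoA reads naturally when each parameter already lives in its own file or array.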

In AlnFile we went with AoS, modelling the per tilt parameters in https://github.com/teamtomo/alnfile/blob/cf7220a294b95376664cb064f0d7abc0bbc96606/src/alnfile/data_model/global_alignments.py#L8

I think I went with AoS in AlnFile because it more closely matched what you already had and it handled the global/local separation nicely...

In this case you've gone with structure of arrays but with an additional level in the hierarchy, the boundaries between members of that structure feel a bit unnatural to me...

The choice between AoS and SoA is basically arbitrary for this use case - no perf concerns, just ergonomics... the file structure in IMOD (xf, tlt) means people think about things in terms of SoA so I think it's the right choice here. To maximize ergonomics I would suggest something like the flat SoA below - you could add properties for the padded xf/tlt etc if you liked

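import numpy as np
from pydantic import BaseModel, ConfigDict

class EtomoData(BaseModel):
    # np.ndarray is not a pydantic-native type, so it has to be allowed explicitly
    model_config = ConfigDict(arbitrary_types_allowed=True)

    basename: str
    tilt_series_extension: str
    tilt_axis_angle: float | None = None
    excluded_views: set[int]
    n_images: int
    xf: np.ndarray  # (n, 2, 3) transforms
    tlt: list[float]  # tilt angles are floating-point degrees
    rawtlt: list[float]
    xtilt: list[float]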

You could then move the class methods on the models that no longer exist into bare functions in imod_utils.py or something

Does this make sense/feel cleaner to you?
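
A bare function of the kind suggested above might look like the following sketch. It assumes the usual IMOD tilt-file layout of one angle per line; the name `read_tlt` and its signature are hypothetical and not taken from the PR.

```python
import tempfile
from pathlib import Path


def read_tlt(path):
    """Read an IMOD-style tilt file: one angle (degrees) per line.

    Hypothetical bare function; the name and signature are assumptions.
    """
    text = Path(path).read_text()
    # Skip blank lines so a trailing newline does not break parsing.
    return [float(line) for line in text.splitlines() if line.strip()]


# Demonstrate on a throwaway file rather than a real etomo directory.
with tempfile.TemporaryDirectory() as tmp:
    tlt_file = Path(tmp) / "TS_001.tlt"
    tlt_file.write_text("-60.0\n-57.0\n-54.0\n")
    angles = read_tlt(tlt_file)
```

Keeping the reader as a plain function means the data model only has to hold values and needs no file-handling logic of its own.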

src/etomofiles/data_model/edf.py

src/etomofiles/data_model/tlt.py

src/etomofiles/imod_utils.py

src/etomofiles/__init__.py

davidetorre99 · 2025-10-21T22:57:28Z

Thank you so much — that was super helpful!

I fully agree with your suggestion! I actually started out this way, but then I got a bit lost trying to keep the structure as close as possible to alnfile. This way looks much cleaner to me, I agree.

Does this implementation reflect what you had in mind in your comment?

Thanks again!

alisterburt · 2025-10-21T23:58:38Z

Makes total sense - super fast turnaround! New model looks great

few last things to consider before merge (last round I promise!)

- I see df_to_xf() in the example and you marked it as resolved, but I don't see it in the implementation?
- Maybe drop EdfData in favor of a parse_edf() function in the IMOD utils?

davidetorre99 · 2025-10-22T09:03:42Z

Great! No worries at all, this feedback is super helpful!

This last commit includes the following changes:
- Replaced the EDF Pydantic model with a parse_edf() function in imod_utils
- Switched to using a TypedDict (EdfMetadata) instead of Pydantic for EDF parsing (does that seem reasonable to you?)
- Simplified the data_model to a single Pydantic model: EtomoDataFile (previously EtomoData)
- Renamed xf_to_array to df_to_xf (sorry, I noted it down and then forgot it in the previous commit)

Does this align with what you were envisioning?
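
For readers unfamiliar with the TypedDict approach, here is a minimal sketch of a parser that returns a plain typed dict instead of a Pydantic model. The two fields and the `Setup.DatasetName` / `Setup.ImageRotationA` keys are assumptions chosen for illustration; the real EdfMetadata and parse_edf() in etomofiles may differ.

```python
from typing import Optional, TypedDict


class EdfMetadata(TypedDict):
    # Field names are assumptions for this sketch.
    basename: str
    tilt_axis_angle: Optional[float]


def parse_edf(text: str) -> EdfMetadata:
    """Parse key=value lines into a plain typed dict.

    The keys looked up below are assumptions for illustration.
    """
    pairs = {}
    for line in text.splitlines():
        line = line.strip()
        # Ignore blanks, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        pairs[key.strip()] = value.strip()
    angle = pairs.get("Setup.ImageRotationA")
    return EdfMetadata(
        basename=pairs.get("Setup.DatasetName", ""),
        tilt_axis_angle=float(angle) if angle else None,
    )


example = "#comment\nSetup.DatasetName=TS_001\nSetup.ImageRotationA=85.3\n"
meta = parse_edf(example)
```

A TypedDict adds no runtime validation, which is a reasonable trade when the values are immediately passed into a validated Pydantic model anyway.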

alisterburt · 2025-10-22T09:33:26Z

🥳🥳🥳 let's goooo!

davidetorre99 added 3 commits October 21, 2025 18:29

restructure, make data_model explicit, xf_to_array

f973d14

fix: typo in README

63810d1

fix: add pydantic>=2.0.0 to dependencies

301ffa7

alisterburt reviewed Oct 21, 2025

flat SOA, move read functions to utils, some renaming

b0a5f0a

simplify to single data model, rename xf_to_array -> df_to_xf

aa907c0

alisterburt approved these changes Oct 22, 2025

davidetorre99 merged commit e188807 into teamtomo:main Oct 22, 2025
15 of 17 checks passed

davidetorre99 deleted the restructure-data-model branch October 22, 2025 09:49
