@davidetorre99
Collaborator

@alisterburt, here I tried to make the data model explicit, as in alnfile, and added xf_to_array to output the xf as a (n, 2, 3) array (again, identical to alnfile).
I'd love to hear your thoughts!

import etomofiles

# Read etomo alignment data
df = etomofiles.read("/path/to/etomo/directory")

print(df.head())

import etomofiles

# Read alignment data
df = etomofiles.read("TS_001/")

# Get xf as numpy array:
xf_matrices = etomofiles.xf_to_array(df)

# Also works directly with files
xf_matrices = etomofiles.xf_to_array("TS_001/TS_001.xf")

# Choose row ordering convention
xf_xy = etomofiles.xf_to_array(df)  # default xy
# Each matrix is [[A11, A12, DX], [A21, A22, DY]]

xf_yx = etomofiles.xf_to_array(df, yx=True)  # yx
# Each matrix is [[A22, A21, DY], [A12, A11, DX]]
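The two row orderings above can be sketched with plain NumPy. This is a minimal illustration, not the etomofiles implementation: `xf_rows_to_matrices` is a hypothetical helper, and it assumes each `.xf` line holds the six values in IMOD's order `A11 A12 A21 A22 DX DY`.

```python
import numpy as np


def xf_rows_to_matrices(rows, yx=False):
    """Convert (n, 6) xf rows [A11, A12, A21, A22, DX, DY] into an (n, 2, 3) array.

    Hypothetical helper for illustration only.
    """
    rows = np.asarray(rows, dtype=float).reshape(-1, 6)
    a11, a12, a21, a22, dx, dy = rows.T
    if yx:
        # Swap the roles of x and y: rows and the first two columns are exchanged.
        top = np.stack([a22, a21, dy], axis=-1)
        bottom = np.stack([a12, a11, dx], axis=-1)
    else:
        top = np.stack([a11, a12, dx], axis=-1)
        bottom = np.stack([a21, a22, dy], axis=-1)
    # top and bottom are (n, 3); stacking on axis 1 gives (n, 2, 3).
    return np.stack([top, bottom], axis=1)


rows = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]]
xy = xf_rows_to_matrices(rows)           # [[1, 2, 5], [3, 4, 6]]
yx = xf_rows_to_matrices(rows, yx=True)  # [[4, 3, 6], [2, 1, 5]]
```

The yx form is the same transform written for (y, x) coordinates, so applying it to a point given as (y, x, 1) yields the transformed (y', x').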

@codecov

codecov bot commented Oct 21, 2025

Codecov Report

❌ Patch coverage is 83.06011% with 31 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.68%. Comparing base (1258413) to head (aa907c0).
⚠️ Report is 9 commits behind head on main.

Files with missing lines                   Patch %   Lines
src/etomofiles/imod_utils.py               66.66%    28 Missing ⚠️
src/etomofiles/__init__.py                 75.00%    2 Missing ⚠️
src/etomofiles/data_model/etomo_data.py    96.42%    1 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (1258413) and HEAD (aa907c0).

HEAD has 96 fewer uploads than BASE.
Uploads: BASE (1258413) 108, HEAD (aa907c0) 12
Additional details and impacted files
@@            Coverage Diff             @@
##             main       #6      +/-   ##
==========================================
- Coverage   93.22%   83.68%   -9.54%     
==========================================
  Files           4        6       +2     
  Lines         118      190      +72     
==========================================
+ Hits          110      159      +49     
- Misses          8       31      +23     


@alisterburt left a comment

@davidetorre99 awesome work here! Things are looking much cleaner and it's great to see you thinking about this stuff 🙂 The notes below are nitpicky; I'm only writing them out to get you thinking, and it's up to you what you do with the info!

My high-level nit would be that I'm not sure the separation is quite right within EtomoData:

class EtomoData(BaseModel):
    edf_metadata: EdfData
    tilt_angles: TiltAngleData
    transforms: TransformData

When modelling data you generally have two paradigms: array of structures (AoS) or structure of arrays (SoA).
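As a rough sketch of the difference, the same three tilt images can be stored either way. The class and field names here (`TiltImage`, `TiltSeries`, `shift_x`, `shift_y`) are made up for illustration and are not taken from alnfile or etomofiles.

```python
from dataclasses import dataclass


@dataclass
class TiltImage:
    """Hypothetical per-tilt record: one structure per image (AoS element)."""
    tilt_angle: float
    shift_x: float
    shift_y: float


# Array of structures: a list with one record per tilt image.
aos = [
    TiltImage(-60.0, 1.5, -0.5),
    TiltImage(0.0, 0.0, 0.0),
    TiltImage(60.0, -2.0, 1.0),
]


@dataclass
class TiltSeries:
    """Hypothetical whole-series record: one array per parameter (SoA)."""
    tilt_angle: list[float]
    shift_x: list[float]
    shift_y: list[float]


# Structure of arrays: the same data, grouped by parameter instead of by image.
soa = TiltSeries(
    tilt_angle=[image.tilt_angle for image in aos],
    shift_x=[image.shift_x for image in aos],
    shift_y=[image.shift_y for image in aos],
)

# Both layouts answer the same question; only the access pattern differs.
same_value = aos[2].tilt_angle == soa.tilt_angle[2]
```

With AoS you index the image first and then pick the parameter; with SoA you pick the parameter first and then index the image, which mirrors one-file-per-parameter layouts.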

In AlnFile we went with AoS, modelling the per tilt parameters in https://github.com/teamtomo/alnfile/blob/cf7220a294b95376664cb064f0d7abc0bbc96606/src/alnfile/data_model/global_alignments.py#L8

I think I went with AoS in AlnFile because it more closely matched what you already had and it handled the global/local separation nicely...

In this case you've gone with structure of arrays but with an additional level in the hierarchy, and the boundaries between members of that structure feel a bit unnatural to me...

The choice between AoS and SoA is basically arbitrary for this use case - no perf concerns, just ergonomics... the file structure in IMOD (xf, tlt) means people think about things in terms of SoA so I think it's the right choice here. To maximize ergonomics I would suggest something like the flat SoA below - you could add properties for the padded xf/tlt etc if you liked

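import numpy as np
from pydantic import BaseModel, ConfigDict

class EtomoData(BaseModel):
    # np.ndarray is not validated natively by pydantic (assumes pydantic v2)
    model_config = ConfigDict(arbitrary_types_allowed=True)

    basename: str
    tilt_series_extension: str
    tilt_axis_angle: float | None = None
    excluded_views: set[int]
    n_images: int
    xf: np.ndarray
    # tilt angles are floating-point degrees, so float rather than int
    tlt: list[float]
    rawtlt: list[float]
    xtilt: list[float]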

You could then move the class methods on the models that no longer exist into bare functions in imod_utils.py or something
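A bare function of that kind might look like the sketch below. `parse_tlt` is a hypothetical name, not a function confirmed to exist in etomofiles, and it assumes the usual `.tlt`/`.rawtlt` layout of one tilt angle per line.

```python
def parse_tlt(text: str) -> list[float]:
    """Parse the contents of a .tlt or .rawtlt file into a list of tilt angles.

    Hypothetical helper: assumes one angle per line and skips blank lines.
    """
    return [float(line) for line in text.splitlines() if line.strip()]


angles = parse_tlt("-60.0\n -57.0\n\n-54.0\n")
```

Because the function takes text and returns a plain list, it can be tested without touching the file system, and the flat model only needs to store the result.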

Does this make sense/feel cleaner to you?

@davidetorre99
Collaborator Author

Thank you so much — that was super helpful!

I fully agree with your suggestion! I actually started out this way, but then I got a bit lost trying to keep the structure as close as possible to alnfile. The flat version looks much cleaner to me too.

Does this implementation reflect what you had in mind in your comment?

Thanks again!

@alisterburt

Makes total sense - super fast turnaround! New model looks great

A few last things to consider before merge (last round, I promise!):

  • I see df_to_xf() in the example and you marked it as resolved but don't see it in the implementation?
  • maybe drop EdfData for a parse_edf() in IMOD utils?

@davidetorre99
Collaborator Author

Great! No worries at all, this feedback is super helpful!

This last commit includes the following changes:
- Replaced the EDF Pydantic model with a parse_edf() function in imod_utils
- Switched to using a TypedDict (EdfMetadata) instead of Pydantic for EDF parsing (does that seem reasonable to you?)
- Simplified the data_model to a single Pydantic model: EtomoDataFile (previously EtomoData)
- Added df_to_xf (sorry, I noted it down and then forgot it in the previous commit)
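For readers unfamiliar with the pattern, a TypedDict-based parser could be sketched as follows. This is not the merged code: the fields of `EdfMetadata` and the EDF key names (`Setup.DatasetName`, `Setup.ImageRotationA`) are assumptions chosen for illustration, and the sketch only assumes that an `.edf` file is a list of `key=value` lines.

```python
from typing import TypedDict


class EdfMetadata(TypedDict, total=False):
    """Illustrative fields only; the real EdfMetadata may differ."""
    basename: str
    tilt_axis_angle: float


def parse_edf(text: str) -> EdfMetadata:
    """Parse key=value lines from EDF text into a typed dictionary (sketch)."""
    raw: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        raw[key.strip()] = value.strip()

    metadata: EdfMetadata = {}
    # Assumed key names; adjust to whatever the real files contain.
    if "Setup.DatasetName" in raw:
        metadata["basename"] = raw["Setup.DatasetName"]
    if "Setup.ImageRotationA" in raw:
        metadata["tilt_axis_angle"] = float(raw["Setup.ImageRotationA"])
    return metadata


sample = "#Etomo data\nSetup.DatasetName=TS_001\nSetup.ImageRotationA=85.3\n"
meta = parse_edf(sample)
```

A TypedDict gives static type hints for the keys without runtime validation, which is a reasonable trade-off when the values are parsed and converted explicitly in one small function.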

Does this align with what you were envisioning?

@alisterburt

🥳🥳🥳 let's goooo!

@davidetorre99 merged commit e188807 into teamtomo:main on Oct 22, 2025
15 of 17 checks passed
@davidetorre99 deleted the restructure-data-model branch on October 22, 2025 09:49