Read CDS tables where the data is split in separate files #18505

hamogu · 2025-08-08T14:25:42Z

Description

The CDS specs allow a data file to be split into several parts so that each ASCII file stays below 10 MB. Few catalogs in CDS do this (possibly only the Tycho 2 catalog), but we want to be complete.
The implementation is in the CDS reader and looks for one very specific case: a ReadMe file is given and the catalog file is not found under the expected name. In that case, and only in that case, the code checks whether split data files exist.
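
For readers who want a feel for the fallback described above, here is a minimal standalone sketch. It is not the code from this PR: the helper names `find_split_parts` and `read_lines` are hypothetical, and it assumes parts are named by appending `.NN` or `.NNN` (optionally followed by `.gz`) to the expected file name.

```python
import gzip
import re
import tempfile
from pathlib import Path

# Assumed suffix of a split part: ".NN" or ".NNN", optionally gzip-compressed.
PART_SUFFIX = re.compile(r"\.(\d{2,3})(\.gz)?")


def find_split_parts(data_file):
    """Return the files that make up ``data_file`` (hypothetical helper).

    If the file exists under its expected name it is returned as-is;
    only otherwise are numbered parts searched for, in numeric order.
    """
    path = Path(data_file)
    if path.exists():
        return [path]
    numbered = []
    for candidate in path.parent.glob(path.name + ".*"):
        # Inspect only what follows the expected file name.
        match = PART_SUFFIX.fullmatch(candidate.name[len(path.name):])
        if match:
            numbered.append((int(match.group(1)), candidate))
    return [part for _, part in sorted(numbered, key=lambda item: item[0])]


def read_lines(data_file):
    """Concatenate the lines of all parts, decompressing gzip parts."""
    lines = []
    for part in find_split_parts(data_file):
        opener = gzip.open if part.suffix == ".gz" else open
        with opener(part, "rt") as fh:
            lines.extend(fh.read().splitlines())
    return lines


# Demo with made-up file names; "table.dat" itself is never created.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "table.dat"
    (Path(tmp) / "table.dat.00").write_text("row 1\nrow 2\n")
    with gzip.open(Path(tmp) / "table.dat.01.gz", "wt") as fh:
        fh.write("row 3\n")
    (Path(tmp) / "table.dat.bak").write_text("not a part\n")
    print(read_lines(base))  # ['row 1', 'row 2', 'row 3']
```

Returning early when the file exists under its expected name keeps the split-file search from triggering for ordinary tables, which is the restriction the description asks for.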

Fixes #4121

By checking this box, the PR author has requested that maintainers do NOT use the "Squash and Merge" button. Maintainers should respect this when possible; however, the final decision is at the discretion of the maintainer that merges the PR.

github-actions · 2025-08-08T14:25:54Z

Thank you for your contribution to Astropy! 🌌 This checklist is meant to remind the package maintainers who will review this pull request of some common things to look for.

Do the proposed changes actually accomplish desired goals?
Do the proposed changes follow the Astropy coding guidelines?
Are tests added/updated as required? If so, do they follow the Astropy testing guidelines?
Are docs added/updated as required? If so, do they follow the Astropy documentation guidelines?
Is rebase and/or squash necessary? If so, please provide the author with appropriate instructions. Also see instructions for rebase and squash.
Did the CI pass? If no, are the failures related? If you need to run daily and weekly cron jobs as part of the PR, please apply the "Extra CI" label. Codestyle issues can be fixed by the bot.
Is a change log needed? If yes, did the change log check pass? If no, add the "no-changelog-entry-needed" label. If this is a manual backport, use the "skip-changelog-checks" label unless special changelog handling is necessary.
Is this a big PR that makes a "What's new?" entry worthwhile and if so, is (1) a "what's new" entry included in this PR and (2) the "whatsnew-needed" label applied?
At the time of adding the milestone, if the milestone set requires a backport to release branch(es), apply the appropriate "backport-X.Y.x" label(s) before merge.

astropy/io/ascii/cds.py

dhomeier · 2025-08-08T17:02:25Z

I think the iers test is somewhat nonstandard in that it reuses an iers.IERS_B instance that has previously opened a valid file. That might be considered abuse, but it's probably still better to guard against such cases.

hamogu · 2025-08-08T17:23:57Z

The last commit should take care of the issue.

hamogu · 2025-08-08T17:24:47Z

pre-commit.ci autofix

hamogu · 2025-08-08T17:40:39Z

pre-commit.ci autofix

dhomeier · 2025-08-08T18:11:31Z

CI is passing, so that part looks good, thanks! Don't have time for a full review now, but I'll try to get to it by next week.

mhvk

This looks very nice!

astropy/io/ascii/cds.py

taldcroft

Looks good to me.

astropy/io/ascii/cds.py

mhvk

Good that @taldcroft found a possible problem. Reapproving (my inline comment is really only in case something else needs to be changed too).

mhvk · 2025-08-11T13:01:43Z

astropy/io/ascii/cds.py

+            # deal with table where the ReadMe is present, but the data is split over several data files
+            if self.header.readme is not None:
+                path = Path(table)
+                pattern = re.compile(r"\.(\d{2,3})(\.gz)?$")


Not worth re-running CI for, but if there are other changes needed, I would change this to:

    if f_list := sorted(Path(table).parent.glob(path.name + "*")):
        pattern = re.compile(r"\.(\d{2,3})(\.gz)?$")
        numbers = ...
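
To make clear which file names the regular expression quoted above accepts, here is a small check that can be run on its own. The helper `part_number` and the file names are illustrative assumptions, not part of the PR.

```python
import re

# The pattern quoted in the diff: a dot, two or three digits, an optional ".gz", end of name.
pattern = re.compile(r"\.(\d{2,3})(\.gz)?$")


def part_number(filename):
    """Return the part number encoded in the suffix, or None (hypothetical helper)."""
    match = pattern.search(filename)
    return int(match.group(1)) if match else None


for name in [
    "tyc2.dat.00",      # plain numbered part
    "tyc2.dat.19.gz",   # gzip-compressed numbered part
    "tyc2.dat",         # unsplit file name: no numeric suffix
    "tyc2.dat.1",       # a single digit is too short
    "tyc2.dat.0001",    # four digits is too long
    "tyc2.dat.00.bak",  # extra suffix after the number
]:
    print(f"{name}: {part_number(name)}")
```

Only the first two names yield a number (0 and 19); the rest return `None`, so backup files or other similarly named files next to the catalog would not be picked up as parts.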

taldcroft

LGTM, thanks!

hamogu · 2025-08-17T00:07:22Z

@dhomeier Do you want to re-review this or can it be merged? (The only test failure is an allowed failure.)

hamogu · 2025-08-21T21:09:46Z

I should add: The only test failure is an allowed failure that has nothing to do with this PR (a timeout of some HTTPS connection).

hamogu requested review from taldcroft and dhomeier as code owners August 8, 2025 14:25

github-actions bot added the io.ascii label Aug 8, 2025

pllim added this to the v7.2.0 milestone Aug 8, 2025

dhomeier reviewed Aug 8, 2025

hamogu force-pushed the fix_4121 branch from da3be0d to c9be0fc Compare August 8, 2025 16:58

hamogu force-pushed the fix_4121 branch from f314b14 to c8d10ae Compare August 8, 2025 17:37

hamogu mentioned this pull request Aug 8, 2025

Cycle 4 funding: Moritz io.ascii astropy/astropy-project#405

Open

mhvk approved these changes Aug 8, 2025

taldcroft requested changes Aug 9, 2025

hamogu added 3 commits August 10, 2025 15:24

add towncrier fragment

38ff905

Add more checks to ensure new code path is not triggered accidentally

808039d

tighten up regular expression reorder imports

hamogu force-pushed the fix_4121 branch from 6933cf1 to 099a2b9 Compare August 10, 2025 19:24

review updated and auto fixes from pre-commit.com hooks

752e8d7

for more information, see https://pre-commit.ci

hamogu force-pushed the fix_4121 branch from 099a2b9 to 752e8d7 Compare August 11, 2025 12:10

mhvk approved these changes Aug 11, 2025

taldcroft approved these changes Aug 11, 2025
