hamogu
Member

@hamogu hamogu commented Aug 8, 2025

Description

CDS/Vizier serves large data files in gzipped form by default, but the ReadMe file lists the filename without the ".gz" ending, so the reader does not find the metadata and fails.
This PR simply strips off any ".gz" ending before searching the ReadMe.
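As a rough illustration of the idea (not the actual astropy reader code), the lookup can be sketched as follows. The function name `match_readme_entry` and the example patterns are hypothetical; only `fnmatch.fnmatch` and the ".gz" handling come from the discussion in this PR.

```python
import fnmatch


def match_readme_entry(table_name, names):
    """Return the first ReadMe pattern that matches table_name.

    A single trailing ".gz" is dropped before matching, because the
    ReadMe lists the uncompressed filename. Hypothetical helper for
    illustration only.
    """
    bare_name = table_name.removesuffix(".gz")  # requires Python >= 3.9
    for pattern in names:
        # Patterns may contain shell-style wildcards such as "?" or "*".
        if fnmatch.fnmatch(bare_name, pattern):
            return pattern
    return None


# The downloaded file is gzipped, the ReadMe entry is not.
print(match_readme_entry("table1.dat.gz", ["ReadMe", "table?.dat"]))
```

Without the suffix removal, "table1.dat.gz" would not match "table?.dat" and the lookup would return nothing.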

fixes #6549

  • By checking this box, the PR author has requested that maintainers do NOT use the "Squash and Merge" button. Maintainers should respect this when possible; however, the final decision is at the discretion of the maintainer that merges the PR.

@hamogu hamogu added this to the v7.2.0 milestone Aug 8, 2025
Contributor

github-actions bot commented Aug 8, 2025

Thank you for your contribution to Astropy! 🌌 This checklist is meant to remind the package maintainers who will review this pull request of some common things to look for.

  • Do the proposed changes actually accomplish desired goals?
  • Do the proposed changes follow the Astropy coding guidelines?
  • Are tests added/updated as required? If so, do they follow the Astropy testing guidelines?
  • Are docs added/updated as required? If so, do they follow the Astropy documentation guidelines?
  • Is rebase and/or squash necessary? If so, please provide the author with appropriate instructions. Also see instructions for rebase and squash.
  • Did the CI pass? If no, are the failures related? If you need to run daily and weekly cron jobs as part of the PR, please apply the "Extra CI" label. Codestyle issues can be fixed by the bot.
  • Is a change log needed? If yes, did the change log check pass? If no, add the "no-changelog-entry-needed" label. If this is a manual backport, use the "skip-changelog-checks" label unless special changelog handling is necessary.
  • Is this a big PR that makes a "What's new?" entry worthwhile and if so, is (1) a "what's new" entry included in this PR and (2) the "whatsnew-needed" label applied?
  • At the time of adding the milestone, if the milestone set requires a backport to release branch(es), apply the appropriate "backport-X.Y.x" label(s) before merge.

@hamogu
Member Author

hamogu commented Aug 8, 2025

pre-commit.ci autofix

@@ -108,7 +108,9 @@ def get_cols(self, lines):
 # Iterate on names to find if one matches the tablename
 # including wildcards.
 for pattern in names:
-    if fnmatch.fnmatch(self.data.table_name, pattern):
+    if fnmatch.fnmatch(
+        self.data.table_name.rstrip(".gz"), pattern
Member

Assuming people don't get naughty and use suffix variations like .tgz, .gzip, .GZ, and so on? 😬

Member Author

CDS itself only uses ".gz". This is intentionally narrow to avoid casting too wide a net and catching random stuff; the goal is to make it work with what CDS gives you, in particular if you get it directly from the URL as in #6549. I tested that manually with this branch, but I didn't think it was necessary to add another remote-data test for it. Those are flakier than tests with included files.
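To make the "intentionally narrow" behaviour concrete, here is a small sketch, assuming an exact, case-sensitive check for the literal ".gz" suffix. The helper `strip_cds_gz` and the filenames are hypothetical and only illustrate which of the variants mentioned above would be left untouched.

```python
def strip_cds_gz(name):
    """Drop one trailing ".gz" exactly as written; leave anything else alone.

    Hypothetical helper: it deliberately ignores variants such as
    ".GZ", ".gzip" or ".tgz".
    """
    return name[: -len(".gz")] if name.endswith(".gz") else name


for name in ["table1.dat.gz", "table1.dat.GZ", "table1.dat.gzip", "tables.tgz"]:
    print(f"{name!r} -> {strip_cds_gz(name)!r}")
```

Only the first name changes; the other three are passed through as-is and would still fail to match a ReadMe entry.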

Contributor

Very reasonable as long as CDS sticks to gzip. For really large files a format with better and faster decompression could actually make sense, but if you do that on your locally downloaded file, you can also think of xz'ing the readme as well.

Contributor

.rstrip is incorrect -- its argument is a set of characters, so it strips every trailing ., g, and z from the filename, not just the ".gz" suffix; you need .removesuffix (python >= 3.9, so OK):

"yaygggzzz.gz".rstrip(".gz")
# 'yay'
"yaygggzzz.gz".removesuffix(".gz")
# 'yaygggzzz'
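The same pitfall can be shown with more catalogue-like names. This is a small sketch with made-up filenames; `compare_strip` is a hypothetical helper that just returns both results side by side.

```python
def compare_strip(name):
    """Return (rstrip result, removesuffix result) for a filename."""
    # rstrip treats ".gz" as the character set {".", "g", "z"};
    # removesuffix (Python >= 3.9) removes the literal suffix once.
    return name.rstrip(".gz"), name.removesuffix(".gz")


for name in ["table1.dat.gz", "catalog.gz", "zzz.gz"]:
    print(name, "->", compare_strip(name))
```

Names ending in ".dat.gz" happen to give the same answer with both methods, because "t" is not in the stripped character set. That is probably why a test using a typical data file name would not reveal the problem, while a name like "catalog.gz" loses its final "g".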

@hamogu
Member Author

hamogu commented Aug 8, 2025

The coverage failure makes no sense. All the allegedly untested lines are in files that have nothing to do with this PR.

@pllim
Member

pllim commented Aug 8, 2025

Patch coverage is 100% so I won't worry too much about it.

CDS/Vizier serves large data files in gzipped form by default, but the ReadMe file lists the filename without the ".gz" ending, so the reader does not find the metadata and fails.
This PR simply strips off any ".gz" ending before searching the ReadMe.

closes astropy#6549
@hamogu
Member Author

hamogu commented Aug 10, 2025

pre-commit.ci autofix

Successfully merging this pull request may close these issues.

Cannot retrieve gzipped tables from CDS using astropy.table