Find metadata in CDS readme even for gzipped files #18506
Thank you for your contribution to Astropy! 🌌 This checklist is meant to remind the package maintainers who will review this pull request of some common things to look for.
pre-commit.ci autofix
astropy/io/ascii/cds.py
Outdated
@@ -108,7 +108,9 @@ def get_cols(self, lines):
     # Iterate on names to find if one matches the tablename
     # including wildcards.
     for pattern in names:
-        if fnmatch.fnmatch(self.data.table_name, pattern):
+        if fnmatch.fnmatch(
+            self.data.table_name.rstrip(".gz"), pattern
Assuming people don't get naughty and use suffix variations like .tgz, .gzip, .GZ, and so on? 😬
CDS itself only uses ".gz". This is intentionally narrow to avoid casting too wide a net and catching random stuff; the goal is to make it work with what CDS gives you, in particular if you get it directly from the URL as in #6549. I tested that manually with this branch, but I didn't think it was necessary to add another remote-data test for it, since those are more flaky than tests with included files.
Very reasonable as long as CDS sticks to gzip. For really large files a format with better and faster decompression could actually make sense, but if you do that on your locally downloaded file, you can also think of xz'ing the readme as well.
.rstrip is incorrect -- it treats its argument as a set of characters and removes every trailing ., g, and z from the filename; you need .removesuffix (python >= 3.9, so OK):
"yaygggzzz.gz".rstrip(".gz")
# 'yay'
"yaygggzzz.gz".removesuffix(".gz")
# 'yaygggzzz'
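To make the difference concrete for file names, here is a minimal sketch of a suffix-stripping helper. The name strip_gz_suffix is hypothetical and not part of astropy; it mirrors what str.removesuffix(".gz") does, written with endswith so it also runs on Python versions older than 3.9.

```python
def strip_gz_suffix(name: str) -> str:
    """Remove exactly one trailing '.gz' from name, if present.

    Hypothetical helper for illustration only; on Python >= 3.9 this is
    equivalent to name.removesuffix(".gz").
    """
    suffix = ".gz"
    if name.endswith(suffix):
        return name[: -len(suffix)]
    return name


# rstrip treats ".gz" as a character set, so it also eats the 'g' of "catalog".
print("catalog.gz".rstrip(".gz"))     # 'catalo'
print(strip_gz_suffix("catalog.gz"))  # 'catalog'
```

Names without the suffix are returned unchanged, and only one ".gz" is removed even if the name ends in ".gz.gz".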
The coverage failure makes no sense. All the allegedly untested lines are in files that have nothing to do with this PR.
Patch coverage is 100% so I won't worry too much about it.
CDS/Vizier serves large datafiles in gzipped form by default, but the ".gz" ending is not included in the filename in the ReadMe file, so the reader does not find the metadata and fails. This PR simply strips off any ".gz" ending before searching the ReadMe. closes astropy#6549
Description
CDS/Vizier serves large datafiles in gzipped form by default, but the ".gz" ending is not included in the filename in the ReadMe file, so the reader does not find the metadata and fails.
This PR simply strips off any ".gz" ending before searching the ReadMe.
fixes #6549
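The lookup described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual astropy code: find_matching_pattern and readme_names are hypothetical names, and the real reader extracts the file-name patterns from the ReadMe itself.

```python
import fnmatch


def find_matching_pattern(table_name, readme_names):
    """Return the first ReadMe file-name pattern that matches table_name.

    Hypothetical sketch: one trailing '.gz' is ignored, because CDS serves
    gzipped data files while the ReadMe lists the uncompressed names.
    """
    base = table_name[:-3] if table_name.endswith(".gz") else table_name
    for pattern in readme_names:
        # fnmatch supports shell-style wildcards such as '?' and '*'.
        if fnmatch.fnmatch(base, pattern):
            return pattern
    return None


# The ReadMe lists "table?.dat"; the downloaded file is "table1.dat.gz".
print(find_matching_pattern("table1.dat.gz", ["table?.dat"]))  # 'table?.dat'
```

Only the exact ".gz" ending is handled, matching the intentionally narrow scope discussed in the review; other variants such as ".tgz" or ".GZ" are left alone and would not match.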