Read CDS tables where the data is split in separate files #18505
Thank you for your contribution to Astropy! 🌌 This checklist is meant to remind the package maintainers who will review this pull request of some common things to look for.
I think the iers test is somewhat nonstandard in that it reuses a
The last commit should take care of the issue.
pre-commit.ci autofix
pre-commit.ci autofix
CI is passing, so that part looks good, thanks! Don't have time for a full review now, but I'll try to get to it by next week.
This looks very nice!
Looks good to me.
Part of the CDS specs is that data files can be split to keep each ASCII file below 10 MB. There are few (possibly only the Tycho 2 catalog) such cases in CDS, but we want to be complete. The implementation is in the CDS reader, looking for the very specific case where a ReadMe file is given, and the catalog file is not found with the expected name. In that and only that case, the code checks if split data files exist. closes astropy#4121
tighten up regular expression
reorder imports
for more information, see https://pre-commit.ci
Good that @taldcroft found a possible problem. Reapproving (my inline comment is really only in case something else needs to be changed too).
# deal with table where the ReadMe is present, but the data is split over several data files
if self.header.readme is not None:
    path = Path(table)
    pattern = re.compile(r"\.(\d{2,3})(\.gz)?$")
Not worth re-running CI for, but if other changes are needed, I would change this to:
if f_list := sorted(Path(table).parent.glob(path.name + "*")):
    pattern = re.compile(r"\.(\d{2,3})(\.gz)?$")
    numbers = ...
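To make the pattern in the snippet above concrete, here is a minimal sketch of what it accepts and rejects. The helper name `split_part_number` and the file names are illustrative assumptions, not taken from the PR; only the regular expression itself comes from the diff.

```python
import re

# Same pattern as in the diff: a 2- or 3-digit suffix, optionally gzipped.
pattern = re.compile(r"\.(\d{2,3})(\.gz)?$")


def split_part_number(filename):
    """Return the part number encoded in a split-file name, or None.

    Hypothetical helper for illustration; not part of astropy.
    """
    match = pattern.search(filename)
    return int(match.group(1)) if match else None


# Illustrative file names (assumed, not from the PR).
names = [
    "tyc2.dat.00",     # two-digit suffix: accepted
    "tyc2.dat.01.gz",  # gzipped part: accepted
    "tyc2.dat.019",    # three-digit suffix: accepted
    "tyc2.dat",        # unsplit name: rejected
    "tyc2.dat.bak",    # non-numeric suffix: rejected
    "tyc2.dat.1",      # single digit: rejected
]
numbers = {name: split_part_number(name) for name in names}
```

Converting the captured digits to `int` lets the parts be ordered numerically rather than lexically, which matters once two- and three-digit suffixes are mixed.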
LGTM, thanks!
@dhomeier Do you want to re-review this or can it be merged? (The only test failure is an allowed failure.)
I should add: the only test failure is an allowed failure that has nothing to do with this PR (timeout of some HTTPS connection).
Description
Part of the CDS specs is that data files can be split to keep each ASCII file below 10 MB. There are few such cases in CDS (possibly only the Tycho 2 catalog), but we want to be complete.
The implementation is in the CDS reader and looks for one very specific case: a ReadMe file is given, and the catalog file is not found under the expected name. In that case, and only that case, the code checks whether split data files exist.
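The fallback described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code merged in the PR: the function name `find_data_files` and its error handling are hypothetical, and only the regular expression is taken from the diff.

```python
import re
from pathlib import Path

# Suffix pattern from the diff: ".NN" or ".NNN", optionally followed by ".gz".
SPLIT_SUFFIX = re.compile(r"\.(\d{2,3})(\.gz)?$")


def find_data_files(table):
    """Return the file(s) holding the data for `table`.

    Hypothetical helper: if the expected file exists it is returned alone;
    otherwise sibling files named "<table>.NN[.gz]" are collected and
    ordered by their numeric suffix.
    """
    path = Path(table)
    if path.is_file():
        return [path]

    parts = []
    for candidate in path.parent.glob(path.name + "*"):
        # Only the text after the expected name may carry the split suffix.
        match = SPLIT_SUFFIX.fullmatch(candidate.name[len(path.name):])
        if match:
            parts.append((int(match.group(1)), candidate))

    if not parts:
        raise FileNotFoundError(f"no data file or split parts found for {path}")
    # Sort by part number only, then drop the number.
    return [candidate for _, candidate in sorted(parts, key=lambda item: item[0])]
```

Checking for the unsplit file first keeps the common case untouched, so the directory scan only runs when the expected name is missing.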
Fixes #4121