Fix for not all input objects supporting last_line
#1222
I've converted it to draft only so I know when you're done. For the coverage, we don't have it turned on in GitHub right now, but you can get it by doing python -m pytest -v --cov=cclib --cov-report=html -k 'test_parser' test and open |
Ok, right off the bat two io tests are failing because they depend on stdin, which is disabled by pytest. The recommended way around this is to emulate it with StringIO, but the tests seem to depend on stdin's specific seek method for some reason? Not really sure what's going on, but I'll try to fix them up as best I can.
    stdin = io.StringIO(contents)
except TypeError:
    stdin = io.StringIO(unicode(contents))
stdin.seek = sys.stdin.seek
I don't understand why this is necessary for the test
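For reference, a minimal sketch of emulating stdin with io.StringIO in a test. The helper name fake_stdin is hypothetical and is not part of cclib or pytest. It only illustrates that io.StringIO already provides its own seek, so a test that needs to rewind the stream may not need to borrow sys.stdin.seek.

```python
import io
import sys
from contextlib import contextmanager


@contextmanager
def fake_stdin(contents: str):
    """Temporarily replace sys.stdin with an in-memory text stream.

    Hypothetical helper for illustration only.
    """
    original = sys.stdin
    sys.stdin = io.StringIO(contents)
    try:
        yield sys.stdin
    finally:
        # Always restore the real stdin, even if the test body raises.
        sys.stdin = original


with fake_stdin("line one\nline two\n") as stream:
    first = sys.stdin.readline()
    # io.StringIO has its own seek, so the stream can be rewound directly.
    stream.seek(0)
    everything = sys.stdin.read()
```

Whether this is enough for the failing tests depends on what they actually do with seek, which isn't clear from the snippet above.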
I've had a look-over the IO internals and there's a fair bit to do I think. We support lots of different file objects, which is great, but they all get processed at different points. Some processing is done in I think a class based approach is probably the way forward here; have one |
For (my) reference, a non-exhaustive list of the types of input we can read from at various points:
Two parsers (adf and Gaussian) call |
FYI I think some of my type annotations are wrong and shouldn't be trusted. The code will have to be read as you've done.
A bad part of this is that it isn't clear what functionality other people are even using. I doubt that anyone is passing URLs directly or using the streaming functionality (that's defeated if you're gonna call |
Yeah no problem at all, I'll go through the type annotations at the end once everything's working ok. Totally agree re. not clear what's actually being used. I presume at one point all these different inputs were being used by someone, but whether that's still the case is hard to say for sure. The weirdest one for me to understand is the unseekable stream, which I suspect is really just to support |
Ok, most of the grunt work on this is now done and it's ready for review. There are a few more things that would be nice to do at some point, but they can't be done at present without badly breaking backwards compatibility (which should be mostly maintained as is). Some things I don't like:
Also, the test is failing for some reason relating to logging that I can't fathom. Do we overload the logging module as part of CI?
Yeah agreed. Yes it's separate, but all logging done outside of the parser is now with the same Haha no problem. They have just closed the bug report though so what good it will do remains to be seen... |
This is why I'm tiring of open source. That person obviously didn't read any of what you said and assumed you were intentionally using distutils. We could propose an issue to PySCF since all their downstream projects would be affected.
I promise to give this a review by the end of my day today.
I'm not done, but here is a first batch of questions.
cclib/parser/turbomoleparser.py
Outdated
# A list of previous lines to allow look-behind functionality.
self.last_lines = collections.deque([""] * 10, 10)

def sort_input(self, file_names: list) -> list:
Putting this alongside the parser is ok, but the reason it was a bare function is that self isn't necessary. I could see it being an abstract class method in the future, for other parsers that can take multiple files like Molpro, but a top-level (private) function would work just as well.
Same comment about using list as a type annotation. (It translates to typing.List[Any], which isn't great.)
Yeah ok I see your point. My main motivation for making it a class method was to take advantage of inheritance to automatically pick the correct sorting function, seeing as how we've already got the correct parser class at this point. Should make future additions easier because there's no need to maintain a table of functions or similar.
I'll change it to a classmethod, although if we'd rather go with non-class function I'm not totally opposed either.
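To make the inheritance argument concrete, here is a rough sketch of how a classmethod lets the already-selected parser class pick its own sorting behaviour. BaseParser, OrderedOutputParser and prepare_inputs are hypothetical names for illustration and are not cclib's actual classes. The alphabetical sort is a placeholder, not the real Turbomole ordering.

```python
from typing import List, Type


class BaseParser:
    """Hypothetical stand-in for a parser base class."""

    @classmethod
    def sort_input(cls, file_names: List[str]) -> List[str]:
        # Default behaviour: keep the caller's order unchanged.
        return list(file_names)


class OrderedOutputParser(BaseParser):
    """Hypothetical multi-file parser that needs its inputs in a fixed order."""

    @classmethod
    def sort_input(cls, file_names: List[str]) -> List[str]:
        # Placeholder ordering; a real parser would apply its own rules here.
        return sorted(file_names)


def prepare_inputs(parser_class: Type[BaseParser], file_names: List[str]) -> List[str]:
    # No lookup table of sorting functions: the class itself decides.
    return parser_class.sort_input(file_names)


unsorted_names = ["b.out", "a.out"]
default_order = prepare_inputs(BaseParser, unsorted_names)
custom_order = prepare_inputs(OrderedOutputParser, unsorted_names)
```

The annotations use List[str] rather than a bare list, so a type checker knows what the elements are.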
Also, I actually rely on cclib.io.ccio.sort_turbomole_outputs() in my own code, which this change obviously breaks. For me that's not a problem, but for others who might rely on it this is a non-backwards-compatible change. Think it's worth adding back cclib.io.ccio.sort_turbomole_outputs() as an alias to the new class method (or whatever we end up going with)?
Breaking a function like that across a major or even minor (but not patch) release is ok, at least when you consider how many people were likely using it compared to ccread or ccopen.
Keeping it as an alias is ok, and if we had an official way of doing deprecation (it's never come up) we could get rid of it after 1.8.x.
But I am thinking the method is the better approach. It just isn't clear to me yet which one of instance/class/static it should be. Class is just a good compromise.
Cool, I'll add an alias and make it emit a warning on first use, that way at least the user/developer is aware they're using a deprecated function.
Class or abstract makes the most sense I think, and to me class always feels more natural than abstract...
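Roughly what I have in mind for the alias, as a sketch only. ExampleParser, deprecated_alias and sort_outputs are hypothetical names, not the final cclib API. Note that it calls warnings.warn on every call and leaves it to Python's warning filters to decide how often the message is actually displayed.

```python
import warnings
from typing import Callable, List


class ExampleParser:
    """Hypothetical parser class holding the new class method."""

    @classmethod
    def sort_input(cls, file_names: List[str]) -> List[str]:
        # Placeholder ordering for illustration.
        return sorted(file_names)


def deprecated_alias(new_func: Callable, old_name: str) -> Callable:
    """Return a wrapper that warns, then forwards to new_func."""

    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name} is deprecated; use {new_func.__qualname__} instead",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not this wrapper
        )
        return new_func(*args, **kwargs)

    return wrapper


# Old top-level name kept as a thin alias to the class method.
sort_outputs = deprecated_alias(ExampleParser.sort_input, "sort_outputs")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = sort_outputs(["b.out", "a.out"])
```

With the default filters, DeprecationWarning is only shown for code run in __main__, so downstream users may need to enable it (for example under pytest) to see the message.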
Yeah indeed, happily though they have now reopened the issue, so maybe some hope does yet remain :) Hmm yeah, I did consider opening an issue on PySCF too. The problem for them, though, is that there's no easy workaround if they need that ctypeslib function (except to manually fix the logger afterwards I guess?)
…ogger chosen, rather than relying on the name of the logging object
We can wait for changes on NumPy's end. PySCF isn't doing anything wrong by using ctypeslib. But letting them know would be good too, since I've seen that they perform workarounds for other NumPy API changes.
One last thing with log levels.
🎉
Parser now checks before accessing last_line in case it's not available, and fileinput.FileInput is now correctly wrapped with FileWrapper (so it now does have last_line).
In theory, we probably want all input types to be wrapped with FileWrapper, but I didn't make this change because 1) there's a weird amount of IO boilerplate code and I don't understand what a lot of it is doing and 2) I'm not sure how well covered by tests IO is...
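A simplified sketch of the two ideas, for illustration only. LineTrackingWrapper and describe_last_line are hypothetical names and do not reproduce cclib's actual FileWrapper. The sketch assumes last_line means the most recently read line.

```python
import io
from typing import Iterator, Optional, TextIO


class LineTrackingWrapper:
    """Hypothetical wrapper that remembers the most recently read line."""

    def __init__(self, source: TextIO) -> None:
        self._source = source
        self.last_line: Optional[str] = None

    def __iter__(self) -> Iterator[str]:
        return self

    def __next__(self) -> str:
        line = next(self._source)  # StopIteration propagates at end of input
        self.last_line = line
        return line


def describe_last_line(inputfile) -> str:
    """Check for last_line before using it, since not every input has one."""
    last = getattr(inputfile, "last_line", None)
    if last is None:
        return "<no last_line available>"
    return last.rstrip("\n")


wrapped = LineTrackingWrapper(io.StringIO("first\nsecond\n"))
next(wrapped)
with_wrapper = describe_last_line(wrapped)

plain = io.StringIO("first\nsecond\n")
next(plain)
without_wrapper = describe_last_line(plain)
```

Wrapping every input type in something like this would make the getattr check unnecessary, but that is the larger change I held back on.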