POC: add validation of formatters to converters #25662

Draft
wants to merge 1 commit into
base: main

Conversation

ksunden
Member

@ksunden ksunden commented Apr 11, 2023

PR Summary

Closes #24951

Certainly needs tests/docs prior to merge, but opening to further the conversation.

This introduces a method on Converters to validate that a given formatter works with the Converter.

Currently it is implemented as a staticmethod which returns a bool and then matplotlib warns if that bool is false.
One option that I would consider would be to push the warning into the method, which could then suggest the valid types with a more targeted error message.
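
To make the shape of the current approach concrete, here is a minimal sketch. It does not import matplotlib: every class below is a stand-in defined for illustration, and `set_formatter` here is a hypothetical free function standing in for wherever the check is actually run, not the real axis method.

```python
import warnings


class Formatter:
    """Stand-in for a tick formatter base class."""


class NullFormatter(Formatter):
    """Stand-in for a formatter that draws no labels."""


class DateFormatter(Formatter):
    """Stand-in for a date-aware formatter."""


class ScalarFormatter(Formatter):
    """Stand-in for a plain numeric formatter."""


class ConversionInterface:
    @staticmethod
    def validate_formatter(formatter):
        # Default: accept every formatter, so the check is opt-in.
        return True


class DateConverter(ConversionInterface):
    @staticmethod
    def validate_formatter(formatter):
        # NullFormatter stays valid so that labels can still be disabled.
        return isinstance(formatter, (NullFormatter, DateFormatter))


def set_formatter(converter, formatter):
    """Hypothetical setter: warn (not raise) when the check fails."""
    if converter is not None and not converter.validate_formatter(formatter):
        warnings.warn(
            f"{type(formatter).__name__} may not work with "
            f"{type(converter).__name__}"
        )
    return formatter
```

With these stand-ins, pairing `DateConverter` with `ScalarFormatter` warns, while `NullFormatter`, an unset converter, or a converter that keeps the default method stays silent.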

I chose a method so that more complicated logic could be used, but so far all instances I have implemented or considered do simple isinstance checks.
One alternative, if this were sufficient, would be to make the thing that gets propagated a list (tuple?) of Formatter classes and then do the isinstance check directly rather than calling the method.
This could potentially also have APIs for registering formatters as valid for a given converter (or just be directly mutable as a list).
The cost, though, is that you are limited to only isinstance checks there.
(Of course, with the method implementation, individual classes are free to implement such registration APIs, but that lacks standardization.)
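
The allow-list alternative could look roughly like the sketch below. All names (`valid_formatters`, `register_formatter`, `formatter_is_valid`, and the stand-in classes) are hypothetical and chosen for illustration; none of them is an existing matplotlib API.

```python
class Formatter:
    """Stand-in for a tick formatter base class."""


class NullFormatter(Formatter):
    """Stand-in for a formatter that draws no labels."""


class DateFormatter(Formatter):
    """Stand-in for a date-aware formatter."""


class CustomDateFormatter(Formatter):
    """Hypothetical third-party formatter, not a DateFormatter subclass."""


class ConversionInterface:
    # None means "no restriction": any formatter is accepted.
    valid_formatters = None

    @classmethod
    def register_formatter(cls, formatter_cls):
        """Mark formatter_cls as valid for this converter class only."""
        if cls.valid_formatters is not None:
            # Rebind on cls so parent and sibling classes are unaffected.
            cls.valid_formatters = (*cls.valid_formatters, formatter_cls)


class DateConverter(ConversionInterface):
    valid_formatters = (NullFormatter, DateFormatter)


def formatter_is_valid(converter, formatter):
    """The whole check is a single isinstance against the allow list."""
    allowed = converter.valid_formatters
    return allowed is None or isinstance(formatter, allowed)
```

This keeps the check standardized and lets users extend the list, at the price noted above: nothing more elaborate than an isinstance test can be expressed.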

If the converter is not set, it does not get checked.
Default behavior of ConversionInterface (and any subclasses which do not override the method) is to accept any formatter (i.e. return True indiscriminately).
Thus, except where a downstream project inherits from DateConverter or StrCategoryConverter, this additional check is entirely opt-in.
Converters that expect essentially just float values (and should work with many different formatters that are not all enumerated) retain default behavior (internally, the example is DecimalConverter).
It is only a warning, not an error, mostly to ensure that if the list is wrong, you can still do what you want.

One slightly awkward wrinkle is that NullFormatter does need to return True, as it is set during some setup/initialization phases (and, semantically, you should be able to use NullFormatter to disable labels anyway).

@spencerkclark I am particularly interested in your opinion here
Here is the appropriate diff for nc-time-axis that I tested with:

diff --git a/nc_time_axis/__init__.py b/nc_time_axis/__init__.py
index a5276ad..76c4d74 100644
--- a/nc_time_axis/__init__.py
+++ b/nc_time_axis/__init__.py
@@ -582,6 +582,12 @@ class NetCDFTimeConverter(mdates.DateConverter):
 
         return result
 
+    @staticmethod
+    def validate_formatter(formatter):
+        return isinstance(formatter, (mticker.NullFormatter,
+                                      AutoCFTimeFormatter,
+                                      CFTimeFormatter))
+
 
 def is_numlike(x):
     """

The growing pain here is that, without the above change, the NetCDFTimeConverter class from nc-time-axis will warn for the correct formatters and not for the mpl-default date formatters (that don't work properly).
This is because of the mdates.DateConverter inheritance that is not overridden without the above diff. As such, not sure there is any path that doesn't inherit the incorrect behavior without a change in nc-time-axis.
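
The inheritance problem described above can be reproduced with a few stand-in classes. This is only a sketch: `CFTimeFormatter` here is a local placeholder for a downstream formatter, and `InheritingConverter` / `OverridingConverter` are hypothetical names for a downstream converter before and after a change like the diff above.

```python
class Formatter:
    """Stand-in for a tick formatter base class."""


class NullFormatter(Formatter):
    """Stand-in for a formatter that draws no labels."""


class DateFormatter(Formatter):
    """Stand-in for the library's own date formatter."""


class CFTimeFormatter(Formatter):
    """Placeholder for a downstream formatter with its own time handling."""


class DateConverter:
    @staticmethod
    def validate_formatter(formatter):
        return isinstance(formatter, (NullFormatter, DateFormatter))


class InheritingConverter(DateConverter):
    """Downstream converter that does not override the check."""


class OverridingConverter(DateConverter):
    """Downstream converter that declares its own valid formatters."""

    @staticmethod
    def validate_formatter(formatter):
        return isinstance(formatter, (NullFormatter, CFTimeFormatter))
```

`InheritingConverter` silently picks up the parent's answer, so it rejects its own `CFTimeFormatter` and accepts `DateFormatter`, which is exactly backwards; only the overriding subclass gets it right.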

Since this affects downstream with warnings (not direct errors, at least), I would like to consult with at least the likes of Pandas (xarray's converting/formatting support is largely via nc-time-axis, above).
If there are other downstream projects we should consult with, happy to hear more.

I do not (yet) affirmatively test that the warning is issued, though it did fail a test which sets FixedFormatter on a date axis and already ignores a warning against using such without FixedLocator.

Questions:

  • Is the information sufficient to make determinations?
    • Would a classmethod be better, or is an instance method necessary? (considering all other functionality of ConversionInterface is staticmethod, that is where I started)
  • Would list of types be easier to manage, even if it limits flexibility?
  • Is checking in set_formatter both sufficient and not overzealous?
    • What about if the converter is changed? Should we ensure a warning then?
    • What if you want to change both? Can it be done without either warning or setting to None?

PR Checklist

Documentation and Tests

  • Has pytest style unit tests (and pytest passes)
  • Documentation is sphinx and numpydoc compliant (the docs should build without error).
  • New plotting related features are documented with examples.

Release Notes

  • New features are marked with a .. versionadded:: directive in the docstring and documented in doc/users/next_whats_new/
  • API changes are marked with a .. versionchanged:: directive in the docstring and documented in doc/api/next_api_changes/
  • Release notes conform with instructions in next_whats_new/README.rst or next_api_changes/README.rst

@rcomer
Member

rcomer commented Apr 14, 2023

suggest the valid types with a more targeted error message

I think this would be valuable: in the case of #24951, the user didn’t originally know about nc-time-axis. They were using xarray, which uses nc-time-axis under the hood. So if the message only said “not this formatter”, it would not be obvious what to do instead.

@tacaswell
Member

Does it make more sense to put the validation logic on the formatters (which make a judgement about whether they can consume values from a given converter) instead? It would mean that every converter does not have to know about NullFormatter (as it could always declare itself compatible) which I think is a good code smell.

Converters are supposed to produce some unitless value, which is fine for quantities that have a natural 0 (e.g. mass, weight, number of things, etc.), so most converters should work with most formatters. On the other hand, time is the most common thing that has no natural 0: the float that comes out of the converter always has to be relative to some point (and with some precision), which is the source of our problems. Thus, if our date formatters were very cranky about what converters they can work with, we would catch most (all) of the cftime / pandas issues. I think we can add a null return True method to the base Formatter class (and be defensive about duck-implementations).

I would also suggest letting the validate function/method raise instead of return False. That way we can leave it to the implementations to format the error messages based on what they know.
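
A rough sketch of that inverted direction, with the formatter deciding and raising: the method name `validate_converter`, the helper `check`, and all classes below are hypothetical stand-ins, not existing matplotlib APIs.

```python
class ConversionInterface:
    """Stand-in for a unit converter base class."""


class DateConverter(ConversionInterface):
    """Stand-in for the library's own date converter."""


class OtherTimeConverter(ConversionInterface):
    """Hypothetical converter using a different time reference."""


class Formatter:
    def validate_converter(self, converter):
        # Base behavior: compatible with everything, so nothing is raised.
        return None


class NullFormatter(Formatter):
    """Inherits the permissive default; converters need not know about it."""


class DateFormatter(Formatter):
    def validate_converter(self, converter):
        # The formatter owns the error message and can say what it expects.
        if converter is not None and not isinstance(converter, DateConverter):
            raise TypeError(
                f"{type(self).__name__} expects values from DateConverter, "
                f"got {type(converter).__name__}"
            )


def check(formatter, converter):
    """Run the check, tolerating duck-typed formatters without the method."""
    validate = getattr(formatter, "validate_converter", None)
    if validate is not None:
        validate(converter)
```

Note that `isinstance` on the converter still accepts subclasses of `DateConverter`, so this direction alone does not catch a downstream converter that inherits from it.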

@ksunden
Member Author

ksunden commented Apr 17, 2023

See #24951 (comment) for why I implemented it in this direction. @spencerkclark had indicated a willingness to consider dropping the inheritance afterwards, which would resolve that worry in this particular case. Without doing "no subclasses" I don't know what checks we can do that would catch the current implementation. (which, to be fair, it doesn't do the right thing without a change, as noted above)

As for raising instead of returning bool, that is fair, I was on the fence about that so just picked one to start with.

@jklymak
Member

jklymak commented Apr 17, 2023

I'm not quite following the details here. It pretty quickly got into details about how the warnings are presented etc. However, it sounds like a converter will have to whitelist a bunch of Formatters? What method does the user have to override this if they want to, for instance, write their own Formatter?

I'm tending to lean towards @tacaswell's opinion that the Formatters should decide what converter(s) they are happy to work with. But, as usual, I'm probably a couple of steps behind following what the idea is here.

@ksunden
Member Author

ksunden commented Apr 17, 2023

Essentially, the problem is that there is no good way to disallow the "broken" formatter from nc-time-axis as it is implemented from our, incompatible, converters.

The original report is someone trying to use our formatter with their converter.
Their Converter subclasses from mpl.dates.DateConverter, so isinstance of the Converter from our formatter is insufficient.

That is why I inverted the check. That said, it still does not succeed in doing what we want without a change downstream. It all kind of breaks down because whatever we implement on our converter is inherited by nc-time-axis unless overridden.
So the only check we could do would be not isinstance, but rather type(converter) == DateConverter style checks, which would totally preclude alternate converters from being considered (as subclassing won't be valid).
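
The difference between the two kinds of check is small in code but large in effect. In this sketch the classes are stand-ins and the helper names are hypothetical; the exact-type test is written with `is`, which is the usual spelling of a `type(...) ==` comparison.

```python
class DateConverter:
    """Stand-in for the library's own date converter."""


class DownstreamConverter(DateConverter):
    """Stand-in for a third-party converter that subclasses DateConverter."""


def accepts_isinstance(converter):
    # Accepts DateConverter and every subclass, compatible or not.
    return isinstance(converter, DateConverter)


def accepts_exact_type(converter):
    # Accepts only DateConverter itself; all subclasses are rejected.
    return type(converter) is DateConverter
```

The exact-type test does reject the inheriting downstream converter, but it equally rejects any well-behaved subclass, which is the trade-off described above.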

With this proposed implementation, subclassing would allow users to write their own formatters. (which, admittedly, I don't really like requiring, but I was drawing on the example I have in front of me...)
If we want to limit it to only isinstance checks, I think we could maintain an allow list and have register/deregister functions, etc.

If we can get nc-time-axis to break the inheritance of their converter (which they seemed willing to do), then yes, I think we can go back to formatters holding the allow list, with all of the "default" ones simply passing.

I don't fully know that the problem of needing to account for the null case (even if only during initialization/setting of the formatter) is actually erased in the other direction either, though... I'd have to think about it a bit more/try it out...

Development

Successfully merging this pull request may close these issues.

[Bug]: Matplotlib date formatter and cftime and pandas incompatible ?
4 participants