POC: add validation of formatters to converters #25662

ksunden · 2023-04-11T07:02:21Z

PR Summary

Certainly needs tests/docs prior to merge, but opening to further the conversation.

This introduces a method to Converters to validate that a given formatter works with with the Converter.

Currently it is implemented as a staticmethod which returns a bool and then matplotlib warns if that bool is false.
One option that I would consider would be to push the warning into the method, which could then suggest the valid types with a more targetted error message.

I chose a method so that more complicated logic could be used, but so far all instances I have implemented or considered do simple isinstance checks.
One alternative if this was sufficient would be to make the thing that gets propegated a list (tuple?) of Formatter classes and then do the isinstance check directly rather than calling the method.
This could potentially also have APIs for registering formatters as valid for a given converter (or just be directly mutable as a list).
The cost, though is that you are limited to only isinstance checks there.
(Of course with the method implementation, individual classes are free to implement such registration APIs, but lacks standardization)

If the converter is not set, it does not get checked.
Default behavior of ConversionInterface (and any subclasses which do not override the method) is to accept any formatter (i.e. return True indiscriminantly).
Thus except where downstream inherits from DateConverter or StrCategoryConverter it is entirely opt-in to this additional check.
Converters that expect essentially just float values (and should work with many different formatters that are not all enumerated) retain default behavior (internally, the example is DecimalConverter).
It is only a warning, not an error, mostly to ensure that if the list is wrong, you can still do what you want.

One slight wrinkle that is perhaps slightly awkward is that the NullFormatter does need to return True as that occurs during some setup/initialization phases (as well as just the semantics that you should be able to use NullFormatter to disable labels anyway)

@spencerkclark I am particularly interested in your opinion here
Here is the appropriate diff for nc-time-axis that I tested with:

diff --git a/nc_time_axis/__init__.py b/nc_time_axis/__init__.py
index a5276ad..76c4d74 100644
--- a/nc_time_axis/__init__.py
+++ b/nc_time_axis/__init__.py
@@ -582,6 +582,12 @@ class NetCDFTimeConverter(mdates.DateConverter):
 
         return result
 
+    @staticmethod
+    def validate_formatter(formatter):
+        return isinstance(formatter, (mticker.NullFormatter,
+                                      AutoCFTimeFormatter,
+                                      CFTimeFormatter))
+
 
 def is_numlike(x):
     """

The growing pain here is that without the above change, the NetCDFTimeConverter class from cftime will warn for the correct formatters and not the mpl-default date formatters (that don't work properly).
This is because of the mdates.DateConverter inheritance that is not overridden without the above diff. As such, not sure there is any path that doesn't inherit the incorrect behavior without a change in nc-time-axis.

Since this affects downstream with warnings (not direct errors, at least), I would like to consult with at least the likes of Pandas (xarray's converting/formatting support is largely via nc-time-axis, above).
If there are other downstream projects we should consult with, happy to hear more.

I do not (yet) affirmatively test that the warning is issued, though it did fail a test which sets FixedFormatter on a date axis and already ignores a warning against using such without FixedLocator.

Questions:

Is the information sufficient to make determinations?
- Would a classmethod be better, or is an instance method necessary? (considering all other functionality of ConversionInterface is staticmethod, that is where I started)
Would list of types be easier to manage, even if it limits flexibility?
is checking in set_formatter both sufficient and not overzealous?
- What about if the converter is changed? should we ensure warning then?
- What if you want to change both, can it be done without either warning or setting to None?

PR Checklist

Documentation and Tests

Has pytest style unit tests (and pytest passes)
Documentation is sphinx and numpydoc compliant (the docs should build without error).
New plotting related features are documented with examples.

Release Notes

New features are marked with a .. versionadded:: directive in the docstring and documented in doc/users/next_whats_new/
API changes are marked with a .. versionchanged:: directive in the docstring and documented in doc/api/next_api_changes/
Release notes conform with instructions in next_whats_new/README.rst or next_api_changes/README.rst

rcomer · 2023-04-14T19:55:43Z

suggest the valid types with a more targetted error message

I think this would be valuable: in the case of #24951, the user didn’t originally know about nc-time-axis. They were using xarray, which uses nc-time-axis under the hood. So if the message only said “not this formatter”, it would not be obvious what to do instead.

tacaswell · 2023-04-14T22:46:55Z

Does it make more sense to put validation logic on the formatters (which make a judgement about if they can consume value from a given converter) instead? It would mean that every converter does not have to know about NullFormatter (as it could always declare itself compatible) which I think is a good code smell.

Converters are supposed to go to some unitless value which for values which have a natural 0 is fine (e.g. mass, weight, number of thing, etc) so most converters should work with most formatters . On the other hand time is the most common thing that has no natural 0, the float that comes out of the converter always has to be relative to some point (and with some precision) which is the source of our problems. Thus if our date formatters were very cranky about what converters they can work with we would catch most (all) of the cftime / pandas issues. I think we can add a null return True method to the base Forammter class (and be defensive about duck-implementations).

I would also suggest letting the validate function/method raise instead of return False. That way we can defer to the implementations the ability to format the error messages based on what they know.

ksunden · 2023-04-17T17:30:45Z

See #24951 (comment) for why I implemented it in this direction. @spencerkclark had indicated a willingness to consider dropping the inheritance afterwards, which would resolve that worry in this particular case. Without doing "no subclasses" I don't know what checks we can do that would catch the current implementation. (which, to be fair, it doesn't do the right thing without a change, as noted above)

As for raising instead of returning bool, that is fair, I was on the fence about that so just picked one to start with.

jklymak · 2023-04-17T20:39:58Z

I'm not quite following the details here. It pretty quickly got into details about how the warnings are presented etc. However, it sounds like a converter will have to whitelist a bunch of Formatters? What method does the user have to override this if they want to, for instance, write their own Formatter?

I'm tending to lean towards @tacaswell opinion that the Formatters should decide what converter(s) they are happy to work with. But, as usually, I'm probably a couple of steps behind following what the idea is here.

ksunden · 2023-04-17T21:16:07Z

Essentially, the problem is that there is no good way to disallow the "broken" formatter from nc-time-axis as it is implemented from our, incompatible, converters.

The original report is someone trying to use our formatter with their converter.
Their Converter subclasses from mpl.dates.DateConverter, so isinstance of the Converter from our formatter is insufficient.

That is why I inverted the check. That said, it still does not succeed in doing what we want without a change downstream. It all kind of breaks down because whatever we implement on our converter is inherited by nc-time-axis unless overridden.
So the only check we could do would be to do not isinstance, but rather type(converter) == DateConverter style checks, which would totally preclude alternate converters from being considered... (as subclassing won't be valid)

With this proposed implementation, subclassing would allow users to write their own formatters. (which, admittedly, I don't really like requiring, but I was drawing on the example I have in front of me...)
If we want to limit it to only isinstance checks, I think we could maintain an allow list and have register/deregister functions, etc.

If we can get nc-time-axis to break the inheritance of their converter (which they seemed willing to do), then yes, I think we can go back to formatters holding the allow list, with all of the "default" ones simply passing.

I don't fully know that the problem of needing to account for the null case (even if only during initialization/setting of the formatter) is actually erased in the other direction either, though... I'd have to think about it a bit more/try it out...

POC: add validation of formatters to converters

ba0c922

spencerkclark mentioned this pull request Apr 17, 2023

No longer subclass DateConverter in NetCDFTimeConverter SciTools/nc-time-axis#127

Draft

ksunden mentioned this pull request Apr 25, 2023

POC: Add validation of converters to formatters #25766

Draft

7 tasks

jklymak mentioned this pull request Sep 30, 2023

[Bug]: Matplotlib date formatter and cftime and pandas incompatible ? #24951

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

POC: add validation of formatters to converters #25662

POC: add validation of formatters to converters #25662

Uh oh!

ksunden commented Apr 11, 2023

Uh oh!

rcomer commented Apr 14, 2023

Uh oh!

tacaswell commented Apr 14, 2023

Uh oh!

ksunden commented Apr 17, 2023

Uh oh!

jklymak commented Apr 17, 2023

Uh oh!

ksunden commented Apr 17, 2023

Uh oh!

Uh oh!

Uh oh!

POC: add validation of formatters to converters #25662

Are you sure you want to change the base?

POC: add validation of formatters to converters #25662

Uh oh!

Conversation

ksunden commented Apr 11, 2023

PR Summary

PR Checklist

Uh oh!

rcomer commented Apr 14, 2023

Uh oh!

tacaswell commented Apr 14, 2023

Uh oh!

ksunden commented Apr 17, 2023

Uh oh!

jklymak commented Apr 17, 2023

Uh oh!

ksunden commented Apr 17, 2023

Uh oh!

Uh oh!