Doc build fails with numpy>=2.1.0 #28780
It's not a proper fix for the problem, but the following patch does permit the documentation to be built with numpy 2.2 (as is needed right now in Debian). It's not obvious to me why the existing definitions of …
Minimal reproducing example (using …):

It seems the culprit is the existence of the following function in …

I suspect there's a change of precedence in how the multiplication … is preferred. TBD whether this change in numpy is intended. The other question is why the …

Edit: It must be a bit more complicated than this. The minimal example … returns a pure numpy array for numpy 2.2, 2.0 and 1.26. So there's some additional magic involved in getting …

Semi-OT: What is …?
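The inline code spans in the comments above were lost in extraction, so the repro itself isn't visible here. Based on the fix description quoted further down the thread (`scalar = np.float64(2); scalar * radians` coming back as a numpy scalar), a reproduction of roughly this shape appears to be what is being discussed; this is a hypothetical reconstruction that assumes the basic_units module from the Matplotlib units gallery is importable:

```python
# Hypothetical reconstruction of the kind of minimal repro discussed above;
# assumes galleries/units/basic_units.py is importable as `basic_units`.
import numpy as np
from basic_units import radians

scalar = np.float64(2)
result = scalar * radians

# With numpy >= 2.1 this reportedly comes back as a plain numpy scalar/array,
# whereas with older numpy (and with the eventual fix) the unit tag survives.
print(type(result))
```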
Far as I can tell, it's a toy units library: …
And, for example, the inches/centimeter conversion

```python
cm = BasicUnit('cm', 'centimeters')
inch = BasicUnit('inch', 'inches')
inch.add_conversion_factor(cm, 2.54)
cm.add_conversion_factor(inch, 1/2.54)
```

is used to support matplotlib doing the internal cm-to-inches conversion via the unit argument:

```python
# data is in centimeters
cms = cm * np.arange(0, 10, 2)

# plot centimeters
axs[0, 0].plot(cms, cms)

# matplotlib will convert the data to inches for the y axis
axs[0, 1].plot(cms, cms, xunits=cm, yunits=inch)
```

Also I totally think this should maybe be a tutorial and not a gallery entry. And as far as I can tell it is more supposed to demo the units functionality than explain how to write units. I didn't use it when I was learning the units pipeline - I think the datetime module is much easier to follow and there's also a cleaner unit implementation in the test units. Evan's test is, I think, a reasonable example and should maybe be renamed/reworked as "custom datatype" or something. ETA: also, if we want to keep the units examples working, maybe replacing …
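For orientation, a self-contained version of the snippet above could look like the following; this is a sketch that assumes the basic_units module shipping with the Matplotlib units gallery is importable, and the 2x2 figure layout is only there to match the axs indexing:

```python
# Sketch of a runnable version of the snippet quoted above; `basic_units`
# refers to galleries/units/basic_units.py in the Matplotlib repository.
import numpy as np
import matplotlib.pyplot as plt
from basic_units import cm, inch

fig, axs = plt.subplots(2, 2)

# data is in centimeters
cms = cm * np.arange(0, 10, 2)

# plot in centimeters
axs[0, 0].plot(cms, cms)

# x axis stays in centimeters, y axis is converted to inches
axs[0, 1].plot(cms, cms, xunits=cm, yunits=inch)

plt.show()
```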
Perhaps, but a little caution from the downstream perspective: when adding new build-dependencies, it is always very much appreciated if we could avoid creating circular build dependencies. They lead to a lot of extra complexity and manual steps for those who build modules and docs from source for redistribution, particularly when navigating Python interpreter and API bumps. It's obviously easier if we don't care about docs or tests, but we do :) Bootstrapping numpy2 with matplotlib/scipy/xarray/pandas/numpy build dependency cycles over the last month has been quite awkward, for example.
How would adding a units library create circular build dependencies? I'm also wondering if we could do website-only docs and dependencies, i.e. docs we only build if the dependencies are available (like we already do for tests). Mostly, it's just that we seem to have issues with basic_units breaking the build relatively frequently, and I think maintaining a mini units library should be somewhat out of scope for Matplotlib maintainers now that units libraries exist. There is also the option of pulling these examples out, since internally the units framework is mostly used for data types and not units, but it'd be nice to show that the units framework works with units libraries.
Forgive my ignorance, but why would downstream build our docs? I don't think we put much effort into thinking about anyone other than our developers building and deploying our docs, so understanding the use case would be helpful. I don't think developing a doc dependency on something like pint would be a huge problem, other than that pint is perhaps not how units will eventually be supported on arrays? pint is pretty low impact.
I think arrays need both of these things that the units framework (which I think is kinda badly named) is for: …
I suggested pint b/c I don't know the units library landscape all that well, and we almost definitely don't want to use the units from downstream libraries (metpy is using pint anyway).
Isn't that the same? Not a units expert, but aren't they wrapping the original numbers in custom datatypes?
Yeah, I was differentiating on the UX, but it's just …

But also …
I'm fine with either: a small custom class or using pint. The problem with basic_units is that it's a full custom united data type with a lot of bells and whistles - that's quite hard to understand. Either a small custom class or "well known" full units would be much less confusing.
Forgot about it b/c it's a little more complicated, but not by much: the …
But the more I think about it, the less I think it's appropriate as a gallery example b/c it gets weedy. "How to write units and converters" is a full-on tutorial topic that should probably wait on ROSES.
I'm fine with deferring everything to ROSES.
I fully agree. I've mostly stayed away from it because it's somewhat complicated and does not fit my way of thinking. Part of that may be that it's not actually about units but about the ability to pass in custom data types that can somehow be mapped to numbers.
I think we should swap in pint to not have to deal w/ this constantly breaking the docs. That lets us get away with writing a relatively small "QuantityConvertor" that demos the current mpl API, and we can leave everything else to ROSES. ETA @llimeht is there a reason pint would make things difficult for you?
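For a sense of the scale involved, here's a minimal sketch of what such a converter could look like, using pint and Matplotlib's public matplotlib.units hooks; the class name QuantityConverter and the exact handling are illustrative, not a concrete proposal:

```python
# Sketch only: a small pint-backed converter for Matplotlib's units framework.
# Assumes pint is installed; the class name is illustrative.
import matplotlib.units as munits
import pint

ureg = pint.UnitRegistry()


class QuantityConverter(munits.ConversionInterface):
    @staticmethod
    def convert(value, unit, axis):
        # Convert a pint Quantity (or a sequence of them) to plain magnitudes
        # in the unit requested for this axis.
        if isinstance(value, (list, tuple)):
            return [v.to(unit).magnitude for v in value]
        return value.to(unit).magnitude

    @staticmethod
    def axisinfo(unit, axis):
        # Label the axis with the unit it is displayed in.
        return munits.AxisInfo(label=str(unit))

    @staticmethod
    def default_units(x, axis):
        # Default to the units the data already carries, unless overridden
        # via the xunits/yunits keyword arguments.
        if isinstance(x, (list, tuple)):
            return getattr(x[0], 'units', None)
        return getattr(x, 'units', None)


# Register the converter for pint quantities.
munits.registry[ureg.Quantity] = QuantityConverter()
```

pint also documents its own ready-made Matplotlib integration, which the discussion below comes back to, so a converter like this would mainly serve as a demo of the Matplotlib side of the API.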
I think this is a classic "why/how" meld - the units interface I think originated to handle physical units, but because that's implemented as custom datatypes the implementation supports any custom datatype.
Can we please leave this as alone as we can get away with? It is complicated, but not that complicated and the integration with numpy is table stakes to actually be useful. pint (and I'll extrapolate to any unit library) brings its own complexity (e.g. it took them a while to get py313 support due to issues with frozen dataclasses) that will not actually eliminate this sort of breakage in the future.
Just to answer the two questions that were asked and not to reopen the discussion:
If you're writing docs, we assume that users would want them. We ship man pages for programs and API docs for Python modules. We not only ship …
Build-dep cycles are more problematic as soon as we're talking about an extension module (C, C++, Fortran, Cython, etc.) - which of course lots of the scientific Python ecosystem is. Navigating the circular build-deps requires additional work every time we want to update the Python interpreter version or a key dependency like numpy (where there was a substantial API change for numpy C extensions). There are, of course, well-known approaches to working through bootstrapping problems and we can follow those procedures, but it's substantial manual work, and if we can avoid it so that volunteer time is better spent, then that's a good outcome. Hope that helps explain my comment. Sorry that it blew up into a much bigger discussion!
No apologies necessary! And thanks for the thorough explanation! Since pint is demonstrating their use case in their docs (sorry I didn't look first!), can we pull the docs that rely on …? I currently don't think we're getting the benefits of showing that conversion is possible:

`plt.plot(Distance(24.0, 'meter'), xunit='inches')`

because currently the code showing how to do that is pretty over-engineered, and this could be done pretty simply if it's a reduced use case.
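A reduced version along those lines, leaning on the integration pint documents itself, could look roughly like this; it is a sketch and has not been tested against current pint or Matplotlib, though ureg.setup_matplotlib() is the hook pint's docs describe:

```python
# Sketch of a reduced units example using pint's documented Matplotlib support.
import matplotlib.pyplot as plt
import numpy as np
import pint

ureg = pint.UnitRegistry()
ureg.setup_matplotlib()  # teach Matplotlib how to handle pint quantities

# data carries units
distances = np.arange(0, 10, 2) * ureg.meter

fig, ax = plt.subplots()
ax.plot(distances, distances)

# ask Matplotlib to display the y axis in inches instead
ax.yaxis.set_units(ureg.inch)

plt.show()
```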
Closes matplotlib#28780. The underlying problem is that operations on numpy scalars try to eagerly convert the other operand to an array. As a result, `scalar = np.float64(2); scalar * radians` would result in a numpy scalar. But we don't want that. Instead we enforce `radians.__rmul__(scalar)` by giving the unit a higher `__array_priority__`. See also https://github.com/numpy/numpy/issues/17650. I haven't found any specific change notes on this in numpy 2.1. Interestingly, the full story is even more complex. Also for numpy<2.1, `radians.__rmul__(scalar)` is not called, but there seems to be another mechanism through `__array__` and `__array_wrap__` kicking back in, so that the result is again a TaggedValue. I have not fully investigated why it worked previously. In any case, we want the solution here of going through `__rmul__`, and that works for all numpy versions.
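To illustrate the mechanism described above (this is a toy class, not matplotlib's actual basic_units code), a higher `__array_priority__` makes the numpy scalar defer to `__rmul__` instead of absorbing the operand via `__array__`; as the comment notes, the exact behaviour without the priority differs across numpy versions:

```python
# Toy illustration of the mechanism described above; not the actual patch.
import numpy as np


class TaggedValue:
    # With a priority above numpy's default (0.0), numpy's scalar ops return
    # NotImplemented and Python falls back to our __rmul__ instead of eagerly
    # converting the value to an array via __array__.
    __array_priority__ = 20.0

    def __init__(self, value, unit):
        self.value = value
        self.unit = unit

    def __array__(self, dtype=None, copy=None):
        # This hook is what lets numpy absorb the value into a plain array.
        return np.asarray(self.value, dtype=dtype)

    def __rmul__(self, other):
        return TaggedValue(other * self.value, self.unit)

    def __repr__(self):
        return f"TaggedValue({self.value!r}, {self.unit!r})"


radians_like = TaggedValue(1.0, 'rad')
print(np.float64(2) * radians_like)  # TaggedValue(2.0, 'rad') - unit preserved
```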
The doc build is failing with …
We already pinned away from 2.1.0, but that does not protect against 2.1.1, which was just released.
#28779 pins to numpy 2.0. However, the above should be fixed.
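For context on why the earlier pin didn't help: excluding a single release is not the same as capping the series. The exact requirement strings used in the repo are not quoted in this thread, so the lines below are hypothetical, but they show the shape of the difference:

```
# Hypothetical requirement specifiers, not the actual entries in the repo.

# Excludes only the one known-bad release; 2.1.1 still satisfies this.
numpy!=2.1.0

# Caps the whole 2.1 series until the docs build is fixed.
numpy<2.1
```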