Doc build fails with numpy>=2.1.0 #28780

Closed
jklymak opened this issue Sep 3, 2024 · 16 comments · Fixed by #29616

@jklymak
Member

jklymak commented Sep 3, 2024

The doc build is failing with

  File "/home/circleci/project/galleries/examples/units/radian_demo.py", line 26, in <module>
    axs[0].plot(x, cos(x), xunits=radians)
                   ^^^^^^
  File "/home/circleci/project/galleries/examples/units/basic_units.py", line 382, in cos
    return [math.cos(val.convert_to(radians).get_value()) for val in x]
                     ^^^^^^^^^^^^^^
AttributeError: 'numpy.float64' object has no attribute 'convert_to'

We already pinned away from 2.1.0, but that does not protect against 2.1.1, which was just released.

#28779 pins to numpy 2.0. However, the above should be fixed.

@QuLogic QuLogic changed the title from "Doc build fails with numpy>2.1.0" to "Doc build fails with numpy>=2.1.0" on Sep 5, 2024
@llimeht

llimeht commented Jan 28, 2025

It's not a proper fix for the problem, but the following patch does permit the documentation to be built with numpy 2.2 (as is needed right now in Debian). It's not obvious to me why the existing definitions of __mul__ and __rmul__ don't cover this, but perhaps this hint will be enough for someone else!

diff --git a/galleries/examples/units/radian_demo.py b/galleries/examples/units/radian_demo.py
index 492a1e97..fb32d27f 100644
--- a/galleries/examples/units/radian_demo.py
+++ b/galleries/examples/units/radian_demo.py
@@ -19,7 +19,7 @@ from basic_units import cos, degrees, radians
 import matplotlib.pyplot as plt
 import numpy as np
 
-x = [val*radians for val in np.arange(0, 15, 0.01)]
+x = [radians * val for val in np.arange(0, 15, 0.01)]
 
 fig, axs = plt.subplots(2)

@timhoffm
Member

timhoffm commented Feb 13, 2025

Minimal reproducing example (using basic_units.py):

import numpy as np
from basic_units import radians

coeff = np.float64(0.5)

print(repr(coeff * radians))  # -> np.float64(0.5) !!!
print(repr(radians * coeff))  # -> TaggedValue(np.float64(0.5), BasicUnit(rad))

It seems the culprit is the existence of the following method in BasicUnit:

    def __array__(self, t=None, context=None, copy=False):
        print(t)
        ret = np.array(1)
        if t is not None:
            return ret.astype(t)
        else:
            return ret

I suspect numpy has changed the precedence for how the multiplication coeff * obj of a numpy scalar coeff and an arbitrary object is handled:

  • Previously, obj.__rmul__(coeff) was called.
  • Now it seems that coeff * obj.__array__(...) is preferred.

It is still to be determined whether this change in numpy is intended. The other question is why BasicUnit must be convertible to an array at all.
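A minimal, self-contained sketch of the two dispatch paths, assuming numpy 2.x. `Unit` and `Tagged` are hypothetical stand-ins for `BasicUnit` and `TaggedValue`, not the real classes from basic_units.py; `Unit` only mimics the relevant parts (`__mul__`, `__rmul__`, and an `__array__` that returns a 0-d array). Python tries the left operand's `__mul__` first, so `radians * coeff` always reaches `Unit.__mul__`, and a plain Python float gives up and falls back to `Unit.__rmul__`. A numpy scalar on the left, by contrast, handles the operation itself by converting the unit through `__array__`.

```python
import numpy as np


class Tagged:
    """Hypothetical stand-in for basic_units.TaggedValue: a value plus a unit."""

    def __init__(self, value, unit):
        self.value = value
        self.unit = unit

    def __repr__(self):
        return f"Tagged({self.value!r}, {self.unit.name})"


class Unit:
    """Hypothetical stand-in for basic_units.BasicUnit (no priority fix)."""

    def __init__(self, name):
        self.name = name

    def __mul__(self, other):
        # unit * value -> tag the value with this unit
        return Tagged(other, self)

    def __rmul__(self, other):
        # value * unit -> same result, if Python ever gets here
        return Tagged(other, self)

    def __array__(self, dtype=None, copy=None):
        # Like BasicUnit.__array__: to numpy, the unit is just the number 1
        return np.array(1, dtype=dtype)


radians = Unit("rad")
coeff = np.float64(0.5)

unit_first = radians * coeff    # Unit.__mul__ runs first
python_float = 0.5 * radians    # float.__mul__ returns NotImplemented -> Unit.__rmul__
scalar_first = coeff * radians  # the numpy scalar converts the unit via __array__

print(repr(unit_first))
print(repr(python_float))
print(repr(scalar_first))
```

The last line should print a plain numpy scalar rather than a `Tagged`, consistent with the `np.float64(0.5)` result in the reproducer above.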


Edit: It must be a bit more complicated than this. The minimal example

import numpy as np

class Vector:
    def __init__(self, components):
        self.components = components

    def __repr__(self):
        return f"Vector({self.components!r})"

    def __rmul__(self, coeff):
        return Vector([coeff * c for c in self.components])
    
    def __array__(self, dtype=None, copy=None):
        return np.array(self.components)


print(np.float64(2) * Vector([1, 2, 3]))

returns a pure numpy array for numpy 2.2, 2.0 and 1.26. So there's some additional magic involved in getting BasicUnit back in earlier versions.

Semi-OT: What is basic_units.py for? Do we need it? It was added back in 2007. It seems overly complicated for showing that matplotlib can interact with custom types. This is highly non-trivial code and it comes with no explanation at all. Do we have somebody who understands it or is willing to dig into it?

@story645
Member

story645 commented Feb 14, 2025

What is basic_units.py for? Do we need it? It was added back in 2007. It seems overly complicated for showing that matplotlib can interact with custom types.

As far as I can tell, it's a toy units library:

  • it predates unit libraries like pint
  • it's got a mini parser to support doing arbitrary assignment:
    • cm * np.arange(0, 10, 2)
    • [val*radians for val in np.arange(0, 15, 0.01)]
  • it also supports math on unitized data objects
  • BasicUnits is also trying to support semi-arbitrary unit conversion: inches/cm, radians/degrees, secs/hertz/minutes, etc.
    • some of these conversion factors can be lambdas

And BasicUnitsConvertor is also written so that the plotting function does the math to convert between units:

For example, the inches/centimeter

cm = BasicUnit('cm', 'centimeters')
inch = BasicUnit('inch', 'inches')
inch.add_conversion_factor(cm, 2.54)
cm.add_conversion_factor(inch, 1/2.54)

is used to support matplotlib doing the internal cm to inches conversion via the unit argument:

data is in centimeters

cms = cm * np.arange(0, 10, 2)

plot centimeters

axs[0, 0].plot(cms, cms)

matplotlib will convert the data to inches for the y axis:

axs[0, 1].plot(cms, cms, xunits=cm, yunits=inch)

Also, I totally think this should maybe be a tutorial and not a gallery entry. And as far as I can tell, it's meant more to demo the units functionality than to explain how to write units.

I didn't use it when I was learning the units pipeline. I think the datetime module is much easier to follow, and there's also a cleaner unit implementation in the test units. The evans test example is, I think, reasonable and should maybe be renamed/reworked as "custom datatype" or something.

Eta: also, if we want to keep the units examples working, maybe replacing basic_units.py with a doc dependency on a units library would make more sense?

@llimeht

llimeht commented Feb 14, 2025

Eta: also, if we want to keep the units examples working, maybe replacing basic_units.py with a doc dependency on a units library would make more sense?

Perhaps, but a little caution from the downstream perspective: when adding new build-dependencies, it is always very much appreciated if we could avoid creating circular build dependencies. They lead to a lot of extra complexity and manual steps for those who build modules and docs from source for redistribution, particularly when navigating Python interpreter and API bumps. It's obviously easier if we don't care about docs or tests, but we do :) Bootstrapping numpy2 with matplotlib/scipy/xarray/pandas/numpy build dependency cycles over the last month has been quite awkward, for example.

@story645
Member

it is always very much appreciated if we could avoid creating circular build dependencies

How would adding a units library create circular build dependencies?

I'm also wondering if we could have website-only docs, i.e. docs and dependencies that we only build if the dependencies are available (like we already do for tests).

Mostly, it's just that we seem to have issues with basic_units breaking the build relatively frequently and I think maintaining a mini units library should be somewhat out of scope for Matplotlib maintainers now that units libraries exist.

There is also the option of pulling these examples out since internally the units framework is mostly used for data types and not units, but it'd be nice to show that the units framework works with units libraries.

@jklymak
Member Author

jklymak commented Feb 14, 2025

They lead to a lot of extra complexity and manual steps for those who build modules and docs from source for redistribution, particularly when navigating Python interpreter and API bumps. It's obviously easier if we don't care about docs or tests, but we do :)

Forgive my ignorance, but why would downstream build our docs? I don't think we put much effort into thinking about anyone other than our developers building and deploying our docs, so understanding the use-case would be helpful.

I don't think adding a doc dependency on something like pint would be a huge problem, other than that pint is perhaps not how unit support will eventually be provided for arrays? pint is pretty low impact.

@story645
Member

story645 commented Feb 14, 2025

I don't think adding a doc dependency on something like pint would be a huge problem, other than that pint is perhaps not how unit support will eventually be provided for arrays?

I think arrays need both these things that the units framework (which I think is kinda badly named) is for:

  • custom datatypes - datetime, categoricals
  • physical units - pint, astropy.units, metpy.units

I suggested pint b/c I don't know the units library landscape all that well, and we almost definitely don't want to use the units from downstream libraries (metpy is using pint anyway).

@timhoffm
Member

I think arrays need both these things that the units framework (which I think is kinda badly named) is for:

  • custom datatypes - datetime, categoricals
  • physical units - pint, astropy.units, metpy.units

Isn't that the same? Not a units expert, but aren't they wrapping the original numbers in custom datatypes?

@story645
Member

story645 commented Feb 14, 2025

Isn't that the same? Not a units expert, but aren't they wrapping the original numbers in custom datatypes?

Yeah, I was differentiating on the UX, but it's just meters(24.0) vs 24.0 * ureg.meter. Another way to strip the toy units library out of basic_units would be to:

  1. break up each unit set into its own dataclass: distance, degrees, frequency
  2. put the conversion factors on the classes
  3. write a convertor for each class, especially since the current implementation is doing per-type dispatching anyway
    • or stash the formatter info on the data class so the convertor can be a generic QuantityConvertor
  4. do explicit object casting before the plot call so that the unit classes don't have to do any math except conversion, so something like plt.plot(Distance(24.0, 'meter'), xunits='inches'), but that would kill the demoing of it working nicely with things like astropy.units.
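The steps above might look roughly like the following minimal sketch, assuming a recent Matplotlib 3.x. `Distance`, `DistanceConverter`, the `_TO_METER` table and the unit strings are hypothetical names for illustration only; the Matplotlib pieces used are `matplotlib.units.ConversionInterface`, `AxisInfo`, `registry`, and the `xunits` keyword of `plot`. All unit math lives on the data class and the converter only dispatches to it; objects are constructed explicitly before plotting (step 4).

```python
from dataclasses import dataclass

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.units as munits
import numpy as np

# Hypothetical conversion table: how many meters one of each unit is.
_TO_METER = {"meter": 1.0, "cm": 0.01, "inches": 0.0254}


@dataclass(frozen=True)
class Distance:
    """Hypothetical unit-aware value; the conversion factors live with the class."""
    value: float
    unit: str

    def to(self, unit):
        return self.value * _TO_METER[self.unit] / _TO_METER[unit]


class DistanceConverter(munits.ConversionInterface):
    """Turns Distance objects into plain floats in the axis' unit."""

    @staticmethod
    def convert(value, unit, axis):
        # Matplotlib may pass a single Distance or an array/list of them.
        if isinstance(value, Distance):
            return value.to(unit)
        return [v.to(unit) for v in value]

    @staticmethod
    def axisinfo(unit, axis):
        # Label the axis with the unit name.
        return munits.AxisInfo(label=unit)

    @staticmethod
    def default_units(x, axis):
        # Without xunits=/yunits=, fall back to the unit of the first value.
        first = x if isinstance(x, Distance) else x[0]
        return first.unit


munits.registry[Distance] = DistanceConverter()

cms = [Distance(v, "cm") for v in (0.0, 2.54, 5.08)]
fig, ax = plt.subplots()
(line,) = ax.plot(cms, [0, 1, 2], xunits="inches")
print(line.get_xdata(orig=False))
```

Because `xunits="inches"` is passed, the centimeter values should end up at roughly 0, 1 and 2 on the x axis, and the axis label comes from `axisinfo`.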

But also pint is a relatively small and very widely used library and I like demoing that matplotlib supports both variants.

@timhoffm
Member

I'm fine with either: a small custom class or using pint. The problem with basic_units is that it's a custom, full-featured unit-aware data type with a lot of bells and whistles, which is quite hard to understand. Either a small custom class or a "well-known" full units library is much less confusing.

@story645
Member

story645 commented Feb 14, 2025

I forgot about it because it's a little more complicated, but not by much: there's also the Quantity class we've already got in the tests.

But the more I think about it, the less I think it's appropriate as a gallery example b/c it gets weedy. "How to write units and convertors" is a full on tutorial topic that should probably wait on ROSES.

@timhoffm
Member

I'm fine with deferring everything to ROSES.

units framework (which I think is kinda badly named)

I fully agree. I've mostly stayed away from it because it's somewhat complicated and does not fit my way of thinking. Part of that may be that it's not actually about units but about the ability to pass in custom data types that can somehow be mapped to numbers.

@story645
Member

story645 commented Feb 14, 2025

I'm fine with deferring everything to ROSES.

I think we should swap in pint to not have to deal w/ this constantly breaking the docs. That lets us get away with writing a relatively small "QuantityConvertor" that demos the current mpl API, and we can leave everything else to ROSES.

ETA @llimeht is there a reason pint would make things difficult for you?

Part of that may be that it's not actually about units but about the ability to pass in custom data types that can somehow be mapped to numbers.

I think this is a classic "why/how" meld - the units interface I think originated to handle physical units, but because that's implemented as custom datatypes the implementation supports any custom datatype.

@tacaswell
Member

Can we please leave this alone as much as we can get away with? It is complicated, but not that complicated, and the integration with numpy is table stakes for it to actually be useful.

pint (and I'll extrapolate to any unit library) brings its own complexity (e.g. it took them a while to get py313 support due to issues with frozen dataclasses) that will not actually eliminate this sort of breakage in the future.

@llimeht

llimeht commented Feb 15, 2025

Just to answer the two questions that were asked and not to reopen the discussion:

Forgive my ignorance, but why would downstream build our docs?

If you're writing docs, we assume that users would want them. We ship man pages for programs and API docs for python modules. We not only ship python3-matplotlib but also python-matplotlib-doc (other distributors will do the same modulo naming conventions). By doing so, users get the API docs that match their installation with no effort and can use them offline (I've used them more than once on a plane...). To be self-hosting, we have to build the docs; we often find build-failures in the docs themselves that upstreams have missed and that's one part of our contribution to the ecosystem.

ETA @llimeht is there a reason pint would make things difficult for you?

pint needs matplotlib to build its docs, so that would introduce another circular build dependency. When all the packages involved are pure Python packages we can often get away with that, as long as everyone doesn't go API-breaking-mad at the same time. If we ended up in a situation where matplotlib needed a newer pint to build than was currently available, and pint needed a newer matplotlib to build than was currently available, we would have a problem.

Build-dep cycles are more problematic as soon as we're talking about an extension module (C, C++, Fortran, cython, etc) - which of course lots of the scientific python ecosystem is. Navigating the circular build-deps requires additional work every time we want to update the Python interpreter version or a key dependency like numpy (where there was substantial API change for numpy C extensions).

There are, of course, well-known approaches to working through bootstrapping problems and we can work through those procedures, but it's substantial manual work and if we can avoid it so that volunteer time is better spent, then that's a good outcome.

Hope that helps understand my comment. Sorry that it blew up into a much bigger discussion!

@story645
Member

story645 commented Feb 16, 2025

Sorry that it blew up into a much bigger discussion!

No apologies necessary! And thanks for the thorough explanation!

Since pint demonstrates this use case in its own docs (sorry I didn't look first!), can we pull the docs that rely on basic_units and just cross-reference the pint example from the evans example?

I currently don't think we're getting the benefits of showing that conversion is possible:

plt.plot(Distance(24.0, 'meter'), xunits='inches')

because the code currently showing how to do that is pretty over-engineered, and this could be done pretty simply for a reduced use case.

@QuLogic QuLogic added this to the v3.10.1 milestone Feb 16, 2025
prafulgulani pushed a commit to prafulgulani/matplotlib that referenced this issue Feb 22, 2025
Closes matplotlib#28780.

The underlying problem is that operations on numpy scalars try to eagerly convert the other operand to an array. As a result, `scalar = np.float64(2); scalar * radians` would result in a numpy scalar. But we don't want that. Instead we enforce `radians.__rmul__(scalar)` by giving the unit a higher `__array_priority__`. See also https://github.com/numpy/numpy/issues/17650.

I haven't found any specific change notes on this in numpy 2.1. Interestingly, the full story is even more complex. Also for numpy<2.1, `radians.__rmul__(scalar)` is not called, but there seems to be another mechanism through __array__ and __array_wrap__ kicking back in so that the result is again a TaggedValue. But I have not fully investigated why it worked previously. In fact, we want the solution here of going through __rmul__, and that works for all numpy versions.
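A minimal sketch of the fix described in the commit message, using hypothetical stand-ins (`Unit`, `Tagged`) rather than the real `BasicUnit` and `TaggedValue` classes; the priority value 1000 is an arbitrary choice for this sketch, not necessarily the value used in the actual fix. With an `__array_priority__` higher than the numpy scalar's, the scalar's `__mul__` returns `NotImplemented` for the unit, and Python then falls back to `Unit.__rmul__`.

```python
import numpy as np


class Tagged:
    """Hypothetical stand-in for basic_units.TaggedValue."""

    def __init__(self, value, unit):
        self.value = value
        self.unit = unit


class Unit:
    """Hypothetical stand-in for basic_units.BasicUnit, with the fix applied."""

    # A priority above the numpy scalar's makes the scalar's binary
    # operators defer (return NotImplemented) instead of converting
    # this object through __array__.
    __array_priority__ = 1000

    def __init__(self, name):
        self.name = name

    def __mul__(self, other):
        return Tagged(other, self)

    def __rmul__(self, other):
        return Tagged(other, self)

    def __array__(self, dtype=None, copy=None):
        return np.array(1, dtype=dtype)


radians = Unit("rad")
result = np.float64(2) * radians  # now reaches Unit.__rmul__
print(type(result).__name__, result.value)
```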
