
Numpy2 breaks getting a list of supported dtypes #26778


Closed
adrinjalali opened this issue Jun 21, 2024 · 8 comments
Labels
33 - Question (Question about NumPy usage or development), component: numpy.dtype

Comments

@adrinjalali
Contributor

adrinjalali commented Jun 21, 2024

numpy2 has removed np.sctypes, which we were using to get a list of all supported dtypes.

Here's the code we were using for this:

import sys

import numpy as np

# whichmodule is vendored from Python's pickle module (it relies on
# pickle's private _getattribute helper, which is not shown here)
def whichmodule(obj, name):
    """Find the module an object belongs to."""
    module_name = getattr(obj, "__module__", None)
    if module_name is not None:
        return module_name
    # Protect the iteration by using a list copy of sys.modules against dynamic
    # modules that trigger imports of other modules upon calls to getattr.
    for module_name, module in sys.modules.copy().items():
        if (
            module_name == "__main__"
            or module_name == "__mp_main__"  # bpo-42406
            or module is None
        ):
            continue
        try:
            if _getattribute(module, name)[0] is obj:
                return module_name
        except AttributeError:
            pass
    return "__main__"


def get_type_name(t):
    """Helper function to take in a type, and return its name as a string"""
    return f"{whichmodule(t, t.__name__)}.{t.__name__}"

NUMPY_DTYPE_TYPE_NAMES = sorted(
    {
        type_name
        for dtypes in np.sctypes.values()
        for dtype in dtypes  # type: ignore
        if (type_name := get_type_name(dtype)).startswith("numpy")
    }
)

print(NUMPY_DTYPE_TYPE_NAMES)

This is now broken, and the message redirects users to np.dtypes, but I'm not sure if there's a way to get all the names of supported dtypes. The above code on my machine on numpy<2 would give this:

['numpy.clongdouble', 'numpy.complex128', 'numpy.complex64', 'numpy.float16', 'numpy.float32', 'numpy.float64', 'numpy.int16', 'numpy.int32', 'numpy.int64', 'numpy.int8', 'numpy.longdouble', 'numpy.uint16', 'numpy.uint32', 'numpy.uint64', 'numpy.uint8', 'numpy.void']

So the question is, how can I reproduce this with numpy2?

For context, skops' persistence is where it's used.

@melissawm added the 33 - Question (Question about NumPy usage or development) and component: numpy.dtype labels Jun 21, 2024
@matthew-brett
Contributor

Actually - we decided to put this back with a hard-coded scalar types list for our Nipy / Numpy 2.0 update: https://github.com/matthew-brett/nipy/blob/numpy-20-fixes/nipy/utils/__init__.py#L58

It seems we both have the same use-case - so I too would like to know how best to do this...
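For reference, a hard-coded replacement along those lines might look roughly like this (the grouping mirrors the old np.sctypes dict; the exact contents in nipy may differ):

import numpy as np

# A rough sketch of the hard-coded replacement described above; the
# grouping mirrors what np.sctypes used to return, but the exact list
# in nipy may differ.
SCTYPES = {
    "int": [np.int8, np.int16, np.int32, np.int64],
    "uint": [np.uint8, np.uint16, np.uint32, np.uint64],
    "float": [np.float16, np.float32, np.float64, np.longdouble],
    "complex": [np.complex64, np.complex128, np.clongdouble],
    "others": [bool, object, bytes, str, np.void],
}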

@ngoldbaum
Member

We had to leave in np.sctypeDict, so you can still use that to reconstruct the scalar types:

In [2]: set(list(np.sctypeDict.values()))
Out[2]:
{numpy.bool,
 numpy.bytes_,
 numpy.clongdouble,
 numpy.complex128,
 numpy.complex64,
 numpy.datetime64,
 numpy.float16,
 numpy.float32,
 numpy.float64,
 numpy.int16,
 numpy.int32,
 numpy.int64,
 numpy.int8,
 numpy.longdouble,
 numpy.longlong,
 numpy.object_,
 numpy.str_,
 numpy.timedelta64,
 numpy.uint16,
 numpy.uint32,
 numpy.uint64,
 numpy.uint8,
 numpy.ulonglong,
 numpy.void}

Note however that this can also include other globally known dtypes inserted by other packages that have been imported, see e.g. #24699.

In the long run you'll be able to do this by looking at items in the np.dtypes namespace, but that namespace is new, so you won't be able to rely on it in older versions of numpy for a few years.
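For example, on a numpy that has it, enumerating that namespace would look roughly like this (a minimal sketch, assuming numpy >= 1.25 so that np.dtypes and its __all__ listing of built-in DType classes exist):

import numpy as np

# A minimal sketch, assuming numpy >= 1.25 so that the np.dtypes namespace
# and its __all__ listing of built-in DType classes exist.
dtype_classes = [getattr(np.dtypes, name) for name in np.dtypes.__all__]
print(sorted(np.dtypes.__all__))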

@matthew-brett
Contributor

Aha - so would we get closer with something like:

[t for t in set(np.sctypeDict.values()) if t.__module__ == 'numpy']

?

In our case, we also sometimes need the builtin types (int, float, etc.) - but aren't we just going back and rewriting np.sctypes at that point? Something like the sketch below is what I have in mind.
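For illustration, combining the filter above with the builtins might look like this (the exact set of builtins to add back is an assumption):

import numpy as np

# A sketch combining the module filter above with Python builtins; the
# exact set of builtins to include is illustrative only.
numpy_scalar_types = {t for t in np.sctypeDict.values() if t.__module__ == "numpy"}
supported_types = numpy_scalar_types | {bool, int, float, complex, bytes, str}
print(sorted(t.__name__ for t in supported_types))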

@ngoldbaum
Member

We could bring it back, but one of the goals of the API cleanup work (see NEP 52) was to reduce the number of cryptic ways to access or refer to dtypes.

One issue is that sctypes is the list of numpy scalar types, not the list of dtypes. Sometimes that distinction doesn't matter, sometimes it does.

Another problem with expecting NumPy to have a statically specified list of DTypes that it supports is that it makes your code brittle when NumPy adds new dtypes in the future or when users pass in custom dtypes.

I'll also let @rgommers and @seberg chime in since I'm sure they have opinions.

@rgommers
Member

I don't have much to add yet. In general we cleaned up a lot of duplicate ways of doing and accessing things. It seems like what you want is still available, and more complete, through np.sctypeDict.values(). So use that?

The if t.__module__ == 'numpy' check seems conceptually wrong, by the way - if you want dtypes, you normally want them all, I'd think. E.g., if a bfloat16 is registered and users can create an ndarray with that dtype, that either is irrelevant to your use case or you need to support it - but I don't see a reason to filter out bfloat16 explicitly.

@adrinjalali
Contributor Author

So at least for my use case, bfloat16 is exactly the kind of thing we'd need to exclude: it's not about loading/creating arrays with that type, but rather about not warning users if the data they're loading contains numpy scalar types; we still do want to raise if there are non-numpy types in the data to be loaded.

So np.sctypeDict.values() does indeed solve the issue (with the added module check).
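Concretely, the original snippet then boils down to something like this (a sketch of the described fix, not the exact skops code):

import numpy as np

# A sketch of the fix described above: rebuild the sorted name list from
# np.sctypeDict instead of the removed np.sctypes, keeping only the types
# defined in numpy itself.
NUMPY_DTYPE_TYPE_NAMES = sorted(
    {
        f"{t.__module__}.{t.__name__}"
        for t in set(np.sctypeDict.values())
        if t.__module__ == "numpy"
    }
)
print(NUMPY_DTYPE_TYPE_NAMES)

Note that this yields a few extra entries compared to the old np.sctypes-based list (e.g. numpy.bool, numpy.datetime64, numpy.str_).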

Thank y'all.

@matthew-brett
Contributor

Just to give an idea of our use-case: https://github.com/matthew-brett/nipy/blob/numpy-20-fixes/nipy/core/reference/coordinate_system.py#L119 - we check whether the dtype is among the dtypes known to be supported.
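A minimal sketch of that kind of membership check (the allowed set below is illustrative; the real list is hard-coded in nipy):

import numpy as np

# A minimal sketch of the check described above; the allowed set is
# illustrative, nipy hard-codes its own list of supported dtypes.
ALLOWED_DTYPES = {
    np.dtype(t)
    for t in (np.int8, np.int16, np.int32, np.int64, np.float32, np.float64)
}

def check_coord_dtype(dtype):
    dtype = np.dtype(dtype)
    if dtype not in ALLOWED_DTYPES:
        raise ValueError(f"dtype {dtype} is not among the supported dtypes")
    return dtype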

@rgommers
Member

Just to give an idea of our use-case: https://github.com/matthew-brett/nipy/blob/numpy-20-fixes/nipy/core/reference/coordinate_system.py#L119 - we check whether the dtype is among the dtypes known to be supported.

Seems like you want to allow some subset of all built-in numpy dtypes. So hardcoding them as done at https://github.com/matthew-brett/nipy/blob/66e5c524cd33a348f0ff8fd9a10f42f063112569/nipy/utils/__init__.py#L58-L62 seems correct. numpy.sctypes never gave you the guarantee that you seem to want/need, I think. E.g., you want floating-point dtypes here up to float64 but not the long double ones.

SimonSegerblomRex added a commit to SimonSegerblomRex/pylibtiff that referenced this issue Jun 28, 2024