Quick sketch of bfloat16 support #1
Conversation
if TYPE_CHECKING:
    from collections.abc import Awaitable, Callable, Iterator


try:
Minor query
I wasn't sure where to import this. Given that it hooks into numpy to extend numpy's dtype support, the import probably belongs in the same location where numpy is first imported by zarr.
Where is numpy first imported? 😓
good question, I have no idea.
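For what it's worth, the import mainly needs to happen before zarr first resolves dtype names: importing ml_dtypes registers its types with numpy as a side effect. A minimal sketch, assuming ml_dtypes is installed:

```python
import numpy as np
import ml_dtypes  # noqa: F401 -- registration with numpy happens on import

# Once ml_dtypes has been imported anywhere in the process, numpy can
# resolve the new dtypes by name.
bf16 = np.dtype("bfloat16")
print(bf16.name, bf16.itemsize)  # "bfloat16", 2 bytes
```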
    DataType.float16: "f2",
    DataType.float32: "f4",
    DataType.float64: "f8",
    DataType.bfloat16: "bfloat16",
This will require some input.
The numpy kind codes are not extensible. In particular, many of the dtypes added by ml_dtypes will not have unique kind codes and/or codes that numpy can interpret. See jax-ml/ml_dtypes#41 for an MRE.
Is there any special need to tie the logic here to numpy kind codes? As an example, numpy will also recognise np.dtype('int16') in the same way as np.dtype('i2').
I think this could be the biggest decision required before support for other dtypes is possible.
I don't think we are tied to the numpy kind codes here. That's a numpy thing, not a zarr thing. We just need to ensure that the zarr dtypes can be unambiguously interpreted by zarr-python to make numpy / cupy arrays that can represent the underlying data. But anchoring our string representations to numpy is useful, because it ties us to something ~standard-ish. So I agree that we should tread carefully here, and this will definitely require input from the broader community... but that shouldn't stop the implementation.
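To make the kind-code concern above concrete, here is a quick inspection sketch (assuming ml_dtypes is installed; the exact kind/str values are whatever numpy reports for the registered type):

```python
import numpy as np
import ml_dtypes

# Names and aliases resolve consistently: 'int16' and 'i2' are the same dtype.
assert np.dtype("int16") == np.dtype("i2")

# The ml_dtypes additions fall outside numpy's fixed kind table, so .kind
# and .str are not reliable discriminators for them; .name is.
for dt in (np.dtype("float16"), np.dtype(ml_dtypes.bfloat16)):
    print(dt.name, dt.kind, dt.str)
```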
gpu = [
    "cupy-cuda12x",
]
ml-dtypes = [
A couple of comments.
ml_dtypes makes a bunch of extra types available: https://github.com/jax-ml/ml_dtypes#ml_dtypes.
I think some of these like bfloat16 are already widely used and probably fine to adopt immediately.
Others like int2 and int4 are subject to certain limitations with numpy. Details are available in the README, but roughly, some or all of the bits are padded with zeros to make them a byte in memory. In practice, I don't think this will be a useful representation for a decent chunk of the community, and so I wouldn't necessarily want to make it the default representation.
Should just a subset of these dtypes be integrated in zarr-python?
Although we are pretty numpy-centric, zarr-python has been designed to be device agnostic: we have infrastructure in place for returning gpu-backed arrays via cupy, for example. So if these narrow ints are represented sub-optimally in numpy, but they have a performant representation on the gpu, I think that's actually OK for us. In other words, "kind of useless in CPU memory, but correct" seems OK to me. We just need to make sure that the stored representation (what zarr-python stores) is what people will expect. So I would say let's be greedy and try to support everything.
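To make the padding point above concrete, a sketch (assuming a recent ml_dtypes that exposes int4):

```python
import numpy as np
import ml_dtypes

# int4 values occupy a full byte each in numpy memory (padded, not packed),
# which is why the in-memory CPU representation is wasteful.
arr = np.zeros(8, dtype=ml_dtypes.int4)
print(arr.dtype.name, arr.dtype.itemsize, arr.nbytes)  # each element is 1 byte
```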
| "<c16": "complex128", | ||
| } | ||
| return DataType[dtype_to_data_type[dtype.str]] | ||
| elif 'v' not in dtype.str.lower(): |
Same as previous comment.
This is a very ugly proxy for the new dtypes. If we weren't tied to the numpy kind codes, then I think all of this would be a lot cleaner.
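One hypothetical alternative to scanning dtype.str for 'v' would be an explicit allow-list keyed on dtype names (a sketch, not the PR's implementation; the set contents are an assumption):

```python
import numpy as np
import ml_dtypes

# Explicit allow-list of the ml_dtypes names zarr chooses to support;
# extend as further dtypes are adopted.
_ML_DTYPE_NAMES = {"bfloat16"}

def is_extended_dtype(dtype: np.dtype) -> bool:
    """Return True if ``dtype`` is one of the explicitly supported ml_dtypes."""
    return dtype.name in _ML_DTYPE_NAMES

print(is_extended_dtype(np.dtype(ml_dtypes.bfloat16)))  # True
print(is_extended_dtype(np.dtype("float32")))           # False
```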
| "<c16": "complex128", | ||
| } | ||
| return DataType[dtype_to_data_type[dtype.str]] | ||
| return DataType[dtype.name] |
Since the new dtypes don't have unique kind codes, I am currently using their name (which numpy recognises) to instantiate the dtype.
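A small sketch of that name-based lookup (DataType here is a minimal stand-in for the enum touched in this diff, with the bfloat16 member assumed from this PR):

```python
import enum
import numpy as np
import ml_dtypes

# Minimal stand-in for zarr-python's DataType enum, for illustration only.
class DataType(enum.Enum):
    float16 = "float16"
    bfloat16 = "bfloat16"

dt = np.dtype(ml_dtypes.bfloat16)
# dt.str has no unique kind code, but dt.name is "bfloat16", which matches
# the enum member name, so lookup by name works.
print(DataType[dt.name])  # -> DataType.bfloat16
```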
[Description of PR]
TODO: