
@bmaranville
Member

This PR changes:

  • in metadata for Dataset, Attribute, RegionReference
    • shape: number[] => shape: bigint[]
    • maxshape: number[] => maxshape: bigint[]
    • total_size: number => total_size: bigint
  • introduces a new function, check_malloc(nbytes: number | bigint): number, that returns a pointer after checking
    • that it is not requesting more than the maximum memory available in the heap (2GB)
    • that the allocation was successful (malloc returns 0 if it fails)

Both of these changes should help when interacting with large datasets (> 2GB), and should address #111

You still won't be able to read such a dataset directly into memory without slicing: you'll run into the memory limit, though hopefully now with a helpful error message. Slicing such large datasets should work now, however. Before these changes it seems it would have been difficult or impossible to address dataset regions with offsets > 2GB in the slice function, e.g. calculating the offset and strides might have failed.
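The two checks described for check_malloc can be sketched roughly as follows. This is only an illustration of the logic in Python, not the code in this PR: `MAX_HEAP_BYTES`, `fake_malloc` and the error messages are hypothetical stand-ins, and the real function wraps the WebAssembly allocator.

```python
# Sketch only: MAX_HEAP_BYTES and fake_malloc are hypothetical stand-ins,
# not names taken from the PR.
MAX_HEAP_BYTES = 2 * 1024**3  # the 2GB heap limit mentioned above


def fake_malloc(nbytes: int) -> int:
    """Stand-in allocator: returns a nonzero 'pointer', or 0 on failure."""
    # Pretend that only 1 MiB is currently free.
    return 0 if nbytes > 1024**2 else 4096


def check_malloc(nbytes: int, malloc=fake_malloc) -> int:
    """Return a pointer, raising a descriptive error instead of failing silently."""
    # Check 1: refuse requests larger than the heap can ever hold.
    if nbytes > MAX_HEAP_BYTES:
        raise MemoryError(
            f"Requested {nbytes} bytes, more than the {MAX_HEAP_BYTES}-byte "
            "heap limit; try reading a slice instead"
        )
    ptr = malloc(nbytes)
    # Check 2: malloc signals failure by returning 0.
    if ptr == 0:
        raise MemoryError(f"Allocation of {nbytes} bytes failed")
    return ptr
```

With this shape, a request for a whole 4 GB dataset fails early with a readable message, while a request that is within the limit but cannot be satisfied fails at the second check.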

@bmaranville bmaranville requested a review from axelboc December 16, 2025 16:39
@bmaranville
Member Author

I made a big file to test slicing like this:

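import numpy as np
import h5py

N = 1000
# Start from zeros: np.empty leaves the array uninitialized,
# so the += operations below would add to arbitrary values.
data = np.zeros((N, N, N), dtype="float32")
x = np.arange(N, dtype="float32")
y = x * N
z = x / N

# Broadcast a different ramp along each axis so every element is identifiable
data += z[None, None, :]
data += y[None, :, None]
data += x[:, None, None]

# N**3 float32 values = 4e9 bytes of raw data
with h5py.File("big.h5", "w") as output:
    output.create_dataset("data", data=data)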

The resulting file is 3.8 GiB, and I am able to slice it without problems. I think slicing would have worked fine for very large files even without this change, since JavaScript's Number.MAX_SAFE_INTEGER is already huge ($2^{53} - 1$).
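As a rough sanity check of that headroom, the arithmetic can be reproduced in Python, whose `float` is an IEEE 754 double like a JavaScript Number. The variable names here are illustrative only, and the shape is the one from the test file above.

```python
# Same value as JavaScript's Number.MAX_SAFE_INTEGER
MAX_SAFE_INTEGER = 2**53 - 1

shape = (1000, 1000, 1000)  # the test dataset above
itemsize = 4                # bytes per float32 element
total_bytes = shape[0] * shape[1] * shape[2] * itemsize

# The byte count survives a round trip through a double unchanged.
round_trips = int(float(total_bytes)) == total_bytes

# How many datasets of this size would fit below the safe-integer limit.
headroom = MAX_SAFE_INTEGER // total_bytes

# Above 2**53 neighbouring integers collapse onto the same double,
# which is where a plain Number would stop being exact.
collapses = float(2**53) == float(2**53 + 1)

print(total_bytes, round_trips, headroom, collapses)
```

Sizes and offsets only become inexact beyond $2^{53}$ bytes (about 8 PiB), far beyond anything this test file approaches.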

We could probably use the plain JavaScript Number type for shape, maxshape and total_size. I'm therefore considering dropping this PR in favor of a simpler one that converts the 64-bit integer shape values on the C side directly to JS Number values in the metadata, without first truncating them to a 32-bit C int (that truncation was the source of the overflow in #111 in the first place).
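To illustrate the difference between the two conversion paths, here is a small Python sketch. The helper `to_int32` and the example dimension are hypothetical (the exact values involved in #111 are not reproduced here); the helper only emulates what a narrowing cast to a 32-bit signed C int does to a 64-bit value.

```python
def to_int32(value: int) -> int:
    """Emulate truncating an integer to a 32-bit signed C int (two's complement)."""
    return (value + 2**31) % 2**32 - 2**31


def to_js_number(value: int) -> float:
    """Emulate converting a 64-bit integer to a JS Number (an IEEE 754 double)."""
    return float(value)


# Hypothetical dataset shape with one dimension above 2**31 - 1
shape64 = (3_000_000_000, 2)

truncated = tuple(to_int32(d) for d in shape64)     # path with the 32-bit cast
as_number = tuple(to_js_number(d) for d in shape64)  # path proposed above

print(truncated)  # first dimension wraps to a negative value
print(as_number)  # both dimensions preserved exactly
```

Skipping the 32-bit step keeps every dimension exact as long as it stays below $2^{53}$, which is why a plain Number may be enough here.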
