BUG: Array copy does not zero out gaps in structured dtypes #10789

@ikehall


I recently updated from numpy 1.11.0 to 1.14.0, and some of my code immediately broke.

You can reproduce the broken bits with the following:

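import numpy as np

# Two fields at fixed offsets inside a 256-byte record; all other bytes are gaps.
my_metadata_dtype = np.dtype({'names': ['Thing1', 'Thing2'],
                              'offsets': [22, 40],
                              'formats': ['>u2', '>i4'],
                              'itemsize': 256})
# Five rows of 128 big-endian int16 values, i.e. 256 bytes per row.
list_of_arrays = [np.arange(i, i+128, dtype='>i2') for i in range(5)]
# Construction 1: pass the structured dtype directly.
array_of_metadata = np.array(list_of_arrays, dtype=my_metadata_dtype)
# Construction 2: build the plain array, then reinterpret its bytes.
other_array_of_metadata = np.array(list_of_arrays)
other_array_of_metadata.dtype = my_metadata_dtype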

One would expect array_of_metadata and other_array_of_metadata to be the same thing.
They are not. I submit that the line creating array_of_metadata is buggy. I believe that in previous versions it produced the same result as the lines that create other_array_of_metadata.
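If the intent is to reinterpret each 256-byte row as one record, `ndarray.view` states that explicitly instead of relying on how `np.array(..., dtype=...)` converts integers to a structured dtype. A minimal sketch, reusing the dtype from above; the names `rows` and `records` are illustrative, and `dtype='>i2'` is passed explicitly here as an assumption so that the byte order of the stacked rows is not left to type promotion:

```python
import numpy as np

my_metadata_dtype = np.dtype({'names': ['Thing1', 'Thing2'],
                              'offsets': [22, 40],
                              'formats': ['>u2', '>i4'],
                              'itemsize': 256})
list_of_arrays = [np.arange(i, i + 128, dtype='>i2') for i in range(5)]

# Stack the rows with a pinned big-endian dtype: shape (5, 128), 256 bytes per row.
rows = np.array(list_of_arrays, dtype='>i2')

# Reinterpret the same memory as 256-byte records: shape (5, 1), no copy made.
records = rows.view(my_metadata_dtype)

print(records.shape)
# 'Thing1' sits at byte offset 22, which is int16 element 11 of the row.
print(records['Thing1'][2, 0], rows[2, 11])
```

Because `records` is a view, the bytes outside the two fields are still the original `arange` values in `rows`.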

But that wasn't even the primary bug I was going to talk about. I discovered that one while hunting down this one.

Take our other_array_of_metadata and make a new array from a subset of it (a copy, not just a view):

sub_array = np.array(other_array_of_metadata[2:4])
sub_array.dtype = '>i2'

You will see that sub_array now has strange things in the bytes that were not exposed by our dtype, not the nice aranges we had constructed before. In 1.11, this code would have preserved the data that was not exposed. Now it appears that the unexposed portions of sub_array are never initialized to anything. This matters because sometimes we want to expose only a part of our metadata to our code, but when we write the metadata back out to disk, we want all of it preserved.
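One way to check whether a given copy kept the unexposed bytes is to compare the raw memory of the source and the copy. The sketch below is only a diagnostic; `raw_bytes` and `gaps_preserved` are hypothetical helper names, and the printed result depends on the NumPy version in use:

```python
import numpy as np

def raw_bytes(arr):
    """Return the underlying bytes of a C-contiguous array as a 1-D uint8 view."""
    if not arr.flags.c_contiguous:
        raise ValueError("expected a C-contiguous array")
    return arr.reshape(-1).view(np.uint8)

def gaps_preserved(original, copied):
    """True if both arrays hold identical bytes, gaps between fields included."""
    return np.array_equal(raw_bytes(original), raw_bytes(copied))

dt = np.dtype({'names': ['Thing1', 'Thing2'],
               'offsets': [22, 40],
               'formats': ['>u2', '>i4'],
               'itemsize': 256})
source = np.arange(5 * 128, dtype='>i2').reshape(5, 128).view(dt)
window = source[2:4]

candidate = np.array(window)  # the copy under test
print(gaps_preserved(window, candidate))  # version-dependent result
```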

I can understand making a design choice to not initialize the unexposed portions of a dtype when constructing new elements from literals or from scratch. But when you already have an element of this dtype, ignoring the empty space seems like an inferior choice.
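Until the copy semantics are settled, one possible workaround is to copy the records as plain bytes and reinterpret the result, so the gaps never pass through a field-by-field copy. This is a sketch under the assumption that the input is C-contiguous; `copy_with_padding` is a hypothetical helper, not a NumPy function:

```python
import numpy as np

def copy_with_padding(arr):
    """Copy a C-contiguous structured array byte for byte, gaps included."""
    if not arr.flags.c_contiguous:
        raise ValueError("expected a C-contiguous array")
    # View the records as raw bytes, copy those, then restore dtype and shape.
    raw = arr.reshape(-1).view(np.uint8).copy()
    return raw.view(arr.dtype).reshape(arr.shape)

dt = np.dtype({'names': ['Thing1', 'Thing2'],
               'offsets': [22, 40],
               'formats': ['>u2', '>i4'],
               'itemsize': 256})
rows = np.arange(5 * 128, dtype='>i2').reshape(5, 128)
records = rows.view(dt)

sub_array = copy_with_padding(records[2:4])

# Reinterpreting the copy as int16 gives back the original rows, gaps and all.
print(np.array_equal(sub_array.view('>i2'), rows[2:4]))
```

The copy owns its memory, so later writes to `sub_array` do not touch `records`.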
