ENH: add back the multifield copy->view change #12447

Merged
merged 1 commit into from
Dec 1, 2018
21 changes: 21 additions & 0 deletions doc/release/1.16.0-notes.rst
@@ -67,6 +67,10 @@ Expired deprecations
 * ``np.lib.function_base.unique`` was removed, finishing a deprecation cycle
   begun in NumPy 1.4. Use `numpy.unique` instead.
 
+* multi-field indexing now returns views instead of copies, finishing a
+  deprecation cycle begun in NumPy 1.7. The change was previously attempted in
+  NumPy 1.14 but reverted until now.
+
 Compatibility notes
 ===================
 
@@ -113,6 +117,23 @@ Previously, only the ``dims`` keyword argument was accepted
 for specification of the shape of the array to be used
 for unraveling. ``dims`` remains supported, but is now deprecated.
 
+multi-field views return a view instead of a copy
+-------------------------------------------------
+Indexing a structured array with multiple fields, e.g.,
+``arr[['f1', 'f3']]``, returns a view into the original array instead of a
+copy. The returned view will often have extra padding bytes corresponding to
+intervening fields in the original array, unlike before, which will
+affect code such as ``arr[['f1', 'f3']].view('float64')``. This change has
+been planned since numpy 1.7 and such operations have emitted
+``FutureWarnings`` since then and more since 1.12.
+
+To help users update their code to account for these changes, a number of
+functions have been added to the ``numpy.lib.recfunctions`` module which
+safely allow such operations. For instance, the code above can be replaced
+with ``structured_to_unstructured(arr[['f1', 'f3']], dtype='float64')``.

Member

It would be neat if it did produce a view, but right now this produces a copy not a view, right?

Member Author

This particular example produces a copy, because even before this PR a copy was unavoidable.

However, it was actually your suggestion to rework the code so that structured_to_unstructured returns a view in many cases. For instance, structured_to_unstructured(arr[['f1', 'f2', 'f3']], dtype='float64') actually makes no copies.
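
A minimal sketch of the two calls discussed in this thread, assuming NumPy 1.16 or later and a hypothetical three-field float64 array named `arr`. It prints what `np.shares_memory` reports rather than asserting it, because whether a view or a copy comes back depends on the NumPy version:

```python
import numpy as np
from numpy.lib.recfunctions import structured_to_unstructured

# Hypothetical example array: three packed float64 fields.
arr = np.zeros(3, dtype=[('f1', 'f8'), ('f2', 'f8'), ('f3', 'f8')])
arr['f1'] = 1.0
arr['f2'] = 2.0
arr['f3'] = 3.0

# Two non-adjacent fields: the multi-field index carries padding for 'f2'.
two = structured_to_unstructured(arr[['f1', 'f3']], dtype='float64')

# All three fields: no padding between the selected fields.
three = structured_to_unstructured(arr[['f1', 'f2', 'f3']], dtype='float64')

print(two.shape, three.shape)
# Report, without assuming the answer, whether each result aliases arr.
print(np.shares_memory(two, arr), np.shares_memory(three, arr))
```

Either way the values are the same; only the aliasing behaviour differs, which matters to code that writes into the result.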

Member Author

This sentence should be reworked to avoid the word view though...

Member

> for instance structured_to_unstructured(arr[['f1', 'f2', 'f3']], dtype='float64') actually makes no copies.

Is that a statement about what actually happens in master, or just what I was suggesting?

Member @eric-wieser, Nov 25, 2018

I was proposing something stronger, where the following would also produce a view:

  • structured_to_unstructured(arr[['f1', 'f3']], dtype='float64')
  • structured_to_unstructured(arr[['f3', 'f1']], dtype='float64')
  • structured_to_unstructured(arr[['f3', 'f2', 'f1']], dtype='float64')

Member Author @ahaldane, Nov 25, 2018

For clarity, let's say we're dealing with the array arr = np.ones(3, dtype='f8,f8,f8'). Then arr[['f0', 'f2']] returns a 24-byte structured array where each element is organized as FxF where F is 8 bytes of float memory, and x is 8 bytes of padding. The 3-element array as a whole is FxFFxFFxF (72 bytes).

Is it possible to view this as an unstructured float64 array with the right stride? I suppose this particular 3-field array might be viewed with shape (3, 2) and stride (24, 16). np.ndarray((3,2), strides=(24, 16), dtype='f8', buffer=arr) works.

But it seems it would be quite an involved computation to determine more generally whether appropriate strides exist, given arbitrary field offsets. For three fields it is always possible, but with 4 fields it is not, e.g. np.ones(3, 'f8,f8,f8,f8')[['f0', 'f1', 'f3']] cannot be viewed as an unstructured array.

My feeling is that it's too difficult to do the calculation in general; it needs some complex gcd computation.
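
The layout described in this comment can be inspected directly. The following is a sketch only, assuming NumPy 1.16 or later (where multi-field indexing keeps the original field offsets); the variable names are illustrative:

```python
import numpy as np

arr = np.ones(3, dtype='f8,f8,f8')
arr['f2'] = 2.0  # make the two selected fields distinguishable

# Multi-field index: each 24-byte element is laid out as F x F.
sub = arr[['f0', 'f2']]
sub_offsets = [sub.dtype.fields[name][1] for name in sub.dtype.names]
print(sub.dtype.itemsize, sub_offsets)

# The same memory seen as an unstructured float64 array, using the shape
# and strides quoted in the comment above.
strided = np.ndarray((3, 2), strides=(24, 16), dtype='f8', buffer=arr)
print(strided)

# Four fields, selecting f0, f1, f3: the offsets are not evenly spaced,
# so no single stride along the field axis can describe them.
four = np.ones(3, dtype='f8,f8,f8,f8')[['f0', 'f1', 'f3']]
four_offsets = [four.dtype.fields[name][1] for name in four.dtype.names]
print(four_offsets)
```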

Member @eric-wieser, Nov 25, 2018

> For three fields it is always possible

Not if they're not contiguous. You're right that it's only possible in some special cases, but reshape can also only avoid copies in some special cases, and we decided that was worth doing, so it might be worthwhile here too.

> But it seems it would be quite an involved computation to determine more generally whether appropriate strides exist, given arbitrary field offsets

The computation is straightforward - pseudo-code:

if len(offsets) == 0:
    stride = 0
    offset = 0
elif len(offsets) == 1:
    stride = 0
    offset = offsets[0]
else:
    offset = offsets[0]
    stride = offsets[1] - offsets[0]
    for i, o in enumerate(offsets[2:],2):
        if o != offset + stride * i:
            raise NotViewable
return offset, stride

Which should give enough information to be passed to as_strided somehow
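
A runnable sketch of the pseudo-code above, wired up to `as_strided`. The names `NotViewable` and `field_offset_and_stride` are hypothetical, and the sketch assumes NumPy 1.16 or later, that every selected field is float64, and that the first offset is a multiple of 8 bytes. It is not what `structured_to_unstructured` does in this PR:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


class NotViewable(Exception):
    """Hypothetical error for field offsets that are not evenly spaced."""


def field_offset_and_stride(offsets):
    """Return (offset, stride) if the offsets form an arithmetic sequence."""
    if len(offsets) == 0:
        return 0, 0
    if len(offsets) == 1:
        return offsets[0], 0
    offset = offsets[0]
    stride = offsets[1] - offsets[0]
    for i, o in enumerate(offsets[2:], 2):
        if o != offset + stride * i:
            raise NotViewable(offsets)
    return offset, stride


# Four packed float64 fields per record: (0,1,2,3), (4,5,6,7), (8,9,10,11).
flat = np.arange(12, dtype='f8')
arr = flat.view('f8,f8,f8,f8')
sub = arr[['f1', 'f3']]

offsets = [sub.dtype.fields[name][1] for name in sub.dtype.names]
offset, stride = field_offset_and_stride(offsets)

# Skip to the first selected field, then step one record per row and
# one field stride per column.
view = as_strided(flat[offset // flat.itemsize:],
                  shape=(len(sub), len(offsets)),
                  strides=(sub.strides[0], stride))
print(view)
```

A real implementation would also have to check that all selected fields share one dtype and handle nested or subarray fields, which this sketch ignores.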

Member Author

That should work, indeed. I would have to mostly rewrite structured_to_unstructured to account for it, and the cost/benefit ratio doesn't seem good enough to me to justify the effort. I'd rather leave that to a separate PR if it's desired.

Member

As long as the contract of structured_to_unstructured allows us to make that change in a later release, I'm fine with not doing it now

+See the "accessing multiple fields" section of the
+`user guide <https://docs.scipy.org/doc/numpy/user/basics.rec.html>`__.
+
 
 C API changes
 =============
8 changes: 3 additions & 5 deletions numpy/core/src/multiarray/arrayobject.c
@@ -656,11 +656,9 @@ array_might_be_written(PyArrayObject *obj)
 {
     const char *msg =
         "Numpy has detected that you (may be) writing to an array returned\n"
-        "by numpy.diagonal or by selecting multiple fields in a structured\n"
-        "array. This code will likely break in a future numpy release --\n"
-        "see numpy.diagonal or arrays.indexing reference docs for details.\n"
-        "The quick fix is to make an explicit copy (e.g., do\n"
-        "arr.diagonal().copy() or arr[['f0','f1']].copy()).";
+        "by numpy.diagonal. This code will likely break in a future numpy\n"
+        "release -- see numpy.diagonal docs for details. The quick fix is\n"
+        "to make an explicit copy (e.g., do arr.diagonal().copy()).";
     if (PyArray_FLAGS(obj) & NPY_ARRAY_WARN_ON_WRITE) {
         /* 2012-07-17, 1.7 */
         if (DEPRECATE_FUTUREWARNING(msg) < 0) {
16 changes: 0 additions & 16 deletions numpy/core/src/multiarray/convert.c
@@ -614,22 +614,6 @@ PyArray_View(PyArrayObject *self, PyArray_Descr *type, PyTypeObject *pytype)
     }
 
     dtype = PyArray_DESCR(self);
-
-    if (type != NULL && !PyArray_EquivTypes(dtype, type) &&
-            (PyArray_FLAGS(self) & NPY_ARRAY_WARN_ON_WRITE)) {
-        const char *msg =
-            "Numpy has detected that you may be viewing or writing to an array "
-            "returned by selecting multiple fields in a structured array. \n\n"
-            "This code may break in numpy 1.16 because this will return a view "
-            "instead of a copy -- see release notes for details.";
-        /* 2016-09-19, 1.12 */
-        if (DEPRECATE_FUTUREWARNING(msg) < 0) {
-            return NULL;
-        }
-        /* Only warn once per array */
-        PyArray_CLEARFLAGS(self, NPY_ARRAY_WARN_ON_WRITE);
-    }
-
     flags = PyArray_FLAGS(self);
 
     Py_INCREF(dtype);
52 changes: 4 additions & 48 deletions numpy/core/src/multiarray/mapping.c
@@ -1388,55 +1388,15 @@ array_subscript_asarray(PyArrayObject *self, PyObject *op)
     return PyArray_EnsureAnyArray(array_subscript(self, op));
 }
 
-/*
- * Helper function for _get_field_view which turns a multifield
- * view into a "packed" copy, as done in numpy 1.15 and before.
- * In numpy 1.16 this function should be removed.
- */
-NPY_NO_EXPORT int
-_multifield_view_to_copy(PyArrayObject **view) {
-    static PyObject *copyfunc = NULL;
-    PyObject *viewcopy;
-
-    /* return a repacked copy of the view */
-    npy_cache_import("numpy.lib.recfunctions", "repack_fields", &copyfunc);
-    if (copyfunc == NULL) {
-        goto view_fail;
-    }
-
-    PyArray_CLEARFLAGS(*view, NPY_ARRAY_WARN_ON_WRITE);
-    viewcopy = PyObject_CallFunction(copyfunc, "O", *view);
-    if (viewcopy == NULL) {
-        goto view_fail;
-    }
-    Py_DECREF(*view);
-    *view = (PyArrayObject*)viewcopy;
-
-    /* warn when writing to the copy */
-    PyArray_ENABLEFLAGS(*view, NPY_ARRAY_WARN_ON_WRITE);
-    return 0;
-
-view_fail:
-    Py_DECREF(*view);
-    *view = NULL;
-    return 0;
-}
-
 /*
  * Attempts to subscript an array using a field name or list of field names.
  *
  * If an error occurred, return 0 and set view to NULL. If the subscript is not
  * a string or list of strings, return -1 and set view to NULL. Otherwise
  * return 0 and set view to point to a new view into arr for the given fields.
- *
- * In numpy 1.15 and before, in the case of a list of field names the returned
- * view will actually be a copy by default, with fields packed together.
- * The `force_view` argument causes a view to be returned. This argument can be
- * removed in 1.16 when we plan to return a view always.
  */
 NPY_NO_EXPORT int
-_get_field_view(PyArrayObject *arr, PyObject *ind, PyArrayObject **view,
-                int force_view)
+_get_field_view(PyArrayObject *arr, PyObject *ind, PyArrayObject **view)
 {
     *view = NULL;
 
@@ -1597,11 +1557,7 @@ _get_field_view(PyArrayObject *arr, PyObject *ind, PyArrayObject **view,
             return 0;
         }
 
-        /* the code below can be replaced by "return 0" in 1.16 */
-        if (force_view) {
-            return 0;
-        }
-        return _multifield_view_to_copy(view);
+        return 0;
     }
     return -1;
 }
@@ -1629,7 +1585,7 @@ array_subscript(PyArrayObject *self, PyObject *op)
     /* return fields if op is a string index */
     if (PyDataType_HASFIELDS(PyArray_DESCR(self))) {
         PyArrayObject *view;
-        int ret = _get_field_view(self, op, &view, 0);
+        int ret = _get_field_view(self, op, &view);
         if (ret == 0){
             if (view == NULL) {
                 return NULL;
@@ -1911,7 +1867,7 @@ array_assign_subscript(PyArrayObject *self, PyObject *ind, PyObject *op)
     /* field access */
     if (PyDataType_HASFIELDS(PyArray_DESCR(self))){
         PyArrayObject *view;
-        int ret = _get_field_view(self, ind, &view, 1);
+        int ret = _get_field_view(self, ind, &view);
         if (ret == 0){
             if (view == NULL) {
                 return -1;
76 changes: 13 additions & 63 deletions numpy/core/tests/test_multiarray.py
@@ -4979,25 +4979,9 @@ def test_field_names(self):
             fn2 = func('f2')
             b[fn2] = 3
 
-            # In 1.16 code below can be replaced by:
-            # assert_equal(b[['f1', 'f2']][0].tolist(), (2, 3))
-            # assert_equal(b[['f2', 'f1']][0].tolist(), (3, 2))
-            # assert_equal(b[['f1', 'f3']][0].tolist(), (2, (1,)))
-            with suppress_warnings() as sup:
-                sup.filter(FutureWarning,
-                           ".* selecting multiple fields .*")
-
-                assert_equal(b[['f1', 'f2']][0].tolist(), (2, 3))
-                assert_equal(b[['f2', 'f1']][0].tolist(), (3, 2))
-                assert_equal(b[['f1', 'f3']][0].tolist(), (2, (1,)))
-                # view of subfield view/copy
-                assert_equal(b[['f1', 'f2']][0].view(('i4', 2)).tolist(),
-                             (2, 3))
-                assert_equal(b[['f2', 'f1']][0].view(('i4', 2)).tolist(),
-                             (3, 2))
-                view_dtype = [('f1', 'i4'), ('f3', [('', 'i4')])]
-                assert_equal(b[['f1', 'f3']][0].view(view_dtype).tolist(),
-                             (2, (1,)))
+            assert_equal(b[['f1', 'f2']][0].tolist(), (2, 3))
+            assert_equal(b[['f2', 'f1']][0].tolist(), (3, 2))
+            assert_equal(b[['f1', 'f3']][0].tolist(), (2, (1,)))
 
         # non-ascii unicode field indexing is well behaved
         if not is_py3:
@@ -5007,50 +4991,6 @@ def test_field_names(self):
             assert_raises(ValueError, a.__setitem__, u'\u03e0', 1)
             assert_raises(ValueError, a.__getitem__, u'\u03e0')
 
-    # can be removed in 1.16
-    def test_field_names_deprecation(self):
-
-        def collect_warnings(f, *args, **kwargs):
-            with warnings.catch_warnings(record=True) as log:
-                warnings.simplefilter("always")
-                f(*args, **kwargs)
-            return [w.category for w in log]
-
-        a = np.zeros((1,), dtype=[('f1', 'i4'),
-                                  ('f2', 'i4'),
-                                  ('f3', [('sf1', 'i4')])])
-        a['f1'][0] = 1
-        a['f2'][0] = 2
-        a['f3'][0] = (3,)
-        b = np.zeros((1,), dtype=[('f1', 'i4'),
-                                  ('f2', 'i4'),
-                                  ('f3', [('sf1', 'i4')])])
-        b['f1'][0] = 1
-        b['f2'][0] = 2
-        b['f3'][0] = (3,)
-
-        # All the different functions raise a warning, but not an error
-        assert_equal(collect_warnings(a[['f1', 'f2']].__setitem__, 0, (10, 20)),
-                     [FutureWarning])
-        # For <=1.12 a is not modified, but it will be in 1.13
-        assert_equal(a, b)
-
-        # Views also warn
-        subset = a[['f1', 'f2']]
-        subset_view = subset.view()
-        assert_equal(collect_warnings(subset_view['f1'].__setitem__, 0, 10),
-                     [FutureWarning])
-        # But the write goes through:
-        assert_equal(subset['f1'][0], 10)
-        # Only one warning per multiple field indexing, though (even if there
-        # are multiple views involved):
-        assert_equal(collect_warnings(subset['f1'].__setitem__, 0, 10), [])
-
-        # make sure views of a multi-field index warn too
-        c = np.zeros(3, dtype='i8,i8,i8')
-        assert_equal(collect_warnings(c[['f0', 'f2']].view, 'i8,i8'),
-                     [FutureWarning])
-
     def test_record_hash(self):
         a = np.array([(1, 2), (1, 2)], dtype='i1,i2')
         a.flags.writeable = False
@@ -5074,6 +5014,16 @@ def test_empty_structure_creation(self):
         np.array([(), (), (), (), ()], dtype={'names': [], 'formats': [],
                                            'offsets': [], 'itemsize': 12})
 
+    def test_multifield_indexing_view(self):
+        a = np.ones(3, dtype=[('a', 'i4'), ('b', 'f4'), ('c', 'u4')])
+        v = a[['a', 'c']]
+        assert_(v.base is a)
+        assert_(v.dtype == np.dtype({'names': ['a', 'c'],
+                                     'formats': ['i4', 'u4'],
+                                     'offsets': [0, 8]}))
+        v[:] = (4,5)
+        assert_equal(a[0].item(), (4, 1, 5))
+
 class TestView(object):
     def test_basic(self):
         x = np.array([(1, 2, 3, 4), (5, 6, 7, 8)],
1 change: 0 additions & 1 deletion numpy/core/tests/test_records.py
@@ -379,7 +379,6 @@ def test_nonwriteable_setfield(self):
         with assert_raises(ValueError):
             r.setfield([2,3], *r.dtype.fields['f'])
 
-    @pytest.mark.xfail(reason="See gh-10411, becomes real error in 1.16")
     def test_out_of_order_fields(self):
         # names in the same order, padding added to descr
         x = self.data[['col1', 'col2']]