ENH: fix str/repr for 0d-arrays and int* scalars #8983

ahaldane · 2017-04-24T21:18:26Z

0d arrays now respect the printoptions:

Before this PR:

>>> np.set_printoptions(formatter={'all': lambda x: "test"})
>>> array(1)
array(1)

With this PR:

>>> np.set_printoptions(formatter={'all': lambda x: "test"})
>>> array(1)
array(test)

This PR also cleans up a lot of array2string, and eliminates a lot of unnecessary evaluated code. In particular, the IntegerFormat and other constructors are now only called if the array has the corresponding type. Before, all the constructors for every type were called every time array2string was called.

ahaldane · 2017-04-24T21:19:32Z

I've also left in a large comment with an alternate implementation for your consideration, which is not back-compatible. Currently, no user-specified format functions have access to precision, suppress_small and so on; that extra code would make them accessible.

When it comes time to merge I will remove that comment and one or two others.

eric-wieser · 2017-04-24T21:19:40Z

In particular, the IntegerFormat and other constructors are now only called if the array has the corresponding type.

This is also fixed in #8963 - but hackily, so probably in need of a rebase

eric-wieser · 2017-04-24T21:23:29Z

Actually, I think lazy_get is a little more opaque than the lambda stuff I do in #8963 - what do you think?

Specifically, in a2aea77

ahaldane · 2017-04-24T21:23:50Z

Oh yeah, one more thing. I deprecated the style argument of array2string.

eric-wieser · 2017-04-24T21:26:15Z

numpy/core/arrayprint.py

    elif issubclass(dtypeobj, _nt.integer):
        if issubclass(dtypeobj, _nt.timedelta64):
-            return formatdict['timedelta']
+            return _lazyget(formatter, 'timedelta', TimedeltaFormat, data)


I think this would be more obvious as _lazyget(formatter, 'timedelta', lambda: TimedeltaFormat(data))

In fact, could this be written as formatter.get('timedelta') or TimedeltaFormat(data)?
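The difference between these spellings can be sketched in plain Python. This is only an illustration: `pick_lazy` and the toy `TimedeltaFormat` below are hypothetical stand-ins, not the helpers in numpy/core/arrayprint.py. The point is that wrapping the default in a lambda defers its construction until no user-supplied formatter is found.

```python
class TimedeltaFormat:
    """Hypothetical stand-in for a per-dtype formatter that scans the data."""
    constructed = 0  # counts how many times the constructor ran

    def __init__(self, data):
        TimedeltaFormat.constructed += 1
        self.width = max(len(str(v)) for v in data)

    def __call__(self, x):
        return str(x).rjust(self.width)


def pick_lazy(formatter, key, make_default):
    """Return the user's formatter for `key`; build the default only if needed."""
    user = formatter.get(key)
    return user if user is not None else make_default()


data = [1, 20, 300]

# Case 1: the user supplied a formatter, so the default is never constructed.
user_formatter = {'timedelta': lambda x: "<{}>".format(x)}
fmt = pick_lazy(user_formatter, 'timedelta', lambda: TimedeltaFormat(data))
user_result = fmt(20)
built_for_user = TimedeltaFormat.constructed

# Case 2: no user formatter, so the default is constructed exactly once.
fmt = pick_lazy({}, 'timedelta', lambda: TimedeltaFormat(data))
default_result = fmt(20)
built_for_default = TimedeltaFormat.constructed
```

The `formatter.get('timedelta') or TimedeltaFormat(data)` spelling short-circuits in the same way, with the caveat that `or` also falls through to the default for any falsy user entry.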

eric-wieser · 2017-04-24T21:28:31Z

numpy/core/arrayprint.py

+             'str_kind': ['numpystr', 'str'] }
+
+    formatdict = {}
+    for tp, fmtfunc in six.iteritems(formatter):


Do we have six available? Also, .items() would work just fine here

ahaldane · 2017-04-24T21:30:56Z

Yeah the lambda version is pretty nice.

Maybe we can combine the lambdas for the raw types, plus the _expand_formatter_kinds from here.

(Though now that I think about it, my _expand_formatter_kinds slightly changes the behavior since now the *kinds don't show up in get_printoptions)

eric-wieser · 2017-04-24T21:32:01Z

Do you want to split your PRs one last time, and make the arrayprint constructor fixes without the 0d case?

I think I preferred it when gentype_repr did an array promotion - and if that does lead to recursion, we can handle it in #8963 without hanging.

But clearly we need a fix to the formatters for either solution, so it'd be good to merge that first

ahaldane · 2017-04-24T21:44:12Z

Sure, let me review your PR.

eric-wieser · 2017-04-25T16:11:26Z

numpy/core/src/multiarray/scalartypes.c.src

    }
-    return ret;
+    return PyObject_CallFunction(fallback, "O", self);


What happens if we leave this as it was before?

I haven't actually checked, but I am 95% sure we will get infinite recursions because of tests which do things like

>>> np.set_printoptions(formatter={'all':lambda x: str(x)})

we will get infinite recursions

Hopefully they should now be bounded - that was why I wanted that other commit merged

That doesn't seem sane as a formatter though anyway - "Make the __str__ for arrays call str(self)" is obviously very wrong.

Really, that should be written np.set_printoptions(formatter={'all':lambda x: str(x.item())})

#8963 doesn't protect against this kind of recursion (I noted that there).

If we disallow this kind of formatter, though, it will be a backwards-compatibility break, which I was trying to avoid.

This formatter makes sense if you assume a clear division between array reprs and scalar reprs. In that case, if x is a scalar, then there shouldn't be any recursion problem.

The recursion problem only arises because we happened to implement a small number of scalar reprs by promoting to 0d and calling the array repr.
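A toy model of that loop, in plain Python, may make the mechanism easier to see. `ToyScalar`, `ToyArray0d` and `recurses` are hypothetical names invented for this sketch; they only mimic the shape of the problem (scalar str promotes to 0d, 0d str calls the user formatter on a scalar) and are not NumPy code.

```python
class ToyScalar:
    """Stand-in for a scalar whose str() promotes itself to a 0d array."""

    def __init__(self, value, formatter):
        self.value = value
        self.formatter = formatter

    def item(self):
        # Return the plain Python object, which has its own independent str().
        return self.value

    def __str__(self):
        return str(ToyArray0d(self.value, self.formatter))


class ToyArray0d:
    """Stand-in for a 0d array whose str() applies the user formatter."""

    def __init__(self, value, formatter):
        self.value = value
        self.formatter = formatter

    def __str__(self):
        return self.formatter(ToyScalar(self.value, self.formatter))


def recurses(formatter):
    """Report whether printing a toy 0d array with `formatter` never terminates."""
    try:
        str(ToyArray0d(1, formatter))
    except RecursionError:
        return True
    return False


naive_recurses = recurses(lambda x: str(x))        # scalar str -> array str -> ...
item_recurses = recurses(lambda x: str(x.item()))  # plain Python str, no loop
```

In this model the `str(x)` formatter loops only because the scalar's `__str__` goes through the array; if the scalar had its own independent `__str__`, the same formatter would terminate.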

Also, note the comment I added at the top of the file:

Both scalartypes.c.src and arrayprint.py implement reprs for scalars, but for different purposes. scalartypes.c.src has reprs for when the scalar is printed on its own, while arrayprint.py has reprs for when scalars are printed inside an ndarray.

This is quite apparent in datetime64, for example:

>>> a = np.datetime64('2005-02-25')
>>> a
numpy.datetime64('2005-02-25')
>>> np.array([a])
array(['2005-02-25'], dtype='datetime64[D]')

protect against this kind of recursion (I noted that there).

You noted that it doesn't protect against str(np.array(x)) - although you're right, I guess that is what is happening here.

This is quite apparent in datetime64, for example:

With this patch, what does np.datetime64('2005-02-25')[...] give a repr of?

(And as a follow up, can a test be added for that new repr)

It gives

>>> repr(np.datetime64('2005-02-25')[...])
"array('2005-02-25', dtype='datetime64[D]')"

(it used to give "array(datetime.date(2005, 2, 25), dtype='datetime64[D]')")
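The two outputs above can be mimicked without NumPy. The sketch below assumes, consistent with the old repr quoted above, that the old code path formatted the result of `.item()` (a `datetime.date` for day resolution), while the new one formats the ISO string; the surrounding `array(...)` text is just pasted in by hand for illustration.

```python
import datetime

# What .item() is assumed to return for a day-resolution datetime64.
item = datetime.date(2005, 2, 25)

# Old style: the repr of the Python object leaks into the array repr.
old_style = "array({}, dtype='datetime64[D]')".format(repr(item))

# New style: the element is shown as a quoted ISO date string.
new_style = "array({!r}, dtype='datetime64[D]')".format(item.isoformat())
```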

eric-wieser · 2017-04-25T16:12:44Z

numpy/core/arrayprint.py

+        format_function = _get_format_function(
+                a.reshape((1,)), precision, suppress_small, formatter)
+        lst = format_function(a[()])
+    elif reduce(product, a.shape) == 0:


Needs a rebase, I think - this won't work well with changes merged from #8963

ahaldane · 2017-04-25T16:12:48Z

Updated with only the 0d array fix.

I split off my arrayprint cleanup to a branch in my repository here. I can still submit it if anyone wants, but it's not really necessary because of #8963.

eric-wieser · 2017-04-25T16:16:15Z

numpy/core/arrayprint.py

-    elif functools.reduce(product, a.shape) == 0:
+        format_function = _get_format_function(
+                a.reshape((1,)), precision, suppress_small, formatter)
+        lst = format_function(a[()])


This is starting to feel like the job of _array2string, in an if rank == 0 case

I think you mean _formatArray, but yes

eric-wieser · 2017-04-25T18:23:30Z

numpy/core/src/multiarray/scalartypes.c.src

    }
-    return ret;
+    return PyObject_CallFunction(fallback, "O", self);


Proposal: return PyObject_Repr( self.item() ) here instead? Since this is a fallback anyway...

Not that I disagree with how you currently have this.

what about something like

"{}({})".format(x.dtype.name, str(x))

Ooh, that's pretty nice - with x = self.item(), I assume.

Also, I think that should always call repr, not str, if it intends to put it inside the parentheses
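A minimal sketch of that fallback, with the repr-versus-str point made concrete. `fallback_repr` and its arguments are hypothetical; in the real code the name would come from `x.dtype.name` and the value from `self.item()`, and the type names used here are placeholders.

```python
def fallback_repr(dtype_name, item, inner=repr):
    """Build 'name(value)'; `inner` selects how the value is rendered."""
    return "{}({})".format(dtype_name, inner(item))


# For numbers, repr and str agree, so the choice is invisible.
int_result = fallback_repr("sometype", 5)

# For strings they differ: only repr keeps the quotes, so the output
# still reads like a constructor call.
with_repr = fallback_repr("sometype", "foo")
with_str = fallback_repr("sometype", "foo", inner=str)
```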

eric-wieser · 2017-04-25T18:23:56Z

numpy/core/src/multiarray/scalartypes.c.src

-
-    arr = PyArray_FromScalar(self, NULL);
-    if (arr != NULL) {
-        /* XXX: Why are we using str here? */


This comment needs to be kept in some form - because we still have the same code path for both str and repr

ahaldane · 2017-05-17T15:34:19Z

Great reviewing, @eric-wieser

ahaldane · 2017-05-17T18:24:48Z

Just want to note the consequences of this PR can be seen in astropy/astropy#6090. It looks like some of their string representations have changed slightly.

mhvk · 2017-05-18T01:23:42Z

Indeed, this caused quite a few test failures in astropy; these things are somewhat of a pain with doctests. We had a similar problem when the PR that made structured arrays respect the print options was merged, but, like for that one, I think it is well worth it: scalars should not be typeset differently from arrays!

ahaldane · 2017-05-18T03:26:02Z

Looking at the fixes, I actually don't very much like the extra spaces this PR added in many arrays right after the ( (which curiously didn't show up in any numpy tests):

>>> np.array(1.0)
array( 1.)
>>> np.array(True)
array( True, dtype=bool)

(no spaces are added for integer types though)

I kind of want to fix that... I'll take a look.

ahaldane · 2017-05-18T03:50:16Z

Does anyone know why the default is to print float arrays with two spaces? Ie,

>>> np.arange(10.)
array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])

instead of

>>> np.arange(10.)
array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])

?

It's easy to remove the space (just change a 2 to a 1 here) and the few tests that break don't seem to explain the double space.

mhvk · 2017-05-18T13:01:02Z

I think it just leaves space for the sign, i.e., if you have negative numbers in your example, things will line up. For the same reason, False does not get an extra space, unlike True:

In [2]: np.array(True), np.array(False)
Out[2]: (array( True, dtype=bool), array(False, dtype=bool))

Now, arguably, since the array printing code already looks at the number of trailing digits that are required, it might as well also look at whether the space for the sign is required.
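As a rough sketch of that idea (not the arrayprint implementation), the hypothetical `format_row` below right-aligns whole-number floats and reserves the sign column either always or only when a negative value is present.

```python
def format_row(values, always_reserve_sign):
    """Format whole-number floats like '1.', aligned to a common width."""
    body = ["{:.0f}.".format(v) for v in values]
    width = max(len(b) for b in body)
    has_negative = any(v < 0 for v in values)
    # Negative entries already widen the column by one via their '-' sign;
    # the extra slot is only added artificially when none are present.
    if always_reserve_sign and not has_negative:
        width += 1
    return "[" + ", ".join(b.rjust(width) for b in body) + "]"


padded = format_row([0.0, 1.0, 2.0], always_reserve_sign=True)
compact = format_row([0.0, 1.0, 2.0], always_reserve_sign=False)
negative = format_row([0.0, -1.0, -2.0], always_reserve_sign=False)
```

With `always_reserve_sign=True` the all-positive row gets the double space seen in the float output above; with `False` the space appears only when some element actually needs it.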

mhvk · 2017-05-18T15:03:27Z

@ahaldane - do you think you are going to change it? I have an astropy PR that changes our tests for the new repr (as we try to ensure numpy-dev passes our tests), which I will hold off on if you are. For what it is worth, my sense would be to remove the extraneous space if there are no negative numbers (which means more changes in astropy, but so be it...) and similarly for True.

ahaldane · 2017-05-18T15:13:59Z

Yeah I'd like to try, it doesn't look too hard.

I'll try to get something together later today or tonight.

Delay use of array repr until needed for string representations of the float info parameters. This is to allow getlimits to be imported early without pulling in too much of the repr machinery. See: numpy#8983 (comment)

eric-wieser · 2017-05-18T15:42:50Z

I might be tempted to leave the True/False padding in all but the scalar case, as there's exactly one array for a given shape that does not need the padding - it's a lot easier to compare things when they align:

>>> a
array([ True,  True,  True])
>>> b
array([False,  True,  True])

mhvk · 2017-05-18T15:51:57Z

True, though that same argument would suggest keeping the space for the sign (which it currently does only for floats, not for ints!?)

In [3]: np.arange(0, -10, -1)
Out[3]: array([ 0, -1, -2, -3, -4, -5, -6, -7, -8, -9])

In [4]: np.arange(0, 10, 1)
Out[4]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [5]: np.arange(0, 10., 1)
Out[5]: array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])

In [6]: np.arange(0, -10., -1)
Out[6]: array([ 0., -1., -2., -3., -4., -5., -6., -7., -8., -9.])

eric-wieser · 2017-05-18T16:01:46Z

Arguably, yes. You could also use the same argument to pad for extra digits, to allow comparison with arrays with larger numbers.

To me, it seems to be about breaking the format into subclasses, such that within each class, the formats are consistent. Right now, these classes are:

one digit all positive ints
one digit ints
boolean
...

Splitting bool into two categories is useless, because one of those categories contains exactly one member, so has no format to be consistent with.

charris · 2017-05-19T15:45:31Z

Note that this has led to failing tests in scipy, see scipy/scipy#7418.

WarrenWeckesser · 2017-05-19T15:53:34Z

These changes might cause issues in downstream projects. The change in str(array('foo')) appears to be affecting scipy (but that hasn't been confirmed yet).

With numpy 1.12.1:

In [141]: np.__version__
Out[141]: '1.12.1'

In [142]: str(np.array('foo'))
Out[142]: 'foo'

In [143]: str(np.array(np.pi))
Out[143]: '3.141592653589793'

With numpy master:

In [10]: np.__version__
Out[10]: '1.14.0.dev0+1ec9ad6'

In [11]: str(np.array('foo'))
Out[11]: "'foo'"

In [12]: str(np.array(np.pi))
Out[12]: ' 3.14159265'

There are now quotes included in str(array('foo')) (as if it was using repr() instead of str()), and the format for a floating point value has fewer digits and includes a leading space.
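The same two differences can be reproduced with plain Python objects. This is only an analogy: the 8-digit precision and the leading sign slot are read off the output above, and `format(..., ".8f")` is an approximation of the array formatter, not the code NumPy runs.

```python
import math

text = "foo"
old_str = str(text)    # 'foo': what str() of a string normally gives
new_str = repr(text)   # "'foo'": quotes included, as in the array element format

old_float = str(math.pi)                    # '3.141592653589793'
new_float = " " + format(math.pi, ".8f")    # ' 3.14159265'
```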

ahaldane · 2017-05-19T17:01:11Z

I do think that in principle this PR and #9139 are more correct behavior, but I recognize it is annoying for everyone to update the spaces in all their doctests. I'm open to reverting or leaving them for a later release.

Here's a more conservative change I am thinking about. Note how:

1. The old behavior for 0d arrays was to print str(x.item()).
2. This PR's behavior is to print arrayprint.formatter(x).
3. A third unexplored option is to print str(x), ie rely on the string functions in scalartypes.c.src.

In other words, in the third option we could keep the new implementations of genint_type_str in this PR and the proposed void-reprs in #8981. This avoids using str(item()), which gives weird behavior for types whose .item() method returns something strange, and also (I think) avoids the large doctest changes caused by this PR.

I don't know all the implications of the third option yet.

eric-wieser · 2017-05-19T18:27:23Z

str isn't just used for print, it's used to convert numpy string-likes into python strings - so allowing it to be overridden doesn't make sense.

I think that we need str(x) to be mapped to str(x[()]) for 0d arrays (mapping to str(x.item()) still seems wrong to me).

And really, we want 0d arrays to match the behaviour of scalars as closely as possible.

mhvk · 2017-05-19T18:53:03Z

@eric-wieser - I'm not sure I agree - at least in my ideal future, numpy scalars don't exist at all, one just has array scalars and these behave the same way as regular arrays. (I think many problems have arisen from a poor analogy between arrays and lists.)

ahaldane · 2017-05-19T19:28:18Z

I'm in the middle of trying option 3 out in #9143. (I might not have time to work on it more today).

mhvk · 2017-05-19T19:38:44Z

It may be good to try to define a final goal. My feeling would be that any array, scalar or otherwise, should be typeset by arrayprint.formatter. However, I think we do need to normalize away an extra space if all values are positive, and if there is a single True.

mhvk · 2017-05-19T20:13:55Z

See #9144 for a quick check of what happens if one removes extraneous spaces for positive numbers (and all-True booleans).


eric-wieser reviewed Apr 24, 2017


charris added the 03 - Maintenance, 00 - Bug, component: numpy._core, and component: numpy.ma (masked arrays) labels Apr 24, 2017

ahaldane force-pushed the fix0d_array2string_s branch 8 times, most recently from 08084c5 to 60c2eee Compare April 25, 2017 16:06

eric-wieser mentioned this pull request Apr 25, 2017

BUG: Prevent crash on repr of recursive array #8963

Merged

eric-wieser reviewed Apr 25, 2017


ahaldane force-pushed the fix0d_array2string_s branch from 60c2eee to 9b01087 Compare April 25, 2017 16:29

ahaldane changed the title ~~ENH: cleanup arrayprint.py, fix 0d array str/repr~~ ENH: fix 0d array str/repr Apr 25, 2017

eric-wieser reviewed Apr 25, 2017


mhvk mentioned this pull request May 18, 2017

Numpy style changes astropy/astropy#6090

Merged

charris mentioned this pull request May 18, 2017

BUG: delay calls of array repr in getlimits #9135

Merged

ahaldane mentioned this pull request May 18, 2017

ENH: remove unneeded spaces in float/bool reprs, fixes 0d str #9139

Merged

charris mentioned this pull request May 19, 2017

BUG: sparse: load_npz is failing with numpy master scipy/scipy#7418

Closed

ahaldane mentioned this pull request May 19, 2017

WIP: MAINT: print 0d arrays using scalar str/repr #9143

Closed

ahaldane added a commit to ahaldane/numpy that referenced this pull request May 21, 2017

Revert "Merge pull request numpy#8983 from ahaldane/fix0d_array2strin…

52ff459

…g_s" This reverts commit 692655e, reversing changes made to d4eaa2c.

eric-wieser mentioned this pull request Jul 1, 2017

BUG: np.void(b'test') enters recursion loop in repr #9345

Closed

ahaldane mentioned this pull request Sep 27, 2017

BUG: Fix unicode(unicode_array_0d) on python 2.7 #9201

Merged

eric-wieser mentioned this pull request Oct 18, 2017

Display of 0-D datetime64 arrays is odd. #4337

Closed
