ENH: fix str/repr for 0d-arrays and int* scalars #8983
I've also left in a large comment with an alternate implementation for your consideration, which is not back-compatible. Currently, no user-specified format functions have access to When it comes time to merge I will remove that comment and one or two others.
This is also fixed in #8963 - but hackily, so probably in need of a rebase
Oh yeah, one more thing. I deprecated the
numpy/core/arrayprint.py
Outdated
elif issubclass(dtypeobj, _nt.integer):
    if issubclass(dtypeobj, _nt.timedelta64):
        return formatdict['timedelta']
        return _lazyget(formatter, 'timedelta', TimedeltaFormat, data)
I think this would be more obvious as _lazyget(formatter, 'timedelta', lambda: TimedeltaFormat(data))
In fact, could this be written as formatter.get('timedelta') or TimedeltaFormat(data)?
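For reference, the `or` spelling is already lazy, because `or` short-circuits. A minimal sketch, assuming a plain dict for `formatter` and a stand-in `TimedeltaFormat` class (both are illustrative and not the code in arrayprint.py; `pick_timedelta_format` is a hypothetical name):

```python
# Stand-in for an expensive formatter class; not numpy's actual TimedeltaFormat.
class TimedeltaFormat:
    constructed = 0  # counts how many times the constructor ran

    def __init__(self, data):
        TimedeltaFormat.constructed += 1
        self.data = data

    def __call__(self, x):
        return "{} units".format(x)


def pick_timedelta_format(formatter, data):
    # `or` short-circuits: TimedeltaFormat(data) is only built when the user
    # supplied no 'timedelta' entry (or supplied a falsy one).
    return formatter.get('timedelta') or TimedeltaFormat(data)


user_fmt = lambda x: "<{}>".format(x)
chosen = pick_timedelta_format({'timedelta': user_fmt}, [1, 2, 3])
default = pick_timedelta_format({}, [1, 2, 3])
```

One caveat of this spelling is that a user-supplied entry that happens to be falsy would be silently replaced by the default.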
numpy/core/arrayprint.py
Outdated
'str_kind': ['numpystr', 'str'] }

formatdict = {}
for tp, fmtfunc in six.iteritems(formatter):
Do we have six available? Also, .items() would work just fine here.
Yeah the lambda version is pretty nice. Maybe we can combine the lambdas for the raw types, plus the (Though now that I think about it, my
Do you want to split your PRs one last time, and make the I think I preferred it when But clearly we need a fix to the formatters for either solution, so it'd be good to merge that first
Sure, let me review your PR.
08084c5 to 60c2eee
}
return ret;
return PyObject_CallFunction(fallback, "O", self);
What happens if we leave this as it was before?
I haven't actually checked, but I am 95% sure we will get infinite recursions because of tests which do things like
>>> np.set_printoptions(formatter={'all':lambda x: str(x)})
we will get infinite recursions
Hopefully they should now be bounded - that was why I wanted that other commit merged.
That doesn't seem sane as a formatter anyway, though - "Make the __str__ for arrays call str(self)" is obviously very wrong.
Really, that should be written np.set_printoptions(formatter={'all':lambda x: str(x.item())})
#8963 doesn't protect against this kind of recursion (I noted that there).
If we disallow this kind of formatter, though, it will be a backwards-compatibility break, which I was trying to avoid.
This formatter makes sense if you assume a clear division between array reprs and scalar reprs. In that case, if x is a scalar, then there shouldn't be any recursion problem.
The recursion problem only arises because we happened to implement a small number of scalar reprs by promoting to 0d and calling the array repr.
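To make the cycle concrete, here is a toy model in plain Python. `ToyArray0d` and `ToyScalar` are invented stand-ins, not numpy types, and the sketch only assumes what is described above: the array str calls the user formatter on the element, and the scalar str is implemented by promoting to 0d and reusing the array str.

```python
# Toy model only: these classes are illustrative stand-ins, not numpy types.
class ToyArray0d:
    def __init__(self, value, formatter):
        self.value = value
        self.formatter = formatter

    def __str__(self):
        # The array str hands its element to the user's formatter.
        return self.formatter(ToyScalar(self.value, self.formatter))


class ToyScalar:
    def __init__(self, value, formatter):
        self.value = value
        self.formatter = formatter

    def item(self):
        # Plain Python object, printed without going through the array code.
        return self.value

    def __str__(self):
        # The problematic shortcut: promote to 0d and reuse the array str.
        return str(ToyArray0d(self.value, self.formatter))


def render(formatter):
    try:
        return str(ToyArray0d(3, formatter))
    except RecursionError:
        return "RecursionError"


# scalar str -> array str -> formatter -> scalar str -> ...
recursive = render(lambda x: str(x))
# the formatter never re-enters the array str
safe = render(lambda x: str(x.item()))
```

In this model the `str(x)` formatter only loops because of the promote-to-0d shortcut in `ToyScalar.__str__`; with a scalar str that does not go through the array code, the same formatter would terminate.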
Also, note the comment I added at the top of the file:
Both scalartypes.c.src and arrayprint.py implement reprs for scalars, but for different purposes. scalartypes.c.src has reprs for when the scalar is printed on its own, while arrayprint.py has reprs for when scalars are printed inside an ndarray.
This is quite apparent in datetime64, for example:
>>> a = np.datetime64('2005-02-25')
>>> a
numpy.datetime64('2005-02-25')
>>> np.array([a])
array(['2005-02-25'], dtype='datetime64[D]')
protect against this kind of recursion (I noted that there).
You noted that it doesn't protect against str(np.array(x)) - although you're right, I guess that is what is happening here.
This is quite apparent in datetime64, for example:
With this patch, what does np.datetime64('2005-02-25')[...] give a repr of?
(And as a follow up, can a test be added for that new repr)
It gives
>>> repr(np.datetime64('2005-02-25')[...])
"array('2005-02-25', dtype='datetime64[D]')"
(it used to give "array(datetime.date(2005, 2, 25), dtype='datetime64[D]')")
numpy/core/arrayprint.py
Outdated
format_function = _get_format_function(
    a.reshape((1,)), precision, suppress_small, formatter)
lst = format_function(a[()])
elif reduce(product, a.shape) == 0:
Needs a rebase, I think - this won't work well with changes merged from #8963
numpy/core/arrayprint.py
Outdated
elif functools.reduce(product, a.shape) == 0:
format_function = _get_format_function(
    a.reshape((1,)), precision, suppress_small, formatter)
lst = format_function(a[()])
This is starting to feel like the job of _array2string, in an if rank == 0 case
I think you mean _formatArray, but yes
60c2eee to 9b01087
}
return ret;
return PyObject_CallFunction(fallback, "O", self);
Proposal: return PyObject_Repr( self.item() ) here instead? Since this is a fallback anyway...
Not that I disagree with how you currently have this.
what about something like
"{}({})".format(x.dtype.name, str(x))
Ooh, that's pretty nice - with x = self.item(), I assume.
Also, I think that should always call repr, not str, if it intends to put it inside the parentheses.
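A rough Python rendering of that fallback, to show why repr matters inside the parentheses. `fallback_repr` is a hypothetical helper taking the dtype name and the already-extracted item; the real fallback is C code in scalartypes.c.src.

```python
# Hypothetical helper; the real fallback lives in C in scalartypes.c.src.
def fallback_repr(dtype_name, item):
    # repr() rather than str(), so the text inside the parentheses stays an
    # unambiguous literal (strings keep their quotes).
    return "{}({})".format(dtype_name, repr(item))


with_repr = fallback_repr("str_", "1.0")
with_str = "{}({})".format("str_", str("1.0"))  # quotes lost: reads like a float
```

With str, a string scalar holding "1.0" would print the same way as a float, which is the ambiguity repr avoids.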
arr = PyArray_FromScalar(self, NULL);
if (arr != NULL) {
    /* XXX: Why are we using str here? */
This comment needs to be kept in some form - because we still have the same code path for both str and repr
Great reviewing, @eric-wieser
Just want to note the consequences of this PR can be seen in astropy/astropy#6090. It looks like some of their string representations have changed slightly.
Indeed, this caused quite a few test failures in astropy; these things are somewhat of a pain with doctests. We had a similar problem when the PR that made structured arrays respect the print options was merged, but, like for that one, I think it is well worth it: scalars should not be typeset differently from arrays!
Looking at the fixes, I actually don't very much like the extra spaces this PR added in many arrays right after the
>>> np.array(1.0)
array( 1.)
>>> np.array(True)
array( True, dtype=bool)
(no spaces are added for integer types though) I kind of want to fix that... I'll take a look.
Does anyone know why the default is to print float arrays with two spaces? I.e.,
>>> np.arange(10.)
array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])
instead of
>>> np.arange(10.)
array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
? It's easy to remove the space (just change a 2 to a 1 here) and the few tests that break don't seem to explain the double space.
I think it just leaves space for the sign, i.e., if you have negative numbers in your example, things will line up. For the same reason,
Now, arguably, since the array printing code already looks at the number of trailing digits that are required, it might as well also look at whether the space for the sign is required.
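As a sketch of that idea (not numpy's implementation; `format_row` is a hypothetical helper with a fixed one-decimal format), the sign column is reserved only when some element is actually negative:

```python
def format_row(values):
    # Hypothetical helper, not numpy's array printing code.
    body = ["{:.1f}".format(abs(v)) for v in values]
    width = max(len(b) for b in body)
    # Reserve a sign column only if some element is actually negative.
    need_sign = any(v < 0 for v in values)
    total = width + (1 if need_sign else 0)
    cells = []
    for v, b in zip(values, body):
        text = ("-" if v < 0 else "") + b
        cells.append(text.rjust(total))
    return "[" + ", ".join(cells) + "]"


all_positive = format_row([0.0, 1.0, 2.0])
with_negative = format_row([0.0, -1.0, 2.0])
```

Here `all_positive` has no leading pad, while `with_negative` pads the non-negative entries so the columns still line up.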
@ahaldane - do you think you are going to change it? I have an astropy PR that changes our tests for the new repr (as we try to ensure numpy-dev passes our tests), which I will hold off on if you are. For what it is worth, my sense would be to remove the extraneous space if there are no negative numbers (which means more changes in astropy, but so be it...) and similarly for
Yeah I'd like to try, it doesn't look too hard. I'll try to get something together later today or tonight.
Delay use of array repr until needed for string representations of the float info parameters. This is to allow getlimits to be imported early without pulling in too much of the repr machinery. See: numpy#8983 (comment)
I might be tempted to leave the
|
True, though that same argument would suggest to keep the space for the sign (which it currently does not only for float!?)
|
Arguably, yes. You could also use the same argument to pad for extra digits, to allow comparison with arrays with larger numbers. To me, it seems to be about breaking the format into subclasses, such that within each class, the formats are consistent. Right now, these classes are:
Splitting
Note that this has led to failing tests in scipy, see scipy/scipy#7418.
These changes might cause issues in downstream projects. The change in With numpy 1.12.1:
With numpy master:
There are now quotes included in |
I do think that in principle this PR and #9139 are more correct behavior, but I recognize it is annoying for everyone to update the spaces in all their doctests. I'm open to reverting or leaving them for a more future release. Here's a more conservative change I am thinking about. Note how:
In other words, in the third option we could keep the new implementations of I don't know all the implications of the third option yet.
I think that we need And really, we want 0d arrays to match the behaviour of scalars as closely as possible.
@eric-wieser - I'm not sure I agree - at least in my ideal future, numpy scalars don't exist at all, one just has array scalars and these behave the same way as regular arrays. (I think many problems have arisen from a poor analogy between arrays and lists.)
I'm in the middle of trying option 3 out in #9143. (I might not have time to work on it more today).
It may be good to try to define a final goal. My feeling would be that any array, scalar or otherwise, should be typeset by
See #9144 for a quick check of what happens if one removes extraneous spaces for positive numbers (and all-True booleans).
0d arrays now respect the printoptions:
Before this PR:
With this PR:
This PR also cleans up a lot of array2string, and eliminates a lot of unnecessarily evaluated code. In particular, the IntegerFormat and other constructors are now only called if the array has the corresponding type. Before, all the constructors for every type were called every time array2string was called.
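The deferred-construction idea can be sketched with a table of zero-argument callables. The class names below mirror those mentioned in the PR, but their bodies, the `calls` log, and `get_format_function` are invented for the example and are not the code in arrayprint.py.

```python
# Illustrative only: bodies and lookup table are invented for this sketch.
calls = []  # records which constructors actually ran


class IntegerFormat:
    def __init__(self, data):
        calls.append("IntegerFormat")
        self.width = max(len(str(x)) for x in data)

    def __call__(self, x):
        return str(x).rjust(self.width)


class FloatFormat:
    def __init__(self, data):
        calls.append("FloatFormat")

    def __call__(self, x):
        return "{:.1f}".format(x)


def get_format_function(data, kind):
    # Each entry is a zero-argument callable, so building the table
    # constructs nothing; only the selected entry is ever called.
    table = {
        'int': lambda: IntegerFormat(data),
        'float': lambda: FloatFormat(data),
    }
    return table[kind]()


fmt = get_format_function([1, 20, 300], 'int')
```

After this runs, `calls` contains only "IntegerFormat"; the float constructor never scanned the data.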