ENH: fix str/repr for 0d-arrays and int* scalars #8983

Merged
merged 1 commit into from
May 17, 2017

Conversation

ahaldane
Member

0d arrays now respect the printoptions:

Before this PR:

>>> np.set_printoptions(formatter={'all': lambda x: "test"})
>>> array(1)
array(1)

With this PR:

>>> np.set_printoptions(formatter={'all': lambda x: "test"})
>>> array(1)
array(test)

This PR also cleans up a lot of array2string, and eliminates a lot of unnecessarily evaluated code. In particular, the IntegerFormat and other constructors are now only called if the array has the corresponding type. Before, all the constructors for every type were called every time array2string was called.
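
To illustrate the constructor change described above, here is a minimal pure-Python sketch (not the actual arrayprint code). The two format classes are simplified stand-ins that only share their names with the real ones, and `eager_lookup` / `lazy_lookup` are hypothetical helpers. The idea is that storing factories instead of instances means only the formatter for the array's own type is ever constructed.

```python
# Toy stand-ins for arrayprint's format classes; behaviour is simplified.
calls = []


class IntegerFormat:
    def __init__(self, data):
        calls.append("IntegerFormat")
        self.width = max(len(str(v)) for v in data)

    def __call__(self, x):
        return str(x).rjust(self.width)


class FloatFormat:
    def __init__(self, data):
        calls.append("FloatFormat")

    def __call__(self, x):
        return "%.2f" % x


def eager_lookup(kind, data):
    # Old pattern: every constructor runs, whatever the array's type is.
    table = {"int": IntegerFormat(data), "float": FloatFormat(data)}
    return table[kind]


def lazy_lookup(kind, data):
    # New pattern: store factories and call only the one that is needed.
    table = {
        "int": lambda: IntegerFormat(data),
        "float": lambda: FloatFormat(data),
    }
    return table[kind]()


eager_lookup("int", [1, 22, 333])
eager_calls = list(calls)

del calls[:]
fmt = lazy_lookup("int", [1, 22, 333])
lazy_calls = list(calls)

print(eager_calls)  # both constructors ran
print(lazy_calls)   # only IntegerFormat ran
```

With the eager table both constructors run even though only the integer formatter is used; with the lazy table only `IntegerFormat` is constructed.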

@ahaldane
Member Author

I've also left in a large comment with an alternate implementation for your consideration, which is not back-compatible. Currently, no user-specified format functions have access to precision, suppress_small and so on; that extra code would make them accessible.

When it comes time to merge I will remove that comment and one or two others.

@eric-wieser
Member

> In particular, the IntegerFormat and other constructors are now only called if the array has the corresponding type.

This is also fixed in #8963 - but hackily, so probably in need of a rebase

@eric-wieser
Member

eric-wieser commented Apr 24, 2017

Actually, I think lazy_get is a little more opaque than the lambda stuff I do in #8963 - what do you think?

Specifically, in a2aea77

@ahaldane
Member Author

Oh yeah, one more thing. I deprecated the style argument of array2string.

     elif issubclass(dtypeobj, _nt.integer):
         if issubclass(dtypeobj, _nt.timedelta64):
-            return formatdict['timedelta']
+            return _lazyget(formatter, 'timedelta', TimedeltaFormat, data)
Member

I think this would be more obvious as _lazyget(formatter, 'timedelta', lambda: TimedeltaFormat(data))

Member

In fact, could this be written as formatter.get('timedelta') or TimedeltaFormat(data)?
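
The spellings discussed in this thread can be compared side by side in a small sketch. `_lazyget` here is a hypothetical three-argument helper written for illustration (the PR's own helper may differ), and `TimedeltaFormat` is a stand-in that records each time it is constructed.

```python
def _lazyget(formatter, key, make_default):
    """Hypothetical helper: prefer the user's formatter, else build the default."""
    if formatter is not None and key in formatter:
        return formatter[key]
    return make_default()


built = []


def TimedeltaFormat(data):
    # Stand-in for the real class: note when it is built, return a formatter.
    built.append(len(data))
    return lambda x: "%d days" % x


data = [1, 2, 3]
user = {"timedelta": lambda x: "<%s>" % x}

# Lambda spelling: the default is only built inside make_default().
a = _lazyget(user, "timedelta", lambda: TimedeltaFormat(data))

# `or` spelling: short-circuits, so TimedeltaFormat(data) is not evaluated
# when the user supplied a formatter.
b = user.get("timedelta") or TimedeltaFormat(data)

# No user formatter: the default is built exactly once.
c = _lazyget({}, "timedelta", lambda: TimedeltaFormat(data))

print(a(5), b(5), c(5), built)
```

The `or` spelling assumes `formatter` is a dict rather than None, and it treats any falsy stored value as missing; functions are always truthy, so this only matters for unusual callables.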

'str_kind': ['numpystr', 'str'] }

formatdict = {}
for tp, fmtfunc in six.iteritems(formatter):
Member

Do we have six available? Also, .items() would work just fine here

@ahaldane
Member Author

Yeah the lambda version is pretty nice.

Maybe we can combine the lambdas for the raw types, plus the _expand_formatter_kinds from here.

(Though now that I think about it, my _expand_formatter_kinds slightly changes the behavior, since now the *kinds don't show up in get_printoptions.)
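
A rough sketch of what expanding the `*_kind` keys could look like, using plain `.items()` as suggested in the review. This is an assumption-laden illustration, not the PR's code: only the `'str_kind'` entry of the mapping appears in the snippet above, the other entries and the precedence order ('all', then kinds, then specific type names) are assumed, and `expand_formatter_kinds` is a hypothetical name.

```python
# Assumed mapping; only the 'str_kind' entry is taken from the reviewed snippet.
_FORMATTER_KINDS = {
    "int_kind": ["int"],
    "float_kind": ["float", "longfloat"],
    "str_kind": ["numpystr", "str"],
}


def expand_formatter_kinds(formatter):
    """Return a new dict with 'all' and *_kind keys expanded to type names."""
    expanded = {}
    if "all" in formatter:
        for types in _FORMATTER_KINDS.values():
            for tp in types:
                expanded[tp] = formatter["all"]
    for kind, types in _FORMATTER_KINDS.items():
        if kind in formatter:
            for tp in types:
                expanded[tp] = formatter[kind]
    # Specific type names override whatever 'all' or a kind provided.
    for key, func in formatter.items():
        if key != "all" and key not in _FORMATTER_KINDS:
            expanded[key] = func
    return expanded


expanded = expand_formatter_kinds({"str_kind": str.upper, "str": str.lower})
print(sorted(expanded))  # ['numpystr', 'str']
```

Because the result holds only concrete type names, the original `'str_kind'` key is gone after expansion, which is the behavior change mentioned above.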

@eric-wieser
Member

eric-wieser commented Apr 24, 2017

Do you want to split your PRs one last time, and make the arrayprint constructor fixes without the 0d case?

I think I preferred it when gentype_repr did an array promotion - and if that does lead to recursion, we can handle it in #8963 without hanging.

But clearly we need a fix to the formatters for either solution, so it'd be good to merge that first

@ahaldane
Member Author

Sure, let me review your PR.

     }
-    return ret;
+    return PyObject_CallFunction(fallback, "O", self);
Member

What happens if we leave this as it was before?

Member Author

I haven't actually checked, but I am 95% sure we will get infinite recursions because of tests which do things like

>>> np.set_printoptions(formatter={'all':lambda x: str(x)})

Member

@eric-wieser eric-wieser Apr 25, 2017

> we will get infinite recursions

Hopefully they should now be bounded - that was why I wanted that other commit merged

That doesn't seem sane as a formatter though anyway - "Make the __str__ for arrays call str(self)" is obviously very wrong.

Really, that should be written np.set_printoptions(formatter={'all':lambda x: str(x.item())})

Member Author

@ahaldane ahaldane Apr 25, 2017

#8963 doesn't protect against this kind of recursion (I noted that there).

If we disallow this kind of formatter, though, it will be a backwards-compatibility break, which I was trying to avoid.

This formatter makes sense if you assume a clear division between array reprs and scalar reprs. In that case, if x is a scalar, then there shouldn't be any recursion problem.

The recursion problem only arises because we happened to implement a small number of scalar reprs by promoting to 0d and calling the array repr.
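
The recursion being discussed can be reproduced without numpy in a toy model. `ToyScalar` is a hypothetical class whose `__str__` stands in for "promote to 0d and use the array printer", and the module-level `formatter` stands in for the `'all'` entry of the print options. This is only a sketch of the mechanism, not of numpy's actual call chain.

```python
formatter = None  # stands in for the user's 'all' formatter


class ToyScalar:
    """Hypothetical scalar whose str goes through the array printer."""

    def __init__(self, value):
        self.value = value

    def item(self):
        return self.value

    def __str__(self):
        # The "array printer" hands the element to the user's formatter.
        return formatter(self)


def try_str(x):
    try:
        return str(x)
    except RecursionError:
        return "RecursionError"


# The formatter calls str(x), which calls the printer, which calls the
# formatter again, and so on.
formatter = lambda x: str(x)
looping = try_str(ToyScalar(1))

# Unwrapping with .item() first breaks the cycle.
formatter = lambda x: str(x.item())
safe = try_str(ToyScalar(1))

print(looping, safe)
```

If scalar reprs never went through the array printer, the first formatter would be harmless, which is the "clear division" argument made above.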

Member Author

Also, note the comment I added at the top of the file:

Both scalartypes.c.src and arrayprint.py implement reprs for scalars, but for different purposes. scalartypes.c.src has reprs for when the scalar is printed on its own, while arrayprint.py has reprs for when scalars are printed inside an ndarray.

This is quite apparent in datetime64, for example:

>>> a = np.datetime64('2005-02-25')
>>> a
numpy.datetime64('2005-02-25')
>>> np.array([a])
array(['2005-02-25'], dtype='datetime64[D]')

Member

@eric-wieser eric-wieser Apr 25, 2017

> protect against this kind of recursion (I noted that there).

You noted that it doesn't protect against str(np.array(x)) - although you're right, I guess that is what is happening here.

> This is quite apparent in datetime64, for example:

With this patch, what does np.datetime64('2005-02-25')[...] give a repr of?

Member

(And as a follow up, can a test be added for that new repr)

Member Author

It gives

>>> repr(np.datetime64('2005-02-25')[...])
"array('2005-02-25', dtype='datetime64[D]')"

(it used to give "array(datetime.date(2005, 2, 25), dtype='datetime64[D]')")
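
The difference between the two reprs quoted above comes down to which object gets formatted. The sketch below models it with the standard library only; `ToyDatetime0d` and both helper functions are hypothetical and merely mimic the shape of the output.

```python
import datetime


class ToyDatetime0d:
    """Hypothetical 0d array holding one day-resolution date."""

    dtype_name = "datetime64[D]"

    def __init__(self, iso):
        self._iso = iso

    def item(self):
        # Like .item(): convert the element to a builtin Python object.
        return datetime.datetime.strptime(self._iso, "%Y-%m-%d").date()


def repr_via_item(a):
    # Old behaviour: format the Python object returned by .item().
    return "array(%r, dtype=%r)" % (a.item(), a.dtype_name)


def repr_via_element_formatter(a):
    # New behaviour: format the element the way it appears inside an array.
    return "array(%r, dtype=%r)" % (a._iso, a.dtype_name)


a = ToyDatetime0d("2005-02-25")
print(repr_via_item(a))
print(repr_via_element_formatter(a))
```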

format_function = _get_format_function(
a.reshape((1,)), precision, suppress_small, formatter)
lst = format_function(a[()])
elif reduce(product, a.shape) == 0:
Member

Needs a rebase, I think - this won't work well with changes merged from #8963

@ahaldane
Member Author

ahaldane commented Apr 25, 2017

Updated with only the 0d array fix.

I split off my arrayprint cleanup to a branch in my repository here. I can still submit it if anyone wants, but it's not really necessary because of #8963.

elif functools.reduce(product, a.shape) == 0:
format_function = _get_format_function(
a.reshape((1,)), precision, suppress_small, formatter)
lst = format_function(a[()])
Member

This is starting to feel like the job of _array2string, in an if rank == 0 case

Member Author

I think you mean _formatArray, but yes

@ahaldane ahaldane force-pushed the fix0d_array2string_s branch from 60c2eee to 9b01087 Compare April 25, 2017 16:29
@ahaldane ahaldane changed the title ENH: cleanup arrayprint.py, fix 0d array str/repr ENH: fix 0d array str/repr Apr 25, 2017
     }
-    return ret;
+    return PyObject_CallFunction(fallback, "O", self);
Member

@eric-wieser eric-wieser Apr 25, 2017

Proposal: return PyObject_Repr( self.item() ) here instead? Since this is a fallback anyway...

Not that I disagree with how you currently have this.

Member Author

@ahaldane ahaldane Apr 25, 2017

what about something like

"{}({})".format(x.dtype.name, str(x))

Member

@eric-wieser eric-wieser Apr 25, 2017

Ooh, that's pretty nice - with x = self.item(), I assume.

Also, I think that should always call repr, not str, if it intends to put it inside the parentheses
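
A small sketch of the proposed fallback, showing why the choice between repr and str matters inside the parentheses. `fallback_repr` is a hypothetical Python function (the real fallback lives in C), and the dtype names passed in are just illustrative labels.

```python
def fallback_repr(dtype_name, value, use_repr=True):
    """Build 'dtypename(value)' from an already-unwrapped Python value."""
    inner = repr(value) if use_repr else str(value)
    return "{}({})".format(dtype_name, inner)


print(fallback_repr("int64", 5))                   # int64(5)
print(fallback_repr("str_", "foo"))                # str_('foo')
print(fallback_repr("str_", "foo", use_repr=False))  # str_(foo)
```

For numbers the two agree, but for strings only the repr form yields something that reads as a valid constructor call.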


arr = PyArray_FromScalar(self, NULL);
if (arr != NULL) {
/* XXX: Why are we using str here? */
Member

This comment needs to be kept in some form - because we still have the same code path for both str and repr

@ahaldane
Member Author

Great reviewing, @eric-wieser

@ahaldane
Member Author

Just want to note the consequences of this PR can be seen in astropy/astropy#6090. It looks like some of their string representations have changed slightly.

@mhvk
Contributor

mhvk commented May 18, 2017

Indeed, this caused quite a few test failures in astropy; these things are somewhat of a pain with doctests. We had a similar problem when the PR that made structured arrays respect the print options was merged, but, like for that one, I think it is well worth it: scalars should not be typeset differently from arrays!

@ahaldane
Member Author

ahaldane commented May 18, 2017

Looking at the fixes, I actually don't much like the extra spaces this PR added in many arrays right after the ( (which curiously didn't show up in any numpy tests):

>>> np.array(1.0)
array( 1.)
>>> np.array(True)
array( True, dtype=bool)

(no spaces are added for integer types though)

I kind of want to fix that... I'll take a look.

@ahaldane
Member Author

Does anyone know why the default is to print float arrays with two spaces? I.e.,

>>> np.arange(10.)
array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])

instead of

>>> np.arange(10.)
array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])

?

It's easy to remove the space (just change a 2 to a 1 here) and the few tests that break don't seem to explain the double space.

@mhvk
Contributor

mhvk commented May 18, 2017

I think it just leaves space for the sign, i.e., if you have negative numbers in your example, things will line up. For the same reason, False does not get an extra space, unlike True:

In [2]: np.array(True), np.array(False)
Out[2]: (array( True, dtype=bool), array(False, dtype=bool))

Now, arguably, since the array printing code already looks at the number of trailing digits that are required, it might as well also look at whether the space for the sign is required.
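
The suggestion above (reserve the sign column only when it is needed) can be sketched in a few lines. `format_floats` is a hypothetical helper, and it uses plain `%f` formatting rather than numpy's float style, so the digits differ from real array output; only the padding logic is the point.

```python
def format_floats(values, precision=1):
    """Format a list of floats, padding for the sign only if one is negative."""
    if not values:
        return "[]"
    any_negative = any(v < 0 for v in values)
    cells = []
    for v in values:
        body = "%.*f" % (precision, abs(v))
        if v < 0:
            sign = "-"
        elif any_negative:
            sign = " "  # keep columns aligned with the negative entries
        else:
            sign = ""   # no negatives anywhere: no sign column at all
        cells.append(sign + body)
    width = max(len(c) for c in cells)
    return "[" + ", ".join(c.rjust(width) for c in cells) + "]"


print(format_floats([0.0, 1.0, 2.0]))    # [0.0, 1.0, 2.0]
print(format_floats([0.0, -1.0, -2.0]))  # [ 0.0, -1.0, -2.0]
```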

@mhvk
Contributor

mhvk commented May 18, 2017

@ahaldane - do you think you are going to change it? I have an astropy PR that changes our tests for the new repr (as we try to ensure numpy-dev passes our tests), which I will hold off on if you are. For what it is worth, my sense would be to remove the extraneous space if there are no negative numbers (which means more changes in astropy, but so be it...) and similarly for True.

@ahaldane
Member Author

Yeah I'd like to try, it doesn't look too hard.

I'll try to get something together later today or tonight.

charris pushed a commit to charris/numpy that referenced this pull request May 18, 2017
Delay use of array repr until needed for string representations of the
float info parameters.  This is to allow getlimits to be imported early
without pulling in too much of the repr machinery.

See: numpy#8983 (comment)
@eric-wieser
Member

I might be tempted to leave the True/False padding in all but the scalar case, as there's exactly one array for a given shape that does not need the padding - it's a lot easier to compare things when they align:

>>> a
array([ True,  True,  True])
>>> b
array([False,  True,  True])
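
As a sketch of this proposal: keep the fixed-width padding for boolean arrays so that any two arrays line up, and drop it only in the scalar case. `format_bools` is a hypothetical helper, not numpy code.

```python
def format_bools(values, scalar=False):
    """Right-justify booleans to the width of 'False', except for scalars."""
    if scalar:
        return str(values)
    return "[" + ", ".join(str(v).rjust(5) for v in values) + "]"


print(format_bools([True, True, True]))   # [ True,  True,  True]
print(format_bools([False, True, True]))  # [False,  True,  True]
print(format_bools(True, scalar=True))    # True
```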

@mhvk
Contributor

mhvk commented May 18, 2017

True, though that same argument would suggest keeping the space for the sign (which it currently does, but only for floats!?)

In [3]: np.arange(0, -10, -1)
Out[3]: array([ 0, -1, -2, -3, -4, -5, -6, -7, -8, -9])

In [4]: np.arange(0, 10, 1)
Out[4]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [5]: np.arange(0, 10., 1)
Out[5]: array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])

In [6]: np.arange(0, -10., -1)
Out[6]: array([ 0., -1., -2., -3., -4., -5., -6., -7., -8., -9.])

@eric-wieser
Member

Arguably, yes. You could also use the same argument to pad for extra digits, to allow comparison with arrays with larger numbers.

To me, it seems to be about breaking the format into subclasses, such that within each class, the formats are consistent. Right now, these classes are:

  • one digit all positive ints
  • one digit ints
  • boolean
  • ...

Splitting bool into two categories is useless, because one of those categories contains exactly one member, so has no format to be consistent with.

@charris
Member

charris commented May 19, 2017

Note that this has led to failing tests in scipy, see scipy/scipy#7418.

@WarrenWeckesser
Member

These changes might cause issues in downstream projects. The change in str(array('foo')) appears to be affecting scipy (but that hasn't been confirmed yet).

With numpy 1.12.1:

In [141]: np.__version__
Out[141]: '1.12.1'

In [142]: str(np.array('foo'))
Out[142]: 'foo'

In [143]: str(np.array(np.pi))
Out[143]: '3.141592653589793'

With numpy master:

In [10]: np.__version__
Out[10]: '1.14.0.dev0+1ec9ad6'

In [11]: str(np.array('foo'))
Out[11]: "'foo'"

In [12]: str(np.array(np.pi))
Out[12]: ' 3.14159265'

There are now quotes included in str(array('foo')) (as if it was using repr() instead of str()), and the format for a floating point value has fewer digits and includes a leading space.
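
Both differences can be mimicked with builtin formatting alone, which may help when checking downstream doctests. This is only an analogy for the observed output, not a trace of numpy's code path; the 8 digits correspond to the default print precision, and the `" "` flag reserves the sign column.

```python
import math

as_str = str("foo")    # what numpy 1.12.1 returned for the 0d string array
as_repr = repr("foo")  # what master returns: the quotes are included

full = str(math.pi)         # full-precision digits, as in 1.12.1
short = "% .8f" % math.pi   # 8 decimals plus a reserved sign column

print(as_str, as_repr)
print(repr(full), repr(short))
```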

@ahaldane
Member Author

I do think that in principle this PR and #9139 are more correct behavior, but I recognize it is annoying for everyone to update the spaces in all their doctests. I'm open to reverting them or leaving them for a later release.

Here's a more conservative change I am thinking about. Note how:

  • The old behavior for 0d arrays was to print str(x.item()).
  • This PR's behavior is to print arrayprint.formatter(x).
  • A third unexplored option is to print str(x), i.e., rely on the string functions in scalartypes.c.src.

In other words, in the third option we could keep the new implementations of genint_type_str in this PR and the proposed void-reprs in #8981. This avoids using str(item()), which gives weird behavior for types whose .item() method returns something strange, and also (I think) avoids the large doctest changes caused by this PR.

I don't know all the implications of the third option yet.

@eric-wieser
Member

eric-wieser commented May 19, 2017

str isn't just used for print, it's used to convert numpy string-likes into python strings - so allowing it to be overridden doesn't make sense.

I think that we need str(x) to be mapped to str(x[()]) for 0d arrays (mapping to str(x.item()) still seems wrong to me).

And really, we want 0d arrays to match the behaviour of scalars as closely as possible.

@mhvk
Contributor

mhvk commented May 19, 2017

@eric-wieser - I'm not sure I agree - at least in my ideal future, numpy scalars don't exist at all, one just has array scalars and these behave the same way as regular arrays. (I think many problems have arisen from a poor analogy between arrays and lists.)

@ahaldane
Member Author

I'm in the middle of trying option 3 out in #9143. (I might not have time to work on it more today).

@mhvk
Contributor

mhvk commented May 19, 2017

It may be good to try to define a final goal. My feeling would be that any array, scalar or otherwise, should be typeset by arrayprint.formatter. However, I think we do need to normalize away an extra space if all values are positive, and if there is a single True.

@mhvk
Contributor

mhvk commented May 19, 2017

See #9144 for a quick check of what happens if one removes extraneous spaces for positive numbers (and all-True booleans).

ahaldane added a commit to ahaldane/numpy that referenced this pull request May 21, 2017
…g_s"

This reverts commit 692655e, reversing
changes made to d4eaa2c.
mherkazandjian pushed a commit to mherkazandjian/numpy that referenced this pull request May 30, 2017
Delay use of array repr until needed for string representations of the
float info parameters.  This is to allow getlimits to be imported early
without pulling in too much of the repr machinery.

See: numpy#8983 (comment)
6 participants