ENH: Add shape control parameter to set_printoptions #27461


Closed
GF-Huang opened this issue Sep 25, 2024 · 8 comments · Fixed by #27482

@GF-Huang

Proposed new feature or change:

I want to be able to print large arrays and display the shape of the array at the same time.

For example, here I can't tell the shape of x: shape == (2, ???).

> print(x)
array([[       0,        1,        2, ..., 49999997, 49999998, 49999999],
       [50000000, 50000001, 50000002, ..., 99999997, 99999998, 99999999]])
@mattip
Member

mattip commented Sep 25, 2024

What is wrong with this? I don't think we are going to change something so basic to NumPy at this stage.

print(f"{x=}")
print(f"{x.shape=}")

@GF-Huang
Author

GF-Huang commented Sep 25, 2024

What is wrong with this? I don't think we are going to change something so basic to NumPy at this stage.

print(f"{x=}")
print(f"{x.shape=}")

I'm using jupyter notebook.

If you want to look at a complex statement like:
x[x > 0].sum(...).otherFunc1(...).otherFunc2(...)

You need to do either:

tmp = x[x > 0].sum(...).otherFunc1(...).otherFunc2(...)
print(tmp)
print(tmp.shape)

or:

print(x[x > 0].sum(...).otherFunc1(...).otherFunc2(...))
print(x[x > 0].sum(...).otherFunc1(...).otherFunc2(...).shape)

I want to see a summary of a large array and its shape with one line of code.
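One way to get both pieces of information from a single expression, without a temporary variable, is a small wrapper. The sketch below is only an illustration; `describe` is a hypothetical helper name, not part of NumPy.

```python
import numpy as np

def describe(arr):
    """Hypothetical helper: return the array's repr followed by its shape."""
    return f"{arr!r}\nshape={arr.shape}"

x = np.arange(12).reshape(3, 4)
# The chained expression is written once; no temporary variable is needed.
print(describe(x[x > 0]))
```

In a notebook the helper still has to be defined once per session, which is the inconvenience raised above.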

@mhvk
Contributor

mhvk commented Sep 25, 2024

It may be good to be concrete, since while writing this I completely changed my mind and now think this should be done without even an option to turn it on or off.

In a jupyter notebook, what you get from print(x) is not directly relevant, since that uses __str__ while you would normally see __repr__, like,

In [3]: np.ones((2, 100000), dtype='f4')
Out[3]: 
array([[1., 1., 1., ..., 1., 1., 1.],
       [1., 1., 1., ..., 1., 1., 1.]], dtype=float32)

I chose this example with 'f4' since that shows that we do put in extra information already when it cannot be inferred directly from the data as shown. But here it is also really logical to do so in that it is an argument that can actually be passed on to np.array. This is not the case for shape, making that less logical to show. However, it is in fact already shown for zero-sized arrays with shape not simply (0,):

In [4]: np.ones((2, 0), dtype='f4')
Out[4]: array([], shape=(2, 0), dtype=float32)

For this case, the shape cannot be guessed from the data values ([] in this case), and perhaps that was the reason to show it, just like dtype is shown for the case that it cannot be guessed from the data.
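The behaviours described so far can be checked directly. This is a minimal sketch that only uses `str`, `repr`, and `np.ones`; the variable names are arbitrary.

```python
import numpy as np

a = np.ones((2, 3), dtype="f4")
print(str(a))    # what print(a) shows: no "array(" wrapper and no dtype
print(repr(a))   # what a notebook cell shows: dtype=float32 is appended

b = np.ones((2, 3))  # default float dtype, so the dtype is implied
print(repr(b))       # no dtype= suffix

c = np.ones((2, 0), dtype="f4")  # zero-sized, with a shape other than (0,)
print(repr(c))       # array([], shape=(2, 0), dtype=float32)
```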

To me, the above makes it seem reasonable to show the shape more generally when it cannot be directly inferred, i.e., also when data has been summarized. I think it would not be particularly hard to do: just an extra bit of logic in _array_repr_implementation in numpy/_core/arrayprint.py.
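As a rough illustration of that idea, written from the outside rather than inside `_array_repr_implementation`, the sketch below appends the shape only when the data were summarized. It assumes summarization happens when `arr.size` exceeds the `threshold` print option; `repr_with_shape` is a hypothetical name, and this is not the change that was eventually made in NumPy.

```python
import numpy as np

def repr_with_shape(arr):
    """Hypothetical helper: add shape=... to the repr of summarized arrays."""
    text = np.array_repr(arr)
    # Assumption: NumPy summarizes once the element count exceeds `threshold`.
    summarized = arr.size > np.get_printoptions()["threshold"]
    # Some NumPy releases may already print the shape; do not add it twice.
    if not summarized or "shape=" in text:
        return text
    # Drop the closing ")" and re-close after the extra field.
    return f"{text[:-1]}, shape={arr.shape})"

print(repr_with_shape(np.arange(2000).reshape(2, -1)))  # summarized
print(repr_with_shape(np.arange(3)))                    # shown in full
```

Unlike the real implementation, this sketch does not check whether the appended text still fits within the line width.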

p.s. An alternative I thought of would be to allow customization of summary_insert, so that one could replace ... with, e.g., ...[10 items].... But I think we have to be far more careful about adding extra options: we already have rather many, and it is not that useful to allow control at a level where people should just write their own function.

@GF-Huang
Author

@mhvk This is what I meant: sometimes it is hard to know the shape.

image

I wrote my own version for printing, but I have to copy it into every new notebook.

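import numpy as np
import cupy as cp  # only needed when GPU arrays are passed in
from numpy.typing import NDArray

def summary(self: NDArray, include_dtype: bool = False, include_shape: bool = True):
    # Use CuPy's array_repr for objects with a `device` attribute, NumPy's otherwise.
    summ = cp.array_repr(self) if hasattr(self, 'device') else np.array_repr(self)
    # Drop the closing ")" so extra fields can be appended before re-closing.
    summ_parts = [summ[:-1]]
    if include_dtype:
        summ_parts.append(f', dtype={self.dtype}')
    if include_shape:
        summ_parts.append(f', shape={self.shape}')
    summ_parts.append(')')
    print(''.join(summ_parts))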

@mhvk
Contributor

mhvk commented Sep 25, 2024

The actual implementation does something fairly similar - except that it also checks the result still fits on the last line. I think it would make sense to adjust it... See

def _array_repr_implementation(
        arr, max_line_width=None, precision=None, suppress_small=None,
        array2string=array2string):
    """Internal version of array_repr() that allows overriding array2string."""
    current_options = format_options.get()
    override_repr = current_options["override_repr"]
    if override_repr is not None:
        return override_repr(arr)
    if max_line_width is None:
        max_line_width = current_options['linewidth']
    if type(arr) is not ndarray:
        class_name = type(arr).__name__
    else:
        class_name = "array"
    skipdtype = dtype_is_implied(arr.dtype) and arr.size > 0
    prefix = class_name + "("
    suffix = ")" if skipdtype else ","
    if (current_options['legacy'] <= 113 and
            arr.shape == () and not arr.dtype.names):
        lst = repr(arr.item())
    elif arr.size > 0 or arr.shape == (0,):
        lst = array2string(arr, max_line_width, precision, suppress_small,
                           ', ', prefix, suffix=suffix)
    else:  # show zero-length shape unless it is (0,)
        lst = "[], shape=%s" % (repr(arr.shape),)
    arr_str = prefix + lst + suffix
    if skipdtype:
        return arr_str
    dtype_str = "dtype={})".format(dtype_short_repr(arr.dtype))
    # compute whether we should put dtype on a new line: Do so if adding the
    # dtype would extend the last line past max_line_width.
    # Note: This line gives the correct result even when rfind returns -1.
    last_line_len = len(arr_str) - (arr_str.rfind('\n') + 1)
    spacer = " "
    if current_options['legacy'] <= 113:
        if issubclass(arr.dtype.type, flexible):
            spacer = '\n' + ' '*len(class_name + "(")
    elif last_line_len + len(dtype_str) + 1 > max_line_width:
        spacer = '\n' + ' '*len(class_name + "(")
    return arr_str + spacer + dtype_str

@seberg
Member

seberg commented Sep 27, 2024

I'm using jupyter notebook.

I think this is common. I think it may make sense to focus attention on creating an HTML representation for arrays, which could probably fit the shape in somewhere much more naturally. I realize that is a bigger job, but it would be a pretty major improvement!
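To make the idea concrete, here is a very small sketch of what an HTML representation could look like from user code. Jupyter renders an object through its `_repr_html_` method when one is defined; `ArrayView` is a hypothetical wrapper class, not a NumPy feature, and the markup is deliberately minimal.

```python
import html
import numpy as np

class ArrayView:
    """Hypothetical wrapper that gives an array an HTML representation."""

    def __init__(self, arr):
        self.arr = np.asarray(arr)

    def __repr__(self):
        # Plain-text fallback for consoles.
        return repr(self.arr)

    def _repr_html_(self):
        # Escape the text so characters such as "<" cannot break the markup.
        body = html.escape(repr(self.arr))
        caption = html.escape(f"shape {self.arr.shape}, dtype {self.arr.dtype}")
        return f"<pre>{body}</pre><small>{caption}</small>"

view = ArrayView(np.arange(6).reshape(2, 3))
print(view._repr_html_())
```

Ending a notebook cell with `ArrayView(x)` would then show the array with its shape underneath.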

@GF-Huang
Author

I found a solution to my needs, which is np.set_string_function:

def summary(self: NDArray, include_dtype: bool = False, include_shape: bool = True):
    summ = cp.array_repr(self) if hasattr(self, 'device') else np.array_repr(self)
    summ_parts = [summ[:-1]]
    if include_dtype:
        summ_parts.append(f', dtype={self.dtype}')
    if include_shape:
        summ_parts.append(f', shape={self.shape}')
    summ_parts.append(')')
    return ''.join(summ_parts)


np.set_string_function(summary)
> np.arange(1000000).reshape(2000, -1)
array([[     0,      1,      2, ...,    497,    498,    499],
       [   500,    501,    502, ...,    997,    998,    999],
       [  1000,   1001,   1002, ...,   1497,   1498,   1499],
       ...,
       [998500, 998501, 998502, ..., 998997, 998998, 998999],
       [999000, 999001, 999002, ..., 999497, 999498, 999499],
       [999500, 999501, 999502, ..., 999997, 999998, 999999]], shape=(2000, 500))
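A guarded variant of this workaround is sketched below. It assumes that `np.set_string_function` may be missing (to my knowledge it was dropped from the main namespace in NumPy 2.0), so it checks for it first, and it restores the default afterwards by passing `None`. `repr_plus_shape` is a hypothetical, NumPy-only stand-in for the `summary` function above.

```python
import numpy as np

def repr_plus_shape(arr):
    """Hypothetical stand-in for `summary`: the repr with shape=... appended."""
    text = np.array_repr(arr)
    # Skip the extra field if this NumPy release already printed the shape.
    return text if "shape=" in text else f"{text[:-1]}, shape={arr.shape})"

small = np.arange(6).reshape(2, 3)
if hasattr(np, "set_string_function"):
    np.set_string_function(repr_plus_shape, repr=True)
    try:
        shown = repr(small)  # now produced by repr_plus_shape
    finally:
        np.set_string_function(None, repr=True)  # reset to the default repr
else:
    shown = repr_plus_shape(small)  # fall back to calling the helper directly
print(shown)
```

Because the hook is global to the process, resetting it as above avoids surprising other code that relies on the default repr.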

@mhvk
Contributor

mhvk commented Sep 30, 2024

That work-around is great! But it still irritated me that the code doesn't "do the right thing" by default, so I made a quick PR that changes the behaviour - see #27482
