ENH: Add `shape` control parameter to `set_printoptions` #27461

GF-Huang · 2024-09-25T09:27:32Z

Proposed new feature or change:

I want to be able to print large arrays and display the shape of the array at the same time.

For example, from the output below I can't tell the full shape of x; all I can see is shape == (2, ???).

> print(x)
array([[       0,        1,        2, ..., 49999997, 49999998, 49999999],
       [50000000, 50000001, 50000002, ..., 99999997, 99999998, 99999999]])


mattip · 2024-09-25T10:31:29Z

What is wrong with this? I don't think we are going to change something so basic to NumPy at this stage.

print(f"{x=}")
print(f"{x.shape=}")

GF-Huang · 2024-09-25T12:04:10Z

> What is wrong with this? I don't think we are going to change something so basic to NumPy at this stage.
> print(f"{x=}")
> print(f"{x.shape=}")

I'm using jupyter notebook.

If you want to look at a complex statement like:
x[x > 0].sum(...).otherFunc1(...).otherFunc2(...)

You need to do either:

tmp = x[x > 0].sum(...).otherFunc1(...).otherFunc2(...)
print(tmp)
print(tmp.shape)

or:

print(x[x > 0].sum(...).otherFunc1(...).otherFunc2(...))
print(x[x > 0].sum(...).otherFunc1(...).otherFunc2(...).shape)

I want to see a large array's summary and its shape with one line of code.
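One way to get both in a single statement, without a temporary variable, is a tiny helper that formats the value and its shape together. This is only a sketch: `with_shape` is a hypothetical name, not a NumPy function, and `cumsum()` stands in for the elided method chain above.

```python
import numpy as np


def with_shape(a):
    """Return the printed form of `a` followed by its shape."""
    return f"{a}  shape={a.shape}"


x = np.arange(-3, 4)
# The expression is written once; the helper reads both value and shape.
line = with_shape(x[x > 0].cumsum())
print(line)
```

In a notebook the helper still has to be defined once per session, which is the inconvenience the request is about.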

mhvk · 2024-09-25T18:05:10Z

Maybe good to try to be concrete, since while writing this I completely changed my mind, to thinking this should be done without even an option to turn it on or off.

In a jupyter notebook, what you get from print(x) is not directly relevant, since that uses __str__ while you would normally see __repr__, like,

In [3]: np.ones((2, 100000), dtype='f4')
Out[3]: 
array([[1., 1., 1., ..., 1., 1., 1.],
       [1., 1., 1., ..., 1., 1., 1.]], dtype=float32)

I chose this example with 'f4' since that shows that we do put in extra information already when it cannot be inferred directly from the data as shown. But here it is also really logical to do so in that it is an argument that can actually be passed on to np.array. This is not the case for shape, making that less logical to show. However, it is in fact already shown for zero-sized arrays with shape not simply (0,):

In [4]: np.ones((2, 0), dtype='f4')
Out[4]: array([], shape=(2, 0), dtype=float32)

For this case, the shape cannot be guessed from the data values ([] in this case), and perhaps that was the reason to show it, just like dtype is shown for the case that it cannot be guessed from the data.

To me, the above makes it seem reasonable to show the shape more generally when it cannot be directly inferred, i.e., also when data has been summarized. I think it would not be particularly hard to do: just an extra bit of logic in _array_repr_implementation in numpy/_core/arrayprint.py.
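The rule described above (show the shape whenever the data were summarized) can be sketched outside NumPy. The sketch assumes that summarization happens when `arr.size` exceeds the `threshold` print option. `repr_with_shape` is a hypothetical helper, not the change that would go into `_array_repr_implementation`, and unlike the real code it does not check whether the result still fits on the last line.

```python
import numpy as np


def repr_with_shape(arr):
    """Return repr(arr), adding shape=... when the data were summarized."""
    base = np.array_repr(arr)
    # Assumption: NumPy abbreviates with "..." once size exceeds `threshold`.
    summarized = arr.size > np.get_printoptions()["threshold"]
    # Nothing to add if no data were elided, or if this NumPy version
    # already includes the shape in the repr.
    if not summarized or "shape=" in base:
        return base
    # Splice the shape in before the closing parenthesis.
    return f"{base[:-1]}, shape={arr.shape!r})"


print(repr_with_shape(np.arange(5)))
print(repr_with_shape(np.ones((2, 100000), dtype="f4")))
```

Small arrays are returned unchanged, so the shape only appears when it cannot be read off the printed values.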

p.s. An alternative I thought of would be to allow customization of summary_insert, so that one could replace ... with, e.g., ...[10 items].... But I think we have to be much more careful about adding extra options - we already have rather many, and it is not that useful to allow control at a level where people should just write their own function.
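To make the `summary_insert` idea concrete, here is a pure-Python sketch of a 1-D summary that reports how many items were elided. NumPy has no such option; `summarize_1d` is a hypothetical function, and its `edgeitems` argument is merely named after the existing print option.

```python
def summarize_1d(values, edgeitems=3):
    """Format a sequence, replacing the middle with '...[N items]...'."""
    values = list(values)
    # Short sequences are shown in full.
    if len(values) <= 2 * edgeitems:
        return "[" + ", ".join(map(str, values)) + "]"
    hidden = len(values) - 2 * edgeitems
    head = ", ".join(map(str, values[:edgeitems]))
    tail = ", ".join(map(str, values[-edgeitems:]))
    return f"[{head}, ...[{hidden} items]..., {tail}]"


print(summarize_1d(range(16)))
```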

GF-Huang · 2024-09-25T18:12:19Z

@mhvk This is what I wanted to say: sometimes it is hard to know the shape.

I wrote my own version for printing, but I have to rewrite it in every new notebook.

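import numpy as np
from numpy.typing import NDArray
# `cp` is CuPy (import cupy as cp); it is only used for arrays that have a `device` attribute.

def summary(self: NDArray, include_dtype: bool = False, include_shape: bool = True):
    summ = cp.array_repr(self) if hasattr(self, 'device') else np.array_repr(self)
    # Drop the closing ")" so extra keywords can be appended before it.
    summ_parts = [summ[:-1]]
    if include_dtype:
        summ_parts.append(f', dtype={self.dtype}')
    if include_shape:
        summ_parts.append(f', shape={self.shape}')
    summ_parts.append(')')
    print(''.join(summ_parts))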

mhvk · 2024-09-25T18:17:57Z

The actual implementation does something fairly similar - except that it also checks the result still fits on the last line. I think it would make sense to adjust it... See

numpy/numpy/_core/arrayprint.py

Lines 1564 to 1613 in 9c51621

def _array_repr_implementation(
        arr, max_line_width=None, precision=None, suppress_small=None,
        array2string=array2string):
    """Internal version of array_repr() that allows overriding array2string."""
    current_options = format_options.get()
    override_repr = current_options["override_repr"]
    if override_repr is not None:
        return override_repr(arr)
    if max_line_width is None:
        max_line_width = current_options['linewidth']
    if type(arr) is not ndarray:
        class_name = type(arr).__name__
    else:
        class_name = "array"
    skipdtype = dtype_is_implied(arr.dtype) and arr.size > 0
    prefix = class_name + "("
    suffix = ")" if skipdtype else ","
    if (current_options['legacy'] <= 113 and
            arr.shape == () and not arr.dtype.names):
        lst = repr(arr.item())
    elif arr.size > 0 or arr.shape == (0,):
        lst = array2string(arr, max_line_width, precision, suppress_small,
                           ', ', prefix, suffix=suffix)
    else:  # show zero-length shape unless it is (0,)
        lst = "[], shape=%s" % (repr(arr.shape),)
    arr_str = prefix + lst + suffix
    if skipdtype:
        return arr_str
    dtype_str = "dtype={})".format(dtype_short_repr(arr.dtype))
    # compute whether we should put dtype on a new line: Do so if adding the
    # dtype would extend the last line past max_line_width.
    # Note: This line gives the correct result even when rfind returns -1.
    last_line_len = len(arr_str) - (arr_str.rfind('\n') + 1)
    spacer = " "
    if current_options['legacy'] <= 113:
        if issubclass(arr.dtype.type, flexible):
            spacer = '\n' + ' '*len(class_name + "(")
    elif last_line_len + len(dtype_str) + 1 > max_line_width:
        spacer = '\n' + ' '*len(class_name + "(")
    return arr_str + spacer + dtype_str
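The line-fitting check at the end of that function can be isolated into a few lines. The sketch below mirrors that logic for an arbitrary trailing keyword such as a shape. `append_keyword` and its parameters are names made up for illustration, the legacy branch is left out, and the default width of 75 is assumed to match NumPy's default `linewidth`.

```python
def append_keyword(arr_str, keyword_str, max_line_width=75, class_name="array"):
    """Append e.g. 'dtype=float32)' to arr_str, wrapping if it would not fit."""
    # Length of the last line; also correct when there is no newline,
    # because rfind then returns -1.
    last_line_len = len(arr_str) - (arr_str.rfind("\n") + 1)
    if last_line_len + len(keyword_str) + 1 > max_line_width:
        # Start a new line, indented to align under the array data.
        spacer = "\n" + " " * len(class_name + "(")
    else:
        spacer = " "
    return arr_str + spacer + keyword_str


print(append_keyword("array([1., 2.],", "dtype=float32)"))
print(append_keyword("array([1., 2.],", "dtype=float32)", max_line_width=20))
```

The second call wraps because the 15-character last line plus a space and the 14-character keyword would exceed the 20-character limit.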

seberg · 2024-09-27T09:41:06Z

> I'm using jupyter notebook.

I think this is common. It may make sense to focus attention on creating an HTML representation for arrays, which could probably fit the shape in somewhere much more naturally. I realize that is a bigger job, but it would be a pretty major improvement!

GF-Huang · 2024-09-30T07:32:21Z

I found a solution to my needs, which is np.set_string_function:

def summary(self: NDArray, include_dtype: bool = False, include_shape: bool = True):
    summ = cp.array_repr(self) if hasattr(self, 'device') else np.array_repr(self)
    summ_parts = [summ[:-1]]
    if include_dtype:
        summ_parts.append(f', dtype={self.dtype}')
    if include_shape:
        summ_parts.append(f', shape={self.shape}')
    summ_parts.append(')')
    return ''.join(summ_parts)


np.set_string_function(summary)

> np.arange(1000000).reshape(2000, -1)
array([[     0,      1,      2, ...,    497,    498,    499],
       [   500,    501,    502, ...,    997,    998,    999],
       [  1000,   1001,   1002, ...,   1497,   1498,   1499],
       ...,
       [998500, 998501, 998502, ..., 998997, 998998, 998999],
       [999000, 999001, 999002, ..., 999497, 999498, 999499],
       [999500, 999501, 999502, ..., 999997, 999998, 999999]], shape=(2000, 500))

mhvk · 2024-09-30T15:56:39Z

That work-around is great! But it still irritated me that the code doesn't "do the right thing" by default, so I made a quick PR that changes the behaviour - see #27482

mhvk mentioned this issue Sep 30, 2024

Show shape any time it cannot be inferred in repr #27482

Merged

mattip closed this as completed in #27482 Nov 7, 2024
