
BUG: error in printing masked arrays #7621

Closed
fonnesbeck opened this issue May 11, 2016 · 26 comments
Comments

@fonnesbeck

fonnesbeck commented May 11, 2016

NumPy masked arrays do not fill correctly when constructed. In the example below, the filled values do not correspond to the mask (notice the big chunk of values in the middle that get filled when they should not):

screenshot

The same happens with other masked array constructors, such as masked_equal. Strangely, if I use nan as the mask value, it works as expected.

Running NumPy 1.11.0 on Python 3.5.1 (OS X 10.11.4)

@abalkin
Contributor

abalkin commented May 11, 2016

Please don't post screenshots. Copy and paste your session instead, and please try to reproduce your problem with a smaller array.

@fonnesbeck
Author

import numpy as np
foo = np.array([  30.  ,   61.  ,   31.  ,   37.  ,    6.  ,    2.  ,  132.  ,
         27.  ,   38.  ,   48.7 ,    3.  ,   72.  ,   37.5 ,    5.1 ,
         48.  ,   20.2 ,   26.  ,    1.8 ,   15.3 ,   30.4 ,    4.5 ,
          8.  ,   13.  ,   31.  ,   51.  ,   36.  ,   42.  ,   42.  ,
         34.  ,   21.  ,    1.11,    1.11,    1.11,    1.11,    1.11,
          1.11,    1.11,    1.11,    1.11,    1.11,    1.11,    1.11,
          1.11,    1.11,    1.11,    1.11,    1.11,    1.11,    1.11,
          1.11,   38.  ,    9.  ,   42.  ,   27.  ,   17.  ,   39.  ,
         29.  ,   58.  ,  137.  ,   13.  ,    1.11,    1.11,    1.11,
          1.11,    1.11,    1.11,    1.11,    1.11,    1.11,    1.11,
          1.11,    1.11,    1.11,    1.11,    1.11,    1.11,    1.11,
          1.11,    1.11,    1.11,    1.11,    1.11,    1.11,    1.11,
          1.11,    1.11,    1.11,    1.11,    1.11,    1.11,    1.11,
          1.11,    1.11,    1.11,    1.11,    1.11,    1.11,    1.11,
          1.11,    1.11,    1.11,    1.11,    1.11,    1.11,    1.11,
          1.11,    1.11,    1.11,    1.11,    1.11,    1.11,    1.11,
          1.11,    1.11,    1.11,    1.11,    1.11,    1.11,    1.11,
          1.11,    1.11,    1.11,    1.11,    1.11,    1.11,    1.11,
          1.11,    1.11])
np.ma.masked_equal(foo, value=1.11)

yields:

masked_array(data = [30.0 61.0 31.0 37.0 6.0 2.0 132.0 27.0 38.0 48.7 3.0 72.0 37.5 5.1 48.0
 20.2 26.0 1.8 15.3 30.4 4.5 8.0 13.0 31.0 51.0 36.0 42.0 42.0 34.0 21.0 --
 -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
 -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
 -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --],
             mask = [False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True False False False False False False False False False False
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True],
       fill_value = 1.11)

@abalkin
Contributor

abalkin commented May 11, 2016

Hmm, I get

In [72]: np.ma.masked_equal(foo, value=1.11)
Out[72]:
masked_array(data = [30.0 61.0 31.0 37.0 6.0 2.0 132.0 27.0 38.0 48.7 3.0 72.0 37.5 5.1 48.0
 20.2 26.0 1.8 15.3 30.4 4.5 8.0 13.0 31.0 51.0 36.0 42.0 42.0 34.0 21.0 --
 -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 38.0 9.0 42.0
 27.0 17.0 39.0 29.0 58.0 137.0 13.0 -- -- -- -- -- -- -- -- -- -- -- -- --
 -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
 -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
 -- -- --],
             mask = [False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True False False False False False False False False False False
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True],
       fill_value = 1.11)

@abalkin
Contributor

abalkin commented May 11, 2016

I noticed that the mask shows up correctly in your output.

@fonnesbeck
Author

This may be a repr issue, as when I ask for a filled array, it seems to do the right thing:

time_masked.filled()
Out[64]:
array([  30.  ,   61.  ,   31.  ,   37.  ,    6.  ,    2.  ,  132.  ,
         27.  ,   38.  ,   48.7 ,    3.  ,   72.  ,   37.5 ,    5.1 ,
         48.  ,   20.2 ,   26.  ,    1.8 ,   15.3 ,   30.4 ,    4.5 ,
          8.  ,   13.  ,   31.  ,   51.  ,   36.  ,   42.  ,   42.  ,
         34.  ,   21.  ,    1.11,    1.11,    1.11,    1.11,    1.11,
          1.11,    1.11,    1.11,    1.11,    1.11,    1.11,    1.11,
          1.11,    1.11,    1.11,    1.11,    1.11,    1.11,    1.11,
          1.11,   38.  ,    9.  ,   42.  ,   27.  ,   17.  ,   39.  ,
         29.  ,   58.  ,  137.  ,   13.  ,    1.11,    1.11,    1.11,
          1.11,    1.11,    1.11,    1.11,    1.11,    1.11,    1.11,
          1.11,    1.11,    1.11,    1.11,    1.11,    1.11,    1.11,
          1.11,    1.11,    1.11,    1.11,    1.11,    1.11,    1.11,
          1.11,    1.11,    1.11,    1.11,    1.11,    1.11,    1.11,
          1.11,    1.11,    1.11,    1.11,    1.11,    1.11,    1.11,
          1.11,    1.11,    1.11,    1.11,    1.11,    1.11,    1.11,
          1.11,    1.11,    1.11,    1.11,    1.11,    1.11,    1.11,
          1.11,    1.11,    1.11,    1.11,    1.11,    1.11,    1.11,
          1.11,    1.11,    1.11,    1.11,    1.11,    1.11,    1.11,
          1.11,    1.11])
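
One way to confirm that only the display is affected is to inspect the mask directly instead of reading the printed output. The following is a minimal sketch that uses a shorter stand-in array with the same layout (ordinary values, a sentinel run, more ordinary values, a second sentinel run); it is not the data from the report.

```python
import numpy as np

# Stand-in for the reported array: two runs of ordinary values separated
# and followed by runs of the sentinel value 1.11.
foo = np.concatenate([
    np.arange(30, dtype=float) + 2.0,   # 30 ordinary values
    np.full(20, 1.11),                  # first sentinel run
    np.arange(10, dtype=float) + 40.0,  # 10 ordinary values in the middle
    np.full(62, 1.11),                  # second sentinel run
])
masked = np.ma.masked_equal(foo, value=1.11)

# Look at the mask and the unmasked values directly, bypassing str/repr.
mask = np.ma.getmaskarray(masked)
unmasked_count = int((~mask).sum())
middle_is_unmasked = not mask[50:60].any()
compressed = masked.compressed()  # 1-D array of the unmasked values only
```

If `middle_is_unmasked` is true and `compressed` still contains the middle run, the array itself is fine and the problem is confined to `__str__` / `__repr__`.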

@abalkin
Contributor

abalkin commented May 11, 2016

I've upgraded to numpy 1.11.0 and I now see your problem:

In [3]: np.ma.masked_equal(foo, value=1.11)
Out[3]:
masked_array(data = [30.0 61.0 31.0 37.0 6.0 2.0 132.0 27.0 38.0 48.7 3.0 72.0 37.5 5.1 48.0
 20.2 26.0 1.8 15.3 30.4 4.5 8.0 13.0 31.0 51.0 36.0 42.0 42.0 34.0 21.0 --
 -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
 -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
 -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --],
             mask = [False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True False False False False False False False False False False
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True],
       fill_value = 1.11)

@abalkin
Contributor

abalkin commented May 11, 2016

Same issue with __str__:

In [4]: print(np.ma.masked_equal(foo, value=1.11))
[30.0 61.0 31.0 37.0 6.0 2.0 132.0 27.0 38.0 48.7 3.0 72.0 37.5 5.1 48.0
 20.2 26.0 1.8 15.3 30.4 4.5 8.0 13.0 31.0 51.0 36.0 42.0 42.0 34.0 21.0 --
 -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
 -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
 -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --]

@abalkin
Contributor

abalkin commented May 11, 2016

Here are some really odd displays:

In [39]: print(np.ma.masked_equal(foo[:109], value=1.11))
[30.0 61.0 31.0 37.0 6.0 2.0 132.0 27.0 38.0 48.7 3.0 72.0 37.5 5.1 48.0
 20.2 26.0 1.8 15.3 30.4 4.5 8.0 13.0 31.0 51.0 36.0 42.0 42.0 34.0 21.0 --
 -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 13.0 -- -- -- --
 -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
 -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --]

In [40]: print(np.ma.masked_equal(foo[:108], value=1.11))
[30.0 61.0 31.0 37.0 6.0 2.0 132.0 27.0 38.0 48.7 3.0 72.0 37.5 5.1 48.0
 20.2 26.0 1.8 15.3 30.4 4.5 8.0 13.0 31.0 51.0 36.0 42.0 42.0 34.0 21.0 --
 -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 137.0 13.0 -- --
 -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
 -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --]

@abalkin
Contributor

abalkin commented May 11, 2016

Here is a clearer case demonstrating the problem:

In [46]: a = np.arange(120)

In [47]: a[30:50] = a[60:] = -1

In [48]: print(np.ma.masked_equal(a, value=-1))
[0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
 28 29 -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
 -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
 -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --]

@charris
Member

charris commented May 11, 2016

I'm going to guess this came in with #6748.

@ahaldane
Member

I don't have time to check, but #6094 also comes to mind.

@charris charris added this to the 1.11.1 release milestone May 11, 2016
@abalkin
Contributor

abalkin commented May 11, 2016

The problem is related to _print_width introduced in b5c456e.

Here is a simple demonstration:

In [84]: np.ma.MaskedArray._print_width = 10

In [85]: print(np.ma.masked_values([0]*120, 0))
[-- -- -- -- -- -- -- -- -- --]

The default value for _print_width is 100.

@abalkin
Contributor

abalkin commented May 11, 2016

@saimn - it looks like you've done some work in this area recently.

@abalkin
Contributor

abalkin commented May 11, 2016

I think the problem was introduced in #3544 / #6748.

@saimn
Contributor

saimn commented May 11, 2016

@abalkin - Yes indeed. I did this while working / checking on 2D/3D arrays I guess, and it seems that the rule for the number of values printed on screen for 1D arrays is different... I don't know exactly what the rule / limit is, or when the filling "..." is used, but this can probably be fixed by increasing the _print_width value? The idea of truncating the array is mostly relevant for big arrays anyway, so having a higher value for _print_width is fine. The question is which value to choose...

@abalkin
Contributor

abalkin commented May 11, 2016

@saimn - I cannot understand your logic in 593345a. It looks like you drop the middle values from the array leaving no indication for the subsequent code as to where to place the dots. Also, you only apply the new logic when mask is not nomask leading to spurious display differences between arrays with mask=nomask and mask=ones(n).
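
The effect described here can be reproduced without masked arrays at all. The sketch below is an illustration only: `cut_middle` is a hypothetical helper that keeps the first and last `print_width // 2` elements, which is an assumption about the behaviour based on this thread and not the code from the commit.

```python
import numpy as np

def cut_middle(values, print_width):
    """Hypothetical helper: keep the first and last print_width // 2 items."""
    if len(values) <= print_width:
        return values
    half = print_width // 2
    return np.concatenate([values[:half], values[-half:]])

# Same layout as the 120-element example earlier in the thread.
a = np.arange(120)
a[30:50] = -1
a[60:] = -1

shortened = cut_middle(a, print_width=100)

# 100 elements is below NumPy's default summarization threshold, so the
# shortened array prints in full and nothing marks where the gap is.
text = str(shortened)
dropped = np.arange(50, 60)  # the unmasked values that sat in the middle
dropped_survive = bool(np.isin(dropped, shortened).any())
```

With a width of 100, the slice `a[50:70]` is removed, which is exactly where the unmasked values 50 to 59 live, and `text` contains no `...` to signal the omission.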

@abalkin
Contributor

abalkin commented May 11, 2016

The logic for 1-D arrays is odd. A length 1001 array is contracted:

In [98]: print(np.arange(1001))
[   0    1    2 ...,  998  999 1000]

but a length 1000 one is printed in full. (I will not paste a screenful here.)
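
The cutoff follows from the `threshold` print option: an array is summarized only when its size is strictly greater than that value. A minimal check, assuming the print options have not been changed from their defaults:

```python
import numpy as np

# 'threshold' is the element count above which NumPy summarizes an array
# with an ellipsis (1000 unless it has been changed).
threshold = np.get_printoptions()["threshold"]

full_text = str(np.arange(threshold))            # size == threshold: printed in full
summarized_text = str(np.arange(threshold + 1))  # size > threshold: summarized

full_has_ellipsis = "..." in full_text
summarized_has_ellipsis = "..." in summarized_text
```

`np.set_printoptions(threshold=...)` changes the cutoff, so any fixed width used to pre-trim an array has to be chosen with this option in mind.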

@saimn
Contributor

saimn commented May 11, 2016

@abalkin - The logic is just to reduce the size of the array, but still have enough values to use the same printing logic as before. So if there are enough values, the output should be the same (with the conversion to the object dtype, filling with -- for masked values, and then truncating and adding ... which is done by ndarray). If mask is nomask, there is no need to do all this stuff because the array is printed directly.

@abalkin
Contributor

abalkin commented May 11, 2016

Got it, but the constant _print_width logic is probably too simplistic. You need a different value for 1D case.

@saimn
Contributor

saimn commented May 11, 2016

Hmm, with _print_width = 1000 and a length-1001 array, I get data that is fully printed, but the mask is truncated...

edit:

In [13]: np.ma.MaskedArray._print_width = 1000
In [14]: a = np.ma.arange(1001)
In [15]: a[:50] = np.ma.masked

In [16]: a.data
Out[16]: array([   0,    1,    2, ...,  998,  999, 1000])

In [17]: a.mask
Out[17]: array([ True,  True,  True, ..., False, False, False], dtype=bool)

But then printing a shows the full data.

@abalkin
Contributor

abalkin commented May 11, 2016

Truncation logic may also be dtype specific.

@charris
Member

charris commented May 14, 2016

Is there a fix for this appropriate for 1.11.1?

@charris charris changed the title from "error in filling mask in masked arrays" to "BUG: error in printing masked arrays" May 14, 2016
@saimn
Contributor

saimn commented May 17, 2016

It seems that NumPy starts to truncate the array when it has more than 1000 elements, so for the 1D case, setting _print_width to something greater than 1000 should do the job.
But for 2D and more, the current value is fine. What would be the best way to distinguish the two cases? Adding another _print_width_1d variable?
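
A rough sketch of what that could look like, written against plain arrays and outside of `MaskedArray`. The function name `shorten_for_printing` and the value 1500 are assumptions for illustration; the proposal only requires the 1D width to be greater than 1000.

```python
import numpy as np

PRINT_WIDTH = 100      # current value, kept for 2D and higher
PRINT_WIDTH_1D = 1500  # hypothetical value; anything above 1000 fits the proposal

def shorten_for_printing(values):
    """Hypothetical helper: trim the middle of every axis that is too long."""
    width = PRINT_WIDTH_1D if values.ndim == 1 else PRINT_WIDTH
    half = width // 2
    for axis in range(values.ndim):
        length = values.shape[axis]
        if length > width:
            head = np.take(values, np.arange(half), axis=axis)
            tail = np.take(values, np.arange(length - half, length), axis=axis)
            values = np.concatenate([head, tail], axis=axis)
    return values

short_1d = shorten_for_printing(np.arange(120))       # left untouched
long_1d = shorten_for_printing(np.arange(5000))       # trimmed to 1500 items
big_2d = shorten_for_printing(np.zeros((300, 300)))   # trimmed to 100 x 100
```

Because a trimmed 1D array still has more than 1000 elements, the regular ndarray printing summarizes it with `...` as it would the original, so the head and tail that end up on screen are unchanged.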

@charris
Member

charris commented May 22, 2016

@saimn That sounds good as a quick fix. Long term, I think we should figure out a better way of printing masked arrays.

saimn added a commit to saimn/numpy that referenced this issue May 22, 2016
Ref numpy#7621. numpy#6748 added `np.ma.MaskedArray._print_width` which is used to cut
a masked array before printing it (to save memory and cpu time during the
conversion to the object dtype). But this doesn't work correctly for 1D arrays,
for which up to 1000 values can be printed before cutting the array.

So this commit adds a new class variable `_print_width_1d` to handle the 1D case
separately.
charris pushed a commit to charris/numpy that referenced this issue May 23, 2016
@charris
Member

charris commented May 23, 2016

Should be fixed by #7658. Closing, but it would be good if folks would give the fix a shot and see if they can cause trouble.

@charris charris closed this as completed May 23, 2016
@fonnesbeck
Author

This fixed the issue for me, thanks.
