BUG: lib: Fix handling of usecols=[] in loadtxt. #16632

WarrenWeckesser · 2020-06-18T10:41:16Z

Before this change, because of statements such as `if usecols:`,
`usecols=[]` was treated the same as `usecols=None`, and all the
columns were used. With this change, `usecols=[]` means "read no
columns", and an empty array is returned.

The error handling is improved: now if `usecols` is given, and the
number of columns in `usecols` does not equal the number of fields
in a given structured dtype, an exception is raised with a message
that explains the problem. Previously this mismatch would fail
with an incidental `IndexError`.
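
The truthiness pitfall described above can be reproduced without numpy. The following is a minimal sketch, not the code in `npyio.py`; the two `select_columns_*` functions are hypothetical names made up for illustration.

```python
def select_columns_buggy(row, usecols=None):
    # Truthiness test: an empty list is falsy, so [] behaves like None.
    if usecols:
        return [row[i] for i in usecols]
    return list(row)


def select_columns_fixed(row, usecols=None):
    # Identity test: only None means "use every column".
    if usecols is not None:
        return [row[i] for i in usecols]
    return list(row)


row = ["1.0", "2.0", "3.0"]
buggy = select_columns_buggy(row, usecols=[])
fixed = select_columns_fixed(row, usecols=[])
print(buggy)  # ['1.0', '2.0', '3.0']
print(fixed)  # []
```

With the identity test, `usecols=[]` selects no columns, while `usecols=None` still selects all of them.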


WarrenWeckesser · 2020-06-18T15:58:10Z

The maintenance PR (#16633) is merged, and this PR now has just the changes related to fixing the handling of usecols=[].

eric-wieser · 2020-06-18T16:23:54Z

numpy/lib/npyio.py

+        if len(usecols) == 0:
+            shp = (0, 0) if ndmin == 2 else (0,)
+            return np.empty(shp, dtype=dtype)


I'd perhaps expect an (N, 0) array (or maybe (0, N)?), where N is the number of lines in the file.

I thought about doing that (specifically (N, 0), or (0, N) if unpack is True). But then I wondered if that was a foolish consistency. However, if in fact most people would expect (N, 0), then that's what it should do.

Is this early exit even needed? What happens if you let the rest of the function run, to avoid needing a special case?

I think we should go with (N, 0); ideally it works without any special handling. Speed is not relevant for such a weird corner case anyway. It is a bit strange, but you could imagine reading K columns from multiple files and concatenating them, which only makes sense if the shape correctly includes N.
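
To illustrate the concatenation argument, here is a small sketch using plain numpy arrays rather than `loadtxt` itself. The array names and the choice of three rows are assumptions made for the example.

```python
import numpy as np

n_rows = 3

# What a file with 3 lines and usecols=[] would yield under the (N, 0) proposal.
no_cols = np.empty((n_rows, 0))
# Two columns read from some other file with the same number of lines.
two_cols = np.arange(6.0).reshape(n_rows, 2)

# Column-wise concatenation works because the row counts agree.
combined = np.concatenate([no_cols, two_cols], axis=1)
print(combined.shape)  # (3, 2)

# The (0, 0) result of the early exit (with ndmin=2) loses N, so this fails.
try:
    np.concatenate([np.empty((0, 0)), two_cols], axis=1)
    early_exit_ok = True
except ValueError:
    early_exit_ok = False
print(early_exit_ok)  # False
```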

I'm working on an update to return (N, 0). I just found a bug in how loadtxt handles nested structured dtypes, so I'll try to fix that, and then get back to this PR.

For future reference, the bug that I encountered is #16678.

eric-wieser · 2020-06-18T16:25:00Z

numpy/lib/npyio.py

@@ -1071,6 +1071,15 @@ def read_data(chunk_size):

    dtype_types, packing = flatten_dtype_internal(dtype)

+    if usecols is not None:
+        if len(dtype_types) > 1 and len(usecols) != len(dtype_types):


perhaps clearer as:

Suggested change

-        if len(dtype_types) > 1 and len(usecols) != len(dtype_types):
+        if len(dtype_types) <= 1:
+            pass  # this is ok because ???
+        elif len(usecols) != len(dtype_types):

Although I understand that this should be a rare case, maybe it makes sense to make the check work for len(dtype_types) >= 1, i.e. for one-element structured dtypes?
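
One way to cover one-element structured dtypes is to key the check on whether the dtype is structured rather than on the field count. The sketch below is an assumption about how that could look, not the PR's code: `check_usecols` is a hypothetical helper, and it only handles flat (non-nested) structured dtypes via `dtype.names`.

```python
import numpy as np


def check_usecols(usecols, dtype):
    """Hypothetical helper, not part of numpy; flat structured dtypes only."""
    dtype = np.dtype(dtype)
    # Nothing to check when all columns are used or the dtype is unstructured.
    if usecols is None or dtype.names is None:
        return
    n_fields = len(dtype.names)
    if len(usecols) != n_fields:
        raise ValueError(
            f"usecols has {len(usecols)} entries but the structured dtype "
            f"has {n_fields} field(s)"
        )


one_field = np.dtype([("a", float)])
check_usecols([2], one_field)        # passes: one column, one field
check_usecols([0, 1, 2], float)      # passes: unstructured dtype
try:
    check_usecols([0, 1], one_field)  # mismatch: two columns, one field
    message = None
except ValueError as exc:
    message = str(exc)
```

Here a single-field structured dtype is validated like any other structured dtype, while a plain scalar dtype accepts any number of columns.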

seberg · 2022-02-08T14:19:08Z

Closing, the error is now (also coming from Warren via npreadtext I think):

TypeError: If a structured dtype is used, the number of columns in `usecols` must match the effective number of fields. But 2 usecols were given and the number of fields is 3.

Although, I guess we could also use ValueError as here.

WarrenWeckesser added 00 - Bug component: numpy.lib labels Jun 18, 2020

WarrenWeckesser mentioned this pull request Jun 18, 2020

ENH: lib: the usecols parameter of loadtxt now accepts a slice or callable. #15995

Closed

eric-wieser reviewed Jun 18, 2020


WarrenWeckesser marked this pull request as draft June 18, 2020 14:03

WarrenWeckesser force-pushed the empty-usecols branch from 68d7e55 to fc8f6b3 Compare June 18, 2020 14:57

WarrenWeckesser marked this pull request as ready for review June 18, 2020 15:24

eric-wieser reviewed Jun 18, 2020


Base automatically changed from master to main March 4, 2021 02:04

seberg closed this Feb 8, 2022
