ENH: add quoting support to np.genfromtxt #14577

Closed
wants to merge 2 commits into from

Conversation

hypercubestart
Contributor

Fixes #2211

Example:
test.txt content:

"This is my text, that has a comma inside","Other value","3"
"Another text, with coma","More text, with comma",5

Previous behavior:

>>> np.genfromtxt('test.txt', delimiter=',', encoding=None, dtype=None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/anaconda3/envs/default/lib/python3.7/site-packages/numpy/lib/npyio.py", line 2089, in genfromtxt
    raise ValueError(errmsg)
ValueError: Some errors were detected !
    Line #2 (got 5 columns instead of 4)

Expected Behavior:

>>> np.genfromtxt('test.txt', delimiter=',', quoter='"', encoding=None, dtype=None)
array([('This is my text, that has a comma inside', 'Other value', 3),
       ('Another text, with coma', 'More text, with comma', 5)],
      dtype=[('f0', '<U40'), ('f1', '<U21'), ('f2', '<i8')])

@eric-wieser
Member

Does this handle "strings containing an escaped \" character"? (and should it?)

@seberg
Member

seberg commented Sep 22, 2019

My LibreOffice seems to use a doubled "" to escape the quotes. Not sure if there is a CSV standard about that, but it may be good to support common CSV conventions if we improve things here? OTOH, right now pandas and its read_csv are probably simply better in either case.

@hypercubestart
Contributor Author

Does this handle "strings containing an escaped \" character"? (and should it?)

It does not handle an escaped character. I referenced pandas.read_csv which does not handle escaped characters either, so I decided to copy its behavior.

@seberg
Member

seberg commented Sep 22, 2019

read_csv has a doublequote option for this purpose (which defaults to True). There is also an escapechar option, which probably supports Eric's use case. Hmmm...

Maybe we should just put pandas.read_csv in the See Also section with a note right now (although not sure a cross-project link in that section works well)?

@eric-wieser
Member

eric-wieser commented Sep 22, 2019

I wonder also if we should just provide a thin wrapper around the builtin csv module? Or provide a recipe in the docs for using np.fromiter on the output of csv.reader?
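
A minimal sketch of what such a recipe might look like, assuming the sample data from the PR description and a structured dtype chosen by hand to fit it. The helper name `typed_rows` and the field widths are illustrative assumptions, not part of NumPy or of this PR.

```python
import csv
import io

import numpy as np

# Sample data from the PR description, held in memory for this sketch.
text = (
    '"This is my text, that has a comma inside","Other value","3"\n'
    '"Another text, with coma","More text, with comma",5\n'
)

# Assumption: field names, widths, and types are picked by hand for this sample.
dtype = np.dtype([("f0", "U40"), ("f1", "U21"), ("f2", "i8")])


def typed_rows(lines):
    """Yield one tuple per CSV record, converting the last field to int.

    `typed_rows` is a hypothetical helper for illustration only.
    """
    # csv.reader strips the quotes and keeps embedded delimiters intact.
    for f0, f1, f2 in csv.reader(lines, delimiter=",", quotechar='"'):
        yield (f0, f1, int(f2))


# np.fromiter consumes the generator and fills a 1-D structured array.
arr = np.fromiter(typed_rows(io.StringIO(text)), dtype=dtype)
```

Unlike genfromtxt with dtype=None, this sketch does no type inference: the caller has to know the column types up front.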

@charris
Member

charris commented Sep 22, 2019

There has been previous discussion of bringing Pandas csv code into NumPy, but no one has pursued actually doing it.

@hypercubestart
Contributor Author

@charris do you have a link to the previous discussion?

@hypercubestart
Contributor Author

I think that pandas.read_csv will always be the better and more complete choice for users, so adding a reference to the pandas.read_csv docs, as @seberg suggested, would be preferable if possible. However, in that case I am not sure whether NumPy even needs the genfromtxt function and, if it does, how many cases genfromtxt should handle, since pandas.read_csv is much more robust at the moment.

I think one course of action may be to:

  • add a reference to pandas.read_csv in the docs
  • merge this pull request
  • create new issues for other potential CSV cases (e.g. escape characters, doubled "" to escape a quote)

@charris
Member

charris commented Sep 23, 2019

do you have a link to the previous discussion?

No, it took place in a meeting at SciPy several years ago.

Base automatically changed from master to main March 4, 2021 02:04
@seberg
Member

seberg commented Jun 29, 2022

Going to close this, loadtxt now has proper double quote support. We might repurpose the loadtxt code to make a better genfromtxt, but the approach in this PR is too much at odds with the typical quoting support as loadtxt has it now.
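
For reference, a short sketch of the loadtxt quoting support mentioned above, assuming a NumPy version that provides the `quotechar` parameter (1.23 or later). The structured dtype is a hand-picked assumption that fits the sample file from the PR description.

```python
import io

import numpy as np

# Sample data from the PR description, held in memory for this sketch.
text = (
    '"This is my text, that has a comma inside","Other value","3"\n'
    '"Another text, with coma","More text, with comma",5\n'
)

# Assumption: field names, widths, and types are picked by hand for this sample.
dtype = np.dtype([("f0", "U40"), ("f1", "U21"), ("f2", "i8")])

# quotechar makes loadtxt treat delimiters inside quoted fields as plain text.
arr = np.loadtxt(io.StringIO(text), delimiter=",", quotechar='"', dtype=dtype)

# Inside a quoted field, a doubled quote character stands for one literal quote.
quoted = np.loadtxt(
    io.StringIO('"Hello, my name is ""Monty""!"'),
    dtype="U",
    delimiter=",",
    quotechar='"',
)
```

Here loadtxt needs an explicit dtype, whereas the `quoter` argument proposed in this PR relied on genfromtxt's dtype=None inference.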

@seberg seberg closed this Jun 29, 2022
Successfully merging this pull request may close these issues.

genfromtxt reading quoted csv files enhancement (Trac #1615)
4 participants