genfromtxt reading quoted csv files enhancement (Trac #1615) #2211

Open
numpy-gitbot opened this issue Oct 19, 2012 · 13 comments

Comments

@numpy-gitbot

Original ticket http://projects.scipy.org/numpy/ticket/1615 on 2010-09-15 by trac user alefnula, assigned to unknown.

genfromtxt cannot handle CSV files that use quoting. For example:

"This is my text, that has a comma inside","Other value","3"
"Another text, with coma","More text, with comma",5

This is CSV text where the delimiter is ",", but the values themselves also contain ",".

Here is a patch that lets the user specify the quoter (the quoting character). By default genfromtxt behaves exactly as before, but if the quoter is set, quoting is taken into account.

Patch is in the attachment.
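The attached patch is not reproduced here. As a minimal sketch of the idea, assuming a simple character-by-character state machine, a quote-aware splitter could look like the following. The function name `split_quoted` and its `quoter` parameter are hypothetical, chosen for illustration; this is not the patch and not a NumPy API.

```python
def split_quoted(line, delimiter=",", quoter='"'):
    """Split one line on `delimiter`, ignoring delimiters inside quoted fields.

    Hypothetical helper for illustration only (not the attached patch).
    A doubled quoter inside a quoted field ("") becomes one literal quote.
    """
    fields = []
    current = []
    in_quotes = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == quoter:
                if i + 1 < len(line) and line[i + 1] == quoter:
                    # Doubled quoter: keep a single literal quote character.
                    current.append(quoter)
                    i += 1
                else:
                    # Closing quoter: delimiters count again from here on.
                    in_quotes = False
            else:
                current.append(ch)
        elif ch == quoter:
            # Opening quoter: the quote itself is not part of the value.
            in_quotes = True
        elif ch == delimiter:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))
    return fields


line = '"This is my text, that has a comma inside","Other value","3"'
print(line.split(","))     # naive split: the first value is broken in two
print(split_quoted(line))  # ['This is my text, that has a comma inside', 'Other value', '3']
```

Unquoted fields such as the trailing `5` in the second example row pass through unchanged, so one splitter handles both mixed and fully quoted rows.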

@numpy-gitbot
Author

trac user alefnula wrote on 2010-09-15

I made a mistake in the example CSV; it should look like:

"This is my text, that has a comma inside","Other value","3"
"Another text, with coma","More text, with comma","5"
...
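For reference, Python's standard-library `csv` module already parses this corrected example into three fields per row. This sketch shows the field split a quote-aware genfromtxt would need to reproduce before converting types; it uses only `csv.reader` and `io.StringIO`, not genfromtxt.

```python
import csv
import io

# The corrected example, as an in-memory file.
data = (
    '"This is my text, that has a comma inside","Other value","3"\n'
    '"Another text, with coma","More text, with comma","5"\n'
)

# csv.reader honours quotechar, so commas inside quotes stay in their field.
rows = list(csv.reader(io.StringIO(data), delimiter=",", quotechar='"'))
for row in rows:
    print(row)
# ['This is my text, that has a comma inside', 'Other value', '3']
# ['Another text, with coma', 'More text, with comma', '5']
```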

@numpy-gitbot
Author

Attachment added by trac user alefnula on 2010-09-15: patch.diff

@numpy-gitbot
Author

@rgommers wrote on 2011-03-31

A unit test would be helpful.

@jseidel

jseidel commented Mar 5, 2013

Will this issue be fixed?

@charris
Member

charris commented Feb 19, 2014

@jseidel Want to open a PR?

ddasilva pushed a commit to ddasilva/numpy that referenced this issue Apr 6, 2014
Based off patch in numpy#2211. Implements custom split method and adds keywd.
@hypercubestart
Contributor

@mattip is this an issue I can work on?

@mattip
Member

mattip commented Sep 19, 2019

We don't usually assign issues; the rule is that the first person to file a PR should link back to the issue to indicate they are giving it a try. Note that the patch is a broken link, so you will have to recreate it. Tests are of course critical.

@ddasilva
Contributor

Is there a reason to use np.genfromtxt() over pandas' pd.read_csv() in 2019?

@hypercubestart
Contributor

@ddasilva I'm not sure if this counts as a reason, but one advantage could be that the output is an ndarray rather than a pd.DataFrame. I do see your point, though, since pd.read_csv() does seem to be the clear choice over np.genfromtxt() for most people.

@tinaoberoi
Contributor

@hypercubestart @eric-wieser is this issue still active?

@hypercubestart
Contributor

@tinaoberoi not sure, but from the discussion in #14577 it feels like the general sentiment is to just use pandas.read_csv.

@tinaoberoi
Contributor

I would like to work on this issue. I followed the earlier PRs and the discussion, and I agree with @hypercubestart's suggestion of adding a reference to pandas.read_csv. I would like to confirm before opening the PR.
@rgommers @mattip

@rgommers
Member

@tinaoberoi yes, that seems like a good solution. genfromtxt is quite complex and basically beyond rescue. So just say something in the Notes section like "for data with commas inside quoted strings or other corner cases that genfromtxt struggles with, using pandas.read_csv is a good alternative".

Longer-term, @WarrenWeckesser is working on a better text reader for numpy.
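Besides pandas.read_csv, one workaround for getting an ndarray directly (the advantage mentioned earlier in the thread) is to let the standard-library `csv` module handle the quoting and then build a NumPy structured array. This is a sketch under stated assumptions: the field names `text`, `other`, `value` and the dtypes `U64`/`i8` are hypothetical choices for this example data, not anything genfromtxt produces.

```python
import csv
import io

import numpy as np

data = io.StringIO(
    '"This is my text, that has a comma inside","Other value","3"\n'
    '"Another text, with coma","More text, with comma","5"\n'
)

# Hypothetical field names and dtypes chosen for this example only.
dtype = [("text", "U64"), ("other", "U64"), ("value", "i8")]

# csv.reader strips the quotes and keeps embedded commas inside each field;
# the last column is converted to int before building the array.
records = [(a, b, int(c)) for a, b, c in csv.reader(data)]
arr = np.array(records, dtype=dtype)

print(arr["value"])    # [3 5]
print(arr["text"][0])  # This is my text, that has a comma inside
```

With pandas available, `pd.read_csv(...).to_numpy()` gives a plain (object-dtype, for mixed columns) ndarray instead; the structured-array route keeps per-column dtypes.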


8 participants