Inconsistency between liac-arff and pandas parser in fetch_openml

As seen in https://github.com/fairlearn/fairlearn/pull/1166, the `liac-arff` and `pandas` parsers of `fetch_openml` return inconsistent results.

According to the ARFF specification, leading whitespace is ignored unless it is enclosed in quotes. pandas' `read_csv`, however, keeps this whitespace by default. For example:

```python
>>> import sklearn.datasets as skd
>>> d = skd.fetch_openml(data_id=1590, as_frame=True, parser='pandas')
>>> d.target
0         <=50K
1         <=50K
2          >50K
3          >50K
4         <=50K
          ...  
48837     <=50K
48838      >50K
48839     <=50K
48840     <=50K
48841      >50K
Name: class, Length: 48842, dtype: category
Categories (2, object): [' <=50K', ' >50K']
```
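
The same behaviour can be reproduced without downloading anything. The snippet below is a minimal sketch: the two-row `sample` string and the column names are made up for illustration and only mimic a data section in which each value after the comma is preceded by a space.

```python
import io

import pandas as pd

# Hypothetical sample: a space follows each comma, as in the dataset above.
sample = "39, <=50K\n50, >50K\n"

# With default options, read_csv keeps the space as part of the value.
df = pd.read_csv(io.StringIO(sample), header=None, names=["age", "class"])
labels = df["class"].tolist()
print(labels)  # [' <=50K', ' >50K']
```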

I am not sure that we can easily solve this issue: once the data has been read by `read_csv`, the information about the quotes is no longer available. I assume that the best we can do is to let users pass additional keyword arguments through to `read_csv`, which would make the parser flexible enough to handle such cases.
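
To make both points concrete, here is a rough sketch rather than a proposed implementation. `read_data_section` and its `read_csv_kwargs` pass-through are hypothetical names invented for this example; only `pd.read_csv` and its `skipinitialspace` option are existing pandas API.

```python
import io

import pandas as pd


def read_data_section(payload, names, **read_csv_kwargs):
    """Hypothetical helper: parse CSV-like text, forwarding extra options to read_csv."""
    return pd.read_csv(
        io.StringIO(payload), header=None, names=names, **read_csv_kwargs
    )


names = ["age", "class"]
unquoted = "39, <=50K\n"  # per ARFF, this leading space should be ignored
quoted = '39," <=50K"\n'  # here the leading space is part of the value

# With default options both rows yield the same string, so after parsing
# we can no longer tell whether the space was quoted or not.
default_unquoted = read_data_section(unquoted, names)["class"][0]
default_quoted = read_data_section(quoted, names)["class"][0]
print(default_unquoted == default_quoted)  # True

# Forwarding skipinitialspace=True drops the space that follows the delimiter.
stripped = read_data_section(unquoted, names, skipinitialspace=True)["class"][0]
print(repr(stripped))  # '<=50K'
```

With such a pass-through, the choice of whether to strip the whitespace is left to the user, who knows how the dataset was written.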
