Confusing error message in OneHotEncoder with None-encoded missing values #16702

Sister issue for #16703 (`OrdinalEncoder`)

## Code to reproduce

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder


df = pd.DataFrame({"cat_feature": ["a", None, "b", "a"]})
OneHotEncoder().fit(df)
```

## Observed result

Got: TypeError: '<' not supported between instances of 'str' and 'NoneType'

Full traceback:

<details>

```python-traceback
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/code/scikit-learn/sklearn/preprocessing/_label.py in _encode(values, uniques, encode, check_unknown)
    111         try:
--> 112             res = _encode_python(values, uniques, encode)
    113         except TypeError:

~/code/scikit-learn/sklearn/preprocessing/_label.py in _encode_python(values, uniques, encode)
     59     if uniques is None:
---> 60         uniques = sorted(set(values))
     61         uniques = np.array(uniques, dtype=values.dtype)

TypeError: '<' not supported between instances of 'str' and 'NoneType'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-31-4b176a24c3a2> in <module>
      4 
      5 df = pd.DataFrame({"cat_feature": ["a", None, "b", "a"]})
----> 6 OneHotEncoder().fit(df)

~/code/scikit-learn/sklearn/preprocessing/_encoders.py in fit(self, X, y)
    375         """
    376         self._validate_keywords()
--> 377         self._fit(X, handle_unknown=self.handle_unknown)
    378         self.drop_idx_ = self._compute_drop_idx()
    379         return self

~/code/scikit-learn/sklearn/preprocessing/_encoders.py in _fit(self, X, handle_unknown)
     84             Xi = X_list[i]
     85             if self.categories == 'auto':
---> 86                 cats = _encode(Xi)
     87             else:
     88                 cats = np.array(self.categories[i], dtype=Xi.dtype)

~/code/scikit-learn/sklearn/preprocessing/_label.py in _encode(values, uniques, encode, check_unknown)
    112             res = _encode_python(values, uniques, encode)
    113         except TypeError:
--> 114             raise TypeError("argument must be a string or number")
    115         return res
    116     else:

TypeError: argument must be a string or number
```

</details>
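
The first exception in the traceback comes from sorting a set that mixes `str` and `None`. As a minimal sketch, it can be reproduced without scikit-learn (the variable names here are illustrative, not taken from the library):

```python
# Column values as they appear in the reproducer above.
values = ["a", None, "b", "a"]

try:
    # Same operation as the `sorted(set(values))` line in the traceback.
    uniques = sorted(set(values))
    error_message = None
except TypeError as exc:
    # Python 3 refuses to order a str against None.
    error_message = str(exc)

print(error_message)
```

The generic `TypeError: argument must be a string or number` raised afterwards hides this underlying cause from the user.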

## Expected result

A more informative `ValueError`, for instance:

```
ValueError: OneHotEncoder does not accept None typed values. Missing values should be imputed first, for instance using sklearn.impute.SimpleImputer.
```
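
One way such a check might look, as a sketch only: `check_no_none` is a hypothetical helper, not an existing scikit-learn function, and the message wording is adapted from the proposal above.

```python
def check_no_none(values, estimator_name="OneHotEncoder"):
    """Raise an informative ValueError if any value is None (hypothetical helper)."""
    if any(v is None for v in values):
        raise ValueError(
            f"{estimator_name} does not accept None typed values. "
            "Missing values should be imputed first, for instance using "
            "sklearn.impute.SimpleImputer."
        )
    return values


try:
    check_no_none(["a", None, "b", "a"])
    message = None
except ValueError as exc:
    # The message now names the actual problem instead of a sorting failure.
    message = str(exc)

print(message)
```

Running the check before sorting the categories would let the encoder fail early with an actionable message.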

Maybe we could even include the URL of an FAQ entry or example that shows how to handle a mix of `str` and `None` values, for instance by applying the following imputer before `OneHotEncoder`:

```python
from sklearn.impute import SimpleImputer

SimpleImputer(strategy="constant", missing_values=None, fill_value="missing")
```
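
Put together, the suggested workaround might look like the following sketch. It assumes a scikit-learn version in which `SimpleImputer` accepts `missing_values=None` on object-dtype input; the array `X` and the pipeline layout are illustrative.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Object-dtype column mixing str and None, mirroring the reproducer.
X = np.array([["a"], [None], ["b"], ["a"]], dtype=object)

# Replace None with the placeholder category "missing", then one-hot encode.
encoder = make_pipeline(
    SimpleImputer(strategy="constant", missing_values=None, fill_value="missing"),
    OneHotEncoder(),
)

# OneHotEncoder returns a sparse matrix by default; densify it for inspection.
encoded = encoder.fit_transform(X).toarray()
categories = list(encoder[-1].categories_[0])

print(categories)
print(encoded)
```

With this approach the missing entries become their own `"missing"` category rather than triggering the sorting error.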
