Description
This is the issue we want to tackle during the Man AHL Hackathon.
We would like transformers not to convert float32 input to float64 whenever possible. The transformers which are currently failing are listed below (a minimal reproduction sketch follows the list):
- BernoulliRBM ([ENH] add dtype preservation to BernoulliRBM #24318)
- Birch (ENH Added `dtype` preservation to `Birch` #22968)
- CCA - inherits from PLS, so also not applicable
- DictionaryLearning (ENH Preserving dtype for np.float32 in *DictionaryLearning, SparseCoder and orthogonal_mp_gram #22002)
- FactorAnalysis (ENH add dtype preservation to FactorAnalysis #24321)
- FastICA
- FeatureAgglomeration (ENH Add dtype preservation to FeatureAgglomeration #24346)
- GaussianRandomProjection (ENH Preserving dtype for np.float32 in RandomProjection #22114)
- GenericUnivariateSelect
- Isomap (ENH Add dtype preservation for Isomap #24714)
- LatentDirichletAllocation (ENH Preserving dtype for np.float32 in LatentDirichletAllocation #22113)
- LinearDiscriminantAnalysis ([MRG+2] Add float32 support for Linear Discriminant Analysis #13273)
- LocallyLinearEmbedding (ENH Add dtype preservation to LocallyLinearEmbedding #24337)
- LogisticRegression (LogisticRegression convert to float64 (for SAG solver) #13243)
- MiniBatchDictionaryLearning
- MiniBatchSparsePCA (ENH Preserving dtype for np.float32 in SparsePCA and MiniBatchSparsePCA #22111)
- NMF
- PLSCanonical - not applicable as both `X` and `y` are used
- PLSRegression - not applicable as both `X` and `y` are used
- PLSSVD - not applicable as both `X` and `y` are used
- RBFSampler (ENH Use `X`'s dtype for the projection in `RBFSampler` #24317)
- RidgeRegression ([MRG+1] ENH Ridge with solver SAG/SAGA does not cast to float64 #13302)
- SGDClassifier/SGDRegressor ([WIP] Allow SGDClassifier to support np.float32 without upcasting to float64 #9084) (ENH: Preserve float32/64 for SGD #13346)
- SkewedChi2Sampler
- SparsePCA (ENH Preserving dtype for np.float32 in SparsePCA and MiniBatchSparsePCA #22111)
- SparseRandomProjection (ENH Preserving dtype for np.float32 in RandomProjection #22114)
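To see the problem concretely, the following minimal sketch reproduces the upcast for one of the transformers listed above (`FastICA` here; on versions where the issue is fixed, the output dtype will match the input):

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.RandomState(0)
X = rng.rand(100, 5).astype(np.float32)

# On affected versions, fit_transform silently upcasts float32 to float64.
X_trans = FastICA(n_components=3, random_state=0).fit_transform(X)
print(X.dtype, X_trans.dtype)
```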
We could think about extending this to integer dtypes whenever possible and applicable.
Also, the following transformers are not included in the common tests, so we should write a specific test for each of them (a sketch is given after the list):
```python
# some strange ones
DONT_TEST = ['SparseCoder', 'DictVectorizer',
             'TfidfTransformer',
             'TfidfVectorizer',  # check 10443
             'IsotonicRegression',
             'CategoricalEncoder',
             'FeatureHasher',
             'TruncatedSVD', 'PolynomialFeatures',
             'GaussianRandomProjectionHash', 'HashingVectorizer',
             'CountVectorizer']
```
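For these, a specific test could look like the following sketch (using `TruncatedSVD` as an example; the test name and parametrization are illustrative, not existing scikit-learn tests):

```python
import numpy as np
import pytest
from sklearn.decomposition import TruncatedSVD

@pytest.mark.parametrize("dtype", [np.float32, np.float64])
def test_truncated_svd_preserves_dtype(dtype):
    # The transformed output should keep the input dtype.
    X = np.random.RandomState(0).rand(30, 10).astype(dtype)
    X_trans = TruncatedSVD(n_components=3, random_state=0).fit_transform(X)
    assert X_trans.dtype == dtype
```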
We could also check classifiers, regressors or clusterers (see #8769 for more context); a quick check is sketched after this list:
- AffinityPropagation -> bug in Incorrect Clusters Due To Dtype Mismatch #10832
- check SVC -> SVC: Do not enforce that input data is of type np.float64 #10713
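One quick way to inspect such estimators is to look at the dtype of their fitted array attributes, sketched here for `SVC` (illustrative; whether `support_vectors_` keeps float32 depends on the version and the linked fix):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.rand(20, 4).astype(np.float32)
y = np.tile([0, 1], 10)  # two balanced classes

clf = SVC().fit(X, y)
# float64 here indicates the float32 input was upcast internally.
print(clf.support_vectors_.dtype)
```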
Below is the code executed to find the failures:
```python
import numpy as np
from sklearn.base import clone
from sklearn.utils._testing import set_random_state

# X, y_ and transformer_orig are provided by the common-test context.
# Let's check the 32 - 64 bits type conservation.
if isinstance(X, np.ndarray):
    for dtype in [np.float32, np.float64]:
        X_cast = X.astype(dtype)
        transformer = clone(transformer_orig)
        set_random_state(transformer)

        if hasattr(transformer, 'fit_transform'):
            X_trans = transformer.fit_transform(X_cast, y_)
        elif hasattr(transformer, 'fit'):
            transformer.fit(X_cast, y_)
            X_trans = transformer.transform(X_cast)

        # FIXME: should we check that the dtypes of some attributes are the
        # same as dtype?
        assert X_trans.dtype == X_cast.dtype, (
            'transform dtype: {} - original dtype: {}'.format(
                X_trans.dtype, X_cast.dtype))
```
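A possible extension addressing the FIXME above, sketched under the assumption that fitted floating-point array attributes should share the input dtype (the loop below is illustrative, not part of the actual common tests, and reuses `transformer` and `X_cast` from the snippet above):

```python
# Hedged sketch: also check fitted floating-point array attributes.
for name, value in vars(transformer).items():
    if isinstance(value, np.ndarray) and value.dtype.kind == 'f':
        assert value.dtype == X_cast.dtype, (
            'attribute {}: {} - original dtype: {}'.format(
                name, value.dtype, X_cast.dtype))
```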
Tips to run the test for a specific transformer:
- Choose a transformer, for instance `FastICA`.
- If this class does not already have a method named `_more_tags`, add the following code snippet at the bottom of the class definition:

  ```python
  def _more_tags(self):
      return {"preserves_dtype": [np.float64, np.float32]}
  ```
- Run the common tests for this specific class:

  ```
  pytest sklearn/tests/test_common.py -k "FastICA and check_transformer_preserve_dtypes" -v
  ```
- It should fail: read the error message and try to understand why the `fit_transform` method (if it exists) or the `transform` method returns a `float64` data array when it is passed a `float32` input array.
It might be helpful to use a debugger, for instance by adding the line:

```python
import pdb; pdb.set_trace()
```

at the beginning of the `fit_transform` method and then re-running pytest with:

```
pytest sklearn/tests/test_common.py -k "FastICA and check_transformer_preserve_dtypes" --pdb
```
Then use the `l` (list), `n` (next), `s` (step into a function call), `p some_array_variable.dtype` (`p` stands for print) and `c` (continue) commands to interactively debug the execution of this `fit_transform` call.
ping @rth feel free to edit this thread.