Pandas Copy-on-Write mode should be enabled in all tests #27879

Open
@s-banach

Description

Describe the bug

Pandas Copy-on-Write (CoW) mode will be enabled by default in pandas 3.0.
For example, I just found that TargetEncoder raises a ValueError when CoW is enabled.
There are probably many other such failures that running the test suite with CoW enabled would uncover.
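One way to act on this is an autouse pytest fixture that switches CoW on for every test. The sketch below assumes it lives in a top-level `conftest.py`. The fixture name `pandas_copy_on_write` is hypothetical and not part of scikit-learn's test suite. It relies on the `mode.copy_on_write` option that pandas 2.x provides.

```python
# conftest.py (hypothetical placement): enable pandas Copy-on-Write for every test
import pytest

try:
    import pandas as pd
except ImportError:  # pandas is optional; tests run unchanged without it
    pd = None


@pytest.fixture(autouse=True)
def pandas_copy_on_write():
    """Hypothetical fixture: run each test with pandas CoW mode switched on."""
    if pd is None:
        yield
        return
    # option_context restores the previous setting once the test finishes
    with pd.option_context("mode.copy_on_write", True):
        yield
```

Parametrizing the fixture over `[False, True]` would instead run each test under both the current default and the pandas 3.0 default. The cost is roughly doubling the test run time.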

Steps/Code to Reproduce

import pandas as pd
from sklearn.preprocessing import TargetEncoder
pd.options.mode.copy_on_write = True  # opt in to the pandas 3.0 default behaviour

df = pd.DataFrame({
    "x": ["a", "b", "c", "c"],
    "y": [4., 5., 6., 7.]
})
t = TargetEncoder(target_type="continuous")
t.fit(df[["x"]], df["y"])

Expected Results

No error.

Actual Results

ValueError                                Traceback (most recent call last)
Cell In[2], line 10
      5 df = pd.DataFrame({
      6     "x": ["a", "b", "c", "c"],
      7     "y": [4., 5., 6., 7.]
      8 })
      9 t = TargetEncoder(target_type="continuous")
---> 10 t.fit(df[["x"]], df["y"])

File ~/.conda/envs/jhop311/lib/python3.11/site-packages/sklearn/base.py:1152, in _fit_context.<locals>.decorator.<locals>.wrapper(estimator, *args, **kwargs)
   1145     estimator._validate_params()
   1147 with config_context(
   1148     skip_parameter_validation=(
   1149         prefer_skip_nested_validation or global_skip_validation
   1150     )
   1151 ):
-> 1152     return fit_method(estimator, *args, **kwargs)

File ~/.conda/envs/jhop311/lib/python3.11/site-packages/sklearn/preprocessing/_target_encoder.py:203, in TargetEncoder.fit(self, X, y)
    186 @_fit_context(prefer_skip_nested_validation=True)
    187 def fit(self, X, y):
    188     """Fit the :class:`TargetEncoder` to X and y.
    189 
    190     Parameters
   (...)
    201         Fitted encoder.
    202     """
--> 203     self._fit_encodings_all(X, y)
    204     return self

File ~/.conda/envs/jhop311/lib/python3.11/site-packages/sklearn/preprocessing/_target_encoder.py:332, in TargetEncoder._fit_encodings_all(self, X, y)
    330 if self.smooth == "auto":
    331     y_variance = np.var(y)
--> 332     self.encodings_ = _fit_encoding_fast_auto_smooth(
    333         X_ordinal, y, n_categories, self.target_mean_, y_variance
    334     )
    335 else:
    336     self.encodings_ = _fit_encoding_fast(
    337         X_ordinal, y, n_categories, self.smooth, self.target_mean_
    338     )

File sklearn/preprocessing/_target_encoder_fast.pyx:82, in sklearn.preprocessing._target_encoder_fast._fit_encoding_fast_auto_smooth()

File stringsource:660, in View.MemoryView.memoryview_cwrapper()

File stringsource:350, in View.MemoryView.memoryview.__cinit__()

ValueError: buffer source array is read-only
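The last frames show that the failure happens while Cython builds a typed memoryview from an input array. With CoW enabled, pandas can return a read-only NumPy array that shares memory with the Series. A Cython memoryview that is not declared `const` requests a writable buffer, so it presumably rejects that array. In Cython, the usual way to accept read-only input without copying is to declare the memoryview argument `const` in the `.pyx` signature.

The sketch below reproduces the read-only condition with plain NumPy and shows a defensive-copy workaround. `as_writeable` is a hypothetical helper and is not a scikit-learn function.

```python
import numpy as np


def as_writeable(arr):
    """Hypothetical helper: return arr unchanged if writeable, else a writeable copy."""
    arr = np.asarray(arr)
    return arr if arr.flags.writeable else arr.copy()


# Mimic the read-only array pandas can expose under Copy-on-Write.
y = np.array([4.0, 5.0, 6.0, 7.0])
y.flags.writeable = False

try:
    y[0] = 0.0  # writing through a read-only array raises ValueError
except ValueError as exc:
    print(f"write rejected: {exc}")

y_safe = as_writeable(y)  # copies, because y is read-only
y_safe[0] = 0.0           # the copy can be modified; y itself is untouched
print(y, y_safe)
```

Copying gives up some memory in exchange for compatibility. The `const` memoryview approach avoids the copy, but it has to be applied in the extension code itself.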

Versions

System:
    python: 3.11.3 | packaged by conda-forge | (main, Apr  6 2023, 08:57:19) [GCC 11.3.0]
executable: /home/jhopfens/.conda/envs/jhop311/bin/python
   machine: Linux-3.10.0-1160.99.1.el7.x86_64-x86_64-with-glibc2.17

Python dependencies:
      sklearn: 1.3.2
          pip: 23.0.1
   setuptools: 67.6.1
        numpy: 1.25.2
        scipy: 1.11.2
       Cython: 3.0.0
       pandas: 2.1.0
   matplotlib: 3.7.2
       joblib: 1.2.0
threadpoolctl: 3.1.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: /home/jhopfens/.conda/envs/jhop311/lib/python3.11/site-packages/numpy.libs/libopenblas64_p-r0-5007b62f.3.23.dev.so
        version: 0.3.23.dev
threading_layer: pthreads
   architecture: SkylakeX
    num_threads: 64

       user_api: openmp
   internal_api: openmp
         prefix: libgomp
       filepath: /home/jhopfens/.conda/envs/jhop311/lib/python3.11/site-packages/scikit_learn.libs/libgomp-a34b3233.so.1.0.0
        version: None
    num_threads: 128

       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: /home/jhopfens/.conda/envs/jhop311/lib/python3.11/site-packages/scipy.libs/libopenblasp-r0-23e5df77.3.21.dev.so
        version: 0.3.21.dev
threading_layer: pthreads
   architecture: SkylakeX
    num_threads: 64
