RFC Trigger a copy when copy=False and X is read-only #28824

Closed
@jeremiedbb


Highly related to #14481 and maybe a little bit to #13986.

My understanding of the copy=False parameter of estimators is "allow inplace modifications of X".

When avoiding a copy is not possible (X doesn't have the right dtype or memory layout for instance), a copy is still triggered. I believe that X being read-only is a valid reason for still triggering a copy.
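To illustrate the "a copy is still triggered" behavior with plain NumPy (this is only an analogy, not scikit-learn's actual validation code): asking for a conversion without requesting a copy still allocates new memory when the dtype doesn't match, and reuses the input otherwise.

```python
import numpy as np

X_int = np.arange(6, dtype=np.int64).reshape(3, 2)
X_f64 = np.arange(6, dtype=np.float64).reshape(3, 2)

# No copy is requested, but the dtype mismatch makes one unavoidable.
converted = np.asarray(X_int, dtype=np.float64)
# The dtype already matches, so the input buffer is reused as-is.
untouched = np.asarray(X_f64, dtype=np.float64)

copied_on_dtype_mismatch = not np.shares_memory(converted, X_int)
reused_when_compatible = np.shares_memory(untouched, X_f64)
```

The proposal amounts to treating "X is read-only" as one more case of the first kind.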

My main argument is that the user isn't always in control of the permissions of an input array within the whole pipeline, especially when joblib parallelism is enabled, which may create read-only memmaps. We've had a bunch of issues because of that, the latest being #28781. And it's poorly tested because it requires big arrays, which we try to avoid in the tests (although joblib 1.13 makes it easy to trigger with small arrays).
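A minimal sketch of the failure mode, without joblib: here the read-only memmap is created by hand with `np.load(..., mmap_mode="r")`, which is an assumption standing in for what joblib may hand to a worker. `center_inplace` is a hypothetical helper, not a scikit-learn function.

```python
import os
import tempfile

import numpy as np


def center_inplace(X):
    """Subtract the column means from X, modifying X in place."""
    X -= X.mean(axis=0)
    return X


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "X.npy")
    np.save(path, np.arange(6, dtype=np.float64).reshape(3, 2))

    # Read-only memory map, similar in spirit to what a worker may receive.
    X = np.load(path, mmap_mode="r")
    input_is_writeable = bool(X.flags.writeable)

    try:
        center_inplace(X)
        inplace_failed = False
    except ValueError:
        # NumPy refuses to write into a read-only buffer.
        inplace_failed = True

    # Copying into regular memory gives a writeable array to work on.
    X_copy = np.array(X)
    column_means_after = center_inplace(X_copy).mean(axis=0)

    del X  # release the memory map before the directory is cleaned up
```

The user never asked for a read-only array here; it is a side effect of how the data reached the function.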

I wouldn't make check_array(copy=False) always trigger a copy when X is read-only, because the semantics of the copy param of check_array are not the same as those of the copy param of estimators. We could introduce a new param in check_array, like copy_if_readonly?

  • Estimator has no copy param (i.e. it doesn't intend to do in-place modification):
    check_array(copy=False, copy_if_readonly=False)
  • Estimator has copy param:
    check_array(copy=self.copy, copy_if_readonly=True)
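
The two cases above could behave roughly as follows. This is a sketch only: `validate` is a hypothetical stand-in for check_array that handles nothing but the copy logic, and `copy_if_readonly` is the parameter proposed here, not an existing one.

```python
import numpy as np


def validate(X, copy=False, copy_if_readonly=False):
    """Hypothetical stand-in for check_array, reduced to the copy logic."""
    X = np.asarray(X)
    if copy or (copy_if_readonly and not X.flags.writeable):
        return X.copy()
    return X


X_readonly = np.ones((2, 2))
X_readonly.setflags(write=False)
X_writeable = np.ones((2, 2))

# Estimator without a copy param: the read-only input passes through untouched.
passthrough = validate(X_readonly, copy=False, copy_if_readonly=False)

# Estimator with copy=False: a copy is made only because the input is read-only.
safe_to_modify = validate(X_readonly, copy=False, copy_if_readonly=True)

# Writeable input with copy=False: no copy, in-place modification is allowed.
same_buffer = validate(X_writeable, copy=False, copy_if_readonly=True)
```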

It could also be a third option for copy in check_array: True, False, "if_readonly":

  • Estimator has no copy param:
    check_array(copy=False)
  • Estimator has copy param:
    check_array(copy=self.copy or "if_readonly")
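
The three-valued variant could be sketched like this. Again, `validate_tristate` is a hypothetical helper and the "if_readonly" value is only the proposal from this issue, not an existing option. Note that `self.copy or "if_readonly"` evaluates to True when self.copy is True and to "if_readonly" when it is False.

```python
import numpy as np


def validate_tristate(X, copy=False):
    """Hypothetical stand-in for check_array with copy in {True, False, "if_readonly"}."""
    X = np.asarray(X)
    if copy is True or (copy == "if_readonly" and not X.flags.writeable):
        return X.copy()
    return X


X_readonly = np.ones((2, 2))
X_readonly.setflags(write=False)
X_writeable = np.ones((2, 2))

# What an estimator with a copy param would pass for each user setting.
resolved_for_copy_true = True or "if_readonly"
resolved_for_copy_false = False or "if_readonly"

copied_readonly = validate_tristate(X_readonly, copy=resolved_for_copy_false)
kept_writeable = validate_tristate(X_writeable, copy=resolved_for_copy_false)
always_copied = validate_tristate(X_writeable, copy=resolved_for_copy_true)
```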
