DOC Rename 'default' to 'native' for set_output SLEP #78

Closed
wants to merge 1 commit into from

Conversation

thomasjpfan
Member

Based on the discussion in scikit-learn/scikit-learn#23734 (review), reviewers concluded that "default" was not descriptive enough. The PR was updated to use "native" instead.

Member

@lorentzenchr lorentzenchr left a comment

+1

Member

@glemaitre glemaitre left a comment

Approving my proposal ;)

@adrinjalali
Member

So "native" means convert to numpy.ndarray or scipy.sparse? I personally find "default" more descriptive in this case. I'm not sure what's native about numpy/scipy from the perspective of a user.

@glemaitre
Member

Just for the context, refer to scikit-learn/scikit-learn#23734 (comment)

Both "default" and "native" are not passepartout names. "Native" would imply that this is the native format given by the estimator that could optionally depend on the inputs. I would feel that "default" should mean a single type of output.

But still, both terms are really generic and are used to define something that is ill-defined (if not reading the estimators' documentation).

Bottom line, I don't have a strong opinion and I don't have a better proposal :)

@adrinjalali
Member

Would "unchanged" be a better name there? As in, we don't try to change the output from whatever's produced by the transformer.

@glemaitre
Member

It was a proposal from @thomasjpfan and I would also be fine with it.

@lorentzenchr
Member

As I already said

I'm happy with any other default value than "default", which I would consider bad design because it does not tell anything about the actual behaviour.

That means I favor a string that gives a hint at what it actually does. The meaning is - please correct me: A scikit-learn transformer outputs numpy.ndarray or scipy.sparse. This does not guarantee anything about 3rd party transformers.

How about "ndarray_or_sparse"? For sure, I'm also OK with "unchanged".

@thomasjpfan
Member Author

thomasjpfan commented Oct 12, 2022

With the Array API PR, scikit-learn can also output cupy.array_api.Array, so the more descriptive name is "array-like_or_sparse".

I do not really like "unchanged" anymore. For example, it leads to this UX:

# By default, `est` returns an ndarray
est.transform(X_df) # this is an ndarray

est.set_output(transform="pandas")
est.transform(X_df) # this is a dataframe

est.set_output(transform="unchanged")
est.transform(X_df) # this is a ndarray??

I originally chose "default" to mean:

  • "what the default transformer outputs"
  • "what a not configured transformer outputs"

Maybe... "not-configured", "original", or "initial"?
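
The distinction drawn above (the option restores what a not-configured transformer outputs, rather than mirroring the input container) can be sketched with a toy class. This is only an illustration under stated assumptions: `ToyDoubler`, the `"table"` option, and the use of plain lists and dicts as stand-ins for ndarrays and dataframes are all hypothetical and are not scikit-learn's implementation.

```python
class ToyDoubler:
    """Hypothetical stand-in for a transformer; not scikit-learn's implementation."""

    _VALID = ("default", "table")  # "table" is a made-up analogue of "pandas"

    def __init__(self):
        # A freshly created transformer is "not configured".
        self._transform_output = "default"

    def set_output(self, transform):
        if transform not in self._VALID:
            raise ValueError(
                f"transform must be one of {self._VALID}, got {transform!r}"
            )
        self._transform_output = transform
        return self

    def transform(self, X):
        # X maps column names to lists of values (a stand-in for a dataframe).
        columns = list(X)
        # The toy's own output is a list of rows (a stand-in for an ndarray).
        rows = [[2 * value for value in row] for row in zip(*X.values())]
        if self._transform_output == "table":
            # Wrap the rows back into a column-keyed container.
            return {name: [row[i] for row in rows] for i, name in enumerate(columns)}
        return rows


X_table = {"a": [1.0, 2.0], "b": [3.0, 4.0]}
est = ToyDoubler()

before = est.transform(X_table)                                      # list of rows
configured = est.set_output(transform="table").transform(X_table)    # dict of columns
reset = est.set_output(transform="default").transform(X_table)       # list of rows again
```

In this sketch `reset` equals `before` even though the input was a column-keyed container, which is the behaviour that "unchanged" would describe misleadingly.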

@lorentzenchr
Member

lorentzenchr commented Oct 12, 2022

Some options for the default value of set_output with the meaning "output what you get out of estimator.transform() without modifying anything". Please add your choice.

| nr | default value | advantage | disadvantage |
|----|---------------|-----------|--------------|
| 1 | "default" | concise, no SLEP change needed | unspecific: default of what? |
| 2 | "native" | concise | unspecific: native to what? |
| 3 | "unchanged" | | transform does change the input, e.g. a df as input may output a ndarray |
| 4 | "not-configured" | | |
| 5 | "original" | | |
| 6 | "initial" | | |
| 7 | "array-like_or_sparse" | very specific | long (maybe wrong for 3rd party estimators) |
| 8 | "estimator_default" | clear meaning | long |
| 9 | None | pythonic default value | not an option as it is used as sentinel value internally |
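
To make the last row concrete, here is a minimal sketch of why a `None` sentinel cannot double as an option value. It assumes that `None` means "leave the current configuration as it is"; `ToyConfig` and its attribute name are hypothetical, not scikit-learn internals.

```python
class ToyConfig:
    """Hypothetical holder of an output setting; not scikit-learn's implementation."""

    def __init__(self):
        self.transform_output = "default"

    def set_output(self, *, transform=None):
        # Assumption: None is a sentinel for "do not touch the current setting".
        if transform is None:
            return self
        self.transform_output = transform
        return self


cfg = ToyConfig().set_output(transform="pandas")

cfg.set_output()                      # None sentinel: configuration is kept
after_noop = cfg.transform_output

cfg.set_output(transform="default")   # an explicit string is needed to reset
after_reset = cfg.transform_output
```

Because `None` already means "keep what is configured", resetting to the not-configured output needs its own explicit value, which is why the thread is choosing a string at all.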

@lorentzenchr
Member

After a good 🏃, I withdraw my concerns for "default". If this option has consensus among the core devs, let's go with it and document well what it means exactly.

@amueller
Member

so... close and leave as-is? I think in the end none of the words easily captures what's going on and folks will have to read the docs

@glemaitre
Member

Agreed with @amueller. Closing then.

@glemaitre glemaitre closed this Oct 17, 2022

6 participants