ENH Add parameter return_X_y to make_classification
#30196
Conversation
@adrinjalali could you review this?
adrinjalali
left a comment
Please have a look at `fetch_openml` on how to document `return_X_y` and the documentation of the return values.
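For reference, existing loaders document this option with a numpydoc pattern along the following lines. This is a paraphrased sketch, not the verbatim `fetch_openml` docstring, and the default value chosen for `make_classification` is decided in the PR itself.

```python
def some_loader(*, return_X_y=False):
    """Load a dataset (sketch of the documentation pattern only).

    Parameters
    ----------
    return_X_y : bool, default=False
        If True, return ``(data, target)`` instead of a Bunch object.

    Returns
    -------
    data : :class:`~sklearn.utils.Bunch`
        Dictionary-like object with attributes such as ``data`` and ``target``.

    (data, target) : tuple if ``return_X_y`` is True
        Tuple of the data matrix and the target values.
    """
```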
doc/whats_new/upcoming_changes/sklearn.datasets/30196.enhancement.rst
Co-authored-by: Adrin Jalali <[email protected]>
OmarManzoor
left a comment
Thanks for the PR @SuccessMoses
I added a few comments.
@OmarManzoor Thanks for the review. I am working on it.
OmarManzoor
left a comment
LGTM. Thanks @SuccessMoses
CC: @adrinjalali
adrinjalali
left a comment
Otherwise LGTM.
The inline comments below refer to this snippet:

```python
            )
        if len(weights) == n_classes - 1:
            if isinstance(weights, list):
                weights = weights + [1.0 - sum(weights)]
```
this isn't modifying the existing variable, it's allocating a new chunk of memory, and the name `weights` refers to that new chunk. So the original data passed to this function is never changed. Therefore this change in this PR is unnecessary.
I did not intend to modify the variable `weights` passed by the user; this is why I created the new variable `weights_`.
You won't be changing the original variable, that's not how Python works 😉
Investigate this example:
```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

def f(a):
    # `a = a + 1` rebinds the local name to a new array; the caller's array is untouched.
    a = a + 1
    return a

print(f(a))  # [ 2  3  4  5  6  7  8  9 10 11]
print(a)     # [ 1  2  3  4  5  6  7  8  9 10]
```
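The same distinction matters for the plain list handled in the diff above: `weights = weights + [...]` rebinds the local name, whereas `+=` or `append` would modify the caller's list in place. A small sketch to illustrate (the function names and example values here are illustrative only):

```python
def rebind(w):
    # Creates a new list and rebinds the local name; the caller's list is untouched.
    w = w + [1.0 - sum(w)]
    return w

def mutate(w):
    # += on a list extends it in place; the caller's list is modified.
    w += [1.0 - sum(w)]
    return w

weights = [0.2, 0.3]
print(rebind(weights), weights)  # [0.2, 0.3, 0.5] [0.2, 0.3]
print(mutate(weights), weights)  # [0.2, 0.3, 0.5] [0.2, 0.3, 0.5]
```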
> this isn't modifying the existing variable, it's allocating a new chunk of memory,
Thank you @adrinjalali for the correction. `make_classification` returns a Bunch object which also contains a dictionary of the original values of parameters like `n_samples`, `n_features` and `weights` that were used to generate the data.
Reassigning the variable `weights` will change what ends up stored in that dictionary. Is there a way to work around this?
I see, I'd missed that
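One way to address the concern about the stored parameters, sketched here as an illustration rather than what the PR necessarily does: copy the user-supplied list before padding it, so the original value can still be recorded unchanged. `_pad_weights` is a hypothetical helper, not part of scikit-learn.

```python
def _pad_weights(weights, n_classes):
    # Hypothetical helper: work on a copy so the user-supplied list,
    # which may also be stored in the returned Bunch, is never modified.
    weights_ = list(weights)
    if len(weights_) == n_classes - 1:
        weights_.append(1.0 - sum(weights_))
    return weights_

user_weights = [0.2, 0.3]
print(_pad_weights(user_weights, 3))  # [0.2, 0.3, 0.5]
print(user_weights)                   # [0.2, 0.3] -- unchanged
```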
Reference Issues/PRs
Fixes #16532
What does this implement/fix? Explain your changes.
Adds a `return_X_y` parameter to `make_classification`.

Any other comments?
The dataset returned by `load_iris` is a Bunch, which is more descriptive. #16532 proposes the same.
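As an illustration of the Bunch-versus-tuple distinction the description refers to, using `load_iris`, which already supports both return styles:

```python
from sklearn.datasets import load_iris

# Default: a Bunch, a dict-like container with self-describing attributes.
iris = load_iris()
print(iris.data.shape, iris.target.shape)  # (150, 4) (150,)
print(iris.feature_names[:2])              # ['sepal length (cm)', 'sepal width (cm)']

# return_X_y=True: just the raw (data, target) arrays.
X, y = load_iris(return_X_y=True)
print(X.shape, y.shape)                    # (150, 4) (150,)
```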