
Conversation

@SuccessMoses (Contributor)

Reference Issues/PRs

Fixes #16532

What does this implement/fix? Explain your changes.

  • Add a return_X_y parameter to make_classification.

Any other comments?

The dataset returned by load_iris is a Bunch, which is more descriptive; #16532 proposes the same for make_classification.

from sklearn.datasets import load_iris

data = load_iris()
print(data.DESCR)  # Prints a description of the Iris dataset
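
For illustration, a rough sketch of how the new parameter could be used once this change is in; the default value and the exact Bunch contents shown here are assumptions, not taken from the merged code:

from sklearn.datasets import make_classification

# Assumption: return_X_y defaults to True, preserving the existing behaviour
# of returning an (X, y) tuple.
X, y = make_classification(n_samples=100, n_features=5)

# Assumption: passing return_X_y=False returns a Bunch instead; the exact
# fields it carries are not specified here, so just inspect its keys.
bunch = make_classification(n_samples=100, n_features=5, return_X_y=False)
print(list(bunch.keys()))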

@github-actions bot commented Nov 2, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: a5754db. Link to the linter CI: here

@SuccessMoses (Contributor Author)

@adrinjalali could you review this?

@adrinjalali (Member) left a comment

Please have a look at fetch_openml on how to document return_X_y and the documentation of the return values.
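
For reference, a sketch of the numpydoc wording this typically takes elsewhere in scikit-learn; the default value and exact phrasing are illustrative assumptions, not quoted from this PR or from fetch_openml:

def make_classification_docstring_sketch(return_X_y=True):
    """Hypothetical docstring fragment; wording and default are assumptions.

    Parameters
    ----------
    return_X_y : bool, default=True
        If True, return ``(X, y)`` instead of a :class:`~sklearn.utils.Bunch`
        object.

    Returns
    -------
    X : ndarray of shape (n_samples, n_features)
        The generated samples.
    y : ndarray of shape (n_samples,)
        The integer labels for class membership of each sample.
    """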

@OmarManzoor (Contributor) left a comment

Thanks for the PR @SuccessMoses.
I added a few comments.

@SuccessMoses (Contributor Author)

@OmarManzoor Thanks for the review. I am working on it.

@OmarManzoor (Contributor) left a comment

LGTM. Thanks @SuccessMoses

@OmarManzoor (Contributor)

CC: @adrinjalali

@adrinjalali (Member) left a comment

Otherwise LGTM.

In sklearn/datasets/_samples_generator.py:

    if len(weights) == n_classes - 1:
        if isinstance(weights, list):
            weights = weights + [1.0 - sum(weights)]
@adrinjalali (Member)

This isn't modifying the existing variable, it's allocating a new chunk of memory, and the name weights refers to that new chunk. So the original data passed to this function is never changed. Therefore this change in this PR is unnecessary.

@SuccessMoses (Contributor Author)

I did not intend to modify the variable weights passed by the user; this is why I created a new variable weights_.

@adrinjalali (Member)

You won't be changing the original variable, that's not how Python works 😉

Investigate this example:

import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

def f(a):
    a = a + 1
    return a

print(f(a))
print(a)
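
(Aside, added for context rather than part of the original comment: the contrast is with an in-place operation, which does mutate the object the caller passed in.)

import numpy as np

a = np.array([1, 2, 3])

def g(a):
    a += 1      # in-place addition mutates the very array the caller passed
    return a

print(g(a))  # [2 3 4]
print(a)     # [2 3 4] -- this time the caller's array was changed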

@SuccessMoses (Contributor Author) commented Jan 2, 2025

> this isn't modifying the existing variable, it's allocating a new chunk of memory,

Thank you @adrinjalali for the correction. make_classification returns a Bunch object which also contains a dictionary of the original values of parameters such as n_samples, n_features, and weights that were used to generate the data.

https://github.com/SuccessMoses/scikit-learn/blob/test/sklearn/datasets/_samples_generator.py#L356-L383

Reassigning the variable weights would change the value that ends up stored in that dictionary. Is there a way to work around this?
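
For illustration, one way around this is to record the user-supplied value (or work on a separate local name) before any rebinding; the helper and variable names below are made up for this sketch and are not the code merged in this PR:

def _resolve_weights(weights, n_classes):
    # Keep a reference to the caller's value so it can be reported back
    # unchanged, e.g. in the parameters dictionary of the returned Bunch.
    original = weights

    # Do all adjustments on a separate local name / copy.
    if weights is None:
        weights_ = [1.0 / n_classes] * n_classes
    else:
        weights_ = list(weights)
        if len(weights_) == n_classes - 1:
            weights_ = weights_ + [1.0 - sum(weights_)]
    return original, weights_

original, resolved = _resolve_weights([0.2, 0.3], n_classes=3)
print(original)  # [0.2, 0.3] -- unchanged, safe to store in the Bunch
print(resolved)  # [0.2, 0.3, 0.5] -- used internally to generate the data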

@adrinjalali (Member)

I see, I'd missed that

@adrinjalali adrinjalali merged commit c9aeb15 into scikit-learn:main Jan 3, 2025
30 checks passed


Development

Successfully merging this pull request may close these issues.

make_classification (samples_generator.py) explains itself

3 participants