
Conversation


@ghost ghost commented Sep 28, 2020

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Fixes #18347

Any other comments?

Member

@rth rth left a comment


Thanks @Ansur!

Also please add an entry to the change log at doc/whats_new/v0.24.rst. Like the other entries there, please reference this pull request with :pr: and credit yourself with :user:.
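For reference, an entry in `doc/whats_new/v0.24.rst` generally follows the sketch below; the wording is illustrative, while the PR number and username come from this thread:

```rst
- |Fix| :class:`linear_model.LogisticRegression` no longer modifies the
  ``sample_weight`` array passed by the user when ``class_weight`` is set.
  :pr:`18480` by :user:`Ansur`.
```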

                                     multi_class=multi_class)
            clf.fit(X, y, sample_weight=W)
            actual = W.sum()
            assert expected == actual, 'Sum of weight before ({}) should be the same as sum if weight after ({})'.format(expected, actual)
Member


We need to keep lines shorter than 80 characters, and also avoid exact float comparisons:

Suggested change
            assert expected == actual, 'Sum of weight before ({}) should be the same as sum if weight after ({})'.format(expected, actual)
            msg = (
                f'Sum of weight before ({expected}) should be the same as '
                f'sum of weight after ({actual})'
            )
            assert_allclose(expected, actual, err_msg=msg)

Author

@ghost ghost left a comment


Added the whats_new entry in PR #18480.

Member

@thomasjpfan thomasjpfan left a comment


Thank you for the PR @Ansur !

Comment on lines +1870 to +1883
@pytest.mark.parametrize("multi_class", {'ovr', 'multinomial', 'auto'})
def test_sample_weight_not_modified(multi_class):
    X, y = load_iris(return_X_y=True)
    np.random.seed(1234)
    W = np.random.random(len(X)) * 10.0

    for weight in [{0: 1.0, 1: 10.0, 2: 1.0}]:
        for class_weight in (weight, 'balanced'):
            expected = W.sum()

            clf = LogisticRegression(random_state=0,
                                     class_weight=class_weight,
                                     max_iter=200,
                                     multi_class=multi_class)
Member


We can take advantage of pytest.mark.parametrize:

@pytest.mark.parametrize("multi_class", {'ovr', 'multinomial', 'auto'})
@pytest.mark.parametrize("class_weight", [
    {0: 1.0, 1: 10.0, 2: 1.0}, 'balanced'
])
def test_sample_weight_not_modified(multi_class, class_weight):
    X, y = load_iris(return_X_y=True)
    n_samples = len(X)
    W = np.ones(n_samples)
    W[:n_samples // 2] = 2

    expected = W.sum()
    ...

        class_weight_ = compute_class_weight(class_weight,
                                             classes=classes, y=y)
        sample_weight *= class_weight_[le.fit_transform(y)]
        sample_weight = sample_weight * class_weight_[le.fit_transform(y)]
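For context on the change above: NumPy's in-place `*=` writes into the very array object the caller passed in, while `a = a * b` allocates a new array and rebinds the local name, leaving the caller's array untouched. A standalone sketch (the helper names are illustrative, not scikit-learn code):

```python
import numpy as np

def scale_inplace(sample_weight, factor):
    sample_weight *= factor  # writes into the caller's buffer
    return sample_weight

def scale_with_copy(sample_weight, factor):
    sample_weight = sample_weight * factor  # new array; caller unaffected
    return sample_weight

w = np.ones(4)
scale_inplace(w, 2.0)
print(w)    # [2. 2. 2. 2.] -- the user's array was modified

w = np.ones(4)
out = scale_with_copy(w, 2.0)
print(w)    # [1. 1. 1. 1.] -- original preserved
print(out)  # [2. 2. 2. 2.]
```

This is exactly the bug the diff fixes: the old `*=` silently scaled the user's `sample_weight` by the class weights.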
Member


@rth instead of doing this, would it be better to have copy=False/True within _check_sample_weight signature?
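The idea floated here could look like the sketch below: a `_check_sample_weight`-style helper that takes a `copy` flag and returns a defensive copy, so later in-place scaling cannot reach the user's array. This is a hypothetical sketch of the proposal, not scikit-learn's actual API:

```python
import numpy as np

def check_sample_weight(sample_weight, X, copy=False):
    # Hypothetical ``copy`` flag on a _check_sample_weight-style helper
    # (illustrative only; not the real scikit-learn signature).
    if sample_weight is None:
        sample_weight = np.ones(X.shape[0], dtype=np.float64)
    else:
        sample_weight = np.asarray(sample_weight, dtype=np.float64)
        if copy:
            # Defensive copy: later in-place scaling (e.g. ``*=``) can no
            # longer mutate the array the caller passed in.
            sample_weight = sample_weight.copy()
    return sample_weight

X = np.zeros((3, 2))
w = np.array([1.0, 2.0, 3.0])
w_checked = check_sample_weight(w, X, copy=True)
w_checked *= 10.0
print(w)  # caller's array untouched: [1. 2. 3.]
```

Note that `np.asarray` on an array that is already float64 returns the same object, so without `copy=True` the in-place `*=` would still propagate back to the caller.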

@adrinjalali adrinjalali deleted the branch scikit-learn:master January 22, 2021 10:54

Successfully merging this pull request may close these issues.

Use of class_weight dictionary in LogisticRegression modifies sample_weight object
5 participants