Conversation

@veni-vidi-vici-dormivi (Collaborator) commented May 2, 2024

Changing the comparison to eps in _yeo_johnson_transform to be consistent with sklearn's Yeo-Johnson power transform.
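A minimal sketch of the aligned check (the variable names are illustrative and the exact before/after operators are assumptions, since the diff itself is not shown here):

import numpy as np

eps = np.finfo(np.float64).eps

# illustrative lambda values; the middle one sits exactly at eps
lambdas = np.array([0.0, eps, 1.0])

# strict "<" as in sklearn: a lambda of exactly eps counts as nonzero
zero_mask = np.abs(lambdas) < eps
print(zero_mask)  # [ True False False]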

@codecov (bot) commented May 2, 2024

Codecov Report

Attention: Patch coverage is 0% with 2 lines in your changes missing coverage. Please review.

Project coverage is 81.94%. Comparing base (9b0b76b) to head (0bfe836).
Report is 79 commits behind head on main.

Files Patch % Lines
mesmer/mesmer_m/power_transformer.py 0.00% 2 Missing ⚠️
@@            Coverage Diff             @@
##             main     #436      +/-   ##
==========================================
- Coverage   87.90%   81.94%   -5.96%     
==========================================
  Files          40       44       +4     
  Lines        1745     1939     +194     
==========================================
+ Hits         1534     1589      +55     
- Misses        211      350     +139     
Flag Coverage Δ
unittests 81.94% <0.00%> (-5.96%) ⬇️


@mathause (Member) commented May 3, 2024

Can you define

eps = np.spacing(np.float64(1))

or actually, if we don't do it for an array

eps = np.finfo(np.float64).eps

? That would be clearer to me (not sure if here or in another PR, though).
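For reference, the two suggested definitions give the same number, since the spacing around 1.0 is exactly the float64 machine epsilon:

import numpy as np

# both give the distance from 1.0 to the next representable float64,
# i.e. the machine epsilon
print(np.spacing(np.float64(1)))  # 2.220446049250313e-16
print(np.finfo(np.float64).eps)   # 2.220446049250313e-16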

@veni-vidi-vici-dormivi (Collaborator, Author) commented

I think another PR would make sense.

@veni-vidi-vici-dormivi (Collaborator, Author) commented

Let's merge this then?

@mathause (Member) commented May 6, 2024

Can you merge main - I need to think about this.

@veni-vidi-vici-dormivi (Collaborator, Author) commented May 6, 2024

The question is whether, when lambda is exactly eps, we consider that to be zero or not. I just wanted to make it consistent with the sklearn function. However, I think they are not consistent themselves, because they write `if abs(lmbda) < np.spacing(1.0)` for $\lambda = 0$ but then `if not abs(lmbda - 2) > np.spacing(1.0)` for $\lambda = 2$...

# when x >= 0
if abs(lmbda) < np.spacing(1.0):
    out[pos] = np.log1p(x[pos])
else:  # lmbda != 0
    out[pos] = (np.power(x[pos] + 1, lmbda) - 1) / lmbda

# when x < 0
if abs(lmbda - 2) > np.spacing(1.0):
    out[~pos] = -(np.power(-x[~pos] + 1, 2 - lmbda) - 1) / (2 - lmbda)
else:  # lmbda == 2
    out[~pos] = -np.log1p(-x[~pos])
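At the exact boundary the two checks disagree: for a lambda exactly eps away from the special value, the strict < near 0 takes the generic power branch, while the not ... > near 2 takes the special log branch. A minimal illustration:

import numpy as np

eps = np.spacing(1.0)
d = eps  # distance of lambda from the special value (0 or 2)

print(d < eps)        # False -> near 0, this lambda is treated as nonzero
print(not (d > eps))  # True  -> near 2, the same distance is treated as == 2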

Maybe that's the reason why Shruti wrote it herself? Because for the inverse transform she actually just uses the one from sklearn...

inverted_monthly_T[j, i] = self._yeo_johnson_inverse_transform(

@mathause (Member) commented May 6, 2024

Shruti wrote this herself so she could have variable (or dependent) $\lambda$ values.

I think the >= / > mess comes about because the original author also missed that $\lambda$ can be any real value and does not have to be in 0..2 (see also #430 (comment)). The value problem was fixed but this particular inconsistency remained; see scikit-learn/scikit-learn#12522. (Assuming 0 <= lambda <= 2, the operators make sense.)

@mathause (Member) commented May 6, 2024

scipy does the same but I think it's written by the same author: https://github.com/scipy/scipy/blob/7dcd8c59933524986923cde8e9126f5fc2e6b30b/scipy/stats/_morestats.py#L1572

@veni-vidi-vici-dormivi (Collaborator, Author) commented May 6, 2024

> Shruti wrote this herself so she could have variable (or dependent) $\lambda$ values.

Hm, I mean we could also pass every (value, lambda) pair to the sklearn power transform, no? Like we do for the inverse transform.

> $\lambda$ can be any real value and does not have to be in 0..2

In our case it is because lambda is derived from a logistic function.
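(A sketch of such a bounded lambda function, a logistic in the yearly temperature; the coefficient names are illustrative:)

import numpy as np

def lambda_function(xi_0, xi_1, yearly_T):
    # logistic in yearly_T: for xi_0 > 0 the output stays in (0, 2)
    return 2 / (1 + xi_0 * np.exp(xi_1 * yearly_T))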

@veni-vidi-vici-dormivi (Collaborator, Author) commented

I don't get how it makes sense that in one case $\lambda$ == eps is not included in the special case and in the other it is?

@mathause (Member) commented May 6, 2024

>> Shruti wrote this herself so she could have variable (or dependent) $\lambda$ values.
>
> Hm, I mean we could also pass every (value, lambda) pair to the sklearn power transform, no? Like we do for the inverse transform.

Yes, but then we have to check whether this is vectorized; otherwise it will be too slow.
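(A rough sketch of what that check could look like, with illustrative data; scipy's public yeojohnson stands in here for sklearn's private helper, which likewise takes a scalar lambda:)

import numpy as np
from scipy.stats import yeojohnson

rng = np.random.default_rng(0)
values = rng.normal(size=1000)
lambdas = rng.uniform(0.05, 1.95, size=1000)  # stay away from the special values 0 and 2

# per-pair loop: one Python-level call per (value, lambda) pair -> slow
out_loop = np.array(
    [yeojohnson(np.array([v]), lmbda=l)[0] for v, l in zip(values, lambdas)]
)

# vectorized over element-wise lambdas (only the generic branches are
# needed because the lambdas above avoid 0 and 2)
pos = values >= 0
out_vec = np.empty_like(values)
out_vec[pos] = ((values[pos] + 1) ** lambdas[pos] - 1) / lambdas[pos]
out_vec[~pos] = -((-values[~pos] + 1) ** (2 - lambdas[~pos]) - 1) / (2 - lambdas[~pos])

np.testing.assert_allclose(out_loop, out_vec)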

>> $\lambda$ can be any real value and does not have to be in 0..2
>
> In our case it is because lambda is derived from a logistic function.

Ah ok, sorry - it's too many open PRs and comments. But then this bound is a property of the covariate function, and it's not optimal if this is in

def _get_yeo_johnson_lambdas(self, yearly_T):

(Technically the user will not be able to easily replace the lambda_function, but it's misleading when the bounds are associated with the Yeo-Johnson transform and not with the covariate function. And it's almost impossible to find out whether the lambda_function should ever be changed...)

@veni-vidi-vici-dormivi (Collaborator, Author) commented

I agree, it is pretty hard to see through it all. The _get_yeo_johnson_lambdas function is pretty horrible. I think we could get rid of it when rewriting the whole thing for xarrays. Maybe we should think about passing the lambda function as an argument...
But in this PR I actually just wanted to fix the operator to be the same as in sklearn. Am I missing something?

@mathause (Member) commented May 6, 2024

> But in this PR I actually just wanted to fix the operator to be the same as in sklearn. Am I missing something?

No - I was trying to understand why it's inconsistent in scikit-learn (and I think I do now).

@mathause (Member) left a comment

Ok, some more comments.

  • I don't think what we choose will matter in practice.
  • eps is much smaller for 0 than for 1:
    np.spacing([0, 1, 2])
    array([4.94065646e-324, 2.22044605e-016, 4.44089210e-016])
  • That means there are values between 0 and eps (otherwise np.abs(lambdas) < eps would be equal to np.abs(lambdas) == 0); see the check below.
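A quick check of the last point (for example, 1e-300 lies strictly between 0 and eps):

import numpy as np

eps = np.spacing(1.0)
lam = 1e-300  # strictly between 0 and eps

print(np.abs(lam) < eps)  # True  -> the eps check treats this lambda as 0
print(np.abs(lam) == 0)   # False -> the equality check would not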

Ok, let's merge this but I think we need to fix the title (we don't change a sign). Maybe: "align equality check in yeo johnson transform"?

@veni-vidi-vici-dormivi veni-vidi-vici-dormivi changed the title Fix sign in yeo johnson transform align equality check in yeo johnson transform May 15, 2024
@veni-vidi-vici-dormivi veni-vidi-vici-dormivi merged commit c1c0cf8 into MESMER-group:main May 17, 2024
@veni-vidi-vici-dormivi veni-vidi-vici-dormivi deleted the pt_fix branch May 17, 2024 09:09
veni-vidi-vici-dormivi added a commit to veni-vidi-vici-dormivi/mesmer that referenced this pull request May 23, 2024
* align equality check in yeo johnson transform with scikit learn yeo-johnson transform

* add NOTE

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>