Conversation

@veni-vidi-vici-dormivi (Collaborator) commented May 2, 2024

Changing the comparison to eps in _yeo_johnson_transform to be consistent with sklearn's Yeo-Johnson power transform.
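A minimal sketch of the aligned check (the variable names are illustrative and the exact before/after operators are assumptions, since the diff itself is not shown here):

import numpy as np

eps = np.finfo(np.float64).eps

# illustrative lambda values; the middle one sits exactly at eps
lambdas = np.array([0.0, eps, 1.0])

# strict "<" as in sklearn: a lambda of exactly eps counts as nonzero
zero_mask = np.abs(lambdas) < eps
print(zero_mask)  # [ True False False]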

@codecov (bot) commented May 2, 2024

Codecov Report

Attention: Patch coverage is 0% with 2 lines in your changes missing coverage. Please review.

Project coverage is 81.94%. Comparing base (9b0b76b) to head (0bfe836).
Report is 79 commits behind head on main.

Files Patch % Lines
mesmer/mesmer_m/power_transformer.py 0.00% 2 Missing ⚠️
@@            Coverage Diff             @@
##             main     #436      +/-   ##
==========================================
- Coverage   87.90%   81.94%   -5.96%     
==========================================
  Files          40       44       +4     
  Lines        1745     1939     +194     
==========================================
+ Hits         1534     1589      +55     
- Misses        211      350     +139     
Flag Coverage Δ
unittests 81.94% <0.00%> (-5.96%) ⬇️


@mathause (Member) commented May 3, 2024

Can you define

eps = np.spacing(np.float64(1))

or actually, if we don't do it for an array

eps = np.finfo(np.float64).eps

? That would be clearer to me (not sure if here or in another PR, though).
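For reference, the two suggested definitions give the same number, since the spacing around 1.0 is exactly the float64 machine epsilon:

import numpy as np

# both give the distance from 1.0 to the next representable float64,
# i.e. the machine epsilon
print(np.spacing(np.float64(1)))  # 2.220446049250313e-16
print(np.finfo(np.float64).eps)   # 2.220446049250313e-16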

@veni-vidi-vici-dormivi (Collaborator, Author) commented

I think another PR would make sense.

@veni-vidi-vici-dormivi (Collaborator, Author) commented

Let's merge this then?

@mathause (Member) commented May 6, 2024

Can you merge main - I need to think about this.

@veni-vidi-vici-dormivi (Collaborator, Author) commented May 6, 2024

The question is whether, when lambda is exactly eps, we consider that to be zero or not. I just wanted to make it consistent with the sklearn function. However, I think they are not consistent themselves, because they write `if abs(lmbda) < np.spacing(1.0)` for $\lambda = 0$ but then `if not abs(lmbda - 2) > np.spacing(1.0)` for $\lambda = 2$...

# when x >= 0
if abs(lmbda) < np.spacing(1.0):
    out[pos] = np.log1p(x[pos])
else:  # lmbda != 0
    out[pos] = (np.power(x[pos] + 1, lmbda) - 1) / lmbda

# when x < 0
if abs(lmbda - 2) > np.spacing(1.0):
    out[~pos] = -(np.power(-x[~pos] + 1, 2 - lmbda) - 1) / (2 - lmbda)
else:  # lmbda == 2
    out[~pos] = -np.log1p(-x[~pos])
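At the exact boundary the two checks disagree: for a lambda exactly eps away from the special value, the strict < near 0 takes the generic power branch, while the not ... > near 2 takes the special log branch. A minimal illustration:

import numpy as np

eps = np.spacing(1.0)
d = eps  # distance of lambda from the special value (0 or 2)

print(d < eps)        # False -> near 0, this lambda is treated as nonzero
print(not (d > eps))  # True  -> near 2, the same distance is treated as == 2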

Maybe that's the reason why Shruti wrote it herself? Because for the inverse transform she actually just uses the one from sklearn...

inverted_monthly_T[j, i] = self._yeo_johnson_inverse_transform(

@mathause (Member) commented May 6, 2024

Shruti wrote this herself so she could have variable (or dependent) $\lambda$ values.

I think the >= / > mess comes about because the original author also missed that $\lambda$ can be any real value and does not have to be in 0..2 (see also #430 (comment)). The value problem was fixed but this particular inconsistency remained; see scikit-learn/scikit-learn#12522. (Assuming 0 <= lambda <= 2, the operators make sense.)

@mathause (Member) commented May 6, 2024

scipy does the same but I think it's written by the same author: https://github.com/scipy/scipy/blob/7dcd8c59933524986923cde8e9126f5fc2e6b30b/scipy/stats/_morestats.py#L1572

@veni-vidi-vici-dormivi (Collaborator, Author) commented May 6, 2024

> Shruti wrote this herself so she could have variable (or dependent) $\lambda$ values.

Hm, I mean we could also pass every (value, lambda) pair to the sklearn power transform, no? Like we do for the inverse transform.

> $\lambda$ can be any real value and does not have to be in 0..2

In our case it is because lambda is derived from a logistic function.
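(A sketch of such a bounded lambda function, a logistic in the yearly temperature; the coefficient names are illustrative:)

import numpy as np

def lambda_function(xi_0, xi_1, yearly_T):
    # logistic in yearly_T: for xi_0 > 0 the output stays in (0, 2)
    return 2 / (1 + xi_0 * np.exp(xi_1 * yearly_T))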

@veni-vidi-vici-dormivi (Collaborator, Author) commented

I don't get how it makes sense that in one case $\lambda$ == eps is not included in the special case and in the other it is?

@mathause (Member) commented May 6, 2024

>> Shruti wrote this herself so she could have variable (or dependent) $\lambda$ values.
>
> Hm, I mean we could also pass every (value, lambda) pair to the sklearn power transform, no? Like we do for the inverse transform.

Yes, but then we have to check whether this is vectorized; otherwise it will be too slow.
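(A rough sketch of what that check could look like, with illustrative data; scipy's public yeojohnson stands in here for sklearn's private helper, which likewise takes a scalar lambda:)

import numpy as np
from scipy.stats import yeojohnson

rng = np.random.default_rng(0)
values = rng.normal(size=1000)
lambdas = rng.uniform(0.05, 1.95, size=1000)  # stay away from the special values 0 and 2

# per-pair loop: one Python-level call per (value, lambda) pair -> slow
out_loop = np.array(
    [yeojohnson(np.array([v]), lmbda=l)[0] for v, l in zip(values, lambdas)]
)

# vectorized over element-wise lambdas (only the generic branches are
# needed because the lambdas above avoid 0 and 2)
pos = values >= 0
out_vec = np.empty_like(values)
out_vec[pos] = ((values[pos] + 1) ** lambdas[pos] - 1) / lambdas[pos]
out_vec[~pos] = -((-values[~pos] + 1) ** (2 - lambdas[~pos]) - 1) / (2 - lambdas[~pos])

np.testing.assert_allclose(out_loop, out_vec)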

>> $\lambda$ can be any real value and does not have to be in 0..2
>
> In our case it is because lambda is derived from a logistic function.

Ah ok, sorry - it's too many open PRs and comments. But then this bound is a property of the covariate function, and it's not optimal if this is in

def _get_yeo_johnson_lambdas(self, yearly_T):

(Technically the user will not be able to easily replace the lambda_function, but it's misleading when the bounds are associated with the Yeo-Johnson transform and not with the covariate function. And it's almost impossible to find out whether the lambda_function should ever be changed...)

@veni-vidi-vici-dormivi (Collaborator, Author) commented

I agree, it is pretty hard to see through it all. The _get_yeo_johnson_lambdas function is pretty horrible. I think we could get rid of it when rewriting the whole thing for xarrays. Maybe we should think about passing the lambda function as an argument...
But in this PR I actually just wanted to fix the operator to be the same as in sklearn. Am I missing something?

@mathause (Member) commented May 6, 2024

> But in this PR I actually just wanted to fix the operator to be the same as in sklearn. Am I missing something?

No - I was trying to understand why it's inconsistent in scikit-learn (and I think I do now).

@mathause (Member) left a comment

Ok, some more comments.

  • I don't think what we choose will matter in practice.
  • eps is much smaller for 0 than for 1:
    np.spacing([0, 1, 2])
    array([4.94065646e-324, 2.22044605e-016, 4.44089210e-016])
  • That means there are values between 0 and eps (otherwise np.abs(lambdas) < eps would be equal to np.abs(lambdas) == 0); see the check below.
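A quick check of the last point (for example, 1e-300 lies strictly between 0 and eps):

import numpy as np

eps = np.spacing(1.0)
lam = 1e-300  # strictly between 0 and eps

print(np.abs(lam) < eps)  # True  -> the eps check treats this lambda as 0
print(np.abs(lam) == 0)   # False -> the equality check would not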

Ok, let's merge this but I think we need to fix the title (we don't change a sign). Maybe: "align equality check in yeo johnson transform"?

@veni-vidi-vici-dormivi veni-vidi-vici-dormivi changed the title Fix sign in yeo johnson transform align equality check in yeo johnson transform May 15, 2024
@veni-vidi-vici-dormivi veni-vidi-vici-dormivi merged commit c1c0cf8 into MESMER-group:main May 17, 2024
@veni-vidi-vici-dormivi veni-vidi-vici-dormivi deleted the pt_fix branch May 17, 2024 09:09
veni-vidi-vici-dormivi added a commit to veni-vidi-vici-dormivi/mesmer that referenced this pull request May 23, 2024
* align equality check in yeo johnson transform with scikit learn yeo-johnson transform

* add NOTE

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>