Conversation

@lorentzenchr (Contributor) commented on Nov 2, 2023

This PR adds a second convergence check, based on gradients, in the line search. It handles the case of tiny loss improvements.

This is taken from https://github.com/scikit-learn/scikit-learn/blob/9621539a9defe86ff55c890d5f2475f42697604f/sklearn/linear_model/_glm/_newton_solver.py#L263.

This might help with some of the tests in #723.

Checklist

  • Added a CHANGELOG.rst entry

@MarcAntoineSchmidtQC (Member) left a comment

I'm good with this change. It does not have any negative performance impact on our current benchmarks, and the logic seems sound. The new conditional branch is rarely used in our current golden master tests, but it is used quite often in the test_glm.py tests that @lorentzenchr provided.

Two small requests before merging:

  • Is it possible to provide a source for the logic behind the tiny loss branch?
  • Can you add an entry to the changelog in the "Unreleased" section?

@lorentzenchr (Contributor, Author) commented

Is it possible to provide a source for the logic behind the tiny loss branch?

I developed this in scikit-learn/scikit-learn#24637 while trying to satisfy my own (very strict) tests, so I can't point to any literature on it.
The basic insight is as follows (a small sketch follows the list):

  • In general, the first-order condition, i.e. the score equation, is a better criterion for checking convergence than loss improvements or step sizes (changes in the coefficients), because we know the absolute scale: it should be zero.
  • Empirically, if we are already close to the minimum and are seeking a high-precision solution, the floating-point precision of the loss may not be sufficient even though the gradient has not yet reached, say, 1e-12 or lower. In such a setting the gradient therefore carries more information.
  • The line search may then reject a (Newton) step that brings the gradient closer to zero but provides no improvement of the loss (and no worsening either, up to machine precision).
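To make this concrete, here is a minimal, self-contained Python sketch of the acceptance rule. The names (loss, grad, line_search) and the least-squares objective are hypothetical placeholders, not glum's or scikit-learn's actual code; the real check in _newton_solver.py is more involved, but the idea is the same: if the loss improvement drowns in floating-point noise, fall back to comparing the l1-norm of the gradient.

```python
# Illustrative sketch only: function names and the objective are made up for
# this example and are not the library's API.
import numpy as np

def loss(w, X, y):
    # Least-squares loss, used only to make the example concrete.
    r = X @ w - y
    return 0.5 * r @ r / len(y)

def grad(w, X, y):
    return X.T @ (X @ w - y) / len(y)

def line_search(w, d, X, y, beta=0.5, max_halvings=30):
    """Backtracking line search with a second, gradient-based acceptance check.

    A step is accepted if it decreases the loss by more than machine-precision
    noise, OR if the loss change is lost in that noise but the l1-norm of the
    gradient still shrinks (the 'tiny loss improvement' case).
    """
    eps = np.finfo(w.dtype).eps
    l_old = loss(w, X, y)
    sum_abs_grad_old = np.linalg.norm(grad(w, X, y), ord=1)
    t = 1.0
    for _ in range(max_halvings):
        w_new = w + t * d
        improvement = l_old - loss(w_new, X, y)
        if improvement > eps * abs(l_old):
            return w_new  # normal acceptance: a real loss decrease
        if abs(improvement) <= eps * abs(l_old):
            # Loss change is within floating-point noise: fall back to the
            # first-order condition and accept if the gradient moves toward zero.
            sum_abs_grad = np.linalg.norm(grad(w_new, X, y), ord=1)
            if sum_abs_grad < sum_abs_grad_old:
                return w_new
        t *= beta  # otherwise shrink the step and retry
    return w  # give up; the caller should check convergence separately

# Tiny usage example on a well-conditioned least-squares problem.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.standard_normal(50)
w = np.zeros(3)
d = -grad(w, X, y)  # plain gradient step direction, just for illustration
w = line_search(w, d, X, y)
print(loss(w, X, y))
```

Without the gradient fallback, such a step would be rejected and the solver could stall with a "line search failed" style error even though the iterate is still improving in the first-order sense.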

@MarcAntoineSchmidtQC (Member) commented

Thanks @lorentzenchr!

@MarcAntoineSchmidtQC MarcAntoineSchmidtQC merged commit a304d55 into Quantco:main Nov 8, 2023
@lorentzenchr lorentzenchr deleted the linear_search branch November 10, 2023 10:44