SAGE: Early stopping and convergence checks #29
Merged
The API for convergence tracking is not great yet, but it works, kind of. The early stopping bit still needs some improvement because I found it quite hard to get it to stop, but in general I guess it works.
This now works by taking the number of permutations and creating a checkpoint after some number of permutations (default 5), after which convergence is checked and the current state of the SAGE values is written to a history table.
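To make the checkpoint mechanism concrete, here is a minimal sketch of the loop described above. This is illustrative Python, not the package's (R) implementation: the function names, the standard-error-based convergence criterion, and the `se_threshold` / `check_interval` parameters are all assumptions chosen for the sketch.

```python
import statistics

def sage_with_checkpoints(features, sample_marginal_contribs,
                          n_permutations=100, check_interval=5,
                          se_threshold=0.01):
    """Sketch: accumulate per-permutation marginal contributions,
    checkpoint every `check_interval` permutations by recording the
    running means in a history table, and stop early once the standard
    error of every feature's estimate falls below `se_threshold`.
    `sample_marginal_contribs` stands in for one permutation pass."""
    history = []                       # one row of running means per checkpoint
    contribs = {f: [] for f in features}
    means = {}
    for p in range(1, n_permutations + 1):
        for f, value in sample_marginal_contribs(features).items():
            contribs[f].append(value)
        if p % check_interval == 0:
            means = {f: statistics.fmean(v) for f, v in contribs.items()}
            history.append({"permutation": p, **means})
            # One possible convergence check: standard error of the mean
            # per feature; stop when all features look stable.
            ses = [statistics.stdev(v) / len(v) ** 0.5
                   for v in contribs.values()]
            if max(ses) < se_threshold:
                break
    return means, history
```

The history table then gives one row per checkpoint, which is what the convergence plots below are drawn from.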
I realize this is now somewhat orthogonal to the `$scoretable` in the other methods' implementations, where we also keep per-resampling, per-permutation importance scores, but in the case of SAGE I have so far only ever used holdout due to the general slowness of things, and the number of permutations here has a different meaning than the number of permutation iterations in PFI, for example. It would be nice to unify the API and semantics a bit.
As for SAGE convergence, I wondered whether `max_reference_size` has a large impact on convergence, but it looks like it's not huge. Here are `friedman1`'s first two features (both should have the same contribution, since they only enter through an interaction effect) for `max_reference_size` $\in \{100, 500\}$ after 100 permutations each (quite a lot). In either case I'd argue that 50 permutations seem fine.
Here's the same for the most important feature (including an intermediate reference size):
Here it looks like a few more permutations wouldn't have hurt, but I'm not sure what to conclude about the reference size here.
I should also note that computing SAGE for this task with `max_reference_size = 300` and `n_permutations = 100` took 7 hours or so using the ranger learner with 5 threads. It's just... a lot of stuff to do. Maybe I need to parallelize batchwise predictions after all?