Copying my Discord comment:
Alright, I've set up the optimization loop (using a Bayesian optimizer) to optimize the two powers used in the load balancer's weight formula:
`(1 / np.power(r, due_power)) * (1 / np.power(delta_t, interval_power))`

`due_power` and `interval_power` are the parameters to be fine-tuned. The range for both is from 0.5 (square root) to 3 (cube).
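For concreteness, here's that weight as a small function. The comment above doesn't spell out what `r` and `delta_t` are, so the reading in the docstring (cards already due on a candidate day, and that day's distance from the unfuzzed interval) is my assumption:

```python
import numpy as np

def balance_weight(r, delta_t, due_power=2.0, interval_power=1.0):
    """Weight of one candidate day in the load balancer.

    Assumed meanings (not stated above): r = number of cards already
    due on the candidate day, delta_t = distance of the candidate
    interval from the unfuzzed one. Both powers are searched over
    [0.5, 3.0]; the current implementation uses 2 and 1.
    """
    return (1 / np.power(r, due_power)) * (1 / np.power(delta_t, interval_power))
```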
We have two optimization objectives here: `avg_abs_ret_diff`, the average absolute difference between true retention and desired retention; and `volatility`, a measure of how much the workload varies day-by-day. Example: if you had 120 due cards today and 100 due cards yesterday, volatility = 20%. We want to minimize both; a sketch of the volatility calculation is below.
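A minimal sketch, assuming the example means the mean relative day-over-day change in due counts:

```python
import numpy as np

def volatility(due_counts):
    """Mean relative day-over-day change in workload.

    Reading of the example: 100 due yesterday and 120 due today is a
    |120 - 100| / 100 = 20% change; volatility averages these daily
    changes over the whole simulation.
    """
    due = np.asarray(due_counts, dtype=float)
    changes = np.abs(np.diff(due)) / np.maximum(due[:-1], 1)  # guard against div-by-zero
    return changes.mean()

print(volatility([100, 120]))  # 0.2, matching the example
```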
However, when minimizing two different objectives, you often run into a situation where you cannot make one better without making the other worse; the set of such trade-offs is called a Pareto frontier. So instead of getting one set of parameters as a result, we get a whole collection of Pareto-optimal (can't-improve-A-without-making-B-worse) parameter sets, which can be extracted as sketched below.
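A sketch of that extraction for our two minimized objectives:

```python
import numpy as np

def pareto_front(points):
    """Keep only non-dominated (avg_abs_ret_diff, volatility) pairs.

    A point is dominated if some other point is <= in both objectives
    and strictly < in at least one (both objectives are minimized).
    """
    pts = np.asarray(points, dtype=float)
    keep = [
        i for i, p in enumerate(pts)
        if not np.any(np.all(pts <= p, axis=1) & np.any(pts < p, axis=1))
    ]
    return pts[keep]

front = pareto_front([[0.88, 0.115], [1.16, 0.115], [1.06, 0.170]])
# -> [[0.88, 0.115]] : the other two points are dominated by it
```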
**Simulation parameters:**

- `maximumInterval` = 36500
- `new_cards_limits` = 10
- `review_limits` = 9999
- `max_time_limits` = 10000 (IIRC this is in seconds)
- `learn_days` = 100
- `deck_size` = 1000
- `sample_size` = 5
- `retentions` = [0.7, 0.8, 0.85, 0.9, 0.95, 0.97, 0.99]
For each value of desired retention, the simulation runs `sample_size` times, for a total of 7 × 5 = 35 simulations per set of parameters. This is then done 100 times for different parameter sets. The same seeds are reused across all retentions, for the sake of consistency; a sketch of that loop follows.
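Here, `run_simulation` is a hypothetical stand-in for the actual scheduler simulator:

```python
import numpy as np

def run_simulation(desired_retention, due_power, interval_power, seed):
    """Hypothetical stand-in: the real simulator returns the measured
    avg_abs_ret_diff and volatility for one seeded run."""
    rng = np.random.default_rng(seed)
    return abs(rng.normal(0.01, 0.002)), abs(rng.normal(0.12, 0.02))

retentions = [0.7, 0.8, 0.85, 0.9, 0.95, 0.97, 0.99]
sample_size = 5
seeds = range(sample_size)  # the same seeds are reused for every retention

def evaluate(due_power, interval_power):
    diffs, vols = [], []
    for desired_retention in retentions:
        for seed in seeds:
            d, v = run_simulation(desired_retention, due_power, interval_power, seed)
            diffs.append(d)
            vols.append(v)
    # 7 retentions * 5 seeds = 35 simulations per parameter set
    return np.mean(diffs), np.mean(vols)
```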
Here are baseline averages and their 95% confidence intervals:

- **Fuzz (no LB):** avg_abs_ret_diff = 1.06% ± 0.18%, volatility = 0.170 ± 0.026
- **Current double-weighted LB (due_power = 2, interval_power = 1):** avg_abs_ret_diff = 1.16% ± 0.16%, volatility = 0.115 ± 0.014
- **Current double-weighted LB, as predicted by the Bayesian model:** avg_abs_ret_diff = 1.06%, volatility = 0.117
The former is a "raw" experimental result; the latter is what the Bayesian model predicts after it has processed all 100 experimental results. And by "experimental" I mean "simulated".
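The comment doesn't name the optimizer library, but any GP-based Bayesian model produces such a prediction roughly as in this sketch: fit a surrogate on the observed (due_power, interval_power) → objective pairs, then query it at the point of interest (the training data below is a placeholder):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)
X = rng.uniform(0.5, 3.0, size=(100, 2))        # 100 evaluated (due_power, interval_power) pairs
y = np.abs(rng.normal(0.011, 0.002, size=100))  # placeholder avg_abs_ret_diff values

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
gp.fit(X, y)

# Smoothed prediction at the current implementation's powers:
mean, std = gp.predict([[2.0, 1.0]], return_std=True)
```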
Notice how much volatility is reduced compared to random fuzz!
I wanted to add a table with Pareto-optimal values, but decided that it's better as a graph.
The "utopia point" is a hypothetical point where both objectives are minimized. It's not actually obtainable. What we can obtain instead instead is a "knee point" - a point closest to the utopia point.
So what's the best course of action given all of this? I suppose we can modify the powers. Using due_power = 2.150 and interval_power = 3.000 (the knee point) would give us 0.88% abs. diff. in retention and 0.115 volatility, which is better than the current implementation. I doubt anyone would notice the difference, but I doubt even more that it would make anything worse, so I guess why not.