Ebisu v3 request for comment (RFC)
Introduction
This issue contains a proposal for a future v3 release of Ebisu, with the goal of inviting feedback to improve the design before it is released.
V2 recap
A quick recap of the current version of Ebisu: v2 models each flashcard's probability of recall as a probability distribution that is valid at a certain time in the future. (Stats nerd note: when you first learn a card, its probability of recall t hours in the future is a Beta distribution with parameters a and b.) At any given time in the future (not just t), you can call ebisu.predictRecall to get the estimated recall probability for this flashcard. Doing this for each flashcard lets you pick which flashcards have the lowest recall probability, and you present one of those to the learner. Then, you call ebisu.updateRecall with the quiz's result (binary pass/fail, binomial passes and fails, noisy pass/fail), which updates the Beta distribution in light of the quiz result.
V3's goal
The main goal of v3 is to address the issue that various folks have raised over the years but that I finally understood thanks to @cyphar in #43 (which has links to the original Reddit thread): Ebisu v2 has a fundamental flaw in how it models your memory, in that it ignores the fact that, in the real world, the act of reviewing a flashcard actually changes the underlying strength of the memory. Instead, Ebisu v2 assumed that your memory of a flashcard was a fixed but unknown probability, and updated its belief about that probability's distribution after each quiz. This made it appear to be strengthening or weakening the memory model, but when we tried to infer the ideal initial halflife from real-world flashcard histories, Ebisu v2 insisted on absurdly high estimates of initial halflife.
In practice, this means that the actual recall probabilities predicted by Ebisu v2 were extremely pessimistic, and flashcard apps that attempted to schedule reviews based on a minimum acceptable recall probability had to set those thresholds unrealistically low. I didn't realize this was a problem because flashcard apps I wrote on top of Ebisu just used its predicted recall to rank flashcards, and ignored their actual values; and since the predicted recall went up or down as I passed or failed quizzes, I didn't notice that the halflife was growing very slowly. Again, thanks to @cyphar and others who patiently explained this to me repeatedly until I finally saw the problem 🙏!
Boost
Ebisu v3 is a total rewrite. It introduces the concept of a "boost" to Ebisu, which is very similar to Anki's "ease factor": each time you review a card, Ebisu will apply a Bayesian update based on the quiz's result, but then for each successful quiz it will boost the resulting probability distribution by a flashcard-specific scalar.
For example, suppose the halflife of a flashcard was a day, you pass a quiz after two days, and Ebisu v2 would simply update the halflife to four days. The Ebisu v3 model for this flashcard would have a probability distribution not only for the halflife (one day) but also for the boost, so if the mean boost value for this flashcard was 1.5, under Ebisu v3, updateRecall would yield a model with halflife 4 * 1.5, or six days. (I'm being very hand-wavy here for purposes of illustration.)
An important part of boosting is that it is applied only for successful quizzes after a big enough delay. This should make sense: imagine you reviewed that flashcard with a one-day halflife after just ten minutes, and Ebisu v2 would update its halflife to, say, 1.01 days. We wouldn't want Ebisu v3 to boost the halflife by 1.5, leaving us with a halflife of 1.01 * 1.5 days! We'd want v3 to keep the halflife around 1.01 days, because it's unlikely that a quiz after ten minutes would significantly alter our neural memory of this fact.
Ebisu v3 therefore has the following nominal pseudocode:
```python
def updateRecallBoost(oldHl: float, elapsedTime: float, quizResult: bool,
                      boost: float, LEFT: float, RIGHT: float) -> float:
  updatedHl = updateRecallSimple(oldHl, elapsedTime, quizResult)
  if quizResult:
    b = lerp([LEFT * oldHl, RIGHT * oldHl], [1, boost], elapsedTime)
    # clamp so that 1.0 <= b <= boost
    if b < 1.0:
      b = 1.0
    elif b > boost:
      b = boost
  else:
    b = 1.0
  boostedHl = b * updatedHl
  return boostedHl


def lerp(xs: list[float], ys: list[float], x: float) -> float:
  mu = (x - xs[0]) / (xs[1] - xs[0])
  return ys[0] * (1 - mu) + ys[1] * mu
```
where `lerp([x1, x2], [y1, y2], x)` is linear interpolation of x along the line between points (x1, y1) and (x2, y2). In words:
- calculate the new no-boost Bayesian-updated halflife based on a quiz result and quiz elapsed time,
- calculate the boost factor:
  - 1.0 for quiz failures and elapsed times short relative to the old halflife,
  - the full boost factor for elapsed times that are long relative to the old halflife, and
  - between 1.0 and the full boost for intermediate elapsed times, linearly interpolating,
- multiply the two values to get your final new halflife.
LEFT and RIGHT are constants that you pick to control what counts as "short" vs "long" relative to the old halflife. In practice, I've been experimenting with LEFT=0.3 and RIGHT=1.0 and these work well.
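To make the interpolation concrete, here's a tiny worked example using the lerp from the pseudocode above, with made-up numbers: a card with a 24-hour halflife, a mean boost of 1.5, and LEFT=0.3, RIGHT=1.0, so the ramp runs from 7.2 to 24 hours of elapsed time.

```python
# made-up values: oldHl = 24 hours, boost = 1.5, LEFT = 0.3, RIGHT = 1.0
lerp([0.3 * 24, 1.0 * 24], [1, 1.5], 15.6)  # -> 1.25, halfway up the ramp
# elapsedTime <= 7.2 hours clamps to b = 1.0 (no boost);
# elapsedTime >= 24 hours clamps to b = 1.5 (the full boost)
```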
This RFC
With two things now to keep track of for each flashcard (its halflife and its boost), the update step is more complicated in v3 than in v2: because of the computational burden, it is broken into two functions that app authors will need to call. However, in v3 the predict step (predictRecall) is a lot faster than in v2 and in fact can be done in SQL. Hopefully the more complex mathematics and API are worth it for more realistic estimates.
In the remainder of this issue, I'll
- briefly sketch out the new probability model (since GitHub doesn't render LaTeX, there will be only very basic equations), but more importantly,
- specify the functions in the API and how they work in considerable detail.
My main goal is to get feedback on the second point above from app authors who use Ebisu. Does the proposed API feel good? Are there flaws in it? Are there other nice-to-haves that I haven't thought of? Do you want things renamed?
V3 statistical model
This section can be skipped by folks who don't care about the math. For those that do, this section will only be a sketch because I can't use LaTeX math in a GitHub issue. The details will be spelled out on the main Ebisu website when v3 is released, but I'm of course happy to answer questions about this now during the RFC phase as well.
Before getting to v3's statistical model, recall that v2's statistical model is, for each flashcard:
(recall probability in t hours in the future) ~ Beta(a, b),
that is, a Beta random variable parameterized by a and b.
In v3, we instead go a level higher and place the prior probability distribution on the halflife directly:
halflife ~ Gamma(a, b).
We also place a probability distribution on boost:
boost ~ Gamma(aBoost, bBoost).
That is, we have two Gamma-distributed random variables.
Predict
The predicted recall probability at time t hours in the future is a very simple arithmetic expression:
logPredictRecall ∝ -t / E[halflife]
predictRecall = exp(logPredictRecall)
The symbol ∝ means "proportional to", so there are some extra factors I'm omitting here that aren't important to the explanation but are needed to line up the units of `t` and `halflife`, and to convert `exp`'s base to 2 (so that `predictRecall` for `t = halflife` is `0.5 = 2**-1` instead of `exp(-1)`).
Note that the predict-recall step is really simple. To find the flashcard with the lowest predicted recall, find the flashcard with the lowest -t / halflife, which can be done in SQL, cached, etc.
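As an illustration (not the actual API), that ranking in Python might look like this, where lastReviewTime and meanHalflife are hypothetical attributes standing in for whatever your app stores:

```python
def lowestRecallCard(cards, now):
  # log-recall ∝ -elapsed / halflife: minimizing this picks the card most
  # at risk of being forgotten, with no need to exponentiate just to rank
  return min(cards, key=lambda c: -(now - c.lastReviewTime) / c.meanHalflife)
```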
In the README for Ebisu v2, I invoke Jensen's inequality to explain why this is technically inaccurate: because `E[f(x)] ≠ f(E[x])`, the exact expected probability of recall `E[2**(-t / halflife)]` will be different from the above, which is `2**(-t / E[halflife])`. However, I've decided for v3 that this inexactness is a good exchange for the computational simplicity and boost-based modeling.
Update step 1: update halflife
When we have a quiz result, after time t has elapsed since the last review, we can apply a Bayesian update on halflife:
- original prior: halflife ~ Gamma(a, b)
- exact posterior: halflife | quiz after t hours ~ SomeComplicatedDistribution (we have moments of this, not the exact density)
- approximated posterior: halflife | quiz after t hours ~ Gamma(a2, b2)
So far this is the exact same architecture as Ebisu v2, except with prior on halflife instead of recall probability (and therefore different integrals).
However, at this stage we apply the boost:
- given a mean posterior halflife = `E[halflife | quiz after t hours]`,
- compute a deterministic value `boostValue = clampLerp([LEFT * E[prior halflife], RIGHT * E[prior halflife]], [1, E[boost]], t)`, where `clampLerp` is the linear interpolation clamped so `1 <= boostValue <= E[boost]`, before finally
- scaling the posterior to yield the final boosted halflife: `boostValue * posterior ~ Gamma(a2, boostValue * b2)`.
That is, we've updated our probability distribution on the halflife in response to a quiz and applied a boost (a little bit of boost for small t, the full boost value for large t).
In this way, we're really maintaining a probability distribution on the initial halflife (before the first quiz), and after each successive quiz we update our belief about that initial halflife. The halflife after several quizzes is a random variable that's fully specified by the initial halflife random variable after it's been scaled by a sequence of these clampLerped boost values.
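As a quick sketch of that scaling property, assuming the shape/scale parameterization of the Gamma (which matches the Gamma(a2, boostValue * b2) notation above; under a rate parameterization you'd divide the rate instead):

```python
from scipy.stats import gamma

def boostedPosterior(a2: float, b2: float, boostValue: float):
  # scaling a Gamma(shape=a2, scale=b2) random variable by the constant
  # boostValue yields Gamma(shape=a2, scale=boostValue * b2)
  return gamma(a=a2, scale=boostValue * b2)
```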
However, this leaves open the question: how do we estimate boost? How do we update our probability distribution from boost ~ Gamma(aBoost, bBoost)?
Update step 2: update halflife and boost
This section is the primary innovation of Ebisu v3 over v2, mathematically. After we have three or more quizzes (though you could do this after just two quizzes), we can update our probability distributions about both (a) the initial halflife and (b) the boost.
To do this, we use two simple techniques in sequence.
First, curve fit. We evaluate the posterior (initial halflife, boost | several quizzes) on a grid of values in the initial halflife × boost plane and curve-fit this two-dimensional posterior surface to two independent Gamma random variables. This is readily doable because we assume each quiz is independent, so the overall posterior is simply the product of the individual likelihoods and priors after each quiz.
We curve fit because, while we have a simple expression for the posterior (up to a scalar constant), we can't analytically evaluate this bivariate distribution's moments. (In Ebisu v2, we had a univariate posterior on recall probability with thankfully tractable integrals yielding analytical moments.)
The curve fit is just weighted least-squares, i.e., a very fast solution to an overdetermined system of equations. It is essentially solving A x = b for a tall and skinny matrix A, and where the unknowns x are the four parameters of the two Gamma random variables (initial halflife and boost).
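Here's a minimal sketch of such a fit (illustrative, not Ebisu v3's actual internals). The unnormalized log-density of two independent Gammas, (aH - 1) * log(h) - rateH * h + (aB - 1) * log(b) - rateB * b + constant, is linear in the unknowns, so the fit is one weighted lstsq call:

```python
import numpy as np

def fitBivariateGamma(hGrid, bGrid, logPosterior, weights):
  # logPosterior and weights are 2D arrays evaluated on the same grid
  H, B = np.meshgrid(hGrid, bGrid, indexing='ij')
  A = np.stack([np.log(H.ravel()), -H.ravel(),
                np.log(B.ravel()), -B.ravel(),
                np.ones(H.size)], axis=1)
  w = weights.ravel()
  sol, *_ = np.linalg.lstsq(A * w[:, None], logPosterior.ravel() * w, rcond=None)
  # unknowns: [aH - 1, rateH, aB - 1, rateB, constant]
  return (sol[0] + 1, sol[1]), (sol[2] + 1, sol[3])  # two (shape, rate) pairs
```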
However, the resulting curve fit often fails to properly capture the behavior of the posterior in its tails, so badly that prediction performance drops if we just use these two Gammas as our updated probability distributions.
Therefore, second, we use importance sampling, a Monte Carlo method for obtaining moments of a probability distribution whose density we can evaluate analytically (up to a constant), given samples from some other distribution (the latter is called the "proposal distribution").
Specifically, we use importance sampling to get accurate estimates of the moments of the bivariate posterior, given samples from the curve-fit bivariate Gamma distribution we estimated. With relatively few samples (on the order of ~1000), we can get good estimates of the posterior's moments and moment-match these to two independent Gammas, on initial halflife and boost respectively.
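In code, that moment estimation might look like the following sketch (self-normalized importance sampling; the names are illustrative, and logPosterior is assumed vectorized and known only up to an additive constant):

```python
import numpy as np
from scipy.stats import gamma

def posteriorMoments(logPosterior, aH, rateH, aB, rateB, size=1000):
  # propose from the curve-fit Gammas on initial halflife and boost
  hs = gamma.rvs(aH, scale=1 / rateH, size=size)
  bs = gamma.rvs(aB, scale=1 / rateB, size=size)
  logw = (logPosterior(hs, bs) - gamma.logpdf(hs, aH, scale=1 / rateH) -
          gamma.logpdf(bs, aB, scale=1 / rateB))
  w = np.exp(logw - logw.max())  # subtract max for numerical stability
  w /= w.sum()  # self-normalizing cancels the unknown constant
  meanH, meanB = w @ hs, w @ bs
  return (meanH, w @ (hs - meanH)**2), (meanB, w @ (bs - meanB)**2)
```

Each (mean, variance) pair can then be moment-matched to a Gamma.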
Therefore, this update step allows us to update our estimate of the boost factor in a data-driven way, unlike Anki's ease factor, which is hardcoded. Given a sequence of quiz results, we can answer the questions, "What was the halflife of this fact when I began learning it? What boost factor has been strengthening it each time I review it?"
N.B. We need both these steps, and in this order. The curve fit alone doesn't give us accurate enough parameter estimates but it gives a great proposal distribution for use with importance sampling. Without that, importance sampling using the original priors on initial halflife and boost would need many more Monte Carlo samples and would not be computationally tractable. (I tried 🥲.)
N.B.2. While I've described this two-stage bivariate update step as relatively efficient (linear system solver, importance sampling with ~thousands of samples), it is much more expensive than the plain univariate Bayesian update described above, which used the mean boost `E[boost]` everywhere. Therefore, the API breaks these up into two separate phases, with two `update` functions in the API: you can run the simpler univariate halflife-only update after each quiz and then, maybe once a day or once a week, run the heavier-weight bivariate halflife-and-boost updater to refresh your estimate of `boost` for each flashcard.
We turn to the API section next. Again, the above mathematics has been terse because I only have ASCII to describe it and I'm hoping primarily to get feedback on the API, but I hope it is sufficiently detailed to give a flavor of the computational burdens involved.
Tutorial on importance sampling
Here's a super quick script to show you the value of importance sampling, plus how it's used.
Suppose you want to estimate the mean of the square of the unit uniform distribution, i.e., if u ~ Unif(0, 1), what is E[u**2]?, and you can't do calculus so you want to do Monte Carlo. Super-easy:
```python
import numpy as np
from scipy.stats import norm as normrv
from scipy.stats import uniform as uniformrv

nsamples = 1_000_000

u = np.random.rand(nsamples)
direct = np.mean(u**2)
directvar = np.var(u**2)
```
You generate a million samples uniformly between 0 and 1, square them, and call np.mean to get your estimate of the mean. Then you can call np.var to get the variance, i.e., accuracy, of that estimate: the lower the variance, the more accurate your estimate, and the more efficient your sampling is (those million samples are doing a good job).
This is not importance sampling, it's just normal Monte Carlo sampling.
Suppose though that you for some reason cannot sample the uniform distribution. Maybe you only have a library to generate Normally-distributed (Gaussian) random samples, so you have randn but no rand. Could you estimate E[u**2] with just randn? Yes. Importance sampling lets you:
```python
n = np.random.randn(nsamples)
indirect = np.mean(n**2 * uniformrv.pdf(n) / normrv.pdf(n))
indirectvar = np.var(n**2 * uniformrv.pdf(n) / normrv.pdf(n))
```
Here we generate a million samples from the unit Normal distribution, n ~ Normal(0, 1), and again square them, but before calling np.mean to get our estimate, we weight each sample by its importance. That's what the crucial uniformrv.pdf(n) / normrv.pdf(n) factor is doing: it gives more weight to samples that are likely to have come from our target distribution (Unif(0, 1)) and less weight to samples that aren't.
This will actually work: your indirect estimate of E[u**2] will be correct but with (much) higher variance than when you could sample from the Uniform distribution directly.
But we know the unit Normal distribution is an obviously bad proposal distribution for the unit Uniform distribution—over half the samples will just be thrown away, since negative samples get an importance factor of 0. We can try one more experiment: let's use Normal(0.5, σ=0.5) as our proposal. This should be a good deal more accurate than the unit Normal:
```python
n2 = normrv.rvs(loc=0.5, scale=0.5, size=nsamples)
indirect2 = np.mean(n2**2 * uniformrv.pdf(n2) / normrv.pdf(x=n2, loc=0.5, scale=0.5))
indirect2var = np.var(n2**2 * uniformrv.pdf(n2) / normrv.pdf(x=n2, loc=0.5, scale=0.5))
```
Printing out the estimated mean and the variance (inaccuracy) of each estimate:
print("| method | estimated mean | estimator variance |")
print('|--------|----------------|--------------------|')
print(f'| direct | {direct:0.4f} | {directvar:0.4f} |')
print(f'| crappy proposal | {indirect:0.4f} | {indirectvar:0.4f} |')
print(f'| better proposal | {indirect2:0.4f} | {indirect2var:0.4f} |')yields
| method | estimated mean | estimator variance |
|---|---|---|
| direct | 0.3328 | 0.0888 |
| crappy proposal | 0.3340 | 0.6122 |
| better proposal | 0.3332 | 0.2182 |
As expected, the estimates in the middle column all agree: 1/3. However, whereas the straightforward Monte Carlo estimator had a low variance of 0.09, which is the best you're going to get given you had access to rand, the crappy importance sampling estimator using just randn (the unit Normal) as its proposal had a much higher variance, almost 7× worse. But by using Normal(0.5, σ=0.5) as the proposal, we reduce the inaccuracy of the estimator, whose variance is just ~2.5× the direct estimator's.
So, of course ideally you'd be able to do the integral to find E[u**2] (zero estimator variance 🙌!), and failing that, hopefully you can use rand to draw uniform samples and calculate the estimate directly. But failing both, importance sampling gives you a way to convert samples from some rando distribution into moment estimates of the distribution you do care about.
(Obviously the proposal distribution can't be too rando, so for example you couldn't switch the role of the Uniform and Normal above: a unit Uniform proposal totally misses out on huge chunks of the target unit Normal's support. There are rules about what makes an allowable proposal distribution, and about what's the best proposal distribution (the target distribution 🙃) that textbooks bore us with, but the rules are quite sensible.)
Hopefully the above convinces you that there's nothing magical about using importance sampling to improve our posterior fits. The curve-fit posterior is like the "better proposal" above. The "crappy proposal" for importance sampling might be the original priors on halflife and boost (except after a dozen quizzes, the variance of the estimates would be enormous without billions and trillions of samples).
V3 API
Model
Ebisu v2 had a very compact data model. Each flashcard was just a 3-tuple, [float, float, float]: two numbers representing the Beta random variable's parameters and the elapsed time at which that Beta applied. It didn't keep track of past quizzes because that 3-tuple was a "sufficient statistic": as far as the math was concerned, everything about the past was encoded in that 3-tuple.
Because the Ebisu v3 algorithm described above needs to know about past quiz times and results to re-estimate boost, the Ebisu v3 data model is much bigger, and now includes past quizzes.
See Python `dataclass`es for Ebisu v3 model
While the key-value structure of the data model is probably an implementation detail and not relevant for the majority of readers of this RFC, for completeness I include it here, with the caveat that it may change.
The top-level model has three sections:
```python
from dataclasses import dataclass

@dataclass
class Model:
  quiz: Quiz
  prob: Probability
  pred: Predict
```
One section for a list of quiz results and times:
```python
@dataclass
class Quiz:
  elapseds: list[list[float]]

  # same length as `elapseds`, and each sub-list has the same length
  results: list[list[Result]]

  # 0 < x <= 1 (reinforcement). Same length/sub-lengths as `elapseds`
  startStrengths: list[list[float]]
```
A second section for the parameters of the probability distributions:
```python
@dataclass
class Probability:
  # priors: fixed at model creation time
  initHlPrior: tuple[float, float]  # alpha and beta
  boostPrior: tuple[float, float]  # alpha and beta

  # posteriors: these change after quizzes
  initHl: tuple[float, float]  # alpha and beta
  boost: tuple[float, float]  # alpha and beta
```
And a final section for computed values that are useful for predictRecall (whether that's done in Python or in SQL or whatever):
```python
@dataclass
class Predict:
  # just for developer ease, these can be stored in SQL, etc.
  # log-recall is proportional to `logStrength - (startTime * CONSTANT) / halflife`
  # where `CONSTANT` converts `startTime` to same units as `halflife`.
  startTime: float  # unix epoch
  currentHalflife: float  # mean (so _currentHalflifePrior works). Same units as `elapseds`
  logStrength: float
```
For logStrength and startStrengths, see the discussion below on reinforcement strength.
Initialize
```python
def initModel(initHlPrior: Union[tuple[float, float], None] = None,
              boostPrior: Union[tuple[float, float], None] = None,
              initHlMean: Union[float, None] = None,
              initHlStd: Union[float, None] = None,
              boostMean: Union[float, None] = None,
              boostStd: Union[float, None] = None) -> Model:
  pass
```
You're expected to provide either initHlPrior (a 2-tuple [α, β] of the Gamma random variable) or both initHlMean and initHlStd, the mean and standard deviation of the Gamma random variable representing the initial halflife.
Similarly for boost.
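For example, with made-up numbers and hours as the time unit:

```python
# a card you guess has a 24 ± 12 hour initial halflife and a boost of 1.5 ± 0.5
model = initModel(initHlMean=24.0, initHlStd=12.0, boostMean=1.5, boostStd=0.5)
```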
Predict
There are actually two functions here:
```python
def predictRecall(model: Model, elapsedHours=None, logDomain=True) -> float:
  pass


def _predictRecallBayesian(model: Model, elapsedHours=None, logDomain=True) -> float:
  pass
```
The official predictRecall returns 2**(-t / (mean halflife)), which is an approximation of the recall probability at time t in the future. It's mathematically inaccurate but very fast and convenient: find the mean of the halflife, then put it through Ebbinghaus' exponential forgetting function. I expect this function to be portable to SQL, etc., because it's just a fast algebraic expression (addition, multiplication, division).
For fun, I plan to include _predictRecallBayesian which computes the exact expected recall probability (like Ebisu v2 does). The leading _ means I don't expect this function to be in ports to other languages. This function computes an expensive Bessel function of the second kind, the analytical solution to the integral that yields the exact recall probability.
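A hypothetical usage sketch, picking the next card to quiz from a list of models (ranking works the same in the log domain, so there's no need to exponentiate):

```python
nextToQuiz = min(models, key=lambda m: predictRecall(m, logDomain=True))
```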
Update halflife
I've decided to reuse the updateRecall name in Ebisu v3 for the "quick" update step that assumes the boost is fixed and just applies a quiz result:
```python
def updateRecall(
    model: Model,
    elapsed: float,
    successes: Union[float, int],
    total: int = 1,
    now: Union[None, float] = None,
    q0: Union[None, float] = None,
    reinforcement: float = 1.0,
    left=0.3,
    right=1.0,
) -> Model:
  pass
```
Note that, like Ebisu v2, the above function supports noisy quizzes and binomial/binary quizzes. Also note two new v3-only parameters: left and right, which control how much boost to apply. Nominally, if elapsed < left * (current halflife), no boost is applied. If elapsed > right * (current halflife), the full boost is applied (whatever it may be; recall it has a probability distribution that you initialized or that was computed from data). For values of elapsed in between, the boost is scaled linearly.
This function needs to be called whenever the student completes a quiz. But as alluded to above, it only updates the halflife and takes the boost as fixed.
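For example, with hypothetical values (hours as the unit of elapsed time):

```python
# the student passed a binary quiz 36 hours after the last review
model = updateRecall(model, elapsed=36.0, successes=1, total=1)
# or: passed 2 of 3 binomial attempts
model = updateRecall(model, elapsed=36.0, successes=2, total=3)
```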
For more on what `reinforcement` means, see the discussion below on reinforcement strength.
Update halflife and boost
The "non-quick" update function will update its belief about both the halflife and the boost.
```python
def updateRecallHistory(
    model: Model,
    left=0.3,
    right=1.0,
    size=10_000,
) -> Model:
  pass
```
If we had unlimited computing power, we'd run updateRecall and then immediately run updateRecallHistory so we always had the most accurate estimates. However, updateRecallHistory is an expensive function, much more so than the old v2 updateRecall, because it loops over all quizzes. Specifically, as described in the math section above, it's
- evaluating probabilities on a two-dimensional grid of halflife and boost for each quiz, then
- solving a linear system of equations, before finally
- an importance sampling step (the number of Monte Carlo samples is governed by the `size` argument).
All this should be totally feasible on mobile/browser, but I do expect apps will opt to run this as a batch function, for example,
- run this daily for flashcards that were reviewed today, or
- run this every 2–4 quizzes.
If you initialized the boost prior reasonably, there shouldn't be a pressing need to rerun this function after each quiz.
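A hypothetical nightly batch job might look like:

```python
# refresh halflife-and-boost estimates for cards quizzed today
for card in cardsReviewedToday:  # however your app tracks these
  card.model = updateRecallHistory(card.model, size=10_000)
```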
In the future, some smart person might enlighten me how we can make this estimation step better (or how we can change the entire model to have a better estimation step), but for now, I feel that this is a reasonable balance between model accuracy, prediction speed, and update complexity.
Reset halflife
Ebisu v2 supports manually rescaling the halflife (rescaleHalflife in v2) of a model, for those situations where you just know the model has over- or underestimated your memory. We'd like to support this in v3 as well.
In Ebisu v3, since a flashcard's data model has all past quiz results, you may think that it'd be straightforward to just tweak the initial halflife's priors and re-run updateRecallHistory, but after you get several quizzes, the initial priors are often relatively powerless, with the posteriors almost entirely driven by the data (the quizzes). This is a well-known outcome in Bayesian statistics.
Therefore, the proposed function is called "reset" because it will literally start a new chapter in your history with this flashcard with a new prior on halflife:
```python
def resetHalflife(
    model: Model,
    initHlMean: float,
    initHlStd: float,
    startTime: Union[float, None] = None,
    strength: float = 1.0,
) -> Model:
  pass
```
The model will still contain all past quizzes, but this function sets those aside and reinitializes a new list of elapsed times and quizzes, so all future quizzes get the initial halflife you ask for here.
This way, if after several quizzes, you need to rescale/reset the halflife of a flashcard, this lets you do that. No hard feelings.
Ideally someday we'll be able to analyze each flashcard's history of quizzes and automatically detect when there's a "new halflife", i.e., when your memory for this fact got much better or much worse. Sort of like the example in Bayesian methods for hackers chapter 1, "Inferring behaviour from text-message data".
That's it. That's the API.
Reinforcement strength
Throughout the discussion above, I've alluded to startStrength and logStrength and reinforcement. This is to support partial reconsolidation or reinforcement, raised in #51 by @jasonsparc. The idea is, Ebbinghaus' exponential decay assumes that, after each review, your memory of some fact jumps to 100%: probability of recall 2**(-t / halflife) = 1.0 when t=0 (right after you've reviewed this fact, probability 1 = 100% recall probability). @jasonsparc's idea is: what if this isn't true? What if you know that the quiz only partially reconsolidated the memory, i.e., that at t=0.0001 (a millisecond after your last quiz), the probability of recall is not .9999 but something much less, maybe 0.5, maybe 0.1?
Ebisu v3 tentatively supports this by allowing you to specify how much reconsolidation you think has happened every time you call updateRecall. This number is taken as deterministic and fixed, just like the quiz result or elapsed time, and it flows through the update process as if it was known. We'll want to experiment with this to make sure it's working. For users who aren't interested in playing with this feature, the API defaults to sane values.
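Tentatively, that might look like this (the 0.5 is a made-up value):

```python
# a passive review you believe only half-reconsolidated the memory
model = updateRecall(model, elapsed=24.0, successes=1, reinforcement=0.5)
```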
Conclusion
The code and tests are written. You can find them (along with a ton of irrelevant code I wrote while getting to this point) at https://github.com/fasiha/ebisu-likelihood-analysis/, specifically ebisu3.py and test_ebisu3.py.
I've prepared a script that can
- load flashcard review history from Anki's `collection.anki2` files (if you export your Anki deck to an `apkg` file and unzip that, this `collection.anki2` file, which is a SQLite database file, will be inside), and
- run a couple of cards through both Ebisu v3 and a Stan model.
The Stan model will likely mostly be useful for those interested in changing the model or putting hyperpriors on left and right and other parameters, or verifying Ebisu v3's correctness (there's a difference in the output between these two that I'm working on tracking down).
But mainly I'm hoping that interested parties provide feedback on the API above. And of course any other feedback (the mathematical model for example) is also most welcome.
Many thanks for your patience with v3. This is several months late and I am grateful for the support and feedback I've received so far 🙇.