Update README.md of ops delta_rule #595
base: main
Conversation
Fix(math): Correct derivation for chunkwise DeltaNet parallelism

Corrects the entire mathematical derivation for the chunkwise parallel form. The previous version used an incorrect left-multiplication order for the Householder product series, which has now been fixed. All dependent formulas for the state update and output have been updated accordingly.
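For orientation, unrolling the corrected right-applied recurrence over the first $r$ steps of a chunk gives the form sketched below (written in the README's notation; the products expand left to right in increasing index order, so each new Householder factor multiplies the state from the right):

```math
\mathbf{S}^r = \mathbf{S}^0 \prod_{i=1}^{r}\left(\mathbf{I} - \beta^i \mathbf{k}^i \mathbf{k}^{i\top}\right) + \sum_{i=1}^{r} \beta^i \mathbf{v}^i \mathbf{k}^{i\top} \prod_{j=i+1}^{r}\left(\mathbf{I} - \beta^j \mathbf{k}^j \mathbf{k}^{j\top}\right)
```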
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough
The README for `fla/ops/delta_rule` is updated (a documentation-only math reformulation; see the sequence diagram below).
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor Reader
    participant README as fla/ops/delta_rule/README.md
    Note over README: Documentation-only math reformulation
    Reader->>README: Read recurrence S_t = S_{t-1}(I - β_t k_t k_t^T) + β_t v_t k_t^T
    Reader->>README: Follow derivation S^r = S^0 P^r + H^r
    Note right of README: P = I - W^T K, H = U^T K
    Reader->>README: See UT-transform: W = T K, U = T V
    Note over README: Final formulas expressed for efficient matrix multiplies (T, W, U, outputs)
```
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Summary of Changes
Hello @SeepingFragranceLock, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request rectifies a significant mathematical error in the README.md documentation related to the chunkwise DeltaNet parallelism. The fix involves correcting the order of matrix multiplication in the Householder product series, which subsequently required updating all related formulas for state updates and outputs. This ensures the documentation provides an accurate and reliable mathematical foundation for the DeltaNet implementation.
Highlights
- Mathematical Derivation Correction: The core mathematical derivation for the chunkwise parallel form of DeltaNet has been corrected, specifically addressing an incorrect left-multiplication order in the Householder product series.
- Updated Formulas: All dependent formulas for the state update and output, including the inductive proofs for P^r and H^r, have been revised to align with the corrected derivation.
- Matrix Form Adjustments: The matrix representations of P, H, W, and U have been updated to reflect the accurate mathematical model.
Code Review
This pull request provides a crucial correction to the mathematical derivation for the chunkwise parallel form of DeltaNet in the README. The main change is fixing the incorrect left-multiplication order of the Householder product series, which has been consistently applied to all dependent formulas. This significantly improves the correctness and clarity of the documentation. I have added a couple of minor suggestions to fix typos in the LaTeX for better rendering and accuracy.
fla/ops/delta_rule/README.md
Outdated
```latex
&= \mathbf{S}^0_{[t]} + \left(\mathbf{U}^\top -\mathbf{S}^0_{[t]} \mathbf{W}^\top\right) \mathbf{K} \\
&= \mathbf{S}^0_{[t]} + \left(\mathbf{V}^\top - \mathbf{S}^0_{[t]} \mathbf{K}^\top\right) \mathbf{T}^\top \mathbf{K} \;\;\in\mathbb{R}^{d_v \times d_k} \\
\mathbf{O}_{[t]} &= \mathbf{Q}_{[t]}\mathbf{S}_{[t+1]}^\top \\ &= \mathbf{Q}_{[t]} \left(\mathbf{S}^0_{[t]}\right)^\top + \left(\mathbf{Q}_{[t]} \mathbf{K}^{\top} \odot \mathbf{M}\right) \left(\mathbf{U} - \mathbf{W} \left(\mathbf{S}^0_{[t]}\right)^\top \right) \\
&= \mathbf{Q}_{[t]} \left(\mathbf{S}^0_{[t]}\right)^\top + \left(\mathbf{Q}_{[t]} \mathbf{K}^{\top} \odot \mathbf{M}\right)\mathbf{T} \left(\mathbf{V} - \mathbf{K}(\mathbf{S}^0_{[t]})^\top)\right) \;\;\in \mathbb{R}^{C \times d_v}
```
There appears to be a typo with an extra closing parenthesis in the formula for `O_{[t]}`. The term `\mathbf{K}(\mathbf{S}^0_{[t]})^\top)` should likely be `\mathbf{K}(\mathbf{S}^0_{[t]})_\top`.
```diff
-&= \mathbf{Q}_{[t]} \left(\mathbf{S}^0_{[t]}\right)^\top + \left(\mathbf{Q}_{[t]} \mathbf{K}^{\top} \odot \mathbf{M}\right)\mathbf{T} \left(\mathbf{V} - \mathbf{K}(\mathbf{S}^0_{[t]})^\top)\right) \;\;\in \mathbb{R}^{C \times d_v}
+&= \mathbf{Q}_{[t]} \left(\mathbf{S}^0_{[t]}\right)^\top + \left(\mathbf{Q}_{[t]} \mathbf{K}^{\top} \odot \mathbf{M}\right)\mathbf{T} \left(\mathbf{V} - \mathbf{K}(\mathbf{S}^0_{[t]})_\top\right) \;\;\in \mathbb{R}^{C \times d_v}
```
the closing parenthesis is indeed redundant.
`^\top` is exactly fitting there, no problem.
Actionable comments posted: 1
🧹 Nitpick comments (5)
fla/ops/delta_rule/README.md (5)
`5-11`: **Fix inline math fencing and tighten notation.**

- The inline equation uses mixed backticks inside math: `` $`…`$ ``. This will render incorrectly in many Markdown engines.
- Minor: vectors/matrices alternate between `\bf{}` and `\mathbf{}`; prefer a single convention (typically `\mathbf{}`) for consistency.

Apply this diff to fix the inline math and unify boldface for the shown line:

```diff
-To reduce notational clutter, we focus on the first chunk, denoting $\mathbf{S}^r=\mathbf{S}_{[1]}^r$. By unrolling the recurrence $`S_t = S_{t-1}(I - \beta_t k_t k_t^\top) + \beta_t v_t k_t^\top`$, we have:
+To reduce notational clutter, we focus on the first chunk, denoting $\mathbf{S}^r=\mathbf{S}_{[1]}^r$. By unrolling the recurrence $S_t = S_{t-1}(\mathbf{I} - \beta_t \mathbf{k}_t \mathbf{k}_t^\top) + \beta_t \mathbf{v}_t \mathbf{k}_t^\top$, we have:
```

Optionally add a short "Notation" bullet list (dims for $\mathbf{K},\mathbf{V},\mathbf{Q},\mathbf{S}^0$, and chunk length $C$) right after this paragraph.
`20-22`: **WY form looks right; keep symbol style consistent.**

Derivation and dims check out. Consider switching `\bf{}` → `\mathbf{}` for $\mathbf{w}^i,\mathbf{k}^i$ to match the rest.
`56-62`: **Matrix forms for P and H are dimensionally consistent.**

$(\mathbf{W}^\top\mathbf{K})=\sum_i \mathbf{w}^i\mathbf{k}^{i\top}$ and $(\mathbf{U}^\top\mathbf{K})=\sum_i \mathbf{u}^i\mathbf{k}^{i\top}$ are accurate. Consider stating the shapes $\mathbf{K}\in\mathbb{R}^{C\times d_k}$ and $\mathbf{V}\in\mathbb{R}^{C\times d_v}$ explicitly here.
`65-71`: **Avoid explicit matrix inverse in implementations.**

The triangular solve form is correct, but writing $(\cdot)^{-1}$ can invite naïve implementations. Recommend phrasing as "solve lower-triangular system" to encourage using forward substitution instead of forming the inverse.

Proposed wording tweak:

```diff
-\mathbf{W} &= \left(\mathbf{I} + \mathrm{tril}(\mathrm{diag}(\beta) \mathbf{K}\mathbf{K}^\top, -1)\right)^{-1}\mathrm{diag}(\beta) \mathbf{K}\\
-\mathbf{U} &= \left(\mathbf{I} + \mathrm{tril}(\mathrm{diag}(\beta) \mathbf{K}\mathbf{K}^\top, -1)\right)^{-1}\mathrm{diag}(\beta) \mathbf{V}
+\text{Solve } \left(\mathbf{I} + \mathrm{tril}(\mathrm{diag}(\beta)\,\mathbf{K}\mathbf{K}^\top,-1)\right)\mathbf{W}=\mathrm{diag}(\beta)\,\mathbf{K}\\
+\text{Solve } \left(\mathbf{I} + \mathrm{tril}(\mathrm{diag}(\beta)\,\mathbf{K}\mathbf{K}^\top,-1)\right)\mathbf{U}=\mathrm{diag}(\beta)\,\mathbf{V}
```
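To make the recommendation concrete, here is a minimal PyTorch sketch (my own illustration, not from the PR; shapes and names are assumptions) comparing the explicit inverse with a lower-triangular solve:

```python
# Computing W via an explicit inverse vs. a lower-triangular solve.
# K (C x d_k) and beta (C,) are random stand-ins for one chunk's keys and gates;
# keys are L2-normalized, which keeps the system well conditioned.
import torch

torch.manual_seed(0)
C, d_k = 8, 16
K = torch.nn.functional.normalize(torch.randn(C, d_k), dim=-1)
beta = torch.rand(C)

A = torch.eye(C) + torch.tril(torch.diag(beta) @ K @ K.T, diagonal=-1)  # unit lower-triangular
B = torch.diag(beta) @ K

W_inv = torch.linalg.inv(A) @ B                               # explicit inverse (what the (.)^{-1} notation suggests)
W_solve = torch.linalg.solve_triangular(A, B, upper=False)    # forward substitution, as recommended

assert torch.allclose(W_inv, W_solve, atol=1e-5)
```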
`76-81`: **Use boldface for T consistently and define "UT transform."**

- Lines 79–80 use `T` instead of `\mathbf{T}`.
- "UT transform" is introduced but not expanded; clarify it (e.g., unit-lower triangular transform scaled by diag(β)).

Apply this diff:

```diff
-\mathbf{P} &= \mathbf{I} - \mathbf{K}^\top T^\top \mathbf{K} \;\;\in \mathbb{R}^{d_k \times d_k}\\
-\mathbf{H} &= \mathbf{V}^\top T^\top \mathbf{K} \;\;\in \mathbb{R}^{d_v \times d_k}
+\mathbf{P} &= \mathbf{I} - \mathbf{K}^\top \mathbf{T}^\top \mathbf{K} \;\;\in \mathbb{R}^{d_k \times d_k}\\
+\mathbf{H} &= \mathbf{V}^\top \mathbf{T}^\top \mathbf{K} \;\;\in \mathbb{R}^{d_v \times d_k}
```

And add a parenthetical after the first mention of $\mathbf{T}$: "(a unit-lower triangular transform scaled by $\mathrm{diag}(\beta)$)".
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- `fla/ops/delta_rule/README.md` (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: check-pt-python-compatibility
🔇 Additional comments (5)
fla/ops/delta_rule/README.md (5)
`28-34`: **Induction step for P is correct.**
Algebra is sound and matches $\mathbf{P}^r=\mathbf{I}-\sum_i \mathbf{w}^i\mathbf{k}^{i\top}$ with the stated recursion.

`40-41`: **H representation and recursion are consistent.**
Dims and induction base/step align with the construction of $\mathbf{H}^r$.

`47-53`: **Induction for H expansion checks out.**
No issues; the split via $\mathbf{P}_{i+1}^r$ is correct.

`88-91`: **State/update equations are algebraically consistent.**
The transitions from $(\mathbf{P},\mathbf{H})$ to $(\mathbf{W},\mathbf{U})$ and then to $(\mathbf{T},\mathbf{K},\mathbf{V})$ check out dimensionally and algebraically. If there is a unit test or benchmark notebook that exercises this form, please link it in this README section for quick cross-reference.

`98-99`: **Nice hardware-oriented conclusion.**
Clear guidance that this maps to matmuls; good for readers implementing kernels.
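Since several of the comments above revolve around the UT transform, here is a small PyTorch sketch (my own illustration, not part of the PR) checking that $\mathbf{W}=\mathbf{T}\mathbf{K}$ reproduces the recursive definition of the $\mathbf{w}^i$ from the WY form; keys are L2-normalized and all names are placeholders:

```python
# The UT transform T turns the recursive w^r definition into one matrix product W = T K.
import torch

torch.manual_seed(0)
C, d_k = 6, 8
K = torch.nn.functional.normalize(torch.randn(C, d_k), dim=-1)
beta = torch.rand(C)

# T = (I + tril(diag(beta) K K^T, -1))^{-1} diag(beta)
A = torch.eye(C) + torch.tril(torch.diag(beta) @ K @ K.T, diagonal=-1)
T = torch.linalg.solve(A, torch.diag(beta))
W = T @ K  # row r of W is w^r

# Same w^r via the recursion w^r = beta^r (k^r - sum_{i<r} (k^{i,T} k^r) w^i)
W_rec = torch.zeros(C, d_k)
for r in range(C):
    acc = torch.zeros(d_k)
    for i in range(r):
        acc = acc + (K[i] @ K[r]) * W_rec[i]
    W_rec[r] = beta[r] * (K[r] - acc)

assert torch.allclose(W, W_rec, atol=1e-5)
```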
````diff
 Substituting these compact forms back into the state update and output equations yields the hardware-efficient chunkwise algorithm. For a given chunk $[t]$ with initial state $`\mathbf{S}^0_{[t]}`$, the final state $`\mathbf{S}_{[t+1]}`$ and output $`\mathbf{O}_{[t]}`$ are:
 ```math
 \begin{equation}
 \begin{aligned}
-\mathbf{S} &= \mathbf{P}\cdot\mathbf{S}^0 + \mathbf{H} \\
-&= \mathbf{S}^0 + \mathbf{K}^\top (\mathbf{U} -\mathbf{W} \mathbf{S}^0) \in \mathbb{R}^{d_k \times d_v}\\
-\mathbf{O} &= \mathbf{Q} \mathbf{S}^0 + (\mathbf{Q} \mathbf{K}^{\top} \odot \mathbf{M}) \left(\mathbf{U} - \mathbf{W} \mathbf{S}^0\right) \in \mathbb{R}^{C \times d_v}
+\mathbf{S}_{[t+1]} &= \mathbf{S}^0_{[t]} \mathbf{P} + \mathbf{H} \\
+&= \mathbf{S}^0_{[t]} + \left(\mathbf{U}^\top -\mathbf{S}^0_{[t]} \mathbf{W}^\top\right) \mathbf{K} \\
+&= \mathbf{S}^0_{[t]} + \left(\mathbf{V}^\top - \mathbf{S}^0_{[t]} \mathbf{K}^\top\right) \mathbf{T}^\top \mathbf{K} \;\;\in\mathbb{R}^{d_v \times d_k} \\
+\mathbf{O}_{[t]} &= \mathbf{Q}_{[t]}\mathbf{S}_{[t+1]}^\top \\ &= \mathbf{Q}_{[t]} \left(\mathbf{S}^0_{[t]}\right)^\top + \left(\mathbf{Q}_{[t]} \mathbf{K}^{\top} \odot \mathbf{M}\right) \left(\mathbf{U} - \mathbf{W} \left(\mathbf{S}^0_{[t]}\right)^\top \right) \\
+&= \mathbf{Q}_{[t]} \left(\mathbf{S}^0_{[t]}\right)^\top + \left(\mathbf{Q}_{[t]} \mathbf{K}^{\top} \odot \mathbf{M}\right)\mathbf{T} \left(\mathbf{V} - \mathbf{K}(\mathbf{S}^0_{[t]})^\top)\right) \;\;\in \mathbb{R}^{C \times d_v}
 \end{aligned}
 \end{equation}
 ```
````
**Fix parentheses, define the mask M, and boldface T in the output formula.**

- Line 92 has mismatched parentheses: there is an extra `)`; rendering will break.
- $\mathbf{M}$ is used but not defined; specify it (e.g., a causal lower-triangular mask induced by the UT structure, with shape $\mathbb{R}^{C\times C}$).
- Use $\mathbf{T}$ (bold) for consistency.

Apply this diff:

```diff
-\mathbf{O}_{[t]} &= \mathbf{Q}_{[t]}\mathbf{S}_{[t+1]}^\top \\ &= \mathbf{Q}_{[t]} \left(\mathbf{S}^0_{[t]}\right)^\top + \left(\mathbf{Q}_{[t]} \mathbf{K}^{\top} \odot \mathbf{M}\right) \left(\mathbf{U} - \mathbf{W} \left(\mathbf{S}^0_{[t]}\right)^\top \right) \\
-&= \mathbf{Q}_{[t]} \left(\mathbf{S}^0_{[t]}\right)^\top + \left(\mathbf{Q}_{[t]} \mathbf{K}^{\top} \odot \mathbf{M}\right)\mathbf{T} \left(\mathbf{V} - \mathbf{K}(\mathbf{S}^0_{[t]})^\top)\right) \;\;\in \mathbb{R}^{C \times d_v}
+\mathbf{O}_{[t]} &= \mathbf{Q}_{[t]}\mathbf{S}_{[t+1]}^\top \\ &= \mathbf{Q}_{[t]} \left(\mathbf{S}^0_{[t]}\right)^\top + \left(\mathbf{Q}_{[t]} \mathbf{K}^{\top} \odot \mathbf{M}\right) \left(\mathbf{U} - \mathbf{W} \left(\mathbf{S}^0_{[t]}\right)^\top \right) \\
+&= \mathbf{Q}_{[t]} \left(\mathbf{S}^0_{[t]}\right)^\top + \left(\mathbf{Q}_{[t]} \mathbf{K}^{\top} \odot \mathbf{M}\right)\mathbf{T}\left(\mathbf{V} - \mathbf{K}(\mathbf{S}^0_{[t]})^\top\right) \;\;\in \mathbb{R}^{C \times d_v}
```

Please also add a one-line definition of $\mathbf{M}$ immediately after the equations.
🤖 Prompt for AI Agents
In fla/ops/delta_rule/README.md around lines 84 to 95, the displayed equations
contain a mismatched closing parenthesis on line ~92, the matrix T is not
consistently boldfaced in the output formula, and the mask M is used but
undefined; fix by removing the extra “)” so the parentheses balance, change T to
bold \mathbf{T} everywhere in the output expression for consistency, and add a
one‑line definition of \mathbf{M} immediately after the equations stating:
“\mathbf{M}\in\mathbb{R}^{C\times C} is a causal lower‑triangular (0/1) mask
induced by the UT structure—ones on the allowed (causal) positions and zeros
elsewhere—so the Hadamard product \odot enforces causal masking of
\mathbf{Q}_{[t]}\mathbf{K}^\top (this masking follows from the strictly
lower‑triangular structure of \mathbf{T}).”
For the impact of handedness and the order of the cumulative Householder product series, see this discussion.
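To make the handedness concrete, here is a minimal PyTorch sketch (my own check, not part of the PR) that builds $\mathbf{T}$, $\mathbf{W}$, $\mathbf{U}$ for one chunk and compares the corrected right-multiplied state/output formulas against the token-by-token recurrence. Shapes follow the README's convention $\mathbf{S}\in\mathbb{R}^{d_v\times d_k}$, keys are L2-normalized for numerical sanity, and all variable names are illustrative:

```python
# Numerical check of the corrected chunkwise formulas against the per-token recurrence
# S_t = S_{t-1}(I - beta_t k_t k_t^T) + beta_t v_t k_t^T, with per-token outputs o_t = S_t q_t.
import torch

torch.manual_seed(0)
C, d_k, d_v = 8, 16, 32
K = torch.nn.functional.normalize(torch.randn(C, d_k), dim=-1)
Q = torch.randn(C, d_k)
V = torch.randn(C, d_v)
beta = torch.rand(C)
S0 = torch.randn(d_v, d_k)  # initial chunk state

# UT transform: T = (I + tril(diag(beta) K K^T, -1))^{-1} diag(beta), then W = T K, U = T V
A = torch.eye(C) + torch.tril(torch.diag(beta) @ K @ K.T, diagonal=-1)
T = torch.linalg.solve(A, torch.diag(beta))
W, U = T @ K, T @ V

# Chunkwise state update and output (right-multiplied form)
M = torch.tril(torch.ones(C, C))  # inclusive causal mask
S_next = S0 + (V.T - S0 @ K.T) @ T.T @ K
O = Q @ S0.T + ((Q @ K.T) * M) @ (U - W @ S0.T)

# Reference: unroll the recurrence token by token
S_ref, O_ref = S0.clone(), []
for t in range(C):
    k, v, b = K[t], V[t], beta[t]
    S_ref = S_ref @ (torch.eye(d_k) - b * torch.outer(k, k)) + b * torch.outer(v, k)
    O_ref.append(S_ref @ Q[t])
O_ref = torch.stack(O_ref)

assert torch.allclose(S_next, S_ref, atol=1e-4)
assert torch.allclose(O, O_ref, atol=1e-4)
```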
Actionable comments posted: 0
♻️ Duplicate comments (2)
fla/ops/delta_rule/README.md (2)
`5-5`: **Remove backticks inside inline math.**
Backticks render code, not math. Use plain LaTeX inside `$…$`.

```diff
-To reduce notational clutter, we focus on the first chunk, denoting $\mathbf{S}^r=\mathbf{S}_{[1]}^r$. By unrolling the recurrence $`S_t = S_{t-1}(I - \beta_t k_t k_t^\top) + \beta_t v_t k_t^\top`$, we have:
+To reduce notational clutter, we focus on the first chunk, denoting $\mathbf{S}^r=\mathbf{S}_{[1]}^r$. By unrolling the recurrence $S_t = S_{t-1}(I - \beta_t k_t k_t^\top) + \beta_t v_t k_t^\top$, we have:
```
`91-92`: **Define mask M right after the equations.**
You use $\mathbf{M}$ but don't define it; add a one-liner.

```diff
 \end{equation}
+\noindent $\mathbf{M}\in\mathbb{R}^{C\times C}$ denotes the causal lower-triangular (0/1) mask induced by the UT structure (ones on allowed causal positions, zeros elsewhere). The Hadamard product $\odot$ enforces causal masking of $\mathbf{Q}_{[t]}\mathbf{K}^\top$, matching the strictly lower-triangular support of $\mathbf{T}$.
+
 In this final form, the intra-chunk recurrence has been transformed into a series of efficient matrix multiplications (e.g., computing $\mathbf{T}$, $\mathbf{W}$, $\mathbf{U}$, and the final output), which can be highly optimized on modern hardware like GPUs.
```

🧹 Nitpick comments (4)
fla/ops/delta_rule/README.md (4)

`9-11`: **Use \mathbf consistently instead of deprecated \bf.**
Standardize bold symbols for vectors/matrices.

```diff
-... (\mathbf{I} - \beta^i \bf{k}^i \bf{k}^{i\top}) ...
-... \beta^i \bf{v}^i\bf{k}^{i\top} ...
+... (\mathbf{I} - \beta^i \mathbf{k}^i \mathbf{k}^{i\top}) ...
+... \beta^i \mathbf{v}^i\mathbf{k}^{i\top} ...
-\mathbf{P}^{r} = \mathbf{I} - \sum_{i=1}^{r}\bf{w}^i\bf{k}^{i\top}
-\bf{w}^r = \beta^r \left(\bf{k}^r - \sum_{i=1}^{r-1} \left(\bf{k}^{i\top}\bf{k}^r \right)\bf{w}^i \right)
+\mathbf{P}^{r} = \mathbf{I} - \sum_{i=1}^{r}\mathbf{w}^i\mathbf{k}^{i\top}
+\mathbf{w}^r = \beta^r \left(\mathbf{k}^r - \sum_{i=1}^{r-1} \left(\mathbf{k}^{i\top}\mathbf{k}^r \right)\mathbf{w}^i \right)
-... \beta^r \bf{k}^r \bf{k}^{r\top} ...
-... \sum_{i=1}^{r-1}\bf{w}^i\bf{k}^{i\top} ...
+... \beta^r \mathbf{k}^r \mathbf{k}^{r\top} ...
+... \sum_{i=1}^{r-1}\mathbf{w}^i\mathbf{k}^{i\top} ...
-\mathbf{H}^{r} = \sum_{i=1}^{r} \bf{u}^i \bf{k}^{i\top}
-\bf{u}^r = \beta^r \left(\bf{v}^r - \sum_{i=1}^{r-1} \left(\bf{k}^{i\top}\bf{k}^r\right) \bf{u}^i \right)
+\mathbf{H}^{r} = \sum_{i=1}^{r} \mathbf{u}^i \mathbf{k}^{i\top}
+\mathbf{u}^r = \beta^r \left(\mathbf{v}^r - \sum_{i=1}^{r-1} \left(\mathbf{k}^{i\top}\mathbf{k}^r\right) \mathbf{u}^i \right)
-... \sum_{i=1}^{r-1}\bf{u}^i \bf{k}^{i\top} ...
-... \beta^r \bf{v}^r \bf{k}^{r\top} ...
+... \sum_{i=1}^{r-1}\mathbf{u}^i \mathbf{k}^{i\top} ...
+... \beta^r \mathbf{v}^r \mathbf{k}^{r\top} ...
```

Also applies to: 20-22, 28-34, 40-41, 47-53
`68-70`: **Prefer \operatorname for operators (diag, tril).**
Improves LaTeX typesetting; math operators won't appear in roman text.

```diff
-\mathbf{W} &= \left(\mathbf{I} + \mathrm{tril}(\mathrm{diag}(\beta) \mathbf{K}\mathbf{K}^\top, -1)\right)^{-1}\mathrm{diag}(\beta) \mathbf{K}\\
-\mathbf{U} &= \left(\mathbf{I} + \mathrm{tril}(\mathrm{diag}(\beta) \mathbf{K}\mathbf{K}^\top, -1)\right)^{-1}\mathrm{diag}(\beta) \mathbf{V}
+\mathbf{W} &= \left(\mathbf{I} + \operatorname{tril}(\operatorname{diag}(\beta)\,\mathbf{K}\mathbf{K}^\top, -1)\right)^{-1}\operatorname{diag}(\beta)\,\mathbf{K}\\
+\mathbf{U} &= \left(\mathbf{I} + \operatorname{tril}(\operatorname{diag}(\beta)\,\mathbf{K}\mathbf{K}^\top, -1)\right)^{-1}\operatorname{diag}(\beta)\,\mathbf{V}
```

```diff
-\mathbf{T} &= \left(\mathbf{I} + \mathrm{tril}\left(\mathrm{diag}(\beta)\mathbf{K} \mathbf{K}^\top,-1\right)\right)^{-1}\mathrm{diag}\left(\beta\right)
+\mathbf{T} &= \left(\mathbf{I} + \operatorname{tril}\!\left(\operatorname{diag}(\beta)\,\mathbf{K} \mathbf{K}^\top,-1\right)\right)^{-1}\operatorname{diag}\!\left(\beta\right)
```

Also applies to: 76-76
`79-80`: **Boldface T for consistency.**
Use $\mathbf{T}$ where introduced as a matrix.

```diff
-\mathbf{P} &= \mathbf{I} - \mathbf{K}^\top T^\top \mathbf{K} \;\;\in \mathbb{R}^{d_k \times d_k}\\
-\mathbf{H} &= \mathbf{V}^\top T^\top \mathbf{K} \;\;\in \mathbb{R}^{d_v \times d_k}
+\mathbf{P} &= \mathbf{I} - \mathbf{K}^\top \mathbf{T}^\top \mathbf{K} \;\;\in \mathbb{R}^{d_k \times d_k}\\
+\mathbf{H} &= \mathbf{V}^\top \mathbf{T}^\top \mathbf{K} \;\;\in \mathbb{R}^{d_v \times d_k}
```
`84-85`: **Also remove backticks around bold symbols in this sentence.**
Inline code formatting breaks math rendering.

```diff
-Substituting these compact forms back into the state update and output equations yields the hardware-efficient chunkwise algorithm. For a given chunk $[t]$ with initial state $`\mathbf{S}^0_{[t]}`$, the final state $`\mathbf{S}_{[t+1]}`$ and output $`\mathbf{O}_{[t]}`$ are:
+Substituting these compact forms back into the state update and output equations yields the hardware-efficient chunkwise algorithm. For a given chunk $[t]$ with initial state $\mathbf{S}^0_{[t]}$, the final state $\mathbf{S}_{[t+1]}$ and output $\mathbf{O}_{[t]}$ are:
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- `fla/ops/delta_rule/README.md` (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: check-pt-python-compatibility
🔇 Additional comments (2)
fla/ops/delta_rule/README.md (2)
`91-92`: **Clarify exactness of the masked form.**
Confirm whether $(\mathbf{Q}\mathbf{K}^\top)\mathbf{T}$ is exactly equal to $(\mathbf{Q}\mathbf{K}^\top \odot \mathbf{M})\mathbf{T}$ for your $\mathbf{T}$, or if masking is an implementation detail. If exact, add a brief lemma/note stating the equality and its conditions (unit/strict lower-triangular $\mathbf{T}$, chunkwise causality).
`56-61`: **Derivation looks correct and consistent with the transposed factorization.**
Left-multiplication order and dimensions check out; the UT reparameterization aligns with $\mathbf{W}=\mathbf{T}\mathbf{K},\ \mathbf{U}=\mathbf{T}\mathbf{V}$ and yields the stated state/output forms.
If you want, I can add a short “Shapes and dataflow” box diagram to this README to aid implementers.
Also applies to: 73-81, 88-92
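Regarding the question above about whether the mask can be dropped, a quick numerical probe (my own experiment, illustrative only) suggests the two expressions generally differ for this $\mathbf{T}$, i.e. the Hadamard mask is not redundant:

```python
# Probe: is (Q K^T) T equal to (Q K^T * M) T for the lower-triangular T used here?
import torch

torch.manual_seed(0)
C, d_k = 5, 4
K = torch.nn.functional.normalize(torch.randn(C, d_k), dim=-1)
Q = torch.randn(C, d_k)
beta = torch.rand(C)

T = torch.linalg.solve(
    torch.eye(C) + torch.tril(torch.diag(beta) @ K @ K.T, diagonal=-1),
    torch.diag(beta),
)
M = torch.tril(torch.ones(C, C))  # inclusive causal mask

unmasked = (Q @ K.T) @ T
masked = ((Q @ K.T) * M) @ T
print(torch.allclose(unmasked, masked))  # typically False on random data
```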
@SeepingFragranceLock Hi, please note the shape of `S`.
oops... I have not checked that. For clarity, I suggest providing the modified recurrence equations in this README and setting the cumulative Householder products accordingly for the transpose.
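For instance, a sketch of what such a transposed formulation could look like (illustrative only; the final README wording is up to the author): writing $\tilde{\mathbf{S}}_t = \mathbf{S}_t^\top \in \mathbb{R}^{d_k \times d_v}$, the same update becomes

```math
\tilde{\mathbf{S}}_t = \left(\mathbf{I} - \beta_t \mathbf{k}_t \mathbf{k}_t^\top\right)\tilde{\mathbf{S}}_{t-1} + \beta_t \mathbf{k}_t \mathbf{v}_t^\top,
\qquad
\tilde{\mathbf{S}}^r = \mathbf{P}^{r\top} \tilde{\mathbf{S}}^0 + \mathbf{H}^{r\top},
```

so the cumulative Householder factors accumulate on the left of the state instead of the right.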
Force-pushed from a4e1a1c to 90a0fea (compare).
Fix(math): Correct derivation for chunkwise DeltaNet parallelism
Corrects the entire mathematical derivation for the chunkwise parallel form. The previous version used an incorrect left-multiplication order for the Householder product series, which has now been fixed. All dependent formulas for the state update and output have been updated accordingly.