Update README.md of ops delta_rule #595
base: main
Conversation
Fix(math): Correct derivation for chunkwise DeltaNet parallelism

Corrects the entire mathematical derivation for the chunkwise parallel form. The previous version used an incorrect left-multiplication order for the Householder product series, which has now been fixed. All dependent formulas for the state update and output have been updated accordingly.
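For orientation, unrolling the corrected right-applied recurrence over the first $r$ steps of a chunk gives the form sketched below (written in the README's notation; the products expand left to right in increasing index order, so each new Householder factor multiplies the state from the right):

```math
\mathbf{S}^r = \mathbf{S}^0 \prod_{i=1}^{r}\left(\mathbf{I} - \beta^i \mathbf{k}^i \mathbf{k}^{i\top}\right) + \sum_{i=1}^{r} \beta^i \mathbf{v}^i \mathbf{k}^{i\top} \prod_{j=i+1}^{r}\left(\mathbf{I} - \beta^j \mathbf{k}^j \mathbf{k}^{j\top}\right)
```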
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough
The README for `fla/ops/delta_rule` is updated (a documentation-only math reformulation; see the sequence diagram below).
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor Reader
    participant README as fla/ops/delta_rule/README.md
    Note over README: Documentation-only math reformulation
    Reader->>README: Read recurrence S_t = S_{t-1}(I - β_t k_t k_t^T) + β_t v_t k_t^T
    Reader->>README: Follow derivation S^r = S^0 P^r + H^r
    Note right of README: P = I - W^T K, H = U^T K
    Reader->>README: See UT-transform: W = T K, U = T V
    Note over README: Final formulas expressed for efficient matrix multiplies (T, W, U, outputs)
```
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Summary of Changes
Hello @SeepingFragranceLock, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request rectifies a significant mathematical error in the README.md documentation related to the chunkwise DeltaNet parallelism. The fix involves correcting the order of matrix multiplication in the Householder product series, which subsequently required updating all related formulas for state updates and outputs. This ensures the documentation provides an accurate and reliable mathematical foundation for the DeltaNet implementation.
Highlights
- Mathematical Derivation Correction: The core mathematical derivation for the chunkwise parallel form of DeltaNet has been corrected, specifically addressing an incorrect left-multiplication order in the Householder product series.
- Updated Formulas: All dependent formulas for the state update and output, including the inductive proofs for P^r and H^r, have been revised to align with the corrected derivation.
- Matrix Form Adjustments: The matrix representations of P, H, W, and U have been updated to reflect the accurate mathematical model.
Code Review
This pull request provides a crucial correction to the mathematical derivation for the chunkwise parallel form of DeltaNet in the README. The main change is fixing the incorrect left-multiplication order of the Householder product series, which has been consistently applied to all dependent formulas. This significantly improves the correctness and clarity of the documentation. I have added a couple of minor suggestions to fix typos in the LaTeX for better rendering and accuracy.
fla/ops/delta_rule/README.md
Outdated
```latex
&= \mathbf{S}^0_{[t]} + \left(\mathbf{U}^\top -\mathbf{S}^0_{[t]} \mathbf{W}^\top\right) \mathbf{K} \\
&= \mathbf{S}^0_{[t]} + \left(\mathbf{V}^\top - \mathbf{S}^0_{[t]} \mathbf{K}^\top\right) \mathbf{T}^\top \mathbf{K} \;\;\in\mathbb{R}^{d_v \times d_k} \\
\mathbf{O}_{[t]} &= \mathbf{Q}_{[t]}\mathbf{S}_{[t+1]}^\top \\ &= \mathbf{Q}_{[t]} \left(\mathbf{S}^0_{[t]}\right)^\top + \left(\mathbf{Q}_{[t]} \mathbf{K}^{\top} \odot \mathbf{M}\right) \left(\mathbf{U} - \mathbf{W} \left(\mathbf{S}^0_{[t]}\right)^\top \right) \\
&= \mathbf{Q}_{[t]} \left(\mathbf{S}^0_{[t]}\right)^\top + \left(\mathbf{Q}_{[t]} \mathbf{K}^{\top} \odot \mathbf{M}\right)\mathbf{T} \left(\mathbf{V} - \mathbf{K}(\mathbf{S}^0_{[t]})^\top)\right) \;\;\in \mathbb{R}^{C \times d_v}
```
There appears to be a typo with an extra closing parenthesis in the formula for `O_{[t]}`. The term `\mathbf{K}(\mathbf{S}^0_{[t]})^\top)` should likely be `\mathbf{K}(\mathbf{S}^0_{[t]})_\top`.
```diff
-&= \mathbf{Q}_{[t]} \left(\mathbf{S}^0_{[t]}\right)^\top + \left(\mathbf{Q}_{[t]} \mathbf{K}^{\top} \odot \mathbf{M}\right)\mathbf{T} \left(\mathbf{V} - \mathbf{K}(\mathbf{S}^0_{[t]})^\top)\right) \;\;\in \mathbb{R}^{C \times d_v}
+&= \mathbf{Q}_{[t]} \left(\mathbf{S}^0_{[t]}\right)^\top + \left(\mathbf{Q}_{[t]} \mathbf{K}^{\top} \odot \mathbf{M}\right)\mathbf{T} \left(\mathbf{V} - \mathbf{K}(\mathbf{S}^0_{[t]})_\top\right) \;\;\in \mathbb{R}^{C \times d_v}
```
the closing parenthesis is indeed redundant.
`^\top` is exactly fitting there, no problem.
Actionable comments posted: 1
🧹 Nitpick comments (5)
fla/ops/delta_rule/README.md (5)
`5-11`: **Fix inline math fencing and tighten notation.**

- The inline equation uses mixed backticks inside math: `` $`…`$ ``. This will render incorrectly in many Markdown engines.
- Minor: vectors/matrices alternate between `\bf{}` and `\mathbf{}`; prefer a single convention (typically `\mathbf{}`) for consistency.

Apply this diff to fix the inline math and unify boldface for the shown line:

```diff
-To reduce notational clutter, we focus on the first chunk, denoting $\mathbf{S}^r=\mathbf{S}_{[1]}^r$. By unrolling the recurrence $`S_t = S_{t-1}(I - \beta_t k_t k_t^\top) + \beta_t v_t k_t^\top`$, we have:
+To reduce notational clutter, we focus on the first chunk, denoting $\mathbf{S}^r=\mathbf{S}_{[1]}^r$. By unrolling the recurrence $S_t = S_{t-1}(\mathbf{I} - \beta_t \mathbf{k}_t \mathbf{k}_t^\top) + \beta_t \mathbf{v}_t \mathbf{k}_t^\top$, we have:
```

Optionally add a short "Notation" bullet list (dims for $\mathbf{K},\mathbf{V},\mathbf{Q},\mathbf{S}^0$, and chunk length $C$) right after this paragraph.
`20-22`: **WY form looks right; keep symbol style consistent.**

Derivation and dims check out. Consider switching `\bf{}` → `\mathbf{}` for $\mathbf{w}^i,\mathbf{k}^i$ to match the rest.
`56-62`: **Matrix forms for P and H are dimensionally consistent.**

$(\mathbf{W}^\top\mathbf{K})=\sum_i \mathbf{w}^i\mathbf{k}^{i\top}$ and $(\mathbf{U}^\top\mathbf{K})=\sum_i \mathbf{u}^i\mathbf{k}^{i\top}$ are accurate. Consider stating the shapes $\mathbf{K}\in\mathbb{R}^{C\times d_k}$ and $\mathbf{V}\in\mathbb{R}^{C\times d_v}$ explicitly here.
`65-71`: **Avoid explicit matrix inverse in implementations.**

The triangular solve form is correct, but writing $(\cdot)^{-1}$ can invite naïve implementations. Recommend phrasing as "solve lower-triangular system" to encourage using forward substitution instead of forming the inverse.

Proposed wording tweak:

```diff
-\mathbf{W} &= \left(\mathbf{I} + \mathrm{tril}(\mathrm{diag}(\beta) \mathbf{K}\mathbf{K}^\top, -1)\right)^{-1}\mathrm{diag}(\beta) \mathbf{K}\\
-\mathbf{U} &= \left(\mathbf{I} + \mathrm{tril}(\mathrm{diag}(\beta) \mathbf{K}\mathbf{K}^\top, -1)\right)^{-1}\mathrm{diag}(\beta) \mathbf{V}
+\text{Solve } \left(\mathbf{I} + \mathrm{tril}(\mathrm{diag}(\beta)\,\mathbf{K}\mathbf{K}^\top,-1)\right)\mathbf{W}=\mathrm{diag}(\beta)\,\mathbf{K}\\
+\text{Solve } \left(\mathbf{I} + \mathrm{tril}(\mathrm{diag}(\beta)\,\mathbf{K}\mathbf{K}^\top,-1)\right)\mathbf{U}=\mathrm{diag}(\beta)\,\mathbf{V}
```
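To make the recommendation concrete, here is a minimal PyTorch sketch (my own illustration, not from the PR; shapes and names are assumptions) comparing the explicit inverse with a lower-triangular solve:

```python
# Computing W via an explicit inverse vs. a lower-triangular solve.
# K (C x d_k) and beta (C,) are random stand-ins for one chunk's keys and gates;
# keys are L2-normalized, which keeps the system well conditioned.
import torch

torch.manual_seed(0)
C, d_k = 8, 16
K = torch.nn.functional.normalize(torch.randn(C, d_k), dim=-1)
beta = torch.rand(C)

A = torch.eye(C) + torch.tril(torch.diag(beta) @ K @ K.T, diagonal=-1)  # unit lower-triangular
B = torch.diag(beta) @ K

W_inv = torch.linalg.inv(A) @ B                               # explicit inverse (what the (.)^{-1} notation suggests)
W_solve = torch.linalg.solve_triangular(A, B, upper=False)    # forward substitution, as recommended

assert torch.allclose(W_inv, W_solve, atol=1e-5)
```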
`76-81`: **Use boldface for T consistently and define "UT transform."**

- Lines 79–80 use `T` instead of `\mathbf{T}`.
- "UT transform" is introduced but not expanded; clarify it (e.g., unit-lower triangular transform scaled by diag(β)).

Apply this diff:

```diff
-\mathbf{P} &= \mathbf{I} - \mathbf{K}^\top T^\top \mathbf{K} \;\;\in \mathbb{R}^{d_k \times d_k}\\
-\mathbf{H} &= \mathbf{V}^\top T^\top \mathbf{K} \;\;\in \mathbb{R}^{d_v \times d_k}
+\mathbf{P} &= \mathbf{I} - \mathbf{K}^\top \mathbf{T}^\top \mathbf{K} \;\;\in \mathbb{R}^{d_k \times d_k}\\
+\mathbf{H} &= \mathbf{V}^\top \mathbf{T}^\top \mathbf{K} \;\;\in \mathbb{R}^{d_v \times d_k}
```

And add a parenthetical after the first mention of $\mathbf{T}$: "(a unit-lower triangular transform scaled by $\mathrm{diag}(\beta)$)".
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- `fla/ops/delta_rule/README.md` (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: check-pt-python-compatibility
🔇 Additional comments (5)
fla/ops/delta_rule/README.md (5)
`28-34`: **Induction step for P is correct.**
Algebra is sound and matches $\mathbf{P}^r=\mathbf{I}-\sum_i \mathbf{w}^i\mathbf{k}^{i\top}$ with the stated recursion.

`40-41`: **H representation and recursion are consistent.**
Dims and induction base/step align with the construction of $\mathbf{H}^r$.

`47-53`: **Induction for H expansion checks out.**
No issues; the split via $\mathbf{P}_{i+1}^r$ is correct.

`88-91`: **State/update equations are algebraically consistent.**
The transitions from $(\mathbf{P},\mathbf{H})$ to $(\mathbf{W},\mathbf{U})$ and then to $(\mathbf{T},\mathbf{K},\mathbf{V})$ check out dimensionally and algebraically. If there is a unit test or benchmark notebook that exercises this form, please link it in this README section for quick cross-reference.

`98-99`: **Nice hardware-oriented conclusion.**
Clear guidance that this maps to matmuls; good for readers implementing kernels.
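Since several of the comments above revolve around the UT transform, here is a small PyTorch sketch (my own illustration, not part of the PR) checking that $\mathbf{W}=\mathbf{T}\mathbf{K}$ reproduces the recursive definition of the $\mathbf{w}^i$ from the WY form; keys are L2-normalized and all names are placeholders:

```python
# The UT transform T turns the recursive w^r definition into one matrix product W = T K.
import torch

torch.manual_seed(0)
C, d_k = 6, 8
K = torch.nn.functional.normalize(torch.randn(C, d_k), dim=-1)
beta = torch.rand(C)

# T = (I + tril(diag(beta) K K^T, -1))^{-1} diag(beta)
A = torch.eye(C) + torch.tril(torch.diag(beta) @ K @ K.T, diagonal=-1)
T = torch.linalg.solve(A, torch.diag(beta))
W = T @ K  # row r of W is w^r

# Same w^r via the recursion w^r = beta^r (k^r - sum_{i<r} (k^{i,T} k^r) w^i)
W_rec = torch.zeros(C, d_k)
for r in range(C):
    acc = torch.zeros(d_k)
    for i in range(r):
        acc = acc + (K[i] @ K[r]) * W_rec[i]
    W_rec[r] = beta[r] * (K[r] - acc)

assert torch.allclose(W, W_rec, atol=1e-5)
```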
````diff
 Substituting these compact forms back into the state update and output equations yields the hardware-efficient chunkwise algorithm. For a given chunk $[t]$ with initial state $`\mathbf{S}^0_{[t]}`$, the final state $`\mathbf{S}_{[t+1]}`$ and output $`\mathbf{O}_{[t]}`$ are:
 ```math
 \begin{equation}
 \begin{aligned}
-\mathbf{S} &= \mathbf{P}\cdot\mathbf{S}^0 + \mathbf{H} \\
-&= \mathbf{S}^0 + \mathbf{K}^\top (\mathbf{U} -\mathbf{W} \mathbf{S}^0) \in \mathbb{R}^{d_k \times d_v}\\
-\mathbf{O} &= \mathbf{Q} \mathbf{S}^0 + (\mathbf{Q} \mathbf{K}^{\top} \odot \mathbf{M}) \left(\mathbf{U} - \mathbf{W} \mathbf{S}^0\right) \in \mathbb{R}^{C \times d_v}
+\mathbf{S}_{[t+1]} &= \mathbf{S}^0_{[t]} \mathbf{P} + \mathbf{H} \\
+&= \mathbf{S}^0_{[t]} + \left(\mathbf{U}^\top -\mathbf{S}^0_{[t]} \mathbf{W}^\top\right) \mathbf{K} \\
+&= \mathbf{S}^0_{[t]} + \left(\mathbf{V}^\top - \mathbf{S}^0_{[t]} \mathbf{K}^\top\right) \mathbf{T}^\top \mathbf{K} \;\;\in\mathbb{R}^{d_v \times d_k} \\
+\mathbf{O}_{[t]} &= \mathbf{Q}_{[t]}\mathbf{S}_{[t+1]}^\top \\ &= \mathbf{Q}_{[t]} \left(\mathbf{S}^0_{[t]}\right)^\top + \left(\mathbf{Q}_{[t]} \mathbf{K}^{\top} \odot \mathbf{M}\right) \left(\mathbf{U} - \mathbf{W} \left(\mathbf{S}^0_{[t]}\right)^\top \right) \\
+&= \mathbf{Q}_{[t]} \left(\mathbf{S}^0_{[t]}\right)^\top + \left(\mathbf{Q}_{[t]} \mathbf{K}^{\top} \odot \mathbf{M}\right)\mathbf{T} \left(\mathbf{V} - \mathbf{K}(\mathbf{S}^0_{[t]})^\top)\right) \;\;\in \mathbb{R}^{C \times d_v}
 \end{aligned}
 \end{equation}
 ```
````
**Fix parentheses, define the mask M, and boldface T in the output formula.**

- Line 92 has mismatched parentheses: there is an extra `)`; rendering will break.
- $\mathbf{M}$ is used but not defined; specify it (e.g., a causal lower-triangular mask induced by the UT structure, with shape $\mathbb{R}^{C\times C}$).
- Use $\mathbf{T}$ (bold) for consistency.

Apply this diff:

```diff
-\mathbf{O}_{[t]} &= \mathbf{Q}_{[t]}\mathbf{S}_{[t+1]}^\top \\ &= \mathbf{Q}_{[t]} \left(\mathbf{S}^0_{[t]}\right)^\top + \left(\mathbf{Q}_{[t]} \mathbf{K}^{\top} \odot \mathbf{M}\right) \left(\mathbf{U} - \mathbf{W} \left(\mathbf{S}^0_{[t]}\right)^\top \right) \\
-&= \mathbf{Q}_{[t]} \left(\mathbf{S}^0_{[t]}\right)^\top + \left(\mathbf{Q}_{[t]} \mathbf{K}^{\top} \odot \mathbf{M}\right)\mathbf{T} \left(\mathbf{V} - \mathbf{K}(\mathbf{S}^0_{[t]})^\top)\right) \;\;\in \mathbb{R}^{C \times d_v}
+\mathbf{O}_{[t]} &= \mathbf{Q}_{[t]}\mathbf{S}_{[t+1]}^\top \\ &= \mathbf{Q}_{[t]} \left(\mathbf{S}^0_{[t]}\right)^\top + \left(\mathbf{Q}_{[t]} \mathbf{K}^{\top} \odot \mathbf{M}\right) \left(\mathbf{U} - \mathbf{W} \left(\mathbf{S}^0_{[t]}\right)^\top \right) \\
+&= \mathbf{Q}_{[t]} \left(\mathbf{S}^0_{[t]}\right)^\top + \left(\mathbf{Q}_{[t]} \mathbf{K}^{\top} \odot \mathbf{M}\right)\mathbf{T}\left(\mathbf{V} - \mathbf{K}(\mathbf{S}^0_{[t]})^\top\right) \;\;\in \mathbb{R}^{C \times d_v}
```

Please also add a one-line definition of $\mathbf{M}$ immediately after the equations.
🤖 Prompt for AI Agents
In fla/ops/delta_rule/README.md around lines 84 to 95, the displayed equations
contain a mismatched closing parenthesis on line ~92, the matrix T is not
consistently boldfaced in the output formula, and the mask M is used but
undefined; fix by removing the extra “)” so the parentheses balance, change T to
bold \mathbf{T} everywhere in the output expression for consistency, and add a
one‑line definition of \mathbf{M} immediately after the equations stating:
“\mathbf{M}\in\mathbb{R}^{C\times C} is a causal lower‑triangular (0/1) mask
induced by the UT structure—ones on the allowed (causal) positions and zeros
elsewhere—so the Hadamard product \odot enforces causal masking of
\mathbf{Q}_{[t]}\mathbf{K}^\top (this masking follows from the strictly
lower‑triangular structure of \mathbf{T}).”
For the impact of handedness and the order of the cumulative Householder product series, see this discussion.
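To make the handedness concrete, here is a minimal PyTorch sketch (my own check, not part of the PR) that builds $\mathbf{T}$, $\mathbf{W}$, $\mathbf{U}$ for one chunk and compares the corrected right-multiplied state/output formulas against the token-by-token recurrence. Shapes follow the README's convention $\mathbf{S}\in\mathbb{R}^{d_v\times d_k}$, keys are L2-normalized for numerical sanity, and all variable names are illustrative:

```python
# Numerical check of the corrected chunkwise formulas against the per-token recurrence
# S_t = S_{t-1}(I - beta_t k_t k_t^T) + beta_t v_t k_t^T, with per-token outputs o_t = S_t q_t.
import torch

torch.manual_seed(0)
C, d_k, d_v = 8, 16, 32
K = torch.nn.functional.normalize(torch.randn(C, d_k), dim=-1)
Q = torch.randn(C, d_k)
V = torch.randn(C, d_v)
beta = torch.rand(C)
S0 = torch.randn(d_v, d_k)  # initial chunk state

# UT transform: T = (I + tril(diag(beta) K K^T, -1))^{-1} diag(beta), then W = T K, U = T V
A = torch.eye(C) + torch.tril(torch.diag(beta) @ K @ K.T, diagonal=-1)
T = torch.linalg.solve(A, torch.diag(beta))
W, U = T @ K, T @ V

# Chunkwise state update and output (right-multiplied form)
M = torch.tril(torch.ones(C, C))  # inclusive causal mask
S_next = S0 + (V.T - S0 @ K.T) @ T.T @ K
O = Q @ S0.T + ((Q @ K.T) * M) @ (U - W @ S0.T)

# Reference: unroll the recurrence token by token
S_ref, O_ref = S0.clone(), []
for t in range(C):
    k, v, b = K[t], V[t], beta[t]
    S_ref = S_ref @ (torch.eye(d_k) - b * torch.outer(k, k)) + b * torch.outer(v, k)
    O_ref.append(S_ref @ Q[t])
O_ref = torch.stack(O_ref)

assert torch.allclose(S_next, S_ref, atol=1e-4)
assert torch.allclose(O, O_ref, atol=1e-4)
```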
Actionable comments posted: 0
♻️ Duplicate comments (2)
fla/ops/delta_rule/README.md (2)
`5-5`: **Remove backticks inside inline math.**
Backticks render code, not math. Use plain LaTeX inside `$…$`.

```diff
-To reduce notational clutter, we focus on the first chunk, denoting $\mathbf{S}^r=\mathbf{S}_{[1]}^r$. By unrolling the recurrence $`S_t = S_{t-1}(I - \beta_t k_t k_t^\top) + \beta_t v_t k_t^\top`$, we have:
+To reduce notational clutter, we focus on the first chunk, denoting $\mathbf{S}^r=\mathbf{S}_{[1]}^r$. By unrolling the recurrence $S_t = S_{t-1}(I - \beta_t k_t k_t^\top) + \beta_t v_t k_t^\top$, we have:
```
`91-92`: **Define mask M right after the equations.**
You use $\mathbf{M}$ but don't define it; add a one-liner.

```diff
 \end{equation}
+\noindent $\mathbf{M}\in\mathbb{R}^{C\times C}$ denotes the causal lower-triangular (0/1) mask induced by the UT structure (ones on allowed causal positions, zeros elsewhere). The Hadamard product $\odot$ enforces causal masking of $\mathbf{Q}_{[t]}\mathbf{K}^\top$, matching the strictly lower-triangular support of $\mathbf{T}$.
+
 In this final form, the intra-chunk recurrence has been transformed into a series of efficient matrix multiplications (e.g., computing $\mathbf{T}$, $\mathbf{W}$, $\mathbf{U}$, and the final output), which can be highly optimized on modern hardware like GPUs.
```

🧹 Nitpick comments (4)
fla/ops/delta_rule/README.md (4)

`9-11`: **Use \mathbf consistently instead of deprecated \bf.**
Standardize bold symbols for vectors/matrices.

```diff
-... (\mathbf{I} - \beta^i \bf{k}^i \bf{k}^{i\top}) ...
-... \beta^i \bf{v}^i\bf{k}^{i\top} ...
+... (\mathbf{I} - \beta^i \mathbf{k}^i \mathbf{k}^{i\top}) ...
+... \beta^i \mathbf{v}^i\mathbf{k}^{i\top} ...
-\mathbf{P}^{r} = \mathbf{I} - \sum_{i=1}^{r}\bf{w}^i\bf{k}^{i\top}
-\bf{w}^r = \beta^r \left(\bf{k}^r - \sum_{i=1}^{r-1} \left(\bf{k}^{i\top}\bf{k}^r \right)\bf{w}^i \right)
+\mathbf{P}^{r} = \mathbf{I} - \sum_{i=1}^{r}\mathbf{w}^i\mathbf{k}^{i\top}
+\mathbf{w}^r = \beta^r \left(\mathbf{k}^r - \sum_{i=1}^{r-1} \left(\mathbf{k}^{i\top}\mathbf{k}^r \right)\mathbf{w}^i \right)
-... \beta^r \bf{k}^r \bf{k}^{r\top} ...
-... \sum_{i=1}^{r-1}\bf{w}^i\bf{k}^{i\top} ...
+... \beta^r \mathbf{k}^r \mathbf{k}^{r\top} ...
+... \sum_{i=1}^{r-1}\mathbf{w}^i\mathbf{k}^{i\top} ...
-\mathbf{H}^{r} = \sum_{i=1}^{r} \bf{u}^i \bf{k}^{i\top}
-\bf{u}^r = \beta^r \left(\bf{v}^r - \sum_{i=1}^{r-1} \left(\bf{k}^{i\top}\bf{k}^r\right) \bf{u}^i \right)
+\mathbf{H}^{r} = \sum_{i=1}^{r} \mathbf{u}^i \mathbf{k}^{i\top}
+\mathbf{u}^r = \beta^r \left(\mathbf{v}^r - \sum_{i=1}^{r-1} \left(\mathbf{k}^{i\top}\mathbf{k}^r\right) \mathbf{u}^i \right)
-... \sum_{i=1}^{r-1}\bf{u}^i \bf{k}^{i\top} ...
-... \beta^r \bf{v}^r \bf{k}^{r\top} ...
+... \sum_{i=1}^{r-1}\mathbf{u}^i \mathbf{k}^{i\top} ...
+... \beta^r \mathbf{v}^r \mathbf{k}^{r\top} ...
```

Also applies to: 20-22, 28-34, 40-41, 47-53
`68-70`: **Prefer \operatorname for operators (diag, tril).**
Improves LaTeX typesetting; math operators won't appear in roman text.

```diff
-\mathbf{W} &= \left(\mathbf{I} + \mathrm{tril}(\mathrm{diag}(\beta) \mathbf{K}\mathbf{K}^\top, -1)\right)^{-1}\mathrm{diag}(\beta) \mathbf{K}\\
-\mathbf{U} &= \left(\mathbf{I} + \mathrm{tril}(\mathrm{diag}(\beta) \mathbf{K}\mathbf{K}^\top, -1)\right)^{-1}\mathrm{diag}(\beta) \mathbf{V}
+\mathbf{W} &= \left(\mathbf{I} + \operatorname{tril}(\operatorname{diag}(\beta)\,\mathbf{K}\mathbf{K}^\top, -1)\right)^{-1}\operatorname{diag}(\beta)\,\mathbf{K}\\
+\mathbf{U} &= \left(\mathbf{I} + \operatorname{tril}(\operatorname{diag}(\beta)\,\mathbf{K}\mathbf{K}^\top, -1)\right)^{-1}\operatorname{diag}(\beta)\,\mathbf{V}
```

```diff
-\mathbf{T} &= \left(\mathbf{I} + \mathrm{tril}\left(\mathrm{diag}(\beta)\mathbf{K} \mathbf{K}^\top,-1\right)\right)^{-1}\mathrm{diag}\left(\beta\right)
+\mathbf{T} &= \left(\mathbf{I} + \operatorname{tril}\!\left(\operatorname{diag}(\beta)\,\mathbf{K} \mathbf{K}^\top,-1\right)\right)^{-1}\operatorname{diag}\!\left(\beta\right)
```

Also applies to: 76-76
`79-80`: **Boldface T for consistency.**
Use $\mathbf{T}$ where introduced as a matrix.

```diff
-\mathbf{P} &= \mathbf{I} - \mathbf{K}^\top T^\top \mathbf{K} \;\;\in \mathbb{R}^{d_k \times d_k}\\
-\mathbf{H} &= \mathbf{V}^\top T^\top \mathbf{K} \;\;\in \mathbb{R}^{d_v \times d_k}
+\mathbf{P} &= \mathbf{I} - \mathbf{K}^\top \mathbf{T}^\top \mathbf{K} \;\;\in \mathbb{R}^{d_k \times d_k}\\
+\mathbf{H} &= \mathbf{V}^\top \mathbf{T}^\top \mathbf{K} \;\;\in \mathbb{R}^{d_v \times d_k}
```
`84-85`: **Also remove backticks around bold symbols in this sentence.**
Inline code formatting breaks math rendering.

```diff
-Substituting these compact forms back into the state update and output equations yields the hardware-efficient chunkwise algorithm. For a given chunk $[t]$ with initial state $`\mathbf{S}^0_{[t]}`$, the final state $`\mathbf{S}_{[t+1]}`$ and output $`\mathbf{O}_{[t]}`$ are:
+Substituting these compact forms back into the state update and output equations yields the hardware-efficient chunkwise algorithm. For a given chunk $[t]$ with initial state $\mathbf{S}^0_{[t]}$, the final state $\mathbf{S}_{[t+1]}$ and output $\mathbf{O}_{[t]}$ are:
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- `fla/ops/delta_rule/README.md` (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: check-pt-python-compatibility
🔇 Additional comments (2)
fla/ops/delta_rule/README.md (2)
`91-92`: **Clarify exactness of the masked form.**
Confirm whether $(\mathbf{Q}\mathbf{K}^\top)\mathbf{T}$ is exactly equal to $(\mathbf{Q}\mathbf{K}^\top \odot \mathbf{M})\mathbf{T}$ for your $\mathbf{T}$, or if masking is an implementation detail. If exact, add a brief lemma/note stating the equality and its conditions (unit/strict lower-triangular $\mathbf{T}$, chunkwise causality).
`56-61`: **Derivation looks correct and consistent with the transposed factorization.**
Left-multiplication order and dimensions check out; the UT reparameterization aligns with $\mathbf{W}=\mathbf{T}\mathbf{K},\ \mathbf{U}=\mathbf{T}\mathbf{V}$ and yields the stated state/output forms.
If you want, I can add a short “Shapes and dataflow” box diagram to this README to aid implementers.
Also applies to: 73-81, 88-92
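Regarding the question above about whether the mask can be dropped, a quick numerical probe (my own experiment, illustrative only) suggests the two expressions generally differ for this $\mathbf{T}$, i.e. the Hadamard mask is not redundant:

```python
# Probe: is (Q K^T) T equal to (Q K^T * M) T for the lower-triangular T used here?
import torch

torch.manual_seed(0)
C, d_k = 5, 4
K = torch.nn.functional.normalize(torch.randn(C, d_k), dim=-1)
Q = torch.randn(C, d_k)
beta = torch.rand(C)

T = torch.linalg.solve(
    torch.eye(C) + torch.tril(torch.diag(beta) @ K @ K.T, diagonal=-1),
    torch.diag(beta),
)
M = torch.tril(torch.ones(C, C))  # inclusive causal mask

unmasked = (Q @ K.T) @ T
masked = ((Q @ K.T) * M) @ T
print(torch.allclose(unmasked, masked))  # typically False on random data
```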
@SeepingFragranceLock Hi, please note the shape of `S`.
oops... I have not checked that. For clarity, I suggest providing the modified recurrence equations in this README and setting the cumulative Householder products accordingly for the transpose.
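For instance, a sketch of what such a transposed formulation could look like (illustrative only; the final README wording is up to the author): writing $\tilde{\mathbf{S}}_t = \mathbf{S}_t^\top \in \mathbb{R}^{d_k \times d_v}$, the same update becomes

```math
\tilde{\mathbf{S}}_t = \left(\mathbf{I} - \beta_t \mathbf{k}_t \mathbf{k}_t^\top\right)\tilde{\mathbf{S}}_{t-1} + \beta_t \mathbf{k}_t \mathbf{v}_t^\top,
\qquad
\tilde{\mathbf{S}}^r = \mathbf{P}^{r\top} \tilde{\mathbf{S}}^0 + \mathbf{H}^{r\top},
```

so the cumulative Householder factors accumulate on the left of the state instead of the right.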
Force-pushed from a4e1a1c to 90a0fea (compare).
Fix(math): Correct derivation for chunkwise DeltaNet parallelism
Corrects the entire mathematical derivation for the chunkwise parallel form. The previous version used an incorrect left-multiplication order for the Householder product series, which has now been fixed. All dependent formulas for the state update and output have been updated accordingly.