
Proposal: Branch Read Optimization #10061

Open

N-o-Z wants to merge 2 commits into master from proposal/branch-read-opt

Conversation

@N-o-Z
Member

@N-o-Z N-o-Z commented Jan 28, 2026

Proposal for branch read optimization

@N-o-Z N-o-Z requested review from a team and ozkatz January 28, 2026 02:06
@N-o-Z N-o-Z self-assigned this Jan 28, 2026
@N-o-Z N-o-Z added proposal exclude-changelog PR description should not be included in next release changelog minor-change Used for PRs that don't require issue attached labels Jan 28, 2026
Comment on lines +197 to +200

```
if !branch.Dirty {
	branch.Dirty = true
	// This triggers a branch record update
}
```
Contributor
Too late to update it now. There's already an entry in staging. If another read request comes, it should be able to see it.

Member Author

You are absolutely right: we must set the dirty bit before any write.

@N-o-Z N-o-Z requested a review from itaiad200 January 29, 2026 02:04
@ozkatz
Collaborator

ozkatz commented Feb 2, 2026

Out of curiosity - do we know how common this case is? (i.e. % of reads that go to "clean" branches?) Might help us understand if the added complexity is worth it.

Perhaps a more granular approach could be beneficial: we can go further than a single boolean and maintain a small bitmap with one bit per range, essentially lakeFS' take on dirty pages.
Instead of reading the bool, read the bitmap; if the requested range(s) are 0, you're good to read just the underlying committed data. If a bit is 1, check staging. On write, mark 1 for any range affected by the write.

Contributor

@arielshaqed arielshaqed left a comment

This is a really great idea! Enhancing the structure of branches to boost performance.

But I'm not sure how this exact proposal can work correctly at the branch level; I think it will be easier at the token level. Requesting changes to understand how we know a branch is clean.

1. Attempts to read from the current staging token.
2. Falls back to committed data if not found.

This happens **even when the branch has no staged changes**.
Contributor
As a workaround, users can read `lakefs://repo/branch@/`. Obviously, in order to do this the user needs to know that this is what they want. (Alternatively, this could be an easy way to show the performance difference!)

Member Author

We already know the performance difference: for example, DDB has hard per-partition limits for reads (3000/s) and writes (1000/s).


### High-level idea

Introduce a **branch-level boolean flag**:
Contributor

An alternative might be to measure dirtiness per token. Doing this could additionally reduce read pressure during commits, when read pressure can anyway be higher.

Member Author

The read decision in Graveler is inherently branch-level: if any token (current or sealed) may contain entries, reads must consult staging. Tracking dirtiness per token doesn’t change that requirement and mostly adds state and maintenance complexity.

Commit-time read pressure is transient; the hot-partition issue we’re addressing is steady-state reads on clean branches. A branch-level dirty flag targets that directly with much lower risk.


```
dirty = true  ⟺ (StagingToken has entries) OR (SealedTokens is non-empty)
dirty = false ⟺ (StagingToken is empty) AND (SealedTokens is empty)
```
Contributor

Intermediate cases exist! For instance, while committing a clean branch (assume appropriate flags!) there is an empty staging token but non-empty sealed tokens. This proposal forces lakeFS to consider the branch dirty even though it could know it is clean.

The correctness requirement for `dirty` is that if there are changes to the branch then `dirty` is set. An additional _performance_ requirement is that if `dirty` is set then usually there are changes to the branch.


For existing branches without the `dirty` field:

- **Default value: `true`** (conservative/safe)
Contributor
The current version of Google protobufs has all fields optional, which means no custom default values. I therefore suggest using a `clean` field instead: the default of `false` is precisely what we want.

Member Author
That's a good point. We can invert the logic accordingly without changing the design.

Operations that guarantee the absence of uncommitted changes set `dirty = false`.

Examples:
- successful commit (after sealed tokens are cleared)
Contributor

I don't understand: writes can occur concurrently with the commit, and there might even be other concurrent commits. So there may be sealed tokens, or the staging token could already be dirty. You could work around the first, but I do not see how to work around the second.
(This may be an argument in favour of dirty-per-token.)

Member Author

Clearing `dirty` is not unconditional. It must be done via a conditional branch update that succeeds only if staging and sealed tokens are empty at that moment.
If a concurrent write or another commit introduces staged data, the condition fails and `dirty` remains true. False positives are acceptable; false negatives are not. This is the same concurrency pattern already used for commit ID and token rotation, and per-token dirtiness doesn't eliminate the need for these conditional checks.

@N-o-Z
Member Author

N-o-Z commented Feb 5, 2026

> Out of curiosity - do we know how common this case is? (i.e. % of reads that go to "clean" branches?) Might help us understand if the added complexity is worth it.

We don't really know how common this case is - this is the reason for suggesting the phased implementation which introduced the metrics to provide visibility.

> Perhaps a more granular approach could be beneficial: we can go further than a single boolean and maintain a small bitmap: one bit per range - essentially lakeFS' take on dirty pages. Instead of reading the bool, read the bitmap; and if the requested range(s) are 0, you're good to read just the underlying committed data. if it's 1, check staging. on write, mark 1 for any range affected by the write.

I am worried the bitmap approach might add a lot of complexity to this solution and create additional dangerous pitfalls:

  • Since ranges aren't stable in Graveler (due to compaction and layout evolution), tying correctness to "range IDs" adds fragile, correctness-critical logic.
  • Extra work on the read path: To consult a bitmap you first need to resolve key/prefix -> range(s), which likely requires additional metadata reads and can cost as much as the staging lookup we’re trying to avoid (especially for List).
  • Higher write and coordination cost: Updating a shared bitmap on writes adds contention and write amplification and may just move the hot spot elsewhere.

The branch-level dirty flag addresses the bottleneck (hot partitions) with far less complexity and risk.

Additionally, this flag gives us a simple way to validate how frequently this scenario actually occurs before considering more granular optimizations.
