Auditing improvements #470
Conversation
Codecov Report

```diff
@@            Coverage Diff             @@
##             main     #470      +/-   ##
==========================================
+ Coverage   88.61%   89.03%   +0.41%
==========================================
  Files          39       38       -1
  Lines        9109     7632    -1477
==========================================
- Hits         8072     6795    -1277
+ Misses       1037      837     -200
```
akd/src/auditor.rs (Outdated):
```rust
let manager1 = StorageManager::new_no_cache(
    AsyncInMemoryDatabase::new_with_remove_child_nodes_on_insertion(),
);
let mut azks1 = Azks::new::<TC, _>(&manager1).await?;
azks1
    .batch_insert_nodes::<TC, _>(
        &manager1,
        proof.unchanged_nodes.clone(),
        InsertMode::Auditor,
        AzksParallelismConfig::default(),
    )
    .await?;
let computed_start_root_hash: Digest = azks1.get_root_hash::<TC, _>(&manager1).await?;
if computed_start_root_hash != start_hash {
    return Err(AkdError::AzksErr(AzksError::VerifyAppendOnlyProof(
        format!(
            "Start hash {} does not match computed root hash {}",
            hex::encode(start_hash),
            hex::encode(computed_start_root_hash)
        ),
    )));
}
```
It seems like we're mostly doing the same thing in the StorageManager instances we're creating here (i.e., manager1 and manager2). That is:
- Creating the per-level cache + an azks
- Inserting some set of nodes into the azks
- Asserting the resultant root hash is equal to an expected root hash

With the above in mind, do you think it might make sense to define a helper-like function which captures the common logic? E.g.:
```rust
async fn verify_append_only_hash<TC: Configuration>(
    nodes: Vec<AzksElement>,
    expected_hash: Digest,
    latest_epoch: Option<u64>,
) -> Result<(), AkdError> {
    let manager = StorageManager::new_no_cache(
        AsyncInMemoryDatabase::new_with_remove_child_nodes_on_insertion(),
    );
    let mut azks = Azks::new::<TC, _>(&manager).await?;
    if let Some(epoch) = latest_epoch {
        azks.latest_epoch = epoch;
    }
    azks.batch_insert_nodes::<TC, _>(
        &manager,
        nodes,
        InsertMode::Auditor,
        AzksParallelismConfig::default(),
    )
    .await?;
    let computed_root_hash: Digest = azks.get_root_hash::<TC, _>(&manager).await?;
    if computed_root_hash != expected_hash {
        return Err(AkdError::AzksErr(AzksError::VerifyAppendOnlyProof(
            format!(
                "Expected hash {} does not match computed root hash {}",
                hex::encode(expected_hash),
                hex::encode(computed_root_hash)
            ),
        )));
    }
    Ok(())
}
```

If we do something like that, then I think this function essentially becomes:
```rust
pub async fn verify_consecutive_append_only<TC: Configuration>(
    proof: &SingleAppendOnlyProof,
    start_hash: Digest,
    end_hash: Digest,
    end_epoch: u64,
) -> Result<(), AkdError> {
    verify_append_only_hash::<TC>(proof.unchanged_nodes.clone(), start_hash, None).await?;

    // `Vec::extend` mutates in place and returns `()`, so build the combined
    // node set first rather than binding the result of `extend`.
    let mut unchanged_with_inserted_nodes = proof.unchanged_nodes.clone();
    unchanged_with_inserted_nodes.extend(proof.inserted.iter().map(|x| {
        let mut y = *x;
        y.value = AzksValue(TC::hash_leaf_with_commitment(x.value, end_epoch).0);
        y
    }));

    verify_append_only_hash::<TC>(unchanged_with_inserted_nodes, end_hash, Some(end_epoch - 1))
        .await
}
```

Note: I didn't run, nor did I format, any of the code above. It's just meant to reflect an idea to reduce some duplication, but please feel free to ignore if you prefer what's here.
```rust
/// technique takes advantage of the way batch insertion of nodes into the tree works,
/// since we always process all of the children of a particular subtree before processing
/// the root of that subtree.
remove_child_nodes_on_insertion: bool,
```
I'm generally not the biggest fan of using a boolean to differentiate behavior; I'd rather use something like a different type to reflect that the in-memory store we're using doesn't store everything. But I think that's a bit more of a rework than what we have now.

As such, I think what we have here is sufficient, since we need to influence the inner workings of batch_set, and using something like a newtype isn't necessarily going to make that easy given that we're not calling something before or after existing functionality. Additionally, you've commented this really well, so it's pretty clear, and the associated function to instantiate it is super clear 👍
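(For what it's worth, a purely illustrative sketch of the newtype idea mentioned above, with hypothetical names and a stand-in for the real database type; not a suggestion to change this PR:)

```rust
// Stand-in for akd's in-memory test database (the real type lives in the crate).
struct AsyncInMemoryDatabase;

impl AsyncInMemoryDatabase {
    fn new_with_remove_child_nodes_on_insertion() -> Self {
        AsyncInMemoryDatabase
    }
}

/// Hypothetical newtype: the wrapper's type, rather than a boolean field,
/// would signal at call sites that this store evicts child nodes on insertion.
struct PruningInMemoryDatabase(AsyncInMemoryDatabase);

impl PruningInMemoryDatabase {
    fn new() -> Self {
        Self(AsyncInMemoryDatabase::new_with_remove_child_nodes_on_insertion())
    }
}
```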
(Depends on #460, which should be merged first)
The goal of this change is to improve the overall efficiency of the auditor function (verify_consecutive_append_only()), which is used to verify the validity of an append-only proof between two epochs. Note that this should be completely backwards-compatible with the previous way of verifying (and generating) audit proofs -- no actual correctness logic has changed.

As an example of the improvements, here is a before-and-after of auditing epoch 714064, a particularly large proof of size 291MB. Running on my laptop, the previous implementation took 2 minutes and 9GB of RAM to audit the proof, while the new one takes 21 seconds and only 760MB.
Old behavior: 2 minutes, 9GB of RAM.
New behavior: 21 seconds, 760MB of RAM.
How it works
The way auditing works is by creating two new Azks instances from scratch, building each one up by inserting nodes (the first one from the proof's unchanged nodes and the second one from the proof's unchanged + inserted nodes), and then checking that the root hash of each Azks tree matches the expected root hash (the start hash or the end hash). Note that during this process, we are building the tree just for the purpose of computing the root hash, and nothing else -- the trees are discarded afterwards.
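To make the flow concrete, here is a hedged usage sketch of the auditor entry point (imports are elided, and `SomeConfig` is a placeholder for a concrete `Configuration` implementation; the proof and hashes are presumed to come from the published audit log):

```rust
// Verifies the transition from epoch `t` to epoch `t + 1`. Internally this
// rebuilds the two throwaway trees described above and compares their root
// hashes against the published start/end hashes.
async fn audit_one_epoch<SomeConfig: Configuration>(
    proof: &SingleAppendOnlyProof,
    hash_t: Digest,        // published root hash at epoch t
    hash_t_plus_1: Digest, // published root hash at epoch t + 1
    t: u64,
) -> Result<(), AkdError> {
    verify_consecutive_append_only::<SomeConfig>(proof, hash_t, hash_t_plus_1, t + 1).await
}
```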
In the previous behavior, we would use an AsyncInMemoryDatabase to host all of the storage for the nodes that we insert into the tree (along with the resulting intermediary tree nodes). This meant that the Azks instance kept every node that had been inserted into the tree in memory at all times, up until the final root hash was computed.

In the new behavior, we still use an AsyncInMemoryDatabase, but we enable a flag which more aggressively removes nodes that no longer need to be kept in memory in order to compute the root hash. In particular, whenever we attempt to add a parent node to storage, we remove its left and right children from storage, if they exist. This takes advantage of the fact that batch insertion works on a level-by-level basis, computing the children's hashes before computing their parent's hash. So, once a parent hash has been computed, we no longer need the intermediary computations (corresponding to the child hashes).
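For illustration, here is a minimal, self-contained sketch of this eviction idea (hypothetical names and a simplified store; not the actual akd implementation):

```rust
use std::collections::HashMap;

// Simplified stand-ins for the real tree-node and storage types.
struct Node {
    label: u64,
    left: Option<u64>,   // label of the left child, if any
    right: Option<u64>,  // label of the right child, if any
    hash: [u8; 32],
}

struct InMemoryStore {
    nodes: HashMap<u64, Node>,
    remove_child_nodes_on_insertion: bool,
}

impl InMemoryStore {
    fn set(&mut self, node: Node) {
        if self.remove_child_nodes_on_insertion {
            // Batch insertion processes children before their parent, so by
            // the time a parent is written, its children's hashes have already
            // been folded into `node.hash`, and the child entries are no
            // longer needed for any future root-hash computation.
            if let Some(left) = node.left {
                self.nodes.remove(&left);
            }
            if let Some(right) = node.right {
                self.nodes.remove(&right);
            }
        }
        self.nodes.insert(node.label, node);
    }
}
```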