[base3-encoding] Add solana base3 encoding crate #200
@0x0ece It would be great to get your feedback on this!
Had a private discussion with @0x0ece. As discussed, I updated the encoding/decoding to use little-endian instead of big-endian. Currently I use … In terms of the length of the encoded bytes for 4096-bit vectors, we have … Using u8 gives the more compact encoding, but with just 4 additional bytes, an implementation using …
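The 4-byte figure can be checked with a little arithmetic. The sketch below uses hypothetical helpers, not code from the crate. It finds how many base-3 symbols fit in a chunk of a given width (the largest k with 3^k - 1 no greater than the chunk's maximum value) and then the payload size for a 4096-bit input. It assumes every chunk, including a trailing partial one, is written at full width, and it leaves out the 2-byte length header.

```rust
/// Largest number of base-3 symbols whose maximum value (3^k - 1) fits in an
/// unsigned integer of `chunk_bits` bits (8, 64 or 128 here).
fn symbols_per_chunk(chunk_bits: u32) -> u32 {
    let max_value: u128 = if chunk_bits >= 128 {
        u128::MAX
    } else {
        (1u128 << chunk_bits) - 1
    };
    let mut k = 0;
    // `checked_pow` returns None once 3^(k + 1) no longer fits in a u128.
    while let Some(p) = 3u128.checked_pow(k + 1) {
        if p - 1 > max_value {
            break;
        }
        k += 1;
    }
    k
}

/// Payload bytes for `num_symbols` symbols when every chunk is emitted at its
/// full width (assumption). The 2-byte length header is not included.
fn payload_len(num_symbols: usize, chunk_bits: u32) -> usize {
    let per_chunk = symbols_per_chunk(chunk_bits) as usize;
    num_symbols.div_ceil(per_chunk) * (chunk_bits as usize / 8)
}
```

Under these assumptions, 4096 symbols take 820 bytes with u8 chunks (5 symbols per byte), 824 bytes with u64 chunks (40 symbols), and 832 bytes with u128 chunks (80 symbols). The u8 vs u64 gap of 4 bytes is consistent with the figure quoted in the comment.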
Yeah, I think we can spare 4 bytes for better performance.
Two minor things (again, sorry...)
base3-encoding/src/lib.rs
}

const BASE3_SYMBOL_PER_CHUNK: usize = 80;
const ENCODED_BYTES_PER_CHUNK: usize = 16; // std::mem::size_of::<u128>()
I think we decided to use u64?
base3-encoding/src/lib.rs
/// 3. Iterates through the input vectors in chunks of 80 bits.
/// 4. For each chunk, it converts the 80 bit-pairs into 80 ternary digits and
///    constructs a `u128` number from them.
/// 5. This `u128` number is appended to the output buffer as 16 big-endian bytes.
I think we use little-endian now?
Good catch. Fixed!
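For reference, steps 4 and 5 of the doc comment above, with the little-endian fix applied, might look like the sketch below. It is not the crate's code. It assumes the digits arrive least significant first (digit i has weight 3^i), packs up to 80 of them into a u128 with Horner's rule, and emits the value as 16 little-endian bytes.

```rust
const BASE3_SYMBOL_PER_CHUNK: usize = 80;
const ENCODED_BYTES_PER_CHUNK: usize = 16; // std::mem::size_of::<u128>()

/// Packs up to 80 ternary digits (least significant first, an assumption)
/// into one u128 and returns it as 16 little-endian bytes.
fn pack_chunk_u128_le(digits: &[u8]) -> [u8; ENCODED_BYTES_PER_CHUNK] {
    assert!(digits.len() <= BASE3_SYMBOL_PER_CHUNK, "chunk too long");
    let mut value: u128 = 0;
    // Horner's rule from the most significant digit down. Because
    // 3^80 - 1 < 2^128, no intermediate value can overflow.
    for &d in digits.iter().rev() {
        assert!(d < 3, "not a ternary digit");
        value = value * 3 + u128::from(d);
    }
    value.to_le_bytes()
}
```

Because a digit's weight depends on its position inside the 80-digit chunk, the same short pattern yields different bytes at different offsets. This is the readability concern that later tipped the discussion toward u8.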
base3-encoding/src/lib.rs
result.extend_from_slice(&(bit_vec_base.len() as u16).to_le_bytes());

// Process the bits of `bit_vec_base` and `bit_vec_fallback` in chunks of 80 bits
for (base_chunk, fallback_chunk) in bit_vec_base.chunks(80).zip(bit_vec_fallback.chunks(80)) {
You probably want to use BASE3_SYMBOL_PER_CHUNK instead of 80 here?
Fixed!
base3-encoding/src/lib.rs
return Err(DecodeError::CorruptDataPayload);
}

let mut bit_vec_base = BitVec::with_capacity(total_bits);
Should we allow passing in the maximum expected bit vector length (probably 4096) so we can return CorruptDataPayload early?
Yeah, good point! I added a max_len parameter to the decode function, which we can set to 4096 for Alpenglow certificates.
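Here is a minimal sketch of that early check. It assumes the u8 layout settled on later in this thread: a 2-byte little-endian bit count followed by one byte per 5 symbols. The function name is hypothetical, and the local DecodeError only mirrors the CorruptDataPayload variant visible in the diff. The real decode function may be structured differently.

```rust
#[derive(Debug, PartialEq)]
enum DecodeError {
    CorruptDataPayload,
}

const SYMBOLS_PER_BYTE: usize = 5; // 3^5 = 243 values fit in one byte
const HEADER_LEN: usize = 2; // u16 bit count, little-endian (assumed layout)

/// Hypothetical helper: reads the bit count from the header and rejects the
/// input before any allocation if it exceeds `max_len` or if the payload
/// size does not match the count.
fn check_header(encoded: &[u8], max_len: usize) -> Result<usize, DecodeError> {
    if encoded.len() < HEADER_LEN {
        return Err(DecodeError::CorruptDataPayload);
    }
    let total_bits = usize::from(u16::from_le_bytes([encoded[0], encoded[1]]));
    if total_bits > max_len {
        return Err(DecodeError::CorruptDataPayload);
    }
    let expected_payload = total_bits.div_ceil(SYMBOLS_PER_BYTE);
    if encoded.len() - HEADER_LEN != expected_payload {
        return Err(DecodeError::CorruptDataPayload);
    }
    Ok(total_bits)
}
```

Rejecting on the header alone means a malicious length never reaches BitVec::with_capacity.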
Yeah, that is actually a nice point. It seems like u8 will give a more readable implementation. If we use u64 or u128, a symbol's contribution to the output will depend on its position within the 40- or 80-symbol chunk. I am happy to go with the simpler u8 route. @wen-coding, are you okay with using u8? If so, I can make the change and push. We can always optimize further or experiment with u64 again before the full Alpenglow launch if it turns out to be too slow.
Sure, I think u8 also works
On Tue, Jun 24, 2025 at 08:50, samkim-crypto left a comment on anza-xyz/solana-sdk#200, quoting the earlier argument for u8:
> The reason I was proposing u8 was not just to save 4 bytes, but mostly for consistency of representation. Think, for example, of 5 ternary digits: (0, 1, 1, 0, 0) -> 3 + 9 -> 0x0C. If we use the u8 implementation, the same sequence repeated would always translate into 0x0C, e.g. (0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0) -> 0x0C0C0C0C. If we use the other representations, the numbering will be more confusing to read. IMO the perf speedup of u64 is negligible (considering we're doing BLS when we do this), so I'd rather keep it simple with u8. (I'm sure there are ways to use AVX to speed it up, but I'm not sure it's worth it.)
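To make the per-byte consistency argument concrete, here is a minimal sketch of u8 packing. The function name is hypothetical. It assumes symbols are grouped five per byte with the first symbol least significant (weights 1, 3, 9, 27, 81), and that ordering is itself an assumption. Under this order, (0, 1, 1, 0, 0) is 3 + 9 = 12 = 0x0C, and every repetition of that group produces another 0x0C byte. With a u64 or u128 accumulator, the same group would instead be scaled by 3^5, 3^10, and so on, depending on where it falls in the chunk.

```rust
const SYMBOLS_PER_BYTE: usize = 5; // 3^5 = 243 values fit in one byte

/// Packs base-3 symbols (each 0, 1 or 2) five per byte, first symbol least
/// significant (assumed order). A trailing partial group uses one final byte.
fn pack_symbols_u8(symbols: &[u8]) -> Vec<u8> {
    symbols
        .chunks(SYMBOLS_PER_BYTE)
        .map(|group| {
            // Horner's rule from the last (most significant) symbol down.
            // The maximum, 2 * (1 + 3 + 9 + 27 + 81) = 242, fits in a u8.
            group.iter().rev().fold(0u8, |acc, &s| {
                debug_assert!(s < 3, "not a base-3 symbol");
                acc * 3 + s
            })
        })
        .collect()
}
```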
Force-pushed from 87d0712 to 4831277.
Okay, I addressed all the comments from the review. I also ended up making the encoding and decoding scheme work with byte arrays and made the …
.checked_div(8)
.ok_or(EncodeError::ArithmeticOverflow)?;

if base_bytes.len() < required_bytes || fallback_bytes.len() < required_bytes {
Is it okay if base_bytes.len() and fallback_bytes.len() > required_bytes?
Yeah, the function signature is defined so that num_bits represents the exact number of bits to encode from base_bytes and fallback_bytes. If base_bytes.len() or fallback_bytes.len() is greater than the required number of bytes, the extra bits are simply ignored. I'll add a comment to clarify that.
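A small sketch of that contract follows. It assumes bits are read least-significant-bit first within each byte, matching the bit-extraction line reviewed further down. The helper name is hypothetical. It rounds num_bits up to whole bytes with an overflow check, rejects inputs that are too short, and ignores any bytes or bits past num_bits.

```rust
/// Hypothetical helper: returns the first `num_bits` bits of `bytes`
/// (LSB-first within each byte), or None if `bytes` is too short.
/// Trailing bytes and bits beyond `num_bits` are ignored.
fn read_bits(bytes: &[u8], num_bits: usize) -> Option<Vec<bool>> {
    // ceil(num_bits / 8); the `+ 7` is the step that can overflow.
    let required_bytes = num_bits.checked_add(7)? / 8;
    if bytes.len() < required_bytes {
        return None;
    }
    let bits = (0..num_bits)
        .map(|i| {
            let byte_idx = i / 8;
            let bit_idx = i % 8;
            (bytes[byte_idx] >> bit_idx) & 1 == 1
        })
        .collect();
    Some(bits)
}
```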
base3-encoding/src/lib.rs
.ok_or(EncodeError::ArithmeticOverflow)?;

let capacity = total_byte_length
.checked_add(2)
nit: add a note here saying that we use 2 bytes to hold the bit length?
Done!
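The note could be backed by a sketch like the one below, which reserves room for a 2-byte little-endian bit count ahead of the payload. It is an illustration under assumptions, not the crate's code, and the helper name is hypothetical. Unlike an `as u16` cast, it refuses bit counts above u16::MAX instead of silently truncating them.

```rust
const HEADER_LEN: usize = 2; // 2 bytes hold the bit length as a little-endian u16

/// Hypothetical helper: prepends the bit count to an already-encoded payload.
/// Returns None if the count does not fit in a u16 or the capacity overflows.
fn with_length_header(num_bits: usize, payload: &[u8]) -> Option<Vec<u8>> {
    if num_bits > usize::from(u16::MAX) {
        return None;
    }
    let capacity = payload.len().checked_add(HEADER_LEN)?;
    let mut out = Vec::with_capacity(capacity);
    out.extend_from_slice(&(num_bits as u16).to_le_bytes());
    out.extend_from_slice(payload);
    Some(out)
}
```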
let byte_idx = i / 8;
let bit_idx = i % 8;

let base_bit = (base_bytes[byte_idx] >> bit_idx) & 1 == 1;
Sorry, I have a dumb question: why did we switch to raw bytes instead of using bitvec as input?
Yeah, I thought bitvec is quite specific, so I made the bitvec crate an optional dependency. In theory, projects that do not use the bitvec crate but still want to use the solana-base3-encoding crate can do so without adding a dependency on bitvec.
It is a little hard to imagine a project needing this very specific logic, but in the future there may be another Rust-based Solana client that wants to implement Alpenglow but does not want to use the bitvec crate for its bit-vector implementation. Such a client can import solana-base3-encoding without the bitvec feature.
It does make the implementation slightly tedious, though, I agree. If you think working directly with bitvec is preferable, I can change it back.
I'm a little worried that the caller currently uses bitvec, so we need to do a conversion. Maybe we should switch the caller back to raw bytes as well when we add base-3 encoding there.
Yeah, this is a valid concern. According to the docs, from_vec and as_raw_slice, which are what we use for the conversion here and here, are in-place functions that do not require any memory allocation, so this should be very cheap. Let's try things out with bitvec, and if it turns out to be too slow, we can try raw bytes.
.ok_or(EncodeError::ArithmeticOverflow)?
.min(num_bits);

for i in (start_bit..end_bit).rev() {
Another dumb question: would it be easier to loop over bits in the outer loop, packing until you hit a full chunk, and then push the full chunk to the output array?
Yeah, that is possible. The code would just be a little more verbose, since it has to manage the chunk state manually. Iterating through chunks seems more readable, and the compiler should optimize the chunk loop pretty well, but I can explore further if the manual implementation turns out to be faster.
We can check this in for now and optimize if it's a problem.
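For comparison, the single-pass alternative suggested above could look roughly like this hypothetical sketch. It assumes five symbols per byte with the first symbol least significant. The loop walks the symbols once and keeps the partially filled byte, the weight of the next symbol, and the fill count as explicit state. That state is the extra bookkeeping mentioned in the reply.

```rust
/// Hypothetical single-pass packer over base-3 symbols (0, 1 or 2):
/// accumulate into the current byte and push it once five symbols are in.
fn pack_streaming(symbols: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(symbols.len().div_ceil(5));
    let mut acc: u8 = 0; // partially filled output byte
    let mut weight: u8 = 1; // 3^position of the next symbol within this byte
    let mut filled = 0; // symbols packed into `acc` so far
    for &s in symbols {
        debug_assert!(s < 3, "not a base-3 symbol");
        acc += s * weight; // never exceeds 2 * (1 + 3 + 9 + 27 + 81) = 242
        filled += 1;
        if filled == 5 {
            out.push(acc);
            acc = 0;
            weight = 1;
            filled = 0;
        } else {
            weight *= 3;
        }
    }
    if filled > 0 {
        out.push(acc); // trailing partial byte
    }
    out
}
```

With the same symbol order, this should produce the same bytes as a chunks(5)-based loop. Whether it is actually faster would need measuring, which fits the decision to check in the chunked version first.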
* Add solana base3 encoding crate
* clippy
* use little-endian instead of big-endian
* fix big-endian to little-endian in the docs
* use `BASE3_SYMBOL_PER_CHUNK` for 80
* add a `max_len` parameter
* use u8 to translate the symbols
* keep the ternary-bits in ascending order
* update the tests for `u8` encoding
* make bitvec crate dependency optional
* clippy
* cargo fmt
* cargo sort
* add a clarifying comment about extra bytes
* add a comment noting that we use 2 bytes to hold the bit length
* Add comments on constants * Improve offset comments * Add bitmask to dictionary * Renamed to field_at_offset
This PR introduces the solana-base3-encoding crate, created to efficiently encode Alpenglow vote certificates.
Alpenglow certificates contain aggregate signatures, which don't specify individual signers. For simple certificate types like Finalize or Notarize, a bit vector is sufficient to identify the voters.
However, more complex certificates like NotarizeFallback or Skip can contain two different vote types. Using a separate bit vector for each type does not scale: as the validator set grows to around 4,000, the two bit vectors together approach 1 KB (2 × 4,000 bits = 1,000 bytes), making certificates harder to fit in a single network packet.
This crate provides a more compact encoding scheme. The design is tailored to the CertificateMessage, taking two BitVec inputs and combining them into a base-3 representation.
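Here is a rough sketch of the idea, not the crate's actual mapping. If each validator appears in at most one of the two vectors, each (base, fallback) bit pair has only three possible states and fits in one base-3 symbol, about log2(3) ≈ 1.58 bits instead of 2. At the 4096-entry maximum discussed in the review, two plain bit vectors take 2 × 512 = 1,024 bytes, while packing five symbols per byte needs 820. The symbol values below and the rejection of a pair with both bits set are assumptions for illustration.

```rust
/// Maps one validator's (base, fallback) bits to a base-3 symbol.
/// Assumed mapping: 0 = neither, 1 = base only, 2 = fallback only.
fn pair_to_symbol(base: bool, fallback: bool) -> Option<u8> {
    match (base, fallback) {
        (false, false) => Some(0),
        (true, false) => Some(1),
        (false, true) => Some(2),
        // Assumption: a validator is never in both vectors, so this is invalid.
        (true, true) => None,
    }
}

/// Combines two equal-length bit vectors into one symbol per position.
fn combine(base: &[bool], fallback: &[bool]) -> Option<Vec<u8>> {
    if base.len() != fallback.len() {
        return None;
    }
    base.iter()
        .zip(fallback)
        .map(|(&b, &f)| pair_to_symbol(b, f))
        .collect()
}
```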
This does seem to make the crate quite specific to Alpenglow certificates, so maybe we should either keep this crate in the alpenglow-vote repo or think of ways to make the interface more generic.