@samkim-crypto
Contributor

@samkim-crypto samkim-crypto commented Jun 18, 2025

This PR introduces the solana-base3-encoding crate, created to efficiently encode Alpenglow vote certificates.

Alpenglow certificates contain aggregate signatures, which don't specify individual signers. For simple certificate types like Finalize or Notarize, a bit vector is sufficient to identify the voters.

However, more complex certificates like NotarizeFallback or Skip can contain two different vote types. Using a separate bit vector for each type does not scale: with a validator set of about 4,000, the two bit vectors would take roughly 1 KB (2 × 500 bytes), making certificates harder to fit in a single network packet.

This crate provides a more compact encoding scheme. The design is tailored to the CertificateMessage, taking two BitVec inputs and combining them into a base-3 representation.
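To make the idea concrete, here is a minimal sketch of how two bit vectors could collapse into one sequence of ternary symbols. The symbol assignment (0 = no vote, 1 = base only, 2 = fallback only), the rejection of a validator marked in both vectors, and the names `to_ternary_symbol` and `combine` are assumptions for illustration, not taken from the crate.

```rust
/// Hypothetical mapping (an assumption, not the crate's documented scheme):
/// 0 = bit set in neither vector, 1 = set only in base, 2 = set only in fallback.
/// Returns `None` when both bits are set, since three symbols cannot express that.
fn to_ternary_symbol(base_bit: bool, fallback_bit: bool) -> Option<u8> {
    match (base_bit, fallback_bit) {
        (false, false) => Some(0),
        (true, false) => Some(1),
        (false, true) => Some(2),
        (true, true) => None,
    }
}

/// Combines two equally long bit slices into one sequence of ternary symbols.
/// Returns `None` on a length mismatch or an unrepresentable bit pair.
fn combine(base: &[bool], fallback: &[bool]) -> Option<Vec<u8>> {
    if base.len() != fallback.len() {
        return None;
    }
    base.iter()
        .zip(fallback)
        .map(|(&b, &f)| to_ternary_symbol(b, f))
        .collect()
}
```

Each validator then costs log2(3) ≈ 1.585 bits instead of 2, which is where the size saving comes from.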

This does seem to make the crate quite specific for alpenglow certificates, so maybe we can either have this crate in the alpenglow-vote repo or think of ways to make the interface more generic.

@samkim-crypto samkim-crypto requested a review from wen-coding June 19, 2025 09:32
@samkim-crypto
Contributor Author

@0x0ece It would be great to get your feedback on this!

@samkim-crypto samkim-crypto marked this pull request as ready for review June 19, 2025 09:33
@samkim-crypto
Contributor Author

Had a private discussion with @0x0ece. As per discussion, I updated the encoding/decoding to use little-endian instead of big-endian.

Currently I use u128 to encode a block of symbols. @0x0ece pointed out that u128 might not be the fastest in terms of performance. So I created https://github.com/samkim-crypto/base3-encoding-tests and compared the three implementations using u8, u64, and u128. Indeed it seems encoding using u64 is faster than using u128.

In terms of the length of the encoded bytes for length 4096 bit-vectors, we have

u8 -> 822 bytes
u64 -> 826 bytes
u128 -> 834 bytes

Using u8 gives the most compact encoding, but u64 costs only 4 additional bytes, so an implementation using u64 seems to be the way to go. I am interested to hear people's thoughts.
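The three sizes above can be reproduced with a short calculation. The sketch below assumes a 2-byte length header followed by whole words, each holding as many ternary symbols as fit (5 per u8, 40 per u64, 80 per u128); the function names are hypothetical.

```rust
/// Largest `n` such that 3^n fits in an unsigned word of `word_bytes` bytes.
fn symbols_per_word(word_bytes: u32) -> u32 {
    assert!((1..=16).contains(&word_bytes));
    // Maximum value representable in the word.
    let max = u128::MAX >> (128 - 8 * word_bytes);
    let mut power: u128 = 1;
    let mut count = 0;
    // Keep multiplying by 3 until the next power would not fit.
    while let Some(next) = power.checked_mul(3) {
        if next > max {
            break;
        }
        power = next;
        count += 1;
    }
    count
}

/// Encoded size in bytes: an assumed 2-byte length header plus whole words.
fn encoded_len(num_symbols: usize, word_bytes: u32) -> usize {
    let per_word = symbols_per_word(word_bytes) as usize;
    let words = (num_symbols + per_word - 1) / per_word; // ceiling division
    2 + words * word_bytes as usize
}
```

For 4096 symbols this gives 820 + 2, 103 × 8 + 2, and 52 × 16 + 2 bytes for u8, u64, and u128 respectively. The wider words waste more space in the final, partially filled word.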

@wen-coding
Contributor

Yeah, I think we can spare 4 bytes for better performance.

@0x0ece
Contributor

0x0ece commented Jun 23, 2025

Two minor things (again, sorry...)

  1. We could keep the ternary-bits in ascending order (same order as the original bits) - this also seems slightly faster (in all impl)
    0x0ece/base3-encoding-tests@5caeafc

  2. The reason I was proposing u8 was not just to save 4 bytes, but mostly for consistency of representation. Consider, for example, 5 ternary-bits: (0, 2, 1, 0, 0) -> 6+9 -> 0x0F. If we use the u8 implementation, the same sequence repeated always translates into 0x0F, e.g. (0, 2, 1, 0, 0, 0, 2, 1, 0, 0, 0, 2, 1, 0, 0, 0, 2, 1, 0, 0) -> 0x0F0F0F0F. With the other representations, the numbering is more confusing to read. IMO the perf speedup of u64 is negligible (considering we're doing BLS when we do this), so I'd rather keep it simple with u8. (I'm sure there are ways to use AVX and speed it up, but I'm not sure it's worth it.)
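A small sketch of the u8 variant described in point 2, assuming ascending order as in point 1 (the first ternary-bit has weight 1, the next 3, then 9, 27, 81). Under that assumption the digits (0, 2, 1, 0, 0) evaluate to 2·3 + 1·9 = 15. The names `pack_trits` and `encode_u8` are hypothetical.

```rust
/// Packs up to five ternary digits into one byte, lowest-weight digit first.
/// Returns `None` for more than five digits or a digit outside 0..=2.
fn pack_trits(trits: &[u8]) -> Option<u8> {
    if trits.len() > 5 || trits.iter().any(|&t| t > 2) {
        return None;
    }
    // Horner evaluation from the last digit computes sum(trits[i] * 3^i).
    // The maximum value is 242, so the u8 accumulator cannot overflow.
    Some(trits.iter().rev().fold(0u8, |acc, &t| acc * 3 + t))
}

/// Encodes a digit sequence five digits at a time, one output byte per group.
fn encode_u8(trits: &[u8]) -> Option<Vec<u8>> {
    trits.chunks(5).map(pack_trits).collect()
}
```

Because every group of five digits maps to its own byte, a repeated digit pattern always yields a repeated byte, independent of where the group sits in the input.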

}

const BASE3_SYMBOL_PER_CHUNK: usize = 80;
const ENCODED_BYTES_PER_CHUNK: usize = 16; // std::mem::size_of::<u128>()
Contributor

I think we decided to use u64?

/// 3. Iterates through the input vectors in chunks of 80 bits.
/// 4. For each chunk, it converts the 80 bit-pairs into 80 ternary digits and
/// constructs a `u128` number from them.
/// 5. This `u128` number is appended to the output buffer as 16 big-endian bytes.
Contributor

I think we use little-endian now?

Contributor Author

Good catch. Fixed!

result.extend_from_slice(&(bit_vec_base.len() as u16).to_le_bytes());

// Process the bits of `bit_vec_base` and `bit_vec_fallback` in chunks of 80 bits
for (base_chunk, fallback_chunk) in bit_vec_base.chunks(80).zip(bit_vec_fallback.chunks(80)) {
Contributor

You probably want to use BASE3_SYMBOL_PER_CHUNK instead of 80 here?

Contributor Author

Fixed!

return Err(DecodeError::CorruptDataPayload);
}

let mut bit_vec_base = BitVec::with_capacity(total_bits);
Contributor

Should we allow passing in the maximum expected bit vector length (probably 4096) so we can return CorruptDataPayload early?

Contributor Author

Yeah good point! I added a max_len parameter to the decode function, which we can set to be 4096 for alpenglow certificates.
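The early rejection could look roughly like the sketch below. It assumes the payload starts with the 2-byte little-endian bit length written by the encoder; `read_total_bits` is a hypothetical helper name, while `max_len` and `DecodeError::CorruptDataPayload` come from this thread.

```rust
#[derive(Debug, PartialEq)]
enum DecodeError {
    CorruptDataPayload,
}

/// Reads the 2-byte little-endian length header and rejects values above
/// `max_len` before any buffer is allocated for the decoded bits.
fn read_total_bits(encoded: &[u8], max_len: usize) -> Result<usize, DecodeError> {
    // A payload shorter than the header is corrupt by definition.
    let (lo, hi) = match encoded {
        [lo, hi, ..] => (*lo, *hi),
        _ => return Err(DecodeError::CorruptDataPayload),
    };
    let total_bits = u16::from_le_bytes([lo, hi]) as usize;
    if total_bits > max_len {
        return Err(DecodeError::CorruptDataPayload);
    }
    Ok(total_bits)
}
```

With `max_len` set to 4096, a header claiming 4097 bits fails immediately instead of driving a `with_capacity` call with an attacker-chosen size.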

@samkim-crypto
Contributor Author

> The reason I was proposing u8 was not just to save 4 bytes, but mostly for consistency of representation. Consider, for example, 5 ternary-bits: (0, 2, 1, 0, 0) -> 6+9 -> 0x0F. If we use the u8 implementation, the same sequence repeated always translates into 0x0F, e.g. (0, 2, 1, 0, 0, 0, 2, 1, 0, 0, 0, 2, 1, 0, 0, 0, 2, 1, 0, 0) -> 0x0F0F0F0F. With the other representations, the numbering is more confusing to read. IMO the perf speedup of u64 is negligible (considering we're doing BLS when we do this), so I'd rather keep it simple with u8. (I'm sure there are ways to use AVX and speed it up, but I'm not sure it's worth it.)

Yeah that is actually a nice point. Seems like u8 will give a more readable implementation. If we use u64 or u128, then the symbol will depend on the position within the 40 or 80 symbol chunk. I am happy to go with the simpler u8 route.
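To illustrate the position dependence, here is a sketch of a u64 packer under the same ascending-weight assumption (digit i has weight 3^i, output bytes are little-endian). The name `pack_u64` is hypothetical.

```rust
/// Packs up to 40 ternary digits into one little-endian u64 word.
/// Returns `None` for more than 40 digits or a digit outside 0..=2.
fn pack_u64(trits: &[u8]) -> Option<[u8; 8]> {
    if trits.len() > 40 || trits.iter().any(|&t| t > 2) {
        return None;
    }
    // 3^40 - 1 is below 2^64, so the accumulator cannot overflow.
    let value = trits
        .iter()
        .rev()
        .fold(0u64, |acc, &t| acc * 3 + u64::from(t));
    Some(value.to_le_bytes())
}
```

A single group (0, 2, 1, 0, 0) still encodes to the value 15, but repeating the group eight times to fill the 40-digit word multiplies each copy by a different power of 243, so the output bytes no longer show the repeated pattern.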

@wen-coding, are you okay with using u8? If so, I can make the change and push. I think we can always optimize it further or experiment with u64 again before the full alpenglow launch if it turns out to be too slow.

@wen-coding
Contributor

wen-coding commented Jun 24, 2025 via email

@samkim-crypto
Contributor Author

Okay, I addressed all the comments from the review. I also ended up making the encoding and decoding scheme work with byte arrays and made the bitvec crate optional, to make the crate more general and easier to maintain across future bitvec version updates.

@samkim-crypto samkim-crypto requested a review from joncinque June 24, 2025 21:41
.checked_div(8)
.ok_or(EncodeError::ArithmeticOverflow)?;

if base_bytes.len() < required_bytes || fallback_bytes.len() < required_bytes {
Contributor

Is it okay if base_bytes.len() and fallback_bytes.len() > required_bytes?

Contributor Author

Yeah, the function signature is defined so that num_bits represents the exact number of bits that should be encoded from base_bytes and fallback_bytes. If base_bytes.len() or fallback_bytes.len() is greater than the required bytes, the extra bits are simply ignored. I'll write a comment to clarify that.
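A rough sketch of that contract, assuming bits are read least-significant-bit first within each byte as in the diff. The `InputTooShort` variant and the helper names are hypothetical; `EncodeError::ArithmeticOverflow` appears in the diff.

```rust
#[derive(Debug, PartialEq)]
enum EncodeError {
    ArithmeticOverflow,
    InputTooShort, // hypothetical variant name, used here for illustration only
}

/// Bytes needed to hold `num_bits` bits, using checked arithmetic throughout.
fn required_bytes(num_bits: usize) -> Result<usize, EncodeError> {
    num_bits
        .checked_add(7)
        .and_then(|n| n.checked_div(8))
        .ok_or(EncodeError::ArithmeticOverflow)
}

/// Inputs may be longer than required; only a shortfall is an error.
fn check_inputs(base: &[u8], fallback: &[u8], num_bits: usize) -> Result<(), EncodeError> {
    let required = required_bytes(num_bits)?;
    if base.len() < required || fallback.len() < required {
        return Err(EncodeError::InputTooShort);
    }
    Ok(())
}

/// Reads exactly the first `num_bits` bits, LSB first within each byte.
/// Any bits or bytes beyond `num_bits` are never touched.
fn selected_bits(bytes: &[u8], num_bits: usize) -> Vec<bool> {
    (0..num_bits)
        .map(|i| (bytes[i / 8] >> (i % 8)) & 1 == 1)
        .collect()
}
```

Callers are expected to run `check_inputs` first, since `selected_bits` indexes directly and would panic on a short slice.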

.ok_or(EncodeError::ArithmeticOverflow)?;

let capacity = total_byte_length
.checked_add(2)
Contributor

nit: add a note here saying we use 2 bytes to hold the byte length?

Contributor Author

Done!

let byte_idx = i / 8;
let bit_idx = i % 8;

let base_bit = (base_bytes[byte_idx] >> bit_idx) & 1 == 1;
Contributor

Sorry I have a dumb question, why did we switch to raw bytes instead of using bitvec as input?

Contributor Author

Yeah, I thought bitvec is quite specific, so I made the bitvec crate an optional dependency. In theory, projects that do not use the bitvec crate but still want to use the solana-base3-encoding crate can do so without adding a dependency on bitvec.

It is a little hard to imagine a project needing to use this very specific logic, but in the future, maybe there can be another rust based Solana client that wants to implement alpenglow, but does not want to use the bitvec crate for their bit-vector implementation. They can choose to import solana-base3-encoding without the bitvec feature.

It does make the implementation slightly tedious though I agree... If you think working directly with bitvec is a more preferable approach, then I can change it back.

Contributor

I'm a little bit worried that right now the caller uses bitvec and we need to do conversion, maybe we should just switch back to raw bits in the caller as well when we add base 3 encoding there.

Contributor Author

Yeah, this is a valid concern. According to the docs, from_vec and as_raw_slice, which are what we use for the conversion, are in-place functions that do not require any memory allocations, so this should be very cheap. Let's try things out with bitvec, and if it turns out too slow, we can try raw bytes.

.ok_or(EncodeError::ArithmeticOverflow)?
.min(num_bits);

for i in (start_bit..end_bit).rev() {
Contributor

Another dumb question: is it maybe easier to loop on bit in the outer loop and pack until you hit a full chunk then push the full chunk to the output array?

Contributor Author

Yeah, that is possible. I guess the code would just be a little more verbose, since it requires managing the state for the chunk manually. The iteration through chunks seems more readable and the compiler should optimize the chunk loop pretty well, but I can explore whether the manual implementation is faster.

Contributor

We can check this in for now and optimize if it's a problem.
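For comparison, the alternative raised in this thread (loop over symbols, pack until a chunk is full, then push it) might look like the sketch below. It assumes the u8 layout with five symbols per byte in ascending order; `encode_streaming` is a hypothetical name and this is not the merged implementation.

```rust
const SYMBOLS_PER_BYTE: usize = 5;

/// Packs ternary symbols one at a time, pushing a byte whenever five symbols
/// have accumulated. Returns `None` if any symbol is outside 0..=2.
fn encode_streaming(symbols: &[u8]) -> Option<Vec<u8>> {
    let mut out = Vec::with_capacity((symbols.len() + SYMBOLS_PER_BYTE - 1) / SYMBOLS_PER_BYTE);
    let mut acc: u8 = 0; // value of the partially filled chunk
    let mut weight: u8 = 1; // 3^k for the next symbol in this chunk
    let mut filled = 0; // number of symbols already in the chunk
    for &s in symbols {
        if s > 2 {
            return None;
        }
        acc += s * weight; // at most 2 * 81 added, total stays <= 242
        filled += 1;
        if filled == SYMBOLS_PER_BYTE {
            out.push(acc);
            acc = 0;
            weight = 1;
            filled = 0;
        } else {
            weight *= 3;
        }
    }
    // Flush a trailing partial chunk, if any.
    if filled > 0 {
        out.push(acc);
    }
    Some(out)
}
```

The three pieces of manual state (`acc`, `weight`, `filled`) are the extra bookkeeping mentioned above; the chunk-based loop gets the same grouping from the iterator.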

@samkim-crypto samkim-crypto merged commit 50dfbd0 into anza-xyz:master Jul 1, 2025
24 checks passed
febo pushed a commit to febo/solana-sdk that referenced this pull request Sep 21, 2025
* Add solana base3 encoding crate

* clippy

* use little-endian instead of big-endian

* fix big-endian to little-endian in the docs

* use `BASE3_SYMBOL_PER_CHUNK` for 80

* add a `max_len` parameter

* use u8 to translate the symbols

* keep the ternary-bits in ascending order

* update the tests for `u8` encoding

* make bitvec crate dependency optional

* clippy

* cargo fmt

* cargo sort

* add a clarifying comment about extra bytes

* add a comment noting that we use 2 bytes to hold the bit length
febo added a commit to febo/solana-sdk that referenced this pull request Sep 21, 2025
* Add comments on constants

* Improve offset comments

* Add bitmask to dictionary

* Renamed to field_at_offset
febo added a commit that referenced this pull request Oct 23, 2025