
@sz3 (Owner) commented Aug 10, 2025

When we added the split symbol/color decode and the color correction logic that motivated it, we introduced a potential issue: the color correction could get confused by one very specific case, the final "normal" block of the fountain encode.

As a refresher: wirehair encodes the original file verbatim as its first N blocks. Only after exhausting those file bytes does it begin generating its (almost magical) GF256-inspired recovery blocks. Every one of those first N blocks except the last is exactly the expected block size, and so is every block generated after them. But the last of the N, provided the file size is not an exact multiple of the block size (which will be the normal case), will be smaller. We want every block to be exactly the block size, so the encoder skips that one. (Example: block size 22, file size 100. Blocks 0 through 3 hold 88 bytes, leaving 12 bytes, so block_id=4 is smaller than expected. The next block we send is therefore block_id=5.)
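
The arithmetic in that example comes down to one division and one remainder. A minimal sketch, using a hypothetical helper named `undersizedBlockId` (not the project's actual function) that takes the file size and the block size:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper, not the project's code: returns the block_id of the
// undersized tail block among the initial (verbatim) blocks, or std::nullopt
// when file_size is an exact multiple of block_size.
std::optional<uint32_t> undersizedBlockId(uint64_t file_size, uint32_t block_size)
{
    assert(block_size > 0);  // avoid division by zero
    if (file_size % block_size == 0)
        return std::nullopt;  // every verbatim block is exactly block_size
    // Blocks 0 .. (file_size / block_size) - 1 are full-sized; the one after
    // them holds the leftover file_size % block_size bytes.
    return static_cast<uint32_t>(file_size / block_size);
}

// With the numbers from the example:
//   undersizedBlockId(100, 22) -> 4   (blocks 0-3 hold 88 bytes, 12 are left)
//   undersizedBlockId(88, 22)  -> std::nullopt
```

Returning `std::nullopt` for the evenly divisible case is this sketch's choice; the project's `computeRadioactiveBlockId` uses the sentinel value `0xFFFFFFFF` instead.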

The color correction logic exploits the presence of a deterministic 6-byte header, whose trailing 2 bytes are the block_id. We can do one of the following:

  • use the first 4 bytes of the header, ignoring the block_id because it changes. (This is what I did in my initial Python implementation.)
  • use all 6 bytes, risking disaster(?), namely a failed color decode, in the rare case where the color correction algorithm is confused by expecting the wrong block_id. (I expected it could fail, but had never seen it happen...)
  • or... use all 6 bytes, and handle the special case where we skip one of the block_ids.

We're now doing option (3).
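
As a rough illustration of option (3), here is a sketch that predicts the full 6-byte header of the next block so that all 6 bytes can be compared. It assumes the layout described above (4 fixed bytes, then a 2-byte block_id) plus a big-endian block_id, which is a guess. `Header` and `expectedNextHeader` are hypothetical names, and the sketch asserts instead of guessing at 16-bit wraparound behavior.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical 6-byte header: bytes 0-3 are the same for every block of a
// file, bytes 4-5 hold the block_id (big-endian here, which is an assumption).
using Header = std::array<uint8_t, 6>;

// Option (3): predict the whole header of the next block, stepping past the
// block_id the encoder never sends. Pass a value no 16-bit id can reach
// (e.g. 0xFFFFFFFF) as `skipped` when there is no undersized block.
Header expectedNextHeader(const Header& previous, uint32_t skipped)
{
    uint32_t prevId = (uint32_t(previous[4]) << 8) | previous[5];
    uint32_t next = prevId + 1;
    if (next == skipped)
        next += 1;           // the undersized block was never sent
    assert(next <= 0xFFFF);  // this sketch does not model 16-bit wraparound

    Header h = previous;          // the 4 fixed bytes carry over unchanged
    h[4] = uint8_t(next >> 8);    // block_id high byte
    h[5] = uint8_t(next & 0xFF);  // block_id low byte
    return h;
}
```

For the block size 22 / file size 100 case (skipped block_id 4), a header ending in `00 03` is followed by one ending in `00 05`.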

Follow-up to #134, #135, and #91.

I'm not sure how frequent this problem was in the wild, but it would've manifested only for very small files.

```cpp
unsigned computeRadioactiveBlockId(const FountainMetadata& md, unsigned chunk_size)
{
    if (md.file_size() % chunk_size == 0)
        return 0xFFFFFFFF; // there isn't one
    // ...
```
@sz3 (Owner, Author):

If there isn't a mis-sized block, this serves as a dummy value we'll never hit: block_id is a 16-bit field, so no block_id we compare against can ever equal 0xFFFFFFFF.

```diff
-update_block_id_internal(block_id()+1, _data.data()+4);
+unsigned next = block_id()+1;
+if (next == radioactive_block_id)
+    next += 1;
```
@sz3 (Owner, Author):

The gist is: there might be a block_id we never use (because its block is smaller than the block size). If there is such a block_id, we pass it in and increment past it.
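
Putting the two excerpts together, here is a sketch of the block_id sequence an encoder would emit. `blockIdSequence` and `kNoSkippedBlock` are hypothetical names; the step-and-check inside the loop mirrors the `next == radioactive_block_id` check above, while starting at 1 when block 0 is itself the undersized one is this sketch's own assumption. The `static_assert` spells out why `0xFFFFFFFF` works as a "nothing to skip" value: even the largest 16-bit block_id plus one stays far below it.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// "Nothing to skip" sentinel, mirroring the 0xFFFFFFFF returned above.
constexpr uint32_t kNoSkippedBlock = 0xFFFFFFFF;

// block_id is a 16-bit field, so even the largest id plus one can't collide.
static_assert(uint32_t(0xFFFF) + 1 != kNoSkippedBlock, "sentinel must be unreachable");

// Hypothetical helper: the first `count` block_ids an encoder would send,
// never emitting `skipped` (the undersized block's id).
std::vector<uint32_t> blockIdSequence(uint32_t skipped, unsigned count)
{
    std::vector<uint32_t> ids;
    uint32_t id = (skipped == 0) ? 1 : 0;  // assumption: block 0 itself may be short
    for (unsigned i = 0; i < count; ++i) {
        assert(id <= 0xFFFF);  // no 16-bit wraparound modeled here
        ids.push_back(id);
        id += 1;
        if (id == skipped)
            id += 1;  // increment past the block_id we never use
    }
    return ids;
}

// blockIdSequence(4, 6)               -> 0, 1, 2, 3, 5, 6
// blockIdSequence(kNoSkippedBlock, 4) -> 0, 1, 2, 3
```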

@sz3 merged commit befb3f2 into master on Aug 10, 2025
11 checks passed
@sz3 deleted the fix-color-correction-magic branch on August 10, 2025 at 00:24
@sz3 mentioned this pull request on Aug 14, 2025