aarch64 base64decode: replace TBL+RHADD with CLZ for delta_hash #783

tantei3 · 2025-05-04T07:19:53Z

For regular and URL decoding, I noticed that when computing res = lo & hi we currently map 0 to error and 1 to space, which leaves 7 bits unused.
The tables are now updated to map:
lowercase letters to 0x80 and 0x40
uppercase letters to 0x20 and 0x10
numbers to 0x8
/ (- in URL) to 0x4
+ (_ in URL) to 0x2

With this scheme, and by ensuring that lo & hi has only one bit set, we no longer need vrhaddq_u8(vqtbl1q_u8(delta_asso, lo_nibbles0), hi_bits0) to map the result to indices. Instead, CLZ maps these res bits to the contiguous range 0 to 7, which should save one instruction. The offsets to add can then be arranged to match the CLZ output for regular and URL decoding.
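To illustrate the idea, here is a minimal scalar sketch of the regular (non-URL) case in plain C. It is not the code from this PR: clz8 stands in for one lane of vclzq_u8, and the names classes, offset_by_clz, lo_lut, hi_lut, build_luts, check_single_bit and decode_char are hypothetical. Which half of each letter range gets which bit, and treating only ' ' as the space class, are also assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar stand-in for one lane of vclzq_u8: leading zeros in 8 bits, 8 for zero. */
static int clz8(uint8_t x) {
    int n = 0;
    for (uint8_t mask = 0x80; mask != 0 && (x & mask) == 0; mask >>= 1)
        n++;
    return n;
}

/* One class per bit. The split of letters by high nibble is an assumption. */
struct char_class { uint8_t bit, first, last; };
static const struct char_class classes[8] = {
    {0x80, 'a', 'o'}, {0x40, 'p', 'z'},   /* lowercase, high nibble 6 and 7 */
    {0x20, 'A', 'O'}, {0x10, 'P', 'Z'},   /* uppercase, high nibble 4 and 5 */
    {0x08, '0', '9'},                     /* numbers */
    {0x04, '/', '/'}, {0x02, '+', '+'},   /* the two symbols */
    {0x01, ' ', ' '},                     /* space (only ' ' in this sketch) */
};

/* Offset added to the input byte, indexed by the CLZ result 0..6. */
static const int8_t offset_by_clz[7] = { -71, -71, -65, -65, 4, 16, 19 };

static uint8_t lo_lut[16], hi_lut[16];

/* Fill both nibble tables from the class list. */
static void build_luts(void) {
    for (int k = 0; k < 8; k++)
        for (int c = classes[k].first; c <= classes[k].last; c++) {
            lo_lut[c & 0x0F] |= classes[k].bit;
            hi_lut[c >> 4] |= classes[k].bit;
        }
}

/* The scheme relies on lo & hi never having more than one bit set. */
static void check_single_bit(void) {
    for (int c = 0; c < 256; c++) {
        uint8_t res = lo_lut[c & 0x0F] & hi_lut[c >> 4];
        assert((res & (res - 1)) == 0);
    }
}

/* Returns the 6-bit value, -1 for a space, -2 for an invalid byte. */
static int decode_char(uint8_t c) {
    uint8_t res = lo_lut[c & 0x0F] & hi_lut[c >> 4];
    int idx = clz8(res);
    if (idx == 8) return -2;   /* res == 0: error */
    if (idx == 7) return -1;   /* res == 1: space */
    return c + offset_by_clz[idx];
}
```

After calling build_luts(), decode_char('A') yields 0 and decode_char('/') yields 63. Note that a zero res gives a CLZ of 8, so the error case falls just outside the 0 to 7 range used for valid classes.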

For hybrid decoding, unfortunately, I couldn't get this scheme to work: it needs 4 bits for +, _, - and / (each with a different offset) and 2 bits for numbers and spaces, which leaves only 2 bits for letters. Under the current scheme the letters then collide, and they need at least 3 bits to classify.

Signed-off-by: Shreesh Adiga <[email protected]>

tantei3 added 2 commits May 4, 2025 12:36

aarch64 base64decode: replace TBL+RHADD with CLZ for delta_hash

0e1336b

Signed-off-by: Shreesh Adiga <[email protected]>

update the generation script for LUT

e01e50d

tantei3 marked this pull request as ready for review May 4, 2025 07:20
