aarch64 base64decode: replace TBL+RHADD with CLZ for delta_hash #783
For regular and URL decode, I observed that when computing `res = lo & hi`, 0 currently maps to error and 1 to space, which leaves 7 bits unused. So the tables are now updated to map:

- lowercase letters to 0x80 and 0x40
- uppercase letters to 0x20 and 0x10
- digits to 0x8
- `/` (`-` in URL) to 0x4
- `+` (`_` in URL) to 0x2

With this scheme, and as long as `lo & hi` has exactly 1 bit set, we no longer need `vrhaddq_u8(vqtbl1q_u8(delta_asso, lo_nibbles0), hi_bits0);` to map the value to an index. We can simply use `CLZ` to map these `res` bits to the contiguous range 0 to 7, which should save us 1 instruction. The offsets to add are then placed in the table according to the CLZ result, for both regular and URL decoding.

For Hybrid decoding, unfortunately, I couldn't get this scheme to work. It needs 4 bits for `+`, `_`, `-` and `/`, since each has a different offset, plus 2 bits for digits and spaces. That leaves only 2 bits for letters, which collide in the current scheme and need at least 3 bits to classify.
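A scalar sketch of the CLZ mapping for the regular alphabet may help reviewers follow the idea. It is not the NEON code from this PR: `classify` is a hypothetical stand-in for the `lo & hi` table lookups, the choice of which half of each letter range gets which bit is an assumption, and `clz8`, `delta_by_clz` and `decode_char` are illustrative names. The return codes -1 (error) and -2 (space) are also assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Leading-zero count of one byte, modelling a per-lane CLZ.
 * Returns 8 for x == 0. */
static uint8_t clz8(uint8_t x) {
    uint8_t n = 0;
    for (uint8_t mask = 0x80; mask != 0 && (x & mask) == 0; mask >>= 1) {
        n++;
    }
    return n;
}

/* Hypothetical stand-in for res = lo & hi: one class bit per valid
 * character, 0 for invalid input. Splitting each letter range at the
 * high-nibble boundary ('p' = 0x70, 'P' = 0x50) is an assumption. */
static uint8_t classify(uint8_t c) {
    if (c >= 'a' && c <= 'z') return (c < 'p') ? 0x80 : 0x40;
    if (c >= 'A' && c <= 'Z') return (c < 'P') ? 0x20 : 0x10;
    if (c >= '0' && c <= '9') return 0x08;
    if (c == '/') return 0x04;
    if (c == '+') return 0x02;
    if (c == ' ') return 0x01;
    return 0x00;
}

/* Offsets indexed by clz8(res): 0,1 lowercase; 2,3 uppercase;
 * 4 digits; 5 '/'; 6 '+'; 7 space (unused). */
static const int8_t delta_by_clz[8] = { -71, -71, -65, -65, 4, 16, 19, 0 };

/* Returns the 6-bit value, -1 for an invalid byte, -2 for a space. */
static int decode_char(uint8_t c) {
    uint8_t res = classify(c);
    if (res == 0) return -1;      /* no class bit set: error */
    uint8_t idx = clz8(res);      /* single set bit -> index 0..7 */
    assert(idx < 8);
    if (idx == 7) return -2;      /* space */
    return c + delta_by_clz[idx];
}
```

For example, `decode_char('a')` classifies to 0x80, CLZ gives index 0, and adding -71 to 97 yields 26. In the vectorised version the same table would be indexed by the CLZ result of each lane instead of by the `TBL` + `RHADD` hash.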