fix stop_before_partial inconsistency #778

Merged
merged 11 commits into from
May 3, 2025

Conversation

lemire
Member

@lemire lemire commented May 1, 2025

@pauldreik identified an issue with our implementation of stop_before_partial. This PR should fix it. Tests added.
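
For readers unfamiliar with the option, the following is a minimal sketch of what stop-before-partial decoding generally means: complete 4-character groups are decoded, and a trailing incomplete group is left unconsumed instead of being treated as an error. This is not simdutf's implementation. The names `PartialDecode`, `sextet`, and `decode_stop_before_partial` are hypothetical, and the sketch assumes the standard alphabet only, skips no whitespace, does not check padding bits, and does not validate the unconsumed tail.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical result type for this sketch (not simdutf's `result`).
struct PartialDecode {
  bool ok;              // false on an invalid character or misplaced padding
  std::size_t consumed; // number of input characters that were decoded
  std::string output;   // decoded bytes
};

// Maps one character of the standard base64 alphabet to its 6-bit value,
// or returns -1 for anything else (including '=').
inline int sextet(char c) {
  if (c >= 'A' && c <= 'Z') return c - 'A';
  if (c >= 'a' && c <= 'z') return c - 'a' + 26;
  if (c >= '0' && c <= '9') return c - '0' + 52;
  if (c == '+') return 62;
  if (c == '/') return 63;
  return -1;
}

// Appends the low 8 bits of v to out.
inline void push_byte(std::string &out, std::uint32_t v) {
  out.push_back(static_cast<char>(static_cast<unsigned char>(v & 0xFFu)));
}

// Decodes every complete 4-character group and stops before a trailing
// group of 1 to 3 characters, which is reported as unconsumed.
inline PartialDecode decode_stop_before_partial(std::string_view in) {
  PartialDecode r{true, 0, {}};
  std::size_t i = 0;
  while (i + 4 <= in.size()) {
    const int a = sextet(in[i]), b = sextet(in[i + 1]);
    const int c = sextet(in[i + 2]), d = sextet(in[i + 3]);
    const bool last = (i + 4 == in.size()); // padding is only legal here
    if (a < 0 || b < 0) { r.ok = false; break; }
    std::uint32_t v = (static_cast<std::uint32_t>(a) << 18) |
                      (static_cast<std::uint32_t>(b) << 12);
    if (c >= 0 && d >= 0) {            // full group: 3 bytes
      v |= (static_cast<std::uint32_t>(c) << 6) | static_cast<std::uint32_t>(d);
      push_byte(r.output, v >> 16);
      push_byte(r.output, v >> 8);
      push_byte(r.output, v);
    } else if (c >= 0 && in[i + 3] == '=' && last) { // "xxx=": 2 bytes
      v |= static_cast<std::uint32_t>(c) << 6;
      push_byte(r.output, v >> 16);
      push_byte(r.output, v >> 8);
    } else if (in[i + 2] == '=' && in[i + 3] == '=' && last) { // "xx==": 1 byte
      push_byte(r.output, v >> 16);
    } else {
      r.ok = false;
      break;
    }
    i += 4;
  }
  r.consumed = i; // anything after this index was not decoded
  return r;
}

// Small self-check of the sketch.
inline void sketch_self_check() {
  const PartialDecode r = decode_stop_before_partial("QUJDRA");
  assert(r.ok && r.consumed == 4 && r.output == "ABC");
}
```

In a streaming setting, a caller would typically keep the characters after `consumed` and prepend them to the next chunk of input. The decoded bytes should then not depend on where the chunk boundaries fall, which is the kind of consistency this PR is concerned with.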

@lemire lemire requested a review from pauldreik May 1, 2025 20:54
@lemire lemire mentioned this pull request May 1, 2025
@lemire lemire marked this pull request as draft May 2, 2025 00:37
@lemire lemire marked this pull request as ready for review May 2, 2025 02:45
@lemire lemire added the priority label May 2, 2025
@pauldreik
Collaborator

I ran the fuzzer and found another case. I temporarily changed the block size to 4096 and managed to get a reproducing test case (committed).
Since we don't run the atomic base64 fuzzer in oss-fuzz (C++20 support being insufficient), I added a CI job. It should be useful at least for now (whether it should be merged can be debated). It does, however, use a block size of 128, so the reproducer test case it emits on error will not reproduce unless the block size is changed manually.

@pauldreik
Collaborator

I extended the base64 fuzzer to compare the safe version with the normal version, but I am not sure this is doing the right thing. It hits the abort() with "output differed" pretty easily:

diff --git a/fuzz/base64.cpp b/fuzz/base64.cpp
index 7fe6ded5..964bbea6 100644
--- a/fuzz/base64.cpp
+++ b/fuzz/base64.cpp
@@ -98,6 +98,21 @@ void decode_safe(std::span<const FromChar> base64_, const auto selected_option,
   } break;
   case simdutf::error_code::SUCCESS: {
     // possibility to compare with the normal function
+    std::vector<char> ref_output(output.size() + 10, '\0');
+    const auto ref_result = simdutf::base64_to_binary(
+        base64.data(), std::min(base64.size(), convertresult.count),
+        ref_output.data(), selected_option, last_chunk_option);
+    if (ref_result.error != simdutf::error_code::SUCCESS) {
+      std::cerr << "result code differed, got " << ref_result.error << '\n';
+      std::abort();
+    }
+    // strip away excess
+    ref_output.resize(ref_result.count);
+    output.resize(outlen);
+    if (output != ref_output) {
+      std::cerr << "output differed\n";
+      std::abort();
+    }
   } break;
   default:;
   }

@lemire
Member Author

lemire commented May 3, 2025

@pauldreik

> I extended the base64 fuzzer to compare the safe version with the normal version, but I am not sure this is doing the right thing. It hits the abort() with "output differed" pretty easily:

I am not sure that this code is correct.

But the other issue you have found was indeed a bug and I have fixed it in a later commit. We should now go green.

@lemire
Member Author

lemire commented May 3, 2025

I have also added additional tests.

@pauldreik
Collaborator

All tests except the newly added atomic fuzz job passed. I ran the fuzzer locally (with block size 4096) and got another case, which I pushed as a test case.

@lemire
Member Author

lemire commented May 3, 2025

I am going to merge this and start a patch release candidate because we have users waiting for this patch.

(If we miss something, we can follow up with another patch.)

@lemire lemire merged commit 3513ad7 into master May 3, 2025
76 of 77 checks passed