Hi Rapfi developers,
First of all, congratulations on the amazing project, it’s really impressive work!
While trying to run pbrain-rapfi BENCH in debug mode, I ran into two issues that I believe could be improved:
- Thread creation inside the SearchThread constructor
The constructor of SearchThread currently starts the thread. This causes problems in debug mode (e.g. in gdb) because the thread starts running while the object is still being constructed. I fixed this locally by moving the thread start out of the constructor into a separate init() method. I’ve also opened a PR for this fix.
- Crash due to misaligned AVX2 load
This line crashes in debug mode, even on the first iteration (i == 0):
auto outputW = F32LS::load(bucket.policy_output_weight + i * PWConvB::RegWidth);
On AVX2 machines, this calls:
return simde_mm256_load_ps(addr);
which requires the memory to be 32-byte aligned. However, the address of bucket.policy_output_weight is 16 mod 32 (so it is only 16-byte aligned), which you can verify with:
std::uintptr_t ptr = reinterpret_cast<std::uintptr_t>(bucket.policy_output_weight);
std::cout << "policy_output_weight alignment: " << (ptr % 32) << " bytes\n";
This happens because the previous member of struct HeadBucket, value_l3 (an FCWeight<4, 64>), has a size of 272 bytes, which is 16 mod 32, so the field that follows it is not 32-byte aligned. All earlier members of HeadBucket have sizes that are 0 mod 32, so the misalignment starts here. In release mode, the code probably falls back to simde_mm256_loadu_ps, so there’s no crash, but this could still cause performance degradation. I am not sure how easy this is to fix.
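To make the arithmetic concrete, here is a minimal sketch of the layout problem and one possible fix. FakeFCWeight, BucketBefore and BucketAfter are hypothetical stand-ins, not Rapfi's actual types; the member layout of FakeFCWeight (256 int8 weights plus 4 int32 biases) is an assumption chosen only to reproduce the 272-byte size. Putting alignas(32) on the member is one option among several (explicit padding would also work), and I have not checked how it interacts with the way the weights are loaded from file.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for FCWeight<4, 64>; layout is assumed:
// 4 * 64 int8 weights (256 bytes) + 4 int32 biases (16 bytes) = 272 bytes.
struct FakeFCWeight {
    std::int8_t  weight[4 * 64];
    std::int32_t bias[4];
};
static_assert(sizeof(FakeFCWeight) == 272, "stand-in should be 272 bytes");

// Layout as described in the report: the float array follows a 272-byte member.
struct alignas(32) BucketBefore {
    FakeFCWeight value_l3;
    float        policy_output_weight[16];
};

// Possible fix: force the member itself onto a 32-byte boundary.
struct alignas(32) BucketAfter {
    FakeFCWeight      value_l3;
    alignas(32) float policy_output_weight[16];
};

constexpr std::size_t offsetBefore = offsetof(BucketBefore, policy_output_weight);
constexpr std::size_t offsetAfter  = offsetof(BucketAfter, policy_output_weight);

// 272 mod 32 == 16, so an aligned 32-byte load at this offset is invalid.
static_assert(offsetBefore % 32 == 16, "member is misaligned for AVX2 loads");
// With alignas(32) the compiler inserts 16 bytes of padding.
static_assert(offsetAfter % 32 == 0, "member is 32-byte aligned");

// Runtime check that mirrors the verification snippet in the report.
inline bool isAligned32(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 32 == 0;
}
```

A static_assert like the ones above, placed next to the real struct, would catch this kind of regression at compile time instead of in the debugger.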
After applying some workarounds to these two problems, debug mode worked fine on my side!
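For the first issue, the idea behind the workaround is roughly the following. This is only a sketch of the two-phase pattern: Worker, run(), join() and hasRun() are hypothetical names for illustration and do not reflect the real SearchThread interface or the exact change in the PR.

```cpp
#include <atomic>
#include <thread>

// Hypothetical worker class, not Rapfi's actual SearchThread.
class Worker {
public:
    // The constructor only initializes members; no thread is started here,
    // so nothing can observe a partially constructed object.
    Worker() = default;
    Worker(const Worker&)            = delete;
    Worker& operator=(const Worker&) = delete;

    ~Worker() { join(); }

    // Second phase: start the thread once the object is fully constructed.
    void init() { thread_ = std::thread([this] { run(); }); }

    // Wait for the thread to finish, if it was started.
    void join() {
        if (thread_.joinable())
            thread_.join();
    }

    bool hasRun() const { return ran_.load(); }

private:
    // Placeholder for the real thread loop.
    void run() { ran_.store(true); }

    std::atomic<bool> ran_{false};
    std::thread       thread_;
};
```

The caller constructs the object first and then calls init(), for example `Worker w; w.init();`, so the thread body never runs before construction has completed.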
System and configuration:
- OS: Ubuntu (AVX2-capable machine)
- Compilers: g++ and clang++ (both up-to-date, no behavioral differences)
- Evaluator: Default config.toml from the Network folder (mix9svq)
- Debug environment: VSCode with gdb-based debugging