Two improvements: thread creation in constructor + AVX2 load alignment issue #26

@jano-wol

Hi Rapfi developers,

First of all, congratulations on the amazing project, it’s really impressive work!

While trying to run pbrain-rapfi BENCH in debug mode, I ran into two issues that I believe could be improved:

  1. Thread creation inside the SearchThread constructor
    The constructor of SearchThread currently starts the thread. This causes problems in debug mode (e.g. in gdb) because the thread starts while the object is still being constructed. I fixed this locally by separating the constructor from an init() method that starts the thread. I’ve also opened a PR for this fix.

  2. Crash due to misaligned AVX2 load
    This line crashes in debug mode (even on the first iteration, with i == 0):
    auto outputW = F32LS::load(bucket.policy_output_weight + i * PWConvB::RegWidth);
    On AVX2 machines, this calls:
    return simde_mm256_load_ps(addr);
    which requires addr to be 32-byte aligned. However, the address of bucket.policy_output_weight is 16 mod 32, which you can verify with:

std::uintptr_t ptr = reinterpret_cast<std::uintptr_t>(bucket.policy_output_weight);
std::cout << "policy_output_weight alignment: " << (ptr % 32) << " bytes\n";

This happens because the previous member of struct HeadBucket, value_l3 (an FCWeight<4, 64>), has a size of 272 bytes, which is 16 mod 32, so the following field isn’t 32-byte aligned. All earlier members of HeadBucket have sizes that are 0 mod 32, so the misalignment starts here. In release mode, the code probably falls back to simde_mm256_loadu_ps, so there’s no crash, but this could still cause performance degradation. I am not sure how easy this is to fix.
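
To illustrate the layout arithmetic above, here is a minimal sketch that reproduces the 16 mod 32 offset with stand-in types and shows one possible remedy (alignas on the member). Weight272, BucketUnaligned, BucketAligned, earlier and isAligned32 are hypothetical names for illustration only; they are not Rapfi's actual definitions, and the real HeadBucket has more members.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a 272-byte weight block (256 + 16 bytes).
struct Weight272 {
    std::int8_t  weight[256];
    std::int32_t bias[4];
};
static_assert(sizeof(Weight272) == 272, "272 % 32 == 16");

// Reproduces the problem: the float array follows a member whose size is
// 16 mod 32, so it starts at offset 32 + 272 = 304, and 304 % 32 == 16.
struct BucketUnaligned {
    alignas(32) float earlier[8];   // 32 bytes, size is 0 mod 32
    Weight272 value_l3;
    float policy_output_weight[16];
};
static_assert(offsetof(BucketUnaligned, policy_output_weight) % 32 == 16,
              "member is misaligned for 32-byte loads");

// One possible remedy (an assumption, not the project's fix): request
// 32-byte alignment on the member, so the compiler inserts 16 bytes of
// padding after value_l3 and the array starts at offset 320.
struct BucketAligned {
    alignas(32) float earlier[8];
    Weight272 value_l3;
    alignas(32) float policy_output_weight[16];
};
static_assert(offsetof(BucketAligned, policy_output_weight) % 32 == 0,
              "member is 32-byte aligned");

// Runtime check equivalent to the ptr % 32 test shown earlier.
inline bool isAligned32(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 32 == 0;
}
```

Note that adding padding changes the struct layout, so if the struct is read directly from a weight file, the file format or the loading code would have to change as well. Switching only this load to an unaligned load would be the less invasive alternative.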

After applying some workarounds to these two problems, debug mode worked fine on my side!
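
For the first issue, the constructor/init() split can be sketched as follows. This is only an illustration of the two-phase pattern under simplified assumptions: Worker, run(), join() and ran() are hypothetical names, and the class is not Rapfi's SearchThread or the code in the PR.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical worker class illustrating two-phase initialization.
class Worker {
public:
    Worker() = default;  // only sets up state; no thread is started here
    Worker(const Worker&) = delete;
    Worker& operator=(const Worker&) = delete;
    ~Worker() { join(); }

    // Start the thread once the object is fully constructed.
    void init() {
        assert(!thread_.joinable() && "init() must be called only once");
        thread_ = std::thread(&Worker::run, this);
    }

    // Wait for the thread to finish, if it was started.
    void join() {
        if (thread_.joinable())
            thread_.join();
    }

    bool ran() const { return ran_.load(); }

private:
    // Thread body; it can safely touch every member because all of them
    // were initialized before init() was called.
    void run() { ran_.store(true); }

    std::atomic<bool> ran_{false};
    std::thread thread_;
};
```

A caller would construct the object first and start it afterwards, for example: Worker w; w.init(); w.join();. The cost of this design is that callers must remember to call init(); a factory function that constructs and then starts the worker is one way to hide that step.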

System and configuration:

  • OS: Ubuntu (AVX2-capable machine)
  • Compilers: g++ and clang++ (both up-to-date, no behavioral differences)
  • Evaluator: Default config.toml from the Network folder (mix9svq)
  • Debug environment: VSCode with gdb-based debugging
