Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Missed optimization: unreachable branch could be pruned after taking into account the possible values of a constexpr lookup table #125003

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
inicula opened this issue Jan 30, 2025 · 0 comments · May be fixed by #127695

Comments

@inicula
Copy link

inicula commented Jan 30, 2025

Context

Consider the following code:

#include <array>
#include <cstddef>

constexpr size_t MAX_DATA_INDEX = 1024;

constexpr std::array<size_t, 256> GET_DATA_INDEX_V1 = {};

bool foo(unsigned *data, unsigned char first_idx)
{
    size_t second_idx = GET_DATA_INDEX_V1[first_idx];
    if (second_idx >= MAX_DATA_INDEX)
        return false;

    data[second_idx] = true;
    return true;
}

constexpr std::array<size_t, 256> GET_DATA_INDEX_V2 = {0, 1, 2, 0, 0, /* ... */};

bool bar(unsigned *data, unsigned char first_idx)
{
    size_t second_idx = GET_DATA_INDEX_V2[first_idx];
    if (second_idx >= MAX_DATA_INDEX)
        return false;

    data[second_idx] = true;
    return true;
}

And the assembly that gets generated with clang 19.1.0 (and trunk), with -O3 enabled:

foo(unsigned int*, unsigned char):
        mov     dword ptr [rdi], 1
        mov     al, 1
        ret

bar(unsigned int*, unsigned char):
        mov     eax, esi
        lea     rcx, [rip + GET_DATA_INDEX_V2]
        mov     rax, qword ptr [rcx + 8*rax]
        cmp     rax, 1023
        ja      .LBB1_2
        mov     dword ptr [rdi + 4*rax], 1
.LBB1_2:
        cmp     rax, 1024
        setb    al
        ret

GET_DATA_INDEX_V2:
        .quad   0
        .quad   1
        .quad   2
        .zero   2024

Discussion

When the lookup table is filled with a single value (e.g. zero), like in foo(), clang is able to get rid of the second_idx >= MAX_DATA_INDEX branch, because it sees at compile time that the only possible value that second_idx can have is zero, which is less than MAX_DATA_INDEX.

But in bar(), where the lookup table is filled with other values besides 0, clang doesn't see that all possible values of second_idx satisfy second_idx < MAX_DATA_INDEX, even though this is also derivable at compile time. So the second_idx >= MAX_DATA_INDEX branch is also redundant in bar(), and ideally that branch would be deleted.

How hard would it be to add an optimization to clang/LLVM such that the second_idx >= MAX_DATA_INDEX branch in bar() gets pruned?

You could imagine some code that has a bunch of transformation layers through which it passes an initial index (first_idx -> second_idx -> third_idx -> ... -> last_idx). And when moving from one layer to the next, it does bounds checking to verify that the current index is less than MAX_DATA_INDEX. In this case, it seems like this kind of optimization could get rid of a pretty significant amount of effectively dead code. If this were to work with more complex predicates (and not just bounds checking), then the utility would be even greater.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants