Fix linker failure when building opcache statically #18939

Open
wants to merge 6 commits into
base: master

arnaud-lb
Member

@arnaud-lb arnaud-lb commented Jun 25, 2025

Spun off from #18660

RFC: https://wiki.php.net/rfc/make_opcache_required

We use linker relocations to fetch the TLS index and offset of _tsrm_ls_cache. When building Opcache statically, linkers may attempt to optimize that into a more efficient code sequence (relaxing from "General Dynamic" to "Local Exec" model [1]). Unfortunately, linkers will fail, rather than ignore our relocations, when they don't recognize the exact code sequence they are expecting.

This results in errors as reported by #15074:

TLS transition from R_X86_64_TLSGD to R_X86_64_GOTTPOFF against
`_tsrm_ls_cache' at 0x12fc3 in section `.text' failed"

Here I take a different approach:

  • Emit the exact full code sequence expected by linkers
  • Extract the TLS index/offset by inspecting the linked ASM code, rather than executing it (execution would give us the thread-local address).
  • We detect when the code was relaxed, in which case we can extract the TCB offset instead.
  • This is done in a conservative way so that if the linker did something we didn't expect, we fall back to a safer (but slower) mechanism.

One benefit of that is we are now able to use the Local Exec model in more cases, in JIT'ed code. This makes non-glibc builds faster in these cases.
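
As a rough illustration of the inspection step described above (not the PR's actual code), the following sketch classifies a 16-byte x86-64 code sequence. It compares the opcode bytes verbatim and skips the immediate bytes. The byte patterns are assumptions taken from the General Dynamic and Local Exec sequences documented in [1]; the names `probe_tls_sequence`, `tls_probe_result` and `match_masked` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { TLS_UNKNOWN, TLS_GENERAL_DYNAMIC, TLS_LOCAL_EXEC } tls_kind;

typedef struct {
    tls_kind kind;
    int32_t imm; /* GD: RIP-relative displacement of the TLS descriptor; LE: TCB offset */
} tls_probe_result;

/* Assumed GD sequence: data16 lea x@tlsgd(%rip),%rdi; data16 data16 rex64 call __tls_get_addr */
static const uint8_t gd_pat[16]  = {0x66,0x48,0x8d,0x3d, 0,0,0,0, 0x66,0x66,0x48,0xe8, 0,0,0,0};
static const uint8_t gd_mask[16] = {1,1,1,1, 0,0,0,0, 1,1,1,1, 0,0,0,0};

/* Assumed relaxed LE sequence: mov %fs:0,%rax; lea x@tpoff(%rax),%rax */
static const uint8_t le_pat[16]  = {0x64,0x48,0x8b,0x04,0x25,0,0,0,0, 0x48,0x8d,0x80, 0,0,0,0};
static const uint8_t le_mask[16] = {1,1,1,1,1,1,1,1,1, 1,1,1, 0,0,0,0};

/* Compare only the bytes whose mask entry is non-zero (everything except the immediates). */
static int match_masked(const uint8_t *code, const uint8_t *pat, const uint8_t *mask, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (mask[i] && code[i] != pat[i]) {
            return 0;
        }
    }
    return 1;
}

/* Read a little-endian 32-bit immediate without relying on alignment. */
static int32_t read_le32(const uint8_t *p)
{
    return (int32_t)((uint32_t)p[0] | (uint32_t)p[1] << 8 |
                     (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24);
}

/* Inspect (never execute) the linked code and extract the immediate at a fixed position. */
static tls_probe_result probe_tls_sequence(const uint8_t code[16])
{
    tls_probe_result r = { TLS_UNKNOWN, 0 };
    if (match_masked(code, gd_pat, gd_mask, 16)) {
        r.kind = TLS_GENERAL_DYNAMIC;   /* the linker left our sequence untouched */
        r.imm = read_le32(code + 4);
    } else if (match_masked(code, le_pat, le_mask, 16)) {
        r.kind = TLS_LOCAL_EXEC;        /* the linker relaxed the sequence */
        r.imm = read_le32(code + 12);
    }
    return r;                           /* TLS_UNKNOWN: caller uses the slower mechanism */
}
```

A caller would treat `TLS_UNKNOWN` as the signal to use the slower mechanism, which is what keeps the approach conservative.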

This is tested on Linux (glibc, musl), FreeBSD, macOS, Windows; lld, gold, bfd; clang; gcc; VS; x86, x86_64, aarch64 (not macOS/Apple Silicon, as JIT+ZTS is not supported on this combo yet, and not Windows/aarch64 for the same reason), with various combinations of static/shared/dl(). The PR includes these tests. Other OSes fall back to the slower mechanism.

[1] https://www.akkadia.org/drepper/tls.pdf

This fixes the following linker error:

    TLS transition from R_X86_64_TLSGD to R_X86_64_GOTTPOFF against
    `_tsrm_ls_cache' at 0x12fc3 in section `.text' failed"

The error arises from how we obtain information about the _tsrm_ls_cache TLS
variable for use in JIT'ed code:

Normally, TLS variables are resolved via linker relocations [1], which of course
cannot be used in JIT'ed code. Therefore we emit the relocation in AOT code and
use the result in JIT.

Specifically we use a fragment of the "General Dynamic" code sequence described
in [1]. Using the full code sequence would give us the address of the variable
in the current thread. Therefore we only use a fragment that gives us the
variable's TLS index and offset.

When Opcache is statically linked into the binary, linkers attempt to relax
(rewrite) this code sequence into a more efficient one. However, this fails
because they will not recognize the code sequence.

We now take a different approach:

 * Emit the exact full code sequence expected by linkers
 * Extract the TLS index/offset or TCB offset by inspecting the ASM code, rather
   than executing it (execution would give us the thread-local address).
 * This is done in a conservative way so that if the linker did
   something we didn't expect, we fall back to a safer (but slower) mechanism.

[1] https://www.akkadia.org/drepper/tls.pdf
Member

@TimWolla TimWolla left a comment

I cannot comment meaningfully on the code itself, but it certainly looks more maintainable now and I like that it's independently tested in CI.

Member

@TimWolla TimWolla left a comment

Apart from the small typo I have no further remarks. I leave the actual review to someone who knows how this is supposed to work.

@arnaud-lb arnaud-lb requested a review from nielsdos June 25, 2025 11:44
@arnaud-lb arnaud-lb marked this pull request as ready for review June 25, 2025 11:45
@arnaud-lb arnaud-lb requested a review from dstogov as a code owner June 25, 2025 11:45
@nielsdos
Member

Good grief.
I will check this over the weekend. Honestly I hope there's another way. This is quite complex.

@arnaud-lb
Member Author

Thank you @nielsdos. It's not incredibly complex when considering that we compare the bytecode verbatim (ignoring the imm operands). There is no full-blown bytecode decoding or interpreting.

In Intel bytecode, imms are encoded on whole bytes, so we compare the remaining bytes. On Arm, we mask the imms out of the fixed-size 32-bit instructions before comparison.

In zend_jit_resolve_tsrm_ls_cache_offsets() there are just two possibilities:

  1. The bytecode is exactly what we emitted.
  2. The bytecode changed, and is exactly what we expect the linker may have emitted instead.

In both cases we extract imm values at fixed positions in the bytecode. The rest is what we did before: In the first case the imm values give us the address of the TLS descriptor, and in the second case the TCB offset.
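
To make the Arm masking concrete, here is a small sketch for a single instruction. It assumes the sequence starts with `adrp x0, ...` and uses the ADRP field layout from the Arm architecture reference (immlo in bits 30:29, immhi in bits 23:5). The helper names are hypothetical, and the sequences actually matched by `zend_jit_resolve_tsrm_ls_cache_offsets()` may differ.

```c
#include <stdint.h>

/* Bits of an ADRP instruction that hold the immediate: immlo (30:29) and immhi (23:5). */
#define ADRP_IMM_MASK 0x60FFFFE0u
/* Encoding of "adrp x0, <page>" with every immediate bit cleared. */
#define ADRP_X0       0x90000000u

/* Compare the instruction verbatim after masking out the immediate bits. */
static int is_adrp_x0(uint32_t insn)
{
    return (insn & ~ADRP_IMM_MASK) == ADRP_X0;
}

/* Extract the signed 21-bit page immediate and convert it to a byte delta. */
static int64_t adrp_page_delta(uint32_t insn)
{
    uint32_t immlo = (insn >> 29) & 0x3u;
    uint32_t immhi = (insn >> 5) & 0x7FFFFu;
    int64_t imm21 = (int64_t)((immhi << 2) | immlo);
    if (imm21 & (1 << 20)) {
        imm21 -= (1 << 21); /* sign-extend from 21 bits */
    }
    return imm21 * 4096;    /* ADRP addresses 4 KiB pages */
}
```

Because every AArch64 instruction is 32 bits wide, one mask per instruction is enough; no variable-length decoding is involved.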

@nielsdos
Member

Yes, I understood the idea on a high level, but considering it deals with different platforms and instruction encodings, I would not call it simple. I wish there was a different solution.

@arnaud-lb
Member Author

arnaud-lb commented Jun 26, 2025

I wish too. Unfortunately I didn't find a good alternative.

Here is what I've checked:

  • Disabling linker relaxation would have solved the issue, but it's not possible. Relaxation is just part of relocation processing and cannot be disabled in the linkers I've checked.
  • Forcing _tsrm_ls_cache to use the global-dynamic model, or using a JIT-specific TSRM cache variable with a forced model, might work, but using this model in JIT is slower. One additional benefit of this PR is that it makes the symfony demo benchmark 3% faster in some cases, because JIT is now able to use the local-exec model in these cases. Also, I have not been able to actually force the dynamic model.
  • Emitting code that's not eligible for relaxation (already relaxed), so the linker doesn't attempt it, would have worked as well. E.g. emitting _tsrm_ls_cache@gottpoff when we know the variable is eligible for the local-exec model, or _tsrm_ls_cache@tlsgd otherwise. But we cannot know that in advance, e.g. a static archive of PHP may be ultimately linked into a shared library or an executable, making gottpoff invalid in the former case or tlsgd relaxable in the latter.
  • Having a separate TSRM cache for JIT, backed by pthread_key_create / pthread_getspecific so we don't have to deal with relocations. This is what v8/jsc do as far as I can see, but they only support "inlining" pthread_getspecific on macOS and Windows. Unfortunately, inlining it on all platforms we care about would rely on considerably more platform/libc specifics, with no ABI guarantees at all.
