[BOLT][test] Fix callcont-fallthru.s after #129481 #135867

Open
wants to merge 2 commits into
base: users/aaupov/spr/main.bolttest-fix-callcont-fallthrus-after-129481

Conversation

aaupov
Contributor

@aaupov aaupov commented Apr 15, 2025

Only set --synthetic nm flag in link_fdata if requested explicitly.

Test Plan: bin/llvm-lit -a tools/bolt/test/X86/callcont-fallthru.s

Created using spr 1.3.4
@llvmbot
Member

llvmbot commented Apr 15, 2025

@llvm/pr-subscribers-bolt

Author: Amir Ayupov (aaupov)

Changes

Force the use of llvm-nm for PREAGGPLT check.

Test Plan: bin/llvm-lit -a tools/bolt/test/X86/callcont-fallthru.s


Full diff: https://github.com/llvm/llvm-project/pull/135867.diff

1 Files Affected:

  • (modified) bolt/test/X86/callcont-fallthru.s (+1-1)
diff --git a/bolt/test/X86/callcont-fallthru.s b/bolt/test/X86/callcont-fallthru.s
index ee72d8f62e032..6b5caa08d3128 100644
--- a/bolt/test/X86/callcont-fallthru.s
+++ b/bolt/test/X86/callcont-fallthru.s
@@ -9,7 +9,7 @@
 # RUN: link_fdata %s %t %t.pa3 PREAGG3
 # RUN: link_fdata %s %t %t.pat PREAGGT1
 # RUN: link_fdata %s %t %t.pat2 PREAGGT2
-# RUN: link_fdata %s %t %t.patplt PREAGGPLT
+# RUN: link_fdata %s %t %t.patplt PREAGGPLT --nmtool llvm-nm
 
 ## Check normal case: fallthrough is not LP or secondary entry.
 # RUN: llvm-strip --strip-unneeded %t -o %t.strip

Created using spr 1.3.4
@aaupov aaupov requested a review from paschalis-mpeis April 15, 2025 22:45
Member

@paschalis-mpeis paschalis-mpeis left a comment

Hey Amir,

Thanks for the PR. Unfortunately, it is still failing. The trick below doesn't seem to work on my buildbot machine:

Link against a DSO to ensure PLT entries.

So doing:

nm --synthetic callcont-fallthru.s.tmp

won't list a puts@plt symbol, which is what causes a link_fdata.py assertion:

AssertionError: ERROR: symbol puts@plt is not defined in binary

On my dev AArch64 instance --synthetic does the trick. BTW run lines 4 and 6 appear identical when inspected (-###)

@aaupov
Contributor Author

aaupov commented Apr 16, 2025

@MaskRay – can you please advise how to force a PLT entry if linking with a DSO hack doesn't work?

@MaskRay
Member

MaskRay commented Apr 17, 2025

You need a libc.so that defines puts, and then you create an executable that references puts and links against libc.so. The executable will then have a PLT entry, and you do not need the --unresolved-symbols=ignore-all hack.

@paschalis-mpeis
Member

Thanks a lot both! In case there's some delay in resolving this edge case, may I suggest temporarily disabling this test on AArch64 until a more consistent workaround is in place?

@paschalis-mpeis
Member

Hey folks, any updates on this?

I spent some time experimenting with @MaskRay's suggestion. I used a mock libc shared object that had a puts symbol.
With that, there are no unresolved symbols anymore; however, GNU nm still doesn't show a PLT entry when using --synthetic.

@yota9
Member

yota9 commented Apr 29, 2025

Maybe I'm missing something, but why use the /dev/null library hack here in the first place? There is already a stub.c with a puts symbol; just compile it into a shared object and link against it, and the PLT entry should appear normally.

@paschalis-mpeis
Member

Hey @yota9, thanks for the input. I tried something similar.
Even when I use stub.c and link it with:

-# RUN: %clang %cflags -fpic -shared -xc /dev/null -o %t.so
-## Link against a DSO to ensure PLT entries.
+# RUN: %clang %cflags %p/../Inputs/stub.c -fPIC -shared -o %t.so

then running GNU nm:

nm %t --synthetic

would emit only

                 U puts

which link_fdata rejects. On some other machines though, GNU nm emits:

                 U puts
0000000000001234 T puts@plt

which works well. In both cases it was the same nm driver version.
To my understanding, this inconsistency was reported on x86 machines too.
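
The two nm outputs above can be contrasted with a minimal Python sketch. It is not the code from link_fdata.py: the helper names `parse_nm_output` and `require_symbol` are hypothetical, and only the error text is taken from the assertion quoted earlier in this thread. It shows why an output that contains only the undefined `U puts` line cannot supply an address for puts@plt.

```python
def parse_nm_output(nm_output):
    """Map symbol name -> address string for defined symbols in nm-style output."""
    symbols = {}
    for line in nm_output.splitlines():
        fields = line.split()
        # Undefined symbols ("U puts") have no address column, so they
        # produce two fields instead of three and are skipped here.
        if len(fields) != 3:
            continue
        address, _symtype, name = fields
        symbols[name] = address
    return symbols


def require_symbol(symbols, name):
    """Raise with the message quoted in the thread when a symbol is missing."""
    if name not in symbols:
        raise AssertionError(f"ERROR: symbol {name} is not defined in binary")
    return symbols[name]


# Sample outputs copied from the comment above.
without_plt = "                 U puts\n"
with_plt = "                 U puts\n0000000000001234 T puts@plt\n"

print(require_symbol(parse_nm_output(with_plt), "puts@plt"))  # 0000000000001234
```

Calling `require_symbol(parse_nm_output(without_plt), "puts@plt")` raises the AssertionError instead, which matches the failure seen on the affected machines.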


I might've missed something on my end. I briefly discussed this with Amir (see discord) as I'm trying to unblock our AArch64 buildbot. We figured it's fine to disable this test on AArch64 until the issue gets resolved. Would you mind taking a look at #137831, and consider accepting it?

@yota9
Member

yota9 commented Apr 30, 2025

@paschalis-mpeis Could you please check whether the binaries are identical and whether it is indeed an nm problem? E.g. with objdump, is the PLT entry there? Maybe the problem is related to the PLT section type, e.g. one of the binaries has a .plt.sec or .plt.got section and there is some kind of bug in nm that does not list symbols from these sections. Then we can use a custom linker script with a .plt section only.
My next suggestion would be to just use llvm-nm here. Pass llvm-nm with the --nmtool arg to link_fdata.py; since lit.cfg.py has it in the list of mandatory tools for BOLT testing, we won't have environment dependencies here. Maybe even add nmtool as a link_fdata_cmd arg in lit.cfg.py, so all tests would use it by default...
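
As a rough illustration of that suggestion, here is a small Python sketch of how a script could choose its nm tool. It is a hypothetical parser and not the actual link_fdata.py code: it models only the `--nmtool` and `--synthetic` options named in this thread, and the `default_nmtool` parameter and the `nm_command` helper are invented for the example.

```python
import argparse


def build_parser(default_nmtool="nm"):
    # Hypothetical parser: only the options mentioned in the thread are modelled.
    parser = argparse.ArgumentParser(description="sketch of nm-related options")
    parser.add_argument("objfile")
    parser.add_argument("--nmtool", default=default_nmtool)
    parser.add_argument("--synthetic", action="store_true")
    return parser


def nm_command(args):
    """Build the nm invocation as an argument list."""
    cmd = [args.nmtool]
    if args.synthetic:
        # Only pass --synthetic when it was requested explicitly.
        cmd.append("--synthetic")
    cmd.append(args.objfile)
    return cmd


args = build_parser().parse_args(["a.out", "--nmtool", "llvm-nm"])
print(nm_command(args))  # ['llvm-nm', 'a.out']
```

Under these assumptions, switching every test to llvm-nm would amount to changing the default (here `build_parser("llvm-nm")`) or passing `--nmtool` from the lit configuration, so that the host's GNU nm is never consulted.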

@paschalis-mpeis
Member

Hey @yota9, thanks for the suggestions!

Indeed, the PLT entries exist in both binaries. For example running:

build/bin/llvm-objdump -d -j .plt build/tools/bolt/test/X86/Output/callcont-fallthru.s.tmp

shows:

build/tools/bolt/test/X86/Output/callcont-fallthru.s.tmp:       file format elf64-x86-64

Disassembly of section .plt:
0000000000001430 <.plt>:
    1430: ff 35 f2 20 00 00             pushq   0x20f2(%rip)            # 0x3528 <puts+0x3528>
    1436: ff 25 f4 20 00 00             jmpq    *0x20f4(%rip)           # 0x3530 <puts+0x3530>
    143c: 0f 1f 40 00                   nopl    (%rax)

0000000000001440 <puts@plt>:
    1440: ff 25 f2 20 00 00             jmpq    *0x20f2(%rip)           # 0x3538 <puts+0x3538>
    1446: 68 00 00 00 00                pushq   $0x0
    144b: e9 e0 ff ff ff                jmp     0x1430 <.plt>

I noticed some code differences in the binaries but I haven't looked deeper into it.

It looks like the difference is in GNU nm, though:

On my AArch64 dev-machine, nm --synthetic lists puts@plt, but when I copy that same binary over to our upcoming AArch64 buildbot, it's missing.

Conversely, nm --synthetic on the buildbot does not list puts@plt, but when I copy that binary to the dev-machine it does appear.


I too agree that relying on GNU nm is not ideal, or, more generally, on any binary tool that does not come from the built LLVM revision. However, llvm-nm does not seem to support --synthetic.

BTW, thanks for all the help! I'm focused on AArch64, so while I may be involved to some extent with this, I'll let Amir drive the fix. That's why I'm looking for a code owner to get #137831 stamped. :)
(also cc'ing: @aaupov, @maksfb)

@aaupov
Contributor Author

aaupov commented Apr 30, 2025

Thanks for tracking it down; it looks like it's an issue with GNU nm. However, llvm-nm has no functionality equivalent to nm --synthetic, which prints the address of the PLT entry, and the test relies on that.

Let me try to decouple this test from GNU nm.

@paschalis-mpeis
Member

Great. A quick way to use an llvm tool could be:

llvm-objdump -d -j .plt %t | grep @plt

This produces output similar to what nm --synthetic produces (when it works):

0000000000001430 <puts@plt>:

You'll of course need to tweak link_fdata to properly parse symbol+address:

symval, _, symname = symline.split(maxsplit=2)
if symname in symbols and args.no_redefine:
    continue
symbols[symname] = symval

Not sure of any cleaner approach? (@yota9, @MaskRay)
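
One possible shape for that tweak is sketched below in Python. It is an assumption-laden illustration and not a patch to link_fdata.py: the function name `parse_plt_symbols`, the regular expression, and the `no_redefine` parameter (standing in for `args.no_redefine`) are all hypothetical. It extracts symbol and address pairs from `llvm-objdump -d -j .plt` header lines like the one shown above.

```python
import re

# Matches disassembly headers such as "0000000000001440 <puts@plt>:".
PLT_HEADER = re.compile(r"^([0-9a-fA-F]+) <([^>]+@plt)>:$")


def parse_plt_symbols(objdump_output, no_redefine=False):
    """Map each name@plt symbol to its address string."""
    symbols = {}
    for line in objdump_output.splitlines():
        match = PLT_HEADER.match(line.strip())
        if not match:
            continue  # instruction lines and the bare <.plt> header
        symval, symname = match.groups()
        if symname in symbols and no_redefine:
            continue
        symbols[symname] = symval
    return symbols


# Sample lines taken from the llvm-objdump output quoted earlier in the thread.
sample = """\
Disassembly of section .plt:
0000000000001430 <.plt>:
    1430: ff 35 f2 20 00 00             pushq   0x20f2(%rip)            # 0x3528 <puts+0x3528>

0000000000001440 <puts@plt>:
    1440: ff 25 f2 20 00 00             jmpq    *0x20f2(%rip)           # 0x3538 <puts+0x3538>
"""
print(parse_plt_symbols(sample))  # {'puts@plt': '0000000000001440'}
```

Matching whole header lines with a regular expression avoids depending on the three-column layout of nm output, at the cost of depending on the disassembler's header format instead.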

@yota9
Member

yota9 commented May 2, 2025

I've decided to add a synthetic option to llvm-nm in #138232. Unfortunately it will take some time, as the main maintainer won't be able to review it soon, so for now we might just mark the test as XFAIL until then, not forgetting to replace nm with llvm-nm.

@paschalis-mpeis
Member

paschalis-mpeis commented May 2, 2025

That is perfect and the way we should go forward with this – thanks @yota9.

The problem is that the test is flaky: it passes on most systems but fails on a few.
Using XFAIL would make my AArch64 buildbot happy, but it would cause failures (Unexpectedly Passed) on the other AArch64 machines I've tested. 🤷‍♂️

That's why I propose restricting this to X86 for now, as a way to unblock us in the meantime:


# REQUIRES: x86_64-linux

@yota9
Member

yota9 commented May 2, 2025

@paschalis-mpeis Indeed, you're right. Let's wait for @aaupov's decision then; it LGTM.

yota9 added a commit to yota9/llvm-project that referenced this pull request May 2, 2025
The option, compatible with GNU nm --synthetic, is used to show special
symbols created by the linker. The current implementation is limited to
showing PLT entries in the form of symbol@plt with the PLT entry address.
For now it would be used for BOLT testing purposes
(llvm#135867)
in order to eliminate the external GNU nm dependency.
@yota9
Member

yota9 commented May 2, 2025

@paschalis-mpeis I realised that if we change nm to llvm-nm, we can just mark the test as XFAIL, since it would fail until the patch above lands. This way we guarantee that the test gets the proper changes.

@paschalis-mpeis
Member

Yep, good idea. I could add XFAIL and modify the RUN line like this:

# RUN: link_fdata %s %t %t.patplt PREAGGPLT --synthetic --nmtool=llvm-nm

The differences would be:

  • with REQUIRES we won't cross-run this x86 lit test on AArch64 (as I do currently in #137831)
  • with XFAIL + llvm-nm the test would be expected to fail on both architectures. But once your work is merged, it would unexpectedly pass, which would break the test and prompt us to update it

I'm happy to proceed with this as well.
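
The trade-off between the two options can be sketched as a tiny Python model of lit result codes. This is a simplification for illustration and not lit's implementation: the function `lit_outcome` and its parameters are hypothetical, while the result names (PASS, FAIL, XFAIL, XPASS, UNSUPPORTED) are the ones lit reports.

```python
def lit_outcome(passed, xfail=False, requirement_met=True):
    """Simplified model of how lit classifies a single test run."""
    if not requirement_met:
        # An unmet REQUIRES means the test is not run at all.
        return "UNSUPPORTED"
    if xfail:
        # XPASS ("Unexpectedly Passed") is reported as a failure.
        return "XPASS" if passed else "XFAIL"
    return "PASS" if passed else "FAIL"


# REQUIRES: x86_64-linux on an AArch64 host.
print(lit_outcome(passed=False, requirement_met=False))  # UNSUPPORTED
# XFAIL while llvm-nm lacks --synthetic, then after support lands.
print(lit_outcome(passed=False, xfail=True))  # XFAIL
print(lit_outcome(passed=True, xfail=True))   # XPASS
```

In this model, the XFAIL route turns the eventual fix into an XPASS, which is what would prompt the test update mentioned above, whereas the REQUIRES route simply keeps the test from running on AArch64.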

@yota9
Member

yota9 commented May 2, 2025

Yeah, that's right. Although maybe we should make llvm-nm the default in link_fdata instead of nm. It's up to you and @aaupov to decide.

@paschalis-mpeis
Member

Yes, and that'd actually be better so we don't depend on whatever host GNU nm the machine has.
Based on this, I'd say @aaupov intends to make this change too. I think he's away – let's see what he says once back.

5 participants