[llvm][IR] Treat memcmp and bcmp as libcalls #135706

ilovepi · 2025-04-15T00:22:36Z

Since the backend may emit calls to these functions, they should be
treated like other libcalls. If we don't, then it is possible to
have their definitions removed during LTO because they are dead, only to
have a later transform introduce calls to them.

See https://discourse.llvm.org/t/rfc-addressing-deficiencies-in-llvm-s-lto-implementation/84999
for more information.
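
A minimal sketch of the failure mode described above, using a toy symbol table rather than real LLVM data structures. `ToyModule`, `dropDeadDefinitions`, `lowerMemcmpToBcmp`, and `runPipeline` are hypothetical names invented for illustration; nothing here is LLVM API, and the real LTO pipeline is considerably more involved.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy model of a module: the functions it defines and the symbols that
// the remaining code still references.
struct ToyModule {
  std::set<std::string> defined;
  std::set<std::string> referenced;
};

// Hypothetical stand-in for internalization plus dead-code removal: drop
// every definition that is neither referenced nor on the preserved list.
void dropDeadDefinitions(ToyModule &m, const std::set<std::string> &preserved) {
  for (auto it = m.defined.begin(); it != m.defined.end();) {
    if (m.referenced.count(*it) || preserved.count(*it))
      ++it;
    else
      it = m.defined.erase(it);
  }
}

// Hypothetical late transform: rewrites a memcmp call into a bcmp call,
// introducing a reference that did not exist when dead code was removed.
void lowerMemcmpToBcmp(ToyModule &m) {
  if (m.referenced.erase("memcmp"))
    m.referenced.insert("bcmp");
}

// Symbols that are referenced but no longer defined.
std::vector<std::string> unresolved(const ToyModule &m) {
  std::vector<std::string> out;
  for (const auto &name : m.referenced)
    if (!m.defined.count(name))
      out.push_back(name);
  return out;
}

// Runs both steps on a module that defines memcmp and bcmp but only
// calls memcmp, and reports what is left unresolved.
std::vector<std::string> runPipeline(const std::set<std::string> &preserved) {
  ToyModule m;
  m.defined = {"foo", "memcmp", "bcmp"};
  m.referenced = {"foo", "memcmp"};
  dropDeadDefinitions(m, preserved);
  lowerMemcmpToBcmp(m);
  return unresolved(m);
}

void exampleUsage() {
  // Nothing preserved: bcmp is dropped, then the late rewrite needs it.
  assert(runPipeline({}) == std::vector<std::string>{"bcmp"});
  // memcmp and bcmp preserved: the late rewrite finds its target.
  assert(runPipeline({"memcmp", "bcmp"}).empty());
}
```

In this toy model the preserved list plays the role that the libcall list plays for LTO: a symbol on it survives even when nothing references it yet.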

ilovepi · 2025-04-15T00:22:49Z

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

[llvm][IR] Treat memcmp and bcmp as libcalls #135706 👈
[llvm][lto] Precommit test for libcall internalization #135705
main

This stack of pull requests is managed by Graphite.

llvmbot · 2025-04-15T00:25:12Z

@llvm/pr-subscribers-llvm-ir

@llvm/pr-subscribers-lto

Author: Paul Kirth (ilovepi)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/135706.diff

2 Files Affected:

(modified) llvm/include/llvm/IR/RuntimeLibcalls.def (+2)
(modified) llvm/test/LTO/Resolution/RISCV/bcmp-libcall.ll (+1-2)

diff --git a/llvm/include/llvm/IR/RuntimeLibcalls.def b/llvm/include/llvm/IR/RuntimeLibcalls.def
index 2545aebc73391..2c72bc8c012cc 100644
--- a/llvm/include/llvm/IR/RuntimeLibcalls.def
+++ b/llvm/include/llvm/IR/RuntimeLibcalls.def
@@ -513,6 +513,8 @@ HANDLE_LIBCALL(UO_PPCF128, "__gcc_qunord")
 HANDLE_LIBCALL(MEMCPY, "memcpy")
 HANDLE_LIBCALL(MEMMOVE, "memmove")
 HANDLE_LIBCALL(MEMSET, "memset")
+HANDLE_LIBCALL(MEMCMP, "memcmp")
+HANDLE_LIBCALL(BCMP, "bcmp")
 // DSEPass can emit calloc if it finds a pair of malloc/memset
 HANDLE_LIBCALL(CALLOC, "calloc")
 HANDLE_LIBCALL(BZERO, nullptr)
diff --git a/llvm/test/LTO/Resolution/RISCV/bcmp-libcall.ll b/llvm/test/LTO/Resolution/RISCV/bcmp-libcall.ll
index 4c6bebf69a074..80421cd9350c8 100644
--- a/llvm/test/LTO/Resolution/RISCV/bcmp-libcall.ll
+++ b/llvm/test/LTO/Resolution/RISCV/bcmp-libcall.ll
@@ -29,8 +29,7 @@ define i1 @foo(ptr %0, [2 x i32] %1) {
 declare i32 @memcmp(ptr, ptr, i32)
 
 ;; Ensure bcmp is removed from module. Follow up patches can address this.
-; INTERNALIZE-NOT: declare{{.*}}i32 @bcmp
-; INTERNALIZE-NOT: define{{.*}}i32 @bcmp
+; INTERNALIZE: define{{.*}}i32 @bcmp
 define i32 @bcmp(ptr %0, ptr %1, i32 %2) {
   ret i32 0
 }
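
For readers unfamiliar with `.def` files, the sketch below shows the general X-macro pattern that a `HANDLE_LIBCALL(code, name)` list lends itself to. It is a self-contained approximation, not LLVM's actual code: `TOY_LIBCALL_LIST`, `ToyLibcall`, `ToyLibcallNames`, and `isToyLibcall` are hypothetical, and only a handful of entries are reproduced.

```cpp
#include <cassert>
#include <cstring>

// Illustrative stand-in for a .def file: each entry pairs an enumerator
// with the symbol name of the runtime function.
#define TOY_LIBCALL_LIST(HANDLE_LIBCALL) \
  HANDLE_LIBCALL(MEMCPY, "memcpy")       \
  HANDLE_LIBCALL(MEMMOVE, "memmove")     \
  HANDLE_LIBCALL(MEMSET, "memset")       \
  HANDLE_LIBCALL(MEMCMP, "memcmp")       \
  HANDLE_LIBCALL(BCMP, "bcmp")

// First expansion: one enumerator per entry.
enum ToyLibcall {
#define AS_ENUM(code, name) code,
  TOY_LIBCALL_LIST(AS_ENUM)
#undef AS_ENUM
  UNKNOWN_LIBCALL
};

// Second expansion: a parallel table of symbol names.
static const char *const ToyLibcallNames[] = {
#define AS_NAME(code, name) name,
    TOY_LIBCALL_LIST(AS_NAME)
#undef AS_NAME
};

static_assert(sizeof(ToyLibcallNames) / sizeof(ToyLibcallNames[0]) ==
                  UNKNOWN_LIBCALL,
              "one name per enumerator");

// Returns true if `symbol` names one of the listed libcalls, the kind of
// query a "must this definition be preserved?" check could make.
bool isToyLibcall(const char *symbol) {
  for (int i = 0; i < UNKNOWN_LIBCALL; ++i)
    if (std::strcmp(ToyLibcallNames[i], symbol) == 0)
      return true;
  return false;
}

void exampleUsage() {
  assert(isToyLibcall("bcmp"));
  assert(!isToyLibcall("printf"));
}
```

With this pattern, adding two lines to the list is enough to make both the enumerators and the name lookup pick up `memcmp` and `bcmp`, which is why the patch itself is so small.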

aeubanks

I'm surprised these missing libcalls have been missing for so long without getting fixed

efriedma-quic · 2025-04-15T07:18:04Z

I don't really want to start adding functions to RuntimeLibcalls.def piecemeal without documented criteria for what, exactly, should be added. Do we need to add every single function that any transformation can generate under any circumstances? Or are there criteria we can use to restrict this? I mean, we do memcmp->bcmp, yes, but we also touch a bunch of other math and I/O functions. Do we need to add all the functions from BuildLibCalls.h?

Certain libcalls are special because we can generate calls to them even with -fno-builtin: there is no alternative implementation. Like for memcpy, or __stack_chk_fail, or floating-point arithmetic on soft-float targets. memcmp isn't special like this.
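
To make the memcpy case concrete, here is a small sketch in which no library call appears in the source, yet a compiler may still emit one. `Packet` and `copyPacket` are hypothetical names; whether a call to memcpy is actually generated depends on the target, the aggregate size, and the optimization level.

```cpp
#include <cassert>

// A plain aggregate large enough that many targets would copy it with a
// library call rather than inline loads and stores.
struct Packet {
  unsigned char payload[4096];
};

// No memcpy appears here, but the compiler is free to lower this
// aggregate assignment to a call to memcpy, including in builds that
// disable builtin recognition. The source offers no other way to spell it.
void copyPacket(Packet &dst, const Packet &src) { dst = src; }

void exampleUsage() {
  static Packet a, b; // static: zero-initialized and kept off the stack
  a.payload[0] = 7;
  a.payload[4095] = 9;
  copyPacket(b, a);
  assert(b.payload[0] == 7 && b.payload[4095] == 9);
}
```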

ilovepi · 2025-04-15T16:29:32Z

I think that any function that can get added after you've potentially deleted its definition needs to be handled the same way, otherwise you can end up w/ the same kind of bugs. Adding all the functions from BuildLibCalls.h seems roughly correct, since I don't recall running into anything that fails this way that isn't either on that list or in the list of RuntimeLibcalls.

ilovepi · 2025-04-15T17:14:25Z

@efriedma-quic I guess I should ask if you're opposed to us adding these to the RuntimeLibcalls? I agree that we should have some criteria spelled out, but I'm not sure we have that pinned down well enough just yet.

Also, I don't see much spelled out either in our docs or comments about the mechanisms here. Do we have anything, or maybe a thread from Discourse/mailing list that hashed some of this out? I'd like to make sure we have that kind of thing written down somewhere.

ilovepi · 2025-04-15T17:33:03Z

> I'm surprised these missing libcalls have been missing for so long without getting fixed

I think few people are doing LTO w/ things that provide bcmp/memcmp, like libc. Typically, what I see is that even when they're statically linked, like for embedded code, they're not built w/ LTO, so it's just a normal library, and not participating in LTO.

efriedma-quic · 2025-04-15T19:19:07Z

There are, currently, basically three different ways to supply libc which we support:

1. Dynamic linking: the libc isn't part of your program at all; it's part of the environment. You only have the abstract interface.
2. Static linking, no LTO of libc: the libc becomes part of your program at link time; the linker lowers the libc abstraction into concrete calls.
3. -fno-builtin: the libc, excluding a small set of functions which are necessary for codegen, becomes part of your program at compile time; once you're in LLVM IR, the abstraction is gone.

To do "LTO of libc", you want something else... but you need to define what "something else" is. There needs to be a clearly delineated point at which libc implementation bits transition from an abstraction to concrete code. If your libc cooperates, that point can be different for different interfaces, but there still needs to be a specific point. You can't just blindly throw everything at the existing optimizer and hope for the best.

If you say memcmp stays an abstraction past codegen, you're getting basically zero benefit from LTO'ing it, as far as I can tell: the caller is opaque to the callee, and the callee is opaque to the caller. At best. At worst, your code explodes because your memcmp implementation depends on runtime CPU detection code that runs at startup, and LTO can't understand the dependency between the detection code and memcmp.

So in essence, I feel like I can't review this patch without a proposal that addresses where we're going overall.

ilovepi · 2025-04-16T17:10:33Z

First, thanks for the context. I don't see anything like this written down, so I plan to find some place in our docs to put those details. I'll be sure to CC you and other folks I think will have thoughts on the precise verbiage. The compiler's contract with libc is, from what I can tell, complicated, underspecified, and mostly undocumented. Having spoken w/ some libc folks about libc semantics in the past, I don't think it will be easy to pin down all the details to the extent we want. I think writing down what you put above is just the first step.

Maybe part of the issue is that I don't see a fundamental reason why libc is special beyond a few key things:

- Some APIs will need a no-builtin-foo, to prevent their implementation from calling themselves.
- Some APIs have well-understood usage that the compiler can leverage (I'd put the memcmp->bcmp optimization in this list, but memcpy/memset are what I think of first).
- malloc, because of aliasing.
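
As a concrete picture of the memcmp->bcmp item in the list above, the sketch below contrasts a call whose result is only tested against zero with one whose ordering is observed. `sameBytes` and `sortsBefore` are hypothetical names; whether an optimizer actually performs the substitution depends on the target and on bcmp being available.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Only equality with zero is observed, so the sign of memcmp's result is
// irrelevant. This is the pattern where an optimizer may substitute bcmp,
// which only promises zero versus nonzero.
bool sameBytes(const void *a, const void *b, std::size_t n) {
  return std::memcmp(a, b, n) == 0;
}

// The ordering of the result is observed here, so memcmp cannot be
// replaced by bcmp without changing behavior.
bool sortsBefore(const void *a, const void *b, std::size_t n) {
  return std::memcmp(a, b, n) < 0;
}

void exampleUsage() {
  assert(sameBytes("abc", "abc", 3));
  assert(!sameBytes("abc", "abd", 3));
  assert(sortsBefore("abc", "abd", 3));
  assert(!sortsBefore("abd", "abc", 3));
}
```

The first function is the case that matters for this PR: the source only mentions memcmp, but the emitted code may end up referencing bcmp.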

I'm probably neglecting something obvious in that short list, but for most things, I don't think anything special needs to happen. What shouldn't happen though, is that the compiler deletes a function definition, and then reintroduces a call to that function ... maybe that's what you mean by "staying an abstraction past codegen"? I didn't initially read it that way, but I guess in that light I see where you're coming from.

Put another way, I think it's strictly a bug in our phase ordering to allow functions to be deleted if they may have calls introduced again. Since memcmp/bcmp are special this way (as are the existing libcalls), I guess maybe that's part of the problem. I was kind of under the impression that RuntimeLibcalls was our mechanism for handling that, though.

As for making a libc cooperate w/ the compiler, perhaps there is a set of attributes we could use (or introduce?). We already have a few of these (attribute malloc comes to mind). Maybe for things marked as being part of libc, we only mark them as dead, but don't collect until the end. Any new calls emitted would make them alive again. I haven't thought this bit through much, yet.
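
A rough sketch of the "mark as dead, collect at the end" idea floated above, purely to make the bookkeeping concrete. `DeferredLibcGC` and its methods are hypothetical and do not correspond to any existing LLVM mechanism or attribute.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

// Hypothetical bookkeeping: definitions are only marked dead during
// optimization and are physically removed in one final step.
class DeferredLibcGC {
  std::set<std::string> defined_;
  std::set<std::string> tentativelyDead_;

public:
  explicit DeferredLibcGC(std::set<std::string> defs)
      : defined_(std::move(defs)) {}

  // An optimization found no remaining uses of `name`.
  void markDead(const std::string &name) {
    if (defined_.count(name))
      tentativelyDead_.insert(name);
  }

  // A later transform emitted a new call: the definition is live again.
  void noteNewCall(const std::string &name) { tentativelyDead_.erase(name); }

  // Final step, run once no further calls can be introduced.
  void collect() {
    for (const auto &name : tentativelyDead_)
      defined_.erase(name);
    tentativelyDead_.clear();
  }

  bool isDefined(const std::string &name) const {
    return defined_.count(name) != 0;
  }
};

void exampleUsage() {
  std::set<std::string> defs = {"bcmp", "memcmp", "strlen"};
  DeferredLibcGC gc(defs);
  gc.markDead("bcmp");
  gc.markDead("strlen");
  gc.noteNewCall("bcmp"); // e.g. a memcmp call was rewritten to bcmp
  gc.collect();
  assert(gc.isDefined("bcmp"));
  assert(!gc.isDefined("strlen"));
}
```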

So, I guess let me try to explain my expectations for how we'd like the compiler to behave when LTOing a program along w/ libc. Mostly, we don't want the compiler to change its default behavior. So when it sees a call to malloc, the returned pointer is marked noalias, even if the call were inlined. For other memory routines, the compiler can either use its own specialized implementations (like it normally does) or it can inline the call. That assumes the definitions were compiled w/ something like -fno-builtin-memcpy for the memcpy implementation (you know, so it's functional). For anything that may have a call emitted via compiler transformation, it cannot be DCE'd until we're certain no new calls will be created. In the worst case that means we have to rely on linker GC, but maybe that's acceptable for something as limited as libc. Does that make sense? I have a feeling I'm oversimplifying something in my mental model, but I hope that's at least a reasonable set of goals as a first approximation.
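
The "so it's functional" remark refers to a well-known hazard, sketched below under a different name so it does not clash with the real symbol. `toy_memcpy` is hypothetical; the point is that a byte-copy loop like this is exactly what an optimizer's idiom recognition may turn back into a call to memcpy or memmove, which would be self-recursive if the function were the real memcpy.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical libc-style implementation. If this were the real memcpy
// and the compiler rewrote the loop into a memcpy call, the function
// would call itself. Building the real definition with something like
// -fno-builtin-memcpy is meant to prevent that rewrite.
void *toy_memcpy(void *dst, const void *src, std::size_t n) {
  unsigned char *d = static_cast<unsigned char *>(dst);
  const unsigned char *s = static_cast<const unsigned char *>(src);
  for (std::size_t i = 0; i < n; ++i)
    d[i] = s[i];
  return dst;
}

void exampleUsage() {
  const char src[] = "hello";
  char dst[sizeof(src)] = {};
  assert(toy_memcpy(dst, src, sizeof(src)) == dst);
  assert(dst[0] == 'h' && dst[4] == 'o' && dst[5] == '\0');
}
```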

efriedma-quic · 2025-04-16T18:14:25Z

We need to enter the "-fno-builtins" world to make interprocedural optimizations with libc safe.

Most optimizations you care about can be preserved in other ways. For example, if malloc called some intrinsic "llvm.allocate_memory"/"llvm.free_memory" to create/destroy provenance, we can preserve most aliasing-related optimizations. If your libc does runtime CPU detection, we can come up with some way to accurately model aliasing on those globals. But we need a different IR representation to make this work; we can't just treat the implementations as opaque.

If you want to run certain optimizations before we enter the "-fno-builtins" world, you need some pass that transitions IR from the "builtins" world to the "nobuiltins" world.

It might be possible for us to invent a "partial-builtin" mode which treats functions which are called as builtins, but doesn't allow generating calls to functions which aren't already used. This would allow LTO to more accurately compute which libc functions are used. But I'm not sure how useful this would actually be in practice; if you're not LTO'ing libc, the dependencies don't really need to be accurate.
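
One possible reading of the "partial-builtin" idea, reduced to a single predicate. Everything here is hypothetical: `BuiltinMode` and `mayIntroduceCall` are invented for illustration, no such mode exists in LLVM, and the toy deliberately ignores the small set of functions that are called even when builtins are disabled.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical modes; none of these names exist in LLVM.
enum class BuiltinMode { Full, Partial, None };

// May a transform introduce a call to `callee` that the source did not
// contain? `alreadyUsed` is the set of library functions the program
// already calls.
bool mayIntroduceCall(BuiltinMode mode,
                      const std::set<std::string> &alreadyUsed,
                      const std::string &callee) {
  switch (mode) {
  case BuiltinMode::Full:
    return true; // any known library function may appear
  case BuiltinMode::Partial:
    return alreadyUsed.count(callee) != 0; // only functions already in use
  case BuiltinMode::None:
    return false; // simplification: ignores memcpy-like exceptions
  }
  return false;
}

void exampleUsage() {
  std::set<std::string> used;
  used.insert("memcmp");
  assert(mayIntroduceCall(BuiltinMode::Full, used, "bcmp"));
  assert(!mayIntroduceCall(BuiltinMode::Partial, used, "bcmp"));
  assert(mayIntroduceCall(BuiltinMode::Partial, used, "memcmp"));
  assert(!mayIntroduceCall(BuiltinMode::None, used, "memcmp"));
}
```

Under this reading, the memcmp->bcmp rewrite would be rejected in partial mode unless the program already called bcmp, so the set of used libc functions could be computed before optimization and stay accurate afterwards.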

There's a smaller set of functions which have more subtle ABI rules: those we call even with -fno-builtin. These are mostly listed in RuntimeLibcalls.def. But memcmp is not one of those functions.

ilovepi · 2025-04-16T18:24:22Z

Hmm, that's an interesting direction. We were discussing this internally, and we were outlining some ideas along these lines, but I think you've articulated this quite a bit better than we have so far. I really like this idea of a "no-builtins" world, and transitioning the IR. I also find the idea of "partial-builtins" to be quite compelling, though I agree the usefulness may be limited to scenarios where you're supplying a libc to LTO. Given that we're often dealing w/ kernel and embedded code, though, I think this is worth exploring more. I plan to discuss this a bit more w/ my team today, and hopefully write up something a bit more cogent than my earlier rambling. @mysterymath and @frobtech may have more to say as well.

ilovepi · 2025-04-30T23:27:19Z

@efriedma-quic We've been discussing the topic of LTOing libc quite a bit internally, and are currently sketching out how this could work. Unsurprisingly, there's quite a lot to think about in both the compiler and linker, and how the two combine in our different versions of LTO, and how that may break in new and fun ways.

I was wondering if you'd be available to join the libc monthly meeting (5/8 9am PST) to discuss your take on the whole idea? I'm not sure what time zone you're normally in, but I think there will be quite a number of folks who are interested in making LTO work well with libcs and LLVM libc in particular. I know a few of my team members would like to pick your brain on the subject, as we're sketching our ideas out. I can try to post a short summary of our thoughts either here or on Discourse to make the discussion a bit easier as well.

efriedma-quic · 2025-05-05T21:19:08Z

I can join libc monthly meeting, sure.

ilovepi · 2025-05-07T19:01:39Z

> I can join libc monthly meeting, sure.

Great. Looking forward to discussing this then. I'll add the topic to the meeting agenda.

ilovepi mentioned this pull request Apr 15, 2025

[llvm][lto] Precommit test for libcall internalization #135705

Open

ilovepi requested review from aeubanks, mysterymath and nikic April 15, 2025 00:24

ilovepi marked this pull request as ready for review April 15, 2025 00:24

llvmbot added the LTO (Link time optimization, regular/full LTO or ThinLTO) and llvm:ir labels Apr 15, 2025

aeubanks approved these changes Apr 15, 2025

ilovepi force-pushed the users/ilovepi/bcmp-libcall branch from af02216 to 4b8f422 Compare April 15, 2025 16:31
