

[llvm][IR] Treat memcmp and bcmp as libcalls #135706



Open
ilovepi wants to merge 1 commit into users/ilovepi/bcmp-libcall-precommit from users/ilovepi/bcmp-libcall


ilovepi
Contributor

ilovepi commented Apr 15, 2025

Since the backend may emit calls to these functions, they should be
treated like other libcalls. If we don't, then it is possible to
have their definitions removed during LTO because they are dead, only to
have a later transform introduce calls to them.

See https://discourse.llvm.org/t/rfc-addressing-deficiencies-in-llvm-s-lto-implementation/84999
for more information.
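The hazard can be modeled in a few lines. This is a toy sketch, not LLVM code: ToyModule, removeDeadDefinitions, lowerMemcmpToBcmp, unresolved, and runPipeline are hypothetical names, and the `preserved` set stands in for the role this patch gives the RuntimeLibcalls.def entries. Dead-definition removal runs first; a late memcmp-to-bcmp rewrite then adds a reference to a definition that may already be gone.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy module: defined function names plus the names that live code calls.
// This is NOT LLVM's API; it only models the phase-ordering hazard.
struct ToyModule {
  std::set<std::string> defined;
  std::set<std::string> referenced;
};

// Erase every definition nothing references, unless its name is preserved
// (the role the patch gives RuntimeLibcalls.def entries).
void removeDeadDefinitions(ToyModule &m, const std::set<std::string> &preserved) {
  for (auto it = m.defined.begin(); it != m.defined.end();) {
    if (m.referenced.count(*it) == 0 && preserved.count(*it) == 0)
      it = m.defined.erase(it);
    else
      ++it;
  }
}

// Late transform: replace the memcmp reference with bcmp, introducing a
// reference to a function whose definition may already have been deleted.
void lowerMemcmpToBcmp(ToyModule &m) {
  if (m.referenced.erase("memcmp") != 0)
    m.referenced.insert("bcmp");
}

// Names that are referenced but no longer defined in the module.
std::vector<std::string> unresolved(const ToyModule &m) {
  std::vector<std::string> out;
  for (const std::string &name : m.referenced)
    if (m.defined.count(name) == 0)
      out.push_back(name);
  return out;
}

// DCE first, then the late transform: the problematic ordering.
std::vector<std::string> runPipeline(const std::set<std::string> &preserved) {
  ToyModule m;
  m.defined = {"foo", "memcmp", "bcmp"};
  m.referenced = {"foo", "memcmp"};
  removeDeadDefinitions(m, preserved);
  assert(m.defined.count("memcmp") == 1);  // still referenced, so it survives
  lowerMemcmpToBcmp(m);
  return unresolved(m);
}
```

With an empty preserved set, runPipeline reports bcmp as unresolved; preserving "bcmp" keeps its definition, which corresponds to the updated INTERNALIZE check in the test below.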

Contributor Author

ilovepi commented Apr 15, 2025

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite.

ilovepi marked this pull request as ready for review April 15, 2025 00:24
llvmbot added the LTO and llvm:ir labels Apr 15, 2025
Member

llvmbot commented Apr 15, 2025

@llvm/pr-subscribers-llvm-ir

@llvm/pr-subscribers-lto

Author: Paul Kirth (ilovepi)


Full diff: https://github.com/llvm/llvm-project/pull/135706.diff

2 Files Affected:

  • (modified) llvm/include/llvm/IR/RuntimeLibcalls.def (+2)
  • (modified) llvm/test/LTO/Resolution/RISCV/bcmp-libcall.ll (+1-2)
diff --git a/llvm/include/llvm/IR/RuntimeLibcalls.def b/llvm/include/llvm/IR/RuntimeLibcalls.def
index 2545aebc73391..2c72bc8c012cc 100644
--- a/llvm/include/llvm/IR/RuntimeLibcalls.def
+++ b/llvm/include/llvm/IR/RuntimeLibcalls.def
@@ -513,6 +513,8 @@ HANDLE_LIBCALL(UO_PPCF128, "__gcc_qunord")
 HANDLE_LIBCALL(MEMCPY, "memcpy")
 HANDLE_LIBCALL(MEMMOVE, "memmove")
 HANDLE_LIBCALL(MEMSET, "memset")
+HANDLE_LIBCALL(MEMCMP, "memcmp")
+HANDLE_LIBCALL(BCMP, "bcmp")
 // DSEPass can emit calloc if it finds a pair of malloc/memset
 HANDLE_LIBCALL(CALLOC, "calloc")
 HANDLE_LIBCALL(BZERO, nullptr)
diff --git a/llvm/test/LTO/Resolution/RISCV/bcmp-libcall.ll b/llvm/test/LTO/Resolution/RISCV/bcmp-libcall.ll
index 4c6bebf69a074..80421cd9350c8 100644
--- a/llvm/test/LTO/Resolution/RISCV/bcmp-libcall.ll
+++ b/llvm/test/LTO/Resolution/RISCV/bcmp-libcall.ll
@@ -29,8 +29,7 @@ define i1 @foo(ptr %0, [2 x i32] %1) {
 declare i32 @memcmp(ptr, ptr, i32)
 
 ;; Ensure bcmp is removed from module. Follow up patches can address this.
-; INTERNALIZE-NOT: declare{{.*}}i32 @bcmp
-; INTERNALIZE-NOT: define{{.*}}i32 @bcmp
+; INTERNALIZE: define{{.*}}i32 @bcmp
 define i32 @bcmp(ptr %0, ptr %1, i32 %2) {
   ret i32 0
 }
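RuntimeLibcalls.def is an X-macro list: a consumer typically defines HANDLE_LIBCALL(code, name), includes the file, and undefines the macro, so adding entries like MEMCMP and BCMP updates every table generated from it. The sketch below inlines a few entries from the hunk above instead of including the real file; TOY_LIBCALL_LIST, ToyLibcall, ToyLibcallNames, and isToyLibcallName are hypothetical names, and the name lookup is only an assumption about how a consumer might query such a list.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Stand-in for a few RuntimeLibcalls.def entries. HANDLE_LIBCALL is expanded
// at each point of use, so each consumer below can give it a new meaning.
#define TOY_LIBCALL_LIST               \
  HANDLE_LIBCALL(MEMCPY, "memcpy")     \
  HANDLE_LIBCALL(MEMMOVE, "memmove")   \
  HANDLE_LIBCALL(MEMSET, "memset")     \
  HANDLE_LIBCALL(MEMCMP, "memcmp")     \
  HANDLE_LIBCALL(BCMP, "bcmp")         \
  HANDLE_LIBCALL(BZERO, nullptr)

// Consumer 1: an enum of libcall codes.
enum ToyLibcall {
#define HANDLE_LIBCALL(code, name) code,
  TOY_LIBCALL_LIST
#undef HANDLE_LIBCALL
  UNKNOWN_LIBCALL  // sentinel: number of entries
};

// Consumer 2: a parallel table of symbol names (nullptr = no default name).
static const char *const ToyLibcallNames[] = {
#define HANDLE_LIBCALL(code, name) name,
    TOY_LIBCALL_LIST
#undef HANDLE_LIBCALL
};

static_assert(sizeof(ToyLibcallNames) / sizeof(ToyLibcallNames[0]) ==
                  static_cast<std::size_t>(UNKNOWN_LIBCALL),
              "enum and name table must stay in sync");

// True if `symbol` names one of the listed libcalls, i.e. a symbol that
// codegen might reference even if no IR call to it exists yet.
bool isToyLibcallName(const char *symbol) {
  assert(symbol != nullptr);
  for (int i = 0; i < UNKNOWN_LIBCALL; ++i)
    if (ToyLibcallNames[i] && std::strcmp(ToyLibcallNames[i], symbol) == 0)
      return true;
  return false;
}
```

Because both tables come from one list, the two lines added in this patch are enough for every consumer of the list to start treating memcmp and bcmp as libcalls.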

Contributor

aeubanks left a comment


I'm surprised these missing libcalls have been missing for so long without getting fixed

efriedma-quic
Collaborator

I don't really want to start adding functions to RuntimeLibcalls.def piecemeal without documented criteria for what, exactly, should be added. Do we need to add every single function that any transformation can generate under any circumstances? Or are there some criteria we can use to restrict this? I mean, we do memcmp->bcmp, yes, but we also touch a bunch of other math and I/O functions. Do we need to add all the functions from BuildLibCalls.h?

Certain libcalls are special because we can generate calls to them even with -fno-builtin: there is no alternative implementation. Examples include memcpy, __stack_chk_fail, and floating-point arithmetic on soft-float targets. memcmp isn't special like this.

Contributor Author

ilovepi commented Apr 15, 2025

I think that any function that can get added after you've potentially deleted its definition needs to be handled the same way; otherwise, you can end up with the same kind of bug. Adding all the functions from BuildLibCalls.h seems roughly correct, since I don't recall running into anything that fails this way that isn't either on that list or in the list of RuntimeLibcalls.

ilovepi force-pushed the users/ilovepi/bcmp-libcall branch from af02216 to 4b8f422 April 15, 2025 16:31
Contributor Author

ilovepi commented Apr 15, 2025

@efriedma-quic I guess I should ask if you're opposed to us adding these to the RuntimeLibcalls? I agree that we should have some criteria spelled out, but I'm not sure we have that pinned down well enough just yet.

Also, I don't see much spelled out in either our docs or comments about the mechanisms here. Do we have anything, or maybe a thread on Discourse or the mailing list that hashed some of this out? I'd like to make sure we have that kind of thing written down somewhere.

Contributor Author

ilovepi commented Apr 15, 2025

> I'm surprised these missing libcalls have been missing for so long without getting fixed

I think few people are doing LTO with things that provide bcmp/memcmp, like libc. Typically, what I see is that even when they're statically linked, like for embedded code, they're not built with LTO, so it's just a normal library that doesn't participate in LTO.

efriedma-quic
Collaborator

There are, currently, basically three different ways to supply libc which we support:

  • Dynamic linking: the libc isn't part of your program at all, it's part of the environment. You only have the abstract interface.
  • Static linking, no LTO of libc: the libc becomes part of your program at link-time: the linker lowers the libc abstraction into concrete calls.
  • -fno-builtin: The libc, excluding a small set of functions that are necessary for codegen, becomes part of your program at compile time; once you're in LLVM IR, the abstraction is gone.

To do "LTO of libc", you want something else... but you need to define what "something else" is. There needs to be a clearly delineated point at which libc implementation bits transition from an abstraction to concrete code. If your libc cooperates, that point can be different for different interfaces, but there still needs to be a specific point. You can't just blindly throw everything at the existing optimizer and hope for the best.

If you say memcmp stays an abstraction past codegen, you're getting basically zero benefit from LTO'ing it, as far as I can tell: the caller is opaque to the callee, and the callee is opaque to the caller. At best. At worst, your code explodes because your memcmp implementation depends on runtime CPU detection code that runs at startup, and LTO can't understand the dependency between the detection code and memcmp.
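To make the CPU-detection hazard concrete, here is a hypothetical memcmp-like routine whose implementation is selected at program startup. The names toy_memcmp, compareWide, and detectWideCompare are illustrative assumptions, and the "wide" variant is only a placeholder. The only link between the detection code and the comparison is a global function pointer written before main(), the kind of dependency that is invisible when each side is treated as an opaque call.

```cpp
#include <cassert>
#include <cstddef>

using CompareFn = int (*)(const void *, const void *, std::size_t);

// Portable byte-by-byte comparison with memcmp-style sign semantics.
static int compareBytewise(const void *a, const void *b, std::size_t n) {
  const unsigned char *p = static_cast<const unsigned char *>(a);
  const unsigned char *q = static_cast<const unsigned char *>(b);
  for (std::size_t i = 0; i < n; ++i)
    if (p[i] != q[i])
      return p[i] < q[i] ? -1 : 1;
  return 0;
}

// Placeholder for a vectorized variant; here it simply forwards.
static int compareWide(const void *a, const void *b, std::size_t n) {
  return compareBytewise(a, b, n);
}

// Stand-in for real feature detection (for example, querying CPUID).
static bool detectWideCompare() { return true; }

// Initialized at startup, before main(). toy_memcmp never calls
// detectWideCompare directly, so a call graph shows no edge between them.
static CompareFn g_compareImpl =
    detectWideCompare() ? compareWide : compareBytewise;

int toy_memcmp(const void *a, const void *b, std::size_t n) {
  // Fires if called before the startup initializer has run.
  assert(g_compareImpl != nullptr && "called before CPU detection ran");
  return g_compareImpl(a, b, n);
}
```

If an optimizer dropped or reordered the initializer because nothing appears to call it, toy_memcmp would call through a null pointer, which is one way "your code explodes".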

So, in essence, I feel like I can't review this patch without a proposal that addresses where we're going overall.

Contributor Author

ilovepi commented Apr 16, 2025

First, thanks for the context. I don't see anything like this written down, so I plan to find some place in our docs to put those details. I'll be sure to CC you and other folks I think will have thoughts on the precise verbiage. The compiler's contract with libc is, from what I can tell, complicated, underspecified, and mostly undocumented. Having spoken with some libc folks about libc semantics in the past, I don't think it will be easy to pin down all the details to the extent we want. I think writing down what you put above is just the first step.

Maybe part of the issue is that I don't see a fundamental reason why libc is special beyond a few key things:

  • some APIs will need -fno-builtin-foo to prevent their implementations from calling themselves.
  • some APIs have well-understood usage that the compiler can leverage (I'd put the memcmp->bcmp optimization in this list, but memcpy/memset are what I think of first)
  • malloc, because of aliasing
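The first bullet is easiest to see with a byte-copy loop. Compilers can recognize a loop like the one below as a memcpy idiom and replace it with a call to memcpy; if this function were the libc's own memcpy, that rewrite would make it call itself, which is why such implementations are usually compiled with -fno-builtin-memcpy (or -ffreestanding, which implies -fno-builtin). toy_memcpy is a hypothetical name chosen so the sketch does not clash with the real symbol.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical libc-style memcpy body. Built without -fno-builtin-memcpy,
// an optimizer may turn this loop back into a call to memcpy; harmless here,
// but infinite recursion if this function were memcpy itself.
void *toy_memcpy(void *dst, const void *src, std::size_t n) {
  assert(n == 0 || (dst != nullptr && src != nullptr));  // debug precondition
  unsigned char *d = static_cast<unsigned char *>(dst);
  const unsigned char *s = static_cast<const unsigned char *>(src);
  for (std::size_t i = 0; i < n; ++i)
    d[i] = s[i];
  return dst;
}
```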

I'm probably neglecting something obvious in that short list, but for most things, I don't think anything special needs to happen. What shouldn't happen, though, is that the compiler deletes a function definition and then reintroduces a call to that function. Maybe that's what you mean by "staying an abstraction past codegen"? I didn't initially read it that way, but in that light I see where you're coming from.

Put another way, I think it's strictly a bug in our phase ordering to allow functions to be deleted if calls to them may be introduced again. Since memcmp/bcmp are special this way (as are the existing libcalls), I guess maybe that's part of the problem. I was under the impression that RuntimeLibcalls was our mechanism for handling that, though.

As for making a libc cooperate with the compiler, perhaps there is a set of attributes we could use (or introduce?). We already have a few of these (attribute malloc comes to mind). Maybe for things marked as being part of libc, we only mark them as dead, but don't collect them until the end. Any new calls emitted would make them alive again. I haven't thought this bit through much yet.
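The "mark dead, collect at the end" idea can be sketched as a small bookkeeping class. This is a hypothetical illustration (DeferredDCE and its methods are invented names, not an existing LLVM mechanism): early dead-code elimination only records candidates, any transform that emits a new call revives the callee, and definitions are erased in one sweep once no further calls can be introduced.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

class DeferredDCE {
public:
  explicit DeferredDCE(std::set<std::string> defined)
      : defined_(std::move(defined)) {}

  // Early DCE: record the definition as a removal candidate, don't delete it.
  void markDead(const std::string &name) {
    assert(defined_.count(name) == 1 && "can only mark an existing definition");
    dead_.insert(name);
  }

  // A transform emitted a new call (e.g. memcmp -> bcmp): revive the callee.
  void noteNewCall(const std::string &name) { dead_.erase(name); }

  // Run once no later pass can introduce calls; only now delete definitions.
  void sweep() {
    for (const std::string &name : dead_)
      defined_.erase(name);
    dead_.clear();
  }

  bool isDefined(const std::string &name) const {
    return defined_.count(name) == 1;
  }

private:
  std::set<std::string> defined_;
  std::set<std::string> dead_;
};
```

For example, after markDead("bcmp"), a later memcmp-to-bcmp rewrite would call noteNewCall("bcmp"), and sweep() then leaves bcmp defined while still removing definitions that stayed unused.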

So, I guess let me try to explain my expectations for how we'd like the compiler to behave when LTOing a program along with libc. Mostly, we don't want the compiler to change its default behavior. So when it sees a call to malloc, the returned pointer is marked noalias, even if the call were inlined. For other memory routines, the compiler can either use its own specialized implementations (like it normally does) or it can inline the call. That assumes the definitions were compiled with something like -fno-builtin-memcpy for the memcpy implementation (you know, so it's functional). For anything that may have a call emitted via compiler transformation, it cannot be DCE'd until we're certain no new calls will be created. In the worst case that means we have to rely on linker GC, but maybe that's acceptable for something as limited as libc. Does that make sense? I have a feeling I'm oversimplifying something in my mental model, but I hope that's at least a reasonable set of goals as a first approximation.

efriedma-quic
Collaborator

We need to enter the "-fno-builtin" world to make interprocedural optimizations with libc safe.

Most optimizations you care about can be preserved in other ways. For example, if malloc called some intrinsic "llvm.allocate_memory"/"llvm.free_memory" to create/destroy provenance, we can preserve most aliasing-related optimizations. If your libc does runtime CPU detection, we can come up with some way to accurately model aliasing on those globals. But we need a different IR representation to make this work; we can't just treat the implementations as opaque.

If you want to run certain optimizations before we enter the "-fno-builtin" world, you need some pass that transitions IR from the "builtins" world to the "nobuiltins" world.

It might be possible for us to invent a "partial-builtin" mode that treats functions which are already called as builtins, but doesn't allow generating calls to functions that aren't already used. That would allow LTO to more accurately compute which libc functions are used. But I'm not sure how useful this would actually be in practice; if you're not LTO'ing libc, the dependencies don't really need to be accurate.
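A "partial-builtin" policy could be expressed as a check that a transform consults before emitting a call. The sketch below is a hypothetical model (PartialBuiltinPolicy and chooseEqualityCompare are invented names): an equality-only memcmp may be rewritten to bcmp only if the module already calls bcmp, so the set of libc functions LTO sees as used cannot grow after symbol resolution.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical policy: libc calls may be optimized as builtins, but a
// transform may only introduce calls to functions the module already uses.
struct PartialBuiltinPolicy {
  std::set<std::string> alreadyCalled;

  bool mayIntroduceCallTo(const std::string &callee) const {
    return alreadyCalled.count(callee) != 0;
  }
};

// Pick the callee for an equality-only memcmp under the policy.
std::string chooseEqualityCompare(const PartialBuiltinPolicy &policy) {
  // memcmp -> bcmp is allowed only when it adds no new dependency.
  return policy.mayIntroduceCallTo("bcmp") ? "bcmp" : "memcmp";
}
```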


There's a smaller set of functions with more subtle ABI rules: those we call even with -fno-builtin. These are mostly listed in RuntimeLibcalls.def. But memcmp is not one of those functions.

Contributor Author

ilovepi commented Apr 16, 2025

Hmm, that's an interesting direction. We were discussing this internally and outlining some ideas along these lines, but I think you've articulated this quite a bit better than we have so far. I really like this idea of a "no-builtins" world, and transitioning the IR. I also find the idea of "partial-builtins" quite compelling, though I agree the usefulness may be limited to scenarios where you're supplying a libc to LTO. Given that we're often dealing with kernel and embedded code, though, I think this is worth exploring more. I plan to discuss this a bit more with my team today, and hopefully write up something a bit more cogent than my earlier rambling. @mysterymath and @frobtech may have more to say as well.

Contributor Author

ilovepi commented Apr 30, 2025

@efriedma-quic We've been discussing the topic of LTOing libc quite a bit internally, and are currently sketching out how this could work. Unsurprisingly, there's quite a lot to think about in both the compiler and linker, and how the two combine in our different versions of LTO, and how that may break in new and fun ways.

I was wondering if you'd be available to join the libc monthly meeting (5/8 9am PST) to discuss your take on the whole idea. I'm not sure what time zone you're normally in, but I think there will be quite a number of folks who are interested in making LTO work well with libcs, and LLVM libc in particular. I know a few of my team members would like to pick your brain on the subject as we're sketching our ideas out. I can try to post a short summary of our thoughts either here or on Discourse to make the discussion a bit easier as well.

efriedma-quic
Collaborator

I can join libc monthly meeting, sure.

Contributor Author

ilovepi commented May 7, 2025

> I can join libc monthly meeting, sure.

Great. Looking forward to discussing this then. I'll add the topic to the meeting agenda.

Labels
llvm:ir LTO Link time optimization (regular/full LTO or ThinLTO)
4 participants