[DSE] Mark promise of pre-split coroutine visible to caller #133918

Closed
wants to merge 8 commits into from


Contributor

@NewSigma NewSigma commented Apr 1, 2025

Currently, DSE does not recognize that the coro frame is visible to the caller. It incorrectly eliminates stores right before returning to the caller, even though these values will be used upon resumption. This commit marks the promise of a pre-split coroutine as visible to the caller to avoid incorrect elimination.

Fix #105595

Allocas are destroyed when returning from functions. However, this is not the case for pre-split coroutines. Any premature elimination will lead to side effects.

Fix #123347
@NewSigma
Contributor Author

NewSigma commented Apr 2, 2025

Request code review from @nikic and @ChuanqiXu9

@hstk30-hw hstk30-hw requested review from nikic and ChuanqiXu9 April 2, 2025 12:01
Contributor

@nikic nikic left a comment

Can you add some more detail to the issue description why coroutine semantics require this? This hack looks very problematic to me, and seems quite distinct from the existing coroutine workarounds we have (which are about the possibility of the thread identity changing across suspension points).

@llvmbot
Member

llvmbot commented Apr 3, 2025

@llvm/pr-subscribers-llvm-transforms

Author: None (NewSigma)

Changes

Allocas are destroyed when returning from functions. However, this is not the case for pre-split coroutines, because coroutine frame should be visible to caller. For example, one can write to the coroutine's promise, suspend, and later read from the caller. Eliminating such stores would introduce side effects.

This commit forces that all allocas of pre-split coroutines remain visible to the caller. While this may miss some optimization opportunities, correctness takes priority. Future work could analyze the lifetimes of allocas if performance regressions become significant.

Fix #123347


Full diff: https://github.com/llvm/llvm-project/pull/133918.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp (+3-1)
  • (added) llvm/test/Transforms/DeadStoreElimination/coro-alloca.ll (+33)
diff --git a/llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp b/llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
index 935f21fd484f3..780b64e70136f 100644
--- a/llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
+++ b/llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
@@ -1194,7 +1194,9 @@ struct DSEState {
 
   bool isInvisibleToCallerAfterRet(const Value *V) {
     if (isa<AllocaInst>(V))
-      return true;
+      // Defer alloca store elimination, wait for CoroSplit
+      return !F.isPresplitCoroutine();
+
     auto I = InvisibleToCallerAfterRet.insert({V, false});
     if (I.second) {
       if (!isInvisibleToCallerOnUnwind(V)) {
diff --git a/llvm/test/Transforms/DeadStoreElimination/coro-alloca.ll b/llvm/test/Transforms/DeadStoreElimination/coro-alloca.ll
new file mode 100644
index 0000000000000..ec9dc84f2c4ae
--- /dev/null
+++ b/llvm/test/Transforms/DeadStoreElimination/coro-alloca.ll
@@ -0,0 +1,33 @@
+; Test that store-load operation that crosses suspension point will not be eliminated by DSE before CoroSplit
+; RUN: opt < %s -passes='dse' -S | FileCheck %s
+
+define void @fn(ptr align 8 %arg) presplitcoroutine {
+  %promise = alloca ptr, align 8
+  %awaiter = alloca i8, align 1
+  %id = call token @llvm.coro.id(i32 16, ptr %promise, ptr @fn, ptr null)
+  %hdl = call ptr @llvm.coro.begin(token %id, ptr null)
+  %mem = call ptr @malloc(i64 1)
+  call void @llvm.lifetime.start.p0(i64 8, ptr %promise)
+  store ptr %mem, ptr %promise, align 8
+  %save = call token @llvm.coro.save(ptr null)
+  call void @llvm.coro.await.suspend.void(ptr %awaiter, ptr %hdl, ptr @await_suspend_wrapper_void)
+  %sp = call i8 @llvm.coro.suspend(token %save, i1 false)
+  %flag = icmp ule i8 %sp, 1
+  br i1 %flag, label %resume, label %suspend
+
+resume:
+  call void @llvm.lifetime.end.p0(i64 8, ptr %promise)
+  br label %suspend
+
+suspend:
+  call i1 @llvm.coro.end(ptr null, i1 false, token none)
+  %temp = load ptr, ptr %promise, align 8
+  store ptr %temp, ptr %arg, align 8
+; store when suspend, load when resume
+; CHECK: store ptr null, ptr %promise, align 8
+  store ptr null, ptr %promise, align 8
+  ret void
+}
+
+declare ptr @malloc(i64)
+declare void @await_suspend_wrapper_void(ptr, ptr)

@NewSigma
Contributor Author

NewSigma commented Apr 3, 2025

Thanks. I updated my issue description.

@ChuanqiXu9
Member

Allocas are destroyed when returning from functions. However, this is not the case for pre-split coroutines, because coroutine frame should be visible to caller. For example, one can write to the coroutine's promise, suspend, and later read from the caller. Eliminating such stores would introduce side effects.

Could you elaborate on this? E.g., give an example to describe why it is problematic. I didn't understand it.

This commit forces that all allocas of pre-split coroutines remain visible to the caller. While this may miss some optimization

While I didn't understand the problem, my instinctive reaction is that, even if we want to do something like this, maybe we can do it only for special allocas, like the promise alloca.


github-actions bot commented Apr 3, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@NewSigma
Contributor Author

NewSigma commented Apr 3, 2025

Could you elaborate on this? E.g., give an example to describe why it is problematic. I didn't understand it.

It seems that DSE does not recognize that the coro frame is visible to the caller. It incorrectly eliminates stores right before returning to the caller, even though these values will be used upon resumption.

While I didn't understand the problem, my instinctive reaction is that, even if we want to do something like this, maybe we can do it only for special allocas, like the promise alloca.

Yes, this is a safer choice.
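
To make the difference between the two approaches concrete, here is a minimal toy sketch in C++ (compile as C++11 or later). It is not LLVM code: `ToyValue`, `ToyFunction`, the `isCoroPromise` flag, and both function names are hypothetical stand-ins, and the real `isInvisibleToCallerAfterRet` performs further checks for non-alloca values that this sketch omits.

```cpp
#include <cassert>

// Toy stand-ins for LLVM's Value and Function; NOT the real LLVM classes.
struct ToyValue {
  bool isAlloca;
  bool isCoroPromise; // hypothetical flag: the alloca passed to llvm.coro.id
};

struct ToyFunction {
  bool isPresplitCoroutine;
};

// First revision of the patch: in a pre-split coroutine, no alloca is
// treated as invisible to the caller after return.
bool invisibleAfterRetAllAllocas(const ToyFunction &F, const ToyValue &V) {
  return V.isAlloca && !F.isPresplitCoroutine;
}

// Narrower variant suggested in review: only the promise alloca of a
// pre-split coroutine stays visible; other allocas are handled as before.
bool invisibleAfterRetPromiseOnly(const ToyFunction &F, const ToyValue &V) {
  if (!V.isAlloca)
    return false; // the real code continues with other checks here
  return !(F.isPresplitCoroutine && V.isCoroPromise);
}

// Small self-check contrasting the two predicates on an ordinary alloca.
inline void selfCheckPredicates() {
  ToyFunction coro{true};
  ToyValue awaiter{true, false};
  assert(!invisibleAfterRetAllAllocas(coro, awaiter));
  assert(invisibleAfterRetPromiseOnly(coro, awaiter));
}
```

Under this model, the narrower predicate still lets DSE remove end-of-function stores to ordinary allocas such as `%awaiter`, while stores to the promise are kept.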

@NewSigma NewSigma changed the title [DSE] Defer alloca store elimination for CoroSplit [DSE] Mark promise of pre-split coroutine visible to caller Apr 3, 2025
@NewSigma
Contributor Author

NewSigma commented Apr 7, 2025

I will try to elaborate on my concerns about the issue. Hope you will understand. :)

Consider the following example:

resume:
; Do something with %promise
  br label %suspend
suspend:
  store ptr null, ptr %promise, align 8 ; Do not eliminate
  ret void

DSE will treat the function as a normal function, and stores just before 'ret' will be eliminated. This is done by eliminateDeadWritesAtEndOfFunction() in DSE, whose comment reads:

Eliminate writes to objects that are not visible in the caller and are not accessed before returning from the function.
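
As a rough illustration of what that comment describes, the following toy model in C++ walks a straight-line instruction list backwards from the return. It is a sketch under simplifying assumptions, not LLVM's MemorySSA-based implementation: `Inst`, `contains`, and `eliminateDeadWritesAtEnd` are hypothetical names, and objects are identified by plain strings.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Toy instruction: a store to, or a load from, a named memory object.
struct Inst {
  enum Kind { Store, Load } kind;
  std::string object;
};

static bool contains(const std::vector<std::string> &names,
                     const std::string &name) {
  return std::find(names.begin(), names.end(), name) != names.end();
}

// Scan backwards from the return. A store is dropped when its object is
// not visible to the caller and no later instruction reads that object.
std::vector<Inst> eliminateDeadWritesAtEnd(
    const std::vector<Inst> &body,
    const std::vector<std::string> &visibleToCaller) {
  std::vector<std::string> readLater; // objects read after the current point
  std::vector<Inst> keptReversed;
  for (auto it = body.rbegin(); it != body.rend(); ++it) {
    if (it->kind == Inst::Load) {
      readLater.push_back(it->object);
      keptReversed.push_back(*it);
      continue;
    }
    bool dead = !contains(visibleToCaller, it->object) &&
                !contains(readLater, it->object);
    if (!dead)
      keptReversed.push_back(*it);
  }
  return std::vector<Inst>(keptReversed.rbegin(), keptReversed.rend());
}

// Small self-check: a lone trailing store to an invisible object is removed.
inline void selfCheckToyDse() {
  std::vector<Inst> body = {{Inst::Store, "promise"}};
  assert(eliminateDeadWritesAtEnd(body, {}).empty());
}
```

In this model, treating `promise` as invisible removes the final store before the return, which is the behavior reported here. Listing it in `visibleToCaller` keeps the store.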

I have two questions:

  1. What does 'end of function' mean when it comes to pre-split coroutines? Does the function end each time the coroutine suspends, or when the coroutine reaches the final suspend?
  2. Is coroutine promise invisible to the caller, even indirectly?

If we consider that the function ends each time the coroutine suspends, then eliminateDeadWritesAtEndOfFunction() cannot be applied to pre-split coroutines and should be disabled.
If we consider that the function ends when the coroutine reaches the final suspend point, we can analyze potential memory usage after the coroutine resumes and avoid mis-optimization. However, I believe it is more appropriate to do this after CoroSplit. After CoroSplit, the original example may look like:

suspend:
  store ptr null, ptr %promise, align 8 ; Do not eliminate
  ret void
resume:
; Do something with %promise
  store ptr null, ptr %promise, align 8 ; Should eliminate
  ret void

Eager elimination introduces additional compilation costs and will not lead to more optimized code.

Personally, I prefer to consider the pre-split coroutine promise visible to the caller, as this is safer than simply disabling eliminateDeadWritesAtEndOfFunction() for pre-split coroutines. However, this depends on your understanding of coroutine semantics.

This hack looks very problematic to me, and seems quite distinct from the existing coroutine workarounds we have (which are about the possibility of the thread identity changing across suspension points).

Since the issue has nothing to do with thread_local storage or readnone functions, I consider it essentially different from the thread-identity problem.

@ChuanqiXu9
Member

I have two questions:

  1. What does 'end of function' mean when it comes to pre-split coroutines? Does the function end each time the coroutine suspends, or when the coroutine reaches the final suspend?
  2. Is coroutine promise invisible to the caller, even indirectly?

  1. It means the coroutine reaches the final suspend. (or it exits by exception, of course)
  2. Coroutine promise is visible to the caller.

Consider the following example:

resume:
; Do something with %promise
  br label %suspend
suspend:
  store ptr null, ptr %promise, align 8 ; Do not eliminate
  ret void

Out of curiosity, what's the corresponding pattern in C++ side? I mean, how can we see such pattern before coro-split?

@NewSigma
Contributor Author

NewSigma commented Apr 8, 2025

Out of curiosity, what's the corresponding pattern in C++ side? I mean, how can we see such pattern before coro-split?

A conversion function on the coroutine's result object that attempts to modify the promise will produce this pattern.

@ChuanqiXu9
Member

Out of curiosity, what's the corresponding pattern in C++ side? I mean, how can we see such pattern before coro-split?

A conversion function on the coroutine's result object that attempts to modify the promise will produce this pattern.

Makes sense.

Member

@ChuanqiXu9 ChuanqiXu9 left a comment

LGTM. Please leave a few days to give @nikic a chance to take a look.

@nikic
Contributor

nikic commented Apr 10, 2025

I don't have time to look deeply into this right now, but the change looks very concerning to me. Allocas becoming dead at the end of the function is a very core property of allocas. If it does not hold for the promise alloca, it probably should not be an alloca? Is it possible to use a different IR representation?

@ChuanqiXu9
Member

I don't have time to look deeply into this right now, but the change looks very concerning to me. Allocas becoming dead at the end of the function is a very core property of allocas. If it does not hold for the promise alloca, it probably should not be an alloca? Is it possible to use a different IR representation?

It makes sense to use a different IR representation to address the concern. But I don't have a concrete plan, and I feel we lack the people to do it right now. Maybe we can mark this as an issue or a bug and ask for volunteers.

As for the problem itself, the real issue may be the return of get_return_object. It returns a value to the caller at the initial suspend, but from the coroutine's own perspective the function has not finished. So semantically I feel it is fine to make the promise an alloca: it is dead at the end of the function's lifetime. But a return does not imply the end in coroutines.

I don't have a solution in mind right now. I feel the problem is somewhat fundamental. For the patch itself, I feel it might be better to add a FIXME and land it to stop the bleeding. WDYT?

@NewSigma
Contributor Author

Perhaps we can come up with a better solution. Let's close it for now.

@NewSigma NewSigma closed this Apr 17, 2025
Development

Successfully merging this pull request may close these issues.

[clang][coroutines] Run-time crash with optimization when using coroutine with co_await
4 participants