💥 [Breaking] Asyncify slot suppliers #2433
temporal-sdk/src/main/java/io/temporal/worker/tuning/SlotSupplier.java
temporal-sdk/src/main/java/io/temporal/worker/tuning/ResourceBasedSlotSupplier.java
    .orElseGet(
        () ->
            CompletableFuture.supplyAsync(() -> null, delayedExecutor(10))
                .thenCompose(ig -> scheduleSlotAcquisition(ctx)));
ChatGPT is telling me that cancellation does not propagate across `orElseGet` or `thenCompose` operators, but that doesn't seem right and I can't find this in the docs at https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/CompletionStage.html. May need a test somehow confirming that `cancel` after this `orElseGet` does actually prevent the delayed call from being invoked. If it doesn't work, I can make some suggestions I think.
I think this works fine because the returned future is `permitFuture`, which is `thenCompose`d with the future from here. Added a test that confirms.
My concern is that ChatGPT says that a cancel of the outer future does not propagate the cancel to the `thenCompose`d future. Not sure I believe it because I haven't tested, but it does concern me. I would have to test Java's cancel behavior when the `thenCompose` has already run to create a new future. I may set aside some time to do this.
But after reading https://stackoverflow.com/questions/25417881/canceling-a-completablefuture-chain and poking around, I fear there may need to be some other mechanism to make sure that cancelling the outer completable future also cancels the delayed executor. I couldn't tell if the test was doing that. I wouldn't be surprised if cancel is not hierarchical.
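The behavior in question can be checked in isolation with plain `java.util.concurrent`. The following is a minimal sketch, not code from this PR: the class name `CancelPropagationDemo` and the two stand-in futures are hypothetical. It composes a pending inner future through `thenCompose`, cancels the returned outer future, and reports the state of both.

```java
import java.util.concurrent.CompletableFuture;

class CancelPropagationDemo {
  /** Returns {outerCancelled, innerCancelled, innerDone} after cancelling the composed future. */
  static boolean[] cancelOuter() {
    // Already-completed stage: stands in for the delay having elapsed (assumption).
    CompletableFuture<Void> delay = CompletableFuture.completedFuture(null);
    // Pending stage: stands in for a slot acquisition that has not finished yet (assumption).
    CompletableFuture<String> inner = new CompletableFuture<>();

    // thenCompose runs immediately because `delay` is complete, so `outer` now depends on `inner`.
    CompletableFuture<String> outer = delay.thenCompose(ignored -> inner);

    // Cancelling `outer` completes only `outer`; `inner` is left untouched.
    outer.cancel(true);
    return new boolean[] {outer.isCancelled(), inner.isCancelled(), inner.isDone()};
  }

  public static void main(String[] args) {
    boolean[] r = cancelOuter();
    System.out.println("outer cancelled: " + r[0]); // true
    System.out.println("inner cancelled: " + r[1]); // false
    System.out.println("inner done:      " + r[2]); // false
  }
}
```

In this sketch the inner future stays pending after the outer one is cancelled, which is the non-hierarchical behavior the comment worries about. Whether that matters for the code under review depends on which future the worker actually holds and cancels.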
Canceling a completable future is not a good way to tell the producer that the result is no longer needed; it only tells the consumer.
So the standard library would use `Future` for this, like here: https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ScheduledFuture.html. The other option is to return a special interface like `SlotSupplierFuture` that implements the `CompletableFuture` interface and adds a different "cancel" method that actually does what you want.
I've dealt with this by returning a future and using an anonymous class in the one spot (two counting test code) where I actually need another stage. It works fine since the "callback" (incrementing the counter in the tracking slot supplier) only ever needs to be called if `get` is actually invoked, and it always is anyway.
Can there be an example of how a user who is implementing a slot supplier can react to reservation cancellation? Or maybe that one already is. My fear is they'll use naive futures and it'll hang shutdown. Which of course is their fault; I just want to make sure we clarify (maybe via test, but for sure via docs) that if they want to handle cancellation of reserve slot, they may need to provide their own future implementation that reacts to the `cancel` call properly.
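One possible shape for such a future is sketched below. This is an illustration under stated assumptions rather than the SDK's API: `CancellableSlotFuture`, `reserveLater`, and the `String` result standing in for a permit are all hypothetical names. The idea is a `CompletableFuture` subclass whose `cancel` also cancels the scheduled task that would have produced the result.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

class CancelAwareReservation {
  /** Hypothetical future whose cancel() also cancels the scheduled producer task. */
  static final class CancellableSlotFuture<T> extends CompletableFuture<T> {
    private volatile ScheduledFuture<?> task;

    void attach(ScheduledFuture<?> scheduled) {
      this.task = scheduled;
      // Handle the case where cancel() was called before the task was attached.
      if (isCancelled()) scheduled.cancel(true);
    }

    @Override
    public boolean cancel(boolean mayInterruptIfRunning) {
      boolean cancelled = super.cancel(mayInterruptIfRunning);
      ScheduledFuture<?> scheduled = task;
      if (scheduled != null) scheduled.cancel(mayInterruptIfRunning);
      return cancelled;
    }
  }

  /** Schedules `producer` after `delayMs` and returns a future that stops it when cancelled. */
  static <T> CancellableSlotFuture<T> reserveLater(
      ScheduledExecutorService scheduler, long delayMs, Supplier<T> producer) {
    CancellableSlotFuture<T> result = new CancellableSlotFuture<>();
    ScheduledFuture<?> scheduled =
        scheduler.schedule(() -> { result.complete(producer.get()); }, delayMs, TimeUnit.MILLISECONDS);
    result.attach(scheduled);
    return result;
  }

  /** Cancels a pending reservation, waits past its delay, and reports whether the producer ran. */
  static boolean producerRanAfterCancel() {
    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    try {
      AtomicBoolean ran = new AtomicBoolean(false);
      CancellableSlotFuture<String> future =
          reserveLater(scheduler, 200, () -> { ran.set(true); return "permit"; });
      future.cancel(true);
      Thread.sleep(400); // wait past the original delay
      return ran.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      scheduler.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println("producer ran after cancel: " + producerRanAfterCancel()); // false
  }
}
```

Note that this only helps when `cancel` is invoked on the `CancellableSlotFuture` itself. A stage derived from it with `thenApply` or `thenCompose` is a separate future, and cancelling that derived stage does not reach the scheduled task.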
I think the existing docstring is reasonably clear on that front. I could add a warning about using `CompletableFuture` specifically, so that anyone who does is aware of how it can go wrong (which wasn't obvious to me at least, so maybe that's a good idea), but I suppose otherwise it's like any other `Future`-returning Java API.
> I suppose otherwise it's like any other Future returning Java API

Yup, but still a bit confusing because most don't manually implement `cancel`. I propose we alter the javadoc to change from:
These futures may be cancelled if the worker is shutting down or otherwise abandons the reservation. This can cause an {@link InterruptedException} to be thrown, in the thread running your implementation. You may want to catch it to perform any necessary cleanup, and then you should rethrow the exception.
To:
These futures may get `cancel` invoked on them if the worker is shutting down or otherwise abandons the reservation. Users need to make sure they handle `cancel` on the resulting future. By default in Java, futures created by users or composed from other futures will not properly propagate `cancel`, so users need to make sure this is handled properly.
Can still add a note about the exception if we do handle that specially. Regardless, non-blocking.
temporal-sdk/src/main/java/io/temporal/worker/tuning/SlotSupplier.java
 * This function is called before polling for new tasks. Your implementation should return a
 * Promise that is completed with a {@link SlotPermit} when one becomes available.
 *
 * <p>These futures may be cancelled if the worker is shutting down or otherwise abandons the
> These futures may be cancelled if the worker is shutting down or otherwise abandons the

You are aware that `CompletableFuture` cancellation does not propagate upstream, correct?
temporal-sdk/src/main/java/io/temporal/internal/worker/ActivityPollTask.java
temporal-sdk/src/main/java/io/temporal/internal/worker/TrackingSlotSupplier.java
temporal-sdk/src/main/java/io/temporal/internal/worker/TrackingSlotSupplier.java
b3ee0fa to 6dbf73d
Looks like there can be some reserve/release mismatches. Investigating that.
Actually I've just realized this ends up being a huge mess because anything like
Why is that an issue?
6dbf73d to a76b6e8
I suppose it's not really
3627e75 to cb46d4b
- * This function is called before polling for new tasks. Your implementation should block until a
- * slot is available then return a permit to use that slot.
+ * This function is called before polling for new tasks. Your implementation should return a
+ * Promise that is completed with a {@link SlotPermit} when one becomes available.
- * Promise that is completed with a {@link SlotPermit} when one becomes available.
+ * Future that is completed with a {@link SlotPermit} when one becomes available.
4d6e47c to cdd2c1c
What was changed
Made the `reserveSlot` method on the `SlotSupplier` interface async.
Note that, as it stands, this can slightly increase the number of threads used: the slot suppliers no longer block the caller (which is the whole point), but to achieve that, the resource-based one at least needs a couple of threads that it didn't need before.
For the fixed size supplier, I think I've managed to come up with something that should always be non-blocking.
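For illustration only, one way a fixed-size supplier can avoid ever waiting for a slot is to hand out pending futures and complete them on release. The sketch below is an assumption about the general technique, not the implementation in this PR; `FixedSizeAsyncSlots` and `Permit` are hypothetical names, with `Permit` standing in for `SlotPermit`.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;

class FixedSizeAsyncSlots {
  /** Hypothetical stand-in for the SDK's SlotPermit. */
  static final class Permit {}

  private final Queue<CompletableFuture<Permit>> waiters = new ArrayDeque<>();
  private int available;

  FixedSizeAsyncSlots(int size) {
    this.available = size;
  }

  /** Never waits for a slot: returns a completed future if one is free, else a pending one. */
  CompletableFuture<Permit> reserve() {
    synchronized (this) {
      if (available > 0) {
        available--;
        return CompletableFuture.completedFuture(new Permit());
      }
      CompletableFuture<Permit> waiter = new CompletableFuture<>();
      waiters.add(waiter);
      return waiter;
    }
  }

  /** Hands the freed slot to the oldest waiter still interested, else returns it to the pool. */
  void release() {
    while (true) {
      CompletableFuture<Permit> next;
      synchronized (this) {
        next = waiters.poll();
        if (next == null) {
          available++;
          return;
        }
      }
      // Complete outside the lock so dependent callbacks do not run while holding it.
      // complete() returns false if the waiter was already cancelled; then try the next one.
      if (next.complete(new Permit())) {
        return;
      }
    }
  }

  public static void main(String[] args) {
    FixedSizeAsyncSlots slots = new FixedSizeAsyncSlots(1);
    CompletableFuture<Permit> first = slots.reserve();
    CompletableFuture<Permit> second = slots.reserve();
    System.out.println("first done:  " + first.isDone());  // true
    System.out.println("second done: " + second.isDone()); // false
    slots.release();
    System.out.println("second done after release: " + second.isDone()); // true
  }
}
```

Because a cancelled waiter is simply skipped on release, a plain `CompletableFuture` is enough here: there is no background work to stop, so cancellation never leaks a slot in this sketch.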
Why?
This is to support #1456 where the pollers themselves will be made async, and thus will be able to take advantage of async slot reservation as well.
Checklist
Closes
How was this tested:
Existing tests
Any docs updates needed?