[Feature Request] Disabling Deadlock detection in data converters #823
Comments
That feature is in the Go and Java SDKs because they don't support a clear separation of payload codecs and payload converters, but other SDKs do. Deadlock detection is already disabled for the payload codec part of the data converter (because it runs outside the sandbox), just not the payload converter part. This is intentional. You should not do any heavy/IO/async work inside a payload converter. Rather, do it in a payload codec. Sometimes a custom payload codec can be combined with a custom payload converter so the converter can set some info in payload metadata that the codec needs. Does this help? If not, can you share your use case for why deadlock detection needs to be disabled for your payload converter instead?
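To illustrate the converter/codec split described above, here is a minimal sketch using only the standard library. The `Payload` dataclass, `MarkingConverter`, `CompressionCodec`, and the `x-compress` / `x-compressed` metadata keys are all hypothetical stand-ins and are not imported from the SDK. They only mirror the general shape: a synchronous converter that stays cheap, and an async codec that works on a sequence of payloads. The converter leaves a hint in metadata and the codec does the heavier step.

```python
import asyncio
import json
import zlib
from dataclasses import dataclass, field
from typing import Any, Dict, List, Sequence


@dataclass
class Payload:
    # Stand-in for the SDK's payload message: a metadata map plus raw bytes.
    metadata: Dict[str, bytes] = field(default_factory=dict)
    data: bytes = b""


class MarkingConverter:
    """Hypothetical payload converter: cheap, synchronous, no IO."""

    def to_payload(self, value: Any) -> Payload:
        data = json.dumps(value).encode()
        metadata = {"encoding": b"json/plain"}
        # Only leave a hint for the codec; the heavy work happens there.
        if len(data) > 1024:
            metadata["x-compress"] = b"zlib"
        return Payload(metadata=metadata, data=data)

    def from_payload(self, payload: Payload) -> Any:
        return json.loads(payload.data)


class CompressionCodec:
    """Hypothetical payload codec: async, so slower work is acceptable here."""

    async def encode(self, payloads: Sequence[Payload]) -> List[Payload]:
        out: List[Payload] = []
        for p in payloads:
            if p.metadata.get("x-compress") == b"zlib":
                # Run the CPU-heavy step in a worker thread.
                data = await asyncio.to_thread(zlib.compress, p.data)
                p = Payload(metadata={**p.metadata, "x-compressed": b"1"}, data=data)
            out.append(p)
        return out

    async def decode(self, payloads: Sequence[Payload]) -> List[Payload]:
        out: List[Payload] = []
        for p in payloads:
            if p.metadata.get("x-compressed") == b"1":
                data = await asyncio.to_thread(zlib.decompress, p.data)
                meta = {k: v for k, v in p.metadata.items() if k != "x-compressed"}
                p = Payload(metadata=meta, data=data)
            out.append(p)
        return out
```

In a real worker the two classes would be subclasses of the SDK's converter and codec base types and wired into the data converter; that wiring is left out here on purpose.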
Got it. Thanks @cretz. I've written a custom data converter which serializes generics and pydantic objects itself. This normally takes no more than 100-200ms, but for large payloads I get this 2s deadlock error. I've given my worker pod 1 CPU core, so it's definitely not a scheduling issue. Trying to understand if something else is the issue here.
Yes, there may be something else causing deadlock detection. Deadlock detection just means a workflow task is not processing within 2s. This could mean Python is not able to process code quickly enough, or something like a spinning loop or accidental thread blocking in the workflow. It could just be a simple case of too many things on one core to process workflow tasks fast enough.
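One way to narrow this down is to time the serialization step on a representative large payload outside the worker and compare it with the 2s window. The sketch below assumes a hypothetical `my_serialize` function standing in for the custom converter's serialization; `time_call` and the sample data are illustrative as well.

```python
import json
import time
from typing import Any, Callable

DEADLOCK_BUDGET_SECONDS = 2.0  # threshold mentioned in this thread


def time_call(fn: Callable[[Any], Any], value: Any, repeats: int = 5) -> float:
    """Return the slowest wall-clock duration of fn(value) over several runs."""
    worst = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        fn(value)
        worst = max(worst, time.perf_counter() - start)
    return worst


def my_serialize(value: Any) -> bytes:
    # Placeholder for the custom converter's serialization step.
    return json.dumps(value).encode()


sample = {"rows": [{"id": i, "name": f"row-{i}"} for i in range(10_000)]}
worst = time_call(my_serialize, sample)
print(f"worst case: {worst * 1000:.1f} ms of a {DEADLOCK_BUDGET_SECONDS:.0f} s budget")
```

If the worst case measured this way is far below the budget, the time is probably going somewhere else in the workflow task rather than into the converter.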
Thanks @cretz. I'm moving my data converter code into the codec to ensure I have enough time to serialize. One quick question before I close the issue: can you point me to the logic that whitelists the codec from the sandbox? I tried digging in the SDK, but couldn't find anything specific.
It's not whitelisted per se, it just runs before/after the sandbox. So we decode at sdk-python/temporalio/worker/_workflow.py, lines 251 to 254 in e360398, and encode at sdk-python/temporalio/worker/_workflow.py, lines 337 to 341 in e360398.
So the logic for codecs is on the boundaries, not within the task processing. It is of course still subject to task timeouts (default 10s) so we recommend trying to make it as fast as possible (e.g. if using a KMS for encryption keys, try to cache keys locally where it makes sense instead of downloading every time).
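As a rough sketch of the key-caching suggestion: the `CachedKeyProvider` below keeps fetched keys in memory for a TTL so most codec calls never reach the KMS. Every name here is hypothetical, and `FakeKms.fetch` is a stand-in for whatever KMS client call is actually used. For simplicity there is no locking, so two concurrent misses for the same key may both fetch.

```python
import asyncio
import time
from typing import Awaitable, Callable, Dict, Tuple


class CachedKeyProvider:
    """Caches keys in memory for ttl_seconds so repeat lookups skip the KMS."""

    def __init__(
        self,
        fetch: Callable[[str], Awaitable[bytes]],
        ttl_seconds: float = 300.0,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        # key_id -> (time the key was fetched, key bytes)
        self._cache: Dict[str, Tuple[float, bytes]] = {}

    async def get(self, key_id: str) -> bytes:
        hit = self._cache.get(key_id)
        if hit is not None and time.monotonic() - hit[0] < self._ttl:
            return hit[1]
        # Cache miss or expired entry: go to the KMS and remember the result.
        key = await self._fetch(key_id)
        self._cache[key_id] = (time.monotonic(), key)
        return key


class FakeKms:
    """Stand-in for a real KMS client; counts how often it is called."""

    def __init__(self) -> None:
        self.calls = 0

    async def fetch(self, key_id: str) -> bytes:
        self.calls += 1
        await asyncio.sleep(0)  # pretend this is a network round trip
        return f"key-for-{key_id}".encode()


async def main() -> None:
    kms = FakeKms()
    provider = CachedKeyProvider(kms.fetch, ttl_seconds=60)
    await provider.get("tenant-a")
    await provider.get("tenant-a")  # served from the cache
    print(f"KMS calls: {kms.calls}")


if __name__ == "__main__":
    asyncio.run(main())
```

A codec's `encode`/`decode` would call `provider.get(...)` instead of hitting the KMS directly; the TTL is a trade-off between latency and how quickly key rotation is picked up.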
Looking to get a workflow.disabledeadlockdetection context manager similar to other languages for data converters:
https://github.com/temporalio/rules/blob/main/rules/TMPRL1101.md