Allow even-parentless workflow spans to always be created #817
name=span_name,
# Always set span attributes as workflow ID and run ID
attributes=attributes,
time_ns=temporalio.workflow.time_ns(),
link_context=link_context_carrier,
exception=exception,
kind=kind,
parent_missing=opentelemetry.trace.get_current_span()
is opentelemetry.trace.INVALID_SPAN,
nit: I'm not familiar enough with Python style. Should the 2nd line here be indented to indicate that these 2 lines are actually 1?
This is what the auto formatter did
We should place parens around expressions in this situation
    id=f"workflow_{uuid.uuid4()}",
    task_queue=worker.task_queue,
)
# Confirm the spans are not there
- # Confirm the spans are not there
+ # Confirm the spans are there
parent_missing=opentelemetry.trace.get_current_span()
is opentelemetry.trace.INVALID_SPAN,
- parent_missing=opentelemetry.trace.get_current_span()
- is opentelemetry.trace.INVALID_SPAN,
+ parent_missing=(
+     opentelemetry.trace.get_current_span()
+     is opentelemetry.trace.INVALID_SPAN
+ ),
# If the parent is missing and user hasn't said to always create, do not
# create
if params.parent_missing and not self._always_create_workflow_spans:
    return None
This line looks like a breaking change, am I getting that wrong?
This logic was previously inside the workflow's form of _completed_span as:
# If there is no span on the context, we do not create a span
if opentelemetry.trace.get_current_span() is opentelemetry.trace.INVALID_SPAN:
    return None
But now that I have to check a parameter from outside the sandbox, I moved the logic to the outside-of-sandbox part instead of the inside-of-sandbox part.
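The relocated check can be sketched in isolation as follows. This is a minimal, stdlib-only illustration and not the SDK's code: SpanParams, SketchInterceptor, and the string return value are hypothetical stand-ins. Only parent_missing and _always_create_workflow_spans are taken from the diff.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpanParams:
    """Hypothetical stand-in for the data passed out of the sandbox."""

    name: str
    # Computed inside the workflow sandbox: True when no span is on the context.
    parent_missing: bool


class SketchInterceptor:
    """Hypothetical stand-in for the outside-of-sandbox side of the interceptor."""

    def __init__(self, always_create_workflow_spans: bool = False) -> None:
        self._always_create_workflow_spans = always_create_workflow_spans

    def completed_span(self, params: SpanParams) -> Optional[str]:
        # If the parent is missing and the user hasn't said to always create,
        # do not create a span.
        if params.parent_missing and not self._always_create_workflow_spans:
            return None
        # A real implementation would start and end an OpenTelemetry span here;
        # the sketch just returns the span name.
        return params.name
```

With the default constructor, a parent-less span is dropped, so the existing behavior is preserved. The span is only kept when the flag is set explicitly.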
workflow spans will not be created. Setting this to true will
create spans in workflows no matter what, but there is a risk of
them being orphans since they may not have a parent span after
replaying.
I'm not swapped into this work.
- Why don't we default to always creating the parent-less spans? Wouldn't that be more useful to users than dropping them?
- The docstring here uses the term "orphan" but couldn't they equally be viewed as roots, originating in the workflow?
- [Just a question, not blocking this PR] Could it make sense to allow tracing to be enabled in the CLI (and maybe even Schedule starter one day) when starting workflows?
> Why don't we default to always creating the parent-less spans? Wouldn't that be more useful to users than dropping them?
Why we don't now: because it was chosen not to originally, and we can't just change it on people. Why we didn't originally: because if you create spans without a parent, you have orphans. If the workflow was cached (i.e. never replayed), the span would just be under RunWorkflow, which is created on non-replay start. But when the workflow is replayed, everything after has no parent, so it is on its own.
> The docstring here uses the term "orphan" but couldn't they equally be viewed as roots, originating in the workflow?
It means spans like StartActivity may or may not have a parent, depending on whether the workflow is running somewhere other than where it first created the RunWorkflow span. People do not expect spans from inside a workflow to be without a parent, in my experience.
> Could it make sense to allow tracing to be enabled in the CLI (and maybe even Schedule starter one day) when starting workflows?
Yes it can, though OTel usually expects people to configure tracers programmatically, not outside of code. But the CLI could definitely accept everything it needs to build https://pkg.go.dev/go.temporal.io/sdk/contrib/opentelemetry#NewTracingInterceptor (basically it'd be whatever was required to build a Go tracer).
To clarify what's happening here: the client-side start workflow creates StartWorkflow, then the first non-replay start creates RunWorkflow and sets that on the context (if there's the StartWorkflow parent), then execute activity creates StartActivity (implicitly parenting to RunWorkflow if it's in this instance, StartWorkflow otherwise). So StartWorkflow is the only stable span available. There has been talk of temporalio/features#394 to help the situation where a span was not created by the starter, but in the meantime Python (unlike some other SDKs) chose not to potentially create orphans by default. This option allows orphans to happen. I hope that's clear.
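The parenting rules described in this comment can be modeled with a small stdlib-only function. This is a hypothetical sketch of the default behavior (option off), not SDK code: start_activity_parent and its arguments are invented for illustration, and spans are represented by their names only.

```python
from typing import Optional


def start_activity_parent(
    start_workflow_span: Optional[str],
    run_workflow_in_this_instance: bool,
) -> Optional[str]:
    """Return the name of the span a StartActivity span would parent to.

    Hypothetical model of the default behavior described in the thread.
    """
    if start_workflow_span is None:
        # The starter created no span, so RunWorkflow was never created
        # either and there is nothing to parent to.
        return None
    if run_workflow_in_this_instance:
        # RunWorkflow was created by this workflow instance (no replay since
        # the first start), so it is still on the context.
        return "RunWorkflow"
    # After a replay, only the propagated StartWorkflow span remains.
    return start_workflow_span
```

The None case is the one the new option addresses: by default no span is created there, while always_create_workflow_spans creates it without a parent.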
Thanks, yes that's very helpful.
> Why don't we default to always creating the parent-less spans? Wouldn't that be more useful to users than dropping them?
> Why we don't now - because it was chosen not to originally and we can't just change on people.
Would that really be a (bad) breaking change? Wouldn't it just mean some new traces show up in their observability platform that didn't before?
> Would that really be a (bad) breaking change?
Yes, I think it'd be a bad breaking change. I also think the existing default is valuable even if we were OK with breaking changes. Orphaned spans not under a parent can cause those looking at the traces for a workflow to miss a span.
> Wouldn't it just mean some new traces show up in their observability platform that didn't before?
Yes, which can clutter a tracing platform. Today people can trust that they're not going to have StartActivity spans flooding the top level of their Jaeger list.
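For users who do want the parent-less spans, opting in might look like the sketch below. It assumes the interceptor is exposed as temporalio.contrib.opentelemetry.TracingInterceptor (the PR description calls it OTelTracingInterceptor). The helper name and the default target host are hypothetical.

```python
from typing import Any


async def connect_with_workflow_spans(target_host: str = "localhost:7233") -> Any:
    """Connect a client whose tracing interceptor always creates workflow spans."""
    # Imported lazily so the sketch can be loaded without the SDK installed;
    # actually running it requires the temporalio package with OpenTelemetry
    # support and a reachable Temporal server.
    from temporalio.client import Client
    from temporalio.contrib.opentelemetry import TracingInterceptor

    # Assumption: the new flag is a keyword argument and defaults to False.
    interceptor = TracingInterceptor(always_create_workflow_spans=True)
    return await Client.connect(target_host, interceptors=[interceptor])
```

Leaving the flag at its default keeps the behavior defended in this thread: no top-level StartActivity spans appear unless the user asks for them.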
What was changed
Added always_create_workflow_spans bool to OTelTracingInterceptor constructor. Docs best describe the situation:
Checklist