Allow even-parentless workflow spans to always be created #817
Original file line number | Diff line number | Diff line change | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
@@ -74,12 +74,22 @@ class should return the workflow interceptor subclass from
     def __init__(
         self,
         tracer: Optional[opentelemetry.trace.Tracer] = None,
+        *,
+        always_create_workflow_spans: bool = False,
     ) -> None:
         """Initialize a OpenTelemetry tracing interceptor.

         Args:
             tracer: The tracer to use. Defaults to
                 :py:func:`opentelemetry.trace.get_tracer`.
+            always_create_workflow_spans: When false, the default, spans are
+                only created in workflows when an overarching span from the
+                client is present. In cases of starting a workflow elsewhere,
+                e.g. CLI or schedules, a client-created span is not present and
+                workflow spans will not be created. Setting this to true will
+                create spans in workflows no matter what, but there is a risk of
+                them being orphans since they may not have a parent span after
+                replaying.
         """
         self.tracer = tracer or opentelemetry.trace.get_tracer(__name__)
         # To customize any of this, users must subclass. We intentionally don't
@@ -90,6 +100,7 @@ def __init__(
         self.text_map_propagator: opentelemetry.propagators.textmap.TextMapPropagator = default_text_map_propagator
         # TODO(cretz): Should I be using the configured one at the client and activity level?
         self.payload_converter = temporalio.converter.PayloadConverter.default
+        self._always_create_workflow_spans = always_create_workflow_spans

     def intercept_client(
         self, next: temporalio.client.OutboundInterceptor
@@ -165,10 +176,15 @@ def _start_as_current_span(

     def _completed_workflow_span(
         self, params: _CompletedWorkflowSpanParams
-    ) -> _CarrierDict:
+    ) -> Optional[_CarrierDict]:
         # Carrier to context, start span, set span as current on context,
         # context back to carrier

+        # If the parent is missing and user hasn't said to always create, do not
+        # create
+        if params.parent_missing and not self._always_create_workflow_spans:
+            return None
This line looks like a breaking change, am I getting that wrong?

This logic was inside the workflow's form of
    # If there is no span on the context, we do not create a span
    if opentelemetry.trace.get_current_span() is opentelemetry.trace.INVALID_SPAN:
        return None
but now that I have to check a parameter from outside the sandbox, I moved the logic to the outside-of-sandbox part instead of the inside-of-sandbox part.
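The gate described in this thread can be sketched without Temporal or OpenTelemetry installed. Everything below is a hypothetical stand-in (`SketchInterceptor`, `SketchParams`, and the `"span"` carrier key are not part of the SDK); it only illustrates the split where the workflow side reports `parent_missing` and the outside-of-sandbox side decides whether a span is created.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SketchParams:
    # Hypothetical stand-in for _CompletedWorkflowSpanParams
    name: str
    parent_missing: bool
    context: Dict[str, str] = field(default_factory=dict)


class SketchInterceptor:
    # Hypothetical stand-in for the outside-of-sandbox interceptor
    def __init__(self, *, always_create_workflow_spans: bool = False) -> None:
        self._always_create_workflow_spans = always_create_workflow_spans

    def completed_workflow_span(
        self, params: SketchParams
    ) -> Optional[Dict[str, str]]:
        # Same gate as in the diff: skip when there is no parent and the user
        # has not opted in to always creating spans.
        if params.parent_missing and not self._always_create_workflow_spans:
            return None
        # The real code starts and ends a span here; this sketch only records
        # the span name in a copy of the carrier.
        carrier = dict(params.context)
        carrier["span"] = params.name
        return carrier


default = SketchInterceptor()
opted_in = SketchInterceptor(always_create_workflow_spans=True)
orphan = SketchParams(name="RunWorkflow:SimpleWorkflow", parent_missing=True)
print(default.completed_workflow_span(orphan))
print(opted_in.completed_workflow_span(orphan))
```

Returning `None` rather than an empty carrier is what lets the caller distinguish "no span was created" from "a span was created", which is why the return type in the diff becomes `Optional[_CarrierDict]`.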
+
         # Extract the context
         context = self.text_map_propagator.extract(params.context)
         # Create link if there is a span present
@@ -286,7 +302,7 @@ class _InputWithHeaders(Protocol):

 class _WorkflowExternFunctions(TypedDict):
     __temporal_opentelemetry_completed_span: Callable[
-        [_CompletedWorkflowSpanParams], _CarrierDict
+        [_CompletedWorkflowSpanParams], Optional[_CarrierDict]
     ]


@@ -299,6 +315,7 @@ class _CompletedWorkflowSpanParams:
     link_context: Optional[_CarrierDict]
     exception: Optional[Exception]
     kind: opentelemetry.trace.SpanKind
+    parent_missing: bool


 _interceptor_context_key = opentelemetry.context.create_key(
@@ -529,17 +546,13 @@ def _completed_span(
         exception: Optional[Exception] = None,
         kind: opentelemetry.trace.SpanKind = opentelemetry.trace.SpanKind.INTERNAL,
     ) -> None:
-        # If there is no span on the context, we do not create a span
-        if opentelemetry.trace.get_current_span() is opentelemetry.trace.INVALID_SPAN:
-            return None
-
         # If we are replaying and they don't want a span on replay, no span
         if temporalio.workflow.unsafe.is_replaying() and not new_span_even_on_replay:
             return None

         # Create the span. First serialize current context to carrier.
-        context_carrier: _CarrierDict = {}
-        self.text_map_propagator.inject(context_carrier)
+        new_context_carrier: _CarrierDict = {}
+        self.text_map_propagator.inject(new_context_carrier)
         # Invoke
         info = temporalio.workflow.info()
         attributes: Dict[str, opentelemetry.util.types.AttributeValue] = {
@@ -548,25 +561,27 @@ def _completed_span(
         }
         if additional_attributes:
             attributes.update(additional_attributes)
-        context_carrier = self._extern_functions[
+        updated_context_carrier = self._extern_functions[
             "__temporal_opentelemetry_completed_span"
         ](
             _CompletedWorkflowSpanParams(
-                context=context_carrier,
+                context=new_context_carrier,
                 name=span_name,
                 # Always set span attributes as workflow ID and run ID
                 attributes=attributes,
                 time_ns=temporalio.workflow.time_ns(),
                 link_context=link_context_carrier,
                 exception=exception,
                 kind=kind,
+                parent_missing=opentelemetry.trace.get_current_span()
+                is opentelemetry.trace.INVALID_SPAN,
nit: I'm not familiar enough with Python style; should the 2nd line here be indented to indicate these 2 lines are actually 1?

This is what the auto formatter did.

We should place parens around expressions in this situation.
Comment on lines +576 to +577
Suggested change
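The body of the suggested change is not preserved here. As a rough illustration of what wrapping such an expression in parentheses might look like, the sketch below uses hypothetical stand-ins (`INVALID_SPAN`, `get_current_span`, and `Params` are defined locally and are not the real OpenTelemetry or SDK objects).

```python
from dataclasses import dataclass

# Hypothetical stand-in for opentelemetry.trace.INVALID_SPAN
INVALID_SPAN = object()


def get_current_span() -> object:
    # Stand-in that always reports "no current span"
    return INVALID_SPAN


@dataclass
class Params:
    # Hypothetical, trimmed-down parameter object
    kind: str
    parent_missing: bool


params = Params(
    kind="internal",
    # The parentheses make it visible that both lines form one expression
    parent_missing=(
        get_current_span()
        is INVALID_SPAN
    ),
)
```

The parentheses do not change the result; they only make the continuation line read as part of the `parent_missing=` argument instead of as a separate argument.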
             )
         )

         # Add to outbound if needed
-        if add_to_outbound:
+        if add_to_outbound and updated_context_carrier:
             add_to_outbound.headers = self._context_carrier_to_headers(
-                context_carrier, add_to_outbound.headers
+                updated_context_carrier, add_to_outbound.headers
             )

     def _set_on_context(
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -332,6 +332,56 @@ def dump_spans(
     return ret


+@workflow.defn
+class SimpleWorkflow:
+    @workflow.run
+    async def run(self) -> str:
+        return "done"
+
+
+async def test_opentelemetry_always_create_workflow_spans(client: Client):
+    # Create a tracer that has an in-memory exporter
+    exporter = InMemorySpanExporter()
+    provider = TracerProvider()
+    provider.add_span_processor(SimpleSpanProcessor(exporter))
+    tracer = get_tracer(__name__, tracer_provider=provider)
+
+    # Create a worker with an interceptor without always create
+    async with Worker(
+        client,
+        task_queue=f"task_queue_{uuid.uuid4()}",
+        workflows=[SimpleWorkflow],
+        interceptors=[TracingInterceptor(tracer)],
+    ) as worker:
+        assert "done" == await client.execute_workflow(
+            SimpleWorkflow.run,
+            id=f"workflow_{uuid.uuid4()}",
+            task_queue=worker.task_queue,
+        )
+    # Confirm the spans are not there
+    spans = exporter.get_finished_spans()
+    logging.debug("Spans:\n%s", "\n".join(dump_spans(spans, with_attributes=False)))
+    assert len(spans) == 0
+
+    # Now create a worker with an interceptor with always create
+    async with Worker(
+        client,
+        task_queue=f"task_queue_{uuid.uuid4()}",
+        workflows=[SimpleWorkflow],
+        interceptors=[TracingInterceptor(tracer, always_create_workflow_spans=True)],
+    ) as worker:
+        assert "done" == await client.execute_workflow(
+            SimpleWorkflow.run,
+            id=f"workflow_{uuid.uuid4()}",
+            task_queue=worker.task_queue,
+        )
+    # Confirm the spans are there
+    spans = exporter.get_finished_spans()
+    logging.debug("Spans:\n%s", "\n".join(dump_spans(spans, with_attributes=False)))
+    assert len(spans) > 0
+    assert spans[0].name == "RunWorkflow:SimpleWorkflow"
+
+
 # TODO(cretz): Additional tests to write
 # * query without interceptor (no headers)
 # * workflow without interceptor (no headers) but query with interceptor (headers)
I'm not swapped into this work.

Why we don't now - because it was chosen not to originally and we can't just change on people. Why we didn't originally - because if you create spans without a parent you have orphans. So if it was cached (i.e. never replayed), it'd just be under RunWorkflow which is created on non-replay start, but when it is replayed, everything after has no parent so it is on its own.

It means spans like StartActivity may or may not have a parent, depending on whether the workflow is running somewhere separate than when it first created the RunWorkflow span. People do not expect spans from inside a workflow to be without a parent in my experience.

Yes it can, though OTel usually expects people to programmatically configure tracers, not outside of code. But CLI could definitely accept everything it needs to build https://pkg.go.dev/go.temporal.io/sdk/contrib/opentelemetry#NewTracingInterceptor (basically it'd be whatever was required to build a Go tracer).

To clarify what's happening here: client-side start workflow creates StartWorkflow, then first non-replay start creates RunWorkflow and sets that on context (if there's the StartWorkflow parent), then execute activity creates StartActivity (implicitly parenting to RunWorkflow if it's in this instance, StartWorkflow otherwise). So StartWorkflow is the only stable span available. There has been talk of temporalio/features#394 to help the situation where a span was not created by the starter, but in the meantime default Python (unlike some other SDKs) chose not to potentially create orphans by default. This option allows orphans to happen. I hope that's clear.
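The parenting rules described in this comment can be restated as a small toy model. This is only a sketch of the behavior as explained in the thread, not SDK code: `start_activity_parent`, its parameters, and the `"<orphan>"` marker are all hypothetical names introduced for illustration.

```python
from typing import Optional


def start_activity_parent(
    *,
    client_span_present: bool,
    run_span_in_this_instance: bool,
    always_create_workflow_spans: bool = False,
) -> Optional[str]:
    """Return the parent of a StartActivity span in this toy model.

    Returns None when no StartActivity span is created at all, and
    "<orphan>" when the span is created without any parent.
    """
    if client_span_present:
        # StartWorkflow is the only stable span, so it is the fallback parent
        # when RunWorkflow was created in some other instance.
        return "RunWorkflow" if run_span_in_this_instance else "StartWorkflow"
    if not always_create_workflow_spans:
        # Default behavior: no client span means no workflow spans.
        return None
    # Opted in: RunWorkflow exists only in the instance that first started
    # the workflow, so anywhere else the span has no parent.
    return "RunWorkflow" if run_span_in_this_instance else "<orphan>"


for client_span in (True, False):
    for same_instance in (True, False):
        for always in (False, True):
            parent = start_activity_parent(
                client_span_present=client_span,
                run_span_in_this_instance=same_instance,
                always_create_workflow_spans=always,
            )
            print(client_span, same_instance, always, parent)
```

In this model the only combination that produces an orphan is the one the new option enables: no client-created span, the option turned on, and the activity scheduled from an instance other than the one that created RunWorkflow.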
Thanks, yes that's very helpful.
Would that really be a (bad) breaking change? Wouldn't it just mean some new traces show up in their observability platform that didn't before?
Yes, I think it'd be a bad breaking change. I also think the default that exists is valuable even if we were ok with breaking changes. Orphaned spans not under a parent can cause those looking at traces for a workflow to not see a span.

Yes, which can clutter a tracing platform. Today people can trust that they're not just going to have some StartActivity top-level span flood the top-level of their Jaeger list.