Allow even-parentless workflow spans to always be created #817

Merged
merged 3 commits into from
Apr 10, 2025

Conversation

cretz
Member

@cretz cretz commented Apr 7, 2025

What was changed

Added always_create_workflow_spans bool to OTel TracingInterceptor constructor. Docs best describe the situation:

            always_create_workflow_spans: When false, the default, spans are
                only created in workflows when an overarching span from the
                client is present. In cases of starting a workflow elsewhere,
                e.g. CLI or schedules, a client-created span is not present and
                workflow spans will not be created. Setting this to true will
                create spans in workflows no matter what, but there is a risk of
                them being orphans since they may not have a parent span after
                replaying.

Checklist

  1. Closes [Feature Request] Make option for OTel workflow spans even if client span not present #794
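For readers who want to try the option, a minimal sketch follows. It assumes the `temporalio` package is installed with its OpenTelemetry extra and that an OTel tracer provider is configured elsewhere. The helper name `make_tracing_interceptor` is illustrative and not part of the SDK.

```python
def make_tracing_interceptor(always_create_workflow_spans: bool = True):
    """Build a TracingInterceptor that also creates parentless workflow spans.

    The import is deferred so this module can still be imported where
    temporalio's OpenTelemetry extra is not installed (an assumption made
    for this sketch only).
    """
    from temporalio.contrib.opentelemetry import TracingInterceptor

    # Passing False (the SDK default) keeps the pre-existing behavior:
    # no workflow spans unless a client-created span is present.
    return TracingInterceptor(
        always_create_workflow_spans=always_create_workflow_spans
    )
```

The returned interceptor would then be passed in the `interceptors` list when creating the client, like any other Temporal interceptor.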

@cretz cretz requested a review from a team as a code owner April 7, 2025 13:49
name=span_name,
# Always set span attributes as workflow ID and run ID
attributes=attributes,
time_ns=temporalio.workflow.time_ns(),
link_context=link_context_carrier,
exception=exception,
kind=kind,
parent_missing=opentelemetry.trace.get_current_span()
is opentelemetry.trace.INVALID_SPAN,
Contributor

nit: I'm not familiar enough with Python style. Should the second line here be indented to indicate that these two lines are actually one?

Member Author

This is what the auto formatter did

Contributor

We should place parens around expressions in this situation

id=f"workflow_{uuid.uuid4()}",
task_queue=worker.task_queue,
)
# Confirm the spans are not there
Contributor

Suggested change
- # Confirm the spans are not there
+ # Confirm the spans are there

Comment on lines +576 to +577
parent_missing=opentelemetry.trace.get_current_span()
is opentelemetry.trace.INVALID_SPAN,
Contributor

Suggested change
- parent_missing=opentelemetry.trace.get_current_span()
- is opentelemetry.trace.INVALID_SPAN,
+ parent_missing=(
+     opentelemetry.trace.get_current_span()
+     is opentelemetry.trace.INVALID_SPAN
+ ),

# If the parent is missing and user hasn't said to always create, do not
# create
if params.parent_missing and not self._always_create_workflow_spans:
    return None
Contributor

This line looks like a breaking change, am I getting that wrong?

Member Author

@cretz cretz Apr 9, 2025

This logic was inside the workflow's form of _completed_span before as:

        # If there is no span on the context, we do not create a span
        if opentelemetry.trace.get_current_span() is opentelemetry.trace.INVALID_SPAN:
            return None

but now that I have to check a parameter from outside the sandbox, I moved the logic to the outside-of-sandbox part instead of the inside-of-sandbox part.
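The relocated check can be modeled in plain Python to show why the default behavior is unchanged. This is a toy sketch, not the SDK's code: `CompletedSpanParams` and `InterceptorModel` are hypothetical stand-ins, and only the `parent_missing` field and the `_always_create_workflow_spans` attribute mirror names from the diff.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CompletedSpanParams:
    """Hypothetical stand-in for the values computed inside the sandbox."""

    name: str
    # True when the workflow context had no valid current span.
    parent_missing: bool


class InterceptorModel:
    """Toy model of the outside-of-sandbox side; not the SDK class."""

    def __init__(self, always_create_workflow_spans: bool = False) -> None:
        self._always_create_workflow_spans = always_create_workflow_spans

    def completed_span(self, params: CompletedSpanParams) -> Optional[str]:
        # If the parent is missing and the user hasn't said to always
        # create, do not create (same condition as in the diff).
        if params.parent_missing and not self._always_create_workflow_spans:
            return None
        # The real interceptor would build and export a span here; the
        # model just returns its name.
        return params.name
```

With the default `always_create_workflow_spans=False`, a missing parent still yields no span, which matches the old in-sandbox check. Only opting in changes the outcome.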

workflow spans will not be created. Setting this to true will
create spans in workflows no matter what, but there is a risk of
them being orphans since they may not have a parent span after
replaying.
Contributor

@dandavison dandavison Apr 9, 2025

I'm not swapped into this work.

  1. Why don't we default to always creating the parent-less spans? Wouldn't that be more useful to users than dropping them?
  2. The docstring here uses the term "orphan" but couldn't they equally be viewed as roots, originating in the workflow?
  3. [Just a question, not blocking this PR] Could it make sense to allow tracing to be enabled in the CLI (and maybe even Schedule starter one day) when starting workflows?

Member Author

@cretz cretz Apr 9, 2025

> Why don't we default to always creating the parent-less spans? Wouldn't that be more useful to users than dropping them?

Why we don't now - because it was chosen not to originally and we can't just change on people. Why we didn't originally - because if you create spans without a parent you have orphans. So if it was cached (i.e. never replayed), it'd just be under RunWorkflow which is created on non-replay start, but when it is replayed, everything after has no parent so it is on its own.

> The docstring here uses the term "orphan" but couldn't they equally be viewed as roots, originating in the workflow?

It means spans like StartActivity may or may not have a parent, depending on whether the workflow is running somewhere other than where it first created the RunWorkflow span. People do not expect spans from inside a workflow to be without a parent in my experience.

> Could it make sense to allow tracing to be enabled in the CLI (and maybe even Schedule starter one day) when starting workflows?

Yes it can, though OTel usually expects people to programmatically configure tracers, not outside of code. But CLI could definitely accept everything it needs to build https://pkg.go.dev/go.temporal.io/sdk/contrib/opentelemetry#NewTracingInterceptor (basically it'd be whatever was required to build a Go tracer).

To clarify what's happening here: client-side start workflow creates StartWorkflow, then first non-replay start creates RunWorkflow and sets that on context (if there's the StartWorkflow parent), then execute activity creates StartActivity (implicitly parenting to RunWorkflow if it's in this instance, StartWorkflow otherwise). So StartWorkflow is the only stable span available. There has been talk of temporalio/features#394 to help the situation where a span was not created by the starter, but in the meantime Python (unlike some other SDKs) chose not to potentially create orphans by default. This option allows orphans to happen. I hope that's clear.
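The parenting rules described above can be summarized as a small decision function. This is a sketch of one reading of the explanation, not SDK code: `SpanOutcome`, `start_activity_outcome`, and their parameters are hypothetical names, and the case where the option is on and the workflow is still on its original worker instance is inferred from the earlier remark that a cached workflow's spans would sit under RunWorkflow.

```python
from typing import NamedTuple, Optional


class SpanOutcome(NamedTuple):
    created: bool           # whether a StartActivity span is emitted at all
    parent: Optional[str]   # name of the parent span, or None for an orphan


def start_activity_outcome(
    *,
    client_span_present: bool,
    same_worker_instance: bool,
    always_create_workflow_spans: bool = False,
) -> SpanOutcome:
    """Model which parent a StartActivity span ends up with."""
    spans_enabled = client_span_present or always_create_workflow_spans
    if same_worker_instance and spans_enabled:
        # Never replayed: RunWorkflow from the first start is on the context.
        return SpanOutcome(True, "RunWorkflow")
    if client_span_present:
        # Replayed elsewhere: only the client-created span is stable.
        return SpanOutcome(True, "StartWorkflow")
    if always_create_workflow_spans:
        # Replayed with no client span: the span is created without a parent.
        return SpanOutcome(True, None)
    # Default behavior: no parent available, so no span is created.
    return SpanOutcome(False, None)
```

In this model the orphan case only arises when the option is enabled, no client span exists, and the workflow has moved off the instance that created RunWorkflow.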

Contributor

Thanks, yes that's very helpful.

> Why don't we default to always creating the parent-less spans? Wouldn't that be more useful to users than dropping them?

> Why we don't now - because it was chosen not to originally and we can't just change on people.

Would that really be a (bad) breaking change? Wouldn't it just mean some new traces show up in their observability platform that didn't before?

Member Author

> Would that really be a (bad) breaking change?

Yes, I think it'd be a bad breaking change. I also think the default that exists is valuable even if we were ok with breaking changes. Orphaned spans not under a parent can cause those looking at traces for a workflow to not see a span.

> Wouldn't it just mean some new traces show up in their observability platform that didn't before?

Yes, which can clutter a tracing platform. Today people can trust that they're not just going to have some StartActivity top-level span flood the top level of their Jaeger list.

@cretz cretz merged commit 1296cd7 into temporalio:main Apr 10, 2025
14 checks passed
@cretz cretz deleted the otel-always-workflow-span branch April 10, 2025 18:25
Successfully merging this pull request may close these issues.

[Feature Request] Make option for OTel workflow spans even if client span not present
3 participants