Conversation

@Cali0707
Member

@Cali0707 Cali0707 commented Jul 15, 2025

Fixes #8632 #8631

Proposed Changes

  • Migrate kncloudevents to OTel
  • Move the broker filter to OTel
  • Move the broker ingress to OTel
  • Move the InMemoryChannel Dispatcher to OTel

Release Note

The broker filter, ingress, and InMemoryChannel deployments now expose metrics and traces with OpenTelemetry instead of Zipkin/OpenCensus

@knative-prow

knative-prow bot commented Jul 15, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@knative-prow knative-prow bot added the do-not-merge/work-in-progress and size/XL labels Jul 15, 2025
@knative-prow knative-prow bot requested review from aslom and lionelvillard July 15, 2025 02:30
@knative-prow knative-prow bot added the approved label Jul 15, 2025
@Cali0707
Member Author

FYI @creydr after a few tries I'm feeling good about the approach I've taken here so far.

Lmk what you think and I'll finish this up quickly tomorrow!

Member

@creydr creydr left a comment

Thanks @Cali0707 for checking on this. For me the approach looks good so far. As I haven't done much with otel so far, maybe @evankanderson can check on the non-WIP PR later too (@evankanderson I only saw you reviewed some of the otel PRs in serving :D )

triggerInformer v1.TriggerInformer,
brokerInformer v1.BrokerInformer,
subscriptionInformer messaginginformers.SubscriptionInformer,
reporter StatsReporter,
Member

do we still need the statsreporter?

Member Author

I need to refactor the metrics bits, but when I do we can drop it

@knative-prow knative-prow bot added the size/XXL label and removed the size/XL label Jul 15, 2025
@Cali0707 Cali0707 changed the title [WIP] feat: migrate kncloudevents and broker filter to otel feat: migrate kncloudevents and broker filter to otel Jul 15, 2025
@Cali0707 Cali0707 requested a review from creydr July 15, 2025 15:50
@Cali0707 Cali0707 marked this pull request as ready for review July 15, 2025 15:50
@knative-prow knative-prow bot removed the do-not-merge/work-in-progress label Jul 15, 2025
@knative-prow knative-prow bot requested a review from Leo6Leo July 15, 2025 15:50
@Cali0707
Member Author

/cc @creydr @evankanderson

This should be ready now

@knative-prow knative-prow bot requested a review from evankanderson July 15, 2025 15:50
Signed-off-by: Calum Murray <[email protected]>
@Cali0707
Member Author

/hold

Looks like some other builds are breaking...

@knative-prow knative-prow bot added the do-not-merge/hold label Jul 15, 2025
@knative-prow knative-prow bot added the area/test-and-release label Jul 16, 2025
@Cali0707
Member Author

/cc @creydr

I have all the unit tests passing locally now 😄

Unfortunately, this ended up making the PR quite large, as I needed to instrument the broker ingress and the IMC dispatcher as well...

Lmk what you think!

@codecov

codecov bot commented Jul 16, 2025

Codecov Report

Attention: Patch coverage is 60.84559% with 213 lines in your changes missing coverage. Please review.

Project coverage is 52.17%. Comparing base (6e5cb3a) to head (3da86dc).
Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| cmd/broker/filter/main.go | 0.00% | 39 Missing ⚠️ |
| cmd/broker/ingress/main.go | 0.00% | 37 Missing ⚠️ |
| pkg/broker/filter/filter_handler.go | 61.05% | 34 Missing and 3 partials ⚠️ |
| pkg/observability/newcontext.go | 55.95% | 37 Missing ⚠️ |
| pkg/observability/otel/handler.go | 0.00% | 20 Missing ⚠️ |
| pkg/channel/fanout/fanout_event_handler.go | 76.92% | 6 Missing and 3 partials ⚠️ |
| pkg/channel/event_receiver.go | 72.41% | 8 Missing ⚠️ |
| ...econciler/inmemorychannel/dispatcher/controller.go | 76.92% | 4 Missing and 2 partials ⚠️ |
| pkg/broker/ingress/ingress_handler.go | 89.36% | 4 Missing and 1 partial ⚠️ |
| pkg/broker/filter/server_manager.go | 0.00% | 4 Missing ⚠️ |

... and 4 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8635      +/-   ##
==========================================
- Coverage   52.22%   52.17%   -0.05%     
==========================================
  Files         402      401       -1     
  Lines       25174    25199      +25     
==========================================
+ Hits        13147    13148       +1     
- Misses      11224    11261      +37     
+ Partials      803      790      -13     

Member

@evankanderson evankanderson left a comment

A bunch of smaller comments, but nothing to block doing the cleanup later (and I realize we're up against time boundaries on the release process).

/lgtm


pprof := k8sruntime.NewProfilingServer(sl.Named("pprof"))

mp, tp := otel.SetupObservabilityOrDie(ctx, "broker.filter", sl, pprof)
Member

Out of curiosity, why don't we set the global metrics / trace provider here, rather than needing to pass them into filter.NewHandler and filter.NewServerManager?

Member

(This doesn't need to change for this review, just wondering.)

ctx = observability.WithMinimalEventLabels(ctx, event)
ctx = observability.WithBrokerLabels(ctx, broker)

ctx, span := h.tracer.Start(ctx, tracing.TriggerMessagingDestination(triggerRef.NamespacedName))
Member

Why both this and line 280?

if subscriberURI == nil {
// Record the event count.
writer.WriteHeader(http.StatusNotFound)
_ = h.reporter.ReportEventCount(reportArgs, http.StatusNotFound)
Member

Why stop recording this metric?

Member Author

This was discussed in the Google doc: the idea is that the event count can be derived from the dispatch duration histogram (which implicitly has a total count).

Since we are also instrumenting the handler with otel, there is the http.server.request.duration histogram too, so rejected/invalid events can be found from the difference between the two counts.
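
A minimal sketch of that arithmetic, assuming both histograms expose a total observation count. The `histogram` type and the `rejectedEvents` helper are hypothetical stand-ins for illustration, not OTel SDK types or code from this PR:

```go
package main

import "fmt"

// histogram is a hypothetical stand-in for an exported duration histogram.
// Only the fields needed for the arithmetic are modeled.
type histogram struct {
	count uint64  // total number of recorded observations
	sum   float64 // sum of observed durations, in seconds
}

// record adds one observation to the histogram.
func (h *histogram) record(seconds float64) {
	h.count++
	h.sum += seconds
}

// rejectedEvents estimates how many requests reached the HTTP server but
// were never dispatched, as the difference between the two total counts.
// It returns 0 if the dispatch count is (unexpectedly) the larger one.
func rejectedEvents(serverRequests, dispatches histogram) uint64 {
	if dispatches.count > serverRequests.count {
		return 0
	}
	return serverRequests.count - dispatches.count
}

func main() {
	var server, dispatch histogram

	// Five requests reach the handler; only three are dispatched.
	for i := 0; i < 5; i++ {
		server.record(0.01)
	}
	for i := 0; i < 3; i++ {
		dispatch.record(0.02)
	}

	fmt.Println("dispatched:", dispatch.count)
	fmt.Println("rejected/invalid:", rejectedEvents(server, dispatch))
}
```

In a real deployment the same subtraction would presumably be done in the metrics backend's query language over the two exported series rather than in application code.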


func NewServerManager(ctx context.Context, logger *zap.Logger, cmw configmap.Watcher, httpPort, httpsPort int, handler *Handler) (*eventingtls.ServerManager, error) {
func NewServerManager(
ctx context.Context,
Member

You have trailing whitespace here:

Suggested change
ctx context.Context,
ctx context.Context,

Comment on lines 56 to 70
otelHandler := otelhttp.NewHandler(handler, "broker.ingress",
	otelhttp.WithMeterProvider(meterProvider),
	otelhttp.WithTracerProvider(traceProvider),
	otelhttp.WithFilter(func(r *http.Request) bool {
		return !network.IsKubeletProbe(r)
	}),
	otelhttp.WithPropagators(tracing.DefaultTextMapPropagator()),
	otelhttp.WithSpanNameFormatter(func(operation string, r *http.Request) string {
		if r.URL.Path == "" {
			return r.Method + " /"
		}
		return fmt.Sprintf("%s %s", r.Method, r.URL.Path)
	}),
)
Member

This seems very similar to the code in filter. It would be great to have a common library function for setting this up, unless there are differences I'm missing.

@knative-prow knative-prow bot added the lgtm label Jul 16, 2025
@knative-prow knative-prow bot removed the lgtm label Jul 16, 2025
Signed-off-by: Calum Murray <[email protected]>
@Cali0707 Cali0707 changed the title feat: migrate kncloudevents and broker filter to otel feat: migrate kncloudevents, broker filter, broker ingress, inmemorychannel dispatcher to otel Jul 16, 2025
Co-authored-by: Christoph Stäbler <[email protected]>
Member

@creydr creydr left a comment

Thanks a lot @Cali0707 for working on this 💪

/lgtm

@knative-prow knative-prow bot added the lgtm label Jul 16, 2025
@knative-prow

knative-prow bot commented Jul 16, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Cali0707, creydr

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Cali0707
Member Author

/test reconciler-tests

@Cali0707
Member Author

/unhold

@knative-prow knative-prow bot removed the do-not-merge/hold label Jul 16, 2025
@knative-prow knative-prow bot merged commit bedd48f into knative:main Jul 16, 2025
40 of 42 checks passed

Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • area/test-and-release: Test infrastructure, tests or release
  • lgtm: Indicates that a PR is ready to be merged.
  • size/XXL: Denotes a PR that changes 1000+ lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

[OTel]: Update kncloudevents to use OTel for metrics/tracing

3 participants