Blazor - rendering metrics and tracing #61609

Open
wants to merge 27 commits into
base: main

pavelsavara
Member

@pavelsavara pavelsavara commented Apr 22, 2025

Blazor metrics

  • new meter Microsoft.AspNetCore.Components
    aspnetcore.components.navigation - Total number of route changes.
    aspnetcore.components.event_handler - Duration of processing browser event including business logic.

  • new meter Microsoft.AspNetCore.Components.Lifecycle
    aspnetcore.components.update_parameters - Duration of processing component parameters including business logic.
    aspnetcore.components.render_batch - Duration of rendering batch including browser round-trip.

  • meter Microsoft.AspNetCore.Components.Server.Circuits
    aspnetcore.components.circuit.active - Number of active circuits in memory.
    aspnetcore.components.circuit.connected - Number of circuits connected to client.
    aspnetcore.components.circuit.duration - Duration of circuit lifetime and their total count.

Blazor activity tracing

  • new activity source Microsoft.AspNetCore.Components
  • Microsoft.AspNetCore.Components.CircuitStart: Circuit {circuitId}
    • tags: aspnetcore.components.circuit.id
    • links: HTTP activity
  • Microsoft.AspNetCore.Components.RouteChange: Route {route} -> {componentType}
    • tags: aspnetcore.components.circuit.id, aspnetcore.components.type, aspnetcore.components.route
    • links: HTTP trace, circuit trace
  • Microsoft.AspNetCore.Components.HandleEvent: Event {attributeName} -> {componentType}.{methodName}
    • tags: aspnetcore.components.circuit.id, aspnetcore.components.type, aspnetcore.components.method, aspnetcore.components.attribute.name, error.type
    • links: HTTP trace, circuit trace, router trace

builder.Services.ConfigureOpenTelemetryMeterProvider(meterProvider =>
{
    meterProvider.AddMeter("Microsoft.AspNetCore.Components");
    meterProvider.AddMeter("Microsoft.AspNetCore.Components.Server.Circuits");
});
builder.Services.ConfigureOpenTelemetryTracerProvider(tracerProvider =>
{
    tracerProvider.AddSource("Microsoft.AspNetCore.Components");
    //tracerProvider.AddSource("Microsoft.AspNetCore.SignalR.Server");
});

Feedback

TODO - Metrics need to be documented at https://learn.microsoft.com/en-us/aspnet/core/log-mon/metrics/built-in

Out of scope

  • WASM

Contributes to #53613
Contributes to #29846
Feedback for #61516

@pavelsavara pavelsavara force-pushed the blazor_metrics_feedback branch from 328a584 to cebb68e Compare April 23, 2025 18:01
# Conflicts:
#	src/Components/Components/src/PublicAPI.Unshipped.txt
@pavelsavara pavelsavara changed the title Blazor - rendering metrics - feedback Blazor - rendering metrics and tracing Apr 24, 2025
@JamesNK
Member

JamesNK commented Apr 24, 2025

You're adding a lot of metrics here. I think you should do some performance testing. Metrics have performance overhead: they require some synchronization when incrementing counters and recording values.

Having many low level metrics could cause performance issues.

@pavelsavara
Member Author

pavelsavara commented Apr 25, 2025

Having many low level metrics could cause performance issues.

I removed a few and kept only the most useful ones.
Only aspnetcore.components.parameters.duration is recorded per Blazor component.
It's async and executes the customer's business logic.
If they have thousands of them, they are in trouble anyway, and this will help them figure it out.
The rest are per request, which should be OK.

I have 2 remaining issues

  • On the metrics/duration histogram, I'm unable to see any time bigger than 5 in the dashboard. I assume it is the 0.005s bucket, even though I added a 1s delay and validated that the stopwatch reported 1.1s TotalSeconds. I think there is some problem with aggregation or display? Or maybe all the other fast samples are drowning this one out?
  • On tracing/activities, I would like to link the HTTP activity from the place where SignalR creates the Blazor circuit. So I capture its Activity.Current.Context and use it later to AddLink() on my activity. In some cases the link leads to the HTTP activity, but in many cases the HTTP activity is not in the dashboard at all. I'm thinking it could be sampling. I would like to skip the link if I know that the HTTP activity was not selected for sampling. But the HTTP activity always (on a dev machine without pressure) has .Recorded true and is sometimes missing from the dashboard anyway.

I would appreciate hints, many thanks! @noahfalk @JamesNK
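A minimal sketch of the capture-then-link approach described above, using only System.Diagnostics. The source names Demo.Http and Demo.Components and the activity names are placeholders, not the names used in this PR, and the link is passed at creation time rather than through AddLink(). The listener exists only so that StartActivity returns a non-null activity in a console program.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

// Sample every activity so StartActivity never returns null in this sketch.
using var listener = new ActivityListener
{
    ShouldListenTo = source => true,
    Sample = (ref ActivityCreationOptions<ActivityContext> options) => ActivitySamplingResult.AllDataAndRecorded,
};
ActivitySource.AddActivityListener(listener);

// Hypothetical source names, chosen for illustration only.
using var httpSource = new ActivitySource("Demo.Http");
using var componentsSource = new ActivitySource("Demo.Components");

// 1. While the HTTP activity is running, keep only its context (trace id + span id).
ActivityContext httpContext;
using (var http = httpSource.StartActivity("GET /_blazor", ActivityKind.Server))
{
    httpContext = http!.Context;
}

// 2. Later, start an unrelated root activity and link it back to the captured context.
var links = new[] { new ActivityLink(httpContext) };
using var circuit = componentsSource.StartActivity(
    "CircuitStart", ActivityKind.Internal, default(ActivityContext), tags: null, links: links);

bool linked = circuit!.Links.Any(link => link.Context.TraceId == httpContext.TraceId);
Console.WriteLine($"linked to HTTP trace: {linked}");
```

The captured ActivityContext is a plain value, so it stays usable after the HTTP activity object itself is gone. Whether a viewer can follow the link still depends on the linked activity having been exported.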

@BrennanConroy
Member

  • I would like to link HTTP activity from the place where SignalR is creating the Blazor circuit. So I capture its Activity.Current.Context and use it later to AddLink() on my activity.

I don't know how Blazor circuits are created, but if it's from a Hub method then Activity.Current won't be the HTTP activity. We hop off the HTTP activity on purpose in SignalR:

// Hub invocation gets its parent from a remote source. Clear any current activity and restore it later.
var previousActivity = Activity.Current;
if (previousActivity != null)
{
    Activity.Current = null;
}

  • in many cases the HTTP activity is not in the dashboard at all

Is that because the HTTP request is still running? I don't think activities show up in the dashboard until they're stopped, and if you're using SignalR you're likely using a websocket request, which is long running.

@pavelsavara
Member Author

I don't know how Blazor circuits are created, but if it's from a Hub method then Activity.Current won't be the HTTP activity. We hop off the HTTP activity on purpose in SignalR:

I'm capturing Activity.Current.Context in the ComponentHub constructor, which runs before any Activity changes in the SignalR layer, so I'm able to capture the HTTP Activity. The capture works: I test the Name and I get the TraceId just fine.

Is that because the HTTP request is still running? I don't think activities show up in the dashboard until they're stopped, and if you're using SignalR you're likely using a websocket request, which is long running.

This is it, thank you @BrennanConroy !

@pavelsavara
Member Author

pavelsavara commented Apr 28, 2025

Long-running activities in Blazor are also a topic to discuss.

  • current circuit - this is an object in memory, with a SignalR connection. It could be hours or even days long.
  • current route - this is more logical than physical. But it makes sense to link click event activities to the current route.

I think we have two ways to deal with them:

  • keep them open for the whole long duration
    • Should we install them back into Activity.Current on the next HTTP/SignalR request for the sake of child Activities?
    • Should we use a parent relationship instead of a link? I understood the feedback in 29846 to mean that a long-running parent makes for bad UX in the dashboard.
  • close them soon - at the end of the current event/click/navigation
    • this makes them immediately visible in the dashboard/OTEL and linkable
    • but we don't capture the true duration or other sub-spans: disconnect, reconnect, close

Right now I have the short+links implementation.

I guess developers use OTEL mostly in production, so even the long-running traces would already be recorded.

But maybe developers also use it in the inner dev loop? In that case it would be great to have a "trace preview" for things that have started but not stopped yet, so that others don't get confused the same way I did.

@pavelsavara pavelsavara requested a review from samsp-msft April 28, 2025 08:21
@pavelsavara pavelsavara marked this pull request as ready for review April 28, 2025 09:51
@pavelsavara
Member Author

pavelsavara commented Apr 30, 2025

P0-P2 - This is a useful angle, thanks!

P0 - Is my service healthy? (Most responses are successful/correct, latency is reasonable)

  • aspnetcore.components.event.exceptions - which component's click causes which type of exception. These are stats, not a trace, so this doesn't tell you which sub-component failed.
  • aspnetcore.components.event.duration - which component's click is slow?

P1 - What is the load on my service? (How many requests/sec?)

  • aspnetcore.components.circuits.count - how many sessions did I process today?
  • aspnetcore.components.event.duration - this is also a counter: how many clicks did I process today?
  • aspnetcore.components.circuits.duration - this is the session duration, interesting for many other KPIs
  • aspnetcore.components.navigation.count - how many different Blazor pages did people visit? Route as a tag.

P1 - How many resources is my service using?

  • aspnetcore.components.circuits.active_circuits - a proxy for how much memory the session state holds?
  • aspnetcore.components.circuits.connected_circuits - how many SignalR connections are open?
  • The delta between the two above is about WebSocket disconnect/reconnect, network quality, browser tabs going to sleep, etc.

P2 - Why is my service unhealthy?

  • aspnetcore.components.parameters.exceptions - which component failed ? Exception type tag/stats, no trace.
  • aspnetcore.components.rendering.batch.exceptions - which component failed ?
  • aspnetcore.components.parameters.duration - which component makes my app rendering slow ?
  • aspnetcore.components.rendering.batch.duration - which component makes events slow ? Also how many UI elements were in the diff ?

Note, I also mention aspnetcore.components.circuits metrics which already landed before, but we can improve them too if you want.

@pavelsavara
Member Author

pavelsavara commented Apr 30, 2025

For example I know what a Blazor circuit is but I don't know what 'OnCircuit' is measuring. Is this a span that measures the entire duration of 1 blazor circuit?

This goes back to my questions about long running activities.
That activity is relatively short-lived at the moment, compared to the whole lifetime of the circuit. We can change that, with consequences for the inner dev loop.

We can definitely improve naming.

OnCircuit - represents the logical circuit (duration), but at the moment we stop it earlier.
OnRoute - represents the logical route/page in the app. Logically it should be active until you navigate elsewhere, but right now we stop it early. It links to the circuit (when Blazor is interactive).
OnEvent - in Blazor interactive, something was clicked. It typically happens inside a specific route and circuit, which are linked.

Right now, the short circuit and route activities mostly serve as something that click event activities could link to. For the context.

@pavelsavara
Member Author

pavelsavara commented Apr 30, 2025

aspnetcore.components.rendering.batch.duration - Duration of rendering batch.

I don't think we currently define in our public Blazor docs what a render batch is.

Maybe we just need to rename it? Anyway, this is more on the troubleshooting side, for a misbehaving component. Producing long diffs/batches leads to network traffic, latency and slow rendering.

As I suggested above, we could have separate namespace for it with separate opt-in.
Or we could drop it and circle back in the future.

aspnetcore.components.rendering.batch.exception - Total number of exceptions during batch rendering.
should we be counting the number of exceptions per some other period?

We also count exceptions per click/event. But I need to see if the exceptions from batch related problems would appear there.

aspnetcore.components.event.duration - Duration of processing browser event asynchronously.

I assume this is an average duration of all browser event handlers across the entire app regardless of render mode.

At the moment this works only for SignalR interactive. I think we could also make it work for form submit.
Making it work for WASM means that we need to fix OTEL for WASM and implement some publishing for it.
That is out of scope for .NET 10.

what does the "asynchronously" imply?

I already renamed this and dropped "async". It means including your DB request or whatever async business logic.

Is this the duration of the OnParametersSet component lifecycle event?

Yes, or OnInitialized.

aspnetcore.components.navigation.count - Total number of route changes.

Is this a total count of all page navigations across the entire app regardless of render mode?

Except WASM.

What would that be used for?

It has the route pattern as a tag/dimension that you can use as a filter. It's more of a business-oriented KPI: which of my pages are hot?

@pavelsavara
Member Author

pavelsavara commented Apr 30, 2025

Making the circuit/route activity/trace long-lived causes trouble with re-installing them into Activity.Current.
It also causes trouble when displaying an hours-long activity with sub-spans in the UI.
So I think there is little benefit.

If we keep them short, maybe they should be literally 0ms long: just a context anchor that groups other traces.

Re Activity names: they are not very visible in the Aspire UI, where DisplayName prevails.

The circuit Activity/trace is created in the internal ComponentHub.StartCircuit, so maybe StartCircuit is a good name. There is also the public API CircuitHandler.OnCircuitOpenedAsync, but I dislike the Async postfix, and OnCircuitOpenedAsync happens a short while later, triggered by another message from JS. Still, OnCircuitOpened would be my second choice.

The route Activity/trace is created in Router.SetParametersAsync -> Router.Refresh. These are not great names for the activity.
Maybe we can change it to OnRouteChanged.

Regarding click/event: we already have the concept of an event. The activity should be active through the whole duration of DispatchEventAsync and would be the parent for any business logic and distributed HTTP client traces.

I would like the event Activity to also trigger for form submit, interop calls from JS, and enhanced navigation.

Maybe we can change it to Event or BrowserEvent?

@pavelsavara
Member Author

Sam: Is circuitId useful for the DisplayName of the Circuit activity? What else could we display there instead? Could we have the IP address?

@samsp-msft
Member

I met with @pavelsavara and I now understand what everything is for - it looks great. I think customers will be really happy with this. My concerns about granularity in terms of sending too much data have been mitigated.

@samsp-msft
Member

Sam: Is circuitId useful for the DisplayName of the Circuit activity? What else could we display there instead? Could we have the IP address?

My mind set - if a customer is having an issue with your site, and calls IT to complain - how do they match the traces to the user? Is there somewhere that ID gets displayed to them?

If we don't stick in that data (which is good from a secure-by-default standpoint), is there a hook point we can document, before the activity is finished, where the customer can access the activity and add their extra tags to it? As the circuit is created as an instantaneous activity, it may not be on the stack for many calls to user code where they can access it.

@pavelsavara
Member Author

pavelsavara commented Apr 30, 2025

I think this problem of mapping traces to users on hotline is not specific to Blazor.

I'm linking the HTTP Activity/trace that created the circuit, so if there are more tags on the HTTP trace, they could use that (after the session and the long-running HTTP/WS connection is finished).

Sometimes there is an authenticated user in the HTTP context, but I think it's probably not good for us to expose any PII.

If the app developer wanted to add more tags, I believe that they could capture Activity.Current and add something.

Maybe the app developer could also add "show my circuitID" into application settings menu and the IT call center could ask them to click it. The circuitID is random number and it's not a secret from security perspective.
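A minimal sketch of what "capture Activity.Current and add something" could look like from application code, and of why timing matters. The source name Demo.Components, the helper TagCurrentActivity and the tag keys app.tenant and app.late are hypothetical; the sketch assumes the framework activity is still Activity.Current when user code runs, which is the open question in this thread for the circuit activity.

```csharp
using System;
using System.Diagnostics;

// Sample every activity so StartActivity never returns null in this sketch.
using var listener = new ActivityListener
{
    ShouldListenTo = source => true,
    Sample = (ref ActivityCreationOptions<ActivityContext> options) => ActivitySamplingResult.AllDataAndRecorded,
};
ActivitySource.AddActivityListener(listener);

// Hypothetical source standing in for the framework's activity source.
using var source = new ActivitySource("Demo.Components");

// Hypothetical application helper: tags whatever activity is current, if any.
void TagCurrentActivity(string key, object value) => Activity.Current?.SetTag(key, value);

// The framework starts the activity, then user code runs while it is current.
var handleEvent = source.StartActivity("HandleEvent")!;
TagCurrentActivity("app.tenant", "contoso");

// Once the activity is stopped it is no longer Activity.Current,
// so a later call from user code has nothing to attach to.
handleEvent.Stop();
TagCurrentActivity("app.late", "ignored");

Console.WriteLine($"app.tenant = {handleEvent.GetTagItem("app.tenant")}");
Console.WriteLine($"app.late present = {handleEvent.GetTagItem("app.late") is not null}");
```

Only the tag set while the activity was current ends up on it. For a near-zero-length activity that window may contain no user code at all.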

@samsp-msft
Member

If the app developer wanted to add more tags, I believe that they could capture Activity.Current and add something.

The activity is created in CircuitFactory at line 127 and is stopped at line 173. Is there any user code executed between these points, where the activity would be active, so they can retrieve it and add tags? Once it's stopped, AFAIK it's too late to add anything to it.

Member

@samsp-msft samsp-msft left a comment

Some naming suggestions.

For sessions, I wonder if when log messages are fired, is the activity context going to be the session activity, and if not is there a way to force it to be? @noahfalk this is an issue with essentially zero length spans - you might want to force log messages to be parented to it but something else is the activity at that point?

{
{ "component.type", componentType ?? "unknown" },
{ "component.method", methodName ?? "unknown" },
{ "attribute.name", attributeName ?? "unknown" }
Member

I understand why this is based on the attribute, but it's really also the event name. Would that make more sense as "event"?

Member Author

This refers to onclick in the example below. Attribute makes it easier for me personally, but I don't have a strong preference. It's called attributeName throughout Blazor internals.

@danroth27 do you prefer "event" or something else ?

<button class="btn btn-primary" @onclick="Buy" @focus="OnFocus">Buy</button>


var tags = new TagList
{
{ "component.type", componentType ?? "unknown" },
Member

Member Author

We don't really have a code reference, as in file name and line number.

Member

My initial thought is that the thing being referred to was the Blazor component abstraction, in which case component.type and component.method made sense to me. The fact that code.class.name isn't currently defined also nudges me in that direction. If we did want to use code.* attributes instead, this feels like another spot where OTel semconv feedback could be helpful.

@noahfalk
Member

noahfalk commented May 1, 2025

This goes back to my #61609 (comment).
That activity is relatively short-lived at the moment, compared to the whole lifetime of the circuit. We can change that, with consequences for the inner dev loop.

OnCircuit - represents the logical circuit (duration), but at the moment we stop it earlier.
OnRoute - represents the logical route/page in the app. Logically it should be active until you navigate elsewhere, but right now we stop it early. It links to the circuit (when Blazor is interactive).
OnEvent - in Blazor interactive, something was clicked. It typically happens inside a specific route and circuit, which are linked.

Right now, the short circuit and route activities mostly serve as something that click event activities could link to. For the context.

Thanks, that helps a bunch on understanding where things are at now. A couple thoughts:

  • In the question how to handle long-lived activities I think there is a 3rd option which is not to produce the activity at all. Instead of an event linking to the activity the only reference would be the circuit.id tag or route.id tag.
  • Using links and ~0-length activities as a grouping/linking mechanism feels like a novel use of distributed tracing telemetry that many tools may not visualize in a helpful way. Have you tried looking at any of the telemetry in Azure Monitor's portal? As I recall Azure Monitor's viewer is quite aggressive at following links transitively in both the backwards and forwards directions. I'm suspicious that if you try to visualize any event AzureMonitor is going to wind up showing you every route and event in the entire circuit superimposed on a single very cluttered timeline.
  • If we keep Circuit and Route I agree a name like CircuitStart or RouteStart/RouteChanged might be more readily understood. For Event perhaps we drop the 'On' and just name it 'Event' or 'HandleEvent'?

For sessions, I wonder if when log messages are fired, is the activity context going to be the session activity, and if not is there a way to force it to be? @noahfalk this is an issue with essentially zero length spans - you might want to force log messages to be parented to it but something else is the activity at that point?

Activity.Current can be set at any time, but it can't be set to an Activity that is already stopped. If you did want to produce the set of all log messages that occurred within a given circuit I think the best options would be one of:

  • In the telemetry viewer, query for all activities where circuit.id tag = X, then merge all logs linked to those activities. (This requires tools to support a table JOIN across logs and distributed traces which probably isn't universally supported)
  • Add circuit.id as another field in the log messages and query for it directly. circuit.id could either be an explicit field in select log messages, or it could be added via enrichment. Once the field is there querying for all logs where circuit.id=X would return the desired logs.

@pavelsavara
Member Author

All, I created open-telemetry/semantic-conventions#2235 in order to validate and document the new metrics in OTEL community.

As a result I updated all tags in this PR with prefix aspnetcore.components

@pavelsavara
Member Author

pavelsavara commented May 7, 2025

@lmolkova's feedback on the other PR is

  • merge duration and exception metrics into single histogram metric. Our HTTP _requestDuration already does that too. I will do it.

"event" means something specific in otel. But also something in the browser and in Blazor. Suggestion to rename aspnetcore.components.event to aspnetcore.components.operation or aspnetcore.components.call.

  • Perhaps we should think about it as event handler ? aspnetcore.components.event_handler

  • circuits.active_circuits -> circuit.active

  • rendering.batch -> render_batch

  • update_parameters -> parameter_set or parameter_binding

  • rename aspnetcore.components metrics prefix to -> aspnetcore.component because of otel pluralization rules.
    This would create a mismatch with the existing C# namespace. I prefer the C# name because that namespace already exists. I will keep it as it is, unless somebody here convinces me otherwise.
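A minimal sketch of the "single histogram" idea from the first item above: one duration histogram that also carries an error.type tag when the handler throws, instead of a separate exception counter. The meter name Demo.Components, the instrument name demo.components.event_handler and the Measure helper are hypothetical, and the MeterListener is only there to observe the measurements in-process without an exporter.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Diagnostics.Metrics;

// Hypothetical meter and instrument names, for illustration only.
using var meter = new Meter("Demo.Components");
var eventHandler = meter.CreateHistogram<double>(
    "demo.components.event_handler", unit: "s", description: "Duration of handling an event.");

// Collect measurements in-process so the sketch needs no exporter.
var recorded = new List<(double Seconds, string? ErrorType)>();
using var meterListener = new MeterListener();
meterListener.InstrumentPublished = (instrument, l) =>
{
    if (instrument.Meter.Name == "Demo.Components") l.EnableMeasurementEvents(instrument);
};
meterListener.SetMeasurementEventCallback<double>((instrument, value, tags, state) =>
{
    string? errorType = null;
    foreach (var tag in tags)
    {
        if (tag.Key == "error.type") errorType = tag.Value as string;
    }
    recorded.Add((value, errorType));
});
meterListener.Start();

// Records one measurement per call; failures add an error.type tag to the same histogram.
void Measure(Action handler)
{
    var stopwatch = Stopwatch.StartNew();
    var tags = new TagList();
    try
    {
        handler();
    }
    catch (Exception ex)
    {
        tags.Add("error.type", ex.GetType().FullName);
        // This sketch swallows the exception so the demo can continue; real code would rethrow.
    }
    finally
    {
        eventHandler.Record(stopwatch.Elapsed.TotalSeconds, tags);
    }
}

Measure(() => { });
Measure(() => throw new InvalidOperationException("boom"));
Console.WriteLine($"{recorded.Count} measurements, error.type of the second: {recorded[1].ErrorType}");
```

With this shape, the total count, the failure count (filter on error.type) and the latency distribution all come from one instrument.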

@pavelsavara pavelsavara mentioned this pull request May 7, 2025
2 tasks
- rename aspnetcore.components.circuit to aspnetcore.components.circuits
- rename circuits.active_circuits to circuit.active and connected_circuits to circuit.connected
- rename aspnetcore.components.event.duration to aspnetcore.components.event_handler and include exceptions
- rename aspnetcore.components.update_parameters.duration to aspnetcore.components.update_parameters and include exceptions
- rename aspnetcore.components.rendering.batch.duration to aspnetcore.components.render_diff and include exceptions
@pavelsavara
Member Author

@lmolkova I pushed changes based on your feedback.

The remaining issues are

  • update_parameters instrument name - I don't have strong opinion
  • aspnetcore.components namespace vs aspnetcore.component namespace.
    Your proposal has one benefit, which is that aspnetcore.component.type tag sounds much better than aspnetcore.components.type for a specific class name. The downside is that we would not match C# namespace.

I will discuss it with @javiercn and come to some conclusion.

Labels
area-blazor Includes: Blazor, Razor Components
7 participants