-
Notifications
You must be signed in to change notification settings - Fork 874
Add tracing for physical connection open #6091
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
roji
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vonzshik this looks great - thanks, but let's maybe discuss whether the Activity should contain data source bootstrapping (and therefore type loading)?
| _connectionLogger = dataSourceConfig.LoggingConfiguration.ConnectionLogger; | ||
| _commandLogger = dataSourceConfig.LoggingConfiguration.CommandLogger; | ||
|
|
||
| // Make sure we do not flow AsyncLocals like Activity.Current |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Though a part of me thinks we should maybe start the activity only after the data source has been set up - that really isn't part of physical connection open. What do you think?
| // At no point should we ever have an activity here | ||
| Debug.Assert(Activity.Current is null); | ||
| // Set current activity as the one from the command | ||
| // So child activities from physical open are bound to it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good thinking here.
Though at some point we should have the conversation about whether we want to keep multiplexing, given that it's known to be non-scalable (e.g. and is no longer used in the TE benchmarks).
| LogMessages.OpeningPhysicalConnection(ConnectionLogger, Host, Port, Database, UserFacingConnectionString); | ||
| var startOpenTimestamp = Stopwatch.GetTimestamp(); | ||
|
|
||
| var activity = NpgsqlActivitySource.ConnectionOpen(this); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As written below, there may be a case for only starting the activity after the data source is created and bootstrapped. Otherwise, the first "physical connection" is longer because it contains e.g. type loading and stuff like that; at minimum that skews stuff like average time calculations for physical open.
If we really wanted to surface startup costs like type loading properly, we could have another activity for that - though that seems like a bit much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As written below, there may be a case for only starting the activity after the data source is created and bootstrapped. Otherwise, the first "physical connection" is longer because it contains e.g. type loading and stuff like that; at minimum that skews stuff like average time calculations for physical open.
I do not think anyone really calculates average time for any sort of traces (too much stuff to store), there are metrics specifically for stuff like this (for example, in Npgslq we have db.client.commands.duration). Beside, right now when we log physical open we pass the total time, including type loading.
If we really wanted to surface startup costs like type loading properly, we could have another activity for that - though that seems like a bit much.
This might actually be not a bad idea, since this will also allow us to trace NpgsqlConnection.ReloadTypes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, makes sense. If it's not much work/changes, I'd try to exclude type loading from the activity span you're introducing here (simply because it really shouldn't be part of it). We can consider introducing a new activity for it, though I don't think that's very important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, makes sense. If it's not much work/changes, I'd try to exclude type loading from the activity span you're introducing here (simply because it really shouldn't be part of it).
The main problem is that we do consider it a part of physical open. That is, if we get an exception during bootstrap or from DataSourceConnectionInitializer, we'll break the physical connection. In that case, should we pass that exception to activity? It would be really weird if we don't (since we'll throw the exception but the trace will be green, even though the connection is already closed). But then, that means that we can't close the activity until we're absolutely sure the connection is stable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or to be more specific, we can specify Activity.Duration via Activity.SetEndTime, so while it's still be considered as active, the duration will remains as is, but IDK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think anyone really calculates average time for any sort of traces (too much stuff to store), there are metrics specifically for stuff like this (for example, in Npgslq we have db.client.commands.duration). Beside, right now when we log physical open we pass the total time, including type loading.
FWIW calculating aggregate metrics from tracing data is a pretty standard thing to do, and indeed we could also not include the type loading time when logging.
But good point about the exception. I still think that fundamentally, type loading really shouldn't be counted as part of physical connection open; physical connections get opened all the time, whereas type loading is a one-time startup query that we happen to need (unless it's disabled by the user). It seems to just weirdly skew things to look at it as part of connection open.
But I'm OK with setting this aside for now and moving on - we can always change this in the future if we want.
Closes #4136