fix(core): Sanitize data URLs in http.client spans#18896
Conversation
size-limit report 📦
|
node-overhead report 🧳Note: This is a synthetic benchmark with a minimal express app and does not necessarily reflect the real-world performance impact in an application.
|
|
Thanks a ton Lukas! I'd leave the first n chars of the data stream in there, knowing what type of data it is (by magic bytes, eg |
| [SEMANTIC_ATTRIBUTE_SENTRY_ORIGIN]: 'auto.http.otel.node_fetch', | ||
| 'http.url': sanitizedUrl, | ||
| [SEMANTIC_ATTRIBUTE_URL_FULL]: sanitizedUrl, | ||
| [SEMANTIC_ATTRIBUTE_SENTRY_CUSTOM_SPAN_NAME]: `${request.method || 'GET'} ${sanitizedUrl}`, |
There was a problem hiding this comment.
About the additional comment from @brunohaid: Maybe we could add the content of the first few bytes as an attribute here 🤔
There was a problem hiding this comment.
I ended up adding the first 10 bytes to the span description and data. I went for adding this directly to the URL attributes/description because there isn't really a fitting attribute defined in SemConv for data url content. So I just stuck with the "url.full" one.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
Our
http.clientspan instrumentations currently treat data URLs (blobs or base64 encoded data) like regular raw URLs. While this is in general fine, the problem is that this leads to incredibly long span names and attribute values, especially because the URL is sent in up to three different attributes per span. This makes Relay reject the the sent events due to exceeding size limits.Therefore, I decided to extract the already existing stack trace URL sanitization logic for data URLs and apply it to
http.clientspans and attributes. This will lead to "lost" information in the sense of not having data URLs but I think for now we can live with that. The replacement in the future could be span attachments (if we decide this is valuable or make it opt-in).Note: This is most likely only a concern in client-side SDKs but since our fetch instrumentation is used across the board, I decided to also add this to our server-side instrumentation. I'd rather have this taken care of everywhere.
closes #17345