feat!: remove TraceId and telemetry thread-local state
#67
11a89b2 to
739c5f9
Compare
Change Summary
This PR removes the The changes include breaking API changes to the
Issues Found
🟡 Style Guide - Inconsistent Formatting in ProcessId::Display
The
Location: veecle-telemetry/src/id.rs:48
Current code: write!(f, "{:016x}", self.0)
Issue: A
Expected: The format string should use 32 hex digits to fully represent the 128-bit value: write!(f, "{:032x}", self.0)
This is consistent with the serialization implementation which correctly uses all 16 bytes (32 hex chars) of the
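To make the width point concrete, here is a minimal sketch. It assumes `ProcessId` wraps a `u128` (the review above calls it a 128-bit value); the struct below is a hypothetical stand-in, not the crate's definition. Note that a format width in Rust is a minimum, so `{:016x}` does not truncate large values; it only produces strings of inconsistent length.

```rust
use std::fmt;

/// Hypothetical stand-in for the crate's `ProcessId`, assumed to wrap a `u128`.
struct ProcessId(u128);

impl fmt::Display for ProcessId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // 32 hex digits cover all 128 bits, so every id has the same width.
        write!(f, "{:032x}", self.0)
    }
}

fn main() {
    let small = 0xabc_u128;
    // The width is a minimum, so small values come out 16 digits wide...
    assert_eq!(format!("{:016x}", small).len(), 16);
    // ...while values above 64 bits are not truncated and come out wider.
    assert_eq!(format!("{:016x}", u128::MAX).len(), 32);
    // With a 32-digit width both ends of the range have the same length.
    assert_eq!(ProcessId(small).to_string().len(), 32);
    assert_eq!(ProcessId(u128::MAX).to_string().len(), 32);
}
```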
739c5f9 to
4341f4e
Compare
self.borrow_mut().take()
pub(crate) fn take(
    &self,
    #[cfg(feature = "veecle-telemetry")] span_context: Option<SpanContext>,
This is very unfortunate, but unavoidable I suppose
TBH I would get rid of the feature and just always have this code, since it's supposed to be zero-cost without veecle-telemetry/enable. But that's separate from these changes.
(DEV-1066)
I was talking about passing down the SpanContext. I'd really prefer a solution where this isn't necessary (thread locals, and alternatives on no_std), because it makes some things very awkward, but I guess this is what we decided on doing.
veecle-telemetry/tests/lib.rs
Outdated
root []
+ attr: runtime_attr="added_later"
+ link: trace=123456789abcdef0, span=fedcba9876543210
+ link: span=fedcba9876543210
this should be printing the process id now?
8175512 to
af394c0
Compare
57070a8 to
8b1b3b3
Compare
88f862d to
969a3f1
Compare
/// An identifier for a trace, which groups a set of related spans together.
#[derive(Copy, Clone, Debug, Eq, PartialEq, Ord, PartialOrd, Hash)]
pub struct TraceId(pub u128);
/// A globally-unique id identifying a process.
Not a big fan of the name because it only really fits on std?
ExecutionId kinda was the general name for this "an execution of code happening somewhere"
But I guess if it's just this name it shouldn't block the PR
This was supposed to be on ProcessId
Do we even need it or can we just add ThreadId to the InstanceMessage?
A span can be entered+exited from many threads on the same process (e.g. if it's part of a future that gets stolen within a multi-threaded tokio executor).
I know, I'd just like to not have both ExecutionId and ProcessId, so I'm thinking we can probably remove ProcessId again and put ThreadId next to ExecutionId where necessary
I think it's much easier to think about the context-tracking in the UI when you have a name for the "thread of execution", instead of having a HashMap<(ProcessId, ThreadId), _>, and I felt ExecutionId fit that better than it did the process.
One other option would be to not name the integer for the threads, and have
    struct ThreadId {
        process: ProcessId,
        thread: u64,
    }
to only need to come up with two names.
I guess thinking about it ThreadId also doesn't make sense on no_std targets
I guess nesting ExecutionId inside whatever we want to call ThreadId would always make it globally unique as well which might be nice
Trying to pin it down, the two things we're identifying are:
- the global memory space
- the call stack
These don't really give great names, so personally I think using the standard OS names for these works fine. They're probably instantly understandable to most devs and are easy to map to other systems (e.g. FreeRTOS: process = reset, thread = task).
Do you think keeping these as two separate values bundled together is needed?
Why don't we split a single integer into two halves, with the process id in the high half and the thread id in the low one?
That way it is still a newtype but can be carried anywhere. I doubt a single process can ever have more than 2^32 thread ids.
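The packing idea in this comment could look roughly like the sketch below. Everything here is an assumption for illustration: the type name `PackedExecutionId`, the 64/64 bit split, and the accessor names are hypothetical. If `ProcessId` is 128 bits wide, as the earlier review comment suggests, a split like this would leave less room for the process id, which is a trade-off rather than a free win.

```rust
/// Hypothetical packed id: process id in the high 64 bits, thread id in the low 64 bits.
#[derive(Copy, Clone, Debug, Eq, PartialEq, Hash)]
pub struct PackedExecutionId(u128);

impl PackedExecutionId {
    /// Packs both parts into a single integer.
    pub fn new(process: u64, thread: u64) -> Self {
        Self(((process as u128) << 64) | thread as u128)
    }

    /// Extracts the high half.
    pub fn process(self) -> u64 {
        (self.0 >> 64) as u64
    }

    /// Extracts the low half (the cast drops the high 64 bits).
    pub fn thread(self) -> u64 {
        self.0 as u64
    }
}

fn main() {
    let id = PackedExecutionId::new(0xdead_beef, 7);
    assert_eq!(id.process(), 0xdead_beef);
    assert_eq!(id.thread(), 7);
}
```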
This has been moved to #90
969a3f1 to
85148d1
Compare
85148d1 to
52aadd1
Compare
The collector is now initialized with a process id and automatically combines this with a per-thread id from the OSAL to create a globally unique id for the current thread/task.

Closes: DEV-913
Signed-off-by: Wim Looman <[email protected]>
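As a rough illustration of the commit above, the sketch below shows a collector that stores a process id and asks a thread-id source for the current thread on demand. The trait `ThreadIdSource`, the `Collector` shape, the field names, and the `u64` thread id are all hypothetical; the real OSAL interface is not shown in this conversation.

```rust
/// Hypothetical id types; the real definitions may differ.
#[derive(Copy, Clone, Debug, Eq, PartialEq, Hash)]
pub struct ProcessId(pub u128);

#[derive(Copy, Clone, Debug, Eq, PartialEq, Hash)]
pub struct ExecutionId {
    pub process: ProcessId,
    pub thread: u64,
}

/// Hypothetical stand-in for the OSAL query that returns the current thread/task id.
pub trait ThreadIdSource {
    fn current_thread_id(&self) -> u64;
}

pub struct Collector<S> {
    process: ProcessId,
    threads: S,
}

impl<S: ThreadIdSource> Collector<S> {
    pub fn new(process: ProcessId, threads: S) -> Self {
        Self { process, threads }
    }

    /// Combines the fixed process id with the thread id reported at call time.
    pub fn execution_id(&self) -> ExecutionId {
        ExecutionId {
            process: self.process,
            thread: self.threads.current_thread_id(),
        }
    }
}

/// Test double that always reports the same thread id.
pub struct FixedThread(pub u64);

impl ThreadIdSource for FixedThread {
    fn current_thread_id(&self) -> u64 {
        self.0
    }
}

fn main() {
    let collector = Collector::new(ProcessId(42), FixedThread(3));
    let expected = ExecutionId { process: ProcessId(42), thread: 3 };
    assert_eq!(collector.execution_id(), expected);
}
```

Because the thread id is queried rather than cached, the producer in this sketch keeps no per-thread state of its own.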
…ntification

Remove the `TraceId` concept and use `ProcessId` directly to identify the context within which `SpanId`s are unique. Previously, `TraceId` was generated from `ProcessId` using a counter, but this added complexity without clear benefit and required thread-local state. `SpanId`s are now unique within a process, and the combination of `ProcessId` + `SpanId` provides global uniqueness through `SpanContext`.

Signed-off-by: Wim Looman <[email protected]>
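A minimal sketch of the uniqueness argument in this commit: span ids come from a process-wide counter, and pairing one with the process id gives a globally unique `SpanContext`. The field names, the `u64` width of `SpanId` (suggested by the 16-hex-digit span ids in the test output above), and the use of an atomic counter are assumptions, not the crate's confirmed implementation.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

#[derive(Copy, Clone, Debug, Eq, PartialEq, Hash)]
pub struct ProcessId(pub u128);

#[derive(Copy, Clone, Debug, Eq, PartialEq, Hash)]
pub struct SpanId(pub u64);

/// Hypothetical layout: the process id scopes the span id.
#[derive(Copy, Clone, Debug, Eq, PartialEq, Hash)]
pub struct SpanContext {
    pub process_id: ProcessId,
    pub span_id: SpanId,
}

/// Process-wide counter; shared by all threads, so no thread-local state is needed.
static NEXT_SPAN_ID: AtomicU64 = AtomicU64::new(1);

/// Hands out a span id that has not been used before in this process.
pub fn next_span_id() -> SpanId {
    SpanId(NEXT_SPAN_ID.fetch_add(1, Ordering::Relaxed))
}

fn main() {
    let process_id = ProcessId(0x1234);
    let a = SpanContext { process_id, span_id: next_span_id() };
    let b = SpanContext { process_id, span_id: next_span_id() };
    // Two spans in the same process differ by span id.
    assert_ne!(a, b);
    // The same span id in another process still yields a different context.
    let other = SpanContext { process_id: ProcessId(0x5678), span_id: a.span_id };
    assert_ne!(a, other);
}
```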
Remove the thread-local `CURRENT_SPAN` tracking and `SpanContext::current` method. Span context is now determined by execution id (process + thread) rather than explicit parent-child span relationships.

This simplifies the telemetry model by eliminating implicit state within the process. Span messages are correlated through their execution id, which provides sufficient context for external tools to reconstruct the span relationships.

Closes: DEV-911
Signed-off-by: Wim Looman <[email protected]>
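One way an external tool might reconstruct span relationships from execution ids is to keep a stack of entered spans per execution id, as sketched below. This is only an illustration under stated assumptions: the `Consumer` type, its method names, and the simplified id types are hypothetical, and the actual consumer may work differently.

```rust
use std::collections::HashMap;

/// Simplified, hypothetical id types for the sketch.
#[derive(Copy, Clone, Debug, Eq, PartialEq, Hash)]
pub struct ExecutionId {
    pub process: u128,
    pub thread: u64,
}

#[derive(Copy, Clone, Debug, Eq, PartialEq, Hash)]
pub struct SpanId(pub u64);

/// Tracks which spans are currently entered on each thread of execution.
#[derive(Default)]
pub struct Consumer {
    active: HashMap<ExecutionId, Vec<SpanId>>,
}

impl Consumer {
    /// Records that `span` was entered and returns its implicit parent, if any.
    pub fn enter(&mut self, execution: ExecutionId, span: SpanId) -> Option<SpanId> {
        let stack = self.active.entry(execution).or_default();
        let parent = stack.last().copied();
        stack.push(span);
        parent
    }

    /// Records that `span` was exited; ignored unless it is the innermost active span.
    pub fn exit(&mut self, execution: ExecutionId, span: SpanId) {
        if let Some(stack) = self.active.get_mut(&execution) {
            if stack.last() == Some(&span) {
                stack.pop();
            }
        }
    }

    /// Returns the span that an event on `execution` would implicitly belong to.
    pub fn current(&self, execution: ExecutionId) -> Option<SpanId> {
        self.active.get(&execution).and_then(|stack| stack.last().copied())
    }
}

fn main() {
    let t1 = ExecutionId { process: 1, thread: 1 };
    let t2 = ExecutionId { process: 1, thread: 2 };
    let mut consumer = Consumer::default();
    assert_eq!(consumer.enter(t1, SpanId(10)), None);
    assert_eq!(consumer.enter(t1, SpanId(11)), Some(SpanId(10)));
    // Another thread in the same process has its own, empty stack.
    assert_eq!(consumer.current(t2), None);
    consumer.exit(t1, SpanId(11));
    assert_eq!(consumer.current(t1), Some(SpanId(10)));
}
```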
Add custom `Display`, `FromStr`, `Serialize`, and `Deserialize` implementations for `ProcessId`, `ThreadId`, `ExecutionId`, and `SpanContext` types. These provide a consistent hex-encoded string format with colon separators for composite IDs (`process:thread` for `ExecutionId`, `process:span` for `SpanContext`).

This makes telemetry ids more readable and provides a unified format for logging and serialization.

Signed-off-by: Wim Looman <[email protected]>
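The `process:thread` format described in this commit could be sketched as follows. The digit widths (32 hex digits for the process, 16 for the thread), the flattened field types, and the `ParseIdError` type are assumptions for illustration; the `Serialize`/`Deserialize` side is left out to keep the sketch free of external crates.

```rust
use std::fmt;
use std::str::FromStr;

/// Simplified, hypothetical `ExecutionId` with plain integer fields.
#[derive(Copy, Clone, Debug, Eq, PartialEq)]
pub struct ExecutionId {
    pub process: u128,
    pub thread: u64,
}

impl fmt::Display for ExecutionId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Zero-padded hex halves joined by a colon.
        write!(f, "{:032x}:{:016x}", self.process, self.thread)
    }
}

/// Hypothetical error type for malformed ids.
#[derive(Debug, Eq, PartialEq)]
pub struct ParseIdError;

impl FromStr for ExecutionId {
    type Err = ParseIdError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Split at the first colon, then parse each half as hexadecimal.
        let (process, thread) = s.split_once(':').ok_or(ParseIdError)?;
        Ok(Self {
            process: u128::from_str_radix(process, 16).map_err(|_| ParseIdError)?,
            thread: u64::from_str_radix(thread, 16).map_err(|_| ParseIdError)?,
        })
    }
}

fn main() {
    let id = ExecutionId { process: 0xabc, thread: 0x12 };
    let text = id.to_string();
    assert_eq!(text.len(), 32 + 1 + 16);
    // Formatting and parsing round-trip.
    assert_eq!(text.parse::<ExecutionId>(), Ok(id));
    assert_eq!("no-colon".parse::<ExecutionId>(), Err(ParseIdError));
}
```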
52aadd1 to
d2ab38d
Compare
See the individual commit descriptions for full context.
Overall, this removes traces and root spans from the telemetry protocol and moves the resolution of "implicit parents" from the producer to the consumer. To let the consumer correctly track the implicit parent of each event, the execution id now needs to mix in a thread id. By having just this one id (queried via the OSAL) in the output data, we avoid needing any other thread-local state within the producer, so it can be used on systems that don't provide any.
This involves breaking changes to both the veecle-telemetry crate API and the JSON encoding.
Closes: DEV-911, DEV-913