Multi-sink / `tee` Python API #10158
Clever solution.
Not having sink objects was the big issue we ran into before, and you kind of side-stepped that by not making, e.g., `GrpcSink` a first-class wrapper on top of the Rust-side sinks. I don't remember why that was hard before, or if it was just work.
This could still serve as a stepping stone in that direction, though.
In general I think this looks great, but I think we should skip the …
```python
Duplicate sinks are not allowed. For example, two [`GrpcSink`]s that
use the same `url` count as duplicates.
"""
```
Example code might be useful here
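A minimal sketch of the duplicate rule from the docstring, assuming the sink descriptions are plain dataclasses that only hold configuration, as the PR description says. The `flush_timeout_sec` field name and the `sink_key` / `check_no_duplicates` helpers are hypothetical, chosen for illustration, and are not the SDK's actual API. Duplicates are keyed on the sink kind plus its target (`path` or `url`), so two `GrpcSink`s with the same `url` collide even if their timeouts differ.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class FileSink:
    """Configuration only: where to write an .rrd file."""
    path: str


@dataclass(frozen=True)
class GrpcSink:
    """Configuration only: which gRPC endpoint to stream to."""
    url: str
    # Hypothetical field name; the PR only says a flush timeout is stored.
    flush_timeout_sec: Optional[float] = None


Sink = Union[FileSink, GrpcSink]


def sink_key(sink: Sink) -> tuple:
    """Identity used for duplicate detection: sink kind plus its target."""
    if isinstance(sink, FileSink):
        return ("file", sink.path)
    if isinstance(sink, GrpcSink):
        # The flush timeout is ignored: the same URL means the same endpoint.
        return ("grpc", sink.url)
    raise TypeError(f"unsupported sink: {sink!r}")


def check_no_duplicates(sinks: List[Sink]) -> None:
    """Raise ValueError if two sinks point at the same target."""
    seen = set()
    for sink in sinks:
        key = sink_key(sink)
        if key in seen:
            raise ValueError(f"duplicate sink: {sink!r}")
        seen.add(key)


# Using the same kind of sink twice is fine as long as the targets differ.
check_no_duplicates([GrpcSink("grpc://a"), FileSink("data.rrd"), FileSink("backup.rrd")])
```

The URL `grpc://a` is a placeholder, not a real Rerun endpoint.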
This feature will also require a how-to guide, and perhaps a look over our docs in general, to make sure it's discoverable. That means it should be referenced in the right places where one might wonder about it (e.g., in the `rr.init` docs). FYI @pweids
@pweids / @nikolausWest / @jprochazk, are we still taking this on for 0.24, or do we think it's too much too soon?
It's not far from completion; I think we should land it for 0.24.
Very nice feature to have! My main concern is about future-proofing the naming, but I don't have the context of any prior discussions that might have happened.
```rust
// NOTE: this is only really used for BufferedSink,
// and by the time you `set_sink` you probably don't have
// a buffered sink anymore
#[inline]
fn drain_backlog(&self) -> Vec<LogMsg> {
    Vec::new()
}
```
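To illustrate why an empty default backlog is reasonable, here is a Python model of the trait, assuming messages can be represented as plain strings (the real sinks handle `LogMsg` values in Rust). Only the buffering sink overrides `drain_backlog`; `ListSink` is a hypothetical stand-in for any sink that forwards messages immediately and so has nothing to hand over.

```python
from abc import ABC, abstractmethod
from typing import List


class LogSink(ABC):
    """Python model of the Rust trait; messages are plain strings here."""

    @abstractmethod
    def send(self, msg: str) -> None:
        """Accept one message."""

    def drain_backlog(self) -> List[str]:
        # Default: the sink forwards messages right away and keeps nothing.
        return []


class BufferedSink(LogSink):
    """Holds messages in memory until a real endpoint sink is set."""

    def __init__(self) -> None:
        self._buffer: List[str] = []

    def send(self, msg: str) -> None:
        self._buffer.append(msg)

    def drain_backlog(self) -> List[str]:
        # Hand over everything buffered so far and start empty again.
        backlog, self._buffer = self._buffer, []
        return backlog


class ListSink(LogSink):
    """Hypothetical endpoint sink that just records what it receives."""

    def __init__(self) -> None:
        self.received: List[str] = []

    def send(self, msg: str) -> None:
        self.received.append(msg)


buffered = BufferedSink()
buffered.send("SetStoreInfo")
print(buffered.drain_backlog())  # ['SetStoreInfo']
print(buffered.drain_backlog())  # []
```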
Trusting you here—I don't have a deep enough understanding of that stuff to make a call.
Another thing which I didn't write here: which sink do you pick to call `drain_backlog` on? It's not clear to me at all, but luckily we probably don't need to worry about that, since you can't insert a `BufferedSink` into a `MultiSink` as a user.
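A sketch of the fan-out, modelling `MultiSink` in Python with messages as dicts. Each wrapped sink receives its own deep copy, mirroring how the Rust `LogSink` trait hands every sink an owned `LogMsg`. Because a user cannot place a `BufferedSink` inside a `MultiSink`, the sketch simply reports an empty backlog, which sidesteps the question of which inner sink to drain. `ListSink` is hypothetical.

```python
import copy
from typing import List


class ListSink:
    """Hypothetical endpoint sink (think file or gRPC) that records messages."""

    def __init__(self) -> None:
        self.received: List[dict] = []

    def send(self, msg: dict) -> None:
        self.received.append(msg)

    def drain_backlog(self) -> List[dict]:
        return []


class MultiSink:
    """Forwards every message to all of the sinks it wraps."""

    def __init__(self, sinks) -> None:
        self.sinks = list(sinks)

    def send(self, msg: dict) -> None:
        for sink in self.sinks:
            # Each sink gets its own copy, mirroring the Rust side, where
            # every sink must be passed an owned LogMsg.
            sink.send(copy.deepcopy(msg))

    def drain_backlog(self) -> List[dict]:
        # A user cannot put a BufferedSink inside a MultiSink, so there is
        # no inner backlog worth handing over.
        return []


file_like, grpc_like = ListSink(), ListSink()
multi = MultiSink([file_like, grpc_like])
multi.send({"entity": "points", "count": 3})
print(file_like.received == grpc_like.received)  # True
```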
```python
get_recording_id = get_recording_id
is_enabled = is_enabled

def tee(
```
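How `tee` plugs into the existing `set_sink` flow, sketched in Python under simplifying assumptions: messages are strings, the sinks passed to `tee` are already live sink objects (the real call first converts the `FileSink`/`GrpcSink` dataclasses into Rust sinks), and `RecordingStream` here is a toy class, not the SDK's. The key step is that `set_sink` replays the old `BufferedSink`'s backlog into the new `MultiSink`, so every sink receives the initial `SetStoreInfo`.

```python
from typing import List


class BufferedSink:
    """Initial sink: holds messages until a real endpoint is configured."""

    def __init__(self) -> None:
        self._buffer: List[str] = []

    def send(self, msg: str) -> None:
        self._buffer.append(msg)

    def drain_backlog(self) -> List[str]:
        backlog, self._buffer = self._buffer, []
        return backlog


class ListSink:
    """Hypothetical stand-in for a file or gRPC sink."""

    def __init__(self) -> None:
        self.received: List[str] = []

    def send(self, msg: str) -> None:
        self.received.append(msg)

    def drain_backlog(self) -> List[str]:
        return []


class MultiSink:
    def __init__(self, sinks) -> None:
        self.sinks = list(sinks)

    def send(self, msg: str) -> None:
        for sink in self.sinks:
            # Strings are immutable, so sharing is safe in this toy version;
            # the Rust MultiSink clones each message instead.
            sink.send(msg)

    def drain_backlog(self) -> List[str]:
        return []


class RecordingStream:
    def __init__(self) -> None:
        self.sink = BufferedSink()
        # Like rr.init: the first message describes the recording.
        self.log("SetStoreInfo")

    def log(self, msg: str) -> None:
        self.sink.send(msg)

    def set_sink(self, new_sink) -> None:
        # Swap sinks, then replay whatever the old sink was holding.
        old, self.sink = self.sink, new_sink
        for msg in old.drain_backlog():
            new_sink.send(msg)

    def tee(self, *sinks) -> None:
        # All sinks are set at once, so each one receives the backlog.
        self.set_sink(MultiSink(sinks))


rec = RecordingStream()
viewer, file_sink = ListSink(), ListSink()
rec.tee(viewer, file_sink)
rec.log("points")
print(viewer.received)     # ['SetStoreInfo', 'points']
print(file_sink.received)  # ['SetStoreInfo', 'points']
```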
Has there been discussion about this choice of name?
I suggest using `set_sinks()` instead, because it's more explicit and also more future-proof (i.e., I would like to drop all these weird `save()`, etc. APIs in favour of exposing all sinks and being explicit about plugging a stream into them).
I agree that `tee` probably isn't the best name, but I like it more than `set_sinks`. Another follow-up for the list: properly document this everywhere we talk about operating modes.
I'm going to start a bikeshedding thread in Slack, since we have some time until 0.24 to change it, but I'll leave it as is for now and merge.
The name `tee` suggests two sinks, which seems arbitrary to me. Wouldn't it be cleaner to have 2 (or more) calls to something like `add_sink`?
We decided on `.attach_sinks` in the Slack thread. I'll merge this as-is (with the other review comments addressed) and do the rename in a follow-up PR.
> Wouldn't it be cleaner to have 2 (or more) calls to something like `add_sink`?

The way this works under the hood would make any kind of "add sink" operation a bit more complicated. The presence of that initial `SetStoreInfo` message means we must set all the sinks "at once": it is replayed from the backlog only when the sink is set, so a sink added later would never receive it.
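A small Python illustration of that constraint, using a hypothetical `add_sink` method that the PR does not have. The backlog holding `SetStoreInfo` is replayed only once, when the sink is set, so a sink attached afterwards only ever sees later messages. Messages are modelled as strings and `ListSink` is a stand-in for a real sink.

```python
from typing import List


class ListSink:
    """Hypothetical sink that records what it receives."""

    def __init__(self) -> None:
        self.received: List[str] = []

    def send(self, msg: str) -> None:
        self.received.append(msg)


class MultiSink:
    def __init__(self, sinks) -> None:
        self.sinks = list(sinks)

    def add_sink(self, sink) -> None:
        # Hypothetical incremental API: the new sink only sees future messages.
        self.sinks.append(sink)

    def send(self, msg: str) -> None:
        for sink in self.sinks:
            sink.send(msg)


# The backlog (just SetStoreInfo) is replayed once, when the sink is set.
backlog = ["SetStoreInfo"]
early = ListSink()
multi = MultiSink([early])
for msg in backlog:
    multi.send(msg)

multi.send("points#1")
late = ListSink()
multi.add_sink(late)  # attached after the backlog was already drained
multi.send("points#2")

print(early.received)  # ['SetStoreInfo', 'points#1', 'points#2']
print(late.received)   # ['points#2'], with no SetStoreInfo
```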
Related: …`tee` logging) #1998 (comment); `tee` (multi-sink) API for Python #5470

What
We want some kind of "log to multiple sinks" API, which would alleviate some long-standing issues:

API Additions
The entrypoint for this in the Python SDK would be `tee` on `RecordingStream`:

The above snippets are functional on this PR. You must start a Viewer (or gRPC server) to connect to before running them. They log the data to both the server and a `data.rrd` file.

The exposed `FileSink` and `GrpcSink` are simple dataclasses that temporarily hold the information needed to set up each sink:
- `FileSink` stores a `path`
- `GrpcSink` stores a `url`, along with a flush timeout

The `tee` call then converts these sinks into their Rust equivalents, wraps them in a `MultiSink`, and calls `set_sink` on that.

Calling `set_sink` ensures the usual "dump backlog into the new sink" behavior is preserved, so that the initial `SetStoreInfo` logged by `rr.init` is sent through to every sink correctly.

The sink that's being set is the new `MultiSink`, a wrapper over `Vec<Box<dyn LogSink>>` which clones all received messages to every stored sink.

Currently only `FileSink` and `GrpcSink` are available. A user may use the same kind of sink multiple times, but there must not be any duplicates. For example, two `GrpcSink`s with the same `url` would count as duplicates.

Open questions
- `BufferedSink` doesn't make that much sense. It's something we use to temporarily hold data in memory (specifically only a `SetStoreInfo` message) before the user sets up a real "endpoint" sink, like gRPC or a file.
- `MemorySink` and `CallbackSink` are important, as they interact with the Notebook/Gradio/JS integrations. We should allow, for example, sending data to a Viewer embedded in a notebook and saving the same data to a file at the same time.
- `tee` makes sense, but isn't very discoverable.
- … `LogSink` trait makes this impossible, as it requires that every stored sink be passed an owned `LogMsg`.