Member

@jprochazk jprochazk commented Jun 7, 2025

What

We want some kind of "log to multiple sinks" API, which would alleviate some long-standing issues:

  • Saving data to a file in case the Viewer crashes.
  • Saving a larger-than-RAM recording to a file that's also being streamed to a Viewer, which will GC and drop large portions of it in an unrecoverable way.

API Additions

The entrypoint for this in the Python SDK would be tee on RecordingStream:

import rerun as rr
from rerun.utilities import build_color_grid

rec = rr.RecordingStream("rerun_example_tee")
rec.tee(rr.FileSink("data.rrd"), rr.GrpcSink())

grid = build_color_grid()
rec.log("points", rr.Points3D(positions=grid.positions, colors=grid.colors))

The above snippet is functional on this PR. You must start a Viewer (or gRPC server) for it to connect to before running it. It logs the data to both the server and a data.rrd file.

The exposed FileSink and GrpcSink are simple dataclasses that temporarily hold the information needed to set up the sink:

  • FileSink stores a path
  • GrpcSink stores a url, along with a flush timeout

The tee call then converts these sinks into their Rust equivalents, wraps them in a MultiSink, and calls set_sink on that.

Calling set_sink ensures the usual "dump backlog into the new sink" behavior is preserved, so that the initial SetStoreInfo logged by rr.init is sent through to every sink correctly.

The sink that's being set is the new MultiSink, a wrapper over Vec<Box<dyn LogSink>> which clones all received messages to every stored sink.
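
As a rough illustration of that fan-out, here is a simplified Python model. It is not the Rust implementation: `ListSink`, the `send` method name, and the dict-shaped messages are hypothetical stand-ins chosen only to show each wrapped sink receiving its own copy of a message.

```python
from __future__ import annotations

import copy


class ListSink:
    """Toy sink (hypothetical) that records every message it receives."""

    def __init__(self) -> None:
        self.received: list[dict] = []

    def send(self, msg: dict) -> None:
        self.received.append(msg)


class MultiSink:
    """Toy model of a sink that forwards each message to every wrapped sink."""

    def __init__(self, sinks: list[ListSink]) -> None:
        self.sinks = list(sinks)

    def send(self, msg: dict) -> None:
        for sink in self.sinks:
            # Every sink gets its own copy, mirroring the per-sink clone.
            sink.send(copy.deepcopy(msg))


file_like, grpc_like = ListSink(), ListSink()
multi = MultiSink([file_like, grpc_like])
multi.send({"entity_path": "points", "payload": [1, 2, 3]})
```

After the call, both toy sinks hold an equal but independent copy of the message, which is also where the cloning overhead raised under the open questions comes from.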

Currently only FileSink and GrpcSink are available. A user may use the same kind of sink multiple times, but there must not be any duplicates. For example, two GrpcSink with the same url would count as duplicates.
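
A minimal sketch of what such a duplicate check could look like, assuming sinks are compared by kind plus `url` (for gRPC) or path (for files). The `FileSinkConfig` and `GrpcSinkConfig` names, the `flush_timeout_sec` field, the rule that two file sinks with the same path are duplicates, and the example URL are all assumptions made for illustration, not the SDK's actual code.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class FileSinkConfig:
    """Hypothetical stand-in for a file sink description."""

    path: str


@dataclass(frozen=True)
class GrpcSinkConfig:
    """Hypothetical stand-in for a gRPC sink description."""

    url: str
    flush_timeout_sec: Optional[float] = None


SinkConfig = Union[FileSinkConfig, GrpcSinkConfig]


def sink_key(sink: SinkConfig) -> tuple[str, str]:
    # Only the destination identifies a sink; the flush timeout is ignored.
    if isinstance(sink, GrpcSinkConfig):
        return ("grpc", sink.url)
    return ("file", sink.path)


def check_no_duplicates(sinks: list[SinkConfig]) -> None:
    """Raise ValueError if two sinks point at the same destination."""
    seen: set[tuple[str, str]] = set()
    for sink in sinks:
        key = sink_key(sink)
        if key in seen:
            raise ValueError(f"duplicate sink: {sink!r}")
        seen.add(key)


url = "rerun+http://127.0.0.1:9876/proxy"  # example value only

# Same kind used twice with different destinations: accepted.
check_no_duplicates([FileSinkConfig("a.rrd"), FileSinkConfig("b.rrd"), GrpcSinkConfig(url)])

# Same url twice, even with different timeouts: rejected.
try:
    check_no_duplicates([GrpcSinkConfig(url, 1.0), GrpcSinkConfig(url, 5.0)])
except ValueError as err:
    print(err)
```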

Open questions

  • What other sinks should be exposed?
    • BufferedSink doesn't make that much sense. It's something we use to temporarily hold data in memory (specifically only a SetStoreInfo message) before the user sets up a real "endpoint" sink, like GRPC or a file.
    • MemorySink and CallbackSink are important, as they interact with Notebook/Gradio/JS integration. We should allow for example sending data to a Viewer embedded in a notebook, and saving the same data to a file at the same time.
  • Needs name bikeshedding: tee makes sense, but isn't very discoverable.
  • Can we avoid the extra overhead of cloning all messages to every sink?
    • The current LogSink trait makes this impossible, as it requires that every stored sink be passed an owned LogMsg.

@jprochazk jprochazk changed the title RFC: Multi-sink / tee API RFC: Multi-sink / tee Python API Jun 7, 2025

github-actions bot commented Jun 7, 2025

Web viewer built successfully. If applicable, you should also test it:

  • I have tested the web viewer
Result Commit Link Manifest
af8ee6f https://rerun.io/viewer/pr/10158 +nightly +main

Note: This comment is updated whenever you push a commit.

@jprochazk jprochazk added enhancement New feature or request sdk-python Python logging API labels Jun 9, 2025
@jprochazk jprochazk added this to the 0.24.0 milestone Jun 9, 2025
Member

@jleibs jleibs left a comment

Clever solution.

Not having sink objects was the big issue we ran into before, and you kind of side-stepped that by not making, e.g., GrpcSink a first-class wrapper on top of the Rust-side sinks. I don't remember why that was hard before, or if it was just work.

This could still serve as a stepping stone in that direction though.

@nikolausWest
Member

In general I think this looks great, but I think we should skip the rr.tee() API and stick to the API on the rr.RecordingStream object. Other than that, let's land it!

Duplicate sinks are not allowed. For example, two [`GrpcSink`]s that
use the same `url`.
"""
Member

Example code might be useful here

@nikolausWest
Member

This feature will also require a how-to guide, and perhaps a look over our docs in general to make sure it's discoverable. That means it should be referenced in the right places where one might wonder about it (e.g. in the rr.init docs). FYI @pweids

@Wumpf
Member

Wumpf commented Jun 13, 2025

@pweids / @nikolausWest / @jprochazk are we still taking this on for 0.24 or do we think it's too much too soon?
(I know we'd all love to have it, but seems like yet-another-feature?)

@jprochazk
Member Author

It's not far from completion, I think we should land it for 0.24

@jprochazk jprochazk changed the title RFC: Multi-sink / tee Python API Multi-sink / tee Python API Jun 18, 2025
@jprochazk jprochazk marked this pull request as ready for review June 18, 2025 11:56
@abey79 abey79 self-requested a review June 19, 2025 13:14
Member

@abey79 abey79 left a comment

Very nice feature to have! My main concern is about future-proofing the naming, but I don't have the context of any prior discussions that might have happened.

Comment on lines +112 to +118
// NOTE: this is only really used for BufferedSink,
// and by the time you `set_sink` you probably don't have
// a buffered sink anymore
#[inline]
fn drain_backlog(&self) -> Vec<LogMsg> {
    Vec::new()
}
Member

Trusting you here—I don't have a deep enough understanding of that stuff to make a call.

Member Author

Another thing which I didn't write here: which sink do you pick to call drain_backlog on? It's not clear to me at all, but luckily we probably don't need to worry about that, since you can't insert a BufferedSink into a MultiSink as a user

get_recording_id = get_recording_id
is_enabled = is_enabled

def tee(
Member

Has there been discussion about this choice of name?

I suggest using set_sinks() instead, because it's more explicit and also more future-proof (i.e., I would like to drop all these weird save(), etc. APIs in favour of exposing all sinks and being explicit about plugging a stream into them).

Member Author

I agree that tee probably isn't the best name, but I like it more than set_sinks. Another follow-up for the list: properly document this everywhere we talk about operating modes.

I'm going to start a bikeshedding thread in Slack, since we have some time until 0.24 to change it, but I'll leave it as is for now and merge.

Member

The name tee suggests two sinks, which seems arbitrary to me. Wouldn't it be cleaner to have 2 (or more) calls to something like add_sink?

Member Author

We decided on .attach_sinks in the Slack thread. I'll merge this as-is (with the other review comments addressed) and do the rename in a follow-up PR.

Member Author

@jprochazk jprochazk Jun 20, 2025

Wouldn't it be cleaner to have 2 (or more) calls to something like add_sink?

The way this works under the hood would make any kind of "add sink" operation a bit more complicated. The presence of that initial SetStoreInfo message means we must set all the sinks "at once"
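
To make that concrete, here is a toy Python model of the backlog hand-off. It is only a sketch under stated assumptions: the `Stream`, `ListSink`, and string-valued messages are hypothetical, and the only behavior taken from this PR is that a buffered sink holds the initial SetStoreInfo, that `set_sink` drains the old sink's backlog into the new one, and that non-buffered sinks return an empty backlog.

```python
from __future__ import annotations


class BufferedSink:
    """Holds messages in memory until a real sink is set."""

    def __init__(self) -> None:
        self.backlog: list[str] = []

    def send(self, msg: str) -> None:
        self.backlog.append(msg)

    def drain_backlog(self) -> list[str]:
        drained, self.backlog = self.backlog, []
        return drained


class ListSink:
    """Toy endpoint sink (hypothetical); it has no backlog to hand over."""

    def __init__(self) -> None:
        self.received: list[str] = []

    def send(self, msg: str) -> None:
        self.received.append(msg)

    def drain_backlog(self) -> list[str]:
        return []


class MultiSink:
    """Forwards each message to every wrapped sink."""

    def __init__(self, sinks: list[ListSink]) -> None:
        self.sinks = list(sinks)

    def send(self, msg: str) -> None:
        for sink in self.sinks:
            sink.send(msg)

    def drain_backlog(self) -> list[str]:
        return []


class Stream:
    """Toy recording stream that starts out buffering SetStoreInfo."""

    def __init__(self) -> None:
        self.sink = BufferedSink()
        self.sink.send("SetStoreInfo")

    def set_sink(self, new_sink) -> None:
        # Dump the old sink's backlog into the new sink, then swap.
        for msg in self.sink.drain_backlog():
            new_sink.send(msg)
        self.sink = new_sink


# Setting all sinks at once: both receive SetStoreInfo.
a, b = ListSink(), ListSink()
Stream().set_sink(MultiSink([a, b]))

# Adding a second sink later: the backlog was already drained into `c`.
c, d = ListSink(), ListSink()
stream = Stream()
stream.set_sink(c)
stream.set_sink(MultiSink([c, d]))
```

In this model `d` never sees SetStoreInfo, so a naive incremental "add sink" would need extra machinery to replay it.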

@jprochazk jprochazk merged commit 3b75b7a into main Jun 20, 2025
41 checks passed
@jprochazk jprochazk deleted the jan/tee branch June 20, 2025 10:36
Labels
enhancement New feature or request include in changelog sdk-python Python logging API
Development

Successfully merging this pull request may close these issues.

Design tee (multi-sink) API for Python
6 participants