New dataset API for just retrieving `chunk_ids` associated with a query #10261

jleibs · 2025-06-17T00:43:14Z

This makes it easier to benchmark the query and chunk-fetching portions of the server APIs separately: `get_chunk_ids()` returns only the IDs of the chunks associated with a query.

Example:

import rerun as rr
import pyarrow as pa
import time

client = rr.catalog.CatalogClient("rerun+http://localhost:51234")
dataset = client.get_dataset(name="droid:sample500")

wrist = dataset.dataframe_query_view(index="log_tick", contents="/camera/wrist/embedding /thumbnail/camera/wrist")

sampled_times = [0, 100, 200, 500, 1000, 2000]

# Profiling: time only the call that resolves the query to chunk IDs.

start = time.perf_counter()
batches = wrist.filter_index_values(sampled_times).fill_latest_at().get_chunk_ids()
end = time.perf_counter()

# Collect the returned record batches into one table to count rows and batches.
table = pa.Table.from_batches(batches)
num_batches = len(table.to_batches())
print(f"Got {table.num_rows} chunks over {num_batches} batches in {(end - start) * 1000:.1f} ms")
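
A single timing like the one above can be noisy, so for benchmarking it may help to repeat the call and report the minimum and median. The following is a minimal sketch using only the standard library. `time_call` and `summarize` are hypothetical helper names, not part of the rerun SDK, and the lambda is a stand-in workload that would be swapped for the real `get_chunk_ids()` call.

```python
import statistics
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def time_call(fn: Callable[[], T], repeats: int = 5) -> tuple[T, list[float]]:
    """Run `fn` `repeats` times; return the last result and per-run durations in ms."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    durations_ms: list[float] = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn()
        durations_ms.append((time.perf_counter() - start) * 1000)
    return result, durations_ms


def summarize(durations_ms: list[float]) -> str:
    """Format the minimum and median of a list of durations in ms."""
    return f"min {min(durations_ms):.2f} ms, median {statistics.median(durations_ms):.2f} ms"


# Stand-in workload (assumption). To time the query phase, replace the lambda with
# something like:
#   lambda: list(wrist.filter_index_values(sampled_times).fill_latest_at().get_chunk_ids())
# which assumes the return value of get_chunk_ids() can be iterated into a list.
result, durations = time_call(lambda: sorted(range(1000), reverse=True), repeats=5)
print(f"query phase: {summarize(durations)}")
```

Timing the chunk-fetching phase with the same helper, in a second `time_call`, would keep the two measurements directly comparable.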

github-actions · 2025-06-17T00:44:28Z

Web viewer built successfully. If applicable, you should also test it:

I have tested the web viewer

| Result | Commit | Link | Manifest |
| --- | --- | --- | --- |
| ✅ | `3e63dd6` | https://rerun.io/viewer/pr/10261 | `+nightly` `+main` |

Note: This comment is updated whenever you push a commit.

abey79 · 2025-06-17T07:22:36Z

rerun_py/src/catalog/connection_handle.rs

        Ok(stores)
    }
+
+    pub fn get_chunk_ids_for_dataframe_query(


I'd rather have any new helper like this delegate to `ConnectionClient`, but let's at least have a TODO here.

I'm leaving it as is for consistency, but I added the comment, and we can tackle them all at the same time in a separate PR.

jleibs added 2 commits June 16, 2025 20:38

Add new client API to directly get the chunk_ids from a query

c00a6fb

fmt

dc52052

jleibs added the labels include in changelog, 📉 performance (Optimization, memory use, etc), and sdk-python (Python logging API) Jun 17, 2025

Lint

bfba9d9

abey79 approved these changes Jun 17, 2025

Add ConnectionClient API migration comment

3e63dd6

jleibs merged commit 650333a into main Jun 17, 2025
40 checks passed

jleibs deleted the jleibs/get_chunk_ids branch June 17, 2025 13:02

Wumpf changed the title ~~New dataset API for just retrieving chunk_ids assocaited with a query~~ New dataset API for just retrieving chunk_ids associated with a query Jul 14, 2025
