@jleibs jleibs commented Jun 17, 2025

This makes it easier to benchmark the query and chunk-fetching portions of the server APIs separately.

Example:

import rerun as rr
import pyarrow as pa
import time

client = rr.catalog.CatalogClient("rerun+http://localhost:51234")
dataset = client.get_dataset(name="droid:sample500")

wrist = dataset.dataframe_query_view(index="log_tick", contents="/camera/wrist/embedding /thumbnail/camera/wrist")

sampled_times = [0, 100, 200, 500, 1000, 2000]

# Profiling

start = time.perf_counter()
batches = (wrist.filter_index_values(sampled_times).fill_latest_at()).get_chunk_ids()
end = time.perf_counter()
table = pa.Table.from_batches(batches)
num_batches = len(table.to_batches())
print(f"Got {table.num_rows} chunks over {num_batches} batches in {(end - start) * 1000} ms")
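
To compare the two portions side by side, it may help to time each phase with the same small helper and repeat the measurement a few times. The following is only a sketch: `time_phase`, `query_only`, and `query_and_fetch` are hypothetical names that are not part of the rerun SDK, and the two stand-in functions simulate work rather than talking to a server.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def time_phase(label: str, fn: Callable[[], T], repeats: int = 5) -> Tuple[T, float]:
    """Call fn `repeats` times; return the last result and the best wall-clock time in ms."""
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    best_ms = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn()
        best_ms = min(best_ms, (time.perf_counter() - start) * 1000)
    print(f"{label}: best of {repeats} = {best_ms:.2f} ms")
    return result, best_ms


# Hypothetical stand-ins (assumptions, not SDK calls). In a real benchmark the
# first would wrap the get_chunk_ids() call from the example above, and the
# second would wrap whatever call materializes the full query results.
def query_only() -> list:
    return list(range(6))


def query_and_fetch() -> list:
    time.sleep(0.01)  # simulate the extra cost of fetching chunk data
    return list(range(6))


ids, query_ms = time_phase("chunk ids only", query_only)
rows, total_ms = time_phase("query + chunk fetch", query_and_fetch)
print(f"approximate fetch cost: {total_ms - query_ms:.2f} ms")
```

Taking the best of several runs reduces noise from connection setup and caching, although the first (cold) run can be worth recording on its own if server-side caching is what is being measured.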

@jleibs jleibs added the include in changelog, 📉 performance, and sdk-python labels Jun 17, 2025
github-actions bot commented Jun 17, 2025

Web viewer built successfully. If applicable, you should also test it:

  • I have tested the web viewer
Commit: 3e63dd6
Link: https://rerun.io/viewer/pr/10261
Manifest: +nightly +main

Note: This comment is updated whenever you push a commit.

Ok(stores)
}

pub fn get_chunk_ids_for_dataframe_query(
Member

I'd rather have any new such helper delegate to ConnectionClient, but let's at least have a todo here.

Member Author

I'm leaving as is for consistency, but added the comment and we can tackle them all at the same time in a separate PR.

@jleibs jleibs merged commit 650333a into main Jun 17, 2025
40 checks passed
@jleibs jleibs deleted the jleibs/get_chunk_ids branch June 17, 2025 13:02
@Wumpf Wumpf changed the title New dataset API for just retrieving chunk_ids assocaited with a query New dataset API for just retrieving chunk_ids associated with a query Jul 14, 2025