Low-level Arrow conversion of `protobuf`-encoded MCAP messages #10791
stash more reverted builders are broken savegame autosave boxed all the way even closer should work? stash even closer holy! Implement hard coded JSON loading Protobuf improvements Remove `jsonschema` again for now
Nice! This is quite elegant.
.with_field("id", Arc::new(UInt16Array::from(vec![*id])))
.with_field("name", Arc::new(StringArray::from(vec![name.clone()])))
.with_component::<components::Blob>("data", vec![blob])
.with_field("data", Arc::new(BinaryArray::from(vec![data.as_ref()])))
While I'm happy that we're storing this, the fact that it lives in a separate static component is kind of annoying for some practical use-cases, since it means the schema and the binary payloads will live in separate chunks that need to be tracked together.
I'm wondering if it's worth actually adding this schema data to the field metadata associated with the "Binary Message Payload" column itself (if and when we create it). This would make that column totally self-describing and able to be decoded viewer-side using the same reflection techniques you've implemented here.
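To make the suggestion concrete, here is a minimal std-only sketch of a self-describing payload column. `FieldSketch` is a simplified stand-in for an Arrow field, not the arrow-rs API. The metadata key `protobuf.file_descriptor_set` and the hex encoding are assumptions made for illustration. Arrow field metadata maps strings to strings, so a binary descriptor needs some text encoding before it can be stored there.

```rust
use std::collections::HashMap;

/// Simplified stand-in for an Arrow field: a name plus string-to-string metadata.
#[derive(Debug, Clone, PartialEq)]
pub struct FieldSketch {
    pub name: String,
    pub metadata: HashMap<String, String>,
}

/// Hypothetical metadata key (an assumption, not an established convention).
pub const SCHEMA_KEY: &str = "protobuf.file_descriptor_set";

/// Encodes bytes as lowercase hex so they fit into a string-valued metadata map.
pub fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Decodes a hex string; returns `None` on odd length or invalid digits.
pub fn from_hex(hex: &str) -> Option<Vec<u8>> {
    if hex.len() % 2 != 0 {
        return None;
    }
    (0..hex.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(hex.get(i..i + 2)?, 16).ok())
        .collect()
}

/// Builds the payload field with the encoded schema descriptor attached.
pub fn payload_field(descriptor_bytes: &[u8]) -> FieldSketch {
    let mut metadata = HashMap::new();
    metadata.insert(SCHEMA_KEY.to_string(), to_hex(descriptor_bytes));
    FieldSketch {
        name: "data".to_string(),
        metadata,
    }
}

/// Recovers the descriptor bytes from the field alone, without a second chunk.
pub fn embedded_schema(field: &FieldSketch) -> Option<Vec<u8>> {
    from_hex(field.metadata.get(SCHEMA_KEY)?)
}
```

A consumer holding only the column's field could call `embedded_schema(&field)` and feed the bytes to whatever reflection-based decoder it uses. The trade-off is that the descriptor is repeated wherever the field is serialized.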
Yeah, that makes sense! From a viewer perspective adding this as a static chunk makes sense though, so I could see us supporting both ways going forward.
Not sure if Arrow field metadata is the right place though, or if this will end up in the promised out-of-band metadata (cc @teh-cmc).
Related
`ArrayBuilder` for `UnionBuilder` apache/arrow-rs#8033
What
Important
Admittedly, the way this code is called is still a mess. My next steps will be cleaning up the general MCAP data loader architecture, but I wanted to get the `protobuf` decoding out separately.
This PR implements low-level Arrow conversion of `protobuf`-encoded MCAP messages using runtime reflection from the `prost_reflect` crate. Given a `protobuf`-based MCAP schema, a dynamic set of Arrow builders is created. Each message is then encoded recursively using these builders.
This PR also contains a drive-by fix for a missing field in `PointCloud2`, happy to pull that out if requested!
Todos
There is some performance concern with the current implementation, namely that we re-decode the schema once for every MCAP chunk. Fixing that will require re-architecting some of the higher-level code, which I will be doing in a separate PR.
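The builder approach described under What, together with the per-schema reuse this todo calls for, can be illustrated with a std-only toy model. Everything here is a hypothetical stand-in: `Kind` plays the role of a message descriptor, `Value` of a dynamically decoded message, `Builder` of an Arrow builder, and `SchemaCache` of whatever the re-architected loader ends up using. None of these are `prost_reflect` or arrow-rs types, and the sketch does not reflect the PR's actual code.

```rust
use std::collections::HashMap;

/// Toy schema node (stand-in for a protobuf message descriptor).
#[derive(Debug, Clone, PartialEq)]
pub enum Kind {
    Int,
    Str,
    Message(Vec<(String, Kind)>),
}

/// Toy decoded value (stand-in for a dynamically decoded message).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Int(i64),
    Str(String),
    Message(Vec<(String, Value)>),
}

/// Toy column builders mirroring the schema shape (stand-in for Arrow builders).
#[derive(Debug, PartialEq)]
pub enum Builder {
    Int(Vec<Option<i64>>),
    Str(Vec<Option<String>>),
    Struct(Vec<(String, Builder)>),
}

/// Creates one builder per schema node, recursing into nested messages.
pub fn builder_for(kind: &Kind) -> Builder {
    match kind {
        Kind::Int => Builder::Int(Vec::new()),
        Kind::Str => Builder::Str(Vec::new()),
        Kind::Message(fields) => Builder::Struct(
            fields
                .iter()
                .map(|(name, child)| (name.clone(), builder_for(child)))
                .collect(),
        ),
    }
}

/// Appends one value (or a null) to the matching builder, recursively.
/// Missing fields become nulls so that all child columns stay the same length.
pub fn append(builder: &mut Builder, value: Option<&Value>) -> Result<(), String> {
    match (builder, value) {
        (Builder::Int(col), Some(Value::Int(v))) => col.push(Some(*v)),
        (Builder::Int(col), None) => col.push(None),
        (Builder::Str(col), Some(Value::Str(v))) => col.push(Some(v.clone())),
        (Builder::Str(col), None) => col.push(None),
        (Builder::Struct(children), Some(Value::Message(fields))) => {
            for (name, child) in children.iter_mut() {
                let name = name.as_str();
                let field = fields
                    .iter()
                    .find(|(n, _)| n.as_str() == name)
                    .map(|(_, v)| v);
                append(child, field)?;
            }
        }
        (Builder::Struct(children), None) => {
            for (_, child) in children.iter_mut() {
                append(child, None)?;
            }
        }
        (b, Some(v)) => return Err(format!("type mismatch: {:?} vs {:?}", b, v)),
    }
    Ok(())
}

/// Hypothetical cache keyed by schema id, so that a schema is decoded once
/// rather than once per chunk.
#[derive(Default)]
pub struct SchemaCache {
    decoded: HashMap<u16, Kind>,
    pub decode_calls: usize,
}

impl SchemaCache {
    /// Returns the cached schema, calling `decode` only on the first request.
    pub fn get_or_decode(&mut self, id: u16, decode: impl FnOnce() -> Kind) -> &Kind {
        if !self.decoded.contains_key(&id) {
            self.decode_calls += 1;
            self.decoded.insert(id, decode());
        }
        &self.decoded[&id]
    }
}
```

In this model a loader would call `get_or_decode` once per chunk, create fresh builders with `builder_for`, and call `append` for each message. Repeated chunks with the same schema id then skip the decode step entirely.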