Vortex is generally quite good about minimising copies of array data. It can load a segment from disk into a FlatLayout, zero-copy deserialize it into an in-memory array, and then perform computations over it.
There is one place where we often do copy data that ideally we wouldn't: in the FlatLayoutReader, when we eventually come to filter the zero-copy deserialized array. We invoke the filter(array, mask) compute function, which for many kernels will create a new canonicalized array.
When exporting to Arrow, this is fine, since we can use a Vortex buffer zero-copy inside Arrow. But for other systems (e.g. DuckDB), or when the caller wishes to re-use a pre-allocated buffer of their own (an Arrow buffer, a NumPy array, a DuckDB Vector), we need a way to pass the output buffer into the filter compute function and have the result written into it.
For example:

```rust
fn filter(array: &dyn Array, mask: &Mask) -> VortexResult<ArrayRef>;
fn filter_into(array: &dyn Array, mask: &Mask, out: &mut Canonical) -> VortexResult<()>;
```

In the first function, the kernel has the option to return a non-canonical array. For example, a DictArray just needs to filter its codes, leaving its values untouched.
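To make the non-canonical case concrete, here is a toy dictionary encoding (stand-in types only, not Vortex's real DictArray) where filtering touches only the codes and the values dictionary is carried over as-is:

```rust
// Toy stand-in for a dictionary-encoded array: `codes` index into `values`.
// This is illustrative only; it does not model Vortex's actual DictArray.
struct Dict {
    codes: Vec<u32>,
    values: Vec<String>,
}

impl Dict {
    /// Filtering a dict array only needs to filter the codes; the values
    /// dictionary is untouched. (In Vortex the values would be a cheap
    /// ref-counted handle; the clone here is just for the toy.)
    fn filter(&self, mask: &[bool]) -> Dict {
        Dict {
            codes: self
                .codes
                .iter()
                .zip(mask)
                .filter(|(_, keep)| **keep)
                .map(|(c, _)| *c)
                .collect(),
            values: self.values.clone(),
        }
    }
}

fn main() {
    let dict = Dict {
        codes: vec![0, 1, 0, 2],
        values: vec!["low".into(), "mid".into(), "high".into()],
    };
    let filtered = dict.filter(&[true, false, true, true]);
    println!("filtered codes: {:?}", filtered.codes);
}
```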
In the second function, we can only really take a canonical array to write into (similar, I suppose, to the APIs inside DuckDB that pass around an output vector). With some changes to vortex-buffer (support for an external BufferMut), this would allow us to wrap up pre-existing buffers and write results into them. In this case, Canonical::len == Mask::true_count.
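As a toy illustration of the proposed filter_into contract (plain slices stand in for the Vortex Canonical and Mask types, which are not modeled here): the caller owns the output allocation, and after the call the number of rows written equals the mask's true count.

```rust
// Toy sketch of the `filter_into` contract. `&[f32]`/`&mut [f32]` stand in
// for Vortex arrays and `&[bool]` for a Mask; this is NOT the real API.

/// Filter `values` by a boolean `mask`, writing survivors into the
/// caller-provided `out` buffer instead of allocating a new array.
/// Returns the number of rows written, which equals the mask's true count.
fn filter_into(values: &[f32], mask: &[bool], out: &mut [f32]) -> usize {
    let mut written = 0;
    for (v, keep) in values.iter().zip(mask) {
        if *keep {
            out[written] = *v;
            written += 1;
        }
    }
    written
}

fn main() {
    let values = [1.0, 2.0, 3.0, 4.0];
    let mask = [true, false, true, true];
    // The caller pre-allocates the output, e.g. wrapping a NumPy or DuckDB buffer.
    let mut out = vec![0.0f32; mask.iter().filter(|b| **b).count()];
    let n = filter_into(&values, &mask, &mut out);
    println!("wrote {n} rows: {out:?}");
}
```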
For a scan, we cannot know the output length. So we should resort to an "exporter" style API (similar to how we currently implement the DuckDB exporter). This might look something like:
```rust
trait ArrayStreamExt {
    fn into_exporter(self) -> impl ArrayExporter;
}

trait ArrayExporter {
    fn export(&mut self, out: &mut Canonical) -> VortexResult<usize>; // Returns num rows exported <= Canonical::len.
}
```

Note the &mut Canonical could also be a non-resizable impl of an ArrayBuilder, or it could be something entirely new, such as a trait Exportable. (Having a trait here would allow us to dynamically allocate additional buffers via the external system, e.g. asking DuckDB to attach a validity buffer to the vector when it's needed, rather than having to pre-allocate one and wrap it up inside a mutable Canonical PrimitiveArray.)
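A toy version of the exporter loop (again with stand-in types: a buffered Vec replaces an ArrayStream, and a fixed mutable slice replaces &mut Canonical) shows how a caller would drain a scan into a reused fixed-size buffer:

```rust
// Toy sketch of the proposed exporter API. A `Vec<f32>` of pending rows
// stands in for a Vortex ArrayStream, and `&mut [f32]` for `&mut Canonical`.
struct ToyExporter {
    pending: Vec<f32>, // rows not yet exported, oldest first
}

impl ToyExporter {
    /// Export up to `out.len()` rows into the caller's buffer, returning the
    /// number of rows actually written (0 once the stream is exhausted).
    fn export(&mut self, out: &mut [f32]) -> usize {
        let n = self.pending.len().min(out.len());
        out[..n].copy_from_slice(&self.pending[..n]);
        self.pending.drain(..n);
        n
    }
}

fn main() {
    let mut exporter = ToyExporter {
        pending: (0..5).map(|i| i as f32).collect(),
    };
    let mut out = [0.0f32; 2]; // reused caller-owned buffer, e.g. a NumPy array
    loop {
        let n = exporter.export(&mut out);
        if n == 0 {
            break;
        }
        println!("exported chunk: {:?}", &out[..n]);
    }
}
```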
To summarize, a concrete example use-case might be:
- Pre-allocate a NumPy array of 2k floats.
- Set up a Vortex scan of a compressed float column, filtering by row indices, with no filter expression.
- Export the result of the scan 2k floats at a time into the same reused NumPy array, with only a single copy (from the compressed form into the uncompressed NumPy buffer).