Add promote argument to joinpartitions#27
Codecov Report
@@ Coverage Diff @@
## main #27 +/- ##
==========================================
+ Coverage 90.75% 91.05% +0.30%
==========================================
Files 1 1
Lines 238 246 +8
==========================================
+ Hits 216 224 +8
Misses 22 22
foreach(i -> push!(joined, ChainedVector([Tables.getcolumn(cols, i)])), 1:N)
else
foreach(i -> append!(joined[i], Tables.getcolumn(cols, i)), 1:N)
foreach(1:N) do i
I'll admit that this is a bit uglier and may trigger extra allocations, but that should happen relatively infrequently.
if !(S <: T) && promote
R = promote_type(S, T)
# Promote the joined arrays
newcol = similar(prev, R)
Since we're using a ChainedVector we accept the default (existing length) to avoid changing the underlying blocks.
# Update the schema
sch_names = schema[].names
sch_types = schema[].types
schema[] = Tables.Schema(
Is there a cleaner way to do this?
Looks like this change depends on the |
t2 = (A=[1, missing, 3], B=[1.0, 2.0, 3.0], C=["trim", "the", "sail"])
p = Tables.partitioner((t1, t2))
# Throws a method error trying to convert `missing` to `Int64`
@test_throws MethodError TableOperations.joinpartitions(p)
Is this triggered through promote_type(S, T)?
This is triggered on the append!(::ChainedVector{Int64}, ::Vector{Union{Int64, Missing}}) because we didn't pass promote=true, so we never reallocate via similar(prev, promote_type(S, T)). This is the current behaviour, so this should be a non-breaking change.
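The failure and the promotion path described above can be sketched with Base types only. This is a rough illustration, not the PR's code: plain `Vector`s stand in for `ChainedVector` (an assumption), and the variable names `prev`, `col`, `S`, `T`, `R`, and `newcol` simply mirror those in the diff.

```julia
# Plain Vectors stand in for ChainedVector here (an assumption for illustration).
prev = [1, 2, 3]          # Vector{Int64}, like the column joined so far
col  = [1, missing, 3]    # Vector{Union{Missing, Int64}} from the next partition

T = eltype(prev)
S = eltype(col)

# Without promotion, appending fails: `missing` cannot be converted to Int64.
# A copy is used so that `prev` itself is left untouched by the failed append.
failed = try
    append!(copy(prev), col)
    false
catch err
    err isa MethodError
end

# With promotion, reallocate to the wider element type first, then append.
R = promote_type(S, T)      # Union{Missing, Int64}
newcol = similar(prev, R)   # same length as prev, wider eltype
copyto!(newcol, prev)       # carry over the existing values
append!(newcol, col)
```

The reallocation only happens when the incoming element type does not fit the existing one, which is why the extra allocation is expected to be rare.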
T = eltype(prev)
S = eltype(col)

if !(S <: T) && promote
With normal joinpartitions does the schema's type for a column just indicate the type of the first vector in ChainedVector? And that's why (with strict subtypes) S <: T is okay but S >: T is not?
That's correct. For example, if we were to reverse the order of t1 and t2 in the error test below, then it wouldn't error and promote=true wouldn't be needed. The schema reference is just whatever it found on the first pass, which is also why we need to update the existing schema when we hit this condition.
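The order dependence can be checked directly against the `!(S <: T)` condition from the diff. In this small sketch, `needs_promotion` is a hypothetical helper introduced only for illustration, and plain `Vector`s again stand in for `ChainedVector`.

```julia
# Hypothetical helper mirroring the `!(S <: T)` check in the diff.
# T is the eltype seen first; S is the eltype of the incoming column.
needs_promotion(S, T) = !(S <: T)

narrow = [4, 5, 6]         # Vector{Int64}
wide   = [1, missing, 3]   # Vector{Union{Missing, Int64}}

# Wide column first: Int64 <: Union{Missing, Int64}, so a plain append works.
wide_first = needs_promotion(eltype(narrow), eltype(wide))   # false
joined = append!(copy(wide), narrow)

# Narrow column first: Union{Missing, Int64} is not a subtype of Int64,
# so the joined column would have to be promoted before appending.
narrow_first = needs_promotion(eltype(wide), eltype(narrow)) # true
```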
Bump @quinnj (org member, original author)?
Closes #26