Add promote argument to joinpartitions #27

Merged
quinnj merged 2 commits into main from rf/joinpartitions-promote on Oct 13, 2021

Conversation

rofinn (Member) commented Oct 6, 2021

Closes #26

@rofinn rofinn force-pushed the rf/joinpartitions-promote branch from 1dbb040 to 4a788eb Compare October 6, 2021 22:27
@rofinn rofinn requested a review from quinnj October 6, 2021 22:28
codecov (Bot) commented Oct 6, 2021

Codecov Report

Merging #27 (02df3d6) into main (619cd78) will increase coverage by 0.30%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main      #27      +/-   ##
==========================================
+ Coverage   90.75%   91.05%   +0.30%     
==========================================
  Files           1        1              
  Lines         238      246       +8     
==========================================
+ Hits          216      224       +8     
  Misses         22       22              
Impacted Files Coverage Δ
src/TableOperations.jl 91.05% <100.00%> (+0.30%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Comment thread src/TableOperations.jl
foreach(i -> push!(joined, ChainedVector([Tables.getcolumn(cols, i)])), 1:N)
else
foreach(i -> append!(joined[i], Tables.getcolumn(cols, i)), 1:N)
foreach(1:N) do i

Member Author

I'll admit that this is a bit uglier and may trigger extra allocations, but that should happen relatively infrequently.

Comment thread src/TableOperations.jl
if !(S <: T) && promote
R = promote_type(S, T)
# Promote the joined arrays
newcol = similar(prev, R)

Member Author

Since we're using a ChainedVector we accept the default (existing length) to avoid changing the underlying blocks.
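
As a rough illustration of the comment above, here is a minimal sketch of how `similar` with only an element type keeps the existing length. It uses a plain `Vector` as a stand-in for the `ChainedVector` (an assumption, so that it runs without SentinelArrays), and the variable values are made up for the example.

```julia
# Plain Vector used as a stand-in for the ChainedVector in the PR (assumption).
prev = [1, 2, 3]                                  # previously joined column, eltype Int
R = promote_type(eltype(prev), Union{Int, Missing})

# With no dims argument, `similar(prev, R)` keeps the length of `prev`.
newcol = similar(prev, R)
copyto!(newcol, prev)                             # carry the existing values over

# The widened column can now accept a chunk that contains `missing`.
append!(newcol, Union{Int, Missing}[4, missing])
```

Whether the real `ChainedVector` methods preserve the underlying blocks in the same way is not shown here; the sketch only demonstrates the length-preserving default.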

Comment thread src/TableOperations.jl
# Update the schema
sch_names = schema[].names
sch_types = schema[].types
schema[] = Tables.Schema(

Member Author

Is there a cleaner way to do this?

rofinn (Member, Author) commented Oct 6, 2021

Looks like this change depends on the similar/copyto! methods introduced in SentinelArrays v1.3, which dropped support for Julia < 1.3. Should I add that restriction here and make this a minor release?

Comment thread test/runtests.jl
t2 = (A=[1, missing, 3], B=[1.0, 2.0, 3.0], C=["trim", "the", "sail"])
p = Tables.partitioner((t1, t2))
# Throws a method error trying to convert `missing` to `Int64`
@test_throws MethodError TableOperations.joinpartitions(p)

Is this triggered through promote_type(S, T)?

rofinn (Member, Author) commented Oct 8, 2021

This is triggered by the call append!(::ChainedVector{Int64}, ::Vector{Union{Int64, Missing}}): because we didn't pass promote=true, we never reallocate via similar(prev, promote_type(S, T)). This is the current behaviour, so this should be a non-breaking change.
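
The failure described above can be reproduced in Base Julia. This is only a sketch: a plain `Vector{Int}` stands in for `ChainedVector{Int64}` (an assumption), and the values are illustrative rather than taken from the test file.

```julia
# Plain Vector stands in for ChainedVector{Int64} (assumption).
joined = [1, 2, 3]
chunk = Union{Int, Missing}[1, missing, 3]

# Without promotion, append! has to convert `missing` to Int and fails.
err = try
    append!(copy(joined), chunk)
    nothing
catch e
    e
end

# The promoted element type is wide enough to hold both chunks.
R = promote_type(eltype(joined), eltype(chunk))
```

Per the discussion, passing `promote=true` to `joinpartitions` is what lets the column be reallocated with the promoted type instead of hitting this error.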

Comment thread src/TableOperations.jl
T = eltype(prev)
S = eltype(col)

if !(S <: T) && promote

With normal joinpartitions, does the schema's type for a column just indicate the type of the first vector in the ChainedVector? And is that why (with strict subtypes) S <: T is okay but S >: T is not?

Member Author

That's correct. For example, if we were to reverse the order of t1 and t2 in the error test below, it wouldn't error and promote=true wouldn't be needed. The schema reference is just whatever it found on the first pass, which is also why we need to update the existing schema when we hit this condition.
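
The ordering effect can be checked with the subtype relation directly. A minimal sketch, again assuming plain `Vector`s in place of the `ChainedVector` and using illustrative values:

```julia
T = Union{Int, Missing}   # eltype of the column joined so far
S = Int                   # eltype of the incoming chunk

narrow_fits = S <: T      # the narrower chunk fits, no promotion needed
wide_fits = T <: S        # the reverse order is the case that needs promotion

# Wide chunk first, narrow chunk second: append! succeeds as-is.
wide_first = append!(Union{Int, Missing}[1, missing, 3], [1, 2, 3])
```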

rofinn (Member, Author) commented Oct 12, 2021

Bump @quinnj (org member, original author)?

Comment thread src/TableOperations.jl Outdated

quinnj (Member) left a comment

Thanks @rofinn!

@quinnj quinnj merged commit e860c85 into main Oct 13, 2021
@quinnj quinnj deleted the rf/joinpartitions-promote branch October 13, 2021 03:22
Successfully merging this pull request may close these issues.

joinpartitions should take a promote kwarg

4 participants