Help compiler know integer bounds in vector resizing ops #59540
This produces much better code than _deleteend!, because it skips a needless error check for negative length. It's also easier on the compiler because it does not produce _unsetindex! code which is optimised away.
Test failure seems unrelated
When making similar claims, it'd be nice to substantiate them with concrete code, or benchmarks if relevant. This would be very useful for future reference, not just during review.
kinda tangential but should
the design goal was for Alternatively, could we move this condition into
I should have given some more information about this, sorry!
On non-pointer arrays, The reason the current implementation doesn't do it is because of this error check, which is unreachable but not optimised away. It's just a single branch with probably a 100% prediction rate, but it bloats the generated code unnecessarily and therefore affects inlining.
Does it matter for performance? Yes, it matters. Inlining matters and dead code prevents it. However, it's tricky to time because the time depends entirely on whether you reach the inlining threshold in your hot loop.
The same error is not removed from
Code generated before this PR for
%gcframe1 = alloca [3 x ptr], align 16
call void @llvm.memset.p0.i64(ptr align 16 %gcframe1, i8 0, i64 24, i1 true)
%thread_ptr = call ptr asm "movq %fs:0, $0", "=r"() #11
%tls_ppgcstack = getelementptr inbounds i8, ptr %thread_ptr, i64 -8
%tls_pgcstack = load ptr, ptr %tls_ppgcstack, align 8
store i64 4, ptr %gcframe1, align 8
%frame.prev = getelementptr inbounds ptr, ptr %gcframe1, i64 1
%task.gcstack = load ptr, ptr %tls_pgcstack, align 8
store ptr %task.gcstack, ptr %frame.prev, align 8
store ptr %gcframe1, ptr %tls_pgcstack, align 8
%"a::Array.size_ptr" = getelementptr inbounds i8, ptr %"a::Array", i64 16
%"a::Array.size.0.copyload" = load i64, ptr %"a::Array.size_ptr", align 8
%0 = icmp slt i64 %"a::Array.size.0.copyload", 0
br i1 %0, label %L73, label %L70
L70: ; preds = %top
store i64 0, ptr %"a::Array.size_ptr", align 8
%frame.prev38 = load ptr, ptr %frame.prev, align 8
store ptr %frame.prev38, ptr %tls_pgcstack, align 8
ret ptr %"a::Array"
L73: ; preds = %top
%1 = call [1 x ptr] @j_ArgumentError_7567(ptr nonnull @"jl_global#7568.jit")
%gc_slot_addr_0 = getelementptr inbounds ptr, ptr %gcframe1, i64 2
%2 = extractvalue [1 x ptr] %1, 0
store ptr %2, ptr %gc_slot_addr_0, align 8
%ptls_field = getelementptr inbounds i8, ptr %tls_pgcstack, i64 16
%ptls_load = load ptr, ptr %ptls_field, align 8
%"box::ArgumentError" = call noalias nonnull align 8 dereferenceable(16) ptr @ijl_gc_small_alloc(ptr %ptls_load, i32 360, i32 16, i64 140032381843840) #7
%"box::ArgumentError.tag_addr" = getelementptr inbounds i64, ptr %"box::ArgumentError", i64 -1
store atomic i64 140032381843840, ptr %"box::ArgumentError.tag_addr" unordered, align 8
store ptr %2, ptr %"box::ArgumentError", align 8
store ptr null, ptr %gc_slot_addr_0, align 8
call void @ijl_throw(ptr nonnull %"box::ArgumentError")
unreachable
Code generated after
%"a::Array.size_ptr" = getelementptr inbounds i8, ptr %"a::Array", i64 16
store i64 0, ptr %"a::Array.size_ptr", align 8
ret ptr %"a::Array"
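For future reference, IR dumps like the two above can be regenerated with `code_llvm` from the `InteractiveUtils` standard library. The following is a minimal sketch, not the exact command used in this thread: the call being inspected is elided in the comment, so `empty!` on a `Vector{Int}` is an assumption, and `llvm_ir` is a hypothetical helper name. The output will differ between Julia versions and platforms.

```julia
using InteractiveUtils  # provides code_llvm; only loaded automatically in the REPL

# Hypothetical helper: return the optimised LLVM IR of a method as a String.
llvm_ir(f, types) = sprint(io -> code_llvm(io, f, types; debuginfo=:none))

# Assumption: inspect empty! on a Vector{Int}; swap in the method of interest.
ir = llvm_ir(empty!, (Vector{Int},))
println(ir)

# Rough proxy for code size: the number of non-blank IR lines.
ir_lines = count(l -> !isempty(strip(l)), split(ir, '\n'))
println("non-blank IR lines: ", ir_lines)
```

Comparing the line count on a build with and without the change gives a crude but reproducible measure of how much code the unreachable error path adds.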
Interestingly, rephrasing the error check to
it's odd to me that LLVM wasn't able to figure out the equivalence there... I guess it doesn't know that the length field can't be negative.
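The equivalence in question can be checked directly. This is a minimal sketch, assuming the rephrased check is the unsigned comparison that appears in the `_deletebeg!` diff in this thread; the helper names `valid_two_sided` and `valid_unsigned` are hypothetical and do not exist in Base.

```julia
# Hypothetical helpers, named for illustration only.
# Two-sided form: needs both `delta >= 0` and `delta <= len`.
valid_two_sided(delta::Int, len::Int) = 0 <= delta <= len

# Unsigned form: a negative delta reinterprets to a very large unsigned value,
# which exceeds any non-negative len, so one comparison covers both bounds.
valid_unsigned(delta::Int, len::Int) = unsigned(delta) <= unsigned(len)

# The two forms agree whenever len >= 0, which always holds for length(a).
for len in 0:8, delta in -8:16
    @assert valid_two_sided(delta, len) == valid_unsigned(delta, len)
end

# They disagree once len may be negative, the case the compiler cannot rule out.
@assert !valid_two_sided(-2, -1)
@assert valid_unsigned(-2, -1)
```

The last two assertions show why the rewrite is not something an optimiser can derive on its own: the forms only coincide under the invariant that the length is non-negative.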
That looks like a much nicer solution (with a comment to explain why it's necessary) than the branching in the current version of this PR.
You're right. I removed the optimisation in
@@ -1264,7 +1264,10 @@ end
 function _deletebeg!(a::Vector, delta::Integer)
     delta = Int(delta)
     len = length(a)
-    0 <= delta <= len || throw(ArgumentError("_deletebeg! requires delta in 0:length(a)"))
+    # See comment in _deleteend!
+    unsigned(delta) > unsigned(len) && throw(
If it has to be formatted like this anyway, it feels like a normal if/end block would work better.
empty! is already pretty fast, but this produces smaller code more likely to inline.