jakobnissen
Member

This produces much better code than _deleteend!, because it skips a needless error check for negative length. It's also easier on the compiler because it does not produce _unsetindex! code which is optimised away.

empty! is already pretty fast, but this produces smaller code that is more likely to be inlined.

@jakobnissen
Member Author

Test failure seems unrelated

@giordano
Member

This produces much better code than _deleteend!, because it skips a needless error check for negative length.

When making similar claims, it'd be nice to substantiate them with concrete code, or benchmarks if relevant. This would be very useful for future reference, not just during review.

@adienes
Member

adienes commented Sep 12, 2025

Kinda tangential, but should resize!(a::Vector, nl_::Integer) get a fast path too?

@oscardssmith
Member

oscardssmith commented Sep 12, 2025

the design goal was for _unsetindex! (and the loop) to be deleted where unnecessary, but if that's not happening we should go with this approach.

Alternatively, could we move this condition into _deleteend!?

@jakobnissen
Member Author

jakobnissen commented Sep 12, 2025

I should have given some more information about this, sorry! 😅

On non-pointer arrays, empty! is really just setting the length to zero. So, if it compiles to anything more than a single move instruction, there is room for improvement. The implementation in this PR achieves that.

The current implementation doesn't do this because of the negative-length error check in _deleteend!, which is unreachable here but not optimised away. It's just a single branch with probably a 100% prediction rate, but it bloats the generated code unnecessarily and therefore affects inlining.

Does it matter for performance? Yes. Inlining matters, and dead code prevents it. However, it's tricky to benchmark, because the result depends entirely on whether you hit the inlining threshold in your hot loop.

The same error check is not removed from resize! either, although it is unreachable there too.
I'm fine with other approaches, e.g. to refactor _deleteend! to a safe and unsafe version.
However, empty! remains a special case because the new length (zero) is always valid so IMO it's a low-hanging optimisation fruit and we should just pluck it regardless.

Code generated before this PR for empty!(::Vector{UInt8}):

```llvm
  %gcframe1 = alloca [3 x ptr], align 16
  call void @llvm.memset.p0.i64(ptr align 16 %gcframe1, i8 0, i64 24, i1 true)
  %thread_ptr = call ptr asm "movq %fs:0, $0", "=r"() #11
  %tls_ppgcstack = getelementptr inbounds i8, ptr %thread_ptr, i64 -8
  %tls_pgcstack = load ptr, ptr %tls_ppgcstack, align 8
  store i64 4, ptr %gcframe1, align 8
  %frame.prev = getelementptr inbounds ptr, ptr %gcframe1, i64 1
  %task.gcstack = load ptr, ptr %tls_pgcstack, align 8
  store ptr %task.gcstack, ptr %frame.prev, align 8
  store ptr %gcframe1, ptr %tls_pgcstack, align 8
  %"a::Array.size_ptr" = getelementptr inbounds i8, ptr %"a::Array", i64 16
  %"a::Array.size.0.copyload" = load i64, ptr %"a::Array.size_ptr", align 8
  %0 = icmp slt i64 %"a::Array.size.0.copyload", 0
  br i1 %0, label %L73, label %L70

L70:                                              ; preds = %top
  store i64 0, ptr %"a::Array.size_ptr", align 8
  %frame.prev38 = load ptr, ptr %frame.prev, align 8
  store ptr %frame.prev38, ptr %tls_pgcstack, align 8
  ret ptr %"a::Array"

L73:                                              ; preds = %top
  %1 = call [1 x ptr] @j_ArgumentError_7567(ptr nonnull @"jl_global#7568.jit")
  %gc_slot_addr_0 = getelementptr inbounds ptr, ptr %gcframe1, i64 2
  %2 = extractvalue [1 x ptr] %1, 0
  store ptr %2, ptr %gc_slot_addr_0, align 8
  %ptls_field = getelementptr inbounds i8, ptr %tls_pgcstack, i64 16
  %ptls_load = load ptr, ptr %ptls_field, align 8
  %"box::ArgumentError" = call noalias nonnull align 8 dereferenceable(16) ptr @ijl_gc_small_alloc(ptr %ptls_load, i32 360, i32 16, i64 140032381843840) #7
  %"box::ArgumentError.tag_addr" = getelementptr inbounds i64, ptr %"box::ArgumentError", i64 -1
  store atomic i64 140032381843840, ptr %"box::ArgumentError.tag_addr" unordered, align 8
  store ptr %2, ptr %"box::ArgumentError", align 8
  store ptr null, ptr %gc_slot_addr_0, align 8
  call void @ijl_throw(ptr nonnull %"box::ArgumentError")
  unreachable
```

Code generated after this PR:

```llvm
  %"a::Array.size_ptr" = getelementptr inbounds i8, ptr %"a::Array", i64 16
  store i64 0, ptr %"a::Array.size_ptr", align 8
  ret ptr %"a::Array"
```
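Since the thread asks for concrete code to back up such claims, here is a minimal sketch of how IR like the above can be captured and inspected from Julia itself. It uses code_llvm from the InteractiveUtils standard library; the variable names and the search for the string "ArgumentError" are assumptions made for illustration (a rough heuristic, not a precise check), and the exact IR will differ between Julia versions and platforms.

```julia
using InteractiveUtils  # provides code_llvm; loaded automatically in the REPL

# Capture the optimised LLVM IR for empty!(::Vector{UInt8}) as a String.
# debuginfo=:none drops the source-location comments so the output is easier to diff.
ir = sprint(io -> code_llvm(io, empty!, (Vector{UInt8},); debuginfo=:none))

# Heuristic (an assumption): the unreachable error path shows up as a call that
# constructs an ArgumentError, so its name appears somewhere in the IR text.
has_error_branch = occursin("ArgumentError", ir)
println(has_error_branch ? "error branch still present" : "no error branch in IR")
```

Running the same snippet on builds before and after a change gives a quick, reproducible comparison to paste into a PR description.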

@jakobnissen
Member Author

Interestingly, rephrasing the error check to unsigned(delta) > unsigned(len) && throw(...) also makes it optimize well, so maybe we should (also) do that.
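Why the unsigned spelling optimises well can be checked directly. The sketch below uses two hypothetical helpers (valid_signed and valid_unsigned are illustrative names, not Base functions) to show that both spellings accept the same delta for every non-negative len, while only the signed one depends on the sign of len. Because empty! deletes delta == len elements, the unsigned check becomes x <= x, which holds for every bit pattern; this is consistent with LLVM folding it away without knowing that a vector's length is non-negative.

```julia
# Hypothetical helpers spelling the same bounds check two ways.
# Both return true when `delta` is a valid number of elements to delete.
valid_signed(delta::Int, len::Int) = 0 <= delta <= len
valid_unsigned(delta::Int, len::Int) = unsigned(delta) <= unsigned(len)

# For every non-negative `len` (true of any real vector length) the two checks
# agree: a negative `delta` wraps to a huge unsigned value and is rejected too.
for len in 0:5, delta in -3:8
    @assert valid_signed(delta, len) == valid_unsigned(delta, len)
end

# With delta == len (the empty! case) the unsigned form is `x <= x`, true for
# any x, whereas the signed form reduces to `0 <= len`, which needs range info.
@assert valid_unsigned(-1, -1)   # holds even for an (impossible) negative length
@assert !valid_signed(-1, -1)    # the signed check still depends on len's sign
```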

@oscardssmith
Member

it's odd to me that LLVM wasn't able to figure out the equivalence there... I guess it doesn't know that the length field can't be negative.

@giordano
Member

rephrasing the error check to unsigned(delta) > unsigned(len) && throw(...) also makes it optimize well

That looks like a much nicer solution (with a comment to explain why it's necessary) than the branching in the current version of this PR.

@jakobnissen changed the title from "Optimise empty! for pointerfree arrays" to "Help compiler know integer bounds in vector resizing ops" on Sep 12, 2025
@jakobnissen
Member Author

jakobnissen commented Sep 12, 2025

You're right. I removed the optimisation in empty! and instead added the above optimisation to _deletebeg! and _deleteend!, and a similar optimisation in resize!.
This now also improves codegen for pop! and resize! (and possibly other methods, too). Unfortunately, resize! still has an unreachable error branch that requires some minor refactoring to optimise away, which I'm not going to do in this PR.
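A minimal sketch of how the unsigned check might sit in an end-deletion routine, written as a plain if ... end block. deleteend_sketch! is a hypothetical name; unlike Base._deleteend!, which works on the array internals directly, it delegates to the public resize!, so it illustrates only the bounds check and not the codegen improvement itself.

```julia
# Hypothetical end-deletion routine illustrating the unsigned bounds check.
function deleteend_sketch!(a::Vector, delta::Integer)
    delta = Int(delta)
    len = length(a)
    # Unsigned comparison: a negative delta wraps around to a huge value,
    # so this single check rejects both delta < 0 and delta > len.
    if unsigned(delta) > unsigned(len)
        throw(ArgumentError("deleteend_sketch! requires delta in 0:length(a)"))
    end
    return resize!(a, len - delta)
end

v = [1, 2, 3, 4]
deleteend_sketch!(v, 2)   # v is now [1, 2]
deleteend_sketch!(v, 2)   # delta == length(v): empties v, as empty! would
```

Calling it with delta == length(v), which is what empty! effectively does, passes the check for any length.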

@@ -1264,7 +1264,10 @@ end
function _deletebeg!(a::Vector, delta::Integer)
delta = Int(delta)
len = length(a)
0 <= delta <= len || throw(ArgumentError("_deletebeg! requires delta in 0:length(a)"))
# See comment in _deleteend!
unsigned(delta) > unsigned(len) && throw(
Member

If it has to be formatted like this anyway, it feels like a normal if ... end block would work better.

Labels: arrays, performance

6 participants