This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Conversation

@omariom (Contributor) commented Jul 24, 2015

The issue: https://github.com/dotnet/corefx/issues/2257

This PR replaces the remainder operator (which is fairly slow) in Queue's Enqueue / Dequeue / Contains methods with a simple boundary check.
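
For readers unfamiliar with the pattern, the two ways of advancing a circular-buffer index can be sketched as follows. This is only an illustration: the class and method names (`WrapIndexSketch`, `NextWithRemainder`, `NextWithCheck`) are hypothetical and are not taken from the PR, and both helpers assume `0 <= index < length`.

```csharp
using System;

static class WrapIndexSketch
{
    // Remainder-based wrap: correct, but needs an integer division on every call.
    public static int NextWithRemainder(int index, int length)
    {
        return (index + 1) % length;
    }

    // Boundary-check wrap: one compare and a conditional reset.
    // Gives the same result as the remainder version when 0 <= index < length.
    public static int NextWithCheck(int index, int length)
    {
        int next = index + 1;
        return (next == length) ? 0 : next;
    }
}
```

The equivalence only holds because the index advances by exactly one each time; a step larger than one would still need the remainder (or a subtraction).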

newcapacity = _array.Length + MinimumGrow;
}
SetCapacity(newcapacity);
Grow();
Member

Why was this refactored into a separate method? Does it help with inlining?

Contributor Author

No, it has nothing to do with inlining. It was just instinct :)

Member

Ok. In that case please revert it; if it's beneficial to separate it out, that can be done separately.

@omariom (Contributor Author) commented Jul 25, 2015

@nguerrera @stephentoub

Ok, this is what I've found.
The original Enqueue takes at least 10.5 ms for a million operations.
The new one does the same in 6.5 ms.
Using Increment(ref index) hasn't affected perf, but it has increased the number of instructions
from

lea         eax,[rdx+1]  
mov         dword ptr [rsi+1Ch],eax
cmp         r8d,eax  
jne         00007FF8F4E708E8  
xor         eax,eax  
mov         dword ptr [rsi+1Ch],eax

to

lea         rax,[rsi+1Ch]  
mov         edx,dword ptr [rax]  
inc         edx  
mov         dword ptr [rax],edx  
mov         rcx,qword ptr [rsi+8]
cmp         dword ptr [rcx+8],edx
jne         00007FF8F4E80920  
xor         edx,edx  
mov         dword ptr [rax],edx  

That's probably because the JIT is not yet perfect at inlining methods with ref params pointing to fields.

But with @sharwell's suggestion, which looks like:

int tail = _tail;
_array[tail] = item;
Increment(ref tail);
_tail = tail;

The JIT does a perfect job of inlining the call to Increment:

inc         eax  
cmp         ecx,eax  
jne         00007FF8F4E7079D  
xor         eax,eax  
mov         dword ptr [rsi+1Ch],eax

With the changes, enqueuing a million ints takes about 5.5 ms.

The results for Dequeue: 9.3 ms vs 3 ms.
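
The harness behind these numbers is not shown in the thread. A rough way to take comparable timings might look like the sketch below; it assumes a Stopwatch-based loop over a pre-sized `Queue<int>`, and the names `QueueTimingSketch`, `TimeEnqueueMs` and `TimeDequeueMs` are hypothetical. Absolute results will differ by machine, runtime and JIT version.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

static class QueueTimingSketch
{
    // Times 'count' Enqueue calls on a queue pre-sized to 'count',
    // so array growth is not part of the measurement.
    public static double TimeEnqueueMs(int count)
    {
        var queue = new Queue<int>(count);
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
        {
            queue.Enqueue(i);
        }
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }

    // Fills a queue first (untimed), then times 'count' Dequeue calls.
    public static double TimeDequeueMs(int count)
    {
        var queue = new Queue<int>(count);
        for (int i = 0; i < count; i++)
        {
            queue.Enqueue(i);
        }
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
        {
            queue.Dequeue();
        }
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }
}
```

Calling each method once untimed before measuring (to let the JIT compile it) and taking the best of several runs would make the figures more stable.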

@stephentoub (Member)

Thanks, @omariom. Sounds like adopting both Nick's and Sam's suggestions is the way to go.

@sharwell

@omariom: Keep in mind the same thing (local variable to ensure single-update to the field) can be applied to the location where _tail is updated. I didn't mention it before because the intermediate update was not exposed to callers by a pure method like Peek.

@omariom (Contributor Author) commented Jul 26, 2015

I've made the discussed changes: https://github.com/omariom/corefx/commit/8c22635ac1e38433244b8e7351c7d653906a4db6

What should I do now with the Grow method? Should I create a separate issue, or will a PR alone be enough?

@nguerrera (Contributor)

I have to say I'm disappointed that the code has gotten repetitive. I'd really like to capture everything in the increment helper if at all possible.

cc @CarolEidt to see if there's something else we can use to get around the sub-optimal inline...

@omariom (Contributor Author) commented Jul 31, 2015

I can switch it back to using the fields directly (without local copies). It was good enough already.
It will get better as the JIT gets better.

@stephentoub (Member)

Thanks, @omariom. I'd suggest not worrying about the increased number of instructions for now; as you say, that's something that'll just get better as the backend improves. We'd have the helper like:

private void MoveNext(ref int value)
{
    int tmp = value + 1;
    value = (tmp == _array.Length) ? 0 : tmp;
}

which would be used at call sites like:

_array[_tail] = item;
MoveNext(ref _tail); // instead of _tail = (_tail + 1) % _array.Length;
_size++;
_version++;

and that should provide the bulk of the wins while keeping the call sites simple.

@omariom (Contributor Author) commented Aug 16, 2015

@stephentoub
Ready!
And perf is good.

@stephentoub (Member)

> Ready!

Thanks, but what about the local temp in the helper?

Can you also please squash this down to a single commit?

@omariom (Contributor Author) commented Aug 16, 2015

Fixed and squashed 👌

// Increments the index wrapping it if necessary.
private void MoveNext(ref int index)
{
// It is tempting to use the reminder operator here but it is actually much slower
Member

Nit: remainder, not reminder

@stephentoub (Member)

Thanks! Perf still looks good?

@omariom (Contributor Author) commented Aug 16, 2015

Dequeue takes about 3.5 ms, Enqueue 4.5-5 ms. That's even better than twice as fast.

@stephentoub (Member)

LGTM. Thanks!

stephentoub added a commit that referenced this pull request Aug 16, 2015
Fix to issue #2257 - Trivial change to make Queue<T>'s Enqueue / Dequeue twice faster
@stephentoub stephentoub merged commit 51f757c into dotnet:master Aug 16, 2015
@omariom omariom deleted the 2257_faster_queues branch August 16, 2015 19:28
@karelz karelz modified the milestone: 1.0.0-rtm Dec 3, 2016
int tmp = index + 1;
index = (tmp == _array.Length) ? 0 : tmp;
}

Wouldn't this be better?

private void MoveNext(ref int index)
{
    index++;
    if (index == _array.Length)
    {
        index = 0;
    }
}

@stephentoub (Member) commented Jul 2, 2017

> Wouldn't this be better?

Have you tried it? I would expect "no", since every read/write on index needs to go through that ref (hence the use of the tmp here), but if you find otherwise and have data to back it up, PRs are welcome. 😄 (It's possible subsequent JIT improvements in the last year have helped, too.)

I was just wondering. But the ref access does make a difference.

Would the C# compiler automatically inline the MoveNext method in the release build? If so, then ref won't be slower than a local variable. Just wondering...

picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
Fix to issue dotnet/corefx#2257 - Trivial change to make Queue<T>'s Enqueue / Dequeue twice faster

Commit migrated from dotnet/corefx@51f757c