Conversation

luke-gruber (Contributor)

We can avoid taking this barrier when we're neither doing incremental marking nor lazy sweeping. While profiling `Psych.load` running in multiple ractors, I found the VM barrier here accounted for a significant share of samples. With this change, we get significant improvements in ractor benchmarks that allocate lots of objects.

-- Psych.load benchmark (r = number of ractors, itr = iteration) --

```
Before:            After:
r:   itr:   time   r:   itr:   time
0    #1:  960ms    0    #1:  943ms
0    #2:  979ms    0    #2:  939ms
0    #3:  968ms    0    #3:  948ms
0    #4:  963ms    0    #4:  946ms
0    #5:  964ms    0    #5:  944ms
1    #1:  947ms    1    #1:  940ms
1    #2:  950ms    1    #2:  947ms
1    #3:  962ms    1    #3:  950ms
1    #4:  947ms    1    #4:  945ms
1    #5:  947ms    1    #5:  943ms
2    #1: 1131ms    2    #1: 1005ms
2    #2: 1153ms    2    #2:  996ms
2    #3: 1155ms    2    #3: 1003ms
2    #4: 1205ms    2    #4: 1012ms
2    #5: 1179ms    2    #5: 1012ms
4    #1: 1555ms    4    #1: 1209ms
4    #2: 1509ms    4    #2: 1244ms
4    #3: 1529ms    4    #3: 1254ms
4    #4: 1512ms    4    #4: 1267ms
4    #5: 1513ms    4    #5: 1245ms
6    #1: 2122ms    6    #1: 1584ms
6    #2: 2080ms    6    #2: 1532ms
6    #3: 2079ms    6    #3: 1476ms
6    #4: 2021ms    6    #4: 1463ms
6    #5: 1999ms    6    #5: 1461ms
8    #1: 2741ms    8    #1: 1630ms
8    #2: 2711ms    8    #2: 1632ms
8    #3: 2688ms    8    #3: 1654ms
8    #4: 2641ms    8    #4: 1684ms
8    #5: 2656ms    8    #5: 1752ms
```
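To make the gating condition easier to follow, here is the check from the patch with each clause annotated (the condition itself is verbatim from the diff below; the comments are an interpretation, not part of the change):

```c
/* Gloss of the patch's gating condition; comments are interpretation. */
bool needs_gc =
    /* An incremental marking phase is in progress and must be able to
     * continue, which requires entering GC (and its VM barrier). */
    is_incremental_marking(objspace) ||
    /* Lazy sweeping only has to run right now if this heap has no
     * free pages left to allocate from. */
    (heap->free_pages == NULL && is_lazy_sweeping(objspace));
```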

Comment on lines 2015 to 2018
```c
bool needs_gc = is_incremental_marking(objspace) || (heap->free_pages == NULL && is_lazy_sweeping(objspace));
if (needs_gc) {
    gc_enter(objspace, gc_enter_event_continue, &lock_lev); // takes vm barrier, try to avoid
}
```
Contributor

What about an early return instead?

Suggested change:

```diff
-bool needs_gc = is_incremental_marking(objspace) || (heap->free_pages == NULL && is_lazy_sweeping(objspace));
-if (needs_gc) {
-    gc_enter(objspace, gc_enter_event_continue, &lock_lev); // takes vm barrier, try to avoid
-}
+bool needs_gc = is_incremental_marking(objspace) || (heap->free_pages == NULL && is_lazy_sweeping(objspace));
+if (!needs_gc) {
+    return;
+}
```
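For context, a sketch of how the whole function would read with the early return (the marking/sweeping body is elided, and the `gc_exit` pairing is assumed to match the surrounding code rather than taken from this diff):

```c
static void
gc_continue(rb_objspace_t *objspace, rb_heap_t *heap)
{
    unsigned int lock_lev;

    bool needs_gc = is_incremental_marking(objspace) ||
                    (heap->free_pages == NULL && is_lazy_sweeping(objspace));
    if (!needs_gc) {
        /* Nothing to continue: skip gc_enter() and its VM barrier. */
        return;
    }

    gc_enter(objspace, gc_enter_event_continue, &lock_lev);
    /* ... continue incremental marking and/or lazy sweeping ... */
    gc_exit(objspace, gc_enter_event_continue, &lock_lev);
}
```

The early return keeps the common no-work path flat and avoids indenting the rest of the function under the conditional.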

@luke-gruber force-pushed the heap_prepare_page_no_vm_barrier branch from 910514f to 4552e5c on September 2, 2025 at 19:30
```diff
@@ -2012,7 +2012,10 @@ static void
 gc_continue(rb_objspace_t *objspace, rb_heap_t *heap)
 {
     unsigned int lock_lev;
-    gc_enter(objspace, gc_enter_event_continue, &lock_lev);
+    bool needs_gc = is_incremental_marking(objspace) || (heap->free_pages == NULL && is_lazy_sweeping(objspace));
```
Member

How about extracting a macro:

```c
/* In lazy sweeping or the previous incremental marking finished and
 * did not yield a free page. */
#define needs_continue_sweeping(objspace, heap) \
    ((heap)->free_pages == NULL && is_lazy_sweeping(objspace))
```
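The gating condition would then read something like this (hypothetical usage for illustration, not part of the suggestion):

```c
/* Hypothetical: the needs_gc gate rewritten with the proposed macro. */
bool needs_gc = is_incremental_marking(objspace) ||
                needs_continue_sweeping(objspace, heap);
```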
