DATAS tuning changes #102368

Maoni0 · 2024-05-17T08:13:01Z

tuning changes -

formally introduced the concept of BCS (Budget Computed via Survrate) and BCD (Budget Computed via DATAS) for budget tuning.
unified the below-target and above-target code - now when we are above target we also grow based on the same formula (but with a different factor to avoid oscillation).
the HC range is now limited by DATAS so that it truly adapts to size. Because of this we need to be very careful with the startup stage. I'm using gen1 size to gauge what the allowed budget is. Previously this just used the min size before the first gen2 happened, which means we could be running much slower at the beginning. We also don't want to be too aggressive because we might allocate too much before the first gen2 happens, so I take half of gen1 size.
recorded the last few adjustments so we can use them to make decisions, such as computing the aggressive factor.
we record 2 kinds of adjustments - adjust_hc and adjust_budget. I tried to adjust for msl wait but that didn't go very far because right now we don't have the separation between alloc heaps and collection heaps; adjust_budget is meant to be a transient state that's only there if we don't want to adjust HC right away; and if we are in this state for too long we do change the HC (and reset this state).
computes a factor for how many GCs we allow before making the HC change for various scenarios.
the computation of the allowed budget now takes the conserve_mem_setting into consideration, so the more conservative the setting, the less budget we give it.
provided a new config to specify the target tcp.
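
To make the adjust_hc / adjust_budget distinction and the startup budget concrete, here is a minimal sketch. It is not the code in gc.cpp: only the names adjust_hc and adjust_budget come from the description above, while `datas_tuning_sketch`, `max_budget_only_gcs`, `decide` and `startup_allowed_budget` are hypothetical, and the fixed limit stands in for the per-scenario factor the PR computes.

```cpp
#include <cassert>
#include <cstddef>

// The two kinds of recorded adjustments named in the PR description.
enum class adjustment_kind { adjust_hc, adjust_budget };

// Hypothetical helper type; the real logic lives in gc.cpp and is more involved.
struct datas_tuning_sketch
{
    // Assumed limit on consecutive budget-only adjustments. In the PR this is
    // a factor computed per scenario, not a constant.
    int max_budget_only_gcs;
    int gcs_in_budget_state = 0;

    explicit datas_tuning_sketch(int limit) : max_budget_only_gcs(limit)
    {
        assert(limit > 0);
    }

    // Called for a GC where we are off target. adjust_budget is a transient
    // state: if we stay in it for too long we change the HC and reset.
    adjustment_kind decide(bool hc_change_wanted_now)
    {
        if (hc_change_wanted_now || gcs_in_budget_state >= max_budget_only_gcs)
        {
            gcs_in_budget_state = 0;
            return adjustment_kind::adjust_hc;
        }
        ++gcs_in_budget_state;
        return adjustment_kind::adjust_budget;
    }
};

// Allowed budget before the first gen2 GC: half of the gen1 size.
inline std::size_t startup_allowed_budget(std::size_t gen1_size)
{
    return gen1_size / 2;
}
```

With a limit of 2, two consecutive calls to `decide(false)` return adjust_budget and the third returns adjust_hc, after which the counter starts over.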

got rid of the following in DATAS - they were there as a proof of concept but really didn't work well at all -

got rid of smoothed tcp - it really just doesn't make sense to use it. It makes the reaction delayed and in a counter-productive way.
got rid of all of the space tuning related things which also didn't make sense. It was very dubious the way we calculated the "space cost". This made us barely reduce the number of heaps.
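
A rough illustration of why a smoothed tcp reacts late, assuming simple exponential smoothing (the removed code may have smoothed differently; `smooth`, `first_index_at_or_above` and the weight `alpha` are hypothetical names for this sketch only):

```cpp
#include <cassert>
#include <vector>

// Exponential smoothing: s = alpha * sample + (1 - alpha) * s,
// seeded with the first sample.
inline std::vector<double> smooth(const std::vector<double>& samples, double alpha)
{
    assert(alpha > 0.0 && alpha <= 1.0);
    std::vector<double> out;
    if (samples.empty())
        return out;
    double s = samples.front();
    for (double x : samples)
    {
        s = alpha * x + (1.0 - alpha) * s;
        out.push_back(s);
    }
    return out;
}

// Index of the first value at or above the threshold, or -1 if there is none.
inline int first_index_at_or_above(const std::vector<double>& v, double threshold)
{
    for (std::size_t i = 0; i < v.size(); ++i)
    {
        if (v[i] >= threshold)
            return static_cast<int>(i);
    }
    return -1;
}
```

For a tcp series that jumps from 2 to 10 at the fourth sample, the raw value crosses a threshold of 7 immediately, while the smoothed value with `alpha = 0.5` only crosses it one sample later. A smaller `alpha` stretches that delay further.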

hopefully now the code is also much easier to read than the previous big method (which I wrote in a hurry).

exclude factors that introduce volatility in budget computation in SVR GC - these can change the budget dramatically in a very unpredictable way

based on free list
based on the LLC size - I'm seeing with the default SVR GC this makes the heap size larger on a 12-core machine than on a 28-core machine for the same workload for some aspnet benchmarks.

I'm really trying to make DATAS produce more predictable heap sizes. This is a very different philosophy from SVR GC.

diag is not done. Will come in a separate PR -

I got rid of some fields that we currently fire in the HeapCountTuning event, but I didn't want to change the event layout as we might want to fire different info in the event.
need to fire budget info in an event.

I'll add the comparison data tomorrow.

dotnet-policy-service · 2024-05-17T08:13:34Z

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

src/coreclr/gc/gcconfig.h

src/coreclr/gc/gcpriv.h

src/coreclr/gc/gc.cpp

Maoni0 · 2024-05-23T23:41:37Z

oops, I forgot to add the data. with this change you can see the true effect of adapting to size -

(fix is running on the 28-core machine, aspnet-citrine-win; fix12c is running on 12-core, aspnet-perf-win).
the max heap sizes are very similar (note that benchmarks could behave differently even without difference in GC so some are more similar than others. below is such an example)

comparing with Server GC on the 28-core machine -

RPS comparison

the PlaintextMvc benchmark RPS is noticeably lower because DATAS limits how many heaps there are, but these benchmarks allocate with a different number of threads based on the machine they are running on (so Server GC has 28 heaps and DATAS has 15 heaps), which causes throughput drops. this benchmark is extremely allocation intensive, which causes this effect. this is something we can improve in the GC to accommodate.

* DATAS tuning changes
* fix debug build error
* fix linux build error

Co-authored-by: Maoni0

DATAS tuning changes

e7502c0

ghost added the area-GC-coreclr label May 17, 2024

dotnet-policy-service bot assigned Maoni0 May 17, 2024

Maoni0 requested a review from mangod9 May 17, 2024 08:13

Maoni0 added 2 commits May 17, 2024 16:46

fix debug build error

69f6d26

fix linux build error

a0dddca