Pauseless Garbage Collector (Question) #96213
Tagging subscribers to this area: @dotnet/gc

Issue Details: Is there any reason why something like the Pauseless Garbage Collector which exists for Java from Azul was never implemented for .NET? https://www.azul.com/products/components/pgc/
I'll add the generic response that the .NET GC supports functionality that Java's GCs do not, and this can complicate or invalidate optimizations that other GCs can take advantage of. For example, Java's GC doesn't support interior pointers, while the .NET GC does.
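As an illustration (a minimal sketch, not from the original comment): managed `ref` locals in C# can point into the middle of an object, and the GC must be able to find the containing object from such an interior pointer, and update the pointer if the object moves.

```csharp
using System;

class InteriorPointerDemo
{
    static void Main()
    {
        int[] numbers = { 10, 20, 30 };

        // 'slot' is an interior pointer: a managed reference into the
        // middle of the array object, not to the object header itself.
        // The GC must keep 'numbers' alive while 'slot' is live, and
        // must fix up 'slot' if the array is relocated by compaction.
        ref int slot = ref numbers[1];
        slot = 99;

        Console.WriteLine(numbers[1]); // prints 99
    }
}
```

Supporting this efficiently constrains the GC design; collectors that assume all references point at object headers can use tricks that are unavailable here.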
I'd like to see the GC team write some insight about the benefits and challenges/drawbacks of these options. What's theoretically possible but just complex/low priority to implement? Which features/goals are in fundamental conflict?
These types of GCs typically trade throughput for shorter pause times. For example, they often use GC read barriers that make accessing object reference fields significantly slower. If you would like to understand the problem space, read _The Garbage Collection Handbook_; it has a full chapter dedicated to real-time garbage collectors. There is nothing fundamental preventing these types of garbage collectors from being built for .NET. It is just a lot of work to build a production-quality garbage collector. We do not see significant demand for these types of garbage collectors in .NET, and building alternative garbage collectors with very different performance tradeoffs has not been at the top of the core .NET team's priority list. I would love to see the .NET community experimenting with alternative garbage collectors with very different performance tradeoffs. That is how Azul came to be - the Azul garbage collector that you linked to is not built by the core Java team.
I guess that having more and more official, simultaneously supported GCs would also increase the amount of work needed to maintain and improve them all, putting even more burden on the GC team, which would mean the existing GCs would be improved more slowly.
Yeah, but a pauseless collector seems to me to open up a whole new area where .NET could be used. Even if it may be slower, being deterministic, without pauses every now and then, could be a huge benefit for some applications.
Right. If it were to follow the Azul model, it would not impact the core GC team much. I believe that the core Java GC team does not spend any cycles on the Azul GC. The Azul GC is maintained by Azul, which is a company with a closed-source business model.
It comes down to numbers and opportunity costs. For example, how many new developers can a pauseless GC bring to .NET? It is hard to make the numbers work.
For what it's worth, beware that buying the book as an eBook from the official publisher only gives access through the VitalSource service. There is no way to download the book except through the DRM-encumbered software, and they managed to block my account before I was even able to read a single page (no explanation given; the service just responds with a 401 error and logs me out). If you want the book, get it as a physical book or through Amazon Kindle and save yourself the trouble.
I am rather surprised to hear that! I'd love to see an experimental GC, so I'm certainly quite biased. But I'd imagine predictability of latency is a very significant concern for a number of large user bases. Game development (Unity) comes to mind, of course, as do many areas in finance and algorithmic trading. Sorry, that's it for my sales pitch; in short, I'd definitely love to see experimentation in this area.
Latency of a GC can be a concern in many of the same ways that latency of RAII can be a concern. Having a GC, including a GC that can "stop the world", is not itself strictly a blocker, and it may be of interest to note that many of the broader/well-known game engines do themselves use GCs (many, but not all, of which are incremental rather than pauseless).

Most people's experience with .NET and a GC in environments like game dev, up until this point, has been with either the legacy Mono GC or the Unity GC, neither of which can really be compared with the performance, throughput, latency, or various other metrics of the precise GC that ships with RyuJIT.

Having some form of incremental GC is likely still interesting, especially if it can be coordinated to run more in places where the CPU isn't doing "important" work (such as when you're awaiting a dispatched GPU task to finish executing), but it's hardly a requirement with an advanced modern GC, especially if you're appropriately taking memory management into consideration by utilizing pools, spans/views, and other similar techniques (just as you'd have to in C++ to limit RAII or free overhead).
In essence, using a pool is no different from manually allocating memory. It does not reduce the mental burden of the manual management required to allocate and reclaim memory. Of course, it is also necessary to explore safe programming methods similar to Rust's. This is a popular article about GC.
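To make that trade-off concrete (a hedged sketch using the standard `System.Buffers.ArrayPool<T>` API): renting from a pool avoids generating garbage, but the caller takes on the burden of returning the buffer, much like a manual free.

```csharp
using System;
using System.Buffers;

class PoolingDemo
{
    static void Main()
    {
        // Rent a buffer instead of allocating a fresh array. This avoids
        // creating garbage, but the buffer MUST be returned manually -
        // the "mental burden" of manual memory management mentioned above.
        int[] buffer = ArrayPool<int>.Shared.Rent(minimumLength: 1024);
        try
        {
            for (int i = 0; i < 1024; i++)
                buffer[i] = i;
            Console.WriteLine(buffer[100]); // prints 100
        }
        finally
        {
            // Forgetting this is the pooling equivalent of a memory leak;
            // returning it twice is the equivalent of a double free.
            ArrayPool<int>.Shared.Return(buffer);
        }
    }
}
```

Note that `Rent` may return a buffer larger than requested, another detail the caller has to keep in mind, just as with manual allocators.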
It would be nice to see how this changes in newer versions of .NET, and also how Java compares (with both its default collector and the pauseless collector mentioned here).
Maybe an option like the incremental GC being adopted by Unity is feasible here, where a "full GC" is broken up into a sequence of several "partial GCs" (i.e., doing GC incrementally), so that although the total pause time doesn't change, each individual pause can be minimized to a nearly pauseless one. cc: @Maoni0
I'm a game/engine dev on osu!. We've used C# throughout, from .NET 3.5 to .NET 8, and have fully rewritten the game over the years, which has brought new challenges in balancing features that wouldn't have been possible before against what works best with the .NET GC. By far our greatest fight has been with the GC - it is definitely a felt presence and at the forefront of everything we do. I've personally gone pretty deep into minimising pauses with issues such as #48937, #12717, and #76290, but as a team we've always been very conscious about allocations because our main loop is running at potentially 1000Hz, or historically even more than that. What we've found works best for us is turning on
Where it breaks down, however, is areas that require allocs such as menus. This GC mode will cause terrible stutters when doing anything remotely intensive, meaning that we have to very carefully switch GC modes at opportune moments to get the best of both worlds, and sometimes those worlds are intertwined.
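A sketch of the kind of mode switching described above (hypothetical placement; the real game's logic is of course far more involved), using the standard `GCSettings.LatencyMode` API:

```csharp
using System;
using System.Runtime;

class LatencyModeDemo
{
    static void Main()
    {
        // During latency-critical gameplay: ask the GC to avoid
        // blocking full collections where possible.
        GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;
        Console.WriteLine(GCSettings.LatencyMode); // SustainedLowLatency

        // At an opportune moment (e.g. entering an allocation-heavy
        // menu): switch back and let the GC catch up on deferred work,
        // paying the cost now while a stutter is acceptable.
        GCSettings.LatencyMode = GCLatencyMode.Interactive;
        GC.Collect();
    }
}
```

The hard part, as the comment above notes, is that "opportune moments" may not exist when the latency-sensitive and allocation-heavy worlds are intertwined.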
@smoogipoo it seems to me that you should be working directly with MS folks on this. Your expertise in gamedev would help so many people. Stuttering in Unity, for example, has been a blemish on C# for a very long time. It gives people the impression that C# is just a bad language, which is absolutely disastrous for the community as a whole, as more people move away from these tools and end up using other languages.
Industrial automation and real-time data fusion are sectors that are growing rapidly and would require more deterministic behavior/latency.
I hope C# can enter the field of industrial robot control, but this field requires hard real-time performance, and long interruption times may lead to safety incidents. It would be great if users could create real-time threads that are not affected by the GC.
Linus has merged the final real-time code (PREEMPT_RT) into the Linux mainline, and for programming languages, supporting hard real-time would be one of the most outstanding features a language can offer!
This is an interesting discussion. I actually wonder - is there really demand for a low-pause garbage collector on .NET? Are there large enough classes of apps that are like "If only we could have GC pauses under NOTE:
With Unity slowly inching toward completing their Sisyphean effort to move onto CoreCLR, that could be one of the major consumers of low-pause collectors. It's a bit arbitrary, but Minecraft as a game represents a very good example where a low-pause collector (Shenandoah) has really high value, as it manages to offer a relatively stutter-free experience despite multi-gigabyte allocation rates.

Another example is highly loaded but latency-sensitive services. Consider a hypothetical scenario of authoring a dynamic NGINX module with NativeAOT and a thin C shim. A 20% loss in performance around allocatey parts may be acceptable, especially as guaranteeing a completely allocation-free implementation may prove challenging, but incurring a sudden multi-millisecond pause may prohibit the use of .NET in such an (arguably exotic) scenario completely.

Lastly, it may be acceptable to have some application threads run into significant pauses but not others. To my knowledge, .NET does not have the ability to scavenge a thread/core-local allocation context for dead objects when an allocation attempt exceeds the budget, before incurring an application-wide pause, so having that as a stopgap would have been nice.

I think the move of the Java ecosystem to ZGC and the design choices in Go point to an appetite for this type of GC design despite its downsides.
To be honest, Java has always had multiple GCs. I think right now there are 5 or so officially supported Java GCs that can be selected via a command-line switch. I assume it is relatively easy to add another one. Such a possibility must require a well-abstracted API where a "substantially different" GC can plug in. By "substantially different" I mean a GC that would have a different write barrier, a different heap layout, perhaps a different approach to generations and large objects. An alternative GC might not have anything like segments or per-core heaps either.

The current CLR does not have such an interface. There are some EE-to-GC APIs, but they seem to be mostly designed for scenarios like "the same GC, but with a few tweaks or fixes". If you want a "substantially different" GC, you may quickly find yourself changing things on both sides of these interfaces.
I expect we would be happy to consider proposals for GC-EE interface generalizations to enable substantially different GCs.
It is no different in Java once you look at the details of what it took to add substantially different GCs there. It required modifications throughout the JVM to make the substantially different GCs work. It is not as if you add a new directory with a substantially different GC and it just works without changing anything else.
There is an established pattern for this one: a GC-EE interface API to communicate the shape of the write barrier that the GC wants to use. We have been adding more write barrier shapes over time, even for the current GC.
Rather than a different write barrier, a low-latency GC would probably need a read barrier, for which the runtime has no support (afaict) at the moment. In any case, if a brand new GC should be born, I believe it makes more sense to fork the runtime and then adapt the API if it gains traction, rather than the other way around.
Just to add one data point to the conversation: a few years ago I was working for Criteo, which serves targeted ads on the web. We had high-throughput, low-latency .NET services, and every GC pause meant that pending requests would time out (which translates into lost revenue). Those services would definitely have benefitted from a low-latency GC. But Criteo is probably the exception rather than the norm.
This sounds like a trade-off between latency and throughput. If we had more flexibility to configure these, would it be possible to also support a reference-counting GC? (For PoC and very specialized scenarios.)
Right, it is best to run experiments. Create a fork, try to prototype different ideas, and worry about doing it properly only once you find something with promising results.
Not necessarily. I had an old pet project around (https://github.com/VSadov/Satori). It is a small GC that can do low pauses. Satori GC has classic Dijkstra-style write barriers, so throughput is comparable to the default CoreCLR GC. Everything allocation-sensitive eventually gets gated by how fast you can zero out memory, so this is unsurprising.

Satori GC has a generational, incremental, concurrent design. All major GC phases that are proportional to the heap size run concurrently with the application threads - all except compaction, which would require a read barrier to run concurrently. But compaction is just an optional thing that a GC can do, not something it has to. Native allocators do not compact and do just fine. Some other GCs (e.g. the Go GC) do not compact either. Generally, though, there is some degree of space-for-latency trade-off involved. Unlike Go, Satori can flip compaction on/off dynamically.
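For readers unfamiliar with the term, a Dijkstra-style insertion barrier can be sketched roughly like this (hypothetical pseudocode in C# syntax; `IsMarking`, `IsMarked`, and `MarkGrey` are invented illustrative helpers, not Satori's actual internals):

```csharp
// Sketch only: a Dijkstra-style "insertion" write barrier. On every
// reference store during a concurrent mark phase, the *new* target is
// shaded so the collector cannot lose track of it. Reads stay cheap
// (no read barrier), which is why throughput can stay close to that
// of a non-concurrent GC.
static void WriteBarrier(ref object field, object newValue)
{
    if (IsMarking && newValue != null && !IsMarked(newValue))
        MarkGrey(newValue); // queue the object for concurrent tracing

    field = newValue; // the actual store
}
```

Concurrent compaction is harder precisely because it would require intercepting reads as well, so a mutator never observes a stale address of a moved object.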
As I saw earlier in this discussion, there was a reference to the Unity "hiccupometer" sample: https://gist.github.com/jechter/2730225240163a806fcc15c44c5ac2d6

I tried that with Satori. It is interesting.

```csharp
static void Main(string[] args)
{
    GCSettings.LatencyMode = GCLatencyMode.LowLatency;
    for (; ; )
    {
        float maxMs = 0;
        UpdateLinkedLists(kNumLinkedLists);
        Stopwatch totalStopWatch = new Stopwatch();
        Stopwatch frameStopWatch = new Stopwatch();
        totalStopWatch.Start();
        for (int i = 0; i < kNumFrames; i++)
        {
            frameStopWatch.Start();
            UpdateLinkedLists(kNumLinkedListsToChangeEachFrame);
            frameStopWatch.Stop();
            if (frameStopWatch.ElapsedMilliseconds > maxMs)
                maxMs = frameStopWatch.ElapsedMilliseconds;
            frameStopWatch.Reset();
        }
        totalStopWatch.Stop();
        Console.WriteLine($"Max Frame: {maxMs}, Avg Frame: {(float)totalStopWatch.ElapsedMilliseconds / kNumFrames}");
    }
}
```

When I run this locally (MacBook Pro with Apple M1, arm64), I see:
It looks like the benchmark simulates a game that draws frames one after another and measures the max/avg time per frame. However, the benchmark does not really draw anything; each frame just allocates a bunch of garbage data structures. What we see here is that the "game" runs at ~14,000 frames per second! The GC seems able to keep up with the garbage, and the longest frames are consistently at 2 msec.
For the standard GC I see on the same machine:
Looks like the default GC OOMs on this benchmark.
Tried the Unity sample on a bigger machine which is more like a server (32 logical cores, Ryzen 7950X, win-x64).
For comparison, default GC with
There are half-second pauses, but the throughput is a bit better than with Satori.
Right. Tuning for this benchmark is possible, but it may come at a cost to more general apps. For example, to improve throughput at some cost to heap size, one can dial up the aggressiveness of collections
Also, Satori Gen0 collects only thread-local objects, but in this benchmark every single object escapes its allocating thread, so Gen0 is pointless.
In theory, I could make it turn off Gen0 automatically when it sees a workload that does not benefit from it. Not sure how common that would be. Just one of many NYI ideas... So - it may not be a good idea for a real app, but for this benchmark, turning the above knobs results in throughput comparable to the default GC with
Commit size while running this benchmark:
Yes. That is the only way really. If I search for
BTW: if someone is interested in trying Satori, here are some things to know:
There could be other ways, but this is what I do. |
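For context (a hedged sketch, not instructions from this thread): .NET supports loading a standalone GC implementation at startup, so a custom GC built as its own `clrgc` library can typically be swapped in via configuration, without rebuilding the whole runtime. Exact file names and paths below are illustrative.

```shell
# Assumption: you have built or downloaded a standalone GC binary
# (e.g. a custom clrgc.dll / libclrgc.so) matching your runtime version.

# Option 1: environment variable, GC file placed next to the runtime.
export DOTNET_GCName=libclrgc.so
dotnet run -c Release

# Option 2: equivalently, in the app's runtimeconfig.json:
#   "configProperties": { "System.GC.Name": "libclrgc.so" }
```

This is the same loading mechanism the runtime uses for its own alternative GC builds, which makes out-of-tree GC experiments practical to distribute.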
@VSadov watch out for notification spam... you might want to just edit your last post sometimes instead of adding a new one to avoid notifying everyone again (which my post ironically does as well, but you know...).
Very glad to see this thread come up in my emails once more; I have been following for the last couple of days. I tried the Satori GC and can corroborate the low pause time statistic with our game.
Not to pile on, but I am also pretty excited to test this out. My domain is financial transaction processing, and this seems promising for lowering our P99 latency.
I have to say the Satori GC is incredible (credits to @VSadov). This is the benchmark result on my machine:
All tests were run on .NET 8 on a Windows x64 machine with 24 CPU cores (i7-13700K) and 64 GB of RAM. Server GC:
DATAS:
Workstation GC:
Workstation GC + LowLatency:
Satori GC:
Satori GC + LowLatency:
Satori GC managed to achieve better throughput than Server GC, and the latency is really low (even 10x~20x better than DATAS), while the memory footprint is even smaller than DATAS! I can't wait to see Satori productized as the default GC of .NET. cc: @Maoni0
@VSadov fantastic work on Satori!
We've already seen a number of great examples of applications that'd benefit from a GC like that. I'd like to offer one more, just for the sake of the debate - as, of course, many individually niche scenarios could ultimately add up to significant overall demand/use!

I work a lot on backends serving UIs, but everything is fully streaming based. Much of the backend boils down to joining and transforming streaming data sets. By the nature of things, that's pretty memory hungry - e.g. streaming joins mean you need to keep lots of data readily accessible (i.e. often: in memory). It's very common for apps to run north of 10GB of memory, and at times multiples of that. If you're unlucky, you're dealing with fast-moving streams too, so allocation rates can be significant as well.

Overall, C#/.NET is an absolute workhorse here - in spite of the demanding problem, median response times in the tens to low 100ms range are often quite achievable even for rather complex queries spanning/joining dozens of systems. P99s are a much bigger challenge, though. I've seen a bunch of cases where responses took >5 seconds, just because one of the involved systems was blocked by a massive GC (and one can't really fault the GC either; compacting a gen2 that totals tens of GBs full of mostly surviving objects is not a lightweight task). And in some ways we're even more unlucky than that, because if our query spans 10 different services, the overall response is often only as fast as the weakest link - so you really add up the probabilities of every involved service having a p99-style response.

A zero or near-zero pause GC could be very interesting to try here. Hard to say without being able to try, of course, but I'd expect I'd very much be able to tolerate lower throughput in favor of avoiding huge pauses. Lower overall throughput could likely often be mitigated by just throwing a few more pods at it, which is cheap.
And even increasing p50 response times is perfectly OK - adding a few ms to the p50 will be hardly noticeable, but having fewer or no cases where some UI takes multiple seconds to load would be very noticeable!
It should be noted that a single benchmark isn't the type of thing that would drive that decision. Much like with locks, collection types, and other scenarios, there are a lot of considerations in picking what the default memory management library (in this case the GC) should be. This is why Java has so many different GCs and why .NET has as many knobs and configuration options as it does. That is, a given GC being ideal for low-latency scenarios like games won't necessarily be ideal for other types of applications. The default needs to strike a balance for all applications. The current GC has years of investment into being such an ideal default for most applications.

It's also worth noting that the benchmark as given isn't exactly representative or realistic. While it does help showcase some worst possible potential of a GC, it isn't necessarily going to cleanly map to how a GC will perform in a real game. Even a raw C++ game using OGL, Vulkan, or D3D12 and doing nothing but clearing the render target (with proper frame buffering, etc.) will be limited to around 10k fps on a modern machine, just due to the general overhead of the message pump, dispatching work to the GPU, etc. A real game will be running at a fraction of that speed, because it will be doing real logic, data management, rendering of thousands to millions of vertices, etc. While naive logic might do the worst possible thing and create tons of new allocations every frame with some throwaway, a real game and game engine is more likely to be using pooling and arenas, and to be generally mindful of wasteful garbage. It won't do this throughout the entire app, but it will do it in the most crucial sections, and those will be highlighted by basic profiling and hot-spot analysis. This will reduce the impact of the worst-case scenario and often keep things at manageable levels.
These are the same kinds of considerations you have when writing a game and/or game engine in C/C++ because

That isn't to say that a low-latency, incremental, and/or pauseless GC wouldn't still benefit such scenarios; just that it may not actually pan out to the type of savings that the worst-case scenario is highlighting.
On my machine, Satori easily wins this particular benchmark. It is amazing how it can maintain a lower memory footprint than Server GC while having faster throughput and much faster collections. It would be good to have it benchmarked with different kinds of applications, especially web servers, where short-lived few-frame allocations mix with long-lasting ones that span network calls. I believe such behavior can cause gen0 fragmentation, and it's important to have a GC that can handle such a scenario. I wonder if someone could make a repo with different kinds of GC-intensive workloads that showcase more differences between these GCs.

Also, I'm wondering why workstation GC performs really poorly (7-10 times slower than Satori) while maintaining a much lower memory footprint. Given the single-threaded nature of this benchmark, I would have assumed workstation would be a fine competitor.

Continuing @tannergooding's response a little bit, I think people should understand that a GC is "just" a memory allocator. It is not magic. It has its own pros and cons, its own success and failure patterns, and while different GCs can be suitable for different kinds of tasks, that doesn't mean one will magically solve your problems. It's much better to start caring about your code - in this particular case, memory management - rather than hope for a miracle that might never come.
@tannergooding perhaps you missed a comment from @smoogipoo: #96213 (comment) As long as @smoogipoo meant osu! (as mentioned earlier, #96213 (comment)), osu! sounds like one of the best gamedev representatives, and its developers are willing to experiment. (I'm just a random gamedev guy who is concerned about the GC's unpredictable impact on somewhat weak devices like the Nintendo Switch.)
@tannergooding It seems that sunk costs have affected your judgment. Rational decision-making should not be based only on what has been in use for a long time.
It's not a question of "sunk costs". It's a question of what a good default for the majority of the ecosystem is. Having multiple different GCs can be goodness; it is why Java has many. It is why .NET has the Workstation and the Server version, why it has had other splits in the past, why there are various configuration knobs for such GCs, etc. Adding in another GC that is one or more of
-- Notably, that also isn't to say that such a GC cannot be a good default. Just that it would require significantly more investigation to prove such a point. Rather simply, other ecosystems have almost universally decided that such GCs are not a good default for them. I would expect that .NET would come to the same conclusion for much the same reasons. Domains such as games and certain types of services might benefit from such a GC and would then need to opt in. If it was somehow found to be the right choice for the .NET default, then the other GCs would likely need to continue existing so that apps where it isn't the right choice can opt into one of the other GCs that is correct for them.
Java has gone through multiple GC implementations throughout its history and is in the process of migrating to a new default, ZGC, which is a low-pause design. The Satori GC results are really impressive; hopefully I'll get to test them with a couple of our workloads later, because much lower pause latency at no (or minor) throughput cost sounds like a no-brainer. Even if the runtime team decides not to entertain this further, there are enough people in this discussion to possibly maintain an out-of-tree implementation. Thank you @VSadov for publishing it; I really hope this will help push whatever GC choices .NET offers into a more competitive position. We didn't know this was actually feasible, but now that we do, everyone is hungry for more 😄
Very thankful to see movement on this thread. The results look promising (and as already mentioned by @smoogipoo, we are considering deploying to production with what we're seeing). I'd be very interested to see benchmarks crafted to perform worse on the new GC implementation, i.e. showing its limitations or drawbacks, if they exist. That would help me assess, as a product owner, what could potentially go wrong, and could also help drive the ongoing discussion of whether it could be considered a "replacement" or exist as an "alternative" for specific use cases like games.
Is there any reason why something like the Pauseless Garbage Collector which exists for Java from Azul was never implemented for .NET?
https://www.azul.com/products/components/pgc/
https://www.artima.com/articles/azuls-pauseless-garbage-collector