MemoryCache creates lots of tasks that cause CPU/Memory spikes on compaction. #97736
Tagging subscribers to this area: @dotnet/area-extensions-caching

Issue Details

Description:

The problem we’re addressing here is the significant increase in memory and processor usage when a cache nears its size limit and begins the compaction process. This issue becomes particularly noticeable under certain conditions.

Specifically, this situation arises when the cache size limit is sufficiently large and items are being added to the cache frequently. As the cache fills up and approaches its size limit, it triggers the compaction process, leading to a surge in memory and processor usage.

The reason behind this surge is tied to the TriggerOvercapacityCompaction() function. Each time an item is set in the cache, this function creates a task that runs Compact(). So if items are added to the cache frequently once the cache has reached its size limit, a lot of tasks are created. However, having multiple threads running the Compact() function doesn’t actually speed up the compaction process, because all of these threads are essentially performing the same work of iterating over all cache entries and sorting them by last accessed time. Therefore, there is no benefit to having more than one Compact() call running simultaneously. A large number of Compact() tasks running at the same time leads to the increased memory and processor usage. This seems to happen when the number of workers in the thread pool is sufficiently large, for example when the COMPlus_ThreadPool_ForceMinWorkerThreads environment variable is set to a big value.

This was discovered and tested on .NET 6, but from the code it seems the same problem exists on other versions too.

How the problem was discovered

After we started using MemoryCache in our application, we started seeing short spikes in server load, CPU and memory usage.

Load + requests:

CPU load at the same time the big spike happened:

Memory usage (the first chart shows usage before the spike, so that we can see it increased from 30 GB to 95 GB and then returned to normal after the spike ended):

To find out what exactly caused those spikes, we collected a diagnostic trace during such a spike. By analyzing the diagnostic trace, we can see that the overwhelming amount of time is taken by MemoryCache.Compact(). But we never call it from our code, so these are only calls made by MemoryCache itself.

Problem in code

The problem lies in MemoryCache.cs, where on each set, if UpdateCacheSizeExceedsCapacity() evaluates to true, TriggerOvercapacityCompaction() gets called, and it runs OvercapacityCompaction() on a thread from the thread pool. If cache sets happen frequently enough, too many tasks that run compaction are created.

private void TriggerOvercapacityCompaction()
{
    if (_logger.IsEnabled(LogLevel.Debug))
        _logger.LogDebug("Overcapacity compaction triggered");

    // Here we don't have any mechanism that would protect us from running
    // this on too many threads at the same time.
    ThreadPool.QueueUserWorkItem(s => ((MemoryCache)s!).OvercapacityCompaction(), this);
}

Making matters worse, running compaction on multiple threads does not make it faster, since many threads will already have filled the priority bucket lists in Compact() with the whole cache before any entries are actually removed. So those threads will still sort all of those items by last accessed time in ExpirePriorityBucket() even if the entries have already been removed from the underlying ConcurrentDictionary, making them run even longer (especially if the cache size is huge).
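For intuition, the sketch below is my own simplification, not the actual MemoryCache code (the entry type and names are made up). It only shows the shape of the work each concurrent compaction pass repeats: snapshot all entries, sort them by last-accessed time, then remove the oldest until enough size is freed. Two threads doing this concurrently each pay the full snapshot-and-sort cost.

// Simplified illustration only - NOT the actual MemoryCache implementation.
// It shows why N concurrent compaction passes do roughly N times the same work:
// each pass snapshots and sorts the whole entry set before it removes anything.
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

record CacheEntrySketch(string Key, DateTimeOffset LastAccessed, long Size);

class CompactionSketch
{
    private readonly ConcurrentDictionary<string, CacheEntrySketch> _entries = new();

    public void Compact(long sizeToRemove)
    {
        // Every concurrent caller builds its own full snapshot of the cache...
        List<CacheEntrySketch> candidates = _entries.Values.ToList();

        // ...and sorts it by last-accessed time (the expensive part), even if
        // another thread is already removing the very same entries.
        candidates.Sort((a, b) => a.LastAccessed.CompareTo(b.LastAccessed));

        long removed = 0;
        foreach (CacheEntrySketch entry in candidates)
        {
            if (removed >= sizeToRemove)
                break;
            if (_entries.TryRemove(entry.Key, out _))
                removed += entry.Size;
        }
    }
}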
Reproducing it locally

Here I have created a simple script that reproduces the problem locally on my machine on .NET 6:

using Microsoft.Extensions.Caching.Memory;

// This is needed to simulate a server environment, where lots of threads are available in the thread pool.
// The COMPlus_ThreadPool_ForceMinWorkerThreads environment variable can be used instead.
ThreadPool.SetMinThreads(1000, 1000);
// Here the SizeLimit should be high enough to be able to reproduce the issue
IMemoryCache memoryCache = new MemoryCache(new MemoryCacheOptions { SizeLimit = 700000 });
long key = 0;
// Fill the cache to its maximum capacity to simulate the moment when the size limit is reached
for (int i = 0; i < 700000; i++)
{
memoryCache.Set(key.ToString(), 0, new MemoryCacheEntryOptions { Size = 1, AbsoluteExpirationRelativeToNow = TimeSpan.FromSeconds(1000) });
key++;
}
Console.WriteLine("Filled up cache initially. Starting setting cache values in 1 second.");
Thread.Sleep(1000);
var tasks = new List<Task>();
// Here we are creating lots of tasks that add items to the cache at the same time
// to simulate a busy cache. The numbers here are just magic numbers, but the number of threads
// and the number of items added by each thread should be high enough to reproduce the issue
for (long i = 0; i < 70; i++) {
var startIndex = 7000 * i;
var endIndex = startIndex + 7000 - 1;
startIndex += key;
endIndex += key;
// Create threads that will be adding items to the cache
var task = Task.Run(() => { setCacheValues(startIndex, endIndex); });
tasks.Add(task);
}
// Wait until all items have been added (since there is not enough space for all of them, Compact will be triggered frequently to free up some space)
Task.WhenAll(tasks).Wait();
Console.WriteLine("Finished adding elements.");
void setCacheValues(long startIndex, long endIndex) {
for (long i = startIndex; i < endIndex; i++) {
memoryCache.Set(i.ToString(), 0, new MemoryCacheEntryOptions { Size = 1, AbsoluteExpirationRelativeToNow = TimeSpan.FromSeconds(1000) });
}
}

When running it, after the cache is initially filled to its full capacity, you will notice that memory usage grows very fast and can reach very high values. (This is abnormal, since once the cache is filled to its capacity it should not keep growing; it should remove old items to make room for new ones.) You will also notice that it takes abnormally long to run until it outputs "Finished adding elements." (20 minutes on my machine).

We took a memory dump about a minute after starting this script. Analyzing it with Visual Studio, we can see that we have 81k compact tasks waiting in the thread pool queue, which proves that TriggerOvercapacityCompaction() spams them.

We also used WinDbg and the SOS !dumpheap command to analyze the dump produced by this script. We can see that 12 GB of memory is taken by objects of type Microsoft.Extensions.Caching.Memory.MemoryCache+CompactPriorityEntry[]. This type is used only in Compact() and is not referenced anywhere else, so we can see that it is what causes the high memory usage.
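As an optional way to watch the backlog build up while the repro script runs (this snippet is my addition, not part of the original investigation), a small monitor task can poll ThreadPool.PendingWorkItemCount (available on .NET Core 3.0 and later) and the managed heap size; the queued compaction callbacks are included in that pending count.

// Optional monitor for the repro script above (my addition). Start it right
// before the item-adding tasks to watch queued work items and heap size grow.
var monitor = Task.Run(async () =>
{
    while (true)
    {
        long pending = ThreadPool.PendingWorkItemCount; // queued thread pool work items
        long heapMb = GC.GetTotalMemory(forceFullCollection: false) / (1024 * 1024);
        Console.WriteLine($"Pending work items: {pending}, heap: {heapMb} MB");
        await Task.Delay(TimeSpan.FromSeconds(1));
    }
});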
Proposed solution

As a solution, we can limit the number of threads that can run compaction at the same time to one. This will not make compacting slower, since having multiple threads run Compact() doesn't make it faster; they would all do the same work.

This can be done by replacing the current TriggerOvercapacityCompaction() with:

private int lockFlag = 0;

private void TriggerOvercapacityCompaction()
{
    if (_logger.IsEnabled(LogLevel.Debug))
        _logger.LogDebug("Overcapacity compaction triggered");

    // If no thread is currently running compaction - take the lock and start compacting.
    // If there is already a thread running compaction - do nothing.
    if (Interlocked.CompareExchange(ref lockFlag, 1, 0) == 0)
    {
        // Spawn a background work item for the compaction
        ThreadPool.QueueUserWorkItem(s =>
        {
            try
            {
                ((MemoryCache)s!).OvercapacityCompaction();
            }
            finally
            {
                lockFlag = 0; // Release the lock
            }
        }, this);
    }
}

After making this change, the local problem was resolved: the script that previously ran for 20 minutes now takes 5 seconds, and it consumes less than 500 MB of memory, where before it consumed up to 20 GB.
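As a side note on the pattern itself, here is a minimal self-contained sketch (the names and delays are made up for illustration; the release here uses Volatile.Write purely to make the publish explicit, whereas the proposal above uses a plain write) showing how the Interlocked.CompareExchange gate lets many concurrent triggers fire while at most one background pass runs at a time; the finally block reopens the gate so a later trigger can start a new pass.

using System;
using System.Threading;
using System.Threading.Tasks;

class SingleFlightGateDemo
{
    private static int _gate;            // 0 = idle, 1 = a pass is running
    private static int _passesStarted;

    static void TriggerPass()
    {
        // Only the caller that flips the gate from 0 to 1 queues work.
        if (Interlocked.CompareExchange(ref _gate, 1, 0) == 0)
        {
            ThreadPool.QueueUserWorkItem(_ =>
            {
                try
                {
                    Interlocked.Increment(ref _passesStarted);
                    Thread.Sleep(100);   // stand-in for the expensive compaction work
                }
                finally
                {
                    Volatile.Write(ref _gate, 0); // release the gate for the next trigger
                }
            });
        }
    }

    static void Main()
    {
        Parallel.For(0, 10_000, _ => TriggerPass());
        Thread.Sleep(500);
        // Expect a very small number here instead of thousands of queued passes.
        Console.WriteLine($"Passes started: {_passesStarted}");
    }
}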
Thanks for filing this. Your solution makes sense to me. Are you interested in submitting a PR with your fix?
Does this also happen with the new
Yes, I would be interested in creating a PR for this, but I just noticed that somebody already created PR #103992. I think it would be fair if I had a chance to continue working on a fix for this issue, since I have spent a lot of time investigating the problem, created this issue and proposed the solution. So what should I do in this case?
@myk0la999 you could review the change. Additionally, I hope @ADNewsom09 can amend the first commit to list you as the author and himself as the committer; hopefully that gives both of you credit.
I spent a bit of time trying to amend the older commit following https://docs.github.com/en/pull-requests/committing-changes-to-your-project/creating-and-editing-commits/changing-a-commit-message and https://docs.github.com/en/pull-requests/committing-changes-to-your-project/creating-and-editing-commits/creating-a-commit-with-multiple-authors, and I haven't been able to put everything together.
The change is exactly what I proposed, so there is nothing new for me to review. But there was a question about the memory model on that PR and I answered it. (I am technically not the PR author, but since I investigated this before proposing the solution, I already had an answer to that question.)