I have been getting familiar with the profiler and it seems pretty capable to me.
I've got a question regarding memory fragmentation: what are the best techniques to analyze it?
What I have figured out so far is to watch the real-time monitor while taking Gen #0 snapshots and comparing Total bytes with Live bytes.
I also took a look at the native memory's overhead node.
As for our scenario in particular, we have large byte arrays that go to the LOH. Our 32-bit WinForms CLR 2.0 process runs out of memory even though we haven't spotted any memory leaks so far. It sometimes goes OOM at around 600 MB, but sometimes at as low as around 300 MB, which points me to fragmentation.
As a side note, let's say the process has a commit size of 600 MB; it usually sits at that level for a while. If I take a full snapshot, it drops below 200 MB instantly.
It seems like the GC is not aggressive enough. Would it be better to switch to the server GC? The process is running on Windows Server 2008 R2 with several cores.
Thank you a million!
Total bytes: 22 MB
Live bytes: 16 MB
Gen #0 GCs: 628
Gen #1 GCs: 531
Gen #2 GCs: 179
Total: 453 MB (562 MB with profiler data)
Normal heap: 11 MB
Large heap: 753 KB
Overhead/unused: 23 MB
Unreachable instances: 278 MB
Total: 721 MB (833 MB with profiler data)
Normal heap: 11 MB
Large heap: 753 KB
Overhead/unused: 25 MB
Unreachable instances: 350 MB
Total managed heaps usage: 76% of private committed memory
Managed heaps utilization: 1%
Available virtual memory: 1,026 MB
Largest allocatable block: 240 MB
Large heap usage: 73% of private committed memory
Large heap utilization: 0%
Large heap fragments: 35
Wasted large heap memory (small gaps): 4 KB (0% of large heap memory)
In your case, I assume that you have allocated a set of large instances (a few hundred MB), and then allocated a lot of small instances. This could, for instance, happen when you open a large document (which will create a set of large instances) and then work with the document (which may create many UI-related short-lived instances). When you work with the document, all newly created instances are collected during a gen #0 collection, but none of the larger instances are collected during a full GC, and thus the garbage collector learns that it's very efficient to perform frequent gen #0 collections and very few full collections. When the document is closed, all large instances become unreachable, but no full GC will be triggered to collect them. If you expect to recreate a new set of large instances, this should not be a problem. The runtime will perform a full GC when running low on memory, and the memory will be reused. In some cases it might also be justified to call GC.Collect to reduce the memory usage (after releasing a large amount of long-lived instances).
If you allocate very large objects in a 32-bit process (e.g. > 100MB), you don't need very much memory fragmentation to get an OOM exception, even though 300MB committed memory seems very low for an OOM. In your example, you have 35 fragments in the large object heap, so this will limit the size of your large objects, but, still, the largest allocatable object is 240MB.
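To illustrate why total free memory matters less than the largest contiguous free block, here is a minimal, language-neutral sketch in Python. It does not model the CLR; the function name `largest_free_block` and the address-space layout are hypothetical, and the numbers are made up rather than taken from the snapshots above.

```python
def largest_free_block(address_space_size, used_blocks):
    """Return (total_free, largest_free) for a flat address space.

    used_blocks is a list of (start, size) reservations; all units are MB.
    """
    total_free = 0
    largest = 0
    cursor = 0  # end of the last reservation seen so far
    for start, size in sorted(used_blocks):
        gap = start - cursor
        if gap > 0:
            total_free += gap
            largest = max(largest, gap)
        cursor = max(cursor, start + size)
    tail = address_space_size - cursor
    if tail > 0:
        total_free += tail
        largest = max(largest, tail)
    return total_free, largest


# Hypothetical layout: a 2048 MB address space with eight 10 MB reservations
# (libraries, small heaps, ...) spread 256 MB apart.
used = [(246 + 256 * i, 10) for i in range(8)]
total_free, largest = largest_free_block(2048, used)

# Only 80 MB is in use, yet a single 250 MB allocation cannot be placed.
can_allocate_250 = largest >= 250
print(total_free, largest, can_allocate_250)  # prints: 1968 246 False
```

In this made-up layout almost 2 GB is free, but no single gap can hold a 250 MB array, which is the kind of situation that can produce an OOM at a low commit size.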
Switching to the server GC will most likely not help, as the memory overhead is often higher when the server GC is used. Instead I recommend that you try to optimize your memory allocation pattern. If possible:
- Try to avoid allocating differently sized large objects
- Split the large objects into smaller "chunks", e.g. using some paging algorithm.
- Clean up your "old" large objects (i.e. remove the references) before creating a new set of large objects
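The "chunks" suggestion above can be sketched roughly as follows. This is a minimal illustration in Python, not .NET code: the class name `ChunkedBuffer` and the 64 KB chunk size are assumptions chosen for the example. The chunk size is picked to stay below the 85,000-byte threshold at which .NET places arrays on the large object heap, and Python's `bytearray` only stands in for a .NET `byte[]`.

```python
CHUNK_SIZE = 64 * 1024  # assumption: below the 85,000-byte LOH threshold


class ChunkedBuffer:
    """A logical byte buffer stored as many small, equally sized chunks."""

    def __init__(self, length, chunk_size=CHUNK_SIZE):
        if length < 0 or chunk_size <= 0:
            raise ValueError("length must be >= 0 and chunk_size > 0")
        self.length = length
        self.chunk_size = chunk_size
        full, rest = divmod(length, chunk_size)
        self.chunks = [bytearray(chunk_size) for _ in range(full)]
        if rest:
            self.chunks.append(bytearray(rest))  # shorter final chunk

    def write(self, offset, data):
        """Copy data into the buffer, splitting it across chunk boundaries."""
        if offset < 0 or offset + len(data) > self.length:
            raise IndexError("write outside buffer")
        pos = 0
        while pos < len(data):
            index, start = divmod(offset + pos, self.chunk_size)
            chunk = self.chunks[index]
            count = min(len(chunk) - start, len(data) - pos)
            chunk[start:start + count] = data[pos:pos + count]
            pos += count

    def read(self, offset, count):
        """Return count bytes starting at offset as one bytes object."""
        if offset < 0 or count < 0 or offset + count > self.length:
            raise IndexError("read outside buffer")
        out = bytearray()
        while count > 0:
            index, start = divmod(offset, self.chunk_size)
            piece = self.chunks[index][start:start + count]
            out += piece
            offset += len(piece)
            count -= len(piece)
        return bytes(out)


buf = ChunkedBuffer(200_000)
buf.write(65_530, b"hello world!")  # spans the first chunk boundary
print(len(buf.chunks), buf.read(65_530, 12))
```

Because every chunk has the same small size, freed chunks can be reused by later requests regardless of how large the logical buffer is, which also addresses the first point about differently sized large objects.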
SciTech Software AB
Thank you so much for the detailed explanation.
Reusing memory when it's low is exactly what I expected, but it is not happening - even though all large instances are unreachable, i.e. eligible for a full collection, the GC doesn't kick in on the next large allocation; instead an OOM is thrown, bringing the process down.
That is what led me to the fragmentation conclusion in the first place.
My allocation pattern is that the process is usually very idle, i.e. every 15 minutes there's a request which allocates from 1 to 100 byte arrays totalling 5 MB to 250 MB in a very tight loop, assigning the arrays to fields. The profiler didn't show any reachable instances of those arrays in the snapshots. The only reachable ones appeared when I took a snapshot while the OOM was being thrown, or while the request was being dispatched, which I think shouldn't be counted at all.
How about memory pressure?
Is it possible that the GC doesn't do a full collection because the pattern doesn't require one, and only kicks in once the pressure is high (allocations/sec vs. available memory), but then cannot keep up with that pressure? Especially considering that it's the workstation GC in concurrent mode, where a Gen #2 (LOH) collection can be performed without pausing. In other words, is it possible that there's enough unreachable memory (which becomes free after the collection) and the largest allocatable block fits the allocation, but an OOM can still be thrown?
Does this reasoning make sense?
Thanks once again.
As I understand it, you are reproducing the memory problem while running the process under the profiler. Is that correct? When running under the profiler, the concurrent GC is not enabled, so that should not affect this problem.
How did you collect a snapshot at the time of the OOM exception? Did you run under the debugger or did you attach to the process? Were you able to see how big the instance being allocated was when the OOM error occurred?
Maybe you can try to enable peak snapshot collection and see if you get some better information.
SciTech Software AB
Can I enable peak snapshot collection with it?
I tried with the following arguments:
nmpcore.exe /r /peaksnapshot+ /sf "C:\MemSessions\DumpSession.prfsession" /p "exepath"
But I don't see a peak snapshot in the saved session.
As a side note, I have noticed that when my application is opened for the first time (without serving any request that involves LOH byte arrays), it has a breakdown like this:
Total bytes: 3 MB
Live bytes: 3 MB
Managed heaps: 6 MB
Normal heap: 4 MB
Large heap: 1 MB
Available virtual memory: 1.5 GB
Largest allocatable block: 612 MB
Large heap usage: 2% of private committed memory
Large heap utilization: 49%
Large heap fragments: 2
Wasted large heap memory (small gaps): 0 KB (0% of large heap memory)
All my snapshots are Gen #0.
Once it starts processing requests, Total bytes and Live bytes start to differ, usually ranging from about 25 MB total versus 5 MB live for smaller byte arrays up to about 250 MB total versus 30 MB live for larger ones. My largest allocatable block is then usually around 250 MB.
Interestingly enough, in a different session from another box where the application is running, I found that snapshots 7 and 8 have very similar memory usage, yet their largest allocatable block differs by over 430 MB.
The address space of a 32-bit process is very limited when you try to allocate large objects. A 600 MB block can easily be fragmented by a lot of things, such as loading libraries and making small allocations. How the address space is used can differ between machines, depending on factors such as the version of the .NET runtime and other libraries, and possibly drivers.
As I mentioned previously, I would recommend that you try to allocate smaller memory blocks if possible. In earlier versions of the profiler, we allocated large blocks of memory in the profiled process. This could cause OOM exceptions even if only 1 GB of memory was committed. We redesigned the parts of the profiler that created large blocks, and tried to keep the largest block smaller than 1 MB. After this redesign, we usually don't see an out of memory error until close to 2 GB is committed (in a process with a 2 GB address space).
Since you provided the "/r" argument to NmpCore, I assume that you are profiling a pre-.NET 4.0 process. If possible I would recommend that you run under .NET 4.0 or later, as I believe that Microsoft has improved the large object heap in later versions. .NET Framework 4.5.1 also includes an option to compact the large object heap, which might help with your problem.
SciTech Software AB
Thank you for looking into it.
I used the profiler for the last 7 days of my trial and it was beyond helpful. I tried Red Gate's profiler as well; while it has a nice UI, it's far behind yours in terms of in-depth memory analysis.
I knew it was a capable tool the moment the Wintellect guys recommended it.
Anyway, I just bought a license for myself. I will be advocating it to my employer as well.
Thank you once again for bringing this jewel.