Strategy for automating leak detection

Posted: Thu Nov 17, 2011 10:35 pm
by kellyconway
I used .NET Memory Profiler to correct memory leaks in a large ASP.NET / Visual WebGUI application by, as seems typical, aggressively unwiring event handlers and calling .Dispose() on no-longer-needed objects.
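The leak pattern being unwired there is the classic one: a handler wired to a longer-lived object's event roots the subscriber until it is unwired. A minimal illustration of the idea (Publisher and Page are invented names for this sketch, not types from the application):

```csharp
// Sketch of the leak pattern: the publisher's event delegate holds a
// reference to the Page, so the Page cannot be collected until the
// handler is unwired in Dispose().
using System;

class Publisher
{
    public event EventHandler Tick;
    public void Raise() => Tick?.Invoke(this, EventArgs.Empty);
}

class Page : IDisposable
{
    private readonly Publisher _pub;

    public Page(Publisher pub)
    {
        _pub = pub;
        _pub.Tick += OnTick;   // wiring roots this Page via the publisher
    }

    private void OnTick(object sender, EventArgs e) { /* ... */ }

    public void Dispose() => _pub.Tick -= OnTick;   // unwire, or the Page leaks
}
```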

Now, I've written code that instantiates and disposes each of our application's pages, which I want to use for automating detection of future leaks. It takes (using the MemProfiler API) an "initial" snapshot at the start of the process, and a "final" snapshot at the end of the process. Optionally, based on an additional URL query parameter, it also takes additional snapshots after opening each page (so that I can see the number of bytes held, etc). At the very end, it makes a couple of assertions whose results I plan to use for logging and alerting purposes.
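A sketch of that harness, pieced together from the API calls named later in this thread (MemProfiler.FullSnapShot and MemAssertion.NoInstances, with the bool return value the thread relies on); PageFactory, OpenAndDispose, and LogLeak are hypothetical placeholders, and the namespace wildcards are the ones quoted later:

```csharp
// Sketch only: assumes the SciTech .NET Memory Profiler API referenced
// elsewhere in this thread. PageFactory, OpenAndDispose and LogLeak are
// invented placeholders, not real application types.
using SciTech.NetMemProfiler;

public static class LeakTestHarness
{
    public static void Run(bool snapshotPerPage)
    {
        MemProfiler.FullSnapShot("Initial");

        foreach (var pageType in PageFactory.AllPageTypes())
        {
            OpenAndDispose(pageType);            // instantiate, then Dispose()
            if (snapshotPerPage)
                MemProfiler.FullSnapShot("After " + pageType.Name);
        }

        MemProfiler.FullSnapShot("Final");

        // Wildcard assertions: no page instances should remain.
        if (!MemAssertion.NoInstances("My.Namespace.1.TypePrefix*"))
            LogLeak("Namespace 1 pages still in memory");
        if (!MemAssertion.NoInstances("My.Namespace.2.TypePrefix*"))
            LogLeak("Namespace 2 pages still in memory");
    }
}
```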

This works "OK" except for a couple of issues I could use some help/ideas on.

1. At the end of the process, after I use the API to take the "final" snapshot (all of my snapshots are full ones), I execute 2 MemAssertion.NoInstances() calls with wildcard string parameters to assert that none of my pages (from 2 different namespaces) are left in memory. I have seen the assertions pass (rarely), but most times they fail. When the dialog appears to alert me of the failure, I make sure the box is checked to take a(nother) full snapshot and then click OK. In the snapshots created by this dialog, my object instances are gone. I.e., if I make the same assertion again, it passes the 2nd time -- if I take the additional snapshot offered by the UI.

The problem is that I plan to check the bool return value from the assertions and, if false, log a potential memory leak issue that I'll be notified of and can investigate. Remembering that I want to run this in an automated fashion (i.e., nobody will be around to collect the extra snapshot in the UI), what do you recommend I do to avoid these false positives? I have already tried pausing for up to 30 seconds, taking extra snapshots, etc. The only extra snapshots that seem to clear out my objects, for some reason, are the ones I can take manually in the UI after the assertion fails. Are those snapshots somehow different behind the scenes from the ones I get by calling the API MemProfiler.FullSnapShot("My Snapshot Name Here")?

2. The process instantiates and disposes a little over 30 "pages" (actually user controls hosted inside a VWG form class). About half-way through the process, the per-page snapshots cease to be taken. No error, but also no snapshots after a certain point. The whole process is inside a try/catch that rethrows the exception and also uses MemProfiler.AddReadTimeComment(ex.ToString()) to place the exception info right into the real-time graph. No exception is being thrown at this point; the process continues, but no additional snapshots are taken until the "final" one at the end, which I always get. Weird, I know. I also introduced a 5-second sleep between page instantiations, but that had no effect (other than spacing out my real-time graph comments about each page loading, which was nice).

Any ideas what I should try / look for to fix these issues? #1 is by far the higher priority, but assistance on either or both issues is greatly appreciated.


Kelly Conway
RedCard Systems

Re: Strategy for automating leak detection

Posted: Fri Nov 18, 2011 10:03 am
by Andreas Suurkuusk
1. A snapshot collected using MemProfiler.CollectFullSnapshot should be equivalent to collecting a snapshot from the user interface. This makes it hard to understand why you are seeing different behaviour between the snapshots.

If I understand correctly, you perform the following actions before exiting the process:
1. Dispose all pages
2. Collect a "Final" full snapshot using the API
3. Call MemAssertion.NoInstances for each of the two page namespaces


And the NoInstances assertion fails, but when the snapshot for the failed assertion is collected, the instances have been collected. Is this correct?

Are you seeing the instances in the snapshot you collected before the assertion?

2. There is a limitation on the number of snapshots that can be collected using the API. By default the limit is 10, but it can be changed in the session settings, using the field "Maximum number of software triggered snapshots" on the Snapshot page. You can also change the setting using the command line parameter /maxsnapshots.
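For an automated run, the limit can be raised from the command line as described above. A hedged example of the invocation shape -- only the /maxsnapshots switch comes from this thread; the executable name and the rest of the command line are assumptions:

```shell
# Only /maxsnapshots is confirmed in this thread; the executable name
# and remaining arguments are placeholders for illustration.
NetMemProfiler.exe /maxsnapshots:40 MyProfilingSession.prfsession
```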

Re: Strategy for automating leak detection

Posted: Fri Nov 18, 2011 2:02 pm
by kellyconway
Thanks, Andreas. Yes, you have summarized my assertion issue correctly. And, yes, the offending objects do appear (marked as potential leaks due to the assertion failures) in the snapshot that my code generates with MemProfiler.FullSnapShot("Final - After Disposing All Pages"), which is immediately followed by a MemAssertion.NoInstances("My.Namespace.And.MyTypePrefix*") wildcard assertion (and a 2nd wildcard assertion with a different namespace). They do not appear (unless I comment out the Dispose call in our base class) in the subsequent snapshots that I can generate either by choosing that option in the assertion failure dialogs or by clicking the snapshot button in the UI again after dismissing those dialogs. As I said, weird.

Thanks too for your answer on #2 -- I remember having seen that option now.

Re: Strategy for automating leak detection

Posted: Fri Nov 18, 2011 2:06 pm
by kellyconway
While looking for the option to allow more snapshots per session, I noticed that the option "Suspend thread until memory leak is handled" was unchecked. I will turn that on, give this another try, and report back.

Attempt #1

After adjusting the max snapshots option to 40, the profiler continued attempting to collect snapshots after each page load. I say "attempting" because, after 30+ of them, I started getting errors for each one. I submitted the first 3, with log info, when prompted by the profiler UI. At the end of the process, I also got a .NET Memory Profiler-captioned message box that said "The operation failed with the following error: An attempt was made to move the file pointer before the beginning of the file."

I'm not too concerned about this, actually, as I figure it's only because I was trying to generate so many full snapshots. I will instead change my memory testing URL to allow specifying *a* page for which to generate a full snapshot during the test. There shouldn't be a need to generate a snapshot *per* page in a single test run.

My assertions also failed at the end of this test, even though I had changed the setting for suspending while collecting a snapshot. I thought maybe that would have fixed it. Will try again though -- maybe the above issues caused that to fail too.

Attempt #2

OK, this is a little different from my OP, but still an issue with the assertions. I ran the version that just does:

1. MemProfiler.FullSnapShot("Initial");
2. Instantiates and disposes each page in our application (this time, without the additional snapshot per page load)
3. MemProfiler.FullSnapShot("Final");
4. MemAssertion.NoInstances("My.Namespace.1.TypePrefix*");
5. MemAssertion.NoInstances("My.Namespace.2.TypePrefix*");

Both assertions failed, causing the dialog to appear (I'm running using the UI at this point; not command-line mode which is the ultimate goal here).

For both assertion failures, I chose the option to generate a snapshot.

When that was all done, I looked at the last 3 snapshots (the "Final" one from my code, and the 2 from the assertion failure dialogs).

This time (likely due to my having changed the "suspend" option) the 2 snapshots from the assertion failure dialogs were identical (Delta = 0) to the "Final" one that I captured in code. That seems more like what we'd expect, though still all 3 snapshots showed that I had types (38 total) from both namespaces hanging around at that point.

THEN, I took an additional snapshot by clicking on the camera button (not Gen #0; full) in the UI. That last snapshot showed that 36 of my 38 offending types were no longer in memory (yes, I still have two stubborn types to fix). Delta was -35 MB between this manual snapshot and the previous 3.

HOWEVER, I believe I know what caused this and it would NOT be a .NET Memory Profiler issue. At the end of the test method (I'd forgotten about this until just now), I issue:

Context.Redirect("<our main page>");

That code would have executed (I guess) after the "Final" and assertion-failure-triggered snapshots, and before I had the chance to trigger the manual snapshot that showed the instances gone.

This may all just be an artifact of my test method, since when I profile the app and manually open/close pages, .NET Memory Profiler does show that the pages go out of memory. So, I still have some work to do on my test process (perhaps merely by moving my final snapshot and assertions after the redirect, though I'm not sure that wouldn't be "cheating" if I leave the Session.Abandon call in there...), but I no longer suspect that the MemAssertion is misbehaving for me.

Sorry for the "book", but I figured, once I'd started the thread, I should conclude it for the possible help of someone else who may run into a similar situation.

Re: Strategy for automating leak detection

Posted: Fri Nov 18, 2011 4:09 pm
by kellyconway
OK, so "stream-of-consciousness" forum posting is probably a bad idea. Sorry. :-)

As it turns out, I'm still seeing something a little strange in the assertions. I've modified my code now to be as follows:

1. Log in exactly the way a user would (sets up some things in the session, establishes database connection strings, etc).
2. Full snap shot "Initial"
3. Open and dispose each page in a way that's pretty much identical to a user manually doing it
4. Log out exactly the way a user would (clears out the session, redirects to the login page)
5. Full snap shot "Final"
6. Assertion #1 (no instances of a namespace's types)
7. Assertion #2 (no instances of a 2nd namespace's types)

I'm no longer calling the Session.Abandon, since we don't do that in actual usage. I did however move the final snapshot and assertion calling to happen after I've called our logout procedure that clears some things out of the session and redirects to our login page.

Still, the snapshots that I capture in steps 5-7 above show that I have 38 types (the pages my code opened and disposed) in memory. Then, a snapshot that I generated in the UI by clicking the camera icon (after no additional code should have executed since the previous assertion-failure snapshot) shows that 36 of those types no longer were in memory (2 still are legitimately leaking).

The overall delta from the last auto-generated snapshot to the manually generated one was around -37 MB. The total live bytes at that point are fairly close (within 2-3 MB) to the amount from the initial snapshot. Both of those facts again seem to show that the memory from those 36 pages was reclaimed, but only after I manually requested a snapshot in the UI.

I should point out here that, yes, the 2 still-leaking pages would cause the assertions to fail; however, the assertions list all of the types matching the wildcard that remain in memory, and they are *all* in there, not only the 2 leaky ones. In fact, if I take those 2 out of my test process, the assertions still fail even though the snapshot that I take later in the UI shows 0 types matching the wildcards remain in memory.

So, still seems strange to me and I'm not sure what I can do in code to get a snapshot like the last one that I'm getting manually.

Weird. Thanks again for your consideration, and any ideas you can provide.

Re: Strategy for automating leak detection

Posted: Sun Nov 20, 2011 10:07 pm
by Andreas Suurkuusk
I'm currently at a software conference, so I have limited possibilities to look into this. I will reply to you as soon as I return (on Tuesday).

Re: Strategy for automating leak detection

Posted: Tue Nov 22, 2011 8:19 am
by Andreas Suurkuusk
When working with ASP.NET or any other framework that caches or stores data based on a timeout, there's always a risk that you have left-over instances that will be removed only when a timeout expires. In your case it seems possible that you still have session data that keeps your instances alive even after you have logged out the user. If the left-over instances are kept alive by the ASP.NET session data or cache data, this will be indicated by the comparison analyzer (in the yellow info area above the type).

To clear all session data, you will need to call Session.Abandon, or wait for the session timeout to expire. If you want to avoid calling Session.Abandon, you can set the session timeout to a low value (e.g. 10 seconds) and make sure that you wait a little longer than that before collecting the "Final" snapshot.
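For reference, in standard ASP.NET the session timeout is configured in web.config in whole minutes, so one minute is the shortest value expressible there (Visual WebGUI may expose its own setting; this fragment assumes plain ASP.NET):

```xml
<!-- web.config: sessionState timeout is specified in MINUTES;
     1 is the minimum value here. -->
<system.web>
  <sessionState timeout="1" />
</system.web>
```

Session.Timeout can also be set per-session in code (again in minutes), which may be convenient inside an automated test page.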

As I mentioned previously, there should be no difference between a snapshot collected by the UI and a snapshot collected using MemProfiler.FullSnapshot.

8. Wait for a while (longer than the session lifetime)
9. MemProfiler.FullSnapshot( "Last" )

Should be equivalent to:

8. Wait for a while (longer than the session lifetime)
9. Collect snapshot in UI

If this is not the case, please tell me, since then there might be a problem with the profiler that we must correct.

Re: Strategy for automating leak detection

Posted: Tue Nov 22, 2011 1:53 pm
by kellyconway
Alright, I'll keep banging on it. I had already tried waiting as much as 30 seconds (in code) before taking the final snapshot, and I had also had the Session.Abandon() call in my code prior to the final snapshot. Neither of those seemed to change the described behavior before. I'll try those, and some other things if necessary, again, since you've verified the strategy should be working. Thanks.

Re: Strategy for automating leak detection

Posted: Tue Nov 22, 2011 9:50 pm
by kellyconway
OK, after making quite a few changes to my test method today, I was finally able to get it to work the way I wanted.

Not really sure what exactly had been causing the discrepancy between snapshots taken a few seconds apart when no code should have been running, but I was able to change the test method code so that it now does exactly what a user would do if he/she were to log in, open each page of our application in fairly quick succession, and then log out.

Between that, and changing from the "generic" assertions to using an instance of the AssertionsDefinition class to exclude a couple of sticky classes, I can now reliably detect whether a new event handler gets wired but not unwired or something else causes a new memory leak in our application.

Re: Strategy for automating leak detection

Posted: Wed Nov 23, 2011 9:23 pm
by Andreas Suurkuusk
Thanks for the update. I'm glad that you got it working correctly.

I noticed that I didn't comment on the problem you were having when collecting 40 snapshots. In the version you are using there is a 2 GB size limit on the session files, and if you collect a lot of snapshots you will reach this limit. This limit was not correctly checked, which caused the "file pointer" error message you were seeing. In version 4.0.118, we have extended the limit to 4 GB and added a better error message that more clearly tells you that you have reached the file size limit.

We're also working on a new file format for the session files. Using the new file format, there will no longer be a 4GB file size limit.