Help on duplicate instances

gianpaolof
Posts: 2
Joined: Wed Jul 25, 2018 2:51 pm

Post by gianpaolof » Tue Aug 07, 2018 3:03 pm

Hi,
I've been using the memory profiler for a couple of weeks and have been able to find a lot of issues, which I'm happy about.
The problem is that I don't think I understand what "duplicate instances" are, and the code I'm working on has a HUGE set of duplicate instances (arrays of double).
I read the documentation, but I think I'm not smart enough to understand it, sorry. Can someone show me a simple code example (like a console app) with such a problem, so that I can run the memory profiler on it and understand how to fix this kind of issue?

Thanks a lot,
GP

Andreas Suurkuusk
Posts: 1010
Joined: Wed Mar 02, 2005 7:53 pm

Re: Help on duplicate instances

Post by Andreas Suurkuusk » Thu Aug 09, 2018 2:30 pm

Two instances are duplicates if the data reachable from each instance is identical. For double arrays, this means that they have the same length and contain exactly the same values.

If you create two arrays using the code below, then you will have two separate double array instances, but they will contain identical data. So .NET Memory Profiler would classify them as duplicates.

Code: Select all

var firstArray = new double[] { 5, 10, 15, 20 };
var secondArray = new double[] { 5, 10, 15, 20 };
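
The difference between "same instance" and "same data" can be checked directly. The following is a minimal sketch: DuplicateCheck and AreDuplicates are hypothetical names used only for illustration, and SequenceEqual is only an approximation of whatever comparison the profiler performs internally.

```csharp
using System;
using System.Linq;

internal static class DuplicateCheck
{
    // True when the two arrays are distinct objects that hold equal values in the same order.
    internal static bool AreDuplicates(double[] a, double[] b)
    {
        return !ReferenceEquals(a, b) && a.SequenceEqual(b);
    }

    private static void Main()
    {
        var firstArray = new double[] { 5, 10, 15, 20 };
        var secondArray = new double[] { 5, 10, 15, 20 };

        Console.WriteLine(ReferenceEquals(firstArray, secondArray)); // False: two separate instances
        Console.WriteLine(firstArray.SequenceEqual(secondArray));    // True: identical contents
        Console.WriteLine(AreDuplicates(firstArray, secondArray));   // True: duplicates in the sense above
    }
}
```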

To avoid wasting memory, it would be good if both arrays used the same underlying data, e.g.:

Code: Select all

var firstArray = new double[] { 5, 10, 15, 20 };
var secondArray = firstArray;

However, there are two problems with this. First, you need to find the duplicate instances; you normally don't allocate two identical arrays side by side like I did in the example. Second, if you change the data through one of the references (e.g. firstArray), the data will change for secondArray as well. If you have assigned the same array to both references just because their contents were equal, this is probably not the behavior you want. To avoid this, it's safest to share data only for immutable instances. Of course, it's possible to share mutable instances as well, but then care has to be taken so that they are not modified in unexpected ways.
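
The aliasing effect can be seen in a few lines. This is only an illustrative sketch; AliasingDemo and WriteThroughAlias are hypothetical names.

```csharp
using System;

internal static class AliasingDemo
{
    // Writes through one reference and reads the value back through the other.
    internal static double WriteThroughAlias()
    {
        var firstArray = new double[] { 5, 10, 15, 20 };
        var secondArray = firstArray;   // shares the same array; no copy is made

        firstArray[0] = 99;             // a write through one reference...
        return secondArray[0];          // ...is visible through the other
    }

    private static void Main()
    {
        Console.WriteLine(WriteThroughAlias()); // prints 99
    }
}
```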

After we wrote the duplicate instances detector we tested it on the profiler itself, and we found a lot of duplicate instances. To reduce the number of duplicate instances, we wrote two container classes that help us avoid them: SingleValueContainer and WeakSingleValueContainer. These classes will help you look up instances to share and optionally create immutable copies of the instances.

The SingleValueContainer is suitable to use when you have a clearly defined "region" where you want to avoid duplicate instances. This can, for instance, be when you open an XML file or another document that includes a lot of duplicate instances.

The WeakSingleValueContainer is similar to the SingleValueContainer, but it only keeps a weak reference to the elements, so there's no need to explicitly clear the container. However, the overhead is significantly higher for the WeakSingleValueContainer, since a weak GC handle is created for each unique item. You can use the WeakSingleValueContainer when the "region" is not as clearly defined and/or when you expect to have many duplicates and only a few unique instances.
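
The general idea of looking up an existing instance to share can be sketched with a dictionary keyed on content. This is not the SingleValueContainer implementation from GitHub: InternPool, GetOrAdd and DoubleArrayContentComparer are hypothetical names used only for illustration. The sketch keeps strong references, so it has to be cleared explicitly when the "region" ends.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: compares double arrays by content instead of by reference.
internal sealed class DoubleArrayContentComparer : IEqualityComparer<double[]>
{
    public bool Equals(double[] x, double[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.SequenceEqual(y);
    }

    public int GetHashCode(double[] obj)
    {
        // Combine the element hashes so that equal contents give equal hash codes.
        unchecked
        {
            int hash = 17;
            foreach (double value in obj) hash = hash * 31 + value.GetHashCode();
            return hash;
        }
    }
}

// Hypothetical pool: hands out one shared instance per distinct content.
internal sealed class InternPool<T> where T : class
{
    private readonly Dictionary<T, T> _items;

    public InternPool(IEqualityComparer<T> comparer)
    {
        _items = new Dictionary<T, T>(comparer);
    }

    // Returns the stored instance that equals the candidate, or stores and returns the candidate.
    public T GetOrAdd(T candidate)
    {
        T existing;
        if (_items.TryGetValue(candidate, out existing)) return existing;
        _items.Add(candidate, candidate);
        return candidate;
    }

    // Drops all pooled instances; until this is called the pool keeps them alive.
    public void Clear()
    {
        _items.Clear();
    }
}

internal static class InternPoolDemo
{
    private static void Main()
    {
        var pool = new InternPool<double[]>(new DoubleArrayContentComparer());

        var first = pool.GetOrAdd(new double[] { 5, 10, 15, 20 });
        var second = pool.GetOrAdd(new double[] { 5, 10, 15, 20 });

        Console.WriteLine(ReferenceEquals(first, second)); // True: the second array was replaced by the first
        pool.Clear();
    }
}
```

Note that the pooled arrays are still mutable here. Changing one after it has been added would both affect every holder of the shared instance and invalidate its hash code in the dictionary, which is one more reason to share only immutable data.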

I have refactored the SingleValueContainer classes and published them on GitHub. The refactoring was more significant than I expected, hence the late reply to this post. Also, the unit tests have not been updated for the new implementation, which is something I hope to fix soon.

The short console program below shows how the SingleValueContainer can be used (the ListEqualityComparer implementation can be found on the GitHub page):

Code: Select all

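using System;
using System.Collections.Generic;
// Also import the namespace that declares SingleValueContainer and ListEqualityComparer (see the GitHub page).

internal class Program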
{
    private static void Main()
    {
        var firstArray = new double[] { 5, 10, 15, 20 };
        var secondArray = new double[] { 5, 10, 15, 20 };

        bool areSame = ReferenceEquals(firstArray, secondArray);
        Console.WriteLine(areSame ? "firstArray and secondArray are the same" : "firstArray and secondArray are NOT the same");

        // Create a single value container that will provide single instances of read-only double lists (IReadOnlyList<double>). 
        // A comparer is needed to compare the contents of the lists (rather than the list references).
        var container = new SingleValueContainer<IReadOnlyList<double>>(ListEqualityComparer<double>.Default);

        var firstSingleList = container[firstArray];
        var secondSingleList = container[secondArray];

        // We no longer need the original data.
        firstArray = secondArray = null;

        bool areSingleSame = ReferenceEquals(firstSingleList, secondSingleList);
        Console.WriteLine(areSingleSame ? "firstSingleList and secondSingleList are the same" : "firstSingleList and secondSingleList are NOT the same");

        Console.ReadLine();
    }
}

The problem with this code is that even though firstSingleList and secondSingleList are read-only lists, they both still reference the same underlying double array (firstArray). Using a cast (or by accessing the array through firstArray), it is still possible to modify the data. To avoid this, a "key" creator can be provided to the SingleValueContainer constructor. This creator can make sure that the returned instance is immutable, e.g. item => item as IImmutableList<double> ?? ImmutableArray.CreateRange(item).
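
The difference between a read-only view and an immutable copy can be shown without the container. This is a minimal sketch: ReadOnlyViewDemo and its methods are hypothetical names, and on .NET Framework it assumes that the System.Collections.Immutable package is referenced.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Immutable;

internal static class ReadOnlyViewDemo
{
    // A double[] implements IReadOnlyList<double>, but a cast gives write access back.
    internal static double MutateThroughCast()
    {
        var array = new double[] { 5, 10, 15, 20 };
        IReadOnlyList<double> readOnlyView = array;

        ((double[])readOnlyView)[0] = 99;
        return readOnlyView[0];
    }

    // ImmutableArray.CreateRange copies the items, so later writes to the source are not seen.
    internal static double ReadFromImmutableCopy()
    {
        var array = new double[] { 5, 10, 15, 20 };
        IReadOnlyList<double> immutableCopy = ImmutableArray.CreateRange(array);

        array[0] = 99;
        return immutableCopy[0];
    }

    private static void Main()
    {
        Console.WriteLine(MutateThroughCast());     // prints 99
        Console.WriteLine(ReadFromImmutableCopy()); // prints 5
    }
}
```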

The code with the key creator now looks like this:

Code: Select all

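using System;
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Diagnostics;
// Also import the namespace that declares SingleValueContainer and ListEqualityComparer (see the GitHub page).

internal class Program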
{
    private static void Main()
    {
        var firstArray = new double[] { 5, 10, 15, 20 };
        var secondArray = new double[] { 5, 10, 15, 20 };

        bool areSame = ReferenceEquals(firstArray, secondArray);
        Console.WriteLine(areSame ? "firstArray and secondArray are the same" : "firstArray and secondArray are NOT the same");

        // Create a single value container that will provide single instances of read-only double lists (IReadOnlyList<double>). 
        // A comparer is needed to compare the contents of the lists (rather than the list references).
        // A key creator is provided to make sure that the lists stored in the container are actually immutable.
        var container = new SingleValueContainer<IReadOnlyList<double>>(
            ListEqualityComparer<double>.Default,
            item => item as IImmutableList<double> ?? ImmutableArray.CreateRange(item));

        var firstSingleList = container[firstArray];
        Debug.Assert(!ReferenceEquals(firstArray, firstSingleList), "firstSingleList should not be the same instance as firstArray, since firstArray is not immutable.");

        var secondSingleList = container[secondArray];

        // We no longer need the original data.
        firstArray = secondArray = null;

        bool areSingleSame = ReferenceEquals(firstSingleList, secondSingleList);
        Console.WriteLine(areSingleSame ? "firstSingleList and secondSingleList are the same" : "firstSingleList and secondSingleList are NOT the same");

        Console.ReadLine();
    }
}

I hope that you find this helpful and that you may be able to use the SingleValueContainer or WeakSingleValueContainer to reduce your duplicate instances. I realize that it may not be trivial, or even possible, to replace your double arrays with IReadOnlyList<double>.

Best regards,

Andreas Suurkuusk
SciTech Software AB
