Yield! Reconsidering APIs with Collections

Yield: A Little Background

The yield keyword in C# is pretty cool. Used within an iterator, yield lets a function hand an item, along with control of execution, back to the caller, and then resume right where it left off on the next iteration. Neat, right? MSDN documentation lists these limitations surrounding the use of the yield keyword:

  • Unsafe blocks are not allowed.
  • Parameters to the method, operator, or accessor cannot be ref or out.
  • A yield return statement cannot be located anywhere inside a try-catch block. It can be located in a try block if the try block is followed by a finally block.
  • A yield break statement may be located in a try block or a catch block but not a finally block.
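To make the "resume where it left off" part concrete, here is a minimal sketch. The class name YieldDemo, the method CountToThree, and the log parameter are all made up for illustration; the log just records the order in which the iterator and the caller run.

```csharp
using System;
using System.Collections.Generic;

public static class YieldDemo
{
    // Hypothetical iterator: writes to the log each time it produces a value.
    public static IEnumerable<int> CountToThree(List<string> log)
    {
        log.Add("producing 1");
        yield return 1;          // control goes back to the caller here...
        log.Add("producing 2");  // ...and execution resumes here on the next iteration
        yield return 2;
        log.Add("producing 3");
        yield return 3;
    }

    public static void Main()
    {
        var log = new List<string>();

        foreach (var number in CountToThree(log))
        {
            log.Add("consumed " + number);
        }

        // The "producing" and "consumed" entries alternate, one pair per item.
        Console.WriteLine(string.Join(Environment.NewLine, log));
    }
}
```

The iterator and the loop body take turns: each item is produced only when the foreach asks for it.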

So what does this have to do with API specifications?

A whole lot really, especially if you’re dealing with collections. I personally haven’t been a big user of the yield keyword, but I’ve never really been forced to use it. After playing around with it for a bit, I saw a lot of potential. I’ve written before about what I think makes a good API. In that article, I made a point of discussing two perspectives:

  • Who needs to implement your interface. You want it to be easy for them to implement.
  • Who needs to call your interface. You want it to be easy for them to use.

In my opinion, the IEnumerable<T> interface was a tricky thing to work with as a return value. You can essentially only iterate an IEnumerable, and at the time of calling a function, maybe that’s not what you want to do. The flip side is that for the person implementing the interface, IEnumerable<T> is a really easy interface to satisfy. However, the yield keyword has opened up some new doors.

In this article, I’d like to go over a couple of different approaches for an API and then explain why the yield keyword might be something you consider next time around. Disclaimer: I’m not claiming anything I’m about to present is the only way or the best way–I’m just sharing some of my own findings and perspective.

Interface For Returning Collections

The first type of API I’d like to look at is for returning collections. Based on my own API guidelines, I’d ideally choose an interface or class to return that gives the caller a lot to work with and is also easy for the implementer of my interface to create. The List<T> class is a great choice:

  • It’s easy to construct
  • It’s built into the .NET framework
  • It provides many handy functions (All of the IList<T> functionality as well as things like AddRange(), or functions that support delegates)

My next choice might be to have a return type of IList<T>, which would provide a little less ease of use to the caller, but make it even easier for the implementer of the interface. They could return arrays of type T, since an array implements the IList<T> interface, or their own custom list implementation that doesn’t inherit from the List<T> class. The differences between using IList<T> and List<T> are arguably pretty small.
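One small difference is worth seeing in code. The sketch below uses made-up names (ListReturnDemo, GetItemsAsArray, GetItemsAsList, TryAdd) to show that an array does satisfy IList<T>, but because arrays are fixed-size, calling Add() on one throws a NotSupportedException at runtime.

```csharp
using System;
using System.Collections.Generic;

public static class ListReturnDemo
{
    // An implementer can satisfy an IList<int> return type with a plain array...
    public static IList<int> GetItemsAsArray()
    {
        return new[] { 1, 2, 3 };
    }

    // ...or with a List<int>.
    public static IList<int> GetItemsAsList()
    {
        return new List<int> { 1, 2, 3 };
    }

    // Returns false when the underlying collection is fixed-size (like an array).
    public static bool TryAdd(IList<int> items, int value)
    {
        try
        {
            items.Add(value);
            return true;
        }
        catch (NotSupportedException)
        {
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(TryAdd(GetItemsAsArray(), 4)); // False
        Console.WriteLine(TryAdd(GetItemsAsList(), 4));  // True
    }
}
```

So a caller holding an IList<T> can always index and count, but can't assume that adding will work.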

A third alternative, which I would have avoided in the past, is to return an IEnumerable<T>. My opinion used to be that this made the life of the interface implementer a bit easier compared to returning an IList<T>, but complicated the life of the caller for a couple of reasons:

  • The caller would have to use the results of the function in a foreach loop.
  • The caller would have to add the items to their own collection to be able to do much more with the items.

My naive implementations of being forced to return an IEnumerable<T> were… well… crap. I would have constructed a collection within the function, filled it up, and then returned it as an IEnumerable<T>. Then as the caller of my function, I’d have to re-enumerate the results (or add them to another collection):

// T has to be declared on the method for this to compile
public static IEnumerable<T> GetItems<T>()
{
  var collection = new List<T>();
  // add all the items to a collection
  return collection;
}

private static void Main()
{
  // int is used here only so the example compiles
  var myCollection = new List<int>();
  myCollection.AddRange(GetItems<int>());
  // use myCollection...

  // or.....
  foreach (var item in GetItems<int>())
  {
    // use the items
  }
}

Seems like overkill to me with that implementation. However, we’ll examine how using yield can truly transform this into something… better. So to reiterate, a few potential implementations for an API involving collections might be:

  • Return a List<T> class
  • Return an IList<T> (or even an ICollection<T>) interface
  • Return an IEnumerable<T> interface

Constantly Creating Collections

My design decisions, in the past, were really driven by two guidelines:

  • Make it easier for the person implementing/extending the API
  • Make it easy for the person consuming the API

As I quickly illustrated in the first section, this meant that I would have a method where I would create a collection, fill it with items, and then return it. I could generally pick any concrete collection class and return it since I would usually pick a simple collection as the return type. Easy.

One thing that might be noticeable with this approach is that it looks pretty inefficient to keep creating new collections, filling them, and then returning them. I’ll illustrate with a simple example. We’ll create a class that has a method on it called GetItems(). As per my reasoning presented earlier, we’ll have this method return a List<T> instance, and to make this example easier to work with, we’ll pass in an IEnumerable<T> instance. For what it’s worth, the input to this function is really just for demonstration purposes here; we’re really focusing on how we’re creating our return value.

public class CreateNewListApi<T>
{
  public List<T> GetItems(IEnumerable<T> input)
  {
    var newCollection = new List<T>();

    foreach (var item in input)
    {
      newCollection.Add(item);
    }

    return newCollection;
  }
}

And now that we have our simple class we can mock up a little test for performance… Just how inefficient is creating new lists every time?

using System;             // Console
using System.Diagnostics; // Stopwatch, Process

internal class Program
{
  private static void Main(string[] args)
  {
    const int NUM_ITEMS = 100000000;
    var inputItems = new int[NUM_ITEMS];

    Console.WriteLine("API Creating New Collections");
    var api = new CreateNewListApi<int>();

    var watch = Stopwatch.StartNew();
    var results = api.GetItems(inputItems);

    foreach (var item in results)
    {
    }

    Console.WriteLine(watch.Elapsed);
    Console.WriteLine(Process.GetCurrentProcess().PrivateMemorySize64);
    Console.ReadLine();
  }
}

When I run this on my machine, I get an average of about 1.73 seconds. The memory printout I get when running is 1615908864 bytes. So is that slow? Is that a lot of memory usage? I think it’s pretty hard to say conclusively without being able to compare it against anything. So let’s keep this number in mind as we continue to investigate the alternatives.

Side Note: At this point, some readers may be saying “Well, if the input to our function was also a list (or if whatever our function has to work with was otherwise equivalent to our return value) then we wouldn’t have to go populate a new collection every time… We can just return the underlying collection”! And I would say you are absolutely correct. If your function has access to an instance of the same type as the return type, then you could always just return that instance. But what implications does this have? You’re now giving people access to your underlying internals, and they can go modify them as they please. So, if you need to control access to items being added or removed, then it might not make sense for you to expose your internal collections like this.
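Here is a small sketch of what that exposure looks like in practice. The Inventory class and its two methods are hypothetical names I’m using only for illustration; one method hands out the internal list and the other hands out a copy.

```csharp
using System;
using System.Collections.Generic;

public class Inventory
{
    private readonly List<string> _items = new List<string> { "apple", "banana" };

    public int Count
    {
        get { return _items.Count; }
    }

    // Hands out the internal list itself: no copying, but callers can modify it.
    public List<string> GetItemsDirect()
    {
        return _items;
    }

    // Hands out a copy: costs an allocation, but the internals stay protected.
    public List<string> GetItemsCopy()
    {
        return new List<string>(_items);
    }
}

public static class InventoryDemo
{
    public static void Main()
    {
        var inventory = new Inventory();

        inventory.GetItemsCopy().Clear();
        Console.WriteLine(inventory.Count); // 2 (only the copy was cleared)

        inventory.GetItemsDirect().Clear();
        Console.WriteLine(inventory.Count); // 0 (the caller emptied our internals)
    }
}
```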

Yield to Incoming API Alternatives

We’ve seen how my past implementations may have looked, so how might we tweak this? If we tweak our API a bit, we can make our method return an IEnumerable<T> instead. Let’s see what that might look like:

public class YieldingApi<T>
{
  public IEnumerable<T> GetItems(IEnumerable<T> input)
  {
    foreach (var item in input)
    {
      yield return item;
    }
  }
}

So in this API implementation, all we’ll be doing is iterating over some type of collection and then yielding each result. If we run it through the same type of test as our previous API implementation, what kind of results do we end up with?

using System;             // Console
using System.Diagnostics; // Stopwatch, Process

internal class Program
{
  private static void Main(string[] args)
  {
    const int NUM_ITEMS = 100000000;
    var inputItems = new int[NUM_ITEMS];

    Console.WriteLine("API Yielding");
    var api = new YieldingApi<int>();

    var watch = Stopwatch.StartNew();
    var results = api.GetItems(inputItems);

    foreach (var item in results)
    {
    }

    Console.WriteLine(watch.Elapsed);
    Console.WriteLine(Process.GetCurrentProcess().PrivateMemorySize64);
    Console.ReadLine();
  }
}

When I run this on my machine, I get an average of about 2.80 seconds. The memory printout I get when running is 449409024 bytes. How does this relate back to our first implementation? Well, it’s certainly slower. It takes about 1.62x as long to enumerate using the yield implementation as it did with the first API we created. However, yield also uses less than 1/3 (about 27.8%, actually) of the memory footprint when compared to the first implementation. Pretty cool results!

Side Note: So yield was a bit slower according to our results, but what happens if we print the elapsed time before we run that foreach loop? Well, on my machine it averages at about one millisecond. Now that’s fast, right?! The cool thing about using yield with the IEnumerable<T> interface is that the work is deferred. That is, not until the program goes to actually run the enumeration do we get our performance hit. Try it out! Try moving the time printout from after the foreach loop to before the foreach loop. Try sticking breakpoints in on the line that yields. You’ll see what I mean.
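If you’d rather see the deferral without a debugger, here’s a minimal sketch. CountingApi and its ItemsProduced counter are hypothetical additions for illustration; the counter only goes up when the yielding line actually runs.

```csharp
using System;
using System.Collections.Generic;

public class CountingApi
{
    // Counts how many times the yielding line has executed.
    public int ItemsProduced { get; private set; }

    public IEnumerable<int> GetItems(IEnumerable<int> input)
    {
        foreach (var item in input)
        {
            ItemsProduced++;
            yield return item;
        }
    }
}

public static class DeferredDemo
{
    public static void Main()
    {
        var api = new CountingApi();
        var results = api.GetItems(new[] { 10, 20, 30 });

        // Calling GetItems() has not run any of the iterator body yet.
        Console.WriteLine(api.ItemsProduced); // 0

        foreach (var item in results)
        {
        }
        Console.WriteLine(api.ItemsProduced); // 3

        // Enumerating a second time runs the iterator body all over again.
        foreach (var item in results)
        {
        }
        Console.WriteLine(api.ItemsProduced); // 6
    }
}
```

That last part is the flip side of being lazy: a caller who enumerates the results twice pays for the work twice.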

Summary

In this article, I’ve explored two different ways of implementing an API (specifically focusing on the return value). We saw a brief performance analysis between the two and I highlighted some differences in both approaches. Let’s recap though:

  • Approach 1: Returning a List<T> and creating the collection ahead of time
    • Appeared to be overall a bit faster than yielding.
    • Consumed much more memory than yielding.
    • Callers can use the results immediately for enumeration, checking count, or as a collection to add more things to
    • The return type of List<T> is a bit more restrictive than an IEnumerable<T> like in the second API implementation
  • Approach 2: Return type of IEnumerable<T> and yielding results
    • Appeared to be overall a bit slower than the List<T> implementation
    • Lazy. We don’t actually execute any enumeration code until the caller actually enumerates
    • Consumed significantly less memory than the first approach using List<T>
    • Callers can enumerate the results immediately, but they need to add the results to a collection class to do much more than enumerate

So next time you’re designing an API for your interfaces and classes, try keeping these things in mind!

EDIT (December 30th, 2013):
As per some comments on Google+ by Dan Nemec, I figured I’d add a bit more here in the summary. IEnumerable<T> on its own is certainly not useless, especially if you’re leveraging LINQ or extension methods. My main beef in the past was that the consumer of an API with an IEnumerable<T> return value can only iterate over the results… And that’s just because that’s all that IEnumerable<T> lets you do. Dan made a great point though: if you are leveraging things like extension methods, or LINQ (which introduces tons of handy extension methods for working with IEnumerable<T>) then you get all of that functionality tacked on to IEnumerable<T>.
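As a rough illustration of Dan’s point, here’s a sketch with a made-up GetItems() iterator. Where(), First(), Count() and ToList() are the standard System.Linq extension methods; everything else is just for demonstration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LinqDemo
{
    // Hypothetical yielding API that produces the numbers 1 through 5.
    public static IEnumerable<int> GetItems()
    {
        for (var i = 1; i <= 5; i++)
        {
            yield return i;
        }
    }

    public static void Main()
    {
        // Filter lazily, then materialize into a List<int> only when needed.
        List<int> evens = GetItems().Where(x => x % 2 == 0).ToList();
        Console.WriteLine(string.Join(", ", evens)); // 2, 4

        // First() stops enumerating as soon as it finds a match.
        Console.WriteLine(GetItems().First(x => x > 3)); // 4

        Console.WriteLine(GetItems().Count()); // 5
    }
}
```

So the caller still gets list-like conveniences, and only pays for a real collection when they call something like ToList().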

So if you’re not fortunate enough to be working with LINQ or extension methods (i.e. working with legacy code in old .NET framework versions… and yes I am familiar with the attribute you can add in to allow extension methods provided you have a compiler version high enough to support it), then IEnumerable<T> sometimes just plain sucks. I’d wager the majority of C# developers aren’t in this boat though, so I’d like to thank Dan again for his comments.


Events: Demystifying Common Memory Leaks

Background

If you’ve poked through my previous postings, you’ll probably notice that I love using events when I program. If I can find a reason to use an event, I probably will. I think they’re a great tool that can really help you with designing your architectures, but there are certainly some common problems people run into when they use events. The one I want to address today has to do with memory leaks. That’s right. I said it. Memory leaks in your .NET application. Just because it’s a managed language doesn’t mean your code can’t be leaking memory! And now that I’ve got your attention, let’s see how events might be causing some leakage in your application.

(There is source that you can download and run. Check the summary section at the end!)

Instance-Scope Event Handlers

One of the most common ways to set up an EventHandler in C# is by having them defined for the entire scope of the instance. Consider for a moment the form designer in Visual Studio. When you double click on controls you get some handler created for the default event on that control. See how the EventHandler was declared though? You get a method declared that has a sender and some type of EventArgs. Pretty standard stuff here and there’s nothing ground-breaking about it. So what’s the problem with this method?

Well, there’s nothing wrong with it as long as you know how to clean up after yourself. Consider the following two classes:


private class ObjectWithEvent
{
    ~ObjectWithEvent()
    {
        Console.WriteLine(this + " is being finalized.");
    }

    public event EventHandler<EventArgs> Event;

    public void UnhookAll()
    {
        Event = null;
    }
}

private class ObjectThatHooksEvent
{
    public ObjectThatHooksEvent(ObjectWithEvent objectWithEvent)
    {
        objectWithEvent.Event += ObjectWithEvent_Event;
    }

    ~ObjectThatHooksEvent()
    {
        Console.WriteLine(this + " is being finalized.");
    }

    private void ObjectWithEvent_Event(object sender, EventArgs e)
    {
        // some fancy event
    }
}

The first class has an event that our second class can hook onto. You’ll notice in the second class that I’ve defined an instance-scope handler that we can hook up. This is the exact same syntax for declaring an event handler that you’d get from the form designer if you’re doing GUI programming.

The danger with this setup is that until you unhook the event, the object that hooks onto the event will not be freed for as long as the object with the event is still alive. “Well, no problem!” is what you might be thinking. You know how to solve that. You can just unhook the event in the second class’s finalizer (destructor).

…Except that won’t work. The finalizer will not get called on the instance of the second class until the event has been unhooked! It’s a bit of a chicken-or-the-egg problem, but it makes sense. A finalizer only runs once the garbage collector has decided an instance is unreachable, but the instance can’t become unreachable because the event’s delegate still holds a reference to it through its event handler. See why this can get a bit dangerous?
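Since the finalizer can’t do the job, one common alternative is to unhook explicitly. The sketch below is just one way to do it, using IDisposable. DisposableHook and the HandlerCount property are hypothetical names for illustration, and ObjectWithEvent is re-declared here in a trimmed-down form so the example stands on its own.

```csharp
using System;

public class ObjectWithEvent
{
    public event EventHandler<EventArgs> Event;

    // Hypothetical helper for illustration: how many handlers are currently hooked.
    public int HandlerCount
    {
        get { return Event == null ? 0 : Event.GetInvocationList().Length; }
    }
}

public sealed class DisposableHook : IDisposable
{
    private readonly ObjectWithEvent _source;

    public DisposableHook(ObjectWithEvent source)
    {
        _source = source;
        _source.Event += Source_Event;
    }

    public void Dispose()
    {
        // Unhook deterministically instead of relying on a finalizer.
        _source.Event -= Source_Event;
    }

    private void Source_Event(object sender, EventArgs e)
    {
        // some fancy event
    }
}

public static class DisposeDemo
{
    public static void Main()
    {
        var source = new ObjectWithEvent();

        using (new DisposableHook(source))
        {
            Console.WriteLine(source.HandlerCount); // 1
        }

        // Dispose() ran at the end of the using block and removed the handler.
        Console.WriteLine(source.HandlerCount); // 0
    }
}
```

Once the handler is gone, nothing on the event side references the hook any more, so the garbage collector is free to collect it.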

Anonymous Delegate (No Parent Reference)

So this is an example of hooking events where you won’t get a leak. Why am I showing it? Well, in the next section I’ll make a small tweak to it which will make it behave just like the first scenario I described.

Let’s assume we have two classes again. I’ll use the first class from my first example (the object with the event) and this new class here that we’ll use to hook onto the event:

private class HookWithAnonymousDelegate
{
    public HookWithAnonymousDelegate(ObjectWithEvent objectWithEvent)
    {
        objectWithEvent.Event += (sender, args) =>
        {
            // handle your event
            // (this one is special because it doesn't use anything related to the instance)
            Console.WriteLine("Event being called!");
        };
    }

    ~HookWithAnonymousDelegate()
    {
        Console.WriteLine(this + " is being finalized.");
    }
}

Notice the difference from the first example? I’ve hooked up an anonymous method (using a lambda expression) to our event instead of declaring an instance-scope event handler. It’s a small change, and for the most part, I might argue that this is just a stylistic thing. If you don’t ever plan on unhooking the event then it’s not such a big deal to go with anonymous methods, but if your method body grows pretty big the code can definitely get unsightly.
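If you do think you’ll want to unhook an anonymous method later, you need to hang on to the delegate. A quick sketch of why: writing the same lambda a second time creates a different delegate, so -= has nothing to match against. HandlerCount is a hypothetical helper for illustration, and ObjectWithEvent is re-declared in a trimmed-down form to keep the example self-contained.

```csharp
using System;

public class ObjectWithEvent
{
    public event EventHandler<EventArgs> Event;

    // Hypothetical helper for illustration: how many handlers are currently hooked.
    public int HandlerCount
    {
        get { return Event == null ? 0 : Event.GetInvocationList().Length; }
    }
}

public static class LambdaUnhookDemo
{
    public static void Main()
    {
        var source = new ObjectWithEvent();

        // Hook a lambda, then try to unhook with an identical-looking lambda.
        source.Event += (sender, args) => Console.WriteLine("Event being called!");
        source.Event -= (sender, args) => Console.WriteLine("Event being called!");
        Console.WriteLine(source.HandlerCount); // 1 (the second lambda is a different delegate)

        // Keeping the delegate in a variable lets us remove exactly what we added.
        EventHandler<EventArgs> handler =
            (sender, args) => Console.WriteLine("Event being called!");
        source.Event += handler;
        Console.WriteLine(source.HandlerCount); // 2
        source.Event -= handler;
        Console.WriteLine(source.HandlerCount); // 1
    }
}
```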

Anyway… sweet! We just hooked up to our event and we don’t have the scary leak situation that we did in the first scenario. How cool is that? Well…

Anonymous Delegate (With Parent Reference)

The second method I described works great… until you go to put it into practice. It’s clearly not an impossible situation, but it’s pretty unlikely that you’ll write event handlers within an object that don’t use any of that object’s state (or even other methods on the object). Again, not impossible but just not the common use case. And since it’s not the common use case, you need to be concerned with the potentially problematic common use case 🙂

Let’s consider two classes (yes, again, two classes). We’ll use the first class I described above in both examples that has an event that we can hook onto, and a second class that looks similar to the class I introduced in the second example:

private class HookWithAnonymousDelegate2
{
    public HookWithAnonymousDelegate2(ObjectWithEvent objectWithEvent)
    {
        objectWithEvent.Event += (sender, args) =>
        {
            // handle your event and use something that's part of this instance
            SomeInnocentLittleMethod();
        };
    }

    ~HookWithAnonymousDelegate2()
    {
        Console.WriteLine(this + " is being finalized.");
    }

    private void SomeInnocentLittleMethod()
    {
        Console.WriteLine("... Not so innocent after all!");
    }
}

See the difference compared to example 2? The event handler in this class calls an instance method. This would be a pretty common thing to do (unless you like to duplicate all of your code and not use methods ever :P) and it doesn’t look like it should cause problems. And really, it won’t if you understand the implications of hooking an event handler up to an event. So once you’re done handling your events, make sure you clean up and remove your handlers!

In my opinion, the really interesting part of this example is that the event handler is only calling an instance method. It’s not even using any variables or properties of the instance. Still, the .NET framework is going to hold onto this second instance until we unhook.
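You can peek at why this happens by looking at the delegate’s Target property. The sketch below uses a hypothetical TargetDemo class. It assumes the behaviour of the Microsoft C# compilers as I understand it: a lambda that touches an instance member is compiled so that the delegate’s Target is the instance itself, while a lambda that uses nothing from the instance doesn’t reference it at all.

```csharp
using System;

public class TargetDemo
{
    // Uses nothing from the instance, like the second example in the article.
    public EventHandler<EventArgs> CreateHandlerWithoutInstance()
    {
        return (sender, args) => Console.WriteLine("Event being called!");
    }

    // Calls an instance method, like the third example in the article.
    public EventHandler<EventArgs> CreateHandlerWithInstance()
    {
        return (sender, args) => SomeInnocentLittleMethod();
    }

    private void SomeInnocentLittleMethod()
    {
        Console.WriteLine("... Not so innocent after all!");
    }
}

public static class TargetDemoProgram
{
    public static void Main()
    {
        var demo = new TargetDemo();

        var withoutInstance = demo.CreateHandlerWithoutInstance();
        var withInstance = demo.CreateHandlerWithInstance();

        // The delegate that calls an instance method points straight back at 'demo',
        // so any event holding this delegate keeps 'demo' alive.
        Console.WriteLine(ReferenceEquals(withoutInstance.Target, demo)); // False
        Console.WriteLine(ReferenceEquals(withInstance.Target, demo));    // True
    }
}
```

So it isn’t about fields or properties specifically. Calling any instance member means the delegate needs the instance, and the event holds the delegate.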

Summary

Well, hopefully I haven’t scared you away from using events. The take-away point here is that you need to be mindful of hooking up your events and when/where you unhook them. Personally, unless you always plan to have two objects exist for the same lifetime, I wouldn’t hook up events in the constructors like I’ve done in my examples. Some closing tips:

  • Try only hooking onto events when you need to. If you don’t need to hook up all your events when initializing something, then don’t!
  • Be mindful of how you’re going to clean up your event hooking. Whenever you add an event handler, try to think of where you’ll be cleaning it up.
  • Hooking events onto singletons or global instances can make this problem a lot worse. Since your singleton will be around for the lifetime of your application, if you forget to unhook from your event then you’ll start accumulating a lot of garbage.

I’ve written up a little sample application that uses the example classes and walks you through the three examples I’ve outlined. All of them involve instantiating the classes, hooking up the events, and then how they behave differently when you try to clean them up. You can grab the source code from:

Hope you enjoyed! Remember to follow Dev Leader:


  • Nick Cosentino

    I work as a team lead of software engineering at Magnet Forensics (http://www.magnetforensics.com). I'm into powerlifting, bodybuilding, and blogging about leadership/development topics over at http://www.devleader.ca.


  • Copyright © 1996-2010 Dev Leader. All rights reserved.