Iterators - An Elementary Perspective on How They Function

If you're newer to C# or programming in general, you may have used an iterator without even realizing it. Iterators can be a performant and effective tool available to .NET developers for traversing collections of data. One requirement of an iterator is that its return type must be IEnumerable (or IEnumerable<T>, or the enumerator counterparts IEnumerator and IEnumerator<T>), so the results of an iterator can only be enumerated over. For example, you could use the results of an iterator in a foreach loop, but you could not directly index into them (like you could an array) without some additional steps. The other requirement is that an iterator uses the special keyword "yield" (as in "yield return") to hand back, one at a time, the individual elements that are provided to the caller of the iterator.
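
To make those "additional steps" concrete, here is a minimal sketch. An IEnumerable<int> has no indexer, so two common options are LINQ's ElementAt, which walks the sequence up to the requested position, or materializing the results with ToList and indexing the list. The Numbers iterator is a hypothetical example made up for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical iterator used only for this sketch.
IEnumerable<int> Numbers()
{
    yield return 10;
    yield return 20;
    yield return 30;
}

IEnumerable<int> numbers = Numbers();

// numbers[1] would not compile: IEnumerable<int> has no indexer.

// Extra step, option 1: ElementAt enumerates until it reaches index 1.
int second = numbers.ElementAt(1);

// Extra step, option 2: materialize into a List<int>, which does have an indexer.
List<int> list = numbers.ToList();
int third = list[2];

Console.WriteLine(second); // 20
Console.WriteLine(third);  // 30
```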

In a nutshell, iterators let you write code that feeds items, one at a time, to a caller that is enumerating an IEnumerable. But what better way to learn about this than with some simple code examples?

By the way, these examples are all available on GitHub if you want to try them out yourself!


Simple Iterators

Let's have a look at the following block of code:

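// at the top of Program.cs (not needed if implicit usings are enabled)
using System;
using System.Collections.Generic;
using System.Threading; // for Thread.Sleep, used later in this article

int[] dataset = new int[] { 1, 2, 3, 4, 5 };

// function that returns the collection directly
IEnumerable<int> SimpleEnumerable()
{
    return dataset;
}

// function that is an "iterator"
IEnumerable<int> SimpleIterator()
{
    foreach (var item in dataset)
    {
        // check out this fancy keyword "yield"!
        yield return item;
    }
}

// enumerate the iterator; each item is yielded back one at a time
Console.WriteLine("Foreach in simple iterator...");
foreach (var item in SimpleIterator())
{
    Console.WriteLine(item);
}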

As we can see, we declare an array with 5 elements. The first function, SimpleEnumerable, returns the array directly. Its return type is IEnumerable<int>, but it never uses yield return, so it meets only one of the two requirements we mentioned for an iterator and is therefore just a normal method.

Conversely, the second function, SimpleIterator, does meet both requirements. Here we yield individual items by looping over the dataset array with foreach and hitting yield return once per element. In other words, we hand each element of the dataset array back to the caller, which can then enumerate the results of our iterator one at a time.
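
The foreach loop is simply a convenient way to produce one yield return per array element; it is not required. As a minimal sketch (ThreeItems is a hypothetical name for illustration), an iterator can also contain several yield return statements in a row, and each one still hands exactly one element to the caller.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical iterator with no loop at all: each yield return
// hands exactly one element back to the caller.
IEnumerable<string> ThreeItems()
{
    yield return "first";
    yield return "second";
    yield return "third";
}

var received = new List<string>();
foreach (var item in ThreeItems())
{
    received.Add(item);
}

Console.WriteLine(string.Join(", ", received)); // first, second, third
```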

The output of running this is not very surprising, as it looks just like it would if we iterated over the array directly. The output is as follows:

Foreach in simple iterator...
1
2
3
4
5

Building on the Simple Iterator Example

Iterators give us the ability to inject additional logic during the iteration process. We'll start with a simple example that just illustrates the execution order, showing the points where you could theoretically hook your own logic into an iterator:

IEnumerable<int> ConsoleWritingIterator()
{
    Console.WriteLine("Printing to console at start of iterator!");
    foreach (var item in dataset)
    {
        Console.WriteLine("Printing to console before yield return!");
        yield return item;
        Console.WriteLine("Printing to console after yield return!");
    }

    Console.WriteLine("Printing to console at end of iterator!");
}

Console.WriteLine("Foreach in console writing iterator...");
foreach (var item in ConsoleWritingIterator())
{
    Console.WriteLine(item);
}

The code above shows an iterator just like the one from the first example, but with additional writes to the console: one when we enter the iterator, one before and one after the yield return line, and one at the end of the iterator. While this example is contrived (i.e., you likely don't need iterators that write aggressively to the console), the resulting output shows the behavior this gives us access to:

Foreach in console writing iterator...
Printing to console at start of iterator!
Printing to console before yield return!
1
Printing to console after yield return!
Printing to console before yield return!
2
Printing to console after yield return!
Printing to console before yield return!
3
Printing to console after yield return!
Printing to console before yield return!
4
Printing to console after yield return!
Printing to console before yield return!
5
Printing to console after yield return!
Printing to console at end of iterator!

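In the output, we get the single start and end lines when entering and exiting the iterator itself. We also get the before/after lines wrapped around each numeric value printed by the foreach loop. That happens because the iterator pauses at each yield return while the caller's loop body runs, and then resumes on the very next line when the caller asks for the next item.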
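
To see why the lines interleave this way, it can help to drive an iterator by hand. Roughly speaking, foreach calls GetEnumerator() once, then calls MoveNext() and reads Current repeatedly, and disposes the enumerator at the end (mirrored here with a using statement). This is a simplified sketch: LoggingIterator and the log list are hypothetical names, and the log list replaces the console writes so the order is easy to inspect.

```csharp
using System;
using System.Collections.Generic;

var log = new List<string>();

// Hypothetical iterator that records where execution is instead of printing.
IEnumerable<int> LoggingIterator()
{
    log.Add("start");
    foreach (var item in new[] { 1, 2 })
    {
        log.Add($"before {item}");
        yield return item;
        log.Add($"after {item}");
    }
    log.Add("end");
}

// Roughly what foreach does for us behind the scenes.
using (IEnumerator<int> enumerator = LoggingIterator().GetEnumerator())
{
    // Each MoveNext() resumes the body right after the previous yield return
    // and runs until the next yield return (returns true) or the end (returns false).
    while (enumerator.MoveNext())
    {
        log.Add($"caller got {enumerator.Current}");
    }
}

Console.WriteLine(string.Join(" | ", log));
// start | before 1 | caller got 1 | after 1 | before 2 | caller got 2 | after 2 | end
```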

With this in mind, consider the different opportunities and reasons we may have for writing additional logic inside an iterator. The following are some examples to get you thinking, though I am not necessarily suggesting you should or should not write code that does these:

  • Check conditions before doing any internal enumeration to exit early
  • Filter out items that get yielded back based on some condition
  • Populate a cache based on the items that are being yielded out of the iterator

There are plenty of examples to consider here where we'd otherwise be unable to do this quite as easily with a collection.
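
As a rough sketch of all three bullets in one place (EvenNumbersUpTo and seen are hypothetical names, and the list stands in for whatever cache you might actually use), an iterator can exit early with yield break before enumerating anything, skip items that fail a condition, and record each item as it is handed out.

```csharp
using System;
using System.Collections.Generic;

int[] dataset = { 1, 2, 3, 4, 5, 6 };

// Hypothetical "cache" of every item the iterator has yielded so far.
var seen = new List<int>();

IEnumerable<int> EvenNumbersUpTo(int max)
{
    // Early exit: yield break ends the sequence before we enumerate dataset.
    if (max <= 0)
    {
        yield break;
    }

    foreach (var item in dataset)
    {
        // Filter: skip odd items and items above the limit.
        if (item % 2 != 0 || item > max)
        {
            continue;
        }

        // Populate the cache as items flow out to the caller.
        seen.Add(item);
        yield return item;
    }
}

var evens = new List<int>(EvenNumbersUpTo(5));
var none = new List<int>(EvenNumbersUpTo(0));

Console.WriteLine(string.Join(", ", evens)); // 2, 4
Console.WriteLine(none.Count);               // 0
Console.WriteLine(string.Join(", ", seen));  // 2, 4
```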

Iterators Are "Lazy"

And no, I don't mean that your iterators are kicking back on the beach drinking pina coladas while you write all this complex code. I mean that they are evaluated lazily: an iterator's code only executes when it is enumerated.

Let's demonstrate this by looking at the following code example:

IEnumerable<int> FunctionThatSleepsFor5SecondsFirst()
{
    Thread.Sleep(5000);
    return dataset;
}

IEnumerable<int> IteratorThatSleepsFor5SecondsFirst()
{
    Thread.Sleep(5000);
    foreach (var item in dataset)
    {
        yield return item;
    }
}

// this will wait the full 5 seconds because it is *not* an iterator
Console.WriteLine($"Calling function: {DateTime.Now}");
var resultA = FunctionThatSleepsFor5SecondsFirst();
Console.WriteLine($"Finished function: {DateTime.Now}");

// this assignment will happen "instantly" because it's an actual iterator
Console.WriteLine($"Calling iterator: {DateTime.Now}");
var resultB = IteratorThatSleepsFor5SecondsFirst();
Console.WriteLine($"Finished iterator: {DateTime.Now}");

The comments in the code are a bit of a spoiler, but let's read through what we have. First, the method at the top is *not* an iterator because it does not yield anything; it sleeps for 5 seconds and then returns. The second method is in fact an iterator, and it's written to be as similar as possible to the method above it, except that it uses yield return instead of returning the underlying collection directly.

The code that exercises these two methods is surrounded by console writes with timestamps. It's also important to note that we're not even using a foreach loop or otherwise "materializing" the results of the iterator; we're simply assigning the results to the variables on the left-hand side.

And the result?

The non-iterator pays the performance hit but the iterator assignment is essentially instantaneous. That's what we mean by lazy.

So if you're keen you might be asking: "Well it must not have actually enumerated anything because it would need to sleep 5 seconds before yielding back even the first item!"

And you'd be 100% correct. The iterator did not pay any performance cost here because we never enumerated it. Calling an iterator method only creates a compiler-generated object that knows how to run the method body later, so the assignment acts a bit like storing a function pointer. The cost of the iterator is only paid when you actually iterate over it.
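
To check exactly when that cost is paid without waiting on Thread.Sleep, this minimal sketch swaps the sleep for a counter (CountingIterator and runCount are hypothetical names). It shows that calling the iterator and even calling GetEnumerator() run none of the body, that the body starts on the first MoveNext(), and that each new enumeration starts the body again unless the results are materialized once, for example with ToList().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int[] dataset = { 1, 2, 3, 4, 5 };
int runCount = 0;

// Hypothetical stand-in for IteratorThatSleepsFor5SecondsFirst:
// it counts how many times its body starts instead of sleeping.
IEnumerable<int> CountingIterator()
{
    runCount++; // plays the role of Thread.Sleep(5000)
    foreach (var item in dataset)
    {
        yield return item;
    }
}

var iterator = CountingIterator();
Console.WriteLine(runCount); // 0: calling the method ran nothing

using (var enumerator = iterator.GetEnumerator())
{
    Console.WriteLine(runCount); // 0: getting the enumerator still ran nothing
    enumerator.MoveNext();
    Console.WriteLine(runCount); // 1: the body starts on the first MoveNext()
}

// Each full enumeration starts the body again, paying the cost again.
int sum1 = iterator.Sum();
int sum2 = iterator.Sum();
Console.WriteLine(runCount); // 3

// Materializing runs the body once; the list can then be reused freely.
List<int> materialized = iterator.ToList();
int sum3 = materialized.Sum();
int sum4 = materialized.Sum();
Console.WriteLine(runCount); // 4
```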

To conclude...

In this article we took a high-level look at iterators, contrasted with a simple collection like an array. If this is still a bit fuzzy, you may benefit from reading my prior article on IEnumerable. From the examples, we can see that iterators let us run additional logic before we decide to yield individual items back to a caller. In the last example, we also saw that iterators are lazy: just assigning them seems essentially free, but we pay the cost to iterate them later.

If lazy evaluation has got the gears turning for you, then you're well on your way to understanding some of the more advanced topics. From here, I would suggest reading about the pitfalls of C# code designed around passing materialized collections, and comparing them with the pitfalls that show up when codebases make heavy use of iterators and developers don't understand their characteristics.

Beware of These Iterator and Collection Traps

