LINQ: Beware of deferred execution

If you've spent much time around C# and .NET, it's likely that you will have come across LINQ (Language-Integrated Query), which allows you to use a range of powerful querying capabilities directly in the C# language.

The example below demonstrates a couple of common features of LINQ (note that I am using the extension method syntax rather than LINQ query expressions). In the example, we have a list of people and want to obtain a list of the names of the adults in that list. We will then iterate over those names twice (this will be useful to demonstrate the differences between immediate and deferred execution).

Using LINQ, we can:

Filter by age using Where
Map from a Person object to the name string using Select
Evaluate the query to a list using ToList

using System;
using System.Collections.Generic;
using System.Linq;

var people = new List<Person>
{
    new Person { Name = "Sam", Age = 27 },
    new Person { Name = "Suzie", Age = 17 },
    new Person { Name = "Harry", Age = 23 },
};

var adultNames = people
    .Where(person =>
    {
        Console.WriteLine("Filtering by age...");
        return person.Age >= 18;
    })
    .Select(person => person.Name)
    .ToList();

foreach (var name in adultNames)
    Console.WriteLine(name);

foreach (var name in adultNames)
    Console.WriteLine(name);

/* output
Filtering by age...
Filtering by age...
Filtering by age...
Sam
Harry
Sam
Harry
*/

// In a top-level program, type declarations must come after the statements
public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

Notice that in the above example, we explicitly convert the query to a list. This immediately executes the query to give a new list containing only the adult names, which we can then iterate over.

So what happens if we leave off the ToList?

var adultNames = people
    .Where(person =>
    {
        Console.WriteLine("Filtering by age...");
        return person.Age >= 18;
    })
    .Select(person => person.Name);

foreach (var name in adultNames)
    Console.WriteLine(name);

foreach (var name in adultNames)
    Console.WriteLine(name);

/* output
Filtering by age...
Sam
Filtering by age...
Filtering by age...
Harry
Filtering by age...
Sam
Filtering by age...
Filtering by age...
Harry
*/

Now the output looks quite different. Instead of doing all the filtering first and then iterating over the adult names, the filtering for each item now happens just before that item reaches the loop. Importantly, the filtering also happens every time we iterate over the items. This is known as deferred execution, since evaluation of the query is deferred until we actually need the values.
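Deferred execution comes from the way operators such as Where are written: they are iterators. As a rough illustration, the sketch below defines a hypothetical MyWhere local function using yield return. It is a simplified stand-in, not the actual .NET implementation (which adds argument checks and optimisations), but it behaves the same way: building the query runs none of its body, and the predicate only runs as the results are enumerated, again on every new enumeration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var predicateCalls = 0;

// Hypothetical, simplified stand-in for Enumerable.Where.
// Because it uses yield return, the compiler turns it into an iterator:
// none of this body runs until something enumerates the result.
IEnumerable<T> MyWhere<T>(IEnumerable<T> source, Func<T, bool> predicate)
{
    foreach (var item in source)
    {
        if (predicate(item))
            yield return item;
    }
}

var ages = new List<int> { 27, 17, 23 };

// Building the query only captures the source and the predicate
var adultAges = MyWhere(ages, age =>
{
    predicateCalls++;
    return age >= 18;
});
var callsBeforeEnumeration = predicateCalls; // 0

// Each enumeration walks the source again and re-runs the predicate
var firstPass = adultAges.ToList();  // 3 predicate calls
var secondPass = adultAges.ToList(); // 3 more predicate calls

Console.WriteLine($"Before enumeration: {callsBeforeEnumeration}, after two passes: {predicateCalls}");
// Before enumeration: 0, after two passes: 6
```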

Benefits of Deferred Execution

It looks like deferred execution is the default behaviour for most LINQ operators (Where, Select, OrderBy and so on), unless you explicitly tell it to evaluate immediately (using ToList, ToArray, ToDictionary, or an aggregate such as Count or Sum). So there must be some benefit to doing this, right?
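To see which operators defer and which execute immediately, the sketch below counts how often a Select lambda runs. The numbers and the counter are purely illustrative. Calling Select runs nothing; ToList, ToArray and ToDictionary each enumerate the whole query the moment they are called.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var selectorCalls = 0;
var numbers = new List<int> { 1, 2, 3, 4 };

// Deferred: Select only describes the work
var doubled = numbers.Select(n =>
{
    selectorCalls++;
    return n * 2;
});
var callsAfterSelect = selectorCalls; // 0

// Immediate: each of these enumerates the query as soon as it is called
var asList = doubled.ToList();                       // 4 calls
var asArray = doubled.ToArray();                     // 4 more calls
var asDictionary = doubled.ToDictionary(d => d / 2); // 4 more calls (keys 1..4)

Console.WriteLine($"After Select: {callsAfterSelect}, after materialising: {selectorCalls}");
// After Select: 0, after materialising: 12
```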

1. Better Performance

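In most cases it is expected that deferred execution will result in better performance, since you don't have to run the query over the whole data set up front, or allocate an intermediate collection to hold the results. Instead, each item is processed as you iterate over it, and if you stop early (for example with First or Take), the remaining items are never processed at all.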
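Because items are pulled through the query one at a time, stopping early also stops the work. In the sketch below (the data and the counter are illustrative), Take(3) asks for only the first three even numbers out of a million, so the Where predicate only ever sees the first six values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var checks = 0;

// A large source: one million integers, produced lazily by Enumerable.Range
var numbers = Enumerable.Range(1, 1_000_000);

// We only ask for the first three even numbers
var firstThreeEvens = numbers
    .Where(n =>
    {
        checks++;
        return n % 2 == 0;
    })
    .Take(3)
    .ToList();

// Take stops pulling items once it has three, so Where only inspected 1..6
Console.WriteLine(string.Join(", ", firstThreeEvens)); // 2, 4, 6
Console.WriteLine(checks);                             // 6
```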

2. Query Construction

Since the query does not need to be executed immediately, you can build it up in several steps, adding operators conditionally as you go. This gives you the power to create more complex queries.

public IEnumerable<string> GetNames(IEnumerable<Person> people, bool onlyAdults)
{
    var query = people.AsEnumerable();

    if (onlyAdults)
    {
        // Only add this filter when onlyAdults is true
        query = query.Where(person => person.Age >= 18);
    }

    // Nothing has run yet: the whole query executes once, here, when ToList is called
    return query.Select(person => person.Name).ToList();
}
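The same idea works with any combination of optional steps. In the sketch below, onlyAdults and sortByName are hypothetical runtime options, and value tuples stand in for the Person class so the snippet is self-contained. A counter confirms that none of the conditional steps runs anything until ToList executes the finished query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var filterCalls = 0;
var people = new List<(string Name, int Age)>
{
    ("Sam", 27),
    ("Suzie", 17),
    ("Harry", 23),
};

// Hypothetical runtime options, e.g. from a search form or request parameters
var onlyAdults = true;
var sortByName = true;

IEnumerable<(string Name, int Age)> query = people;

if (onlyAdults)
{
    query = query.Where(person =>
    {
        filterCalls++;
        return person.Age >= 18;
    });
}

if (sortByName)
{
    query = query.OrderBy(person => person.Name);
}

// The steps above only describe the query; nothing has executed yet
var callsBeforeExecution = filterCalls; // 0

// The finished pipeline runs once, here
var names = query.Select(person => person.Name).ToList();

Console.WriteLine(string.Join(", ", names)); // Harry, Sam
```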

3. Always re-evaluated

Since the query is re-evaluated on every enumeration, you can add, remove or change elements of your collection after the query has been constructed, and the query will see those changes. In this way, you know that you are always iterating over the most up-to-date data.

var people = new List<Person>
{
    new Person { Name = "Sam", Age = 27 },
    new Person { Name = "Suzie", Age = 17 },
    new Person { Name = "Harry", Age = 23 },
};

var adultNames = people
    .Where(person => person.Age >= 18)
    .Select(person => person.Name);

foreach(var name in adultNames)
    Console.WriteLine(name);

people.Add(new Person { Name = "Sally", Age = 26 });

foreach(var name in adultNames)
    Console.WriteLine(name);

/* output
Sam
Harry
Sam
Harry
Sally
*/
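If you do want a fixed view of the data, calling ToList captures a snapshot instead. The sketch below (using value tuples in place of the Person class, so it is self-contained) keeps both a deferred query and a materialised copy of it, then adds Sally to the source list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var people = new List<(string Name, int Age)>
{
    ("Sam", 27),
    ("Suzie", 17),
    ("Harry", 23),
};

// Deferred: re-evaluated against the current contents of people on every enumeration
var liveAdultNames = people
    .Where(person => person.Age >= 18)
    .Select(person => person.Name);

// Immediate: a snapshot of the results as they are right now
var snapshotAdultNames = liveAdultNames.ToList();

people.Add(("Sally", 26));

Console.WriteLine(string.Join(", ", liveAdultNames));     // Sam, Harry, Sally
Console.WriteLine(string.Join(", ", snapshotAdultNames)); // Sam, Harry
```

Note that the changes must happen between enumerations: adding to a List<T> while a foreach is still iterating a query over it throws an InvalidOperationException.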

The pitfalls of deferred execution

Despite so far singing the praises of LINQ's deferred execution, this post was inspired by some of the issues I've been experiencing using it. One of its benefits is also a pitfall if you do not take enough care when writing your code: the query is always re-evaluated.

Although deferred execution is generally seen as a performance benefit, there are cases where it can dramatically slow down your application if you're not careful. Any time you know you will need to iterate over the same query repeatedly (for example in a nested for/foreach loop), make sure you call ToList first. Otherwise, you will be re-running the whole query every single time, which can dramatically reduce performance. This is especially true if the source collection is large since, even if your query filters out most of the items, the filter is applied to the whole source collection on every enumeration.
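To put numbers on the nested-loop case, the sketch below counts predicate calls for the same filter over 1,000 illustrative integers: first used directly as a deferred query inside a nested loop, then materialised with ToList before looping.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var predicateCalls = 0;
var source = Enumerable.Range(1, 1_000).ToList();

// Deferred: the inner loop re-runs the filter over all 1,000 items each time
var evens = source.Where(n =>
{
    predicateCalls++;
    return n % 2 == 0;
});

var pairCount = 0;
foreach (var a in evens)
    foreach (var b in evens)
        pairCount++;

// 1,000 calls for the outer loop + 500 inner loops * 1,000 calls = 501,000
var deferredCalls = predicateCalls;

// Materialised: the filter runs once, and both loops iterate the 500-item list
predicateCalls = 0;
var evensList = source.Where(n =>
{
    predicateCalls++;
    return n % 2 == 0;
}).ToList();

var pairCountFromList = 0;
foreach (var a in evensList)
    foreach (var b in evensList)
        pairCountFromList++;

var materialisedCalls = predicateCalls; // 1,000

Console.WriteLine($"Deferred: {deferredCalls} calls, materialised: {materialisedCalls} calls");
```

Both versions count the same 250,000 pairs; only the amount of repeated filtering differs.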

The final pitfall to mention is using Select to run a collection of tasks. I've seen arguments that you shouldn't do this at all, but I've seen it often enough in codebases to know that people do it, so it's something you should be aware of. Imagine the following scenario:

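var listOfIds = new List<int> { 1, 5, 8, 15 };

// tasks is a deferred query, not a list of running tasks
var tasks = listOfIds.Select(id => _repository.GetAsync(id));
await Task.WhenAll(tasks);                                // enumerates tasks: GetAsync runs once per ID
var results = tasks.Select(task => task.Result).ToList(); // enumerates tasks again: GetAsync runs again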

In the above example, the GetAsync method is actually executed twice for every ID: once when Task.WhenAll enumerates the query, and a second time when the query is enumerated again by tasks.Select(...).ToList(). Not only does this have a massive impact on performance by performing expensive operations multiple times, but the second batch of tasks has never been awaited, so it's not guaranteed to be completed when you come to read it, and task.Result blocks the calling thread until each one finishes. As you might imagine, it is also particularly dangerous if the task you are running is actually a create or update operation (yes, I have seen this too). To do the get safely, you need to immediately evaluate the query:

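// Evaluate the query immediately, so each GetAsync call starts exactly once
var tasks = listOfIds.Select(id => _repository.GetAsync(id)).ToList();
await Task.WhenAll(tasks);
var results = tasks.Select(task => task.Result).ToList();

// EDIT: As pointed out in the comments, a more elegant alternative is to use the
// results array returned by Task.WhenAll, which enumerates the query only once
var tasks = listOfIds.Select(id => _repository.GetAsync(id));
var results = await Task.WhenAll(tasks);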
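The double execution is easy to reproduce. In the sketch below, GetAsync is a hypothetical local stand-in for _repository.GetAsync that only counts how many times it is started; the IDs match the example above. With the deferred query, Task.WhenAll and the later ToList each enumerate it, starting eight calls for four IDs, while materialising the tasks first starts exactly four.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

var calls = 0;

// Hypothetical stand-in for _repository.GetAsync: it just counts how often it is started
async Task<string> GetAsync(int id)
{
    calls++; // runs synchronously, as soon as GetAsync is called
    await Task.Delay(10);
    return $"item {id}";
}

var listOfIds = new List<int> { 1, 5, 8, 15 };

// Deferred: every enumeration of tasks calls GetAsync again
var tasks = listOfIds.Select(id => GetAsync(id));
await Task.WhenAll(tasks);                                // first enumeration: 4 calls
var results = tasks.Select(task => task.Result).ToList(); // second enumeration: 4 new, un-awaited tasks
var deferredCalls = calls;                                // 8

// Materialised: the list holds the four tasks, so each call starts exactly once
calls = 0;
var taskList = listOfIds.Select(id => GetAsync(id)).ToList();
var materialisedResults = await Task.WhenAll(taskList);
var materialisedCalls = calls;                            // 4

Console.WriteLine($"Deferred: {deferredCalls} calls, materialised: {materialisedCalls} calls");
```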

Conclusion

In this article I have introduced deferred execution in the context of .NET's LINQ. I have shown some of its features and why it can be beneficial compared to immediate execution. Finally, I have discussed some common pitfalls to look out for when using LINQ and deferred execution.