[SOLVED] How to avoid Query Plan re-compilation when using IEnumerable.Contains in Entity Framework LINQ queries?

Issue

I have the following LINQ query executed using Entity Framework (v6.1.1):

private IList<Customer> GetFullCustomers(IEnumerable<int> customersIds)
{
    IQueryable<Customer> fullCustomerQuery = GetFullQuery();
    return fullCustomerQuery.Where(c => customersIds.Contains(c.Id)).ToList();
}

This query is translated into fairly nice SQL:

SELECT 
[Extent1].[Id] AS [Id], 
[Extent1].[FirstName] AS [FirstName]
-- ...
FROM [dbo].[Customer] AS [Extent1]
WHERE [Extent1].[Id] IN (1, 2, 3, 5)

However, I get a very significant performance hit on a query compilation phase. Calling:

ELinqQueryState.GetExecutionPlan(MergeOption? forMergeOption) 

Takes ~50% of the time of each request. Digging deeper, it turned out that query gets re-compiled every time I pass different customersIds.
According to MSDN article, this is an expected behavior because IEnumerable that is used in a query is considered volatile and is part of SQL that is cached. That’s why SQL is different for every different combination of customersIds and it always has different hash that is used to get compiled query from cache.

Now the question is: How can I avoid this re-compilation while still querying with multiple customersIds?

Solution

This is a great question. First of all, here are a couple of workarounds that come to mind (they all require changes to the query):

First workaround

This one maybe a bit obvious and unfortunately not generally applicable: If the selection of items you would need to pass over to Enumerable.Contains already exists in a table in the database, you can write a query that calls Enumerable.Contains on the corresponding entity set in the predicate instead of bringing the items into memory first. An Enumerable.Contains call over data in the database should result in some kind of JOIN-based query that can be cached. E.g. assuming no navigation properties between Customers and SelectedCustomers, you should be able to write the query like this:

var q = db.Customers.Where(c => 
    db.SelectedCustomers.Select(s => s.Id).Contains(c.Id));

The syntax of the query with Any is a bit simpler in this case:

var q = db.Customers.Where(c => 
    db.SelectedCustomers.Any(s => s.Id == c.Id));

If you don’t already have the necessary selection data stored in the database, you will probably don’t want the overhead of having to store it, so you should consider the next workaround.

Second workaround

If you know beforehand that you will have a relatively manageable maximum number of elements in the list you can replace Enumerable.Contains with a tree of OR-ed equality comparisons, e.g.:

var list = new [] {1,2,3};
var q = db.Customers.Where(c => 
    list[0] == c.Id ||
    list[1] == c.Id ||
    list[2] == c.Id );

This should produce a parameterized query that can be cached. If the list varies in size from query to query, this should produce a different cache entry for each list size. Alternatively you could use a list with a fixed size and pass some sentinel value that you know will never match the value argument, e.g. 0, -1, or alternatively just repeat one of the other values. In order to produce such predicate expression programmatically at runtime based on a list, you might want to consider using something like PredicateBuilder.

Potential fixes and their challenges

On one hand, changes necessary to support caching of this kind of query using CompiledQuery explicitly would be pretty complex in the current version of EF. The key reason is that the elements in the IEnumerable<T> passed to the Enumerable.Contains method would have to translate into a structural part of the query for the particular translation we produce, e.g.:

var list = new [] {1,2,3};
var q = db.Customers.Where(c => list.Contains(c.Id)).ToList();

The enumerable “list” looks like a simple variable in C#/LINQ but it needs to be translated to a query like this (simplified for clarity):

SELECT * FROM Customers WHERE Id IN(1,2,3)

If list changes to new [] {5,4,3,2,1}, and we would have to generate the SQL query again!

SELECT * FROM Customers WHERE Id IN(5,4,3,2,1)

As a potential solution, we have talked about leaving generated SQL queries open with some kind of special place holder, e.g. store in the query cache that just says

SELECT * FROM Customers WHERE Id IN(<place holder>)

At execution time, we could pick this SQL from the cache and finish the SQL generation with the actual values. Another option would be to leverage a Table-Valued Parameter for the list if the target database can support it. The first option would probably work ok only with constant values, the latter requires a database that supports a special feature. Both are very complex to implement in EF.

Auto compiled queries

On the other hand, for automatic compiled queries (as opposed to explicit CompiledQuery) the issue becomes somewhat artificial: in this case we compute the query cache key after the initial LINQ translation, hence any IEnumerable<T> argument passed should have already been expanded into DbExpression nodes: a tree of OR-ed equality comparisons in EF5, and usually a single DbInExpression node in EF6. Since the query tree already contains a distinct expression for each distinct combination of elements in the source argument of Enumerable.Contains (and therefore for each distinct output SQL query), it is possible to cache the queries.

However even in EF6 these queries are not cached even in the auto compiled queries case. The key reason for that is that we expect the variability of elements in a list to be high (this has to do with the variable size of the list but is also exacerbated by the fact that we normally don’t parameterize values that appear as constants to the query, so a list of constants will be translated into constant literals in SQL), so with enough calls to a query with Enumerable.Contains you could produce considerable cache pollution.

We have considered alternative solutions to this as well, but we haven’t implemented any yet. So my conclusion is that you would be better off with the second workaround in most cases if as I said, you know the number of elements in the list will remain small and manageable (otherwise you will face performance issues).

Hope this helps!

Answered By – divega

Answer Checked By – Gilberto Lyons (BugsFixing Admin)

Leave a Reply

Your email address will not be published. Required fields are marked *