Table of Contents
Issue
Imagine a situation where you have a sizable List which you need to search through. When would you convert it to a Dictionary<T.Identity, T> and when would you just query the list?
I am aware that querying a List is an O(n) operation, while querying a Dictionary (or Lookup/Hashset) is an O(1). However, I am entirely unsure of the efficiency of converting any O(n) collection into an O(1) collection. Isn’t the efficiency of that conversion O(n) itself? Would that mean that converting a List into a Dictionary is entirely pointless unless you query it at least three times?
While we’re at it, what’s your thought process when you’re deciding on a specific collection? What do you consider, and what do you find to be best practices?
E.g. (using my phone to write this, disregard syntax)
public class BigData
{
public int Id;
public SubBigData SubBigData;
}
//elsewhere...
public SubBigData GetDataById(int id)
{
List<BigData> data = GetDataFromSomewhere();
return data.Where(d => d.Id == id).Select(d => d.SubBigData).FirstOrDefault();
//vs
return data.ToDictionary(d => d.Id, d.SubBigData)[id];
}
Solution
If I could filter the result before I get it, that would be ideal. So, if that GetDataFromSomewhere
is query able, query your data there.
Summary
I did a test to show actual results:
On a collection of 1 000 000 records, get 1 000 records by ID
The Linq
expression in your example took 5.7
seconds.
A simplified Linq
expression took 6.6
seconds.
The Dictionary conversion and retrieve took 26
seconds.
Interestingly, if you convert the entire collection to a Dictionary and use that as your new source, it will take 2.7
seconds to get 1000 records.
So, the Dictionary conversion is quicker here. Though, I’m unsure if this is a sufficient test.
Code
using System.Diagnostics;
#region Create and populate collection
List<TestObject> listCollection = new();
for (int i = 0; i < 1000000; i++)
{
listCollection.Add(new TestObject()
{
Id = i,
Name = $"Name{i}"
});
}
#endregion Create and populate collection
Stopwatch stopwatch = Stopwatch.StartNew();
Random random = new();
#region Get using Linq
for (int i = 0; i < 1000; i++)
{
int id = random.Next(0, 2000000);
_ = listCollection.Where(c => c.Id == id).Select(c => c.Name).FirstOrDefault();
}
Console.WriteLine($"Took {stopwatch.Elapsed} to get 1000 items from list"); //Took 00:00:05.7212037
#endregion Get using Linq
#region Get using Linq (Shorter Expression, takes longer to execute)
stopwatch.Restart();
for (int i = 0; i < 1000; i++)
{
int id = random.Next(0, 2000000);
_ = listCollection.FirstOrDefault(c => c.Id == id)?.Name;
}
Console.WriteLine($"Took {stopwatch.Elapsed} to get 1000 items from list"); //Took 00:00:06.6697507
#endregion Get using Linq (Shorter Expression, takes longer to execute)
#region Get using Dictionary
stopwatch.Restart();
for (int i = 0; i < 1000; i++)
{
try
{
int id = random.Next(0, 2000000);
_ = listCollection.ToDictionary(c => c.Id, c => c.Name)[id];
}
catch {
//Ignore errors when trying to get a value from the Dictionary that does not have a matching key
}
}
Console.WriteLine($"Took {stopwatch.Elapsed} to get 1000 items from dictionary"); //Took 00:00:26.0257818
#endregion Get using Dictionary
#region Get using cached Dictionary
stopwatch.Restart();
var cachedDictionary = listCollection.ToDictionary(c => c.Id, c => c.Name);
for (int i = 0; i < 1000; i++)
{
try
{
int id = random.Next(0, 2000000);
_ = cachedDictionary[id];
}
catch
{
//Ignore errors when trying to get a value from the Dictionary that does not have a matching key
}
}
Console.WriteLine($"Took {stopwatch.Elapsed} to get 1000 items from cached dictionary"); //Took 00:00:02.7144331
#endregion Get using cached Dictionary
stopwatch.Stop();
public class TestObject
{
public int Id { get; set; }
public string Name { get; set; }
}
Answered By – Marius
Answer Checked By – Robin (BugsFixing Admin)