# [SOLVED] Fastest way to compare list of lists against list of sets

## Issue

Is there a faster way to do the following list comprehension?

``````ret = [
[
subList
for subList in lst
if set(subList) not in listOfSets
]
for lst in listOfLists
]
``````

Restrictions

• `listOfSets`: None
• `listOfLists`: Must maintain the ordering of the subsublists during construction, but not necessarily the ordering of the sublists. I.e:
``````[[[1, 2, 3], [4, 5, 6]], [[7, 8, 9]]] != [[[1, 2, 3], [6, 5, 4]], [7, 8, 9]]
``````

but

``````[[[1, 2, 3], [4, 5, 6]], [[7, 8, 9]]] = [[[7, 8, 9]], [[1, 2, 3], [4, 5, 6]]]
``````
• `ret`: `ret` must maintain the same ordering of the original `listOfLists` as detailed above.

My code generates the following list of lists. Each list contains equally sized sublists but the number of sublists van vary. I.e.:

``````listOfLists = [[[1, 2, 3], [4, 6, 5], [9, 8, 7]], [[11, 12, 13]], ...]
``````

I need to filter this list of lists to remove all sublists that do not exist in a list of sets:

``````listOfSets = [{1, 2, 3}, {20, 30, 15}, {6, 7, 8}, ...]

ret = [
[
subList
for subList in lst
if set(subList) not in listOfSets
]
for lst in listOfLists
]

ret = [[[4, 6, 5], [9, 8, 7]], [[11, 12, 13]], ...]
``````

Note the missing `[1, 2, 3]` in ret.

I have tried variations of the following

``````ret = [
[
subList
for subList in lst
if not(set(subList) in listOfSets)
] for lst in listOfLists
]
``````

with the idea being that `not(set(subList) in listOfSets)` will return faster since it need only find a single match, but to no avail:

``````%timeit ret = [
[
subList
for subList in lst
if set(subList) not in listOfSets
]
for lst in listOfLists
]
772 µs ± 26.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
``````
``````%timeit ret = [
[
subList
for subList in lst
if not(set(subList) in listOfSets)
] for lst in listOfLists
]
797 µs ± 38.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
``````

## Solution

I will not extend so much, given that the other answer is already very complete, but I got even better performance by using the difference between sets.

Let:

``````>>> set_of_tuples = {(1, 2, 3), (9, 8, 7), (11, 12, 13), (6, 7, 8)}
>>> list_of_lists_of_tuples = [[(1, 2, 3), (4, 6, 5), (9, 8, 7)], [(11, 12, 13)]]
``````

For comparison, here is JJ Hassan’s second example run on my machine (plus, note that I included the `not in` that was in the original question):

``````>>> filtered_list_of_lists_of_tuples = [[sl for sl in l if sl in set_of_tuples] for l in list_of_lists_of_tuples]
>>> filtered_list_of_lists_of_tuples
[[(1, 2, 3), (9, 8, 7)], [(11, 12, 13)]]
>>> %timeit -n1000000 -r7 filtered_list_of_lists_of_tuples = [[sl for sl in l if sl not in set_of_tuples] for l in list_of_lists_of_tuples]
930 ns ± 146 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
``````

Now, using the difference between sets:

``````>>> filtered_list_of_sets_of_tuples = [set(l) - set_of_tuples for l in list_of_lists_of_tuples]
>>> filtered_list_of_sets_of_tuples
[{(4, 6, 5)}, set()]
>>> %timeit -n1000000 -r7 filtered_list_of_sets_of_tuples = [set(l) - set_of_tuples for l in list_of_lists_of_tuples]
864 ns ± 63.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
``````

Try this option as well, as I believe that the difference in speed may become more significant with bigger lists.

The idea behind this is that you have a superlist, which is a list of lists, each containing a sublist, or in this case a `tuple`. But, as per your requirements, the intermediate lists do not need to preserve order (only the superlist and the sublists), and we want to take those elements that are not found in `set_of_tuples`. Consequently, the intermediate lists can be seen as `set`s, and the operation of taking the elements that do not belong to `set_of_tuples` is trivially the difference between sets.

## Edit

I just came up with a slightly faster solution by using `functools` and `itertools`. This new solution, however, is better only when we enough data.

``````filtered_list_of_sets_of_tuples = [set(l) - set_of_tuples for l in list_of_lists_of_tuples]
``````

Now, by a simple application of `map`, this becomes:

``````filtered_list_of_sets_of_tuples = [s - set_of_tuples for s in map(set, list_of_lists_of_tuples)]
``````

Then we can use `operator.sub` to rewrite this as:

``````from operator import sub
filtered_list_of_sets_of_tuples = [sub(s, set_of_tuples) for s in map(set, list_of_lists_of_tuples)]
``````

or, using a plain `list`:

``````from operator import sub
filtered_list_of_sets_of_tuples = list(sub(s, set_of_tuples) for s in map(set, list_of_lists_of_tuples))
``````

Finally, we use `map` once more, this time bringing `itertools.repeat` to the game:

``````from itertools import repeat
from operator import sub

filtered_list_of_sets_of_tuples = list(map(sub, map(set, list_of_lists_of_tuples), repeat(set_of_tuples)))
``````

This new method is actually the slowest given small lists:

``````>>> set_of_tuples = {(1, 2, 3), (9, 8, 7), (11, 12, 13), (6, 7, 8)}
>>> list_of_lists_of_tuples = [[(1, 2, 3), (4, 6, 5), (9, 8, 7)], [(11, 12, 13)]]
>>> %timeit -n1000000 -r20 filtered_list_of_lists_of_tuples = [[sl for sl in l if sl not in set_of_tuples] for l in list_of_lists_of_tuples]
903 ns ± 168 ns per loop (mean ± std. dev. of 20 runs, 1000000 loops each)
>>> %timeit -n1000000 -r20 filtered_list_of_sets_of_tuples = [set(l) - set_of_tuples for l in list_of_lists_of_tuples]
789 ns ± 70 ns per loop (mean ± std. dev. of 20 runs, 1000000 loops each)
>>> %timeit -n1000000 -r20 filtered_list_of_sets_of_tuples = list(map(sub, map(set, list_of_lists_of_tuples), repeat(set_of_tuples)))
1.28 µs ± 299 ns per loop (mean ± std. dev. of 20 runs, 1000000 loops each)
``````

But now let us define bigger lists. I used approximately the sizes you mentioned in a comment:

``````>>> from random import randint
>>> list_of_lists_of_tuples = [[(1, 2, 3), (4, 6, 5), (9, 8, 7)], [(11, 12, 13)]] * 100
>>> set_of_tuples = {(randint(0, 100), randint(0, 100), randint(0, 100)) for _ in range(2680)}
``````

Using this new data, here are the results I got on my machine:

``````>>> %timeit -n10000 -r20 filtered_list_of_lists_of_tuples = [[sl for sl in l if sl not in set_of_tuples] for l in list_of_lists_of_tuples]
65 µs ± 7.05 µs per loop (mean ± std. dev. of 20 runs, 10000 loops each)
>>> %timeit -n10000 -r20 filtered_list_of_sets_of_tuples = [set(l) - set_of_tuples for l in list_of_lists_of_tuples]
58.1 µs ± 6.67 µs per loop (mean ± std. dev. of 20 runs, 10000 loops each)
>>> %timeit -n10000 -r20 filtered_list_of_sets_of_tuples = list(map(sub, map(set, list_of_lists_of_tuples), repeat(set_of_tuples)))
54.1 µs ± 5.34 µs per loop (mean ± std. dev. of 20 runs, 10000 loops each)
``````