Table of Contents

## Issue

Is there a faster way to do the following list comprehension?

```
ret = [
[
subList
for subList in lst
if set(subList) not in listOfSets
]
for lst in listOfLists
]
```

**Restrictions**

`listOfSets`

: None`listOfLists`

: Must maintain the ordering of the subsublists during construction, but not necessarily the ordering of the sublists. I.e:

```
[[[1, 2, 3], [4, 5, 6]], [[7, 8, 9]]] != [[[1, 2, 3], [6, 5, 4]], [7, 8, 9]]
```

but

```
[[[1, 2, 3], [4, 5, 6]], [[7, 8, 9]]] = [[[7, 8, 9]], [[1, 2, 3], [4, 5, 6]]]
```

`ret`

:`ret`

must maintain the same ordering of the original`listOfLists`

as detailed above.

My code generates the following list of lists. Each list contains equally sized sublists but the number of sublists van vary. I.e.:

```
listOfLists = [[[1, 2, 3], [4, 6, 5], [9, 8, 7]], [[11, 12, 13]], ...]
```

I need to filter this list of lists to remove all sublists that do not exist in a list of sets:

```
listOfSets = [{1, 2, 3}, {20, 30, 15}, {6, 7, 8}, ...]
ret = [
[
subList
for subList in lst
if set(subList) not in listOfSets
]
for lst in listOfLists
]
ret = [[[4, 6, 5], [9, 8, 7]], [[11, 12, 13]], ...]
```

Note the missing `[1, 2, 3]`

in ret.

I have tried variations of the following

```
ret = [
[
subList
for subList in lst
if not(set(subList) in listOfSets)
] for lst in listOfLists
]
```

with the idea being that `not(set(subList) in listOfSets)`

will return faster since it need only find a single match, but to no avail:

```
%timeit ret = [
[
subList
for subList in lst
if set(subList) not in listOfSets
]
for lst in listOfLists
]
772 µs ± 26.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

```
%timeit ret = [
[
subList
for subList in lst
if not(set(subList) in listOfSets)
] for lst in listOfLists
]
797 µs ± 38.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

## Solution

## Original Answer

I will not extend so much, given that the other answer is already very complete, but I got even better performance by using the difference between sets.

Let:

```
>>> set_of_tuples = {(1, 2, 3), (9, 8, 7), (11, 12, 13), (6, 7, 8)}
>>> list_of_lists_of_tuples = [[(1, 2, 3), (4, 6, 5), (9, 8, 7)], [(11, 12, 13)]]
```

For comparison, here is JJ Hassan’s second example run on my machine (plus, note that I included the `not in`

that was in the original question):

```
>>> filtered_list_of_lists_of_tuples = [[sl for sl in l if sl in set_of_tuples] for l in list_of_lists_of_tuples]
>>> filtered_list_of_lists_of_tuples
[[(1, 2, 3), (9, 8, 7)], [(11, 12, 13)]]
>>> %timeit -n1000000 -r7 filtered_list_of_lists_of_tuples = [[sl for sl in l if sl not in set_of_tuples] for l in list_of_lists_of_tuples]
930 ns ± 146 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```

Now, using the difference between sets:

```
>>> filtered_list_of_sets_of_tuples = [set(l) - set_of_tuples for l in list_of_lists_of_tuples]
>>> filtered_list_of_sets_of_tuples
[{(4, 6, 5)}, set()]
>>> %timeit -n1000000 -r7 filtered_list_of_sets_of_tuples = [set(l) - set_of_tuples for l in list_of_lists_of_tuples]
864 ns ± 63.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```

Try this option as well, as I believe that the difference in speed may become more significant with bigger lists.

The idea behind this is that you have a **superlist**, which is a list of **lists**, each containing a **sublist**, or in this case a `tuple`

. But, as per your requirements, the intermediate **lists** do not need to preserve order (only the **superlist** and the **sublists**), and we want to take those elements that are not found in `set_of_tuples`

. Consequently, the intermediate **lists** can be seen as `set`

s, and the operation of taking the elements that do not belong to `set_of_tuples`

is trivially the difference between sets.

## Edit

I just came up with a slightly faster solution by using `functools`

and `itertools`

. This new solution, however, is better only when we enough data.

Let us start with the previous solution:

```
filtered_list_of_sets_of_tuples = [set(l) - set_of_tuples for l in list_of_lists_of_tuples]
```

Now, by a simple application of `map`

, this becomes:

```
filtered_list_of_sets_of_tuples = [s - set_of_tuples for s in map(set, list_of_lists_of_tuples)]
```

Then we can use `operator.sub`

to rewrite this as:

```
from operator import sub
filtered_list_of_sets_of_tuples = [sub(s, set_of_tuples) for s in map(set, list_of_lists_of_tuples)]
```

or, using a plain `list`

:

```
from operator import sub
filtered_list_of_sets_of_tuples = list(sub(s, set_of_tuples) for s in map(set, list_of_lists_of_tuples))
```

Finally, we use `map`

once more, this time bringing `itertools.repeat`

to the game:

```
from itertools import repeat
from operator import sub
filtered_list_of_sets_of_tuples = list(map(sub, map(set, list_of_lists_of_tuples), repeat(set_of_tuples)))
```

This new method is actually the slowest given small lists:

```
>>> set_of_tuples = {(1, 2, 3), (9, 8, 7), (11, 12, 13), (6, 7, 8)}
>>> list_of_lists_of_tuples = [[(1, 2, 3), (4, 6, 5), (9, 8, 7)], [(11, 12, 13)]]
>>> %timeit -n1000000 -r20 filtered_list_of_lists_of_tuples = [[sl for sl in l if sl not in set_of_tuples] for l in list_of_lists_of_tuples]
903 ns ± 168 ns per loop (mean ± std. dev. of 20 runs, 1000000 loops each)
>>> %timeit -n1000000 -r20 filtered_list_of_sets_of_tuples = [set(l) - set_of_tuples for l in list_of_lists_of_tuples]
789 ns ± 70 ns per loop (mean ± std. dev. of 20 runs, 1000000 loops each)
>>> %timeit -n1000000 -r20 filtered_list_of_sets_of_tuples = list(map(sub, map(set, list_of_lists_of_tuples), repeat(set_of_tuples)))
1.28 µs ± 299 ns per loop (mean ± std. dev. of 20 runs, 1000000 loops each)
```

But now let us define bigger lists. I used approximately the sizes you mentioned in a comment:

```
>>> from random import randint
>>> list_of_lists_of_tuples = [[(1, 2, 3), (4, 6, 5), (9, 8, 7)], [(11, 12, 13)]] * 100
>>> set_of_tuples = {(randint(0, 100), randint(0, 100), randint(0, 100)) for _ in range(2680)}
```

Using this new data, here are the results I got on my machine:

```
>>> %timeit -n10000 -r20 filtered_list_of_lists_of_tuples = [[sl for sl in l if sl not in set_of_tuples] for l in list_of_lists_of_tuples]
65 µs ± 7.05 µs per loop (mean ± std. dev. of 20 runs, 10000 loops each)
>>> %timeit -n10000 -r20 filtered_list_of_sets_of_tuples = [set(l) - set_of_tuples for l in list_of_lists_of_tuples]
58.1 µs ± 6.67 µs per loop (mean ± std. dev. of 20 runs, 10000 loops each)
>>> %timeit -n10000 -r20 filtered_list_of_sets_of_tuples = list(map(sub, map(set, list_of_lists_of_tuples), repeat(set_of_tuples)))
54.1 µs ± 5.34 µs per loop (mean ± std. dev. of 20 runs, 10000 loops each)
```

Answered By – ruancomelli

Answer Checked By – Willingham (BugsFixing Volunteer)