[SOLVED] Efficient way to search for duplicates in a nested list

Issue

I have a nested list containing 514,000 sub-lists.
I have to find the duplicated sub-lists and store their positions.
I wrote working code, but it is very inefficient: it compares every pair, which means 514000 × 514000 comparisons.
Do you have a solution with better efficiency?
My nested list looks like this:

[["src_ip","src_port","dst_ip","dst_port"],["src_ip","src_port","dst_ip","dst_port"],
and so on
] 

Solution

A much faster solution is to use a set to track the items seen so far and to append the positions of duplicates to a list. Because lists are not hashable, each sub-list is converted to a tuple first. Here is the resulting code:

seen = set()  # Unique items
result = []   # List of duplicate positions
for pos, item in enumerate(lst):
    tmp = tuple(item)
    if tmp in seen:
        result.append(pos)
    else:
        seen.add(tmp)

If you also want to know where the unique item associated with each duplicate is located, you can change seen to a dict and replace seen.add(tmp) with seen[tmp] = pos. The resulting solution runs in linear time (i.e. O(n), where n is the length of the input lst). This code takes about 250 ms on my machine with a list containing 514,000 sub-lists of 4 short random strings.
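The dict variant described above can be sketched as follows; the sample data here is hypothetical (short IP/port strings standing in for the asker's real list):

```python
# Variant of the set-based answer: a dict maps each tuple to the position
# of its first occurrence, so every duplicate can be paired with the
# original it repeats. Sample data is illustrative, not the asker's.
lst = [
    ["10.0.0.1", "80", "10.0.0.2", "443"],
    ["10.0.0.3", "22", "10.0.0.4", "22"],
    ["10.0.0.1", "80", "10.0.0.2", "443"],  # duplicate of index 0
]

seen = {}        # tuple(item) -> position of first occurrence
duplicates = []  # (duplicate_position, first_position) pairs
for pos, item in enumerate(lst):
    key = tuple(item)  # lists are unhashable; tuples can be dict keys
    if key in seen:
        duplicates.append((pos, seen[key]))
    else:
        seen[key] = pos

print(duplicates)  # [(2, 0)]
```

Each sub-list is hashed once, so the whole pass stays O(n) just like the set version, while also recording which earlier entry each duplicate matches.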

Answered By – Jérôme Richard

Answer Checked By – Marie Seifert (BugsFixing Admin)
