[SOLVED] More Efficient Way to find the Second Largest Item in a List in Python

Issue

I wrote this simple code for a simple task of finding the second largest item in a list of integers:

def second_largest(input_list):
  input_list.sort()
  return input_list[-2]

However, for large lists this function can be really uneffective, for example a running time of more than 1.5 seconds for one million items.

I understand that it’s due to the fact that the function changes the very list itself (with the .sort method) which can be very inefficient for a long list. How can I perform this task without having to use inefficient methods that change the list?

Thank you all in advnace.

Solution

I’ve found that a solution based on np.argpartition is the fastest. It does require Numpy though, and it doesn’t take into account the conversion of your list to a numpy array. But, in the end, if you wish to do some further number crunching down the line you might want to use numpy arrays since operations on those are typically much faster than operations on list. So I’m assuming this is the case for you.


First you should convert your list to an array:

import numpy as np
v = list(range(10000000))
var = np.array(v)

We can then use np.argpartition

%%time
ind = np.argpartition(var, -2)[-2]

CPU times: user 46.3 ms, sys: 25.9 ms, total: 72.2 ms
Wall time: 71.7 ms

ind = 9999998, so the result seems correct.

Now if we compare with the other suggestions here:

%%time
v.pop(v.index(max(v)))
max(v)

CPU times: user 619 ms, sys: 1.96 ms, total: 621 ms
Wall time: 623 ms

About 10 times slower

%%time
heapq.nlargest(2,v)[1]

CPU times: user 2.66 s, sys: 1.84 ms, total: 2.66 s
Wall time: 2.66 s

Well… much slower

And the last one

largest, second_largest = v[0], v[0]
for x in v:
    if x > largest:
        largest, second_largest = x, largest
    elif x > second_largest:
        second_largest = x
print(largest, second_largest) # 999999 999998

CPU times: user 1.41 s, sys: 0 ns, total: 1.41 s
Wall time: 1.41 s

Again, many times slower.


The np.argpartition solution seems the fastest if you plan to work with arrays somewhere down the line. If not, you will spend a lot of CPU time converting your list to an array that you won’t be using again (see below). It still outperforms some other solutions though.
If we take into account the conversion to an array, we get this result:

%%time
var=np.array(v)
ind = np.argpartition(var, -2)[-2]

CPU times: user 867 ms, sys: 59 ms, total: 926 ms
Wall time: 924 ms

This question explains pretty well why this solution is faster. The reason is that you only perform a partial sort using np.argpartition rather than fully sorting the list.

Answered By – lcdumort

Answer Checked By – Katrina (BugsFixing Volunteer)

Leave a Reply

Your email address will not be published. Required fields are marked *