[SOLVED] How to map function directly over list of lists?

Issue

I have built a pixel classifier for images, and for each pixel in the image, I want to define to which pre-defined color cluster it belongs. It works, but at some 5 minutes per image, I think I am doing something unpythonic that can for sure be optimized.

How can we map the function directly over the list of lists?

#First I convert my image to a list
#Below list represents a true image size
list1=[[255, 114, 70],
[120, 89, 15],
[247, 190, 6],
[41, 38, 37],
[102, 102, 10],
[255,255,255]]*3583180

Then we define the clusters to map the colors to and the function to do so (which is taken from the PIL library)

#Define colors of interest
#Colors of interest 
RED=[255, 114, 70]
DARK_YELLOW=[120, 89, 15]
LIGHT_YELLOW=[247, 190, 6]
BLACK=[41, 38, 37]
GREY=[102, 102, 10]
WHITE=[255,255,255]

Colors=[RED, DARK_YELLOW, LIGHT_YELLOW, GREY, BLACK, WHITE]

#Function to find closes cluster by root and squareroot distance of RGB
def distance(c1, c2):
    (r1,g1,b1) = c1
    (r2,g2,b2) = c2
    return math.sqrt((r1 - r2)**2 + (g1 - g2) ** 2 + (b1 - b2) **2)

What remains is to match every color, and make a new list with matched indexes from the original Colors:

Filt_lab=[]

#Match colors and make new list with indexed colors
for pixel in tqdm(list1):
    closest_colors = sorted(Colors, key=lambda color: distance(color, pixel))
    closest_color = closest_colors[0]

    for num, clust in enumerate(Colors):
        if list(clust) == list(closest_color):
            Filt_lab.append(num)

Running a single image takes approximately 5 minutes, which is OK, but likely there is a method in which this time can be greatly reduced?

36%|███▌ | 7691707/21499080 [01:50<03:18, 69721.86it/s]

Expected outcome of Filt_lab:

[0, 1, 2, 4, 3, 5]*3583180

Solution

You can use the Numba’s JIT to speed up the code by a large margin. The idea is to build classified_pixels on the fly by iterating over the colours for each pixel. The colours are stored in a Numpy array where the index is the colour key. The whole computation can run in parallel. This avoid many temporary arrays to be created and written/read in memory and a lot of memory to be allocated. Moreover, the data types can be adapted so that the resulting array is smaller in memory (so written/read faster). Here is the final script:

import numpy as np
import numba as nb

@nb.njit('int32[:,::1](int32[:,:,::1], int32[:,::1])', parallel=True)
def classify(image, colors):
    classified_pixels = np.empty((image.shape[0], image.shape[1]), dtype=np.int32)
    for i in nb.prange(image.shape[0]):
        for j in range(image.shape[1]):
            minId = -1
            minValue = 256*256 # The initial value is the maximum possible value
            ir, ig, ib = image[i, j]
            # Find the color index with the minimum difference
            for k in range(len(colors)):
                cr, cg, cb = colors[k]
                total = (ir-cr)**2 + (ig-cg)**2 + (ib-cb)**2
                if total < minValue:
                    minValue = total
                    minId = k
            classified_pixels[i, j] = minId
    return classified_pixels

# Representative image
np.random.seed(42)
imarray = np.random.rand(3650,2000,3) * 255
image = imarray.astype(np.int32)

# Colors of interest
RED = [255, 0, 0]
DARK_YELLOW = [120, 89, 15]
LIGHT_YELLOW = [247, 190, 6]
BLACK = [41, 38, 37]
GREY = [102, 102, 10]
WHITE = [255, 255, 255]

# Build a Numpy array rather than a dict
colors = np.array([RED, DARK_YELLOW, LIGHT_YELLOW, GREY, BLACK, WHITE], dtype=np.int32)

# Actual classification
classified_pixels = classify(image, colors)

# Convert array to list
cl_pixel_list = classified_pixels.reshape(classified_pixels.shape[0] * classified_pixels.shape[1]).tolist()

# Print
print(cl_pixel_list[0:10])

This implementation takes about 0.19 second on my 6-core machine. It is about 15 times faster than the last provided answer so far and more than thousand times faster than the initial implementation. Note that about half the time is spent in tolist() since classify function is very fast.

Answered By – Jérôme Richard

Answer Checked By – Jay B. (BugsFixing Admin)

Leave a Reply

Your email address will not be published. Required fields are marked *