[SOLVED] Nested for loop is really slow for query set traversal in Django

Issue

I have two models, machine and Performance:

class machine(models.Model):
    machine_type = models.CharField(null=True, max_length=10)
    machine_no = models.IntegerField(null=True)    
    machine_name = models.CharField(null=True,max_length=255)
    machine_sis = models.CharField(null=True, max_length=255)
    store_code = models.IntegerField(null=True)
    created = models.DateTimeField(auto_now_add=True)

class Performance(models.Model):
    machine_no = models.IntegerField(null=True)
    power = models.IntegerField(null=True)
    store_code = models.IntegerField(null=True)
    created = models.DateTimeField(auto_now_add=True)

For each machine there are multiple rows in the Performance model, and I have to count the Performance rows in the database that have power = some_integer. Here is what my view looks like:

machines = machine.objects.filter(machine_type="G",machine_sis="919")

Let’s say machines.count() is sometimes 100. For each of these machines I need to count the rows in the Performance model that have power = 100. My first attempt was the following, but it was really slow:

for obj in machines:
    print Performance.objects.filter(machine_no=obj.machine_no, power=100).count()

My second approach was faster than the first approach:

for obj in machines:
    data = Performance.objects.filter(machine_no=obj.machine_no, power=100)
    counter = 0
    for p in data: # ***** lets say this loop is called star-loop
        if p.power == 100:
            counter +=1

My problem is that this is still really slow when I have to check 100 machines against the Performance model for power = something.

I am not using a foreign key in the Performance model because the actual architecture is more complex: identifying a machine uniquely requires multiple columns of machine, so I can’t use the machine number or any other single column as a foreign key.

Also, this project is running in production, so I can’t take many chances. I am using Django 1.11, Python 2.7 and a PostgreSQL RDS instance. I have already improved network performance by renting a better instance from AWS.

Solution

You can fetch the data with two queries and do the counting and filtering on the Python side:

from collections import Counter

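# Number of Performance rows with power=100, keyed by machine_no
c = Counter(Performance.objects.filter(power=100)
            .values_list('machine_no', flat=True))

# machine_no values of the machines of interest, as a set for fast lookups
m = set(machine.objects.filter(machine_type="G", machine_sis="919")
        .values_list('machine_no', flat=True))

# Total number of matching rows across those machines
result = sum(v for k, v in c.items() if k in m)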
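
To see what the Counter gives you without touching the database, here is a minimal sketch on plain Python data. The two collections are hypothetical stand-ins for the values_list results, and count_rows is an illustrative helper name, not part of Django. It also shows that the same Counter can give the count per machine, which may be closer to what the question asks for than a single total.

```python
from collections import Counter

# Hypothetical stand-in for the machine_no values of Performance rows with power=100
performance_machine_nos = [1, 1, 2, 3, 3, 3, 7]
# Hypothetical stand-in for the machine_no values of the filtered machines
selected_machine_nos = {1, 3, 5}


def count_rows(performance_machine_nos, selected_machine_nos):
    """Return (per-machine counts, total) for the selected machines."""
    counts = Counter(performance_machine_nos)
    # A Counter returns 0 for keys it has never seen, so machines
    # without any matching row still appear with a count of 0.
    per_machine = {no: counts[no] for no in selected_machine_nos}
    return per_machine, sum(per_machine.values())


per_machine, total = count_rows(performance_machine_nos, selected_machine_nos)
print(per_machine)
print(total)
```

Whatever the number of machines, the database is queried only twice; the per-machine work is a dictionary lookup in memory.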

What if I need power = 100 and also a separate list of machines with power = 99? Do I have to use two separate Counter() calls with the query?

No, just add the filter to the same query, using a Q object, then calculate two different results, like this:

from collections import Counter
from django.db.models import Q

# Count rows per (machine_no, power) pair for both power values in one query
c = Counter(Performance.objects.filter(Q(power=100) | Q(power=99))
            .values_list('machine_no', 'power'))

m = set(machine.objects.filter(machine_type="G", machine_sis="919")
        .values_list('machine_no', flat=True))

result_100 = sum(v for k, v in c.items() if k[0] in m and k[1] == 100)
result_99 = sum(v for k, v in c.items() if k[0] in m and k[1] == 99)
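
If more power values are added later, writing one sum per value gets repetitive. A possible variation, sketched here on plain Python data, is to tally every power value in a single pass. The rows list is a hypothetical stand-in for the (machine_no, power) tuples that values_list would return, and totals_by_power is an illustrative helper name.

```python
from collections import Counter

# Hypothetical stand-in for values_list('machine_no', 'power')
rows = [(1, 100), (1, 99), (2, 100), (3, 99), (3, 99), (7, 100)]
# Hypothetical stand-in for the machine_no values of the filtered machines
selected_machine_nos = {1, 3}


def totals_by_power(rows, selected_machine_nos):
    """Count rows per power value, keeping only the selected machines."""
    totals = Counter()
    for machine_no, power in rows:
        if machine_no in selected_machine_nos:
            totals[power] += 1
    return totals


totals = totals_by_power(rows, selected_machine_nos)
print(totals[100])
print(totals[99])
```

Here totals[100] and totals[99] play the role of result_100 and result_99, and a power value that never occurs simply reads as 0.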

Answered By – Burhan Khalid

Answer Checked By – Dawn Plyler (BugsFixing Volunteer)
