[SOLVED] Question on mixed Python performance results from PC vs smartphone

Issue

I ran some Python performance comparisons on a PC and a smartphone, and the results were confusing.

PC: i7-8750H / 32GB RAM / 1TB SSD / Windows 10
Smartphone: Galaxy S10 with Termux Linux emulator on Android 11

The first test was a simple Monte Carlo simulation with the following code.

import random
import time

def monte_carlo_pi(n_samples: int):
    acc = 0
    for i in range(n_samples):
        x = random.random()
        y = random.random()
        if (x**2 + y**2) < 1.0:
            acc += 1
    return 4.0 * acc / n_samples

start_time = time.time()

print(monte_carlo_pi(10000000))
print(time.time()-start_time)

Surprisingly, it took about 5.2 s on the PC and 2.7 s on the smartphone.

The second test used pandas with some DataFrame operations.

import pandas as pd
import time

start_time = time.time()

df = pd.DataFrame(
    [ [21, 72, -67], 
      [23, 78, 62],
      [32, 74, 54],
      [52, 54, 76],
      [0, 23, 66],
      [2, 1, 2] ],
    index = [11, 22, 33, 44, 55, 66],
    columns = ['a', 'b', 'c'])

df2 = pd.DataFrame()

df2 = pd.concat([df2, df['a']], axis=1)
df2 = pd.concat([df2, df['c']], axis=1)

print(df2)
print(time.time()-start_time)

This time, the measured time was about 0.007 s on the PC and about 0.009 s on the smartphone, but the actual time from launch to finish on the smartphone was about 2 s. My guess is that the smartphone takes longer to load the large pandas package, but I am not sure.

  1. Is the ARM processor faster on simple repetitive calculations? Or does one of the processors use multiple cores while the other does not?
  2. Is the smartphone relatively slow at loading large packages, as observed above?
  3. Is there a better way to compare overall Python performance between a PC and a smartphone?

Solution

The execution time of your Python code is dominated by interpreter overhead. Calling random.random() takes a significant share of the time, given the number of iterations and how little work one iteration does. Part of this cost comes from the attribute lookup on the random module, which is repeated at every call. You can cache the function in a local variable using rand = random.random. Moreover, x**2 + y**2 is not optimized either. x*x + y*y can be used instead to speed up the computation. Additionally, you do not need a branch: a comparison yields a bool, which counts as 0 or 1 when added, so you can just use acc += (x*x + y*y) < 1.0. Here is the resulting code:

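import random

def monte_carlo_pi(n_samples: int):
    acc = 0
    rand = random.random  # cache the function to avoid a module attribute lookup per call
    for i in range(n_samples):
        x = rand()
        y = rand()
        # The comparison yields a bool (0 or 1), so no branch is needed
        acc += (x*x + y*y) < 1.0
    return 4.0 * acc / n_samples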

This code is about 3 times faster on my machine. It is still slow compared to native code. For example, the same simple code compiled with the Numba JIT is a further 12 times faster (35 times faster than the original code). The remaining slowness comes from the standard CPython interpreter itself. You can try PyPy (a JIT-based interpreter) if you want such code to run much faster without changing the code itself.
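
The speedup can be checked on your own machine with a small timing script. The sketch below is not part of the original answer: it uses the standard timeit module, the names pi_original, pi_optimized and best_time are illustrative, and the sample count is an arbitrary small value chosen only to keep the run short.

```python
import random
import timeit


def pi_original(n_samples: int) -> float:
    """Version from the question: module lookup per call, ** and a branch."""
    acc = 0
    for _ in range(n_samples):
        x = random.random()
        y = random.random()
        if (x**2 + y**2) < 1.0:
            acc += 1
    return 4.0 * acc / n_samples


def pi_optimized(n_samples: int) -> float:
    """Version from the answer: cached function, x*x and no branch."""
    acc = 0
    rand = random.random
    for _ in range(n_samples):
        x = rand()
        y = rand()
        acc += (x*x + y*y) < 1.0
    return 4.0 * acc / n_samples


def best_time(func, n_samples: int, repeats: int = 5) -> float:
    """Return the fastest of `repeats` single runs, in seconds."""
    return min(timeit.repeat(lambda: func(n_samples), number=1, repeat=repeats))


if __name__ == "__main__":
    n = 200_000  # assumption: small sample count so the script finishes quickly
    t_orig = best_time(pi_original, n)
    t_opt = best_time(pi_optimized, n)
    print(f"original:  {t_orig:.4f} s")
    print(f"optimized: {t_opt:.4f} s")
    print(f"speedup:   {t_orig / t_opt:.2f}x")
```

Taking the minimum of several runs reduces the influence of background activity. The ratio you get will depend on the Python version and the platform.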

Finally, note that the NumPy package is generally used to make this kind of code efficient.
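
As an illustration of the NumPy approach, here is a minimal vectorized sketch. It is not from the original answer: the function name monte_carlo_pi_numpy and the chunk_size and seed parameters are assumptions made for this example, and the chunking is only there to keep memory use bounded.

```python
import numpy as np


def monte_carlo_pi_numpy(n_samples: int, chunk_size: int = 1_000_000, seed=None) -> float:
    """Estimate pi by drawing points in chunks and counting those inside the unit circle."""
    rng = np.random.default_rng(seed)
    inside = 0
    remaining = n_samples
    while remaining > 0:
        m = min(chunk_size, remaining)
        x = rng.random(m)  # m uniform samples in [0, 1)
        y = rng.random(m)
        # The boolean array is counted directly, with no Python-level loop per sample
        inside += int(np.count_nonzero(x*x + y*y < 1.0))
        remaining -= m
    return 4.0 * inside / n_samples


if __name__ == "__main__":
    print(monte_carlo_pi_numpy(10_000_000))
```

The per-sample work happens inside NumPy's compiled routines, so the interpreter overhead is paid once per chunk rather than once per sample.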

Is the ARM processor faster on simple repetitive calculations? Or does one of the processors use multiple cores while the other does not?

Neither. The speed of the interpreter can differ because of the way it is compiled on the two platforms. Moreover, the operating system can play a significant role in the performance of the code (typically through the speed of allocations). In your case, one run is on Windows while the other is on Android (Linux-like). AFAIK, object allocations are significantly slower on Windows.

Is the smartphone relatively slow at loading large packages, as observed above?

This depends heavily on your hardware, especially the storage device, the RAM and the CPU. The file system also plays a significant role, as does the OS itself. The best way to find out why one platform is slower than another is simply to use a profiler.
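
Before reaching for a full profiler, the load time can be separated from the rest by timing a fresh interpreter that only performs the import. The sketch below is an assumption-laden example, not part of the original answer: the helper names are made up, and json is used as a stand-in module so that it runs anywhere; replace it with pandas on a machine where pandas is installed.

```python
import subprocess
import sys
import time


def time_startup_only() -> float:
    """Time a fresh interpreter that does nothing (seconds)."""
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", "pass"], check=True)
    return time.perf_counter() - start


def time_fresh_import(module_name: str) -> float:
    """Time a fresh interpreter that only imports `module_name` (seconds)."""
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", f"import {module_name}"], check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    module = "json"  # stand-in; use "pandas" where it is installed
    base = time_startup_only()
    total = time_fresh_import(module)
    print(f"interpreter startup:       {base:.3f} s")
    print(f"startup + import {module}: {total:.3f} s")
    print(f"approximate import cost:   {total - base:.3f} s")
```

Each call starts a new process, so the import is not served from the current interpreter's module cache. For a per-module breakdown, CPython 3.7 and later also accept the -X importtime option, for example python -X importtime -c "import pandas".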

Is there a better way to compare overall Python performance between a PC and a smartphone?

There are several benchmarks to evaluate the performance of Python implementations on several platforms. One of them is this one. Benchmarking is quite hard (especially micro-benchmarking) because you can very easily draw the wrong conclusions due to a missing factor. Think about what you really want to measure and then reduce every other source of noise as much as possible, because comparing several factors at once is a major source of benchmarking mistakes.

For example, use the same OS on the two devices (e.g. Android, or at least Linux-based systems with the same kernel version and a similar configuration/flags). You should use the exact same version of CPython. You should also compile CPython yourself using the same compiler with the same flags (or possibly use equivalent pre-compiled binaries). The versions of libraries/packages often matter too. This is especially true for the libc (standard C library) and GMP (a library to efficiently compute with big numbers). You should also use a partition with the same file system if you want to measure things related to it. I am probably missing many important points.

Sometimes one code can be much slower than another just because it is not aligned the same way in memory (and the alignment policy changes depending on the target platform)! Even the size of the code matters a lot. This is why you should use a profiler (e.g. perf on Linux or possibly VTune on Windows) to check why the same code is slower or faster in a different context.
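
One practical way to keep track of the factors listed above is to record the environment next to every measurement and to repeat each measurement several times. The following is a minimal sketch under stated assumptions: describe_environment, measure and workload are hypothetical names, and workload is a placeholder for whatever code you actually want to compare.

```python
import platform
import statistics
import timeit


def describe_environment() -> dict:
    """Collect interpreter and platform details that can affect timings."""
    return {
        "implementation": platform.python_implementation(),
        "python_version": platform.python_version(),
        "compiler": platform.python_compiler(),
        "machine": platform.machine(),
        "system": platform.system(),
        "release": platform.release(),
    }


def measure(func, repeats: int = 7) -> dict:
    """Run `func` `repeats` times and summarize the wall-clock times in seconds."""
    times = timeit.repeat(func, number=1, repeat=repeats)
    return {
        "min": min(times),
        "median": statistics.median(times),
        "max": max(times),
    }


def workload() -> int:
    """Placeholder workload; replace with the code under test."""
    return sum(i * i for i in range(100_000))


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
    for key, value in measure(workload).items():
        print(f"{key}: {value:.5f} s")
```

A large gap between the minimum and the maximum suggests noise from other processes or from frequency scaling. Differences in the environment section show which factors are not yet equal between the two devices.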

Answered By – Jérôme Richard

Answer Checked By – Dawn Plyler (BugsFixing Volunteer)
