## Issue

Given the following `numpy` arrays:

```
import numpy as np
import time
v1 = np.linspace(20, 250, 100000000)
a = np.array([12.592,16.320])
m = np.array([3, 5])
```

How is it possible that this list comprehension:

```
start = time.time()
v2 = np.max(
    [10 ** _a * np.power(v1.astype(float), -_m) for _m, _a in zip(m, a)],
    axis=0
)
end = time.time()
print(end - start)  # prints 5.822041034698486
```

is more than twice as fast as `numpy`'s broadcasting?

```
start = time.time()
v2 = np.max(
    np.power(10, a) * np.power(v1.astype(float)[:, None], -m).reshape(
        v1.shape[0], -1),
    axis=1
)
end = time.time()
print(end - start)  # prints 12.292157173156738
```

And what is the fastest approach to calculating `v2`?

## Solution

**Both are inefficient**. Indeed, you can rewrite the operation much more efficiently using **mathematical properties** of the natural logarithm and the exponential: `10**a * v**(-m)` is equal to `exp(a*ln(10) - m*ln(v))`. Here is an equivalent (unoptimized) expression that can be further optimized:

```
v2 = np.max(
    [np.exp(_a * np.log(10) - _m * np.log(v1.astype(float))) for _m, _a in zip(m, a)],
    axis=0
)
```
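
As a quick sanity check, here is a minimal sketch (on a deliberately small array so it runs instantly; the name `v_small` is only illustrative) showing that the log/exp rewrite produces the same values as the original expression, up to floating-point rounding:

```
# Verify numerically that 10**a * v**(-m) == exp(a*ln(10) - m*ln(v)).
v_small = np.linspace(20, 250, 1000)
ref = np.max([10 ** _a * np.power(v_small, -_m) for _m, _a in zip(m, a)], axis=0)
new = np.max([np.exp(_a * np.log(10) - _m * np.log(v_small)) for _m, _a in zip(m, a)], axis=0)
assert np.allclose(ref, new)
```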

Since `np.log(v1.astype(float))` is a constant, you can **pre-compute** it. Actually, `v1` items are already of type `float` (note that `float` means `np.float64` for Numpy and not the CPython `float` object type), so `np.log(v1)` will do the job correctly (unless `v1` is set to an array of `np.float32`, for example). Additionally, `np.exp` can be computed only on the final result (i.e. after computing `np.max`), since `a < b` is equivalent to `e**a < e**b`: the exponential is strictly increasing. Finally, you can use **in-place** operations to avoid creating several expensive **temporary arrays**. This can be done using the `out` parameter of many functions like `np.subtract` or `np.multiply`, for example.
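
Here is a tiny sketch (with a made-up 2x3 array) illustrating both points: the exponential is strictly increasing, so `np.exp` commutes with `np.max`, and the `out` parameter lets a ufunc write its result into an existing buffer instead of allocating a new one:

```
x = np.array([[0.5, 2.0, -1.0], [1.5, 0.0, 3.0]])

# exp(max(x)) == max(exp(x)) because exp is strictly increasing.
assert np.allclose(np.exp(np.max(x, axis=0)), np.max(np.exp(x), axis=0))

# In-place style: both results are written into `buf`, no new temporaries.
buf = np.empty(3)
np.multiply(x[0], 2.0, out=buf)
np.subtract(1.0, buf, out=buf)
```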

Here is the resulting code:

```
log_v1 = np.log(v1)  # pre-computed once: it does not depend on m or a
tmp = np.empty((len(a), v1.size), dtype=np.float64)  # reusable buffers
v2 = np.exp(np.max(
    [np.subtract(_a * np.log(10), np.multiply(log_v1, _m, out=_tmp), out=_tmp)
     for _m, _a, _tmp in zip(m, a, tmp)],
    axis=0,
    out=tmp[0]
), out=tmp[0])
```

For even faster performance, you can simply use **Numba** to avoid writing huge arrays to memory (which is slow). Numba can also use **multiple threads** to speed this up a lot. Here is an example:

```
import numba as nb

# Note that fastmath=True can cause issues with values like NaN and Inf.
# Please disable it if your input array contains such special values.
@nb.njit('float64[::1](float64[::1], float64[::1], float64[::1])', fastmath=True, parallel=True)
def compute(v1, m, a):
    assert a.size > 0 and a.size == m.size
    out = np.empty(v1.size, dtype=np.float64)
    log10 = np.log(10)
    for i in nb.prange(v1.size):
        log_v1 = np.log(v1[i])
        maxi = a[0] * log10 - m[0] * log_v1
        for j in range(1, len(a)):
            value = a[j] * log10 - m[j] * log_v1
            if value > maxi:
                maxi = value
        out[i] = np.exp(maxi)
    return out

v2 = compute(v1, m.astype(float), a)
```
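
To time it the same way as the snippets in the question, you can reuse the `time.time` pattern. Note that because an explicit signature string is passed to `nb.njit`, Numba compiles `compute` eagerly when the decorator runs, so the timed call below does not include compilation time:

```
start = time.time()
v2 = compute(v1, m.astype(float), a)
end = time.time()
print(end - start)  # timings vary with hardware and core count
```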

## Benchmark

Here are results on my 6-core machine:

```
Initial code with big arrays:  10.013 s  (inefficient)
Initial code with lists:        8.147 s  (inefficient)
Optimized Numpy code:           2.009 s  (memory-bound)
Optimized Numba code:           0.300 s  (compute-bound)
```

As you can see, **Numba is much faster** than the other methods: about **30 times faster** than the initial code, and nearly 7 times faster than the optimized Numpy version.

Answered By – Jérôme Richard

Answer Checked By – Clifford M. (BugsFixing Volunteer)