I am interested in calculating statistics in rolling windows on large, 1D numpy arrays. For small window sizes, using numpy strides (a la
numpy.lib.stride_tricks.sliding_window_view) is faster than pandas rolling window implementation, but the opposite is true for large window sizes.
Consider the following:
import numpy as np from numpy.lib.stride_tricks import sliding_window_view import pandas as pd data = np.random.randn(10**6) data_pandas = pd.Series(data) window = 2 %timeit np.mean(sliding_window_view(data, window), axis=1) # 19.3 ms ± 255 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) %timeit data_pandas.rolling(window).mean() # 34.3 ms ± 688 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) window = 1000 %timeit np.mean(sliding_window_view(data, window), axis=1) # 302 ms ± 8.01 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) %timeit data_pandas.rolling(window).mean() # 31.7 ms ± 958 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) result_numpy = np.mean(sliding_window_view(data, window), axis=1) result_pandas = data_pandas.rolling(window).mean()[window-1:] np.allclose(result_numpy, result_pandas) # True
The pandas implementation is actually faster for a larger window size, whereas the numpy implementation is much slower.
What is going on under the hood with pandas, and how can we get similar performance using numpy?
How can I get similar performance on large windows in numpy as compared to pandas?
TL;DR: The two versions use very different algorithms.
sliding_window_view trick is good to solve the rolling average problem with a small window but this is not a clean way to do that nor an efficient way, especially with a big window. Indeed, Numpy compute a mean and note a rolling average and thus have no clear information that the user is cheating with stride to compute something else. The provided Numpy implementation runs in
O(n * w) where
n is the array size and
w the window size. Pandas does have the information that a rolling average needs to be computed and so it uses a much more efficient algorithm. The Pandas algorithm runs in
O(n) time. For more information about it please read this post.
Here is a much faster Numpy implementation:
cumsum = np.cumsum(data) invSize = 1. / window (cumsum[window-1:] - np.concatenate([, cumsum[:-window]])) * invSize
Here are the performance results on my machine:
Naive Numpy version: 193.2 ms Pandas version: 33.1 ms Fast Numpy version: 8.5 ms
Answered By – Jérôme Richard
Answer Checked By – Pedro (BugsFixing Volunteer)