# [SOLVED] Speeding up group-wise differencing in Pandas

## Issue

Consider the following solution to computing a within-group diff in Pandas:

``````df =  df.set_index(['ticker', 'date']).sort_index()[['value']]
df['diff'] = np.nan
idx = pd.IndexSlice

for ix in df.index.levels:
df.loc[ idx[ix,:], 'diff'] = df.loc[idx[ix,:], 'value' ].diff()
``````

For:

``````> df
date ticker  value
0    63      C   1.65
1    88      C  -1.93
2    22      C  -1.29
3    76      A  -0.79
4    72      B  -1.24
5    34      A  -0.23
6    92      B   2.43
7    22      A   0.55
8    32      A  -2.50
9    59      B  -1.01
``````

It returns:

``````> df
value  diff
ticker date
A      22     0.55   NaN
32    -2.50 -3.05
34    -0.23  2.27
76    -0.79 -0.56
B      59    -1.01   NaN
72    -1.24 -0.23
92     2.43  3.67
C      22    -1.29   NaN
63     1.65  2.94
88    -1.93 -3.58
``````

The solution does not scale well for large dataframes. It takes minutes for a dataframe with a shape `(405344,2)`. This is presumably the case because I am iterating through each value for the first level in the main loop.

Is there any way of speeding this up in Pandas? Is looping through index values a good way of solving this problem? Could `numba` perhaps be used for this?

## Solution

Here’s another way, which ought to be a lot faster.

First, sort based on ticker and date:

``````In : df = df.set_index(['ticker', 'date']).sort_index()

In : df
Out:
value
ticker date
A      22     0.55
32    -2.50
34    -0.23
76    -0.79
B      59    -1.01
72    -1.24
92     2.43
C      22    -1.29
63     1.65
88    -1.93
``````

``````In : df['diff'] = df['value'].diff()
``````

To fill in the NaNs, we can find the first line as follows (there may be a nicer way):

``````In : s = pd.Series(df.index.labels)

In : s != s.shift()
Out:
0     True
1    False
2    False
3    False
4     True
5    False
6    False
7     True
8    False
9    False
dtype: bool

In : df.loc[(s != s.shift()).values 'diff'] = np.nan

In : df
Out:
value  diff
ticker date
A      22     0.55   NaN
32    -2.50 -3.05
34    -0.23  2.27
76    -0.79 -0.56
B      59    -1.01   NaN
72    -1.24 -0.23
92     2.43  3.67
C      22    -1.29   NaN
63     1.65  2.94
88    -1.93 -3.58
``````