[SOLVED] How to compare current row with previous 2 rows based on certain conditions in Python

Issue

I have a dataset similar to below, I want to compare all rows for each inventory item based on the conditions that

  1. if "value" column is not null, then flag it "yes"

  2. Else if "value" column for all row (per each inventory) is null, then evaluate the "item" column, if either "item" column is not null, then that row should be flagged "yes". Or 1 record under "value" is null where 2 records under "value" is not null, then those 2 rows need to be evaluated based on the item

  3. In the case that if both "value" and "item" are both null(per each inventory), evaluate the "year" column and flag the most updated year "yes". Or "value" is null but 1 record under "item" is null where 2 records under "item" is not null, then evaluate the "year" column and flag the most updated year "yes"

  4. df[‘flag’].fillna(‘no’,inplace=True)

I can set up this if then else logic but I don’t know how to compare current row with 2 previous rows. The groupby with transform and custom function are great advices and I wonder how to capture all these scenarios or if there is a better way.

Below is how the simplified df would look like. In my real dataset, as described above, there are also cases that for the same inventory, "value" column are all null, but there are 2 rows under "item" are not null, in that case, "year" column needs to be evaluated for those 2 rows.

df1 = { 'inventory':['inv1','inv1','inv1','inv2','inv2','inv2','inv3','inv3','inv3'],
  'value':['xyz','','','','','','','',''],
  'item':['','304','304','','205','','','',''],
   'year':[2020,2020,2020,2020,2020,2020,2019,2018,2020]}

df1=pd.DataFrame(df1)

desired output would be like below – adding a flag column to flag ‘yes’/’no’ based on the above multiple condtions.

enter image description here

Solution

By borrowing both @Peter Leimbigler and @richardec ideas, I modified the logic as below and it works for my real work situation (which is much more complex than what I presented in the question) because there are different numbers of rows for each inventory item and different combination on "value","item" and "year".

The only down side for my solution is I am not able to use groupby, so I had to create a loop to create individual df for each inventory item and append them in a full dataframe. I hope there would be a better way than that.

df1 = df1.replace('', np.nan)

def make_flag_col(df,n):
    comment_len = len(df.loc[df['value'].notna(),:])
    jus_len = len(df.loc[df['item'].notna(),:])
    maxi = df['year'].max()
    n = df.shape[0]
    if comment_len == 1:
        df.loc[df['value'].notna(),'flag'] = 'keep'
    elif comment_len >=2 or comment_len == 0:
        if jus_len == 1:
            df.loc[df['item'].notna(),'flag'] = 'keep'
        else:
            df.loc[df['year']==maxi,'flag'] = 'keep'
    else:
        print('out of pattern, check')
        return df

l = df1['inventory'].unique().tolist()

full_df = []
for itm in l:
    df_test = df1[df1['inventory']== itm].copy()
    df_test = df_test.sort_values(by=['year']).reset_index()
    df_row_number = df_test.shape[0]

    make_flag_col(df_test,df_row_number - 1)

    for i in range(len(df_test)):
        row = df_test.iloc[i]
        whole = list(row)
        full_df.append(whole)

full_df = pd.DataFrame(full_df,columns=list(df_test.columns))
full_df['flag'].fillna('remove',inplace=True)

Answered By – user032020

Answer Checked By – Clifford M. (BugsFixing Volunteer)

Leave a Reply

Your email address will not be published. Required fields are marked *