[SOLVED] Drop DataFrame Row Based on Existing Condition

Issue

Given the following pandas df

Holding Account Account Type Column A Column B
Rupert 06 (23938996) Holding Account 1825973 1702598
Rupert 07 (23938996) Holding Account 1697870 1825973
Caroline 06 (0131465) Holding Account 11112222 5435450
Caroline 07 (0131465) Holding Account 7896545 11112222

I had been trying to find a way to do the following –

  • Step 1 – For the entire df, search for instances of Column B values appearing in Column A
    (example: Column B of Caroline 07 == Column A of Caroline 06)
  • Step 2 – Rows that meet the above criteria should have there Column B value changed to the Column B value of the row who they matches
    (example: Caroline 7 Column B value will change from 11112222 to 5435450

This means the pandas df will now look as follows –

Holding Account Account Type Column A Column B
Rupert 06 (23938996) Holding Account 1825973 1702598
Rupert 07 (23938996) Holding Account 1697870 1702598
Caroline 06 (0131465) Holding Account 11112222 5435450
Caroline 07 (0131465) Holding Account 7896545 5435450

Code to achieve this: the following code achieves Steps 1 & 2 –

import numpy as np
df['Column B'] = np.where(df['Column B'].isin(df['Column A'].values),df['Column B'].shift(),df['Column B'])

Where I need help: I would like to expand the code, to include the following:

  • Step 3 – Delete the row whose Column A value matched the other row’s Column B, once Step 2 is complete
    (example: Rupert 06 (23938996) and Caroline 06 (0131465) would be deleted, as they received Column B values from Rupert 07 (23938996) and Caroline 07 (0131465).
Holding Account Account Type Column A Column B
Rupert 07 (23938996) Holding Account 1697870 1702598
Caroline 07 (0131465) Holding Account 7896545 5435450

Does anyone know how I can expand the code appropriately?

Solution

Instead of using np.where, just compute some masks:

rows_to_remove = df['Column A'].isin(df['Column B'])
df.loc[df['Column B'].isin(df['Column A'].values), 'Column B'] = df.loc[rows_to_remove, 'Column B'].to_numpy()
df = df[~rows_to_remove]

Output:

>>> df
         Holding Account     Account Type  Column A  Column B
1   Rupert 07 (23938996)  Holding Account   1697870   1702598
3  Caroline 07 (0131465)  Holding Account   7896545   5435450

Answered By – richardec

Answer Checked By – Cary Denson (BugsFixing Admin)

Leave a Reply

Your email address will not be published. Required fields are marked *