[SOLVED] How to mask data using python regular expression in dataframes

Issue

I want to replace patterns in a DataFrame using regular expressions.

For example, I have the following table. I want to replace each digit of the account number with N, e.g. if the account number has five digits, it should be replaced with five N's: NNNNN.

Source
Account_Num,Facility Name,Address,City
10605,SAGE MEMORIAL HOSPITAL,STATE ROUTE 264 SOUTH 191,GANADO
2425,WOODRIDGE BEHAVIORAL CENTER,600 NORTH 7TH STREET,XDSDSD

Target

Account_Num,Facility Name,Address,City
NNNNN,AAAA AAAAAAAA AAAAAAA,STATE ROUTE 264 SOUTH 191,GANADO
NNNN,WOODRIDGE BEHAVIORAL CENTER,600 NORTH 7TH STREET,XDSDSD

I tried the following code:

print(df.replace(to_replace=([re.search(r'\d+',str(df_str))]),value='NNNNN', regex=True))

Solution

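You can use DataFrame.replace with a list of regular expressions and a matching list of replacements. Cast the frame to str first so that numeric columns such as Account_Num are matched as well: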

df = df.astype(str).replace([r'[a-zA-Z]', r'\d'], ['A', 'N'], regex=True)

Output:

>>> df
  Account_Num                Facility Name                    Address    City
0       NNNNN       AAAA AAAAAAAA AAAAAAAA  AAAAA AAAAA NNN AAAAA NNN  AAAAAA
1        NNNN  AAAAAAAAA AAAAAAAAAA AAAAAA       NNN AAAAA NAA AAAAAA  AAAAAA
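
The one-liner above masks every column, whereas the question mainly asks about Account_Num. As a minimal, self-contained sketch (the names CSV_TEXT, masked_all and masked_acct are made up for illustration, and the sample rows are copied from the question), the following reproduces the output above and also shows one way to mask only the account column with Series.str.replace:

```python
import io

import pandas as pd

# Sample rows copied from the question (CSV_TEXT is an illustrative name).
CSV_TEXT = """Account_Num,Facility Name,Address,City
10605,SAGE MEMORIAL HOSPITAL,STATE ROUTE 264 SOUTH 191,GANADO
2425,WOODRIDGE BEHAVIORAL CENTER,600 NORTH 7TH STREET,XDSDSD
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# Mask every column: each letter becomes 'A', each digit becomes 'N'.
# astype(str) is needed because Account_Num is parsed as an integer column.
masked_all = df.astype(str).replace([r'[a-zA-Z]', r'\d'], ['A', 'N'], regex=True)

# Mask only Account_Num: convert that one column to str and replace
# each digit, so the number of N's matches the number of digits.
masked_acct = df.copy()
masked_acct['Account_Num'] = (
    masked_acct['Account_Num'].astype(str).str.replace(r'\d', 'N', regex=True)
)

print(masked_all)
print(masked_acct)
```

Because each pattern matches a single character, the length of every value is preserved. A pattern such as \d+ with a fixed replacement string would instead collapse every number to the same length, which is what happened in the attempt from the question.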

Answered By – richardec

Answer Checked By – Marilyn (BugsFixing Volunteer)
