[SOLVED] How to check if entries in Pandas DataFrame are in a List using pandas.apply

Issue

I have a DataFrame with a column, name, that holds strings. I want to check whether each entry in this column exists in a reference list. I tried pandas.apply, but it doesn’t work.

Sample data:

import pandas as pd

data = [('A', '10'),
        ('B', '10'),
        ('C', '10'),
        ('D', '10'),
        ('E', '20'),
        ('F', '20'),
        ('G', '25')]

data_df = pd.DataFrame(data, columns=['name', 'value'])

Sample code:

reference = ['A', 'B', 'Z']


def is_in_reference(x, reference):
    if x in reference:
        return 'Yes'
    else:
        return 'No'
    

data_df['is_in_reference'] = data_df['name'].apply(is_in_reference, args=(reference))

But, I get the error:

TypeError: is_in_reference() takes 2 positional arguments but 4 were given

I would appreciate any help with this.

Solution

You can use the built-in Series.isin method, which returns a boolean Series (True/False rather than 'Yes'/'No'):

data_df['is_in_reference'] = data_df['name'].isin(reference)
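
If you still want the 'Yes'/'No' labels from the question, one possible approach is to map the boolean result afterwards. This is only a sketch using the sample data from the question; the intermediate name mask is an assumption introduced for illustration.

```python
import pandas as pd

# Sample data from the question.
data = [('A', '10'), ('B', '10'), ('C', '10'), ('D', '10'),
        ('E', '20'), ('F', '20'), ('G', '25')]
data_df = pd.DataFrame(data, columns=['name', 'value'])
reference = ['A', 'B', 'Z']

# isin returns a boolean Series: True where the name appears in reference.
# 'mask' is a hypothetical helper name, not part of the original code.
mask = data_df['name'].isin(reference)

# Map the booleans onto the 'Yes'/'No' labels used in the question.
data_df['is_in_reference'] = mask.map({True: 'Yes', False: 'No'})

print(data_df)
```

This avoids calling a Python function once per row, which tends to matter on larger frames.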

But since you asked about apply: the fix is a small yet nefarious Python syntax issue. You MUST add a trailing comma in the args tuple:

data_df['is_in_reference'] = data_df['name'].apply(is_in_reference, args=(reference,))

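NOTE the , in (reference,). Without it, Python treats (reference) as the list itself wrapped in grouping parentheses, not as a one-element tuple, so apply unpacks the three list items as separate arguments. Together with x, that makes the 4 arguments reported in the error.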
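
The difference can be checked in plain Python, without pandas. The sketch below is only an illustration: count_args is a hypothetical helper that mimics, under the assumption that apply forwards args by unpacking them after the element, how many positional arguments the function would receive.

```python
reference = ['A', 'B', 'Z']

# Parentheses alone only group an expression; this is still the list.
not_a_tuple = (reference)

# The trailing comma is what creates a one-element tuple.
one_tuple = (reference,)

print(type(not_a_tuple).__name__)  # list
print(type(one_tuple).__name__)    # tuple


def count_args(*args):
    """Hypothetical helper: report how many positional arguments arrived."""
    return len(args)


# Unpacking the bare list passes each of its three items separately.
print(count_args('A', *not_a_tuple))  # 4

# Unpacking the one-element tuple passes the whole list as one argument.
print(count_args('A', *one_tuple))    # 2
```

Another option that sidesteps the tuple entirely is a keyword argument, since Series.apply forwards extra keyword arguments to the function: data_df['name'].apply(is_in_reference, reference=reference).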

Answered By – tankthinks

Answer Checked By – Senaida (BugsFixing Volunteer)
