I have a DataFrame with a column called name that contains strings. I want to check whether each entry of this column appears in a reference list. I tried the pandas apply method, but it doesn't work.
    import pandas as pd

    data = [('A', '10'), ('B', '10'), ('C', '10'), ('D', '10'),
            ('E', '20'), ('F', '20'), ('G', '25')]
    data_df = pd.DataFrame(data, columns=['name', 'value'])

    reference = ['A', 'B', 'Z']

    def is_in_reference(x, reference):
        if x in reference:
            return 'Yes'
        else:
            return 'No'

    data_df['is_in_reference'] = data_df['name'].apply(is_in_reference, args=(reference))
But I get this error:
TypeError: is_in_reference() takes 2 positional arguments but 4 were given
I would appreciate any help with this.
You can use the built-in Series.isin method, which returns a boolean Series (True/False rather than 'Yes'/'No'):
data_df['is_in_reference'] = data_df['name'].isin(reference)
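If the 'Yes'/'No' labels from the question are still wanted, one possible approach is to map the boolean result afterwards. This is a minimal sketch that rebuilds the question's data; the intermediate name mask is only illustrative.

```python
import pandas as pd

# Same data as in the question.
data = [('A', '10'), ('B', '10'), ('C', '10'), ('D', '10'),
        ('E', '20'), ('F', '20'), ('G', '25')]
data_df = pd.DataFrame(data, columns=['name', 'value'])
reference = ['A', 'B', 'Z']

# isin gives one boolean per row: True where name is in reference.
mask = data_df['name'].isin(reference)

# Translate True/False into the labels used in the question.
data_df['is_in_reference'] = mask.map({True: 'Yes', False: 'No'})

print(data_df)
```

Only rows A and B end up labelled 'Yes', since 'Z' never occurs in the name column.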
But since you asked about apply: the problem is a small but nasty Python syntax issue. You must add a trailing comma inside the parentheses so that args is a one-element tuple:
data_df['is_in_reference'] = data_df['name'].apply(is_in_reference, args=(reference,))
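Write (reference,) rather than (reference). Without the comma, the parentheses are only grouping, so args receives the list itself and apply unpacks its three elements as separate positional arguments. Together with x, that makes the 4 arguments reported in the error.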
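The difference between the two spellings can be checked directly. The sketch below uses a shortened Series (names, a stand-in for the name column) and also shows passing reference as a keyword argument, which apply forwards to the function and which avoids the tuple question altogether.

```python
import pandas as pd

reference = ['A', 'B', 'Z']

# Parentheses alone only group; the comma is what creates a tuple.
print(type((reference)))    # <class 'list'>
print(type((reference,)))   # <class 'tuple'>

def is_in_reference(x, reference):
    return 'Yes' if x in reference else 'No'

# Shortened stand-in for data_df['name'].
names = pd.Series(['A', 'B', 'C'], name='name')

# Positional extra argument: args must be a tuple.
via_args = names.apply(is_in_reference, args=(reference,))

# Keyword extra argument: forwarded to the function by apply.
via_kwargs = names.apply(is_in_reference, reference=reference)

print(via_args.tolist())
print(via_kwargs.tolist())
```

Both calls give the same labels, so the keyword form is a matter of preference.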
Answered By – tankthinks
Answer Checked By – Senaida (BugsFixing Volunteer)