[SOLVED] Best way to loop through a filtered pandas DataFrame

Issue

I need to loop through a pandas DataFrame, but first I have to filter it: for each new ID, I want to know how many unique "old_id" values are attached to it.

I wrote this code and it works, but it doesn't scale well.

d = dict()

for new_id in new_id_list:
    # Filter the whole DataFrame for this ID, then count its distinct old IDs
    d[new_id] = df[df['new_id_col'] == new_id]['old_id'].nunique()
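To make the loop concrete, here is a minimal self-contained sketch of it on a small made-up DataFrame. The sample values and the function name unique_old_ids_per_new_id_loop are hypothetical; only the column names come from the question. Each iteration builds a boolean mask over every row of df, so the total work grows roughly with (number of IDs) × (number of rows), which is why it slows down on large inputs.

```python
import pandas as pd

# Hypothetical sample data: each row links an old ID to a new ID.
df = pd.DataFrame({
    "new_id_col": ["a", "a", "a", "b", "b", "c"],
    "old_id":     [1,   2,   2,   3,   3,   4],
})
new_id_list = ["a", "b", "c"]


def unique_old_ids_per_new_id_loop(df, new_id_list):
    """The question's approach: one full-table filter per new ID."""
    d = {}
    for new_id in new_id_list:
        # Boolean mask over all rows: O(len(df)) work on every iteration.
        mask = df["new_id_col"] == new_id
        d[new_id] = df.loc[mask, "old_id"].nunique()
    return d


print(unique_old_ids_per_new_id_loop(df, new_id_list))
# {'a': 2, 'b': 1, 'c': 1}
```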

How can I make this more efficient?

Solution

It looks like you're looking for groupby + nunique. This computes the number of unique "old_id" values for each "new_id_col" value in one grouped operation, instead of filtering the DataFrame once per ID:

out = df.groupby('new_id_col')['old_id'].nunique().to_dict()
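As a sketch on invented data (column names from the question, values made up), the groupby version counts every group in one pass. One difference from the loop is worth knowing: groupby returns a count for every value that appears in new_id_col, not only the IDs in new_id_list, and an ID from the list that never appears in df is simply missing from the result rather than mapped to 0. Reindexing on new_id_list with fill_value=0 reproduces the loop's output exactly.

```python
import pandas as pd

# Hypothetical sample data: "d" is in the DataFrame but not in the list,
# and "e" is in the list but not in the DataFrame.
df = pd.DataFrame({
    "new_id_col": ["a", "a", "a", "b", "b", "c", "d"],
    "old_id":     [1,   2,   2,   3,   3,   4,   5],
})
new_id_list = ["a", "b", "c", "e"]

# One grouped pass over the whole DataFrame.
counts = df.groupby("new_id_col")["old_id"].nunique()

# Counts for every new ID present in df (including "d").
all_ids = counts.to_dict()

# Restricted to new_id_list; IDs absent from df get 0, matching the loop.
listed_ids = counts.reindex(new_id_list, fill_value=0).to_dict()

print(all_ids)     # {'a': 2, 'b': 1, 'c': 1, 'd': 1}
print(listed_ids)  # {'a': 2, 'b': 1, 'c': 1, 'e': 0}
```

If every ID in new_id_list is known to appear in df and you do not mind extra keys, the one-liner above is enough; the reindex step only matters when you need the result keyed exactly like the original loop.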

Answered By – enke

Answer Checked By – Katrina (BugsFixing Volunteer)
