[SOLVED] Pandas groupby method to aggregate based on string contained in column

Issue

New to Pandas/Python (student). I have what should be a simple problem but every approach I try fails.

The dataset has a "country" column and an "indicator" column. Each country appears more than once. The indicator column tells us who is pro-vaccine ("Vac_plan" and "Vac_done") and who is not (as well as other info). I simply want a total for each country based on the count of pro-vaccine rows for that country, e.g.:

Ethiopia  7
Nigeria   5

My latest failed attempts are below:

vaccines_by_country=df.groupby('country')['indicator'=='Vac_plan|Vac_done'].count()

and…

df.groupby(['country']).str.contains('Vac_plan|Vac_done').count() 

TIA for your merciful help.

Solution

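You’re quite close in your second attempt; you just need to reverse the order of actions. First find the strings, which gives a boolean Series, then group that Series by country. Use sum() rather than count(): sum() adds up the True values, whereas count() counts every non-null row, matching or not: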

df['indicator'].str.contains('Vac_plan|Vac_done').groupby(df['country']).sum()
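
To see the one-liner above in action, here is a minimal sketch on made-up data. The sample rows, the "No_vac" and "Other" labels, and the variable names are assumptions for illustration only, since the original dataset is not shown. The na=False argument is an optional addition that treats missing indicator values as non-matches.

```python
import pandas as pd

# Hypothetical sample data; the real dataset is not shown in the question.
df = pd.DataFrame({
    "country": ["Ethiopia", "Ethiopia", "Ethiopia", "Nigeria", "Nigeria", "Nigeria"],
    "indicator": ["Vac_plan", "Vac_done", "No_vac", "Vac_done", "Other", None],
})

# Boolean mask: True where the indicator contains either label.
# na=False makes missing indicators count as non-matches instead of NaN.
is_pro_vaccine = df["indicator"].str.contains("Vac_plan|Vac_done", na=False)

# Group the mask by country and sum it; each True contributes 1.
vaccines_by_country = is_pro_vaccine.groupby(df["country"]).sum()

print(vaccines_by_country)
```

On this sample data the result is 2 for Ethiopia and 1 for Nigeria. Note that str.contains treats the pattern as a regular expression by default, which is why the | works as "or" here.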

Answered By – richardec

Answer Checked By – Candace Johnson (BugsFixing Volunteer)
