Issue
I’m working on a dataframe of Netflix movies. One column holds the release year of each title, and I would like to group the data by decade.
The column data type is int64 and when I group my df by release year it looks like this:
dates = df.groupby("release_year", as_index=False).count()
dates.sort_values('listed_in', ascending=False).head(10)
    release_year  listed_in
70          2018       1147
69          2017       1032
71          2019       1030
72          2020        953
68          2016        902
73          2021        592
67          2015        560
66          2014        352
65          2013        288
64          2012        237
Now I want to group them by decade. I’ve tried this:
dates.apply(lambda x: (x//10)*10).count()
But it doesn’t work.
What should I do instead?
Thanks in advance!
Solution
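Your apply call rounds every column down to a multiple of 10 (the counts included), and .count() then only counts the non-null values in each column, so nothing is actually grouped. Group on the decade key instead: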
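# Integer-divide the year by 10 so 2010-2019 share the key 201, then count rows per key.
out = df.groupby(df["release_year"] // 10).count()
out.index.name = "decade"
# Move the key into a "decade" column and scale it back to a year (201 -> 2010).
out = out.reset_index().assign(decade=out.index * 10)
print(out)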
Prints (for a ten-row sample, eight rows from the 2010s and two from the 2020s):
   decade  release_year  listed_in
0    2010             8          8
1    2020             2          2
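Note that count() returns the number of rows in each group. If the starting point is the already aggregated per-year table from the question (dates) rather than the raw dataframe, the per-year counts probably need to be summed instead. The following is a minimal sketch of that variant, not part of the original answer; it assumes the ten rows shown in the question are the whole input and that listed_in holds the number of titles per year. The names decade and per_decade are chosen here for illustration.

```python
import pandas as pd

# Per-year counts copied from the question's output (top ten years only).
dates = pd.DataFrame({
    "release_year": [2018, 2017, 2019, 2020, 2016, 2021, 2015, 2014, 2013, 2012],
    "listed_in": [1147, 1032, 1030, 953, 902, 592, 560, 352, 288, 237],
})

# Floor each year to the start of its decade: 2018 -> 2010, 2021 -> 2020.
decade = (dates["release_year"] // 10 * 10).rename("decade")

# Sum the per-year counts within each decade instead of counting rows.
per_decade = dates.groupby(decade)["listed_in"].sum().reset_index()
print(per_decade)
```

With these ten rows the 2010s sum to 5548 titles and the 2020s to 1545. On the raw dataframe with one row per title, counting rows per decade as in the answer above should give the totals directly.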
Answered By – Andrej Kesely
Answer Checked By – Clifford M. (BugsFixing Volunteer)