Issue
I have a dataframe like this (assuming one column):
column
[A,C,B,A]
[HELLO,HELLO,ha]
[test/1, test/1, test2]
The type of the column above is:
dtype('O')
I would like to remove the duplicates here, resulting in:
column
[A,C,B] # duplicate A removed
[HELLO, ha] # duplicate HELLO removed
[test/1, test2] # duplicate test/1 removed
Then I would like to sort the items within each list:
column
[A,B,C]
[ha, HELLO]
[test2, test/1] # assuming that digits sort before /
I am struggling to get this done properly. Does anyone have a good approach? (Would it make sense to convert the values to small lists first?)
Solution
Assuming that you have lists in the column, use a list comprehension.
If you want to maintain order:
df['column_keep_order'] = [list(dict.fromkeys(x)) for x in df['column']]
If you want to sort the items (sorted() uses Python's default string ordering):
df['column_sorted'] = [sorted(set(x)) for x in df['column']]
output:
                    column  column_keep_order    column_sorted
0             [A, C, B, A]          [A, C, B]        [A, B, C]
1       [HELLO, HELLO, ha]        [HELLO, ha]      [HELLO, ha]
2  [test/1, test/1, test2]    [test/1, test2]  [test/1, test2]
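Note that the default sort compares by code point, so uppercase letters come before lowercase ones and "/" comes before digits. That differs from the order requested in the question ([ha, HELLO] and [test2, test/1]). Below is a minimal sketch of one possible custom key. The names natural_key and dedupe_sorted are hypothetical helpers, and the rule "ignore case, letters and digits before punctuation" is only an assumption about the intended ordering.

```python
def natural_key(s):
    # Hypothetical key: compare character by character, ignoring case,
    # and rank letters/digits ahead of punctuation such as "/".
    # The original string is appended as a tie-breaker so that items
    # differing only in case still sort deterministically.
    return ([(not ch.isalnum(), ch.casefold()) for ch in s], s)


def dedupe_sorted(items):
    # Remove duplicates with set(), then sort with the custom key.
    return sorted(set(items), key=natural_key)


# Plain lists standing in for the cells of df['column'].
rows = [['A', 'C', 'B', 'A'],
        ['HELLO', 'HELLO', 'ha'],
        ['test/1', 'test/1', 'test2']]

result = [dedupe_sorted(x) for x in rows]
print(result)
```

On the DataFrame itself this would be applied the same way as the other snippets, for example df['column_sorted'] = [dedupe_sorted(x) for x in df['column']].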
reproducible input:
import pandas as pd

df = pd.DataFrame({'column': [['A', 'C', 'B', 'A'],
                              ['HELLO', 'HELLO', 'ha'],
                              ['test/1', 'test/1', 'test2']]})
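The snippets above assume each cell is a real Python list. If the cells are instead strings such as "[A,C,B,A]" (an assumption; dtype('O') is reported in both cases), they would need to be split into lists first. A rough sketch, with to_list as a hypothetical helper and the bracketed, comma-separated format assumed from the question:

```python
def to_list(cell):
    # Leave real lists untouched; only parse string cells.
    if isinstance(cell, list):
        return cell
    # Assumes a format like "[A,C,B,A]": strip the brackets, split on
    # commas, and trim whitespace around each item.
    inner = cell.strip().strip('[]')
    return [part.strip() for part in inner.split(',') if part.strip()]


# Plain strings standing in for the cells of df['column'].
cells = ['[A,C,B,A]', '[HELLO,HELLO,ha]', '[test/1, test/1, test2]']
parsed = [to_list(c) for c in cells]
print(parsed)
```

With a DataFrame, the conversion could be run once before deduplicating, for example df['column'] = [to_list(x) for x in df['column']].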
Answered By – mozway
Answer Checked By – Cary Denson (BugsFixing Admin)