[SOLVED] How to pivot a pandas dataframe such that unique values across multiple columns become new columns?

Issue

I have a pandas dataframe of the form:

df

    col_1      col_2      col_3      col_4
ID
1     A          B          C          A
2     B          D
3     A          C          B

df = pd.DataFrame({'col_1':['A','B','A'], 'col_2':['B','D','C'], 'col_3':['C',np.nan,'B'], 'col_4':['A', np.nan, np.nan]}, index=pd.Index([1,2,3], name='ID'))

Note that the values repeated across the columns are not accidental: they refer to the same entities (for instance, A in col_1 is the same as A in col_4). I am trying to pivot this dataframe so that the unique values become the new columns. For instance, df would become:

new_df

      A      B      C      D
ID
1     2      1      1      0
2     0      1      0      1
3     1      1      1      0

The new values represent counts. I have tried pd.get_dummies() but it doesn’t give me what I want. What is the most intuitive way to achieve this?

Solution

If I understand correctly, you can use stack with str.get_dummies, then sum the indicator columns per ID:

df.stack().loc[lambda x: x != ''].str.get_dummies().groupby(level=0).sum()
    A  B  C  D
ID            
1   2  1  1  0
2   0  1  0  1
3   1  1  1  0
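
For reference, here is a self-contained sketch of the same idea that should run on recent pandas versions. It assumes the blank cells are NaN, as in the constructor from the question, and drops them explicitly rather than relying on stack to do so. The names long and counts are only illustrative, and the pd.crosstab variant at the end is an alternative I am adding for comparison, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Example frame from the question; blank cells are assumed to be NaN.
df = pd.DataFrame(
    {
        "col_1": ["A", "B", "A"],
        "col_2": ["B", "D", "C"],
        "col_3": ["C", np.nan, "B"],
        "col_4": ["A", np.nan, np.nan],
    },
    index=pd.Index([1, 2, 3], name="ID"),
)

# Long format: one row per (ID, source column) pair.
# dropna() removes the missing cells regardless of how stack treats them.
long = df.stack().dropna()

# One indicator column per unique value, then add the indicators up per ID
# (level 0 of the stacked index).
counts = long.str.get_dummies().groupby(level=0).sum()
print(counts)

# Alternative sketch: cross-tabulate the IDs against the stacked values.
alt = pd.crosstab(long.index.get_level_values(0), long.to_numpy())
print(alt)
```

Both counts and alt hold the number of times each value appears in each row, so ID 1 gets a 2 under A because A occurs in col_1 and col_4.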

Answered By – BENY

Answer Checked By – Gilberto Lyons (BugsFixing Admin)
