[SOLVED] Creating dataframe with multi-level column index from four 2d numpy arrays

Issue

I have four 2d numpy arrays:

import numpy as np
import pandas as pd    
x1 = np.array([[2, 4, 1],
       [2, 2, 1],
       [1, 3, 3],
       [2, 2, 1],
       [3, 3, 2]])
x2 = np.array([[1, 2, 2],
       [4, 1, 4],
       [1, 4, 4],
       [3, 3, 2],
       [2, 2, 4]])

x3 = np.array([[4, 3, 2],
       [4, 3, 2],
       [4, 3, 3],
       [1, 2, 2],
       [1, 4, 3]])      
x4 = np.array([[3, 1, 1],
       [3, 4, 3],
       [2, 2, 1],
       [2, 1, 1],
       [1, 2, 4]])

I would like to create a DataFrame from them. This is what I tried:

level_1_label = ['location1','location2','location3']
level_2_label = ['x1','x2','x3','x4']
header = pd.MultiIndex.from_product([level_1_label, level_2_label], names=['Location','Variable'])
df = pd.DataFrame(np.concatenate((x1,x2,x3,x4),axis=1), columns=header)
df.index.name = 'Time'
df

Data in this DataFrame is not in the desired form.

The four columns (x1, x2, x3, x4) under the first top-level column label (location1) should be the first columns of the four numpy arrays. The four columns under the second top-level label (location2) should be the second columns of the four arrays, and so on. The number of top-level labels, i.e. len(level_1_label), equals the number of columns in each of the four 2d numpy arrays.

Desired DataFrame: [image]

Solution

One option is to stack the arrays vertically and reshape the result in Fortran (column-major) order before creating the DataFrame:


# reusing your code
level_1_label = ['location1','location2','location3']
level_2_label = ['x1','x2','x3','x4']
header = pd.MultiIndex.from_product([level_1_label, level_2_label], names=['Location','Variable'])

# for 2d arrays, np.vstack is equivalent to np.concatenate with axis=0
outcome = np.reshape(np.vstack([x1,x2,x3,x4]), (len(x1), -1), order = 'F')
df = pd.DataFrame(outcome, columns = header)
df.index.name = 'Time'

df

Location location1          location2          location3
Variable        x1 x2 x3 x4        x1 x2 x3 x4        x1 x2 x3 x4
Time
0                2  1  4  3         4  2  3  1         1  2  2  1
1                2  4  4  3         2  1  3  4         1  4  2  3
2                1  1  4  2         3  4  3  2         3  4  3  1
3                2  3  1  2         2  3  2  1         1  2  2  1
4                3  2  1  1         3  2  4  2         2  4  3  4
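
The Fortran-order reshape works because the stacked (20, 3) array is read column by column, so each block of five values is one column of one input array, and those blocks fill the output columns in the order x1, x2, x3, x4 for location1, then location2, and so on. An equivalent approach, shown here only as a sketch and assuming all four arrays have the same shape, is to stack them along a new last axis and do an ordinary C-order reshape. The names arrays, stacked, alt and df_alt are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

x1 = np.array([[2, 4, 1], [2, 2, 1], [1, 3, 3], [2, 2, 1], [3, 3, 2]])
x2 = np.array([[1, 2, 2], [4, 1, 4], [1, 4, 4], [3, 3, 2], [2, 2, 4]])
x3 = np.array([[4, 3, 2], [4, 3, 2], [4, 3, 3], [1, 2, 2], [1, 4, 3]])
x4 = np.array([[3, 1, 1], [3, 4, 3], [2, 2, 1], [2, 1, 1], [1, 2, 4]])

arrays = [x1, x2, x3, x4]  # illustrative name
level_1_label = ['location1', 'location2', 'location3']
level_2_label = ['x1', 'x2', 'x3', 'x4']
header = pd.MultiIndex.from_product([level_1_label, level_2_label],
                                    names=['Location', 'Variable'])

# Stack along a new last axis: shape (time, location, variable) = (5, 3, 4).
stacked = np.stack(arrays, axis=2)

# A default (C-order) reshape merges the last two axes, so within each
# location the variable index varies fastest: x1, x2, x3, x4.
alt = stacked.reshape(len(x1), -1)

# The Fortran-order reshape from the answer, for comparison.
outcome = np.reshape(np.vstack(arrays), (len(x1), -1), order='F')
assert np.array_equal(alt, outcome)

df_alt = pd.DataFrame(alt, columns=header)
df_alt.index.name = 'Time'
print(df_alt)
```

Both versions produce the same 5 x 12 array; which one reads more clearly is a matter of taste.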

Answered By – sammywemmy

Answer Checked By – David Marino (BugsFixing Volunteer)
