# [SOLVED] How to get the mean and std for each key of a dictionary of dataframes

## Issue

Here is my dilemma:

I have a dictionary of dataframes like this:

``````
dict_df[key]

m1      m2  m3  m4  m5  m6
10410   5   10  21  33  11
15387   3   10  33  45  13
19026   4   16  27  40  11
26083   5   21  16  29  9
27806   4   17  23  31  7
43820   2   12  27  40  18
49199   7   22  30  38  11
50094   4   9   13  18  4
``````

Each key maps to a DataFrame with the same column names.

For each key, I need to store the mean and standard deviation of a set of features (let’s take for example m2, m3, m4).

In the end, I want to obtain something like this df below (the numbers are totally random):

``````
key   m2_mean    m2_std   m3_mean   m3_std    m4_mean     m4_std
key1    12       55         793      438       44           95
key2    14       442        21       43        14           442
key3    44       1          66       11        42           42
key4    42       42         2        23        98           70
``````

The dataset is not that big, so even slow code should be fine.

Thanks for the help and have a good one!

## Solution

First, let’s define some sample data:

``````
>>> import pandas as pd
>>> df1 = pd.DataFrame({
...     "col1": [1, 2, 3],
...     "col2": [4, 5, 6],
... })
>>> df2 = pd.DataFrame({
...     "col1": [7, 8, 9],
...     "col2": [10, 11, 12],
... })
>>> dict_df = {
...     "df1": df1,
...     "df2": df2,
... }
``````

Now, you can use `.agg()` to calculate several statistics of a dataframe at once, `.stack()` to collapse the result into a single Series indexed by (statistic, column) pairs, and `.to_dict()` to turn that Series into a plain dictionary keyed by those pairs. The walkthrough uses `max` instead of `std` so the numbers are easy to check by hand, and it shows the result for only one of the dataframes (`df1`):

``````
>>> df1.agg(["mean", "max"]).stack().to_dict()
{('mean', 'col1'): 2.0, ('mean', 'col2'): 5.0, ('max', 'col1'): 3.0, ('max', 'col2'): 6.0}
``````
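To make the chained call easier to follow, here is a small sketch that runs the three steps one at a time on the same `df1` sample. The intermediate names `stats`, `stacked`, and `flat` are only illustrative and do not appear in the answer itself.

```python
import pandas as pd

df1 = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})

# Step 1: one row per statistic, one column per feature.
stats = df1.agg(["mean", "max"])

# Step 2: move the column labels into the index, giving a Series
# whose index entries are (statistic, column) pairs.
stacked = stats.stack()

# Step 3: convert the Series into a plain dict keyed by those pairs.
flat = stacked.to_dict()

print(stats)
print(stacked)
print(flat)
```

Because the keys of `flat` are tuples, pandas can later rebuild them as a two-level column header.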

With this dict representation, we can use `pd.DataFrame.from_dict` with `orient="index"` to build a single dataframe in which each key of `dict_df` becomes one row holding that dataframe's metrics:

``````
>>> df = pd.DataFrame.from_dict({
...     df_name: frame[["col1", "col2"]].agg(["mean", "max"]).stack().to_dict()
...     for df_name, frame in dict_df.items()
... }, orient="index")
>>> df
    mean        max
    col1  col2 col1  col2
df1  2.0   5.0  3.0   6.0
df2  8.0  11.0  9.0  12.0
``````

The only important difference from your expected output is the two-level column header, which we can flatten manually:

``````
>>> df.columns = ["_".join(column) for column in df.columns]
>>> df
     mean_col1  mean_col2  max_col1  max_col2
df1        2.0        5.0       3.0       6.0
df2        8.0       11.0       9.0      12.0
``````

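Code that would do the trick for you. Note that the joined names put the statistic first (`mean_m2`, `std_m2`); use `"_".join(reversed(column))` instead if you want `m2_mean`: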

``````
>>> target_columns = ["m2", "m3", "m4"]
>>> df = pd.DataFrame.from_dict({
...     key: frame[target_columns].agg(["mean", "std"]).stack().to_dict()
...     for key, frame in dict_df.items()
... }, orient="index")
>>> df.columns = ["_".join(column) for column in df.columns]
>>> df.index.name = "key"
``````
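If you need the layout exactly as in the question (columns grouped by feature, as in `m2_mean`, `m2_std`, `m3_mean`, and so on), one possible variant is sketched below. The helper name `summarize` and the toy data are assumptions made for illustration; they are not part of the original answer. Keep in mind that pandas' `std` is the sample standard deviation (`ddof=1`) by default.

```python
import pandas as pd


def summarize(dict_df, target_columns, stats=("mean", "std")):
    """Return one row per key with columns named '<feature>_<stat>'."""
    rows = {}
    for key, frame in dict_df.items():
        # agg() returns a frame indexed by statistic, one column per feature.
        agg = frame[target_columns].agg(list(stats))
        rows[key] = {
            f"{col}_{stat}": agg.loc[stat, col]
            for col in target_columns
            for stat in stats
        }
    summary = pd.DataFrame.from_dict(rows, orient="index")
    # Make the feature-first column order explicit.
    summary = summary[[f"{col}_{stat}" for col in target_columns for stat in stats]]
    summary.index.name = "key"
    return summary


# Hypothetical toy data, not the data from the question.
dict_df = {
    "key1": pd.DataFrame({"m2": [1, 2, 3], "m3": [10, 20, 30], "m4": [0, 0, 0], "m5": [7, 8, 9]}),
    "key2": pd.DataFrame({"m2": [4, 4, 4], "m3": [1, 3, 5], "m4": [2, 4, 6], "m5": [1, 1, 1]}),
}

summary = summarize(dict_df, ["m2", "m3", "m4"])
print(summary)
```

Building the column names directly avoids the separate flattening step, at the cost of a slightly longer loop.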