[SOLVED] Get minimum value among the columns and apply function on the result dynamically

Issue

I have a data frame df_temp:

index       timestamp             Output_Energy         Elevation       one     two  three
900 2016-06-20 15:00:00+05:30   48.49896354907767   54.46957706140074   25.0    30.0  40.0
901 2016-06-20 15:01:00+05:30   48.3233075379559    54.2501389347442    0.0     0.0    0.0
902 2016-06-20 15:02:00+05:30   48.146607818280856  54.03070866058405   0.0     0.0    0.0
903 2016-06-20 15:03:00+05:30   47.96891621606703   53.811287065389024  0.0     0.0    0.0
904 2016-06-20 15:04:00+05:30   47.79023387111256   53.59187496519667   0.0     0.0    0.0
905 2016-06-20 15:05:00+05:30   47.61054187943819   53.37247316433761   0.0     0.0    0.0
906 2016-06-20 15:06:00+05:30   47.42985481089121   53.153082456254396  0.0     0.0    0.0
907 2016-06-20 15:07:00+05:30   47.248157187430365  52.93370362427002   0.0     0.0    0.0
.
.

Firstly, I want to get the min value among the columns one, two, three. To do this I am using simple if-else conditions like this:

for i in range(0, len(df_temp)-1):
  if (df_temp.one[i]<df_temp.two[i] and df_temp.one[i]<df_temp.three[i]):
    do something
  elif (df_temp.two[i]<df_temp.one[i] and df_temp.two[i]<df_temp.three[i]):
    do something
  elif (df_temp.three[i]<df_temp.one[i] and df_temp.three[i]<df_temp.two[i]):
    do something

I would like to know a better way to do it as I want to make the code flexible for any number of columns.

Secondly, based on the result (i.e., column one has min value) I get, I want to perform an operation and another operation on the left-out columns (i.e., two and three) using following pre-defined function:

def charge_1(i):
  bat_state1=Battery1.charge(df_temp.timestamp[i],df_temp)
  bat_state2=Battery2.discharge(df_temp.timestamp[i])
  bat_state3=Battery3.discharge(df_temp.timestamp[i])
  df_temp.at[i+1,'one']=bat_state1
  df_temp.at[i+1,'two']=bat_state2
  df_temp.at[i+1,'three']=bat_state3
  return bat_state1, bat_state2, bat_state3

NOTE: This function is calling class methods charge and discharge.

Therefore, I would like to have suggestions on how I can make this function dynamic which perform operation based on result from previous task of finding min value.
For instance, if column two has min value charge method is called on Battery2 and discharge on others.

Open for clarifications, if needed

Solution

To find the minimum value in each column you have to apply df.min(), but you want to know which column has the minimum value. First we find the global minima and then compare that value to the minimum in each column:

>> target_columns = ["one", "two", "three"]
>> global_minima = df[target_columns].min().min()
>> column_has_global_minima = (df[target_columns].min() == global_minima)
>> column_has_global_minima
one     True
two     False
three   False
dtype: bool
>> minimum_columns = column_has_global_minima[column_has_global_minima].index.values
>> minimum_columns
['one']

For the second part of your question, I would suggest using the output of column_has_global_minima, a pandas.Series with boolean values indicating if the column minima matches the global minima (please consider what happens if two or more columns are tied in said value).

Assuming that you want to prepare your function to match any number of columns, you’ll need to contain those Battery{i} objects in a sequence, maybe a list. In my answer, I assume that those objects are stores in the list batteries, and that the order of that list is the same as in target_columns (there’s a correspondence between both). Also, I store the bat_state values in an output list (bat_states) and return that list (although maybe you want to unpack the list and return everything as a tuple: return *bat_states):

def charge_1(i):
    bat_states = list()
    for j, (column_name, column_contains_min) in enumerate(column_has_global_minima.items()):
        if column_contains_min:
            bat_state = batteries[j].charge(df_temp.timestamp[i], df_temp)
        else:
            bat_state = batteries[j].discharge(df_temp.timestamp[i])
        df_temp.at[i+1, column_name] = bat_state
        bat_states.append(bat_state)
    return *bat_states

Answered By – aaossa

Answer Checked By – Pedro (BugsFixing Volunteer)

Leave a Reply

Your email address will not be published. Required fields are marked *