## Issue

I have a dataframe that has 7 columns. I want to create an 8th column with the output of the equation. The equation is: from columns 2-7, for those numbers that are above Col 1, output the minimum value to Col 8.

In excel, I would use this formula in Col8: =MINIFS($Col2:$Col7,$Col2:$Col7,">"&$Col1)

```
Date_Time Col1 Col2 Col3 Col4 Col5 Col6 Col7
20 3/14/2022 0:35 0.68053 0.68048 0.68094 0.6811 0.68111 0.68126 0.68179
21 3/14/2022 0:36 0.6805 0.6805 0.68091 0.68109 0.68111 0.68125 0.68178
22 3/14/2022 0:37 0.68052 0.68051 0.68089 0.68107 0.6811 0.68124 0.68177
23 3/14/2022 0:38 0.68057 0.6805 0.68086 0.68106 0.6811 0.68123 0.68176
24 3/14/2022 0:39 0.68055 0.6805 0.68083 0.68104 0.68109 0.68122 0.68175
```

The output of the above data sample would be:

```
Col8
0.68094
0.68091
0.68089
0.68086
0.68083
```

I have read many posts, and tried with np.where:

```
ColList = [Col2,Col3,Col4,Col5,Col6,Col7]
df['Col8'] = np.where(df[ColList ].min(axis=1) > df['Col1'],df[ColList ].min(axis=1),np.nan)
```

But it is finding the minimum number in the list and then checking if that minimum number is greater than Col1. What I am looking for is it to do it in reverse: for it to only look at the numbers that are greater than Col1 and return the minimum out of those numbers (or return np.nan if there are no numbers greater than Col1).

I prefer a vectorized solution, as the dataset is quite large. Otherwise, I would just create a function/loop to do this.

## Solution

IIUC, you want the *positively closest* values. One way using masking:

```
s = df[["Col1"]].values
tmp = df.loc[:, "Col2":] - s
tmp[tmp<=0] = np.inf
df["Col8"] = tmp.min(axis=1) + s.ravel()
```

Output:

```
print(df["Col8"])
20 0.68094
21 0.68091
22 0.68089
23 0.68086
24 0.68083
Name: Col8, dtype: float64
```

Logic:

- We first calculate the distance between
`Col1`

and others. - Then any negative distances (i.e. those smaller than
`Col1`

) get replaced by`numpy.inf`

to exclude in minimum comparison.

Performance:

```
df = df.sample(n=100000, replace=True).reset_index(drop=True)
%%timeit
s = df[["Col1"]].values
tmp = df.loc[:, "Col2":] - s
tmp[tmp<=0] = np.inf
df["Col8"] = tmp.min(axis=1) + s.ravel()
# 13.1 ms ± 379 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

Answered By – Chris

Answer Checked By – Clifford M. (BugsFixing Volunteer)