[SOLVED] Why does np.select not allow an index beyond the list length in choicelist?

Issue

I am trying to get the first value of the list in each row of df['Emails']. In real life (this is a sample df) I don't know how long each list will be, so I assume the longest has length 5 and try to whittle it down until I find the right length, then select that index position. However, I get IndexError: index 5 is out of bounds for axis 0 with size 2, and I can't figure out what to do about it. Any help appreciated. Thanks.

My current code:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Emails': [['[email protected]', '[email protected]', '[email protected]'],[None, '[email protected]']],
                   'num_wings': [2, 0],
                   'num_specimen_seen': [10, 2]},
                  index=['falcon', 'dog'])
# This line raises the IndexError: df['Emails'][2] looks up a row of the Series, not an element of each list
df['Emails'] = np.select([df['Emails'][0],df['Emails'][1],df['Emails'][2]],[df['Emails'][0],df['Emails'][1],df['Emails'][2]])
print(df['Emails'])

Expected output:

If a list in the original dataframe has None in the first index position, I want it to take the next index position that isn't None.

Desired Output

              Emails  num_wings  num_specimen_seen
falcon   [email protected]          2                 10
dog     [email protected]          0                  2

Solution

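The error comes from the indexing, not from np.select: df['Emails'][5] looks up a row of the Series, which has only two rows, rather than an element of the lists stored in each row. Whenever you have a column containing lists, explode will often be your friend, and this is the case here.

Use explode, then groupby(level=0) to group on the 0th (first) level of the index, then first, which returns the first non-null value in each group (None, NaN, etc. are skipped).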

df['Emails'] = df['Emails'].explode().groupby(level=0).first()
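
To see what each step of that one-liner does, here is a minimal sketch that splits it into intermediate variables. The addresses are hypothetical placeholders (the ones in the question are obscured), and the names exploded and first_email are introduced only for illustration.

```python
import pandas as pd

# Placeholder addresses (hypothetical), same shape as the question's data.
df = pd.DataFrame(
    {
        "Emails": [
            ["a@example.com", "b@example.com", "c@example.com"],
            [None, "d@example.com"],
        ],
        "num_wings": [2, 0],
        "num_specimen_seen": [10, 2],
    },
    index=["falcon", "dog"],
)

# Step 1: one row per list element; the original index label is repeated
# for every element that came from the same list.
exploded = df["Emails"].explode()

# Step 2: group the repeated labels back together and keep the first
# non-null element of each group (None and NaN are skipped).
first_email = exploded.groupby(level=0).first()

# Step 3: assign back; pandas aligns on the index labels, so the order of
# the grouped result does not matter.
df["Emails"] = first_email
print(df)
```

Because the grouping is done on index labels, this relies on the index being unique: rows that share a label would be merged into one group.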

Output:

>>> df
               Emails  num_wings  num_specimen_seen
falcon    [email protected]          2                 10
dog     [email protected]          0                  2
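
If the index is not unique, or some cells are empty lists or missing altogether, a per-row helper may be easier to reason about. The following is only a sketch of that alternative, not part of the original answer; first_non_null is a hypothetical helper name and the data is made up.

```python
import pandas as pd


def first_non_null(values):
    """Return the first element that is not None/NaN, or None if there is none."""
    if not isinstance(values, (list, tuple)):
        # The cell itself is missing (e.g. None or NaN) rather than a list.
        return None
    return next((v for v in values if pd.notna(v)), None)


# Hypothetical data: a duplicated index label, an empty list and a missing cell.
s = pd.Series(
    [["a@example.com"], [None, "d@example.com"], [], None],
    index=["x", "x", "y", "z"],
)

# Each row is handled independently, so duplicate labels are not merged.
result = s.apply(first_non_null)
print(result)
```

This trades the vectorised explode/groupby pipeline for a plain Python loop per row, which is usually slower on large frames but makes no assumptions about the index.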

Answered By – richardec

Answer Checked By – Dawn Plyler (BugsFixing Volunteer)
