[SOLVED] Change field in row if other row has same value in different field

Issue

I’m looking for a Python-based solution to a common task I often run into. (I know, there must be cool SQL-magic for this, but… not now.)

Given a table, we want to change the field f2 if oid is the same.
(Assumed that f2 always contains bar or is empty if oid is 42):

id f1 f2 oid
1 foo bar 42
2 foo 49
3 baz 42

Becomes:

id f1 f2 oid
1 foo bar 42
2 foo 49
3 baz bar 42

I’m currently using itertools.combinations to avoid repeated comparison, which seems more efficient than a double for-loop. But for very large datasets, it still becomes unhandy… considering the Big-O-Notation.

Can the problem be solved more elegantly via pandas or numpy?

Solution

Assumed that f2 always contains bar or is empty if oid is 42

Maybe we could use groupby + transform first as well:

df['f2'] = df.replace('', np.nan).groupby('oid')['f2'].transform('first').fillna('')

Output:

   id   f1   f2  oid
0   1  foo  bar   42
1   2  foo        49
2   3  baz  bar   42

Answered By – enke

Answer Checked By – Willingham (BugsFixing Volunteer)

Leave a Reply

Your email address will not be published. Required fields are marked *