Issue
I tried to define a function to process a df (adding columns and converting all column headers to lower case) before doing the analysis. Every line works fine except the one where I try to rearrange the column order.
The function looks like this:
def cleanDf(df):
    df.columns = df.columns.str.replace(' ','_')
    df.columns = df.columns.str.lower()
    df['date1'] = pd.to_datetime(df['date'].astype(str) + ' ' + df['time'].astype(str))
    df['weekday'] = df['date1'].dt.day_name()
    business_hour_mask = (df['date1'].dt.hour >=9) & (df['date1'].dt.hour <=18)
    df['business_hour'] = np.where(business_hour_mask, "Yes","No")
    df['week_number'] = df.date1.dt.week
    df = df.reindex(['date1','week_number','weekday','business_hour','changed_by','customer','field_name','new_value','old_value','new_value.1','old_value.1','date','time','company_code','sales_organization','distribution_channel','division'], axis=1)
    # problem line, I've tried both with and without "df = " in front of this line
    return df
My current workaround is to run that line after I call the function, and then it works:
cleanDf(df)
df = df.reindex(['date1','week_number','weekday','business_hour','changed_by','customer','field_name','new_value','old_value','new_value.1','old_value.1','date','time','company_code','sales_organization','distribution_channel','division'], axis=1)
df.head()
I would appreciate it if you could advise why the line does not work inside the function but is fine when executed separately.
Thank you very much.
Solution
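It’s because you’re reassigning the df variable inside the function, where it’s just a local name for the parameter. df.reindex(...) returns a new DataFrame, and assigning it to df inside the function does not change what the caller’s df refers to. Since you’re returning df, though, the fix is simple. Just write df = cleanDf(df) instead of just cleanDf(df):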
df = cleanDf(df)
df.head()
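To see the difference between mutating the DataFrame in place and rebinding the local name, here is a minimal sketch. The function name reorder_demo and the two-column frame are made up for illustration and are not from the question:

```python
import pandas as pd

def reorder_demo(df):
    # In-place change: the caller's object is modified.
    df.columns = df.columns.str.lower()
    # reindex returns a NEW DataFrame; this only rebinds the local name df.
    df = df.reindex(["b", "a"], axis=1)
    return df

frame = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# Return value discarded: headers are lowercased, but the order is unchanged.
reorder_demo(frame)
cols_after_discard = list(frame.columns)

# Return value assigned back: the reordered frame is kept.
frame = reorder_demo(frame)
cols_after_assign = list(frame.columns)

print(cols_after_discard, cols_after_assign)
```

This matches what the question describes: the column renaming "works" without using the return value because it changes the original object, while the reordering only shows up once the returned DataFrame is assigned back.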
Per @mozway’s comment, you should also define your cleanDf function like so:
def cleanDf(df):
    df = df.copy()
    # ... do your stuff ...
    return df
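As a rough illustration of why the copy helps, the sketch below cleans a small made-up frame (the names clean_copy, raw, and the two columns are hypothetical). With df.copy() at the top, the frame passed in is left untouched and only the returned one is cleaned:

```python
import pandas as pd

def clean_copy(df):
    # Work on a copy so the caller's DataFrame is not modified.
    df = df.copy()
    df.columns = df.columns.str.replace(" ", "_").str.lower()
    df = df.reindex(["order_id", "first_name"], axis=1)
    return df

raw = pd.DataFrame({"First Name": ["Ann"], "Order ID": [7]})
cleaned = clean_copy(raw)

print(list(raw.columns))      # original headers and order are preserved
print(list(cleaned.columns))  # cleaned headers in the requested order
```

Without the copy, the header renaming would leak into raw even when the return value is ignored, which is easy to miss when the function is called more than once on the same frame.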
Answered By – richardec
Answer Checked By – Robin (BugsFixing Admin)