# [SOLVED] Creating a function to standardize categorical variables (python)

## Issue

I'm not sure "standardize" is the right word for a categorical string variable, but basically I want to create a function that sets all observations F or f in the column below to 0 and M or m to 1:

``````
> df['gender']

gender
f
F
f
M
M
m

``````

I tried this:

``````
def padroniza_genero(x):
    if(x == 'f' or x == 'F'):
        replace(['f', 'F'], 0)
    else:
        replace(1)
``````

But I got an error:

``````
NameError: name 'replace' is not defined
``````

Any ideas? Thanks!

## Solution

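There is no `replace` function defined in your code: `replace` exists as a method (for example `Series.replace` or `str.replace`), not as a standalone function, so calling it bare raises `NameError`. Your function also never returns a value.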
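If you still want a per-value function, here is a minimal sketch. It assumes the sample column from the question, and returning `None` for unexpected values is my own choice rather than something the question asks for. The key change is that the function returns the new value and is applied with `Series.map`:

```python
import pandas as pd

# Sample data assumed from the question
df = pd.DataFrame({'gender': ['f', 'F', 'f', 'M', 'M', 'm']})

def padroniza_genero(x):
    """Return 0 for 'f'/'F' and 1 for 'm'/'M', one value at a time."""
    if x in ('f', 'F'):
        return 0
    if x in ('m', 'M'):
        return 1
    return None  # assumption: leave anything unexpected unmapped

# Apply the function to every value of the column
df['gender_num'] = df['gender'].map(padroniza_genero)
```

This works, but it calls Python code once per row, which is why a vectorized approach is usually preferable on large columns.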

Back to your goal: use a vectorized operation on the whole column instead of a per-value function.

Convert to lowercase and map f->0, m->1:

``````
df['gender_num'] = df['gender'].str.lower().map({'f': 0, 'm': 1})
``````

Or use a comparison (not equal to f) and conversion from boolean to integer:

``````
df['gender_num'] = df['gender'].str.lower().ne('f').astype(int)
``````

Output:

``````
  gender  gender_num
0      f           0
1      F           0
2      f           0
3      M           1
4      M           1
5      m           1
``````
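The two one-liners agree on clean data but differ once a value is neither f nor m. The sketch below uses a hypothetical extra label `'x'` (not in the question's data) to show the difference: `map` leaves it as NaN, while the comparison silently turns it into 1.

```python
import pandas as pd

# Hypothetical data: the question's labels plus one unexpected label 'x'
s = pd.Series(['f', 'F', 'M', 'm', 'x'])
lower = s.str.lower()

# Dictionary lookup: labels missing from the dict become NaN
mapped = lower.map({'f': 0, 'm': 1})

# Comparison: everything that is not 'f' becomes 1, including 'x'
compared = lower.ne('f').astype(int)
```

If the column may contain unexpected labels, the `map` version is probably the safer choice, since the NaN values make them easy to spot.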

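#### Generalization

You can generalize to any number of categories using `pandas.factorize`. Advantage: you get the integer codes together with the unique values, so the mapping back to the labels is kept. Note that `factorize` returns a plain array of codes, not a `Categorical`; use `astype('category')` if you want a real categorical dtype.

NB. the numeric codes are assigned in order of first appearance, or in lexicographic order if `sort=True`: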

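``````
# s: one integer code per row; key: the unique values (sorted because sort=True)
s, key = pd.factorize(df['gender'].str.lower(), sort=True)
df['gender_num'] = s

# build a {code: label} lookup
key = dict(enumerate(key))
# {0: 'f', 1: 'm'}
``````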
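To illustrate the "any number of categories" point, here is a small sketch with a hypothetical three-category `size` column (the column name and values are made up for illustration). It also shows that the unique values returned by `factorize` can be indexed with the codes to recover the labels:

```python
import pandas as pd

# Hypothetical column with three categories in mixed case
df = pd.DataFrame({'size': ['S', 'm', 'L', 's', 'M', 'l']})

# sort=True assigns codes in lexicographic order of the lowercased labels
codes, uniques = pd.factorize(df['size'].str.lower(), sort=True)
df['size_num'] = codes

# uniques holds the labels in code order, so taking by code restores them
restored = uniques.take(codes)
```

No dictionary has to be written by hand here, which is the main benefit over `map` once there are many categories.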

Answered By – mozway

Answer Checked By – Gilberto Lyons (BugsFixing Admin)