I have a table of people and their hometowns, but the same city is spelled in different ways, see:
I need to normalize that data so that it looks like this:
- The dataset has more than 2400 distinct values, so I can't hard-code the mappings.
- I have a Country dimension table with all cities and their correct names.
I tried this Stack Overflow answer, and it would be exactly what I need, but it does not work with my entire dataset.
Consider the approach below. It assumes you have a lookup table with all the proper city names; for the purpose of this example, I use a CTE with just a few of them.
with cities as (
  select 'São Paulo' as city union all
  select 'Los Angeles' union all
  select 'Dnipro' union all
  select 'Kyiv'
)
select Name, City as Hometown
from your_table
left join cities
on soundex(Hometown) = soundex(city)
When applied to the sample data in your question, the output is:
Note: you need to handle potential duplication when some cities sound similar; in that case, adding a country constraint to the join might help …
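As a rough sketch of that country constraint, the join could look something like the query below. The `Country` column on `your_table` and the `country` column in the lookup are assumptions for illustration (they are not in the question), and the country values are just sample data. The `coalesce` is also my own addition: it keeps the original spelling when no lookup row matches, instead of returning null.

```sql
-- Assumed columns (hypothetical): your_table.Country and cities.country
with cities as (
  select 'São Paulo' as city, 'Brazil' as country union all
  select 'Los Angeles', 'United States' union all
  select 'Dnipro', 'Ukraine' union all
  select 'Kyiv', 'Ukraine'
)
select
  t.Name,
  -- fall back to the original value when nothing matches
  coalesce(c.city, t.Hometown) as Hometown
from your_table as t
left join cities as c
  on soundex(t.Hometown) = soundex(c.city)  -- phonetic match on the city name
  and t.Country = c.country                 -- only compare cities within the same country
```

This only narrows the candidates: two similar-sounding cities in the same country would still both match, so it is worth checking the result for duplicated rows.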
Answered By – Mikhail Berlyant
Answer Checked By – Terry (BugsFixing Volunteer)