Table of Contents
Issue
I have a table which is full of arbitrarily formatted phone numbers, like this
027 123 5644
021 393-5593
(07) 123 456
042123456
I need to search for a phone number in a similarly arbitrary format ( e.g. 07123456
should find the entry (07) 123 456
The way I’d do this in a normal programming language is to strip all the non-digit characters out of the ‘needle’, then go through each number in the haystack, strip all non-digit characters out of it, then compare against the needle, eg (in ruby)
digits_only = lambda{ |n| n.gsub /[^\d]/, '' }
needle = digits_only[input_phone_number]
haystack.map(&digits_only).include?(needle)
The catch is, I need to do this in MySQL. It has a host of string functions, none of which really seem to do what I want.
Currently I can think of 2 ‘solutions’
- Hack together a franken-query of
CONCAT
andSUBSTR
- Insert a
%
between every character of the needle ( so it’s like this:%0%7%1%2%3%4%5%6%
)
However, neither of these seem like particularly elegant solutions.
Hopefully someone can help or I might be forced to use the %%%%%% solution
Update: This is operating over a relatively fixed set of data, with maybe a few hundred rows. I just didn’t want to do something ridiculously bad that future programmers would cry over.
If the dataset grows I’ll take the ‘phoneStripped’ approach. Thanks for all the feedback!
could you use a "replace" function to strip out any instances of "(", "-" and " ",
I’m not concerned about the result being numeric.
The main characters I need to consider are +
, -
, (
, )
and space
So would that solution look like this?
SELECT * FROM people
WHERE
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(phonenumber, '('),')'),'-'),' '),'+')
LIKE '123456'
Wouldn’t that be terribly slow?
Solution
This looks like a problem from the start. Any kind of searching you do will require a table scan and we all know that’s bad.
How about adding a column with a hash of the current phone numbers after stripping out all formatting characters. Then you can at least index the hash values and avoid a full blown table scan.
Or is the amount of data small and not expected to grow much?
Then maybe just sucking all the numbers into the client and running a search there.
Answered By – John Dyer
Answer Checked By – Dawn Plyler (BugsFixing Volunteer)