[SOLVED] What is the fastest way of finding EAN Number doublettes in a SQL table?

Issue

In SQL I have an EanTbl (EAN varchar(13) (PK), ProductID int) with millions of EAN barcode numbers as string. Many of which are in reality the same but written differently, i.e. with one or more leading zeros. One ProductID can be assigned to many EAN Barcodes numbers. (historically grown, I would have created the EAN column as Long from the beginning)

From now on, when a EAN barcode is inserted, I want to avoid the creation of additional doublettes by checking if an already existing EAN matches the new one.

Example EanTbl:

        EAN | ProductID
==========================
   123456789| 987           \
  0123456789| 987           - In reality the same EAN
 00123456789| 987           /

Now, when a user Input wants me to insert a new value of for instance 000123456789 I now want the most efficient query possible in T-SQL to discover that there are already matching EAN Barcodes assigned to this ProductID.

I already tried the different flavours of CONVERT() but as you can imagine the performance is horrible. This check needs to happens several hundreds times a minute on a table with 50+ millions of rows.

Thanks in advance for the suggestions.

Solution

Thanks to the suggestions. In the Long term I will reorganize the table as most of you suggested but for the moment I found the fastest way to do what I want is something like that:

SELECT EanCode, ProductID 
FROM EanCodes 
WHERE EanCode = @ean 
    OR EanCode = '0'[email protected] 
    OR EanCode = '00'[email protected] 
    OR EanCode = '000'[email protected] 
    OR EanCode = '0000'[email protected] 
    OR EanCode = '00000'[email protected] 
    OR EanCode = '000000'[email protected]

with @ean being the string parameter of course

It seemed counter-intuitive at first but it is indeed the fastest way.

Answered By – KarmaEDV

Answer Checked By – Katrina (BugsFixing Volunteer)

Leave a Reply

Your email address will not be published. Required fields are marked *