[SOLVED] How to maintain the sort at insert-select scripts?

Issue

We have a table called tblINUser, which has many records and occupies a vast amount of space. In an attempt to reduce the amount of used space, we create a table called tblINUserSortByFilter which contains all the possible text values of this field and we create a foreign key in tblINUser that numerically references this value. We have several databases, because this database is sharded, so it would be great to sort the values similarly accross databases. This was the first attempt:

CREATE TABLE MC.tblINUserSortByFilterType(
    pkINUserSortByFilterTypeID SMALLINT(6) PRIMARY KEY AUTO_INCREMENT,
    SortByFilter varchar(45) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT 'first',
    INDEX(SortByFilter)
);

INSERT INTO MC.tblINUserSortByFilterType(SortByFilter)
SELECT DISTINCT SortByFilter
FROM MC.tblINUser
ORDER BY SortByFilter = 'first';

ALTER TABLE MC.tblINUser
ADD COLUMN fkINUserSortByFilterTypeID SMALLINT(6) DEFAULT 1,
ADD INDEX (fkINUserSortByFilterTypeID);

UPDATE MC.tblINUser INUser
JOIN MC.tblINUserSortByFilterType INUserSortByFilterType
ON INUser.SortByFilter = INUserSortByFilterType.SortByFilter
SET INUser.fkINUserSortByFilterTypeID = INUserSortByFilterType.pkINUserSortByFilterTypeID;

ALTER TABLE MC.tblINUser
DROP COLUMN SortByFilter;

You may argue, correctly that the sort has the only criteria, which is ORDER BY SortByFilter = 'first' and a clause of ORDER BY SortByFilter = 'first', SortByFilter would be an obvious improvement. This would be a correct criticism, yet, even though we may have a chaotic behavior starting from the second record, it would be reasonable to expect that the very first inserted record would be first, yet, unfortunately, this is not the case. Running select * from MC.tblINUserSortByFilterType; yields

+----------------------------+----------------------------+                                       
| pkINUserSortByFilterTypeID | SortByFilter               |
+----------------------------+----------------------------+                           
|                          5 | first                      |                                                                                                                                                 
|                          4 | first-ASC                  |                                                                                                                                                 
|                          3 | last                       |                             
|                          1 | none                       |                  
|                          2 | StatTeacher.IsActive DESC  |                                                                                                                                                
+----------------------------+----------------------------+

as we can see, not even this expectation is met, since first has an id of 5. An improvement is achieved by changing the inserts to

INSERT INTO MC.tblINUserSortByFilterType(SortByFilter)
SELECT DISTINCT SortByFilter
FROM MC.tblINUser
WHERE SortByFilter = 'first';

INSERT INTO MC.tblINUserSortByFilterType(SortByFilter)
SELECT DISTINCT SortByFilter
FROM MC.tblINUser
WHERE SortByFilter <> 'first';

and then the result of the same selection we get this result:

+----------------------------+----------------------------+
| pkINUserSortByFilterTypeID | SortByFilter               |
+----------------------------+----------------------------+
|                          1 | first                      |
|                          3 | first-ASC                  |
|                          4 | last                       |
|                          2 | none                       |
|                          5 | StatTeacher.IsActive DESC  |
+----------------------------+----------------------------+
5 rows in set (0.00 sec)

as we can see, first is correctly receiving a value of 1. Yet, it seems that if we run the same inserts over different copies of the database, the order of subsequent rows might be unreliable. So, how could we ensure that the records would be inserted in the exact order that the following query yields?

SELECT DISTINCT SortByFilter
FROM MC.tblINUser
WHERE SortByFilter = 'first', SortByFilter;

I know that we can solve this by

  • using a cursor for the insert
  • looping the records received
  • inserting them individually

But that would have as many insert statements as the number of records the above query yields. Is there a way to achieve the same with a single command?

Solution

it would be reasonable to expect that the very first inserted record would be first

I don’t think so. You used ORDER BY SortByFilter = 'first' which returns 0 for all values except ‘first’, followed by 1 for ‘first’. The value 1 sorts after the value 0, so the entry ‘first’ ends up being last. The other values end up sorting more or less randomly.

Demo:

mysql> create table mytable (SortByFilter varchar(64));
Query OK, 0 rows affected (0.02 sec)

mysql> insert into mytable values ('first'), ('first-ASC'), 
  ('last'), ('none'), ('StatTeacher.IsActive DESC');
Query OK, 5 rows affected (0.01 sec)
Records: 5  Duplicates: 0  Warnings: 0

mysql> select SortByFilter='first', SortByFilter from mytable 
  order by SortByFilter = 'first';
+----------------------+---------------------------+
| SortByFilter='first' | SortByFilter              |
+----------------------+---------------------------+
|                    0 | first-ASC                 |
|                    0 | last                      |
|                    0 | none                      |
|                    0 | StatTeacher.IsActive DESC |
|                    1 | first                     |
+----------------------+---------------------------+

I suggest do not rely on automatic sorting. Be specific about the sort order of every value. Here’s one way to do it:

mysql> select field(SortByFilter, 'first', 'first-ASC',
  'none', 'StatTeacher.IsActive DESC', 'last') AS SortOrder, 
  SortByFilter 
  from mytable order by SortOrder;
+-----------+---------------------------+
| SortOrder | SortByFilter              |
+-----------+---------------------------+
|         1 | first                     |
|         2 | first-ASC                 |
|         3 | none                      |
|         4 | StatTeacher.IsActive DESC |
|         5 | last                      |
+-----------+---------------------------+

Answered By – Bill Karwin

Answer Checked By – Clifford M. (BugsFixing Volunteer)

Leave a Reply

Your email address will not be published.