I have a MySQL query that combines data from 3 tables, which I'll call "first_table", "second_table", and "third_table", as shown below.
This query consistently shows up in the MySQL slow query log, even though all fields referenced in the query are indexed, and the actual amount of data in these tables is not large (< 1000 records, except for "third_table" which has more like 10,000 records).
I'm trying to determine whether there is a better way to structure this query for better performance, and which part of it is the most likely culprit for the slowdown.
Please note that "third_table.placements" is a JSON column. All "label" fields are VARCHAR(255), "id" fields are integer primary keys, "sample_img" is an integer, "guid" is a string, "deleted" is an integer, and "timestamp" is a DATETIME.
SELECT DISTINCT
       first_table.id,
       first_table.label,
       (SELECT guid FROM second_table
         WHERE second_table.id = first_table.sample_img) AS guid,
       COUNT(third_table.id) AS related_count,
       SUM(JSON_LENGTH(third_table.placements)) AS placements_count
FROM first_table
LEFT JOIN third_table
       ON JSON_OVERLAPS(third_table.placements, CAST(first_table.id AS CHAR))
WHERE first_table.deleted IS NULL
  AND third_table.deleted IS NULL
  AND UNIX_TIMESTAMP(third_table.timestamp) >= 1647586800
  AND UNIX_TIMESTAMP(third_table.timestamp) <  1648191600
GROUP BY first_table.id
ORDER BY LOWER(first_table.label) ASC
LIMIT 0, 1000
The biggest problem is that these are not sargable:
WHERE ... UNIX_TIMESTAMP(third_table.timestamp) < 1648191600
ORDER BY LOWER(first_table.label)
That is, don’t hide a potentially indexed column inside a function call. Instead:
WHERE ... third_table.timestamp >= FROM_UNIXTIME(1647586800)
      AND third_table.timestamp <  FROM_UNIXTIME(1648191600)
and give first_table.label a case-insensitive collation, that is, any collation ending in _ci, so the LOWER() call can be dropped from the ORDER BY. (Please provide the SHOW CREATE TABLE output so I can point that out, and so I can check the vague claim that "all fields are indexed"; that phrasing usually indicates unfamiliarity with the benefits of "composite" indexes.)
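As a sketch, those two fixes might look like the following DDL. This is an assumption about the schema based on the question; the exact statements depend on the real SHOW CREATE TABLE output, and utf8mb4_general_ci is just one example of a _ci collation:

```sql
-- Sketch only: adjust to the actual schema.
-- Composite index so the filter and date range on third_table are sargable:
ALTER TABLE third_table
  ADD INDEX idx_deleted_ts (deleted, `timestamp`);

-- Case-insensitive collation on label, so ORDER BY no longer needs LOWER():
ALTER TABLE first_table
  MODIFY label VARCHAR(255) COLLATE utf8mb4_general_ci;
```

With that in place the query can simply say `ORDER BY first_table.label`, and the range test becomes `third_table.timestamp >= FROM_UNIXTIME(1647586800) AND third_table.timestamp < FROM_UNIXTIME(1648191600)`.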
JSON_OVERLAPS(...) is probably also not sargable, but it is trickier to fix. Please explain the structure of the JSON and the types of values it contains.
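One common way to make that lookup indexable, if restructuring the data is an option, is to normalize the JSON array into a junction table. This is a hypothetical schema sketch, not something the question confirms is feasible:

```sql
-- Hypothetical normalization: one row per (third_table row, first_table id)
-- instead of a JSON array, so the join becomes an ordinary indexed lookup.
CREATE TABLE third_table_placement (
  third_id INT NOT NULL,   -- FK to third_table.id
  first_id INT NOT NULL,   -- FK to first_table.id
  PRIMARY KEY (third_id, first_id),
  INDEX (first_id)         -- lets the join probe by first_table.id
);

-- The join then replaces JSON_OVERLAPS:
-- FROM first_table
-- LEFT JOIN third_table_placement AS tp ON tp.first_id = first_table.id
-- LEFT JOIN third_table ON third_table.id = tp.third_id
```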
Do you really need 1000 rows in the output? That is quite large for "pagination".
How big are the tables? UUIDs/GUIDs are notorious when the tables are too big to be cached in RAM.
It is probably never useful to have both SELECT DISTINCT and GROUP BY. Removing the DISTINCT may speed up the query by avoiding an extra sort.
Do you really want LEFT JOIN rather than plain JOIN? (I don't understand the query well enough to guess.)
After you have fixed most of those, and if you still need help, I may have a way to get rid of the GROUP BY by adding a "derived" table. Later. (Then I may be able to address the JSON_OVERLAPS discussion.)
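The details are left for later, but one way such a derived-table rewrite might look (a sketch only, still using the original JSON_OVERLAPS join) is to do the aggregation inside a derived table and attach label and guid afterwards, so the outer query needs neither DISTINCT nor GROUP BY and the guid subquery runs only for the final 1000 rows:

```sql
SELECT f.id,
       f.label,
       (SELECT guid FROM second_table s WHERE s.id = f.sample_img) AS guid,
       agg.related_count,
       agg.placements_count
FROM first_table AS f
LEFT JOIN (
    -- Aggregate once, keyed by first_table.id.
    SELECT f2.id,
           COUNT(t.id)                    AS related_count,
           SUM(JSON_LENGTH(t.placements)) AS placements_count
    FROM first_table AS f2
    JOIN third_table AS t
      ON JSON_OVERLAPS(t.placements, CAST(f2.id AS CHAR))
    WHERE t.deleted IS NULL
      AND t.timestamp >= FROM_UNIXTIME(1647586800)
      AND t.timestamp <  FROM_UNIXTIME(1648191600)
    GROUP BY f2.id
) AS agg ON agg.id = f.id
WHERE f.deleted IS NULL
ORDER BY f.label   -- assumes a _ci collation on label, per the earlier point
LIMIT 0, 1000;
```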
Also, please provide the output of EXPLAIN SELECT ... for the query.
Answered By – Rick James