[SOLVED] Optimize a count query or restructure the design?

Issue

Im trying to do report system by range date (from/to date) for products (around 30k products that will repetitive record in DB) viewed/clicked by users.

Each time user click on product i record product_id and date on single row in Database. My problem is coming when i have to select and display report because table quickly grow up to 4 millions in 2 months and i have to keep record up to 6 months.

My question is there any better way to optimize the query or the way i record them?

DB Table

CREATE TABLE `product_view` (
    `id` int(11) NOT NULL,
    `product_id` int(11) NOT NULL,
    `date_create` datetime NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

ALTER TABLE `product_view`
    ADD PRIMARY KEY (`id`),
    ADD KEY `product_id` (`product_id`) USING BTREE;

My select query without range date take around 50 seconds to pull results

SELECT SQL_NO_CACHE pd.name, pv.product_id, p.model, COUNT(pv.id) as total
FROM product_view pv
    LEFT JOIN product p ON p.product_id = pv.product_id
    LEFT JOIN product_description pd ON pd.product_id = pv.product_id
WHERE pv.product_id > 0
GROUP BY pv.product_id
ORDER BY total
DESC LIMIT 0,20

Query EXAMPlE

id  select_type     table   type      possible_keys     key           key_len      ref                      rows        Extra
1   SIMPLE           pv     range   product_id           product_id     4         NULL                      1647717     Using where; Using index; Using temporary; Using filesort
1   SIMPLE           p     eq_ref   PRIMARY              PRIMARY        4         test.pv.product_id        1
1   SIMPLE           pd     ref     PRIMARY,product_id   PRIMARY        4         test.pv.product_id        1

Query with range date

SELECT SQL_NO_CACHE pd.name, pv.product_id, p.model, COUNT(pv.id) as total
FROM product_view pv
    LEFT JOIN product p ON p.product_id = pv.product_id
    LEFT JOIN product_description pd ON pd.product_id = pv.product_id
WHERE pv.product_id > 0
    AND DATE(pv.date_create) >= '2021-07-25'
    AND DATE(pv.date_create) <= '2022-03-10'
GROUP BY pv.product_id
ORDER BY total DESC LIMIT 0,20

Solution

Try rewriting the query as a correlated query:

select p.product_id, p.model, pd.name, (
    select count(*)
    from product_view as pv
    where pv.product_id = p.product_id
    and pv.date_create >= '2021-07-25'
    and pv.date_create <  '2022-03-10' + interval 1 day
) as total
from product as p
left join product_description as pd on p.product_id = pd.product_id
where exists (
    select 1
    from product_view as pv
    where pv.product_id = p.product_id
    and pv.date_create >= '2021-07-25'
    and pv.date_create <  '2022-03-10' + interval 1 day
    -- this is a far more optimized version for dates used in your op
)
order by total desc
limit 0, 20

This does not involve grouping so it should be faster than your original query. If date filter is not required then remove the where exists part and and pv.date_create ... from the count sub-query.

Secondly, I don’t see any useful indexes in the explain. You should try the following indexes:

create index ix1 on product_view (product_id, date_create)
-- should be (i) good for joining (ii) "covers" the date column

Answered By – Salman A

Answer Checked By – Cary Denson (BugsFixing Admin)

Leave a Reply

Your email address will not be published. Required fields are marked *