Issue
Im trying to do report system by range date (from/to date) for products (around 30k products that will repetitive record in DB) viewed/clicked by users.
Each time user click on product i record product_id and date on single row in Database. My problem is coming when i have to select and display report because table quickly grow up to 4 millions in 2 months and i have to keep record up to 6 months.
My question is there any better way to optimize the query or the way i record them?
DB Table
CREATE TABLE `product_view` (
`id` int(11) NOT NULL,
`product_id` int(11) NOT NULL,
`date_create` datetime NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
ALTER TABLE `product_view`
ADD PRIMARY KEY (`id`),
ADD KEY `product_id` (`product_id`) USING BTREE;
My select query without range date take around 50 seconds to pull results
SELECT SQL_NO_CACHE pd.name, pv.product_id, p.model, COUNT(pv.id) as total
FROM product_view pv
LEFT JOIN product p ON p.product_id = pv.product_id
LEFT JOIN product_description pd ON pd.product_id = pv.product_id
WHERE pv.product_id > 0
GROUP BY pv.product_id
ORDER BY total
DESC LIMIT 0,20
Query EXAMPlE
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE pv range product_id product_id 4 NULL 1647717 Using where; Using index; Using temporary; Using filesort
1 SIMPLE p eq_ref PRIMARY PRIMARY 4 test.pv.product_id 1
1 SIMPLE pd ref PRIMARY,product_id PRIMARY 4 test.pv.product_id 1
Query with range date
SELECT SQL_NO_CACHE pd.name, pv.product_id, p.model, COUNT(pv.id) as total
FROM product_view pv
LEFT JOIN product p ON p.product_id = pv.product_id
LEFT JOIN product_description pd ON pd.product_id = pv.product_id
WHERE pv.product_id > 0
AND DATE(pv.date_create) >= '2021-07-25'
AND DATE(pv.date_create) <= '2022-03-10'
GROUP BY pv.product_id
ORDER BY total DESC LIMIT 0,20
Solution
Try rewriting the query as a correlated query:
select p.product_id, p.model, pd.name, (
select count(*)
from product_view as pv
where pv.product_id = p.product_id
and pv.date_create >= '2021-07-25'
and pv.date_create < '2022-03-10' + interval 1 day
) as total
from product as p
left join product_description as pd on p.product_id = pd.product_id
where exists (
select 1
from product_view as pv
where pv.product_id = p.product_id
and pv.date_create >= '2021-07-25'
and pv.date_create < '2022-03-10' + interval 1 day
-- this is a far more optimized version for dates used in your op
)
order by total desc
limit 0, 20
This does not involve grouping so it should be faster than your original query. If date filter is not required then remove the where exists
part and and pv.date_create ...
from the count sub-query.
Secondly, I don’t see any useful indexes in the explain. You should try the following indexes:
create index ix1 on product_view (product_id, date_create)
-- should be (i) good for joining (ii) "covers" the date column
Answered By – Salman A
Answer Checked By – Cary Denson (BugsFixing Admin)