[SOLVED] MySQL date grouping select statement

Issue

I have a MySQL 8.0.23 table called events with the following schema:

eventUID – integer NOT NULL

camtimestamp – MySQL datetime stamp

direction – string – either "In" or "Out"

propUID – integer

In a single SELECT statement, I am trying to determine by hour for the last 24 hours how many cars are "In" and how many are "Out". Here is what I am trying (does not yet have the 24 hour limit built in).

select camtimestamp,count(*) from events where direction ="In" and propUID = 7 group by year(camtimestamp),month(camtimestamp),day(camtimestamp),hour(camtimestamp);

And this is a sample of what I am getting.

2022-02-14 22:02:40 38
2022-02-14 21:56:56 15
2022-02-14 20:55:30 47
2022-02-14 19:59:18 51
2022-02-14 18:59:50 36
2022-02-14 17:52:04 10
2022-02-14 16:58:01 16
2022-02-14 15:59:00 36
2022-02-14 14:58:52 44

I also have a table called datehourlist with which I can join in my SELECT.

Sample data:

2019-05-01 00:00:00
2019-05-01 01:00:00
2019-05-01 02:00:00
2019-05-01 03:00:00
2019-05-01 04:00:00
2019-05-01 05:00:00
2019-05-01 06:00:00
2019-05-01 07:00:00
2019-05-01 08:00:00

Also:

mysql> select min(datehour) from datehourlist;
+---------------------+
| min(datehour)       |
+---------------------+
| 2019-05-01 00:00:00 |
+---------------------+
1 row in set (0.02 sec)

mysql> select max(datehour) from datehourlist;
+---------------------+
| max(datehour)       |
+---------------------+
| 2040-12-31 00:00:00 |
+---------------------+
1 row in set (0.02 sec)

datehourlist has every hour in it from May 1, 2019 until December 31, 2040.

This is a sample of what I really want from this:

Column 1 below is a rounded grouped timestamp (vs col 1 above being a non-rounded, actual timestamp)

Column 1 below does not skip an hour if there is no data from that hour.

Column 2 below is the "In" count for that hour.

Column 3 below is the "Out" count for that hour.

2019-05-02 06:00:00 5 10
2019-05-02 07:00:00 127 10
2019-05-02 08:00:00 0 0 
2019-05-02 09:00:00 115 10
2019-05-02 10:00:00 71 10
2019-05-02 11:00:00 147 10
2019-05-02 12:00:00 140 10

What SELECT statement should I use to get the output I need?

Also, how would I optimize that SELECT statement?

Within events, I have 500k events and growing by 100s everyday.

Thank you in advance for your help.

Thank you for a great solution so quickly.

SELECT dhl.datehour datehour,
       COALESCE(SUM(ev.direction = 'In'), 0) `In`,
       COALESCE(SUM(ev.direction = 'Out'), 0) `Out`
  FROM datehourlist dhl
  LEFT JOIN events  ev
       ON DATE_FORMAT(ev.camtimestamp, '%Y-%m-%d %H:00:00') = dhl.datehour
 WHERE ev.camtimestamp >= DATE_FORMAT(NOW(), '%Y-%m-%d %H:00:00') - INTERVAL 24 HOUR
   AND ev.camtimestamp <  DATE_FORMAT(NOW(), '%Y-%m-%d %H:00:00')
   AND ev.propUID = 7
 GROUP BY dhl.datehour;

Solution

First, we need a trunc_to_hour() function that takes an arbitrary DATETIME or TIMESTAMP value and gives back the beginning of its hour. That is this.

DATE_FORMAT(camtimestamp, '%Y-%m-%d %H:00:00') 

Second, we need a WHERE expression that can handle the most recent 24 hours. That is this.

WHERE camtimestamp >= DATE_FORMAT(NOW(), '%Y-%m-%d %H:00:00') - INTERVAL 24 HOUR
  AND camtimestamp <  DATE_FORMAT(NOW(), '%Y-%m-%d %H:00:00')

For the example timestamp 2021-03-14 16:04:30 this gives the following.

WHERE camtimestamp >= `2021-03-13 16:00:00`
  AND camtimestamp <  '2021-03-14 16:00:00`

That is, it chooses records for the most recent 24 complete clock hours. You may have to adjust this WHERE expression if you want the hours to date.

Third, we need conditional sums (for In and Out).

It happens that the expression direction = 'In' gives 1 when direction is In, 0 when direction is some other string (like Out), and NULL if direction itself is NULL. So

   SUM(direction='In') 

counts the rows meeting that criterion.

Fourth, when the SUM is null, we want to show zero. Like this.

   COALESCE(SUM(direction='In'),0)

Fifth, we can put it together something like this:

SELECT DATE_FORMAT(ev.camtimestamp, '%Y-%m-%d %H:00:00') datehour,
       COALESCE(SUM(ev.direction = 'In'), 0) `In`,
       COALESCE(SUM(ev.direction = 'Out'), 0) `Out`
  FROM events ev
 WHERE ev.camtimestamp >= DATE_FORMAT(NOW(), '%Y-%m-%d %H:00:00') - INTERVAL 24 HOUR
   AND ev.camtimestamp <  DATE_FORMAT(NOW(), '%Y-%m-%d %H:00:00')
   AND ev.propUID = 7
 GROUP BY DATE_FORMAT(ev.camtimestamp, '%Y-%m-%d %H:00:00')

That gives you your result set. But it still might be missing some hours if there are no records for those hours.

So, sixth, we can join that to your pre-existing hourly calendar table like this:

SELECT dhl.datehour datehour,
       COALESCE(SUM(ev.direction = 'In'), 0) `In`,
       COALESCE(SUM(ev.direction = 'Out'), 0) `Out`
  FROM datehourlist dhl
  LEFT JOIN events  ev
       ON DATE_FORMAT(ev.camtimestamp, '%Y-%m-%d %H:00:00') = dhl.datehour
 WHERE ev.camtimestamp >= DATE_FORMAT(NOW(), '%Y-%m-%d %H:00:00') - INTERVAL 24 HOUR
   AND ev.camtimestamp <  DATE_FORMAT(NOW(), '%Y-%m-%d %H:00:00')
   AND ev.propUID = 7
 GROUP BY dhl.datehour

And that should do it. (Not debugged.)

Answered By – O. Jones

Answer Checked By – Dawn Plyler (BugsFixing Volunteer)

Leave a Reply

Your email address will not be published.