[SOLVED] How to query date range date by date as rows

Issue

My table presences holdes date ranges (start and end) together with an employee and location id, so that I am able to store an employee’s presence at a location in a specific date range:

CREATE TABLE `presences` (
  id int(11) auto_increment not null,
  start timestamp null default null,
  end timestamp null default null,
  employee int(11) null default null,
  location int(11) null default null,
  PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

/* Example data: */
INSERT INTO `presences` (start, end, employee, location) VALUES
("2022-04-05 00:00:00", "2022-04-05 00:00:00", 1, 1),
("2022-04-06 00:00:00", "2022-04-07 00:00:00", 1, 1),
("2022-04-07 00:00:00", "2022-04-10 00:00:00", 1, 1),
("2022-01-01 00:00:00", "2022-12-31 00:00:00", 1, 1);

In my app, I am querying the table with a given date range and I receive rows where at least one day of the tables date range fits into the provided date range:

SELECT *
FROM `presences`
WHERE DATE(start) <= '2022-04-06' AND DATE(end) >= '2022-04-04'

Is there a way to split the resulting rows date range? E.g. the provide range is 2022-04-04 to 2022-04-06 and a resulting row has a date range 2022-01-01 to 2022-12-31. The range of the result that fits in provide date range is 2022-04-04 to 2022-04-06, thus I want to split that range in a way, that there is not one resulting row, but one row per date in range like:

id | employee | location |    date
--------------------------------------
 4 |     1    |     1    | 2022-04-04 
 4 |     1    |     1    | 2022-04-05
 4 |     1    |     1    | 2022-04-06  

If the provided range is 2022-04-01 to 2022-04-06 and my resulting row is 2022-04-05 to 2022-04-06, I want to see

id | employee | location |    date
--------------------------------------
 X |     Y    |     Z    | 2022-04-05
 X |     Y    |     Z    | 2022-04-06  

Fiddle: http://sqlfiddle.com/#!9/0d4e64/1

Solution

With a suitable source of dates (as given by answers to "Get a list of dates between two dates" or generate days from date range), the dates can be filtered based on the query date range (e.g. ‘2022-04-04’ and ‘2022-04-06’) and joined to the rows in the results. Each date can be joined to each row; to limit these to dates in the interval (start, end) for each result row from presences, simply join between the start & end timestamps:

SET @after := '2022-04-04',
    @before := '2022-04-06';

SELECT *
  FROM `presences` AS p
    JOIN (SELECT * FROM `days` WHERE `day` BETWEEN @after AND @before)

      AS d ON d.`day` BETWEEN Date(p.`start`) AND Date(p.`end`) -- The core of the solution

  WHERE Date(start) <= @before AND Date(end) >= @after
  ORDER BY id, d.`day`
  ;

The above uses a sub-select to filter the days. This can also be done by filtering in the join, or the WHERE clause (as can also be seen in a fiddle):

SELECT p.id, p.employee, p.location, d.day
  FROM `presences` AS p
    JOIN `days` AS d ON d.`day` BETWEEN Date(p.`start`) AND Date(p.`end`)
  WHERE Date(`start`) <= @before AND Date(`end`) >= @after
    AND d.day BETWEEN @after AND @before
  ORDER BY id, d.`day`
  ;
  
  
SELECT p.id, p.employee, p.location, d.day
  FROM `presences` AS p
    JOIN `days` AS d 
      ON    d.`day` BETWEEN @after AND @before 
        AND d.`day` BETWEEN Date(p.`start`) AND Date(p.`end`)
  WHERE Date(`start`) <= @before AND Date(`end`) >= @after
  ORDER BY id, d.`day`
  ;

Which to use is a matter of readability and potentially efficiency. For the example queries above, an EXPLAIN of each shows the same execution plan, so there shouldn’t be any efficiency differences:

+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+-----------------------------------------------------------------+
| id | select_type | table | partitions | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                                           |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+-----------------------------------------------------------------+
|  1 | SIMPLE      | p     | NULL       | ALL   | NULL          | NULL | NULL    | NULL |    4 |      100 | Using where; Using temporary; Using filesort                    |
|  1 | SIMPLE      | d     | NULL       | range | day           | day  | 4       | NULL |    4 |      100 | Using where; Using index; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+-----------------------------------------------------------------+

A Note on Column Types

If the start and end columns will actually only ever hold dates (i.e. the time portion will always be 00:00:00), they can be redefined as the DATE type, and the calls to the Date() function can be removed from the query, simplifying it. You could get away with not redefining the columns and merely removing the calls to Date(), as the various date-time columns are comparable as-is, though this may cause problems should any of the TIMESTAMP columns have a time other than 00:00:00.

The primary advantage of removing functions is to allow indices (here, on start and end) to be used. The example schema didn’t index start and end, so there wouldn’t be any difference in execution time or plans. If, however, you add an index:

CREATE INDEX presence_interval ON presences (`start`, `end`);

then the execution plans for the queries with the calls to Date() removed are all:

+----+-------------+-------+------------+-------+-------------------+-------------------+---------+------+------+----------+-----------------------------------------------------------------+
| id | select_type | table | partitions | type  | possible_keys     | key               | key_len | ref  | rows | filtered | Extra                                                           |
+----+-------------+-------+------------+-------+-------------------+-------------------+---------+------+------+----------+-----------------------------------------------------------------+
|  1 | SIMPLE      | p     | NULL       | range | presence_interval | presence_interval | 5       | NULL |    2 |    33.33 | Using index condition; Using temporary; Using filesort          |
|  1 | SIMPLE      | d     | NULL       | range | day               | day               | 4       | NULL |    4 |      100 | Using where; Using index; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------------+-------+-------------------+-------------------+---------+------+------+----------+-----------------------------------------------------------------+

Note there are fewer rows from the presences column, and the filtered percentage is cut by 2/3.

Answered By – outis

Answer Checked By – Mary Flores (BugsFixing Volunteer)

Leave a Reply

Your email address will not be published.