[SOLVED] Counts of numbers of date for service use in each month

Issue

I’m currently rearranging some health service data. My data frame includes the start and end dates of service use for each individual:

id <- c("A", "A", "B")
start <- c("2018-04-01", "2019-04-02", "2018-09-01")
end <- c("2019-04-01", "2019-04-05", "2018-09-02")
df <- data.frame(id, start, end)

 id        start          end
  A    2018-04-01   2019-04-01
  A    2019-04-02   2019-04-05
  B    2018-09-01   2018-09-02

I want to do the following: (1) calculate the number of days in each month for each period of service use; (2) total the days of service use for each individual; (3) construct new columns for all possible months; and (4) generate a new data frame. The ultimate goal is to construct the following data frame:

 id  2018_Jan 2018_Feb 2018_Mar 2018_Apr 2018_May 2018_Jun ... 2018_Sep ... 2019_Sep
  A     0        0         0        30       31       30   ...     30   ...     1
  B     0        0         0         0        0        0   ...      1   ...     0

The lubridate package and the function command should be helpful here. My question is similar to the post "Count the number of days in each month of a date range", which counts the number of days in each month. However, I’m not sure how to apply that approach to build the data frame that I want.

I will be really grateful for your help on this.

Solution

Here’s a {tidyverse} solution.

  1. Use dplyr::summarize() and seq() to generate the full range of dates for each observation.
    • I use end - 1 in seq() so that the end date is not counted, consistent with your example.
  2. Convert these to months using lubridate::floor_date(unit = "month") (technically, this changes each date to the first of its month).
  3. Use dplyr::count() to tally the days in each month for each id.
  4. Because you want columns for months with no observations in your output, I wrote a function to add unobserved months based on tidyr::complete().
  5. Finally, tidyr::pivot_wider() to get a column for each month.
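
To make steps 1 to 3 concrete, here is a minimal sketch on a single interval. The dates are hypothetical (chosen to straddle a month boundary, not taken from your data), and the names span_start, span_end, days, month_start and counts are just for illustration.

```r
library(lubridate)

# Hypothetical interval that crosses from April into May
span_start <- ymd("2018-04-29")
span_end   <- ymd("2018-05-03")

# Step 1: every date in the interval, leaving out the end date itself
days <- seq(span_start, span_end - 1, by = "day")

# Step 2: replace each date with the first day of its month
month_start <- floor_date(days, unit = "month")

# Step 3: count how many days fall in each month
counts <- table(format(month_start, "%Y-%m"))
counts
```

The four dates (Apr 29, Apr 30, May 1, May 2) collapse to two month labels with 2 days each. The full pipeline does the same thing for every row and then sums within id.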
library(tidyverse)
library(lubridate)

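# Add a row for every month from January of the earliest year through
# December of the latest year, so months with no observations still appear.
complete_months <- function(.data, month, ..., fill = list()) {
  month <- pull(.data, {{ month }})
  firstday <- floor_date(min(month, na.rm = TRUE), unit = "year")
  lastday <- ceiling_date(max(month, na.rm = TRUE), unit = "year") - 1
  allmonths <- seq(firstday, lastday, by = "month")
  # note: the completed column is assumed to be named `month`
  complete(.data, month = allmonths, ..., fill = fill)
}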

month_counts <- df %>%
  mutate(across(start:end, ymd)) %>%
  group_by(id, obs = row_number()) %>%
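  # summarize() returns one row per day here; dplyr >= 1.1.0 warns about
  # multi-row summaries and offers reframe() for this purpose
  summarize(
    # use end - 1 in seq() to omit end date from count
    month = floor_date(seq(start, end - 1, by = 1), unit = "month"),
    .groups = "drop"
  ) %>%
  count(month, id) %>%
  complete_months(month, id, fill = list(n = 0)) %>%
  # %b gives the abbreviated month name in the current locale
  mutate(month = strftime(month, "%Y_%b")) %>%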
  pivot_wider(
    names_from = month,
    values_from = n
  )

month_counts

# # A tibble: 2 x 25
#   id    `2018_Jan` `2018_Feb` `2018_Mar` `2018_Apr` `2018_May` `2018_Jun`
#   <chr>      <int>      <int>      <int>      <int>      <int>      <int>
# 1 A              0          0          0         30         31         30
# 2 B              0          0          0          0          0          0
# # ... with 18 more variables: `2018_Jul` <int>, `2018_Aug` <int>,
# #   `2018_Sep` <int>, `2018_Oct` <int>, `2018_Nov` <int>, `2018_Dec` <int>,
# #   `2019_Jan` <int>, `2019_Feb` <int>, `2019_Mar` <int>, `2019_Apr` <int>,
# #   `2019_May` <int>, `2019_Jun` <int>, `2019_Jul` <int>, `2019_Aug` <int>,
# #   `2019_Sep` <int>, `2019_Oct` <int>, `2019_Nov` <int>, `2019_Dec` <int>
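
One edge case to watch for: if an observation starts and ends on the same date, end - 1 falls before start, and seq() with a positive step stops with an error. A possible guard is sketched below. span_days() is a hypothetical helper, not part of the code above, and it assumes that such a span should simply contribute zero days.

```r
# Hypothetical helper: dates from `start` up to, but not including, `end`.
# Returns a zero-length Date vector when the span contains no days.
span_days <- function(start, end) {
  if (is.na(start) || is.na(end) || end <= start) {
    return(as.Date(character(0)))
  }
  seq(start, end - 1, by = "day")
}

one_day <- span_days(as.Date("2018-09-01"), as.Date("2018-09-02"))
no_days <- span_days(as.Date("2018-09-01"), as.Date("2018-09-01"))

one_day          # a single date, 2018-09-01
length(no_days)  # 0
```

In principle span_days(start, end) could stand in for the seq() call inside summarize(), but I haven't run that variant against your real data. Keep in mind that an id whose spans are all empty would then have no rows to count.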

Answered By – zephryl

Answer Checked By – Pedro (BugsFixing Volunteer)
