[SOLVED] Make plyr::ddply code compatible with dplyr-equivalent custom function

Issue

I am attempting to adapt a long function (rcompanion::groupwiseMean) to use dplyr instead of plyr::ddply in its code, to avoid depending on the now-retired plyr package.

I would like to define a custom ddply2 function, taking the same arguments as the original plyr function, but with dplyr under the hood. The benefit would be to only redefine the function once at the top of the existing long function/script without changing anything else. My attempts have failed so far. Demo below.

I have been using this resource: plyr::ddply equivalent in dplyr

Original plyr::ddply call

data <- mtcars
var <- "mpg"
group <- c("cyl", "am")

# Original function passed to plyr::ddply:
fun.y <- function(x, idx) { length(x[, idx]) }

# Original plyr::ddply call:
plyr::ddply(.data = data, .variables = group, var, .fun = fun.y)
#>   cyl am V1
#> 1   4  0  3
#> 2   4  1  8
#> 3   6  0  4
#> 4   6  1  3
#> 5   8  0 12
#> 6   8  1  2

This is the function that I CANNOT rewrite

fun.y <- function(x, idx) { length(x[, idx]) }

However this is just an example. Here are some other functions I will need working with ddply2:

fun.z <- function(x, idx) { as.numeric(mean(x[, idx], trim = trim, na.rm = na.rm)) }
fun.w <- function(x, idx) {
      mean(boot(x[, idx], function(y, j) mean(y[j], trim = trim,
                                              na.rm = na.rm), R = R, ...)$t[, 1])
}

Now let’s proceed to the desired ddply2 call, which I am allowed to modify any way I want. However it must take the same arguments as plyr::ddply.

Attempt to rewrite plyr::ddply as ddply2

library(dplyr)

ddply2 <- function(.data, .variables, var, .fun) {
  .data %>%
    group_by(across({{.variables}})) %>%
    do(.fun(., {{var}}))
}

ddply2(.data = data, .variables = group, var, .fun = fun.y)
# Error in `do()`:
# ! Results 1, 2, 3, 4, 5, 6 must be data frames, not integer.

Edit

Again, I cannot rewrite fun.y, fun.z, or fun.w, only ddply2. So solutions based on summarize() or count() will not work as they are not generalizable to other functions. plyr::ddply did not require summarize() or count(); that’s the idea.

Solution

After some discussion I now understand that the goal is to rewrite the function below using dplyr rather than plyr so that, for inputs such as those listed in the inputs section below, it gives the same result.

dd <- function(data, group, var, fun) 
  plyr::ddply(.data = data, .variables = group, var, .fun = fun)

To do that the new function can use group_by with either summarize or group_modify. dd1 below uses the first and dd2 uses the second. Use whichever you prefer.

Note that fun.z, as written, assumes its first argument is a data frame and not a tibble: x[, idx] on a data frame returns a plain vector when a single column is selected, whereas the same expression on a tibble returns another one-column tibble. We therefore call as.data.frame on each group before passing it to the function. Also, plyr returns a data frame, so at the end of dd1 and dd2 we convert the resulting tibble to a data frame to ensure that the result is identical.
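The difference in subsetting behaviour can be seen in a small sketch. The objects df and tb are made up for illustration; fun.y is the function from the question.

```r
library(dplyr)  # dplyr re-exports as_tibble() from the tibble package

# Hypothetical toy data, not from the question
df <- data.frame(a = 1:3, b = 4:6)
tb <- as_tibble(df)

fun.y <- function(x, idx) { length(x[, idx]) }

# A data frame drops to a vector when one column is selected,
# so length() counts rows
fun.y(df, "a")
#> [1] 3

# A tibble stays a one-column tibble, so length() counts columns
fun.y(tb, "a")
#> [1] 1

# Converting back restores the behaviour fun.y relies on
fun.y(as.data.frame(tb), "a")
#> [1] 3
```

This is why a function written for plyr can silently return the wrong value, rather than fail, when it is handed a tibble.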

dd1 <- function(data, group, var, fun)
  data %>% 
    group_by(across(all_of(group))) %>%
    summarize(V1 = fun(as.data.frame(cur_data()), var), .groups = "drop") %>%
    as.data.frame

dd2 <- function(data, group, var, fun)
  data %>%
    group_by(across(all_of(group))) %>%
    group_modify(~ { data.frame(V1 = fun(as.data.frame(.), var)) }) %>%
    ungroup %>%
    as.data.frame
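As of dplyr 1.1.0, cur_data() is deprecated in favour of pick(), so dd1 emits a deprecation warning on recent versions. A possible variant is sketched below, assuming dplyr 1.1.0 or later is installed; the name dd1_pick is made up here and is not part of the original answer.

```r
library(dplyr)  # assumes dplyr >= 1.1.0, where pick() was introduced

# Same idea as dd1, but pick(everything()) supplies the current group's
# non-grouping columns in place of the deprecated cur_data()
dd1_pick <- function(data, group, var, fun)
  data %>%
    group_by(across(all_of(group))) %>%
    summarize(V1 = fun(as.data.frame(pick(everything())), var),
              .groups = "drop") %>%
    as.data.frame()

# Example with the counting function from the question
fun.y <- function(x, idx) { length(x[, idx]) }
dd1_pick(mtcars, c("cyl", "am"), "mpg", fun.y)
```

On older dplyr versions pick() is not available, and dd1 as written above remains the option to use.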

Now test it out

# inputs - start #

data <- mtcars
trim <- 0
na.rm <- FALSE
var <- "mpg"
group <- c("cyl", "am")

fun.z <- function(x, idx) { 
  as.numeric(mean(x[, idx], trim = trim, na.rm = na.rm))
}

# inputs - end #

library(dplyr)

dd.out <- dd(data, group, var, fun.z) # plyr
dd1.out <- dd1(data, group, var, fun.z)
dd2.out <- dd2(data, group, var, fun.z)

identical(dd1.out, dd.out)
## [1] TRUE

identical(dd2.out, dd.out)
## [1] TRUE
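Since the question asks for a ddply2 that can be called exactly like plyr::ddply, dd2 can be wrapped under those argument names. The following is only a sketch under some assumptions: the function returns a single value per group, the result column is called V1 (as plyr names it for unnamed scalar results), and any extra arguments such as var are forwarded through `...`, which is how the original plyr::ddply call passes var to .fun.

```r
library(dplyr)

# Hypothetical drop-in wrapper; mirrors the argument order of plyr::ddply
# (.data, .variables, .fun, ...) but does not cover all of its features
ddply2 <- function(.data, .variables, .fun = NULL, ...) {
  .data %>%
    group_by(across(all_of(.variables))) %>%
    # An explicit function is used instead of a ~ formula so that `...`
    # refers to ddply2's own extra arguments (e.g. var)
    group_modify(function(d, key) {
      data.frame(V1 = .fun(as.data.frame(d), ...))
    }) %>%
    ungroup() %>%
    as.data.frame()
}

# Called with the same arguments as in the question
data <- mtcars
var <- "mpg"
group <- c("cyl", "am")
fun.y <- function(x, idx) { length(x[, idx]) }

ddply2(.data = data, .variables = group, var, .fun = fun.y)
#>   cyl am V1
#> 1   4  0  3
#> 2   4  1  8
#> 3   6  0  4
#> 4   6  1  3
#> 5   8  0 12
#> 6   8  1  2
```

Functions that return a data frame, or that need the grouping columns inside each group, would need extra handling that this sketch leaves out.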

Answered By – G. Grothendieck

Answer Checked By – Dawn Plyler (BugsFixing Volunteer)
