[SOLVED] Function by group returning NAs

Issue

I have the following data frame and am trying to write a function that operates on it:

df<- structure(list(BLG = c(37.037037037037, 12.0603015075377, 93.5593220338983, 
3.96563119629874, 77.634011090573, 71.608040201005, 3.96563119629874, 
119.775421085465, 44.8765893792072), GSF = c(0, 0, 0, 0, 11.090573012939, 
0, 0, 0, 0), LMB = c(66.6666666666667, 24.1206030150754, 40.6779661016949, 
31.7250495703899, 73.9371534195933, 67.8391959798995, 31.7250495703899, 
22.4578914535246, 31.413612565445), YLB = c(0, 0, 0, 0, 14.7874306839187, 
0, 0, 0, 0), BLC = c(3.7037037037037, 0, 4.06779661016949, 7.93126239259749, 
7.39371534195933, 11.3065326633166, 7.93126239259749, 3.74298190892077, 
22.4382946896036), WHC = c(7.40740740740741, 0, 0, 0, 0, 0, 0, 
7.48596381784155, 4.48765893792072), RSF = c(0, 0, 0, 0, 0, 0, 
0, 0, 4.48765893792072), CCF = c(3.7037037037037, 0, 8.13559322033898, 
0, 0, 0, 0, 0, 0), BLB = c(0, 0, 0, 0, 0, 0, 0, 0, 0), group = c(1L, 
1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L)), row.names = c(NA, -9L), class = c("data.table", 
"data.frame"))

Function

p_true<- c(83, 10, 47, 8, 9, 6, 12, 5, 8) #true value for each column 

estimate2 = function(df) {
  
  y_est2 = df
  
  sqrt(mean((y_est2-p_true)^2))/p_true*100
}


library(dplyr)

final <- df %>%
  group_by(group) %>%
  group_modify(~ as.data.frame.list(estimate2(.)))

The final output should be a 3×9 data frame, with one value for each column per group. I can get the intended output format with plyr::ddply(df, .(group), estimate2).

Even when I call estimate2(df) directly, without grouping and with the group column removed, R still warns "argument is not numeric or logical: returning NA".

I’m not sure why, though, because I’ve run very similar functions that differ only in the equation inside, and they work fine.

Does anyone know where I’m going wrong?

Solution

The problem is the call to mean(). Its help page (?mean) describes the argument x as follows:

x
An R object. Currently there are methods for numeric/logical vectors and date, date-time and time interval objects. Complex vectors are allowed for trim = 0, only.

A data frame is none of these, so mean() warns and returns NA. In your function, though, it is being asked for the mean of a three-row data frame.
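To illustrate with a small made-up data frame (`toy` is a hypothetical name, not part of the question): mean() has no method for data frames, so it falls through to the default method, which warns and returns NA. Flattening the data with unlist(), or averaging column by column with colMeans(), gives it something numeric to work with.

```r
# Hypothetical toy data, only to illustrate the behaviour
toy <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

# mean() on a data frame warns "argument is not numeric or logical"
# and returns NA; the warning is suppressed here to keep the output clean
res_df <- suppressWarnings(mean(toy))

# unlist() flattens all columns into one numeric vector
res_vec <- mean(unlist(toy))

# colMeans() averages each column separately
res_col <- colMeans(toy)

print(res_df)
print(res_vec)
print(res_col)
```

Which of the two alternatives is appropriate depends on whether one pooled value or one value per column is wanted.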

I’m not entirely sure if the following is what you want, but you can unlist your data frame so that it becomes a vector. mean() then returns a single number per group, and dividing that number by p_true yields one value for each element of p_true. You can then combine the result into a data frame again:

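library(dplyr)

p_true <- c(83, 10, 47, 8, 9, 6, 12, 5, 8) # true value for each column

estimate2 <- function(df) {
  y_est2 <- df

  # unlist() flattens the squared errors into one numeric vector, so mean()
  # returns a single RMSE; dividing by p_true gives one value per column
  return_df <- as.data.frame(t(sqrt(mean(unlist((y_est2 - p_true)^2))) / p_true * 100))
  names(return_df) <- names(y_est2)
  return(return_df)
}

final <- df %>%
  group_by(group) %>%
  group_modify(~ as.data.frame.list(estimate2(.)))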

This returns:

# A tibble: 3 x 10
# Groups:   group [3]
  group   BLG   GSF   LMB   YLB   BLC   WHC   RSF   CCF   BLB
  <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1  38.7  321.  68.3  401.  357.  535.  268.  642.  401.
2     2  45.9  381.  81.1  477.  424.  635.  318.  763.  477.
3     3  45.6  378.  80.4  473.  420.  630.  315.  756.  473.
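
One caveat, in case a separate error per column was intended: in `y_est2 - p_true`, R recycles the vector down the rows of each column, so `p_true[1]` is not paired with the first column, `p_true[2]` with the second, and so on. The table above is therefore one pooled RMSE per group, divided by each true value. The following is only a sketch, assuming the goal is a per-column relative RMSE with `p_true[j]` matched to column `j`. `rel_rmse`, `toy` and `truth` are hypothetical names used for illustration, and base R is used so the example runs without extra packages.

```r
# Assumed goal: for each group, the RMSE of each column against its own
# true value, expressed as a percentage of that true value
rel_rmse <- function(d, truth) {
  # sweep() subtracts truth[j] from column j (MARGIN = 2 means columns)
  err <- sweep(as.matrix(d), 2, truth)
  sqrt(colMeans(err^2)) / truth * 100
}

# Hypothetical toy data: two value columns and a grouping column
toy <- data.frame(a = c(1, 3, 2, 2), b = c(10, 10, 8, 12),
                  group = c(1L, 1L, 2L, 2L))
truth <- c(2, 10) # assumed true values for columns a and b

# Split by group (without the grouping column), then bind one row per group
parts <- split(toy[c("a", "b")], toy$group)
final <- do.call(rbind, lapply(parts, rel_rmse, truth = truth))
print(final)
```

With the data from the question, the same helper could presumably be used inside the dplyr pipeline as `group_modify(~ as.data.frame(t(rel_rmse(.x, p_true))))`, since `.x` does not contain the grouping column.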

Answered By – deschen

Answer Checked By – David Goodson (BugsFixing Volunteer)
