[SOLVED] Replicating dplyr pipe structure with apply family or loop

Issue

I have a data frame df in which for each column I want to calculate what share of occurrences also occur in another column. Each row of occurrences has a weight so ideally I would like to get a weighted share.

A <- c(0, 1, 0, 0, 1, 0, 1, 1, 1, 0)
B <- c(0, 1, 0, 1, 1, 0, 0, 0, 0, 0)
C <- c(0, 0, 0, 1, 1, 0, 0, 0, 0, 1)
D <- c(1, 0, 0, 1, 1, 0, 0, 0, 0, 0)
weight <- c(0.5, 1, 0.2, 0.3, 1.4, 1.5, 0.8, 1.2, 1, 0.9)
df <- data.frame(A, B, C, D, weight)

I was trying to calculate it for each column pair this way:

library(dplyr)

#total weight of occurrences in A
wgt_A <- df%>%
  filter(A == 1)%>%
  summarise(weight_A = sum(weight))%>%
  select(weight_A)

#weighted share of occurrences in A that also occur in B
wgt_A_B <- df%>%
  filter(A == 1, B == 1)%>%
  summarise(weight_A_B = sum(weight))%>%
  select(weight_A_B)

Result_1 <- wgt_A_B / wgt_A 

I would want to end up with six results in total for all combinations of the 4 columns. However, for this I would need to replicate this dplyr pipe a lot of times and my actual dataset has 20+ columns like this. Is there a more efficient/quicker way to do this with apply/sapply or some kind of loop where I can also select for which columns I want to perform this?

I’m new to R and Stack Overflow, so please let me know (and excuse me) if I’m doing or saying anything stupid.

Solution

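We can use combn from base R to generate every pair of columns and compute the weighted share for each pair. Each result is the share of the first column's weighted occurrences that also occur in the second column.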

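# For each column pair: weight where both are 1, divided by the total weight of the first column
out <- combn(df[1:4], 2, FUN = function(x)
    sum(df$weight[x[[1]] & x[[2]]]) / sum(df$weight[as.logical(x[[1]])]))
# Label each result with its column pair, e.g. "A_B"
names(out) <- combn(names(df)[1:4], 2, FUN = paste, collapse = "_")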

Output:

> out
      A_B       A_C       A_D       B_C       B_D       C_D 
0.4444444 0.2592593 0.2592593 0.6296296 0.6296296 0.6538462 
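
The question also asks about choosing which columns to include. combn only returns each pair in one direction (A_B but not B_A), and the share is not symmetric. The following is a minimal sketch of one possible alternative, not part of the original answer: it computes the shares for both directions at once with a matrix cross product. The helper name weighted_share_matrix and its arguments are hypothetical, and it assumes the selected columns contain only 0/1 indicators.

```r
# Example data from the question
df <- data.frame(
  A = c(0, 1, 0, 0, 1, 0, 1, 1, 1, 0),
  B = c(0, 1, 0, 1, 1, 0, 0, 0, 0, 0),
  C = c(0, 0, 0, 1, 1, 0, 0, 0, 0, 1),
  D = c(1, 0, 0, 1, 1, 0, 0, 0, 0, 0),
  weight = c(0.5, 1, 0.2, 0.3, 1.4, 1.5, 0.8, 1.2, 1, 0.9)
)

# Hypothetical helper (an illustration, not from the answer above).
# Returns a matrix where entry [i, j] is the weighted share of
# occurrences in column i that also occur in column j.
weighted_share_matrix <- function(data, cols, weight_col = "weight") {
  x <- 1 * (as.matrix(data[cols]) == 1)  # numeric 0/1 indicator matrix
  w <- data[[weight_col]]                # one weight per row
  # joint[i, j] = total weight of rows where both column i and column j are 1;
  # the diagonal holds the total weight of each single column
  joint <- crossprod(x * w, x)
  # divide each row i by the total weight of column i
  joint / diag(joint)
}

# Pick any subset of columns by name
shares <- weighted_share_matrix(df, c("A", "B", "C", "D"))
round(shares, 4)
```

Here shares["A", "B"] matches the A_B value above (0.4444444), while shares["B", "A"] gives the reverse direction. A column with no occurrences has zero total weight, so its row would come out as NaN and may need separate handling.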

Answered By – akrun

Answer Checked By – Katrina (BugsFixing Volunteer)
