Issue
I have a data frame df
in which for each column I want to calculate what share of occurrences also occur in another column. Each row of occurrences has a weight so ideally I would like to get a weighted share.
A <- c(0, 1, 0, 0, 1, 0, 1, 1, 1, 0)
B <- c(0, 1, 0, 1, 1, 0, 0, 0, 0, 0)
C <- c(0, 0, 0, 1, 1, 0, 0, 0, 0, 1)
D <- c(1, 0, 0, 1, 1, 0, 0, 0, 0, 0)
weight <- c(0.5, 1, 0.2, 0.3, 1.4, 1.5, 0.8, 1.2, 1, 0.9)
df <- data.frame(A, B, C, D, weight)
I was trying to calculate it for each column pair this way:
#total weight of occurences in A
wgt_A <- df%>%
filter(A == 1)%>%
summarise(weight_A = sum(weight))%>%
select(weight_A)
#weighted share of occurrences in A that also occur in B
wgt_A_B <- df%>%
filter(A == 1, B == 1)%>%
summarise(weight_A_B = sum(weight))%>%
select(weight_A_B)
Result_1 <- wgt_A_B / wgt_A
I would want to end up with six results in total for all combinations of the 4 columns. However, for this I would need to replicate this dplyr pipe a lot of times and my actual dataset has 20+ columns like this. Is there a more efficient/quicker way to do this with apply/sapply or some kind of loop where I can also select for which columns I want to perform this?
I’m new to R and stackoverflow so please let me know (and excuse me) if I’m doing/saying anything stupid
Solution
We may use combn
to do the combinations in base R
out <- combn(df[1:4], 2, FUN = function(x)
sum(df$weight[x[[1]] & x[[2]]])/ sum(df$weight[as.logical(x[[1]])]) )
names(out) <- combn(names(df)[1:4], 2, FUN = paste, collapse = "_")
-output
> out
A_B A_C A_D B_C B_D C_D
0.4444444 0.2592593 0.2592593 0.6296296 0.6296296 0.6538462
Answered By – akrun
Answer Checked By – Katrina (BugsFixing Volunteer)