[SOLVED] Writing a function across multiple subgroups

Issue

I am trying to calculate a population parameter for multiple species within their respective sample sites. I have a sample of my df structured as:

Dataframe

df<- structure(list(waterbody = c("Homer", "Homer", "Homer", "Homer", 
"Homer", "Homer", "Homer", "Homer", "Homer", "Homer", "Homer", 
"Homer", "Homer", "Homer", "Homer", "Homer", "Homer", "Homer", 
"Homer", "Homer", "Homer", "Homer", "Homer", "Homer", "Homer", 
"Homer", "Homer", "Homer", "Homer", "Homer", "Homer", "Homer", 
"Homer", "Homer", "Homer", "Homer", "Homer", "Homer", "Homer", 
"Homer", "Homer", "Homer", "Homer", "Homer", "Homer", "Homer", 
"Homer", "Homer", "Homer", "Homer", "Homer"), sample_site = c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L), species = c("LMB", "LMB", "BLG", "LMB", "BLG", "BLG", 
"BLG", "BLG", "BLG", "LMB", "LMB", "LMB", "LMB", "LMB", "BLG", 
"BLG", "LMB", "LMB", "BLG", "BLG", "LMB", "LMB", "LMB", "BLG", 
"BLG", "BLG", "BLG", "BLG", "BLG", "BLG", "BLG", "BLG", "LMB", 
"LMB", "LMB", "BLG", "LMB", "LMB", "LMB", "BLG", "LMB", "LMB", 
"LMB", "BLG", "LMB", "BLG", "LMB", "LMB", "BLG", "LMB", "BLG"
), length_mm = c(430L, 430L, 165L, 345L, 128L, 117L, 93L, 135L, 
161L, 402L, 347L, 450L, 477L, 255L, 115L, 91L, 445L, 335L, 119L, 
124L, 249L, 135L, 361L, 160L, 115L, 130L, 155L, 116L, 158L, 130L, 
126L, 158L, 500L, 330L, 150L, 90L, 333L, 404L, 343L, 150L, 285L, 
303L, 340L, 120L, 420L, 115L, 295L, 322L, 85L, 145L, 185L), stock = c(1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 0, 1), quality = c(1, 1, 1, 1, 0, 0, 0, 0, 
1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 
0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 
1)), row.names = c(NA, -51L), class = "data.frame")

This is filtered down to just two species at two sample sites; my full data frame has hundreds of sample sites and 20+ species. I want to write a function that counts the quality individuals (those with a ‘1’ in the quality column), divides that by the number of stock individuals (again, those with a ‘1’ in the stock column), and multiplies by 100. Manually, this looks like:

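library(dplyr)

# LMB at sample site 1 flagged as quality
a<- filter(df, waterbody=="Homer", sample_site==1, species=="LMB", quality==1)
# LMB at sample site 1 flagged as stock
b<- filter(df, waterbody=="Homer", sample_site==1, species=="LMB", stock==1)

# Percentage of quality individuals relative to stock individuals
(count(a))/(count(b))*100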

Resulting in a value of 83.333 ((10 quality/12 stock)*100). However, I want to do this for each species within each sample site, so that sample sites 1 and 2 each have a value between 0 and 100 for LMB and for BLG.
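As a quick sanity check of that arithmetic, here is a minimal base R sketch. The two vectors are stand-ins built only from the counts quoted above (10 quality, 12 stock), not the actual rows of df, and the variable names are just illustrative.

```r
# Stand-in flags for one species at one site (assumed, built from the quoted counts):
# 12 individuals flagged as stock, 10 flagged as quality.
stock   <- c(rep(1, 12), 0)
quality <- c(rep(1, 10), rep(0, 3))

# Same formula as the manual filter/count approach
psd <- sum(quality == 1) / sum(stock == 1) * 100
print(round(psd, 2))
```

Using sum() on a logical comparison counts the matching rows directly, which is what makes the calculation easy to apply per group later on.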

I’m hoping the end result will be a data frame structured as:

results<- structure(list(waterbody = c("Homer", "Homer", "Homer", "Homer", 
"Homer", "Homer"), transect = c(1L, 1L, 1L, 2L, 2L, 2L), species = c("BLC", 
"BLG", "LMB", "BLC", "BLG", "GSF"), psd = c(50, 31.58, 83.33, 
100, 33.33, 0)), row.names = c(NA, 6L), class = "data.frame")

The math that goes into the function is obviously pretty simple; the issue I’m running into is how to apply it to filtered data so that I am not counting, for example, the number of quality individuals across multiple sample sites.

Any help/insight would be greatly appreciated.

Solution

Here is a dplyr solution:

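library(dplyr)

# One row per waterbody / sample site / species combination
df %>% 
  group_by(waterbody, sample_site, species) %>% 
  # sum() over a logical comparison counts the rows where it is TRUE
  summarise(psd = (sum(quality==1)/sum(stock == 1))*100,
            .groups = "drop")  # return an ungrouped result (dplyr >= 1.0.0)

Output:

  waterbody sample_site species   psd
  <chr>           <int> <chr>   <dbl>
1 Homer               1 BLG      31.6
2 Homer               1 LMB      83.3
3 Homer               2 BLG      33.3
4 Homer               2 LMB      81.8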
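
Since the question asks for a function, the same grouped summary could be wrapped up along these lines. This is only a sketch: calc_psd is a hypothetical name (not part of dplyr), the toy data frame is invented for illustration, and returning NA for groups with no stock individuals is an assumption about how you might want to handle division by zero.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): PSD per waterbody / sample site / species.
calc_psd <- function(data) {
  data %>%
    group_by(waterbody, sample_site, species) %>%
    summarise(
      n_stock   = sum(stock == 1),
      n_quality = sum(quality == 1),
      # Assumption: report NA rather than NaN/Inf when a group has no stock individuals
      psd       = if_else(n_stock > 0, n_quality / n_stock * 100, NA_real_),
      .groups   = "drop"
    )
}

# Toy data, invented for illustration only (not the asker's data)
toy <- data.frame(
  waterbody   = "Homer",
  sample_site = c(1L, 1L, 1L, 1L, 2L, 2L),
  species     = c("LMB", "LMB", "BLG", "BLG", "LMB", "LMB"),
  stock       = c(1, 1, 1, 1, 0, 0),
  quality     = c(1, 0, 1, 1, 0, 0)
)

psd_toy <- calc_psd(toy)
print(psd_toy)
```

Keeping n_stock and n_quality in the result makes it easier to spot groups where the percentage rests on very few individuals. On the real data you would call calc_psd(df).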

Answered By – TarJae

Answer Checked By – Marilyn (BugsFixing Volunteer)
