Issue
I wrote a function that uses aggregate() to both sum the values of a specific column and count the rows of that column, grouped by the values of an adjacent column (in this case numbers between 6 and 12). The column to sum and count is called Count, and the column to group by is called CharLen.
Here are two tiny data frames, which are then placed in a list:
# Test df1
{
Seq1 <- as.character(rep(c("AAA", "BBB", "CCC"),times = 4))
Count1 <- rep(c(12,56,3),times = 4)
CharLen1 <- c(6,6,6,7,7,7,9,11,12,8,10,9)
Testdf1 <- data.frame(Seq1, Count1, CharLen1); colnames(Testdf1) <- c("Seq", "Count", "CharLen")
rm(Seq1)
rm(Count1)
rm(CharLen1)
}
# Test df2
{
Seq2 <- as.character(c("DDD", "EEE", "FFF", "AAA", "BBB", "GGG", "AAA", "BBB", "CCC", "AAA", "BBB", "CCC"))
Count2 <- rep(c(7,3,15),times = 4)
CharLen2 <- c(8,6,8,7,12,12,12,11,12,8,10,9)
Testdf2 <- data.frame(Seq2, Count2, CharLen2); colnames(Testdf2) <- c("Seq", "Count", "CharLen")
rm(Seq2)
rm(Count2)
rm(CharLen2)
}
# List these dataframes together
List_of_dfs <- lapply(ls(pattern="Testdf[0-9]+"), function(x) get(x))
I wrote this as a function so that I can pass it a list ("List_of_dfs") of many large data frames with differing numbers of rows.
(The data frames always have the same number of columns, the same column names, and the same value types.)
Function
SumCountFunction <- function(i) {
aggregate(Count ~ CharLen, data=i, FUN = function(x) c(Sum=sum(x),
Count=length(x)))
}
Apply the function to the list of data frames with lapply():
SummayCountOut <- lapply(List_of_dfs, SumCountFunction)
Once done, I combine the results into a single summary data frame:
SummaryDf <- do.call("rbind", SummayCountOut)
Then I add a numerical ID corresponding to each data frame's position in the original List_of_dfs:
SummaryDf[["SampleNumber"]] <- rep(seq_along(SummayCountOut), sapply(SummayCountOut, nrow))
My question and confusion is this:
- When I generate "SummayCountOut" the console correctly shows two new columns of data: "Count.Sum" & "Count.Count".
- When I convert to the single large summary dataframe "SummaryDf" this also shows correct data.
- But when I use View(SummaryDf) instead of printing SummaryDf directly, the two new columns I need have disappeared.
From what I can find, is this because the object only exists while the function is being called? I tried using "return", as suggested in another SO thread, but this didn’t retain the new columns. The only other thing I found was "<<-", which others here have described as inherently evil.
Originally I was piping in dplyr using group_by and summarise. I couldn’t get the dplyr code into a function, though (I think due to NSE or lazy evaluation?), hence my wish to use base R instead.
Solution
Basically, your SumCountFunction produces an embedded matrix of two columns rather than a flat data frame. You can see this with a str() call, where Count is a matrix of 14 rows and 2 columns:
str(SummaryDf)
# 'data.frame': 14 obs. of 2 variables:
# $ CharLen: num 6 7 8 9 10 11 12 6 7 8 ...
# $ Count : num [1:14, 1:2] 71 71 12 15 56 56 3 3 7 29 ...
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : NULL
# .. ..$ : chr "Sum" "Count"
The challenge is that aggregate() applies a single function to each group. When that function returns c(Sum = ..., Count = ...), the two results are stored together as one matrix-valued column rather than as two separate columns.
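To make the matrix column concrete, here is a minimal sketch on a small made-up data frame (toy and agg are hypothetical names, not the asker's objects). It only inspects what aggregate() returns when FUN yields two values per group.

```r
# Hypothetical toy data, not the data frames from the question
toy <- data.frame(Count   = c(12, 56, 3, 12),
                  CharLen = c(6, 6, 7, 7))

# Same pattern as the original SumCountFunction
agg <- aggregate(Count ~ CharLen, data = toy,
                 FUN = function(x) c(Sum = sum(x), Count = length(x)))

ncol(agg)              # 2: CharLen plus one matrix-valued column
is.matrix(agg$Count)   # TRUE
colnames(agg$Count)    # "Sum" "Count"

# The values are still there; they are reached by indexing into the matrix
agg$Count[, "Sum"]     # 68 15
agg$Count[, "Count"]   # 2 2
```

So the data is not lost; it sits inside a single column named Count rather than in two top-level columns.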
Consider merging two (or more) separate aggregate() calls and then renaming the columns so that the original Count column name is not repeated.
# TWO-DF MERGE
SumCountFunction <- function(i) {
merge(aggregate(Count ~ CharLen, data=i, FUN = sum),
aggregate(Count ~ CharLen, data=i, FUN = length),
by = "CharLen")
}
# CHAIN MERGE (ALTERNATIVE)
SumCountFunction <- function(i) {
dfs <- lapply(c('sum', 'length'), function(f) aggregate(Count ~ CharLen, data=i, FUN = f))
Reduce(function(x, y) merge(x, y, by = "CharLen"), dfs)
}
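# Re-run the lapply with the redefined function
SummayCountOut <- lapply(List_of_dfs, SumCountFunction)
# merge() returns the sum column first, then the length (count) column
SummaryDf <- setNames(do.call("rbind", SummayCountOut),
c("CharLen", "Count.Sum", "Count.Count"))
str(SummaryDf)
# 'data.frame': 14 obs. of 3 variables:
# $ CharLen : num 6 7 8 9 10 11 12 6 7 8 ...
# $ Count.Sum : num 71 71 12 15 56 56 3 3 7 29 ...
# $ Count.Count: int 3 3 1 2 1 1 1 1 1 3 ...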
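As a possible alternative to merging, the matrix column can also be flattened after a single aggregate() call by passing the result through do.call(data.frame, ...). The sketch below uses a hypothetical helper name, SumCountFlat, and made-up toy data; it is one way to do it under those assumptions, not the method from the answer above.

```r
# Hypothetical toy data, not the data frames from the question
toy <- data.frame(Count   = c(12, 56, 3, 12),
                  CharLen = c(6, 6, 7, 7))

# SumCountFlat is an illustrative name: one aggregate() call, then
# data.frame() expands the matrix column into ordinary columns
SumCountFlat <- function(df) {
  agg <- aggregate(Count ~ CharLen, data = df,
                   FUN = function(x) c(Sum = sum(x), Count = length(x)))
  do.call(data.frame, agg)
}

flat <- SumCountFlat(toy)
names(flat)   # "CharLen" "Count.Sum" "Count.Count"
ncol(flat)    # 3
```

Because each result is already flat, lapply(List_of_dfs, SumCountFlat) followed by do.call("rbind", ...) should need no renaming step.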
Answered By – Parfait
Answer Checked By – Cary Denson (BugsFixing Admin)