Table of Contents
Issue
I need to count the number of rows in a LogicalMatrix that are all TRUE
.
Because I need to be able to do this 1 – 2500 million times on a relatively regular basis speed actually really matters:
My current best:
The most efficient / fastest single-process way I’ve figured how to do this is in the how many Rcpp function (hm2
).
My limited profiling abilities show me that the vast majority of the time is spent in doing the if(r_tll == xcolls){...
. I can’t seem to think of a different algorithm that would be faster ( I have tried the break out of the loop as soon as a FALSE
is found and it is much slower).
details that can be assumed:
I can assume that:
- The matrix will always have fewer than 10 million rows.
- All the output matrices from upstream will have the same number of cols (for a given session/process/thread).
- There will never be more than 2326 cols per matrix.
minimal example:
m <- matrix(sample(c(T,F),50000*10, replace = T),ncol = 10L)
head(m)
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#> [1,] FALSE FALSE TRUE FALSE FALSE TRUE TRUE TRUE TRUE FALSE
#> [2,] FALSE FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE TRUE
#> [3,] FALSE TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE TRUE
#> [4,] TRUE TRUE FALSE TRUE TRUE TRUE FALSE FALSE FALSE TRUE
#> [5,] TRUE FALSE FALSE FALSE TRUE TRUE TRUE FALSE TRUE TRUE
#> [6,] FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE FALSE
// [[Rcpp::export]]
int hm(const LogicalMatrix& x){
const int xrows = x.nrow();
const int xcols = x.ncol();
int n_all_true = 0;
for(size_t row = 0; row < xrows; row++) {
int r_ttl = 0;
for(size_t col = 0; col < xcols; col++) {
r_ttl += x(row,col);
}
if(r_ttl == xcols){
n_all_true++;
}
}
return n_all_true;
}
I don’t understand why, but on my machine if I bake in the number of cols it is faster (if someone could explain why this is it would be great too):
// [[Rcpp::export]]
int hm2(const LogicalMatrix& x){
const int xrows = x.nrow();
// const int xcols = x.ncol();
int n_all_true = 0;
for(size_t row = 0; row < xrows; row++) {
int r_ttl = 0;
for(size_t col = 0; col < 10; col++) {
r_ttl += x(row,col);
}
if(r_ttl == 10){
n_all_true += 1;
}
}
return n_all_true;
}
timing:
microbenchmark(hm(m), hm2(m), times = 1000)
#> Unit: microseconds
#> expr min lq mean median uq max neval
#> hm(m) 597.828 599.0995 683.3482 605.397 643.8655 1659.711 1000
#> hm2(m) 236.847 237.6565 267.8787 238.748 253.5280 683.221 1000
Solution
Here’s your function, and the output from compiling it via cppFunction
:
require(Rcpp)
cppFunction('int hm(const LogicalMatrix& x)
{
const int xrows = x.nrow();
const int xcols = x.ncol();
int n_all_true = 0;
for(size_t row = 0; row < xrows; row++) {
int r_ttl = 0;
for(size_t col = 0; col < xcols; col++) {
r_ttl += x(row,col);
}
if(r_ttl == xcols){
n_all_true++;
}
}
return n_all_true;
}')
# file.*.cpp: In function ‘int hm(const LogicalMatrix&)’:
# file.*.cpp:12:29: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
# for(size_t row = 0; row < xrows; row++) {
# ^
# file.*.cpp:14:31: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
# for(size_t col = 0; col < xcols; col++) {
# ^
Note the warnings. I can get a bit of an improvement by using int
instead of size_t
for both row
and col
. Other than that, I can’t find much room for improvement.
And here’s my code, benchmarks, and reproducible example:
require(Rcpp)
require(microbenchmark)
cppFunction('int hm_jmu(const LogicalMatrix& x)
{
const int xrows = x.nrow();
const int xcols = x.ncol();
int n_all_true = 0;
for(int row = 0; row < xrows; row++) {
int r_ttl = 0;
for(int col = 0; col < xcols; col++) {
r_ttl += x(row,col);
}
if(r_ttl == xcols){
n_all_true++;
}
}
return n_all_true;
}')
hm3 <- function(m) {
nc <- ncol(m)
sum(rowSums(m) == nc)
}
set.seed(21)
m <- matrix(sample(c(T,F),50000*10, replace = T),ncol = 10L)
microbenchmark(hm(m), hm3(m), hm_jmu(m), times=1000)
# Unit: microseconds
# expr min lq median uq max neval
# hm(m) 578.844 594.1460 607.357 636.4410 858.347 1000
# hm3(m) 6389.014 6452.9595 6476.197 6735.5465 33720.870 1000
# hm_jmu(m) 409.920 415.0395 424.401 449.0075 650.127 1000
Answered By – Joshua Ulrich
Answer Checked By – Timothy Miller (BugsFixing Admin)