[SOLVED] Fastest Way to count all-TRUE rows of a LogicalMatrix R / C++ / Rcpp

Issue

I need to count the number of rows in a LogicalMatrix that are all TRUE.

Because I need to be able to do this 1 – 2500 million times on a relatively regular basis speed actually really matters:

My current best:

The most efficient / fastest single-process way I’ve figured how to do this is in the how many Rcpp function (hm2).

My limited profiling abilities show me that the vast majority of the time is spent in doing the if(r_tll == xcolls){.... I can’t seem to think of a different algorithm that would be faster ( I have tried the break out of the loop as soon as a FALSE is found and it is much slower).

details that can be assumed:

I can assume that:

  1. The matrix will always have fewer than 10 million rows.
  2. All the output matrices from upstream will have the same number of cols (for a given session/process/thread).
  3. There will never be more than 2326 cols per matrix.

minimal example:

m <- matrix(sample(c(T,F),50000*10, replace = T),ncol = 10L)
head(m)
#>       [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10]
#> [1,] FALSE FALSE  TRUE FALSE FALSE  TRUE  TRUE  TRUE  TRUE FALSE
#> [2,] FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE  TRUE
#> [3,] FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE
#> [4,]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE  TRUE
#> [5,]  TRUE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE
#> [6,] FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE
  // [[Rcpp::export]]
int hm(const LogicalMatrix& x){
  const int xrows = x.nrow();
  const int xcols = x.ncol();
  int n_all_true = 0;

  for(size_t row = 0; row < xrows; row++) {
    int r_ttl = 0;
    for(size_t col = 0; col < xcols; col++) {
      r_ttl += x(row,col);
    }
    if(r_ttl == xcols){
      n_all_true++;
    }
  }
  return n_all_true;
}

I don’t understand why, but on my machine if I bake in the number of cols it is faster (if someone could explain why this is it would be great too):

// [[Rcpp::export]]
int hm2(const LogicalMatrix& x){
  const int xrows = x.nrow();
  // const int xcols = x.ncol();
  int n_all_true = 0;

  for(size_t row = 0; row < xrows; row++) {
    int r_ttl = 0;
    for(size_t col = 0; col < 10; col++) {
      r_ttl += x(row,col);
    }
    if(r_ttl == 10){
      n_all_true += 1;
    }
  }
  return n_all_true;
}

timing:

microbenchmark(hm(m), hm2(m), times = 1000)
#>  Unit: microseconds
#>   expr     min       lq     mean  median       uq      max neval
#>  hm(m) 597.828 599.0995 683.3482 605.397 643.8655 1659.711  1000
#> hm2(m) 236.847 237.6565 267.8787 238.748 253.5280  683.221  1000

Solution

Here’s your function, and the output from compiling it via cppFunction:

require(Rcpp)
cppFunction('int hm(const LogicalMatrix& x)
{
  const int xrows = x.nrow();
  const int xcols = x.ncol();
  int n_all_true = 0;

  for(size_t row = 0; row < xrows; row++) {
    int r_ttl = 0;
    for(size_t col = 0; col < xcols; col++) {
      r_ttl += x(row,col);
    }
    if(r_ttl == xcols){
      n_all_true++;
    }
  }
  return n_all_true;
}')
# file.*.cpp: In function ‘int hm(const LogicalMatrix&)’:
# file.*.cpp:12:29: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
#    for(size_t row = 0; row < xrows; row++) {
#                              ^
# file.*.cpp:14:31: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
#      for(size_t col = 0; col < xcols; col++) {
#                                ^

Note the warnings. I can get a bit of an improvement by using int instead of size_t for both row and col. Other than that, I can’t find much room for improvement.

And here’s my code, benchmarks, and reproducible example:

require(Rcpp)
require(microbenchmark)

cppFunction('int hm_jmu(const LogicalMatrix& x)
{
  const int xrows = x.nrow();
  const int xcols = x.ncol();
  int n_all_true = 0;

  for(int row = 0; row < xrows; row++) {
    int r_ttl = 0;
    for(int col = 0; col < xcols; col++) {
      r_ttl += x(row,col);
    }
    if(r_ttl == xcols){
      n_all_true++;
    }
  }
  return n_all_true;
}')

hm3 <- function(m) {
  nc <- ncol(m)
  sum(rowSums(m) == nc)
}

set.seed(21)
m <- matrix(sample(c(T,F),50000*10, replace = T),ncol = 10L)
microbenchmark(hm(m), hm3(m), hm_jmu(m), times=1000)
# Unit: microseconds
#       expr      min        lq   median        uq       max neval
#      hm(m)  578.844  594.1460  607.357  636.4410   858.347  1000
#     hm3(m) 6389.014 6452.9595 6476.197 6735.5465 33720.870  1000
#  hm_jmu(m)  409.920  415.0395  424.401  449.0075   650.127  1000

Answered By – Joshua Ulrich

Answer Checked By – Timothy Miller (BugsFixing Admin)

Leave a Reply

Your email address will not be published. Required fields are marked *