[SOLVED] What is the use case of std::boyer_moore_searcher?

Issue

I’m trying to find the point of having different search algorithms in C++ standard.

My experiment shows it is somewhat faster in MSVC implementation, but still it is way slower than strstr, because strstr is vectorized. With gcc on Compiler Explorer it is slower than the default, and strstr is much faster: https://godbolt.org/z/zW3o87j8T

So if I don’t care much about performance, I use default std::search, and if I do care, I use strstr or manually-vectorized algorithm. And Boyer–Moore algorithm doesn’t seem to be vectorizable, as it performs large table lookups of individual characters.

Where’s the application area of std::boyer_moore_searcher wide enough to have it standartized?


Here’s my test program and test data:

#include <chrono>
#include <cstring>
#include <iostream>
#include <functional>
#include <random>
#include <search.h>
#include <string>

const std::string haystack =
R"(Lorem ipsum dolor sit amet, consectetur adipiscing elit. Pellentesque hendrerit quam vel dui scelerisque vehicula. Suspendisse ipsum felis,
    mattis vitae tortor at, rutrum semper nibh. Suspendisse dui ante, maximus sit amet lorem in, tempor ornare massa. Sed eget interdum mauris.
    Vivamus iaculis ante odio, quis gravida libero aliquet ut. Orci varius natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus.
    Fusce egestas, felis et sagittis luctus, nisl lacus pretium tortor, sed vestibulum arcu odio nec dolor. Quisque scelerisque diam ac facilisis placerat.
    Cras et ante non nunc ornare ultricies. Pellentesque in feugiat nibh. Etiam tincidunt finibus pharetra. Proin ornare mollis elit, in pretium turpis.
    Aenean pharetra ipsum enim, ultrices iaculis odio interdum eget. Proin ullamcorper ullamcorper eros, ullamcorper condimentum dolor rhoncus eget.
    Fusce ligula velit, posuere quis fermentum id, faucibus ac ipsum.

    Aliquam quis metus eget neque vestibulum malesuada. Suspendisse et fermentum turpis. Phasellus egestas metus quis lacinia pellentesque. Orci varius
    natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. In fringilla justo vitae massa laoreet rutrum. Donec accumsan sem velit,
    quis fermentum mi egestas sit amet. Suspendisse potenti. Vestibulum semper lacinia urna volutpat laoreet. Vestibulum scelerisque libero nunc, vitae
    tincidunt nulla accumsan id. Sed non nisl nec nisi varius viverra nec consectetur augue. Duis lectus eros, mollis eget lectus eget, euismod bibendum
    tortor. Ut pharetra euismod metus.

    Morbi suscipit urna turpis, nec blandit lacus lobortis a. Donec purus nunc, elementum sit amet massa ac, dapibus pharetra enim. Mauris bibendum tempor
    orci rutrum auctor. Donec sed maximus erat, porttitor interdum sapien. Donec at est id nunc pulvinar molestie. Praesent sed facilisis orci, a congue
    tellus. Nulla venenatis at dolor vitae sagittis. Morbi eget dapibus diam. Ut a eros eros. Cras at augue tortor. Nam ac dictum leo, eget placerat urna.
    Etiam non elit vel neque efficitur bibendum in nec nisi. Maecenas massa mi, placerat in efficitur vel, vehicula non nisi. Pellentesque vitae est velit.

    Nam faucibus nibh ex, vitae fringilla odio dignissim eu. Curabitur eu dui at lectus luctus auctor sit amet sed quam. Nam volutpat, nunc dignissim
    posuere aliquam, dui diam tristique quam, at mollis ex urna in nunc. Donec in dui gravida, varius elit vel, facilisis felis. Vivamus diam enim, commodo
    eget porta ac, aliquet at justo. Duis in augue ac ligula malesuada vehicula eget euismod ipsum. Quisque placerat est sit amet ante elementum,
    quis laoreet est auctor.

    Morbi eleifend gravida dui quis dignissim. Integer sit amet iaculis turpis. Duis metus urna, imperdiet at auctor eget, lacinia sagittis magna.
    Ut vestibulum justo at purus egestas, vitae gravida dui semper. Integer mattis facilisis pulvinar. Mauris vitae finibus nunc. In lacinia hendrerit diam,
    et egestas tellus imperdiet quis. In vulputate fringilla tellus nec blandit. Cras dictum consequat neque, nec laoreet nibh dictum et. Praesent venenatis
    enim sed rhoncus elementum. Nunc in magna in erat pretium volutpat aliquet eget purus. Maecenas porttitor velit a hendrerit dignissim. Mauris ullamcorper
    mauris et magna aliquet fermentum. Ut imperdiet porta erat.)";

const std::string needle = "sed quam";

const void* volatile discard;

int main()
{
    constexpr int N = 1'000'000;

    auto t1 = std::chrono::steady_clock::now();
    for (int i = 0; i < N; ++i) {
        discard = std::strstr(haystack.c_str(), needle.c_str());
    }
    auto t2 = std::chrono::steady_clock::now();
    std::default_searcher s1{ needle.begin(), needle.end() };
    for (int i = 0; i < N; ++i) {
        discard = &*std::search(haystack.begin(), haystack.end(), s1);
    }
    auto t3 = std::chrono::steady_clock::now();
    std::boyer_moore_searcher s2{ needle.begin(), needle.end() };
    for (int i = 0; i < N; ++i) {
        discard = &*std::search(haystack.begin(), haystack.end(), s2);
    }
    auto t4 = std::chrono::steady_clock::now();
    std::boyer_moore_horspool_searcher s3{ needle.begin(), needle.end() };
    for (int i = 0; i < N; ++i) {
        discard = &*std::search(haystack.begin(), haystack.end(), s2);
    }
    auto t5 = std::chrono::steady_clock::now();

    std::cout
        << "strstr  " << std::chrono::duration_cast<std::chrono::duration<double>>(t2 - t1) << "\n"
        << "default " << std::chrono::duration_cast<std::chrono::duration<double>>(t3 - t2) << "\n"
        << "BM      " << std::chrono::duration_cast<std::chrono::duration<double>>(t4 - t3) << "\n"
        << "BMH     " << std::chrono::duration_cast<std::chrono::duration<double>>(t5 - t4) << "\n";
}

Solution

It depends on input size and pattern size.

My benchmark had too small pattern size.

I tried with the author’s data from here: https://github.com/mclow/search-library. I was not able to build benchmark program quickly, so I used data from there with my benchmark program. It showed that for ~100 bytes pattern strstr is better, but for ~8 KB patterns BM and BMH are better.

Answered By – Alex Guteniev

Answer Checked By – David Goodson (BugsFixing Volunteer)

Leave a Reply

Your email address will not be published. Required fields are marked *