[SOLVED] Why does the second spin in a Spinlock give a performance boost?


Here is a basic Spinlock implemented with std::atomic_flag.
The author of the book claims that the second while loop in lock() boosts performance.

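#include <atomic>

class Spinlock {
    std::atomic_flag flag{};
public:
    void lock() {
        while (flag.test_and_set(std::memory_order_acquire)) {  // try to take the lock (writes)
            while (flag.test(std::memory_order_acquire)); // Spin here (read-only)
        }
    }
    // assumed body: clear the flag to release the lock
    void unlock() { flag.clear(std::memory_order_release); }
};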

The reason we use test() in an extra inner loop is performance:
test() doesn’t invalidate cache line, whereas test_and_set() does.

Can someone please elaborate on this quote? test() is still a read operation, so doesn't the value still need to be read from memory?


Reading a memory address does not clear the cache line in other cores' caches.

Writing does.

In a modern computer, there is RAM, and there are multiple layers of cache "around" the CPU (called L1, L2 and L3 cache; the important part is that they are layers, with the CPU in the middle). In a multi-core system, the outer layers are often shared; the innermost layer usually is not, and is private to a given core.

Clearing the cache line means informing every other cache holding this memory "the data you own may be stale, throw it out".

test_and_set() atomically writes true and returns the old value. It clears the cache line because it writes, even when the flag was already true.

test() does not write. If another thread, unsynchronized with this one, is reading its own cached copy of this memory, that cache doesn't have to be poked.

The outer loop writes true, and exits if it replaced false. The inner loop waits until a false is visible, then falls back to the outer loop. The inner loop need not clear every other CPU's cached copy of the atomic flag, but the outer loop has to (as it could change false to true). Since spinning could go on for a while, avoiding continuous cache clearing seems like a good idea.

Answered By – Yakk – Adam Nevraumont

Answer Checked By – David Marino (BugsFixing Volunteer)
