[SOLVED] Performance difference Rust and C++

Issue

I am currently learning Rust, and as a first exercise I wanted to implement a function that computes the nth fibonacci number:

fn main() {
    for i in 0..48 {
        println!("{}: {}", i, fibonacci(i));
    }
}

fn fibonacci(n: u32) -> u32 {
    match n {
        0 => 0,
        1 => 1,
        _ => fibonacci(n - 1) + fibonacci(n - 2),
    }
}

I run it as:

$ time cargo run --release

real    0m15.380s
user    0m15.362s
sys     0m0.014s

As an exercise, I also implemented the same algorithm in C++. I was expecting a similar performance, but the C++ code runs in 80% of the time:

#include<iostream>

unsigned int fibonacci(unsigned int n);

int main (int argc, char* argv[]) {
        for(unsigned int i = 0; i < 48; ++i) {
                std::cout << i << ": " << fibonacci(i) << '\n';
        }
        return 0;
}

unsigned int fibonacci(unsigned int n) {
        if(n == 0) {
                return 0;
        } else if (n == 1) {
                return 1;
        } else {
                return fibonacci(n - 1) + fibonacci(n - 2);
        }
}

Compiled as:

$ g++ test.cpp -o test.exe -O2

And running:

$ time ./test.exe

real    0m12.127s
user    0m12.124s
sys     0m0.000s

Why do I see such a difference in performance? I am not interested in calculating the fibonacci faster in Rust (with a different algorithm); I am only interested on where the difference comes from. This is just an exercise in my progress as I learn Rust.

Solution

TL;DR: It’s not Rust vs C++, it’s LLVM (Clang) vs GCC.

Different optimizers optimize the code differently, and in this case GCC produces larger but faster code.

This can be verified using godbolt.

Here is Rust, compiled with both GCC (via rustgcc-master):

example::fibonacci:
    push    r15
    push    r14
    push    r13
    push    r12
    push    rbp
    xor     ebp, ebp
    push    rbx
    mov     ebx, edi
    sub     rsp, 24
.L2:
    test    ebx, ebx
    je      .L1
    cmp     ebx, 1
    je      .L4
    lea     r12d, -1[rbx]
    xor     r13d, r13d
.L19:
    cmp     r12d, 1
    je      .L6
    lea     r14d, -1[r12]
    xor     r15d, r15d
.L16:
    cmp     r14d, 1
    je      .L8
    lea     edx, -1[r14]
    xor     ecx, ecx
.L13:
    cmp     edx, 1
    je      .L10
    lea     edi, -1[rdx]
    mov     DWORD PTR 12[rsp], ecx
    mov     DWORD PTR 8[rsp], edx
    call    example::fibonacci.localalias
    mov     ecx, DWORD PTR 12[rsp]
    mov     edx, DWORD PTR 8[rsp]
    add     ecx, eax
    sub     edx, 2
    jne     .L13
.L14:
    add     r15d, ecx
    sub     r14d, 2
    je      .L17
    jmp     .L16
.L4:
    add     ebp, 1
.L1:
    add     rsp, 24
    mov     eax, ebp
    pop     rbx
    pop     rbp
    pop     r12
    pop     r13
    pop     r14
    pop     r15
    ret
.L6:
    add     r13d, 1
.L20:
    sub     ebx, 2
    add     ebp, r13d
    jmp     .L2
.L8:
    add     r15d, 1
.L17:
    add     r13d, r15d
    sub     r12d, 2
    je      .L20
    jmp     .L19
.L10:
    add     ecx, 1
    jmp     .L14

And with LLVM (via rustc):

example::fibonacci:
    push    rbp
    push    r14
    push    rbx
    mov     ebx, edi
    xor     ebp, ebp
    mov     r14, qword ptr [rip + example::[email protected]]
    cmp     ebx, 2
    jb      .LBB0_3
.LBB0_2:
    lea     edi, [rbx - 1]
    call    r14
    add     ebp, eax
    add     ebx, -2
    cmp     ebx, 2
    jae     .LBB0_2
.LBB0_3:
    add     ebx, ebp
    mov     eax, ebx
    pop     rbx
    pop     r14
    pop     rbp
    ret

We can see that LLVM produces a naive version — calling the function in each iteration of the loop — while GCC partially unrolls the recursion by inlining some calls. This results in a smaller number of calls in the case of GCC, and at about 5ns of overhead per function call, it’s significant enough.

We can do the same exercise with the C++ version using LLVM via Clang and GCC and note that the result is pretty much similar.

So, as announced, it’s a LLVM vs GCC difference, not a language one.

Incidentally, the fact that optimizers may produce such widely different results is a reason why I am quite excited at the progress of the rustc_codegen_gcc initiative (dubbed rustgcc-master on godbolt) which aims at pluging a GCC backend into the rustc frontend: once complete anyone will be able to switch to the better optimizer for their own workload.

Answered By – Matthieu M.

Answer Checked By – Mary Flores (BugsFixing Volunteer)

Leave a Reply

Your email address will not be published. Required fields are marked *