[SOLVED] What are generally the most time-consuming parts of Rust compilation?

Issue

Also, how much faster would you estimate that Rust compilation might be, if its compiler was written from scratch (like Go) instead of using LLVM?

Solution

The most time-consuming parts of Rust compilation are generally the optimisation passes and final codegen, though there are probably degenerate cases where other parts are the bottleneck. For example, building rust-csv with -Z time-passes yields this for the final csv crate:

time: 0.000; rss: 174MB monomorphization_collector_root_collections
time: 0.002; rss: 53MB  parse_crate
time: 0.000; rss: 53MB  attributes_injection
time: 0.000; rss: 53MB  recursion_limit
time: 0.000; rss: 53MB  plugin_loading
time: 0.000; rss: 53MB  plugin_registration
time: 0.000; rss: 53MB  pre_AST_expansion_lint_checks
time: 0.000; rss: 56MB  crate_injection
time: 0.000; rss: 57MB  pre_AST_expansion_lint_checks
time: 0.000; rss: 57MB  pre_AST_expansion_lint_checks
time: 0.000; rss: 58MB  pre_AST_expansion_lint_checks
time: 0.000; rss: 58MB  pre_AST_expansion_lint_checks
time: 0.072; rss: 178MB monomorphization_collector_graph_walk
time: 0.000; rss: 60MB  pre_AST_expansion_lint_checks
time: 0.009; rss: 178MB partition_and_assert_distinct_symbols
time: 0.000; rss: 178MB find_cgu_reuse
time: 0.000; rss: 60MB  pre_AST_expansion_lint_checks
time: 0.001; rss: 60MB  pre_AST_expansion_lint_checks
time: 0.000; rss: 61MB  pre_AST_expansion_lint_checks
time: 0.000; rss: 61MB  pre_AST_expansion_lint_checks
time: 0.014; rss: 184MB LLVM_module_optimize_function_passes(bstr.9t899r4h-cgu.1)
time: 0.010; rss: 187MB LLVM_module_optimize_function_passes(bstr.9t899r4h-cgu.2)
time: 0.096; rss: 90MB  expand_crate
time: 0.000; rss: 90MB  check_unused_macros
time: 0.096; rss: 90MB  macro_expand_crate
time: 0.000; rss: 90MB  maybe_building_test_harness
time: 0.001; rss: 90MB  AST_validation
time: 0.000; rss: 90MB  maybe_create_a_macro_crate
time: 0.015; rss: 190MB LLVM_module_optimize_function_passes(bstr.9t899r4h-cgu.11)
time: 0.001; rss: 94MB  complete_gated_feature_checking
time: 0.132; rss: 94MB  configure_and_expand
time: 0.000; rss: 94MB  prepare_outputs
time: 0.008; rss: 192MB LLVM_module_optimize_function_passes(bstr.9t899r4h-cgu.3)
time: 0.028; rss: 99MB  hir_lowering
time: 0.003; rss: 99MB  early_lint_checks
time: 0.001; rss: 101MB setup_global_ctxt
time: 0.000; rss: 101MB dep_graph_tcx_init
time: 0.001; rss: 101MB create_global_ctxt
time: 0.022; rss: 195MB LLVM_module_optimize_function_passes(bstr.9t899r4h-cgu.0)
time: 0.000; rss: 104MB looking_for_entry_point
time: 0.000; rss: 104MB looking_for_plugin_registrar
time: 0.000; rss: 104MB looking_for_derive_registrar
time: 0.031; rss: 107MB misc_checking_1
time: 0.023; rss: 113MB type_collecting
time: 0.001; rss: 113MB impl_wf_inference
time: 0.018; rss: 199MB LLVM_module_optimize_function_passes(bstr.9t899r4h-cgu.14)
time: 0.213; rss: 201MB LLVM_module_optimize_module_passes(bstr.9t899r4h-cgu.1)
time: 0.000; rss: 126MB unsafety_checking
time: 0.000; rss: 126MB orphan_checking
time: 0.052; rss: 126MB coherence_checking
time: 0.177; rss: 203MB LLVM_module_optimize_module_passes(bstr.9t899r4h-cgu.3)
time: 0.014; rss: 203MB LLVM_module_optimize_function_passes(bstr.9t899r4h-cgu.15)
time: 0.036; rss: 204MB LLVM_module_optimize_function_passes(bstr.9t899r4h-cgu.10)
time: 0.202; rss: 205MB LLVM_module_optimize_module_passes(bstr.9t899r4h-cgu.0)
time: 0.172; rss: 205MB LLVM_module_optimize_module_passes(bstr.9t899r4h-cgu.14)
time: 0.335; rss: 205MB LLVM_module_optimize_module_passes(bstr.9t899r4h-cgu.2)
time: 0.006; rss: 205MB LLVM_module_optimize_function_passes(bstr.9t899r4h-cgu.8)
time: 0.326; rss: 206MB LLVM_module_optimize_module_passes(bstr.9t899r4h-cgu.11)
time: 0.157; rss: 131MB wf_checking
time: 0.012; rss: 206MB LLVM_module_optimize_function_passes(bstr.9t899r4h-cgu.6)
time: 0.007; rss: 206MB LLVM_module_optimize_function_passes(bstr.9t899r4h-cgu.5)
time: 0.004; rss: 206MB LLVM_module_optimize_function_passes(bstr.9t899r4h-cgu.7)
time: 0.036; rss: 131MB item_types_checking
time: 0.463; rss: 206MB codegen_to_LLVM_IR
time: 0.000; rss: 206MB assert_dep_graph
time: 0.000; rss: 206MB serialize_dep_graph
time: 0.547; rss: 206MB codegen_crate
time: 0.010; rss: 166MB free_global_ctxt
time: 0.003; rss: 166MB LLVM_module_optimize_function_passes(bstr.9t899r4h-cgu.12)
time: 0.052; rss: 166MB LLVM_module_optimize_module_passes(bstr.9t899r4h-cgu.7)
time: 0.075; rss: 167MB LLVM_module_optimize_module_passes(bstr.9t899r4h-cgu.5)
time: 0.005; rss: 167MB LLVM_module_optimize_function_passes(bstr.9t899r4h-cgu.4)
time: 0.033; rss: 167MB LLVM_module_optimize_module_passes(bstr.9t899r4h-cgu.12)
time: 0.216; rss: 167MB LLVM_module_optimize_module_passes(bstr.9t899r4h-cgu.15)
time: 0.004; rss: 167MB LLVM_module_optimize_function_passes(bstr.9t899r4h-cgu.13)
time: 0.005; rss: 168MB LLVM_module_optimize_function_passes(bstr.9t899r4h-cgu.9)
time: 0.197; rss: 170MB LLVM_module_optimize_module_passes(bstr.9t899r4h-cgu.10)
time: 0.040; rss: 171MB LLVM_module_optimize_module_passes(bstr.9t899r4h-cgu.13)
time: 0.140; rss: 171MB LLVM_module_optimize_module_passes(bstr.9t899r4h-cgu.6)
time: 0.055; rss: 171MB LLVM_module_optimize_module_passes(bstr.9t899r4h-cgu.9)
time: 0.088; rss: 171MB LLVM_module_optimize_module_passes(bstr.9t899r4h-cgu.4)
time: 0.222; rss: 171MB LLVM_module_optimize_module_passes(bstr.9t899r4h-cgu.8)
time: 0.049; rss: 192MB LLVM_lto_optimize(bstr.9t899r4h-cgu.8)
time: 0.047; rss: 193MB LLVM_lto_optimize(bstr.9t899r4h-cgu.0)
time: 0.117; rss: 197MB LLVM_lto_optimize(bstr.9t899r4h-cgu.14)
time: 0.108; rss: 198MB LLVM_lto_optimize(bstr.9t899r4h-cgu.11)
time: 0.131; rss: 199MB LLVM_lto_optimize(bstr.9t899r4h-cgu.15)
time: 0.161; rss: 199MB LLVM_lto_optimize(bstr.9t899r4h-cgu.1)
time: 0.115; rss: 203MB LLVM_lto_optimize(bstr.9t899r4h-cgu.10)
time: 0.315; rss: 207MB LLVM_lto_optimize(bstr.9t899r4h-cgu.2)
time: 0.574; rss: 148MB item_bodies_checking
time: 0.844; rss: 148MB type_check_crate
time: 0.015; rss: 208MB LLVM_lto_optimize(bstr.9t899r4h-cgu.6)
time: 0.118; rss: 209MB LLVM_lto_optimize(bstr.9t899r4h-cgu.4)
time: 0.043; rss: 210MB LLVM_lto_optimize(bstr.9t899r4h-cgu.5)
time: 0.030; rss: 148MB match_checking
time: 0.014; rss: 210MB LLVM_lto_optimize(bstr.9t899r4h-cgu.12)
time: 0.019; rss: 150MB liveness_and_intrinsic_checking
time: 0.049; rss: 150MB misc_checking_2
time: 0.087; rss: 210MB LLVM_lto_optimize(bstr.9t899r4h-cgu.3)
time: 0.050; rss: 210MB LLVM_lto_optimize(bstr.9t899r4h-cgu.13)
time: 0.023; rss: 210MB LLVM_lto_optimize(bstr.9t899r4h-cgu.7)
time: 0.031; rss: 210MB LLVM_lto_optimize(bstr.9t899r4h-cgu.9)
time: 1.256; rss: 213MB LLVM_passes(crate)
time: 0.000; rss: 213MB join_worker_thread
time: 0.812; rss: 213MB finish_ongoing_codegen
time: 0.000; rss: 213MB serialize_work_products
time: 0.000; rss: 213MB link_binary_check_files_are_writeable
time: 0.003; rss: 213MB link_rlib
time: 0.000; rss: 213MB link_binary_remove_temps
time: 0.004; rss: 213MB link_binary
time: 0.004; rss: 213MB link_crate
time: 0.000; rss: 213MB llvm_dump_timing_file
time: 0.817; rss: 213MB link
time: 2.690; rss: 213MB     total

As you can see, the items with large time counts are pretty much all LLVM optimisation passes. That’s why cargo check is so useful: it just typechecks and stops there, so it’s generally much, much faster than a full codegen, even a debug one.
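To compare the entries without eyeballing the log, the lines can be grouped by pass name and summed. The following is only a rough sketch: it assumes every line has the `time: <seconds>; rss: <n>MB <pass>` shape shown above, and `parse_line` and `totals_by_pass` are names made up for this illustration, not part of rustc or Cargo. Note that the entries are nested (for example, codegen_crate contains codegen_to_LLVM_IR) and the per-codegen-unit passes run in parallel, so the sums indicate relative weight rather than wall-clock time.

```rust
use std::collections::HashMap;

/// Parse one line such as
/// `time: 0.213; rss: 201MB LLVM_module_optimize_module_passes(bstr.9t899r4h-cgu.1)`
/// and return the pass name (without the parenthesised suffix) and the seconds.
fn parse_line(line: &str) -> Option<(String, f64)> {
    let rest = line.trim().strip_prefix("time:")?;
    let (secs, rest) = rest.split_once(';')?;
    let secs: f64 = secs.trim().parse().ok()?;
    // The last whitespace-separated field is the pass name.
    let name = rest.split_whitespace().last()?;
    // Drop a trailing "(...)" so per-codegen-unit entries are grouped together.
    let name = name.split('(').next().unwrap_or(name);
    Some((name.to_string(), secs))
}

/// Sum the reported seconds per pass name, largest first.
/// Lines that do not look like timing entries are skipped.
fn totals_by_pass(log: &str) -> Vec<(String, f64)> {
    let mut sums: HashMap<String, f64> = HashMap::new();
    for (name, secs) in log.lines().filter_map(parse_line) {
        *sums.entry(name).or_insert(0.0) += secs;
    }
    let mut out: Vec<_> = sums.into_iter().collect();
    out.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    out
}

fn main() {
    // A few lines copied from the output above.
    let sample = "\
time: 0.213; rss: 201MB LLVM_module_optimize_module_passes(bstr.9t899r4h-cgu.1)
time: 0.177; rss: 203MB LLVM_module_optimize_module_passes(bstr.9t899r4h-cgu.3)
time: 0.315; rss: 207MB LLVM_lto_optimize(bstr.9t899r4h-cgu.2)
time: 0.574; rss: 148MB item_bodies_checking
time: 0.002; rss: 53MB  parse_crate";

    for (name, secs) in totals_by_pass(sample) {
        println!("{secs:>7.3}s  {name}");
    }
}
```

Feeding the whole log into `totals_by_pass` instead of the sample should make it easier to see how the LLVM entries compare with the type-checking ones.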

AFAIK this is a mix of advanced optimisations being plain expensive and rustc historically generating large and complex IR, leaving a bit of a mess for LLVM to untangle. LLVM itself is also heavy and not exactly lightning-fast.

I believe this is slowly improving through better codegen and through adding or moving optimisation passes to MIR, alongside extensive ongoing efforts to chip away at the issue from various angles.

"Also, how much faster would you estimate that Rust compilation might be, if its compiler was written from scratch (like Go) instead of using LLVM?"

Well, if you want fast compilation at the cost of inefficient output, that’s pretty much the value proposition of using Cranelift as the debug backend, so you can get numbers there.

Getting the optimisations in, however, is much more expensive in both person-hours and actual CPU time.

And "debug Rust" (code compiled without optimisations) runs extremely slowly, to the extent that it’s one of the first things people check when somebody asks why their Rust is slower than their Python: 95% of the time it’s because they were compiling in debug.
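One way to rule this out quickly is to have the program report how it was built. This is a minimal sketch, assuming the default Cargo profiles, where debug assertions are on for dev builds and off for release builds. A custom profile can override that, so treat the result as a hint rather than proof. `build_profile` is a hypothetical helper name used only for this illustration.

```rust
/// Hypothetical helper: guess the build profile from `debug_assertions`.
/// With default Cargo profiles this is true for `cargo build` and
/// false for `cargo build --release`.
fn build_profile() -> &'static str {
    if cfg!(debug_assertions) {
        "debug"
    } else {
        "release"
    }
}

fn main() {
    println!("built as: {}", build_profile());
    if cfg!(debug_assertions) {
        // Timings taken from an unoptimised build say little about release speed.
        eprintln!("note: rebuild with `cargo build --release` before measuring performance");
    }
}
```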

As Calvin Weng’s recent series at PingCAP notes, Rust’s model and purpose very much rely on heavy optimisations in order to fulfill its goals and promises.

Answered By – Masklinn

Answer Checked By – Jay B. (BugsFixing Admin)
