Zero-Cost Abstraction in Rust: Understanding What the Compiler Actually Does
The Problem
Many of us know Rust as "as fast as C++ but safe". One of the most important mechanisms behind this claim is zero-cost abstraction. But what does this actually mean? Are Iterator chains, closures, Option and Result monadic operations really zero-cost after compilation? Or is this just a marketing slogan?
In this article, we'll look behind the scenes of the Rust compiler. By examining the kind of assembly output you can view in Godbolt (Compiler Explorer), we'll see with concrete examples how abstractions are optimized.
What Is Zero-Cost Abstraction?
In Bjarne Stroustrup's formulation:
- What you don't use, you don't pay for
- What you do use, you couldn't hand-code any better
Rust applies this principle broadly. Generic code is compiled through monomorphization and then aggressively inlined, so static abstractions involve no runtime dispatch. Features that do have an inherent runtime cost, such as dynamic dispatch through dyn Trait, are opt-in and visible in the type signature.
Monomorphization: The Magic Behind Generics
In Rust, generic functions and structs are compiled through monomorphization: the compiler generates separate code for each concrete type they are used with. This is similar to C++ templates, but trait bounds are checked where the generic is defined rather than where it is instantiated.
fn max<T: PartialOrd>(a: T, b: T) -> T {
    if a > b { a } else { b }
}
fn main() {
    let x = max(3i32, 5i32);
    let y = max(3.14f64, 2.71f64);
}
When this code is compiled, two separate functions are generated for max::<i32> and max::<f64>. Looking at the assembly output:
; Optimized assembly for max::<i32>
example::max_i32:
    cmp     edi, esi
    cmovl   edi, esi
    mov     eax, edi
    ret
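; Optimized assembly for max::<f64> (illustrative; exact output depends on the compiler version)
example::max_f64:
    maxsd   xmm0, xmm1
    ret
As you can see, each instantiation uses instructions specific to its type: an integer compare and conditional move (cmp/cmovl) for i32, and a scalar SSE instruction (maxsd) for f64. Conditional moves do not operate on xmm registers, so the floating-point version cannot reuse the integer sequence. No vtable, dynamic dispatch, or boxing anywhere.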
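To make the effect of monomorphization concrete, the sketch below writes out by hand the two functions that the compiler effectively generates for the calls above. The names max_i32 and max_f64 are hypothetical stand-ins chosen for illustration; the real instantiations carry mangled symbol names.

```rust
// Generic version, as in the article.
pub fn max<T: PartialOrd>(a: T, b: T) -> T {
    if a > b { a } else { b }
}

// Hand-written equivalents of what monomorphization produces.
// The names are illustrative only, not what the compiler emits.
pub fn max_i32(a: i32, b: i32) -> i32 {
    if a > b { a } else { b }
}

pub fn max_f64(a: f64, b: f64) -> f64 {
    if a > b { a } else { b }
}
```

Calling max(3i32, 5i32) and max_i32(3, 5) should give the same result and, in an optimized build, comparable machine code.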
Iterators: Chained Operations Are Zero-Cost
One of the most misunderstood topics is iterator chains. Does a chain like .filter().map().sum() build an intermediate collection at every step, the way chained list comprehensions do in Python? No.
pub fn sum_of_squares_even(v: &[i32]) -> i32 {
    v.iter()
        .filter(|&x| x % 2 == 0)
        .map(|&x| x * x)
        .sum()
}
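A simplified, scalar version of the optimized assembly for this code (at opt-level 3 the real output is often auto-vectorized, which makes it longer but keeps the same single-pass structure):
example::sum_of_squares_even:
    test    rsi, rsi
    je      .LBB0_1
    ; Main loop - single pass!
    xor     eax, eax
.LBB0_3:
    mov     ecx, dword ptr [rdi]    ; load element
    mov     edx, ecx
    and     edx, 1                  ; edx = 1 if odd, 0 if even
    imul    ecx, ecx                ; square the element
    neg     edx
    sbb     edx, edx
    not     edx                     ; edx = all ones if even, 0 if odd
    and     ecx, edx                ; keep the square only for even elements
    add     eax, ecx
    add     rdi, 4
    dec     rsi
    jne     .LBB0_3
    ret
.LBB0_1:
    xor     eax, eax
    ret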
The compiler fused the slice iterator, the two adaptors (filter and map), and the sum consumer into a single loop. There is no intermediate storage and no function call; the filter is even branch-free, implemented as a bit mask. That's zero-cost abstraction.
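For comparison, here is a minimal sketch of the loop one would write by hand to get the same result. The function name sum_of_squares_even_loop is made up for this example; the point is only that the iterator chain and the explicit loop express the same single pass.

```rust
// Iterator version, as in the article.
pub fn sum_of_squares_even(v: &[i32]) -> i32 {
    v.iter()
        .filter(|&x| x % 2 == 0)
        .map(|&x| x * x)
        .sum()
}

// Hand-written loop with the same behavior: one pass, no temporary storage.
pub fn sum_of_squares_even_loop(v: &[i32]) -> i32 {
    let mut total = 0;
    for &x in v {
        if x % 2 == 0 {
            total += x * x;
        }
    }
    total
}
```

Pasting both functions into Compiler Explorer is a quick way to check how close the generated code is for a given compiler version.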
How is this possible? Iterator adaptors are generic structures, each wrapping the previous iterator:
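// Actual type of the iterator chain (what the compiler sees).
// Closure types are anonymous and cannot be named in source code,
// so they are shown here as placeholders in square brackets.
std::iter::Map<
    std::iter::Filter<
        std::slice::Iter<'_, i32>,
        [closure: FnMut(&&i32) -> bool]
    >,
    [closure: FnMut(&i32) -> i32]
>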
Because every layer is a concrete, statically known type, each next() call can be inlined into its caller, and LLVM then optimizes the nesting away. Result: a single flat loop, zero overhead.
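One way to see that the adaptors are plain wrapper structs is to measure them. The sketch below assumes that non-capturing closures are zero-sized and that Filter and Map store only the inner iterator plus the closure, which holds on current toolchains but is not a documented layout guarantee. The helper name iterator_sizes is hypothetical.

```rust
use std::mem::size_of_val;

// Returns (size of the bare slice iterator, size of the full adaptor chain).
pub fn iterator_sizes(v: &[i32]) -> (usize, usize) {
    let base = v.iter();
    // Neither closure captures anything, so each one occupies zero bytes.
    let chain = v.iter().filter(|&&x| x % 2 == 0).map(|&x| x * x);
    (size_of_val(&base), size_of_val(&chain))
}
```

Under these assumptions both numbers are equal: wrapping the slice iterator in two adaptors adds no bytes and allocates nothing, since the chain is a value on the stack.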
Option and Result: Null Pointer Optimization
Rust's Option<T> and Result<T, E> types carry a discriminant (tag), right? Not always.
pub fn option_size() {
    // Each pair of types has the SAME size!
    assert_eq!(std::mem::size_of::<Option<&i32>>(), std::mem::size_of::<&i32>());
    assert_eq!(std::mem::size_of::<Option<Box<i32>>>(), std::mem::size_of::<Box<i32>>());
    assert_eq!(std::mem::size_of::<Option<std::num::NonZeroI32>>(), std::mem::size_of::<i32>());
}
This is possible thanks to null pointer optimization (NPO). Types like &T, Box<T>, NonNull<T> can never be null, so the None value is represented with a null pointer. This way Option<&T> is exactly the same size as *const T.
// In this code, no discriminant field is allocated for Option<&T>
pub fn find_even(v: &[i32]) -> Option<&i32> {
    v.iter().find(|&x| x % 2 == 0)
}
In assembly:
example::find_even:
    test    rsi, rsi
    je      .LBB2_4
    ; rdi: slice pointer (doubles as the current element pointer), rsi: length
    ; rdx: end pointer (one past the last element)
    lea     rdx, [rdi + 4*rsi]
.LBB2_2:
    mov     eax, dword ptr [rdi]
    test    al, 1
    je      .LBB2_5             ; found even number
    add     rdi, 4
    cmp     rdi, rdx
    jne     .LBB2_2
.LBB2_4:
    xor     eax, eax            ; None -> return null pointer
    ret
.LBB2_5:
    mov     rax, rdi            ; Some(&value) -> return pointer
    ret
The Option<&i32> return value fits in a single register: null if None, pointer if Some(ptr). No tag whatsoever.
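The contrast with types that have no forbidden bit pattern is instructive. The following sketch uses two hypothetical helpers, option_is_free and niche_report, to check which payload types let Option reuse an invalid value for None and which ones need a separate tag.

```rust
use std::mem::size_of;
use std::num::NonZeroI32;
use std::ptr::NonNull;

// Returns true when wrapping T in Option does not increase its size.
pub fn option_is_free<T>() -> bool {
    size_of::<Option<T>>() == size_of::<T>()
}

// Pairs each type name with the result of the check above.
pub fn niche_report() -> [(&'static str, bool); 6] {
    [
        ("&i32", option_is_free::<&i32>()),
        ("Box<i32>", option_is_free::<Box<i32>>()),
        ("NonNull<i32>", option_is_free::<NonNull<i32>>()),
        ("NonZeroI32", option_is_free::<NonZeroI32>()),
        // Every bit pattern of i32 is a valid value, so a tag is required.
        ("i32", option_is_free::<i32>()),
        // Raw pointers may legitimately be null, so null cannot mean None.
        ("*const i32", option_is_free::<*const i32>()),
    ]
}
```

The first four entries report true and the last two report false, which matches the rule in the text: the optimization applies only when the payload type has a value it can never hold.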
Closures: No Virtual Calls
Every closure expression in Rust has its own unique anonymous type. These types implement one or more of the Fn, FnMut, and FnOnce traits. When a closure is passed to a generic function, monomorphization resolves each call statically.
pub fn apply_twice<F: Fn(i32) -> i32>(f: F, x: i32) -> i32 {
    f(f(x))
}
pub fn use_closure() -> i32 {
    let multiplier = 10;
    apply_twice(|x| x * multiplier, 5)
}
Assembly:
example::use_closure:
    mov     eax, 500    ; 5 * 10 * 10 = 500, completely constant folded!
    ret
The compiler inlined the closure, propagated the constant value of multiplier into it, and performed the entire calculation at compile time. Nothing remains at runtime.
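The folding steps can be traced by hand. In this sketch, use_closure_by_hand is a hypothetical name for what remains once both closure calls are inlined; an optimizing build then reduces either version to the constant.

```rust
pub fn apply_twice<F: Fn(i32) -> i32>(f: F, x: i32) -> i32 {
    f(f(x))
}

// Closure version, as in the article.
pub fn use_closure() -> i32 {
    let multiplier = 10;
    apply_twice(|x| x * multiplier, 5)
}

// The same computation after inlining both calls of the closure.
pub fn use_closure_by_hand() -> i32 {
    let multiplier = 10;
    let first = 5 * multiplier; // inner call: f(5)
    first * multiplier          // outer call: f(f(5))
}
```

Both functions return the same value, and neither needs a closure object at runtime once the optimizer has run.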
What happens if the closure captures state?
pub fn counter() -> impl FnMut() -> i32 {
    let mut count = 0;
    move || {
        count += 1;
        count
    }
}
Here the closure captures the count variable. The compiler creates something like:
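// Anonymous struct created by the compiler (approximately).
// Conceptual sketch only: implementing the Fn* traits by hand is not
// possible on stable Rust, and the real impl also covers FnOnce.
struct CounterClosure {
    count: i32,
}
impl FnMut<()> for CounterClosure {
    fn call_mut(&mut self, _: ()) -> i32 {
        self.count += 1;
        self.count
    }
}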
This struct is 4 bytes (a single i32). No heap allocation, no vtable pointer.
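The size claim can be checked on stable Rust without naming the closure type. This is a small sketch; counter_size is a helper name introduced here for illustration.

```rust
use std::mem::size_of_val;

// Same closure as in the article: it owns a single i32.
pub fn counter() -> impl FnMut() -> i32 {
    let mut count = 0;
    move || {
        count += 1;
        count
    }
}

// Size in bytes of the closure value returned by counter().
pub fn counter_size() -> usize {
    size_of_val(&counter())
}
```

Each call to counter() yields an independent closure value holding its own count, and the reported size is that of the single captured i32.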
dyn Trait vs impl Trait: Making Informed Choices
Zero-cost abstraction isn't always possible. When you use dynamic dispatch (dyn Trait), you pay a cost:
use std::fmt::Display;
// Static dispatch - monomorphized, can be inlined
fn process_static(x: &impl Display) {
    println!("{}", x);
}
// Dynamic dispatch - call via vtable, usually cannot be inlined
fn process_dynamic(x: &dyn Display) {
    println!("{}", x);
}
Differences between them:
| Property | impl Trait / <T: Trait> | dyn Trait |
|---|---|---|
| Dispatch | Static (compile-time) | Dynamic (runtime) |
| Inlinable | Yes | Usually not (only when the compiler can devirtualize the call) |
| Binary size | Grows (separate code for each type) | Small (single code path) |
| Call cost | ~0 when inlined | Roughly 1-2 ns (vtable lookup plus indirect call) |
| Allocation | None | None for &dyn Trait; Box<dyn Trait> heap allocates |
| Type info | Known at compile-time | Erased at runtime (type erasure) |
Practical advice: Use static dispatch in hot paths. Save dyn Trait for heterogeneous collections or when you want to reduce binary size.
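The trade-off can be sketched in a few lines. The function names below (describe_static, describe_dynamic, describe_all, pointer_sizes) are invented for this example; the heterogeneous collection is the case where dyn Trait is hard to avoid.

```rust
use std::fmt::Display;
use std::mem::size_of;

// Static dispatch: one copy of this function is generated per concrete T.
pub fn describe_static<T: Display>(x: &T) -> String {
    format!("<{}>", x)
}

// Dynamic dispatch: a single copy; formatting goes through the vtable.
pub fn describe_dynamic(x: &dyn Display) -> String {
    format!("<{}>", x)
}

// Values of different types in one collection need a trait object.
pub fn describe_all(items: &[Box<dyn Display>]) -> Vec<String> {
    items.iter().map(|item| describe_dynamic(item.as_ref())).collect()
}

// (thin pointer size, trait object pointer size) in bytes.
pub fn pointer_sizes() -> (usize, usize) {
    (size_of::<&i32>(), size_of::<&dyn Display>())
}
```

A reference to a trait object is twice as wide as a plain reference because it carries the vtable pointer alongside the data pointer.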
Memory Layout of Enums
Rust enums are smarter than C unions. The compiler chooses the smallest layout that fits all variants:
use std::mem::size_of;
enum Small {
    A(i32),
    B(i32),
    C,
}
enum Mixed {
    A(i64),
    B(i32, i32),
    C(bool),
    D,
}
// Sizes as produced by current rustc; the default enum layout is not formally guaranteed.
// Small: 8 bytes (4 bytes payload + discriminant, padded to i32 alignment)
// Mixed: 16 bytes (8 bytes payload + discriminant, padded to i64 alignment)
assert_eq!(size_of::<Small>(), 8);
assert_eq!(size_of::<Mixed>(), 16);
The Rust compiler also does niche optimization. If a payload type has bit patterns that are never valid values (a niche), the compiler can encode the discriminant in them:
enum NicheOptimized {
    A(bool), // bool: 0 or 1, the other 254 byte values are unused
    B,
    C,
    D,
    E,
}
// Only 1 byte! (bool's unused 254 patterns encode other variants)
assert_eq!(size_of::<NicheOptimized>(), 1);
This explains why Rust's Option<bool> and Option<Ordering> types don't use extra space.
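A quick check of these sizes, as a sketch: niche_sizes is a hypothetical helper, and the values are what current compilers produce rather than a documented guarantee for every type listed.

```rust
use std::cmp::Ordering;
use std::mem::size_of;

// Sizes in bytes of a few Option types whose payload has spare bit patterns.
pub fn niche_sizes() -> [usize; 4] {
    [
        size_of::<Option<bool>>(),         // bool uses 2 of 256 byte values
        size_of::<Option<Ordering>>(),     // Ordering uses 3 of 256 byte values
        size_of::<Option<Option<bool>>>(), // niches can be consumed repeatedly
        size_of::<Option<char>>(),         // char has invalid code points to spare
    ]
}
```

In each case the Option wrapper has the same size as its payload: 1 byte for the first three and 4 bytes for Option<char>.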
Real-World Example: Writing a Parser
To see the practical benefit of zero-cost abstraction, let's write a simple CSV parser:
pub struct CsvRow<'a> {
    raw: &'a str,
}
impl<'a> CsvRow<'a> {
    // Lazily yields the trimmed fields of this row
    pub fn fields(&self) -> impl Iterator<Item = &'a str> {
        self.raw.split(',').map(|s| s.trim())
    }
    pub fn get(&self, index: usize) -> Option<&'a str> {
        self.fields().nth(index)
    }
}
// Skips empty lines and lines starting with '#'
pub fn parse_csv(input: &str) -> impl Iterator<Item = CsvRow<'_>> {
    input.lines()
        .filter(|l| !l.is_empty() && !l.starts_with('#'))
        .map(|line| CsvRow { raw: line })
}
// Usage
pub fn sum_column(input: &str, col: usize) -> Option<i64> {
    parse_csv(input)
        .filter_map(|row| row.get(col))
        .filter_map(|val| val.parse::<i64>().ok())
        // reduce yields None when no value was parsed; sum().into() would always yield Some
        .reduce(|acc, val| acc + val)
}
This code:
- Zero allocation: No String or Vec is created, everything is borrowed
- Zero copying: &str slices reference the original input
- Single pass: The whole adaptor chain, from lines() to the final summation, runs as one pass over the input
- No panics on malformed input: Missing columns and unparsable values are handled with Option
A Python script with the same functionality would pay the cost of split(), strip(), list creation, and garbage collection for each line.
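Here is a self-contained usage sketch of the parser. It assumes a reduce-based sum_column, so that a column without any parsable value yields None rather than Some(0); the sample input in the note below is invented for illustration.

```rust
pub struct CsvRow<'a> {
    raw: &'a str,
}

impl<'a> CsvRow<'a> {
    // Lazily yields the trimmed fields of this row.
    pub fn fields(&self) -> impl Iterator<Item = &'a str> {
        self.raw.split(',').map(|s| s.trim())
    }

    pub fn get(&self, index: usize) -> Option<&'a str> {
        self.fields().nth(index)
    }
}

// Skips empty lines and lines starting with '#'.
pub fn parse_csv(input: &str) -> impl Iterator<Item = CsvRow<'_>> {
    input
        .lines()
        .filter(|l| !l.is_empty() && !l.starts_with('#'))
        .map(|line| CsvRow { raw: line })
}

// Sums the integer values of one column; None if nothing could be parsed.
pub fn sum_column(input: &str, col: usize) -> Option<i64> {
    parse_csv(input)
        .filter_map(|row| row.get(col))
        .filter_map(|val| val.parse::<i64>().ok())
        .reduce(|acc, val| acc + val)
}
```

For an input such as "name,value\nalice, 10\n# comment\n\nbob,32\n", summing column 1 skips the header (it does not parse as an integer), the comment, and the blank line, and adds up the two remaining values.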
Understanding the Compiler: cargo-asm and Godbolt
To see assembly output of your own code:
# View function-by-function assembly with cargo-asm
cargo install cargo-asm
cargo asm --lib crate_name::function_name
# Online inspection with Godbolt (Compiler Explorer)
# https://rust.godbolt.org/
# Use -C opt-level=3 -C target-cpu=native for maximum optimization
Compiler optimization levels:
# Cargo.toml
[profile.release]
opt-level = 3 # 0-3, s and z (for size)
lto = true # Link-time optimization
codegen-units = 1 # Disable parallel compilation, more aggressive optimization
panic = "abort" # No stack unwind on panic
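lto = true is particularly important. Without it, LLVM optimizes each crate separately, and cross-crate inlining is mostly limited to generic functions and functions marked #[inline]. With LTO, the crates are optimized together and any function can be inlined across crate boundaries. codegen-units = 1 disables parallel codegen within a crate, which slows compilation but gives the optimizer a complete view of the crate.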
Limits and Reality
Zero-cost abstraction has limits. Some cases where it breaks down or carries a hidden cost:
1. Excessive monomorphization (code bloat)
// This code copies the same logic for hundreds of different type combinations
// (Serialize and DeserializeOwned are the serde traits)
fn process<T: Serialize, U: DeserializeOwned>(input: T) -> U {
    // ...
}
Solution: Abstract the internal logic with dyn Trait or enum, make only the input/output layer generic.
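A closely related pattern, shown here as a sketch, keeps the generic layer thin and forwards to a non-generic inner function instead of a trait object. The count_words example and its names are hypothetical; the point is that only the small outer shell is duplicated per type.

```rust
// Generic shell: monomorphized for every S, but it only converts and forwards.
pub fn count_words<S: AsRef<str>>(text: S) -> usize {
    // Non-generic core: compiled once, no matter how many S types are used.
    fn count_words_inner(text: &str) -> usize {
        text.split_whitespace().count()
    }
    count_words_inner(text.as_ref())
}
```

Callers can still pass &str, String, or &String, while the actual logic exists only once in the binary.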
2. Async functions and state machine size
// Each .await is a suspension point; locals that live across it are stored in the future
async fn complex_flow() {
    let a = step1().await;  // State 0
    let b = step2(a).await; // State 1
    let c = step3(b).await; // State 2
    // State machine size: roughly max(sizeof(state_i)) + discriminant
}
Breaking large async functions into smaller pieces reduces state machine size.
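The effect of values living across an .await can be measured without an async runtime, because a future is an ordinary value whose size can be queried. In this sketch the function names and the 1024-byte buffer are arbitrary choices, and exact sizes depend on the compiler version.

```rust
use std::future::ready;
use std::mem::size_of_val;

// The buffer is used after the .await, so it must be stored inside the future.
pub async fn buffer_across_await() -> usize {
    let buf = [1u8; 1024];
    ready(()).await;
    buf.iter().map(|&b| b as usize).sum()
}

// The buffer goes out of scope before the .await, so only the total is kept.
pub async fn buffer_before_await() -> usize {
    let total = {
        let buf = [1u8; 1024];
        let sum: usize = buf.iter().map(|&b| b as usize).sum();
        sum
    };
    ready(()).await;
    total
}

// Returns (size of the first future, size of the second), in bytes.
pub fn future_sizes() -> (usize, usize) {
    (
        size_of_val(&buffer_across_await()),
        size_of_val(&buffer_before_await()),
    )
}
```

The first future has to be at least as large as the buffer it carries, while the second should be far smaller. Limiting what stays alive across suspension points is what makes splitting up large async functions effective.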
3. Box<dyn Future> with type erasure
Boxing a future as a trait object prevents monomorphization: it costs a heap allocation, and every poll goes through the vtable.
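As a minimal sketch of what type erasure looks like in code, the hypothetical helper erase below boxes any future that produces an i32. The caller is left with a fixed-size handle, and the concrete future type is no longer visible to the optimizer.

```rust
use std::future::{ready, Future};
use std::mem::size_of_val;
use std::pin::Pin;

// Moves the future to the heap and erases its concrete type.
pub fn erase<F>(fut: F) -> Pin<Box<dyn Future<Output = i32>>>
where
    F: Future<Output = i32> + 'static,
{
    Box::pin(fut)
}

// Size in bytes of the erased handle: a data pointer plus a vtable pointer.
pub fn erased_handle_size() -> usize {
    size_of_val(&erase(ready(42)))
}
```

The handle stays two pointers wide no matter how large the boxed future is. That uniformity is what makes erased futures convenient to store, and it is also why each poll is an indirect call.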
Summary
Rust's zero-cost abstraction claim is not a marketing slogan, but a direct result of compiler architecture. Thanks to monomorphization, null pointer optimization, niche optimization, and aggressive inlining:
- Iterator chains become a single flat loop
- Generics produce specially optimized code for each concrete type
- Closures are inlined and their captured constants are propagated
- Option<&T> uses no extra memory
- Enums are stored in a compact layout
These aren't "free": you pay with compile time and binary size. But at runtime, the result is typically on par with hand-optimized C code.
Practical advice:
- Use impl Trait instead of dyn Trait in hot paths
- Avoid unnecessary collect() calls: iterators are lazy, don't break the flow
- Set lto = true and codegen-units = 1 in release builds
- Check assembly output of critical functions with cargo-asm
- Break up large async functions, reduce state machine size
Final word: When you hear the word "abstraction" in Rust, don't think of vtables and heap allocation like in Java. Rust's abstractions are zero-cost — and you can prove it by looking at the compiler output.
Tags: rust, zero-cost-abstraction, monomorphization, compiler-optimization, llvm, assembly, iterator, generics, english
Date: 2026-05-26