Building an operating system from scratch is a rite of passage for systems programmers, and Rust’s promise of memory safety without garbage collection makes it an enticing choice. After years of hands-on experimentation with a Rust-based, ground-up OS, I’ve identified five persistent obstacles that often derail even the most determined developers. These aren’t theoretical stumbling blocks—they’re real, recurring issues that demand careful engineering to overcome.
Why unsafe Code is Inescapable in the Kernel
The kernel’s core mission—managing hardware, handling interrupts, and controlling memory—demands direct, low-level operations that Rust’s safe abstractions can’t provide. While userspace code can neatly wrap unsafe behind carefully designed APIs, the kernel’s bottom layer is inherently unsafe. A single misstep in a page fault handler or memory-mapped I/O access can destabilize the entire system.
The challenge isn’t about reducing unsafe usage; it’s about managing its risks. Assuming only a small fraction of your kernel needs unsafe is a dangerous misconception. Nearly every critical component—the scheduler, memory allocator, and interrupt handlers—relies on it. The solution lies in treating unsafe as a controlled capability, not an afterthought.
- Document every `unsafe` function with a `# Safety` doc section that states its preconditions and invariants, and justify every `unsafe` block with a `// SAFETY:` comment.
- Use compile-time assertions, such as `const_assert!` from the static_assertions crate or a plain `const _: () = assert!(...);`, to enforce structural guarantees before runtime.
- Centralize hardware access in a hardware abstraction layer (HAL) crate, but recognize that the rest of the kernel will still require `unsafe` for core operations.
Consider writing to a memory-mapped register. The function’s documentation must state that the caller guarantees the address is valid and properly aligned, and that the caller holds the device lock:
```rust
/// Writes `value` to a memory-mapped device register.
///
/// # Safety
///
/// `addr` must point to a valid, 4-byte-aligned MMIO register for the device,
/// and the caller must hold the device lock to prevent concurrent access.
pub unsafe fn mmio_write(addr: *mut u32, value: u32) {
    // SAFETY: validity, alignment, and exclusive access are guaranteed by the caller.
    unsafe { addr.write_volatile(value) };
}
```

The comment doesn’t make the operation safe; it documents the contract so future maintainers understand the risks.
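The compile-time assertions and the HAL from the list above can work together. Below is a minimal sketch, assuming a hypothetical UART whose register block is two 32-bit registers (`data` and `status`); `UartRegs` and `Uart` are illustrative names, not from any real crate. The `const` assertions fail the build if the struct layout drifts from the assumed datasheet, and the HAL type confines `unsafe` to a single constructor whose contract is written down once.

```rust
use core::mem::{align_of, offset_of, size_of};
use core::ptr::{addr_of, addr_of_mut};

/// Register block of a hypothetical UART (layout assumed for illustration).
#[repr(C)]
pub struct UartRegs {
    pub data: u32,
    pub status: u32,
}

// Compile-time layout checks: if the struct drifts from the assumed layout, the build fails.
const _: () = assert!(size_of::<UartRegs>() == 8);
const _: () = assert!(offset_of!(UartRegs, status) == 4);
const _: () = assert!(align_of::<UartRegs>() == 4);

/// HAL handle: the only `unsafe` a caller sees is the constructor.
pub struct Uart {
    regs: *mut UartRegs,
}

impl Uart {
    /// # Safety
    ///
    /// `regs` must point to a valid, mapped `UartRegs` block that nothing else
    /// accesses while the returned `Uart` is alive.
    pub unsafe fn new(regs: *mut UartRegs) -> Self {
        Uart { regs }
    }

    /// Writes one byte to the data register.
    pub fn write_byte(&mut self, byte: u8) {
        // SAFETY: `new`'s contract makes `regs` valid and exclusively ours.
        unsafe { addr_of_mut!((*self.regs).data).write_volatile(u32::from(byte)) }
    }

    /// Reads the status register.
    pub fn status(&self) -> u32 {
        // SAFETY: same contract as `write_byte`.
        unsafe { addr_of!((*self.regs).status).read_volatile() }
    }
}
```

A driver calls `unsafe { Uart::new(...) }` once, where it maps the device, and everything downstream of that call is safe code.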
Breaking the Memory Allocation Deadlock
At first glance, Rust’s collection types seem ready-made for kernel development. Even in a `#![no_std]` kernel you want Vec, Box, and Arc, which the alloc crate provides, but they rely on a global allocator. The allocator, in turn, depends on synchronization primitives like locks. Blocking locks require a working scheduler. And the scheduler can’t run without memory allocation. It’s a classic chicken-and-egg dilemma.
Attempting to defer allocator setup until "later" only compounds the problem. Early boot stages often need dynamic data structures to build the initial free memory list, making it impossible to postpone allocation indefinitely.
The workaround is a two-phase approach:
- Phase 1: Bootstrap allocator
Deploy a minimal bump allocator that runs before locks, schedulers, or complex synchronization are available. It allocates memory by incrementing a pointer but cannot free memory, which is acceptable during early boot.
- Phase 2: Full-featured allocator
Once the scheduler and spinlocks are operational, replace the bump allocator with a more sophisticated system like a buddy or slab allocator.
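For Phase 2, the core mechanism of a slab allocator is a free list of equal-sized blocks, which makes both allocation and freeing O(1). The sketch below is a deliberate simplification under stated assumptions: a single block size, single-threaded use, an index-linked free list kept in a side array, and no double-free detection. Real slab allocators keep one cache per object size and sit behind a lock. `BlockPool`, `BLOCK`, and `COUNT` are hypothetical names.

```rust
/// Fixed-size block pool: the core mechanism behind one slab cache.
pub struct BlockPool<const BLOCK: usize, const COUNT: usize> {
    storage: [[u8; BLOCK]; COUNT],
    next_free: [usize; COUNT], // next_free[i] = index of the free block after block i
    head: usize,               // first free block, or COUNT when the pool is exhausted
}

impl<const BLOCK: usize, const COUNT: usize> BlockPool<BLOCK, COUNT> {
    /// `const` so the pool can live in a `static` before any allocator exists.
    pub const fn new() -> Self {
        let mut next_free = [0; COUNT];
        let mut i = 0;
        while i < COUNT {
            next_free[i] = i + 1; // chain every block into the initial free list
            i += 1;
        }
        Self { storage: [[0; BLOCK]; COUNT], next_free, head: 0 }
    }

    /// Pops a free block index in O(1), or returns `None` if the pool is exhausted.
    pub fn alloc(&mut self) -> Option<usize> {
        if self.head == COUNT {
            return None;
        }
        let idx = self.head;
        self.head = self.next_free[idx];
        Some(idx)
    }

    /// Pushes a block back onto the free list in O(1). No double-free check.
    pub fn free(&mut self, idx: usize) {
        self.next_free[idx] = self.head;
        self.head = idx;
    }

    /// Gives access to the bytes of an allocated block.
    pub fn block_mut(&mut self, idx: usize) -> &mut [u8; BLOCK] {
        &mut self.storage[idx]
    }
}
```

Because a freed block goes to the head of the list, the next allocation reuses it immediately, which keeps recently touched memory warm in the cache.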
Here’s a simplified representation of the bootstrap phase:
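```rust
// Bounds of the early heap, set once from the boot memory map (setup not shown).
static mut BOOT_HEAP_START: usize = 0;
static mut BOOT_HEAP_END: usize = 0;
static mut BOOT_HEAP_OFFSET: usize = 0;
/// Safety: call only during single-threaded early boot; `align` must be a power of two.
pub unsafe fn boot_alloc(size: usize, align: usize) -> *mut u8 {
    // Round the next free address up to `align`, then bump past the allocation.
    let start = (BOOT_HEAP_START + BOOT_HEAP_OFFSET + align - 1) & !(align - 1);
    if start + size > BOOT_HEAP_END {
        return core::ptr::null_mut(); // boot heap exhausted
    }
    BOOT_HEAP_OFFSET = start + size - BOOT_HEAP_START;
    start as *mut u8
}
```

Later, you transition to a proper global allocator: implement the GlobalAlloc trait, register it with the #[global_allocator] attribute, and the alloc crate’s Vec, Box, and Arc become available.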
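Registering an allocator with `#[global_allocator]` means implementing the `GlobalAlloc` trait. The following is a minimal sketch of a bump allocator in that form, assuming a fixed, statically reserved arena (`ARENA_SIZE` is an arbitrary illustrative value, and `BumpAlloc` is a hypothetical name). Allocation is a compare-and-swap on an atomic offset, so it needs no lock, and `dealloc` does nothing. In a kernel you would register it with `#[global_allocator] static ALLOC: BumpAlloc = BumpAlloc::new();`; the sketch does not register it, so it can be exercised directly.

```rust
use core::alloc::{GlobalAlloc, Layout};
use core::cell::UnsafeCell;
use core::ptr::null_mut;
use core::sync::atomic::{AtomicUsize, Ordering};

const ARENA_SIZE: usize = 64 * 1024; // illustrative size of the boot heap

/// Lock-free bump allocator over a fixed arena; it never reuses freed memory.
pub struct BumpAlloc {
    arena: UnsafeCell<[u8; ARENA_SIZE]>,
    next: AtomicUsize, // offset of the first unused byte in `arena`
}

// SAFETY: each byte of `arena` is handed out at most once, arbitrated by the atomic `next`.
unsafe impl Sync for BumpAlloc {}

impl BumpAlloc {
    pub const fn new() -> Self {
        BumpAlloc { arena: UnsafeCell::new([0; ARENA_SIZE]), next: AtomicUsize::new(0) }
    }
}

unsafe impl GlobalAlloc for BumpAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let base = self.arena.get().cast::<u8>();
        let mut cur = self.next.load(Ordering::Relaxed);
        loop {
            // Round the absolute address up to the requested alignment.
            let addr = base as usize + cur;
            let aligned = (addr + layout.align() - 1) & !(layout.align() - 1);
            let start = aligned - base as usize;
            let end = match start.checked_add(layout.size()) {
                Some(end) if end <= ARENA_SIZE => end,
                _ => return null_mut(), // arena exhausted: report failure, never panic
            };
            match self.next.compare_exchange_weak(cur, end, Ordering::Relaxed, Ordering::Relaxed) {
                // SAFETY: `start + size <= ARENA_SIZE`, so the pointer stays inside the arena.
                Ok(_) => return unsafe { base.add(start) },
                Err(actual) => cur = actual, // another core won the race; retry
            }
        }
    }

    unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {
        // Intentionally empty: bump memory is reclaimed only when the whole arena is retired.
    }
}
```

When Phase 2 takes over, any allocator that replaces this one still has to recognize pointers that fall inside the bump arena and ignore them in `dealloc`, since objects allocated during boot may be freed later.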
Interrupt Handlers: Where Panics Are Prohibited
Interrupt handlers execute in an unforgiving environment. They must not block, allocate memory, or panic: an allocation can deadlock if the interrupted code already holds the allocator’s lock, and a panic either halts the machine (kernels are typically built with panic = "abort") or, if unwinding were enabled, would try to unwind through the interrupt frame, corrupting the interrupted context. This constraint clashes with idiomatic Rust, where unwrap() and expect() are common sources of panics.
Even debug assertions that might panic are risky. The solution involves isolating interrupt handlers from Rust’s normal control flow:
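- Mark handlers as naked functions (`#[naked]` on older nightly toolchains, `#[unsafe(naked)]` since Rust 1.88), or wrap them in assembly stubs that save and restore registers before calling ordinary Rust code.
- On nightly toolchains older than Rust 1.88, enable `#![feature(naked_functions)]` to allow these raw handler definitions.
- Apply `#![deny(unsafe_op_in_unsafe_fn)]` so every unsafe operation, even inside an `unsafe fn`, needs its own explicitly reviewed `unsafe` block.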
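To see what the `unsafe_op_in_unsafe_fn` lint changes, here is a minimal sketch (the function name `read_reg` is hypothetical). Under the lint, being inside an `unsafe fn` no longer licenses unsafe operations by itself; each one needs its own `unsafe` block and `// SAFETY:` justification. The lint can also be applied to a single function, as here, rather than crate-wide.

```rust
/// Reads a 32-bit device register.
///
/// # Safety
///
/// `addr` must be valid for reads and 4-byte aligned.
#[deny(unsafe_op_in_unsafe_fn)]
pub unsafe fn read_reg(addr: *const u32) -> u32 {
    // Without this block, the volatile read would be a compile error under the lint.
    // SAFETY: the caller upholds the `# Safety` contract above.
    unsafe { addr.read_volatile() }
}
```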
For x86_64 systems, an example IDT entry wrapper might look like this:
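```rust
use core::arch::{asm, naked_asm};

// IDT entry stub; `#[unsafe(naked)]` and `naked_asm!` are stable since Rust 1.88.
#[unsafe(naked)]
extern "C" fn double_fault_handler() -> ! {
    naked_asm!(
        "mov rdi, rsp", // arg 0: pointer to the CPU-pushed error code and frame
        "and rsp, -16", // align the stack as the System V ABI requires
        "call {inner}", // enter ordinary Rust code, which never returns
        inner = sym double_fault_inner,
    );
}

extern "C" fn double_fault_inner(_frame: *const u64) -> ! {
    // Log through a writer that cannot block or panic (not shown), then halt.
    loop {
        // SAFETY: disabling interrupts and halting touches no Rust-visible state.
        unsafe { asm!("cli", "hlt", options(nomem, nostack)) };
    }
}
```

Inside the Rust handler, log the error through a path that cannot block or panic, then halt. No unwinding, no surprises. Because a double fault is not recoverable, this stub does not save registers; a handler that returns must push the caller-saved registers before the call, pop them afterward, discard any error code, and finish with iretq.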
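Because a handler must not allocate, block, or panic, a common pattern is to record the event in a fixed-capacity, lock-free queue and let ordinary kernel code process it later. Below is a minimal single-producer, single-consumer sketch, assuming exactly one interrupt handler pushes and one kernel thread pops; `ScancodeQueue` and `CAP` are illustrative names. When the queue is full, `push` hands the byte back instead of panicking.

```rust
use core::cell::UnsafeCell;
use core::sync::atomic::{AtomicUsize, Ordering};

const CAP: usize = 16; // power of two, so wrapping counters map consistently onto slots

/// SPSC ring buffer: the interrupt handler pushes, a kernel thread pops.
pub struct ScancodeQueue {
    buf: UnsafeCell<[u8; CAP]>,
    head: AtomicUsize, // count of bytes consumed (advanced only by the consumer)
    tail: AtomicUsize, // count of bytes produced (advanced only by the producer)
}

// SAFETY: producer and consumer touch disjoint slots, ordered by the head/tail atomics.
unsafe impl Sync for ScancodeQueue {}

impl ScancodeQueue {
    pub const fn new() -> Self {
        Self { buf: UnsafeCell::new([0; CAP]), head: AtomicUsize::new(0), tail: AtomicUsize::new(0) }
    }

    /// Called from the interrupt handler: no allocation, no blocking, no panic.
    pub fn push(&self, byte: u8) -> Result<(), u8> {
        let tail = self.tail.load(Ordering::Relaxed);
        let head = self.head.load(Ordering::Acquire); // consumer finished with freed slots
        if tail.wrapping_sub(head) == CAP {
            return Err(byte); // full: the caller can count the drop instead of panicking
        }
        // SAFETY: only the producer writes slot `tail % CAP`, and that slot is free.
        unsafe { self.buf.get().cast::<u8>().add(tail % CAP).write(byte) };
        self.tail.store(tail.wrapping_add(1), Ordering::Release); // publish the byte
        Ok(())
    }

    /// Called from ordinary kernel code.
    pub fn pop(&self) -> Option<u8> {
        let head = self.head.load(Ordering::Relaxed);
        let tail = self.tail.load(Ordering::Acquire); // see bytes the producer published
        if head == tail {
            return None;
        }
        // SAFETY: the slot was published by the producer's Release store of `tail`.
        let byte = unsafe { self.buf.get().cast::<u8>().add(head % CAP).read() };
        self.head.store(head.wrapping_add(1), Ordering::Release); // hand the slot back
        Some(byte)
    }
}
```

The queue can live in a `static`, so the handler reaches it without any allocation or lock.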
Spinlocks: A Single Core Isn’t Enough
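Spinlocks seem straightforward: loop until the lock is free. They reveal hidden complexities once interrupts, preemption, and multiple cores enter the picture. On a single core, if an interrupt handler tries to acquire a lock that the interrupted code already holds, it spins forever, because the holder cannot run again until the handler returns. Adding cores does not remove this deadlock, since the handler may run on the very core that holds the lock; and if a lock holder is preempted, every other core contending for that lock spins uselessly until the holder is scheduled again.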
The core issue is that spinlocks require careful integration with the broader system:
- Phase 1: Early boot (single core)
Use a spinlock that disables interrupts to prevent preemption while the lock is held.
- Phase 2: Multicore operation
For locks taken only from thread context, switch to a mutex that parks the waiting thread instead of spinning; data shared with interrupt handlers still needs an interrupt-disabling spinlock.
This shift typically coincides with the scheduler becoming operational. Until then, all locks function as interrupt-disabling primitives.
Here’s a simplified, interrupt-safe spinlock for early boot:
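```rust
use core::cell::UnsafeCell;
use core::sync::atomic::{AtomicBool, Ordering};
// Arch helpers assumed to exist (not shown): disable_interrupts() -> usize,
// enable_and_wait(usize), and restore_interrupts(usize).
pub struct IrqSpinlock<T> {
    lock: AtomicBool,
    data: UnsafeCell<T>, // the protected data
}
pub struct IrqGuard<'a, T> {
    lock: &'a IrqSpinlock<T>,
    flags: usize, // interrupt state saved when the lock was taken
}
impl<T> IrqSpinlock<T> {
    pub fn lock(&self) -> IrqGuard<'_, T> {
        let mut flags = disable_interrupts();
        while self.lock.swap(true, Ordering::Acquire) {
            enable_and_wait(flags); // let pending interrupts run while we spin
            flags = disable_interrupts();
        }
        IrqGuard { lock: self, flags }
    }
}
impl<T> Drop for IrqGuard<'_, T> {
    fn drop(&mut self) {
        self.lock.lock.store(false, Ordering::Release); // release before restoring
        restore_interrupts(self.flags);
    }
}
```

Dropping the guard releases the lock and then restores the saved interrupt state, so interrupts are re-enabled only if they were enabled before lock() was called.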
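The spinlock snippet does not show how callers reach the protected data. Below is a minimal, host-runnable sketch of the same guard pattern with `Deref`/`DerefMut` access and the `Sync` implementation a shared lock needs. The interrupt save and restore is deliberately omitted, because user-space code cannot disable interrupts, so this illustrates only the spinning and RAII-release half; `SpinLock` and `SpinGuard` are illustrative names.

```rust
use core::cell::UnsafeCell;
use core::hint::spin_loop;
use core::ops::{Deref, DerefMut};
use core::sync::atomic::{AtomicBool, Ordering};

pub struct SpinLock<T> {
    locked: AtomicBool,
    data: UnsafeCell<T>,
}

// SAFETY: the lock serializes all access to `data`, so sharing the lock across
// threads is sound whenever `T` itself may be sent between threads.
unsafe impl<T: Send> Sync for SpinLock<T> {}

pub struct SpinGuard<'a, T> {
    lock: &'a SpinLock<T>,
}

impl<T> SpinLock<T> {
    pub const fn new(value: T) -> Self {
        Self { locked: AtomicBool::new(false), data: UnsafeCell::new(value) }
    }

    pub fn lock(&self) -> SpinGuard<'_, T> {
        while self.locked.swap(true, Ordering::Acquire) {
            // While someone else holds it, spin on a plain load so waiters
            // don't keep pulling the cache line into exclusive state.
            while self.locked.load(Ordering::Relaxed) {
                spin_loop();
            }
        }
        SpinGuard { lock: self }
    }
}

impl<T> Deref for SpinGuard<'_, T> {
    type Target = T;
    fn deref(&self) -> &T {
        // SAFETY: holding the guard means this thread owns the lock.
        unsafe { &*self.lock.data.get() }
    }
}

impl<T> DerefMut for SpinGuard<'_, T> {
    fn deref_mut(&mut self) -> &mut T {
        // SAFETY: as above, and `&mut self` prevents aliasing through this guard.
        unsafe { &mut *self.lock.data.get() }
    }
}

impl<T> Drop for SpinGuard<'_, T> {
    fn drop(&mut self) {
        // Release ordering publishes this holder's writes to the next holder.
        self.lock.locked.store(false, Ordering::Release);
    }
}
```

In a kernel, `lock` would additionally save and disable interrupts before spinning, and `drop` would restore them after the release store, as in the IrqSpinlock above.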
The Intertwined Dance of Allocator, Scheduler, and Locks
At the heart of kernel development lies a circular dependency: you want a Vec for process management, but Vec needs an allocator. The allocator requires locks, and locks depend on the scheduler to yield if contested. Meanwhile, the scheduler needs a Vec of runnable processes. Breaking this cycle demands explicit layering and temporary stubs.
A phased approach helps manage these dependencies:
- Bootstrap phase
Use a bump allocator (no locking) and a static mutable array for the process list with fixed capacity.
- Initialization phase
Introduce a simple round-robin scheduler compatible with the bump allocator.
- Production phase
Replace temporary structures with fully dynamic alternatives.
This process isn’t glamorous, but it’s necessary. Skipping phases or cutting corners leads to brittle code that fails under real-world conditions.
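The bootstrap and initialization phases can be sketched without any allocator at all: a fixed-capacity process table held in a plain array, plus a round-robin picker over it. The following is a minimal sketch under that assumption; `ProcessTable`, `Pid`, `MAX_PROCS`, and the two-state `ProcState` are hypothetical, and a real scheduler would also save and restore register context on every switch.

```rust
/// Hypothetical upper bound on processes during bootstrap.
pub const MAX_PROCS: usize = 8;

pub type Pid = usize;

#[derive(Clone, Copy, PartialEq, Debug)]
pub enum ProcState {
    Empty,
    Runnable,
}

/// Fixed-capacity process table: no heap, so it can exist before the allocator.
pub struct ProcessTable {
    slots: [ProcState; MAX_PROCS],
    current: usize, // index of the slot that ran most recently
}

impl ProcessTable {
    /// `const` so the table can be placed in a `static` during early boot.
    pub const fn new() -> Self {
        Self { slots: [ProcState::Empty; MAX_PROCS], current: MAX_PROCS - 1 }
    }

    /// Claims the first free slot; fails instead of growing when the table is full.
    pub fn spawn(&mut self) -> Option<Pid> {
        let pid = self.slots.iter().position(|s| *s == ProcState::Empty)?;
        self.slots[pid] = ProcState::Runnable;
        Some(pid)
    }

    /// Frees a slot so it can be reused by a later `spawn`.
    pub fn exit(&mut self, pid: Pid) {
        self.slots[pid] = ProcState::Empty;
    }

    /// Round-robin: scan forward from the slot after `current`, wrapping around.
    pub fn next_runnable(&mut self) -> Option<Pid> {
        for step in 1..=MAX_PROCS {
            let idx = (self.current + step) % MAX_PROCS;
            if self.slots[idx] == ProcState::Runnable {
                self.current = idx;
                return Some(idx);
            }
        }
        None
    }
}
```

In the production phase, the same `spawn`/`exit`/`next_runnable` interface can be reimplemented over a heap-backed run queue, so callers do not change when the fixed array goes away.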
Looking Ahead: From Prototype to Production System
Rust’s memory safety guarantees make it an attractive language for kernel development, but the path to a reliable OS is anything but straightforward. The challenges aren’t just technical—they’re architectural, requiring thoughtful design, rigorous documentation, and iterative refinement. By confronting these five hard problems head-on, developers can bridge the gap between proof-of-concept and production-ready systems. The journey is long, but the destination—a memory-safe, performant kernel—is within reach for those willing to persist.