Book: Rust Atomics and Locks

23 July 2023 · 1842 words

software development

Learning low level concurrency primitives through Rust! I read the book "Rust Atomics and Locks" by Mara Bos, and it helped me develop my understanding of, and build an intuition for, how atomics work in modern computers.

First of all — I highly recommend buying (or borrowing) the book. Buy locally if you can! Support your local book dealers.

Audience

Developers that need a deeper understanding of the inner workings of our programs and machines.

Focus

To me, the most interesting chapters of the book are Chapter 2. Atomics and Chapter 3. Memory Ordering. Other parts of the book build on these two chapters and show us how to create higher level primitives on top of atomics.

Problem statement

To build a standard library for Rust with useful multi-threading primitives like Mutex, or third-party APIs like Tokio's Channel, we need to have a means of building those primitives.

Atomic operations are one of the necessary building blocks to achieve that.

Sidebar with my friend jonasdn

It's completely possible to have a solid systems programming career without ever needing to know about memory ordering or atomics. But it doesn't hurt to know about them either — They're building blocks for the things that you do use.

– @jonasdn@mastodon.nu

Atomics

Atomics enable us to share information between threads in a data race free way. The tools we have in the toolbox are atomic variables, and atomic fences.

Atomic variables are small — way too small to hold any significant amount of data. The maximum size of an atomic variable depends on the platform but can at least have the size of a pointer on that platform, meaning 64 bits for modern platforms like x86-64 or ARM64/AArch64.

If you have an atomic variable, you can write to it from one thread, and the result will be visible to another thread at some (not too distant) time in the future, without further synchronization.

The most important attribute of atomics is that it's possible to use them to order other instructions, meaning it's possible to know that some other data than the atomic variable itself is ready to read fully and correctly.

Memory Ordering

To be able to achieve our goal of correctly and reliably sending data between threads we need to know when we're allowed to access that data. The term when inside a computer program, or even inside a computer, is a complicated thing to nail down.

Compilers are allowed to reorder instructions, as long as the end result is the same. Rust, C, C++, etc. all have one or more optimization steps that reorder and combine different parts of your code in the name of performance. And that's a good thing, because otherwise our programs would be slow.

Processors also reorder instructions — pipelined instructions stages, speculative execution, and other things end up reordering the actual instructions. This is done because CPUs often wait for things like RAM to be loaded into a cache line, and while it's doing that it can complete other instructions that are ready to execute.

How do we bring order to this world? We need to have control over the chaos that are programs running on CPUs. We need to control in which order things happen. But we don't want too much control. That would slow down our programs — compilers and CPUs reorder instructions for a reason!

Atomic variables

An atomic variable stores a small amount of data, and when you load from it or store to it, you need to specify a memory ordering flag.

The memory orderings available in Rust are found in the Ordering enum:

pub enum Ordering {
    // Weakest ordering guarantees
    Relaxed,
    Release,
    Acquire,
    AcqRel,
    // Strongest ordering guarantees
    SeqCst,
}

The typical methods you will use on atomic variables are

load Loads a value from the variable
store Stores a value to the variable
compare_exchange First checks if the value inside the variable is the expected one, and then replaces it if the check passes

There's far more information available in the book, and in the Rust documentation.

Relaxed ordering

Ordering::Relaxed is the weakest ordering. This is useful when simply sharing data between threads inside the atomic variable, and is the most performant.

Unfortunately Ordering::Relaxed can't be used to prevent the compiler or the processor from reordering instructions by itself. Using a Relaxed ordering only guarantees that once written, the value can be safely read from another thread.

Release & Acquire ordering

Ordering::Release is used to store into an atomic variable, and guarantees that any instructions before it on the same thread happen-before any instructions after an Ordering::Acquire load from that same atomic variable on another thread. But not without a few caveats —

If you store into an atomic variable using the Release ordering, and then load that same variable on another thread using an Acquire ordering there is a guarantee that any instructions before the release-store happen-before any instructions after the acquire-load if and only if:

The release-store is observed by the acquire-load — example: if the acquire-load sees an AtomicBoolean change from false to true, and that is the expected precondition for another instruction to have happened, then the instructions of the release-store happen-before the acquire-load.
The release-store was from the same thread as the instructions expected to have happened-before — example: If two (or more) threads store into an atomic variable only one of the threads will have a happens-before relationship with the acquire-load, namely the thread that had its release-store observed by the acquire-load.

The last rule means that if ten threads all release-store (write) to the same atomic variable, only one of them will have a happens-before relationship with the acquire-load (read), and only that one thread can be expected to have had all of its instructions ordered correctly. If we try to read memory written by any of the other threads at that point, that memory might be improperly written or synchronized leading to Undefined Behavior.

Atomic fences

An atomic fence is an operation that does not read or write any data itself, but can be used together with reading and writing to accomplish a certain ordering. One interesting property is that reading and writing can be done using Ordering::Relaxed, and it will still work. This is mostly useful if there are multiple atomic variables that need to be used together with a single fence. Another interesting property of an atomic fence is that it can be issued conditionally.

In short: issuing a Release fence followed by a Relaxed store on one thread, then issuing a Relaxed load followed by an Acquire fence on another thread establishes a happens-before relationship between the fence instructions even though the actual store and the actual load were both Relaxed. This seemed counterintuitive to me when I read it. Especially the part where the store and the load happen between the fences instead of before/after them.

The paragraphs in the Rust Atomics and Locks book covering atomic fences had me asking how fences were supposed to be used. It was all there, of course, but not clear enough for me to fully understand it without additional resources. Jeff Preshing's blog post made it click. Have a look!

What if we don't use atomics?

Let's declare a shared mutable variable that we are going to use in a couple of examples.

// Have some globally modifiable data (unsafe)
static mut GLOBAL_DATA: u128 = 0;

Imagine starting two threads, writing data in one of them, and reading data from the other.

Thread 1:

// Write some shared data using unsafe
unsafe { GLOBAL_DATA = 12345; }

Thread 2:

// How long do we wait here before the data is readable?
let data = unsafe { GLOBAL_DATA };
// do something with data...

It's almost impossible to know how long to wait for the data to become available in the second thread. This is because it could be stuck in an L1 cache inside the CPU, a cache that may "never" be available to us since the threads can live on different CPU cores.

Fixing it using atomic variables and fences

Thread 1:

// Write some shared data using unsafe
unsafe { GLOBAL_DATA = 12345; }
// Then issue a Release fence and a relaxed write
atomic::fence(Ordering::Release);
atomic_bool.store(true, Ordering::Relaxed);

Thread 2:

// Assume that this code is called often (in a loop?)
// At some point atomic_bool will flip to true in this thread
if atomic_bool.load(Ordering::Relaxed) {
    // Since we issue an Acquire fence here _and_ we saw atomic_bool be true,
    // we can be sure that GLOBAL_DATA is safe to read after this fence.
    atomic::fence(Ordering::Acquire);
    let data = unsafe { GLOBAL_DATA };
    // do something with data...
}

Now we've made sure that the compiler isn't allowed to reorder the data write to GLOBAL_DATA to after the store of true to atomic_bool.

And we've made sure that the read of the GLOBAL_DATA can't be reordered to before the load of atomic_bool.

The Orderings we use: Acquire and Release, in the exact places they're used not only makes sure that instructions can't be reordered by the compiler, they also inform the CPU to share or flush relevant cache lines (in L1, L2, etc.) in a way that makes the write to GLOBAL_DATA with Release visible to the next read of GLOBAL_DATA from the thread that executes the atomic Acquire.

As I wrote in the Atomics section above, atomics is what you use to "order other instructions". Instructions that aren't protected by a Mutex, or data that can't fit inside an atomic variable itself. Typically, it's about fully and correctly writing to some shared memory from one thread, signalling that it's ready through atomic operations, and then reading that shared data from another thread.

The other instruction in the example above is the write to the GLOBAL_DATA variable, which isn't atomic.

It's important to get this right — reading data that isn't fully written is Undefined Behavior and at that point all bets are off!

That one weird paragraph

A quote from Rust Atomics and Locks: Chapter 3. Memory Ordering

The basic happens-before rule is that everything that happens within the same thread happens in order. If a thread is executing f(); g();, then f(); happens-before g().

I spent way too much time thinking about this paragraph in the book. As far as I can tell, function calls are allowed to be reordered, since most code can be reordered.

What I think it means is that the observable effects in that thread at runtime are the same as if f(); happened before g();, not that they were necessarily executed in that order in software or even in hardware.

The point the book tries to make in the paragraphs after that is that happens-before relationships between threads only happen for certain types of operations, of which atomic variables and fences are two.

Final words

Use atomics to

Send small amounts of data between threads directly inside an atomic variable
Send large amounts of data between threads by creating happens-before relationships between instructions in different threads using atomic variables and atomic fences

I can't stress enough how important the last point is — the capability to order other instructions than atomic instructions between threads, and use that to correctly synchronize data between them.

Dig deeper by getting a copy of Rust Atomics and Locks by Mara Bos.