DDIA chapter 5: replication

Kleppmann’s chapter on replication is the kind of writing that reorganises a part of your brain you didn’t know was disorganised. I’ve worked with Postgres streaming replication for years and ‘understood’ it in the way you understand a tool by its symptoms; this chapter gave me the vocabulary to talk about why the symptoms are the way they are.

The headline insight, for me, was the framing of single-leader / multi-leader / leaderless as three points on a tradeoff curve, not three different categories of database. Once you see them as variations on ‘how do we order writes’, the rest — conflict resolution, eventual consistency, read-your-writes — falls out naturally.

What I came away with: I now understand why our reporting replica is two minutes behind production at peak load, and why that is a feature. Replication to it is asynchronous, so the leader never waits for the replica to confirm a write. Under load the replica simply falls behind and keeps replaying, instead of pushing back-pressure onto production. I also finally understand the read-your-writes anomaly I've seen in customer-support tooling: a write goes to the leader, the very next read goes to a replica that hasn't caught up yet, the page shows stale data, and a tier-1 agent is left confused.
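To check my own understanding, here is a toy model of that anomaly, plus one of the fixes the chapter describes: the client remembers the log position of its latest write, and a read only goes to the replica if the replica has replayed at least that far. Otherwise it falls back to the leader. Everything here (the `Leader` and `AsyncReplica` classes, the `read_your_writes` helper, the integer log positions) is my own hypothetical sketch in plain Python, not how Postgres implements anything. Postgres does expose comparable WAL positions, e.g. `pg_last_wal_replay_lsn()` on a standby.

```python
from dataclasses import dataclass, field


@dataclass
class Leader:
    """Single leader: every write gets the next position in one ordered log."""
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # entries are (lsn, key, value); lsn n sits at index n - 1

    def write(self, key, value):
        lsn = len(self.log) + 1
        self.log.append((lsn, key, value))
        self.data[key] = value
        return lsn  # the client keeps this to know how fresh its next read must be


@dataclass
class AsyncReplica:
    """Replays the leader's log on its own schedule; the leader never waits for it."""
    data: dict = field(default_factory=dict)
    applied_lsn: int = 0

    def catch_up(self, leader, max_entries):
        # Replay at most max_entries; whatever is left over is replication lag.
        pending = leader.log[self.applied_lsn:self.applied_lsn + max_entries]
        for lsn, key, value in pending:
            self.data[key] = value
            self.applied_lsn = lsn


def read_your_writes(leader, replica, key, last_write_lsn):
    """Serve from the replica only if it has replayed the client's last write."""
    if replica.applied_lsn >= last_write_lsn:
        return replica.data.get(key)
    return leader.data.get(key)  # replica is behind: fall back to the leader


if __name__ == "__main__":
    leader, replica = Leader(), AsyncReplica()
    leader.write("ticket:42", "open")
    replica.catch_up(leader, max_entries=10)
    lsn = leader.write("ticket:42", "closed")  # the replica hasn't seen this yet

    print(replica.data.get("ticket:42"))  # prints "open": the stale read the agent sees
    print(read_your_writes(leader, replica, "ticket:42", lsn))  # prints "closed"
```

Falling back to the leader is only one option; the chapter also mentions sending the read to another replica or waiting for this one to catch up. The catch with falling back is that reads shift onto the leader precisely when replicas are lagging, which at peak load is often when the leader is busiest.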

Next up: I want to go read about logical replication in Postgres specifically (the chapter recommends it). The mental model is that the replica receives a stream of intent (this row was updated) rather than a stream of physical disk changes, which makes it possible to replicate across major versions and to do partial replication. That last bit could be useful for a thing I'm prototyping.
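A toy version of that mental model, again entirely my own hypothetical sketch: `RowChange` and `apply_logical_stream` are made-up names, and this is not the format Postgres's logical decoding actually emits. Each change describes a row-level intent, so the consumer only needs to understand tables and keys, not the leader's on-disk page layout, and it can skip whole tables it doesn't care about. In Postgres proper, the table subset is declared on the publishing side with a publication (`CREATE PUBLICATION ... FOR TABLE ...`) rather than filtered by the consumer like this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RowChange:
    """One logical change: which row changed and how, not which disk bytes moved."""
    table: str
    op: str              # "insert", "update" or "delete"
    key: int
    row: Optional[dict]  # new row image; None for deletes


def apply_logical_stream(changes, subscribed_tables):
    """Replay a logical change stream, keeping only the subscribed tables."""
    replica = {table: {} for table in subscribed_tables}
    for change in changes:
        if change.table not in subscribed_tables:
            continue  # partial replication: ignore tables we don't replicate
        rows = replica[change.table]
        if change.op == "delete":
            rows.pop(change.key, None)
        else:  # inserts and updates both store the latest row image
            rows[change.key] = change.row
    return replica


if __name__ == "__main__":
    stream = [
        RowChange("customers", "insert", 1, {"name": "Ada"}),
        RowChange("audit_log", "insert", 7, {"event": "login"}),
        RowChange("customers", "update", 1, {"name": "Ada L."}),
        RowChange("customers", "insert", 2, {"name": "Grace"}),
        RowChange("customers", "delete", 2, None),
    ]
    print(apply_logical_stream(stream, {"customers"}))
    # prints {'customers': {1: {'name': 'Ada L.'}}}
```

Because nothing in `RowChange` refers to page layout or WAL internals, the same stream could in principle be consumed by a replica running a different version or even a different storage engine, which is the decoupling the chapter highlights for logical logs.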