Designing Data Intensive Applications - Chapter 7 Transactions

I’m guessing this is a chapter on concurrency and other things

Many things can go wrong in data systems

A Reliable System needs to deal with these faults

For decades, transactions have beent he mechanism of choice for simplifying these issues.

A transaction is a way for an application to group several reads and writes together into a logical unit.

This chapter will examine examples of things that can go wrong, and see algorithms that databases use to guard against the issues.

Slippery Concept of a Transaction

Almost all relational databases today, and some nonrelational databases support transactions. General idea has remained the same since 1975 (IBM System R, the first SQL DB)

Transactions have advantages and limitations





Single-Object and Multi-Object Operations

He gave examples of how atomicity / isolation work on individual objects / (rows?)

Many distributed datastores have abandoned multi-object transactions

Handling Errors and Aborts

A key feature of a transaction is that it can be aborted ACID is based on this

Not all system follow this idea

Kleppmann says errors will happen, but many software developers prefer to only think about the happy path rather than intricacies of error handling

Aborting isn’t perfect

Weak Isolation Levels

Concurrency bugs are hard to find by testing (They only appear when timing is unlucky)

Serializable Isolation (Transaction Isolation) have performance costs

To prevent performance costs, systems will use weaker isolation levels.

Don’t blindly rely on tools, we need to develop a good understanding of the kinds of concurrency problems that exist, and how to prevent them. Then, we can build reliable applications using tools at disposal

this section will look at several weak isolation levels that are used in practice and discuss in detail what kinds of race conditions and cannot occur, os that you can decide what levle is appropriate to your aplication.

Then, Kleppmann will discuss serializability in detail

Read Committed

Most basic level of transaction isolation

I feel like this is pretty straight forward right?

Why prevent these?

No Dirty Writes

Implementing Read Committed

Very popular Isolation Level

Databases solve this by using row-level locks. When a transaction wants to modify an object, they must acquire a lock first and hold it until transaction is complete.

Preventing dirty reads

Snapshot Isolation and Repeatable Read

What concurrency bugs is this level missing?

This is called a non-repeatable read or read skew

Ofc this isn’t a lasting problem, just for brief second. But other cases would make this horrible

Snapshot Isolation is the most common solution to this problem.


Typically use write locks to prevent dirty writes and don’t use any locks onr eads

Database keeps several different committed versions of an object

Two versions would be okay in the read committed isolation, but this one needs more

Tpyical approach is that read commited uses a separate snapshot for each query, while snapshot isolation uses the same snapshot for entire transaction

In PostgreSQL, transactions are given IDs and snapshots are made for those IDs

Visibility rules for observing a consistent snapshot

When transcation reads form DB, transactions IDs are used to decide which objects it can see and which are invisible.

By carefully defining visibility rules, the database can present a consistent snapshot of the database to the application.

Works like

Indexes and snapshot Isolation

How do indexes work in multi-version databse? One option is to have index point to all version of an object.

In practice, many implementation details determine what happens.

Repeatable Read and Naming Confusion

Snapshot isolation is a useful isolation level, especially for read-only transactions. Many databases call it by different names though.

SQL doesn’t have true snapshot isolation, it’s just really similar.

Preventing Lost Updates

Have mostly ignored the presence of having two transactions writing concurrently

Lost update can occur in a read-modify-write cycle (reads data, then commits a modified read)

If two transactions do this concurrently, data is lost

Atomic write operations

Databases provide atomic update operations

Usually implemented by making an exclusive lock on the object that blocks reads until update is finished

Explicit Locking

If database built-in atomic operations aren’t good enough, then application code can check for anything also trying to read-modify-update

This can happen in game development, where multiple players move characters simultaneously. It’s not based on the database.

Need to add locks in the code, but it’s very tricky to do

automatically detecting lost updates

Alternative is to allow these to occur and have a transaction manager detect a lost update, then abort the transaction and force it to retry.

Advantage is that databases can perform this check efficiently in conjunction with snapshot isolation.

It’s a great feature

Compare and Set

Databases without transactions sometimes have atomic compare-and-set operation. Purpose of this is to avoid lost updates by allowing an update to happen only if the value has not changed since you last read it.

Check to see if old value is different from when the read-modify-write cycle occured

Conflict Resolution and Replication

Replicated databases are much more difficult

Copies of data on multiple nodes and data can potentially be modified concurrently on different nodes

multi-leader and leaderless replication allow multiple writes to happen concurrently

Common approach is to just allow concurrent writes to have conflicts, then use application code or a data structure to resolve and merge these versions after the fact

Atomic operations can work well too

Write Skew and Phantoms

We’re not at the end of potential race conditions yet

Imagine doctors in a hospital where there must be at least one on call doctor

Snapshot isolation will return true for both of the doctors

Characterizing a write skew

This is called write skew

It’s neither a dirty write or a lost update, but definitely a race condition

Atomic single object operations don’t help (multiple objects are involved)

The automoatic detection of lost updates doesn’t help Some databases allow contraints, but most don’t have support for multi-object constraints

If serializable isolation level doesn’t work, then second best option is to explicitly lock the rows

More examples of Write Skew

Meeting Room Booking System

Multiplayer Game Claiming Username Preventing double spending


Using a lock to prevent this is called “Materializing Conflicts” and removing the phantom.


We’ve covered some race conditions for read commited and snapshot isolation levels. We covered tricky problems with write skew and phantoms.

Serializable Isolation is regarded as the strongest isolation level

The rest of this chapter will explore

Actual Serial Execution

Simplest way is to just remove concurrency entirely is to just only run one transaction at a time, in serial order, on a single thread

This was a new invention that came about from improvements in RAM

Encapsulating transactions in stored procedures

Databases can’t just sit around waiting for users to make up their minds if they want to book an airline ticket

Transactions are broken up, databaes don’t allow multi-statement transactions

Application must submit the entire transaction code to the database ahead of a time as a stored procedure


Stored procedures and in-memory data allow executing all transactions on a single thread


Makes concurrency control simpler, but limits transaction throughput of the database to the speed of a single CPU core on a single machine.

Tough for high-write throughput, the single-threaded transaction processor can become a bottleneck.

To scale with multiple CPU cores, can potentially partition data (supported in VoltDB)

Two-Phase Locking (2PL)

For around 30 years, this was the only algorithm for serializability in databases

Several transactions are allowed to concurrently read the same object as long as nobody is writing to it.

When someone wants to write to object, exclusive access is required

Readers and Writers get blocked

This algorithm protects against race conditions


MySQL(InnoDB) and SQL Server have this available in their serializable isolation level

Blocking of readers and writers is implemented by a lock on each object

Reads must require a lock in shared mode Writes must acquire a lock in exclusive mode Transactions may upgrade from shared to exclusive when they read and then write Once lock is acquired, hold lock until finished with transaction

Called two-phase because first phase is when locks are acquired, second phase is when transaction executes

Performance of Two Phase Locking

Performance is very slow compared to weak isolation

Very easy to have a queue of transactions that need to be run

Predicate Locks

Can add locks to search queries to like shared / exclusive to prevent write skew

predicate lock applies even to objects that do not yet exist int he database, bu tmigh tin future.

Two phase locks include predicate locks that prevent all forms of write skew and other race conditions

Index-range locks

Predicate locks do not perform well

Most 2PL DBs implement index-range locking

Don’t need to lock exact queries, Its safe to simplify a predicate by making the query range wide and locking it wide

Serializable Snapshot Isolation (SSI)

Two phase locking doesn’t perform well Serial Execution doesn’t scale well Weak isolation have good performance but have race conditions

Serializable snapshot isolation seems to be promising

First used in 2008 and is the PhD thesis of Michael Cahill

Today (???) SSI is used in both single-node databases (PostgreSQL) and distributed databases(FoundationDB)

Pessimisitc versus optimisic concurrency ocntorl

2PL is called pessimistic

Serializable snapshot is an optimisitic concurrency control technique

I’m not gonna bother writing these, Check out the chapter (Or read Alex Xu’s book)

Can reduce contention weakenss of optimisitic locking using atomic operations

Decisions based on an outdated premise

write skew occurs from having an incorrect premise (Data conditions are met and can write)

Detecting stale MVCC reads

I’m gonna skip these two pages

Something about how to detect faulty reads / bad premises when using multiple versions of objects

Detecting writes that affect prior reads

Also I will skip this one

Performance of serializable snapshot isolation


It depends on the cases