Alex Xu V2 - 4 Distributed Message Queue


The Problem

Message queues are the building blocks between systems

Most popular Message Queues

Message Queue vs. Event Streaming Platforms

Kafka and Pulsar are technically event streaming platforms

Event streaming platforms have more features

Step 1 - Understand the Problem and Establish Design Scope

Producers send messages, consumers consume them

What about the performance?

What’s the format of the messages and their size? Is it text only? Is multi-media allowed?

Can messages be repeatedly consumed?

Are messages consumed in the same order they were produced?

How many producers and consumers will we support?

What data delivery semantics do we support? At-most-once? At-least-once? Exactly-once?

What’s the target throughput and end-to-end latency?

Functional Requirements

Non-Functional Requirements

Step 2 - Propose High-Level Design and Get Buy-In

Messaging Models


Publish - Subscribe

consumer group for point to point

pub-sub for topics
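A minimal sketch of how consumer groups cover both models: every group subscribed to a topic receives each message (pub-sub), while inside a group only one consumer receives it (point to point). The `Topic` class and the round-robin choice inside a group are assumptions made for illustration, not something the book prescribes.

```python
from collections import defaultdict


class Topic:
    """Toy in-memory topic; it only shows who would receive a message."""

    def __init__(self):
        self.groups = defaultdict(list)  # group name -> consumer names
        self.sent = defaultdict(int)     # group name -> messages delivered so far

    def subscribe(self, group, consumer):
        self.groups[group].append(consumer)

    def publish(self, message):
        """Return {group: consumer} for this message."""
        deliveries = {}
        for group, members in self.groups.items():
            # Assumption: round-robin inside the group. A partitioned queue
            # would instead assign whole partitions to consumers.
            deliveries[group] = members[self.sent[group] % len(members)]
            self.sent[group] += 1
        return deliveries
```

With a single group this behaves like point to point; with one consumer per group it behaves like plain pub-sub.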

Topics, partitions, and brokers

Message is sent to one of the partitions


![[Screenshot 2024-02-11 at 10.49.11 AM.png]]


Core service and Storage

Coordination service

Design Deep Dive

He chooses an on-disk data structure that takes advantage of the great sequential access performance of rotational disks and the aggressive disk caching strategy of modern operating systems

Messages are passed along without modification, which reduces the need for copying

His design encourages batching. Producers send messages in batches. The message queue persists messages in even larger batches. Consumers fetch in batches if possible
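A rough sketch of producer-side batching: messages accumulate in a buffer and are shipped together once the buffer is full. `BatchingProducer`, `send_batch`, and the size-only flush rule are hypothetical; a real producer would likely also flush on a timer to bound latency.

```python
class BatchingProducer:
    """Buffers messages and ships them in groups of batch_size."""

    def __init__(self, send_batch, batch_size=3):
        self.send_batch = send_batch  # callable that ships a list of messages
        self.batch_size = batch_size
        self.buffer = []

    def send(self, message):
        self.buffer.append(message)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # Ship whatever is buffered, even a partial batch.
        if self.buffer:
            self.send_batch(self.buffer)
            self.buffer = []
```

A larger `batch_size` improves throughput at the cost of latency, since a message waits until its batch fills up.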

Data storage

Write-heavy and read-heavy

No update or delete operations

Predominantly sequential read/write access

Option 1: Database

Relational Database

It is hard for a database to support both write-heavy AND read-heavy access patterns at scale

Option 2: Write-ahead-log (WAL)

A plain file where new entries are appended


Disk performance of sequential access is very good. Rotational disks have large capacity and are pretty affordable

Disks are slow for random access, but we can keep track of where the data we need is
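A minimal sketch of the append-only log idea, assuming a single file and an in-memory index from message offset to byte position. The record layout (4-byte length prefix plus payload) and the `AppendOnlyLog` name are assumptions for illustration; the sketch also assumes the file starts out empty.

```python
import os
import struct


class AppendOnlyLog:
    """Toy single-file log: each record is a 4-byte length, then the payload."""

    def __init__(self, path):
        self.path = path
        self.positions = []        # message offset -> byte position in the file
        open(path, "ab").close()   # create the file if it does not exist

    def append(self, payload: bytes) -> int:
        """Append a record at the end of the file and return its offset."""
        self.positions.append(os.path.getsize(self.path))
        with open(self.path, "ab") as f:
            f.write(struct.pack(">I", len(payload)) + payload)
        return len(self.positions) - 1

    def read(self, offset: int) -> bytes:
        """Seek straight to a record using the index, then read it."""
        with open(self.path, "rb") as f:
            f.seek(self.positions[offset])
            (length,) = struct.unpack(">I", f.read(4))
            return f.read(length)
```

Writes only ever touch the end of the file, and a consumer reading forward from a known offset reads sequentially, which is the access pattern the notes call out.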

Message Data Structure

Field | Datatype
--- | ---
key | byte
val | byte
topic | string
partition | int
offset |
timestamp |
size |
crc | int

The message key is used to determine the partition of the message. If it is not defined, the partition is chosen at random

The message can be found using partition, offset, and topic
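A sketch of the message record and of key-based partition selection. Hashing the key modulo the partition count is one common approach and is an assumption here, as are the `int` types for `offset`, `timestamp`, and `size`, and the helper names `choose_partition` and `build_message`.

```python
import random
import zlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)  # frozen: the record is never modified after creation
class Message:
    key: Optional[bytes]
    value: bytes
    topic: str
    partition: int
    offset: int
    timestamp: int
    size: int
    crc: int


def choose_partition(key: Optional[bytes], num_partitions: int) -> int:
    if key is None:
        # No key defined: pick a partition at random.
        return random.randrange(num_partitions)
    # Assumption: a stable hash of the key, modulo the partition count,
    # so the same key always lands on the same partition.
    return zlib.crc32(key) % num_partitions


def build_message(key, value, topic, num_partitions, offset, timestamp):
    return Message(
        key=key,
        value=value,
        topic=topic,
        partition=choose_partition(key, num_partitions),
        offset=offset,
        timestamp=timestamp,
        size=len(value),
        crc=zlib.crc32(value),  # checksum of the payload
    )
```

`zlib.crc32` is used instead of Python's built-in `hash` because the built-in hash of `bytes` changes between processes, which would send the same key to different partitions after a restart.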

Producer Flow

Which broker / partition should the producer send the message to?

We need a service that routes the messages to the proper broker / partition

Producer sends the message to the routing layer

Routing layer reads the number of available partitions from metadata storage

Routing layer routes the message to the active leader replica of the target partition

Followers copy the data from the leader

Leader commits the data and sends an “ACK” to the producer

To improve availability this design also uses replication
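The producer flow above can be sketched roughly as follows. All class names are hypothetical, and the sketch assumes the leader waits for every follower to copy the message before sending the ACK; a real design may only wait for some of them.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.log = []


class Partition:
    def __init__(self, leader, followers):
        self.leader = leader
        self.followers = followers

    def produce(self, message):
        # The leader appends the message first.
        self.leader.log.append(message)
        # Each follower copies whatever entries it is still missing.
        for follower in self.followers:
            follower.log.extend(self.leader.log[len(follower.log):])
        # Assumption: commit and ACK only after all followers have the data.
        return {"status": "ACK", "offset": len(self.leader.log) - 1}


class RoutingLayer:
    def __init__(self, metadata):
        self.metadata = metadata  # topic -> list of Partition objects

    def route(self, topic, partition_id, message):
        # Look up the target partition and hand the message to its leader.
        return self.metadata[topic][partition_id].produce(message)
```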


New routing layer means more network latency

Request batching isn’t considered here


We add the routing layer into the producers and consumers directly

Consumer Flow

Push vs. Pull Model

Push model

Pull Model


ISR (in-sync replicas) -> The set of replicas that are in sync with the leader
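One way to picture the ISR check, assuming "in sync" means a follower is at most `max_lag` messages behind the leader. The function name and the message-count threshold are assumptions; some systems measure lag in time rather than in messages.

```python
def in_sync_replicas(leader_offset, follower_offsets, max_lag=2):
    """Return the names of followers that are within max_lag of the leader."""
    return sorted(
        name
        for name, offset in follower_offsets.items()
        if leader_offset - offset <= max_lag
    )
```

For example, with the leader at offset 10 and followers at 10, 8, and 5, only the first two count as in sync under the default threshold.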