Alex Xu V2 - 4 Distributed Message Queue

TLDR

Intro to Distributed Systems
- Multiple servers (brokers)
  - Meta data storage
  - Discovery Services (ZooKeeper)
- Replication and Leader / Follower
- Pub / Sub Model
- Point to Consumer
WAL as a Database
- Batch writes a needed here
  - More throughput, lower latency

The Problem

Message queue are the building blocks between systems

Decoupling
- Reduce tight coupling between components
Improved scalability
Increased availability
- If a system goes offline, messages can still be held
- Other components can talk to the queue
Better performance using async responses
- Producers don’t need to wait

Message Queue vs. Event Streaming Platforms

Kafka and Pulsar are technically event streaming platforms

Event streaming platforms have more features

Allow repeated message consumption, long message retention. Append-only log
This chapter will involve a distributed message queue with additional features like the line above

Step 1 - Understand the Problem and Establish Design Scope

Producers send messages, consumers consume them What about the performance?

Message delivery semantics
Data detention

What’s the format of the messages and their size? Is it text only? Is multi-media allowed?

Text messages in the range of KBs

Can messages be repeatedly consumed?

Messages can be repeatedly consumed (This is an added feature, not traditional MQ)

Are messages consumed in same order they were produced?

Yes Does data need to be persisted and what is the data retention?
Yes, data retention is two weeks (This is an added feature)

How many producers and consumers will we support

The more the better

What data delivery semantic do we support? At-most once? at-least-once? Exactly once?

Support at-least-once, but ideally we could support all of them

What’s the target throughput and end-to-end latency?

It should support high throughput for use cases like log aggregation. It should also support low latency delivery for more traditional message queue use cases.

Functional Requirements

Producers send messages to queue
Queue stores messages for 2 weeks
Consumers consume messages from queue
Messages can be consumed repeatedly or only once
Message size in KBs
Deliver messages to consumers in order they were added
Data delivery can be configured

Non-Functional Requirements

High Throughput or Low Latency, configurable based on use cases
Scalable. The system should be distributed in nature. It should be able to support a sudden surge in message volume
Persistent and durable

Step 2 -> Propose High Level Design and Get Buy-In

Messaging Models

Point-to-point

Message only consumed by a single consumer
Consumer acknowledge that it consumed it, and it’s removed
Locks? for multiple consumers?

Publish - Subscribe

Topic
- Categories to organize messages
Messages are sent to topics and received by consumers who subscribe to the topic

consumer group for point to point

pub-sub for topics

Topics, partitions, and brokers

How do we scale?
- What if data volume is too large for a single server?
- Partitions and sharding

Message is sent to one of the partitions

Can organize by key (user_id), or randomly do it
Each consumer has it’s own partition to scan for

HLD

![[Screenshot 2024-02-11 at 10.49.11 AM.png]]

Client

Producer pushes messages
Consumer subscribes and consumes messages

Core service and Storage

Broker -> ServeHolds multiple partitions
- Servers that hold the partitions are called brokers
Storage
- Data storage -> Messages are persisted in data storage partitions
- State Storage -> Consumer states
- Metadata storage -> Configuration and properties of topics

Coordination service

Service discovery
Leader election -> One of brokers as an active controller to assign partitions
Apache ZooKeeper or ETCD to elect a controller

Design Deep Dive

He chooses to use an on-disk data structure that takes advantage of great sequential access performance of rotational disks and the aggressive disk caching strategy of modern OS

Messages that are passed are not modified to reduce copying

His design encourages batching. Producers send messages in batches. Message queue persists message in even larger batches. Consumers fetch in batches if possible

Data storage

Write-heavy and read-heavy

No update or delete operations

Predominantly sequential read/write access

Option 1: Database

Relational Database

Create topic tables and store messages as the rows No SQL
Collection as topic, message as documents

Database doesn’t support write-heavy AND read-heavy

Option 2: Write-ahead-log (WAL)

A plain file where new entires are appended

Use as redo log in MySQL and the WAL in ZooKeeper

LOOK THIS PART UP [5]

Disk performance of sequential access is very good. Rotational disks have large capacity and are pretty afforable

Disks are slow for random access, but we can get track of our needed data

Easily allows several hundred MB/sec of read and write speed

Message Data Structure

Field | Datatype

Message key is used to determine the partition of the message, if it’s not defined thenw e do random

value is a payload of the message

The message can be found using partition, offset, and topic

Producer Flow

Which broker / partition should the producer sent the message to?

We need to service that routes the messages to the proper broker / parition

Producer sends message to routing layer Routing layer reads # of partitions available from metadata storage Routes message to the active leader to decide Followers copy the data too Leader commits data and sends and “ACK” to the produer

To improve availability this design also uses replication

Improves Fault Tolerance

Drawbacks

New routing layer means more network latency Request batching isn’t considered here

alternative

We add the routing layer into the producers and consumers directly

Consumer Flow

Push vs. Pull Model

Push model

low latency, message read as soon as it arrives
cons ->
- Too many messages could arrive and consumers are overwehelmed
- Brokers control the rate, so hard to deal with powerful / diverse consumers

Pull Model

consumers control the rate
Can add more or less consumers to fix the rate
Better for batch processing

Replication

ISR -> Checks which replicas are in sync with the leader

Not in sync means they cannot be used
- Set the number of messages that you want before it can’t be used anymore

ACK