Message Ordering Problem in Pub/Sub
When we start using a Pub/Sub system, we might expect messages to be processed in the order we publish them. The more complex the architecture becomes, the more likely it is that messages will be processed out of order. Many factors affect message ordering.
Is message ordering important?
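- It depends on the kind of messages being processed. Successive updates to the same entity usually have to be applied in order, while unrelated events often do not.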
How can this happen?
- Horizontal scaling: several consumer instances process messages in parallel.
- Events are published close to each other.
- Something unexpected can cause events to be delayed.
- Most of the time, when processing a message fails, the Pub/Sub retries it after some time or moves it to the tail of the queue. In both scenarios, messages can end up being processed out of order.
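The last point can be illustrated with a small simulation. This is only a sketch: the in-memory queue, the `processWithRequeue` function, and the "fails once" failure model are assumptions made for the example, not the behavior of any particular Pub/Sub.

```go
package main

import "fmt"

// processWithRequeue simulates a queue that moves a failed message to its tail.
// failOnce lists the message IDs whose first processing attempt fails
// (a hypothetical failure model used only for this illustration).
func processWithRequeue(queue []int, failOnce map[int]bool) []int {
	alreadyFailed := make(map[int]bool)
	var processed []int
	for len(queue) > 0 {
		msg := queue[0]
		queue = queue[1:]
		if failOnce[msg] && !alreadyFailed[msg] {
			alreadyFailed[msg] = true
			queue = append(queue, msg) // requeue at the tail
			continue
		}
		processed = append(processed, msg)
	}
	return processed
}

func main() {
	// Message 1 fails on its first attempt and is retried after 2 and 3.
	order := processWithRequeue([]int{1, 2, 3}, map[int]bool{1: true})
	fmt.Println(order) // prints [2 3 1]
}
```

A single transient failure is enough to have message 1 handled after messages 2 and 3.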
How to deal with this situation?
- One Topic
- Partition per Entity
- Versioning Entities
- Independent Updates
One Topic
This approach uses a single topic for all events. The downside is that every handler receives every event, even those it does not care about, but in terms of ordering it is the simplest approach.
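The trade-off can be sketched with an in-memory stand-in for a topic. The `Event` and `Topic` types and the `cancellationsOnly` helper are hypothetical names introduced for this example; a real Pub/Sub client will look different.

```go
package main

import "fmt"

// Event is a hypothetical envelope for everything published to the single topic.
type Event struct {
	Type string
	ID   string
}

// Topic is an in-memory stand-in for a single ordered topic.
type Topic struct {
	handlers []func(Event)
}

// Subscribe registers a handler that will receive every published event.
func (t *Topic) Subscribe(h func(Event)) {
	t.handlers = append(t.handlers, h)
}

// Publish delivers the event to all handlers synchronously, in publish order.
func (t *Topic) Publish(e Event) {
	for _, h := range t.handlers {
		h(e)
	}
}

// cancellationsOnly publishes the events to one topic and reports how many
// events the handler received and which IDs it actually acted on.
func cancellationsOnly(events []Event) (received int, handled []string) {
	topic := &Topic{}
	topic.Subscribe(func(e Event) {
		received++
		if e.Type != "BookingCanceled" {
			return // irrelevant to this handler, but still delivered to it
		}
		handled = append(handled, e.ID)
	})
	for _, e := range events {
		topic.Publish(e)
	}
	return received, handled
}

func main() {
	received, handled := cancellationsOnly([]Event{
		{Type: "BookingCreated", ID: "b1"},
		{Type: "BookingCanceled", ID: "b1"},
		{Type: "BookingCreated", ID: "b2"},
	})
	fmt.Println(received, handled) // prints 3 [b1]
}
```

The handler sees all three events in order but only acts on one of them, which is the cost this approach pays for its simplicity.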
Partition per Entity
Some Pub/Subs support explicit message metadata that makes it possible to scale processing while keeping messages in order. This metadata may be called an ordering key, a partition key, or something similar. In Kafka, a partition is a way to split a topic into multiple queues. Each partition is a separate queue, and Kafka guarantees that the messages within a partition are delivered in order. The crucial step is to set a partition key for each message. The partition key is typically a string that uniquely identifies the entity, such as its ID. Kafka hashes the key to determine which partition the message goes to, so all messages with the same key land in the same partition and keep their relative order.
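The idea of mapping a key to a partition can be sketched as follows. This is an illustration only: `partitionFor` is a hypothetical helper, and FNV-1a from Go's standard library stands in for whatever hash a real client uses, so the partition numbers it produces will not match Kafka's.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// partitionFor maps a partition key to one of numPartitions partitions.
// FNV-1a is used for illustration only; it is not Kafka's partitioner.
func partitionFor(key string, numPartitions uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(key)) // Write on a hash.Hash never returns an error
	return h.Sum32() % numPartitions
}

func main() {
	// Both "item-1" messages map to the same partition, so their order is kept.
	for _, key := range []string{"item-1", "item-2", "item-1"} {
		fmt.Printf("%s -> partition %d\n", key, partitionFor(key, 4))
	}
}
```

Because the mapping is deterministic, ordering holds per entity, while different entities can be spread across partitions and processed in parallel.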
Versioning Entities
We can add a Version field to the event. When handling the event, the handler checks whether its version equals the current version of the item in the database plus one. If it does, the handler applies the update. If the version in the event is higher than that, the event has arrived ahead of an earlier one, and the handler should return it to the queue.
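// ItemUpdated is the event; Version is the item version this update produces.
type ItemUpdated struct {
	ID          uuid.UUID
	Name        string
	Description string
	Price       string
	Version     int
}

// Item is the stored model; Version is the last version that was applied.
type Item struct {
	ID          uuid.UUID
	Name        string
	Description string
	Price       string
	Version     int
}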
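A handler implementing the version check might look like the sketch below. Several things here are assumptions made for the example: the `Apply` function and `ErrOutOfOrder` error are hypothetical names, the types are trimmed and use a string ID to avoid external dependencies, and silently dropping events whose version is not newer than the stored one is a choice of this sketch rather than something the approach prescribes.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrOutOfOrder signals that the event should go back to the queue (hypothetical name).
var ErrOutOfOrder = errors.New("event arrived ahead of its predecessor")

// Item and ItemUpdated are trimmed versions of the types above;
// ID is a string here to keep the sketch dependency-free.
type Item struct {
	ID      string
	Name    string
	Version int
}

type ItemUpdated struct {
	ID      string
	Name    string
	Version int
}

// Apply updates item only when the event carries exactly the next version.
func Apply(item *Item, e ItemUpdated) error {
	switch {
	case e.Version == item.Version+1:
		item.Name = e.Name
		item.Version = e.Version
		return nil
	case e.Version > item.Version+1:
		return ErrOutOfOrder // a predecessor is missing: requeue the event
	default:
		return nil // assumption: versions already applied are duplicates and are dropped
	}
}

func main() {
	item := &Item{ID: "item-1", Name: "v1", Version: 1}
	// Version 3 arrives before version 2.
	fmt.Println(Apply(item, ItemUpdated{ID: "item-1", Name: "v3", Version: 3})) // prints the ErrOutOfOrder message
	fmt.Println(Apply(item, ItemUpdated{ID: "item-1", Name: "v2", Version: 2})) // prints <nil>
	fmt.Println(Apply(item, ItemUpdated{ID: "item-1", Name: "v3", Version: 3})) // prints <nil>
	fmt.Println(item.Name, item.Version)                                        // prints v3 3
}
```

In a real system the read, the check, and the write would have to happen atomically, for example inside a database transaction, so that two handlers cannot both apply the same version.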
Independent Updates
Consider a single entity updated by multiple event handlers, such as a Booking with a status field. The status can be updated by multiple events, like BookingCreated, BookingCanceled, etc. If one of the events arrives out of order, the status will be overwritten with a previous value.
The solution to this problem is straightforward. Make each event update some fields of the model independently. In the example above, you can have a BookingCreated event that only updates the created_at field and a BookingCanceled event that only updates the canceled_at field.
The model keeps the correct state even if the messages arrive out of order. If we need a single field that contains the current status, we can compute it from these fields when reading the data.
type BookingCreated struct {
	ID        uuid.UUID
	CreatedAt time.Time
}

type BookingCanceled struct {
	ID         uuid.UUID
	CanceledAt time.Time
}

type Booking struct {
	ID             uuid.UUID
	LastCreatedAt  time.Time
	LastCanceledAt time.Time
}