RabbitMQ poisoned messages handling

Tue, Feb 9, 2016

1. The Problem

You never know what will come from remote system. Even if you keep everything under control, there always can be something unexpected. In microservice architecture each component must be ready for everything.
When your software architecture involves message queue broker (like RabbitMQ) the interaction between your micro services looks like follows:

The problem is that if message consumer is not able to handle message, the message will be returned to queue broker and then will be delivered to the message consumer once again for the second attempt.

Here are the reasons why consumer can fail during message handling:

Message is corrupted (bad format)
Message tells lie (external resources that should be related to the message are invalid or don’t exist)
Consumer crash due to internal logic bug
Consumer crash due to external communication issue (networking, database, etc.)
Consumer manual restart (planned)

At first glance, all these issues must be caught and handled in one way or another. We can also have general mechanism that gives each individual message some retries count to be handled by consumer. And then if consumer still cannot handle the message we must remove the message from the working queue and put it somewhere for detailed analysis.

2. General Requirements

To give each individual message some kind of ‘retries counter’, we have to store this counter value somewhere for each individual message. Once this counter value exceeds maximum allowed value, the message must be removed from the working queue.

3. Documentation Overview

3.1. No Such Functionality

As described on Stack Overflow, there is no such functionality in RabbitMQ. This question was also discussed in RabbitMQ letters, blogs and forums.

3.2. Dead Letter Exchange

As part of queue configuration, there is an ability to set ‘dead letter exchange’. RabbitMQ documentation describes the purpose of this feature as follows:
Messages from a queue can be ‘dead-lettered’; that is, republished to another exchange when any of the following events occur:
•   The message is rejected (basic.reject or basic.nack) with requeue=false,
•   The TTL for the message expires; or
•   The queue length limit is exceeded.

4. Selected Strategy

4.1. Point Of View

The general requirement is to store retries counter somewhere. There are 2 places where this value can be stored:

In message itself
Not in message

First option is preferred because in this case we do not need any external storage that must support distributed access for all consumers.

4.2. Incoming Message Structure

Each message contains message body and message headers. Message body is the specific information the message was created for. Message header is usually system information that allows message brocker to route messages and store other “non-user” information. So the best place to store retries counter (as I can see it) is message headers that are string – object map. These headers are pretty much the same as HTTP headers. So the Idea is to have something like this: x-delivery-attempts: 3.

4.3. Solution

Here is the component diagram that shows the general Idea

Here is the algorithm for different scenarios:

Scenario A: Works as expected

Producer puts message to queue
Consumer gets message from queue
Consumer handles message
Consumer acknowledges message
Queue removes the message

Scenario B: Poisoned message

Producer puts message to queue
Consumer gets message from queue
Consumer checks if x-delivery-attempt by > {max allowed retries}
Consumer fails handling message but not crashes
Consumer creates message copy
Consumer increments x-delivery-attempt by 1 and sets this header to message copy
Consumer puts message copy to the tail of queue
Consumer acknowledges original message
Queue removes the original message
Repeat steps 2-8 {max allowed retries} - 1 times
Consumer gets message from queue
Consumer checks if x-delivery-attempt by > {max allowed retries}
Consumer rejects the message with require = false
Queue marks message as ‘x-death’
Queue moves message to dead-letter-exchange

Scenario C: Consumer crushes unexpectedly

Producer puts message to queue
Consumer gets message from queue
Consumer crushes
Queue detects connectivity break to consumer
Queue marks message with Redelivery = true
Consumer gets message from queue
Consumer detects redelivery = true
Consumer creates message copy
Consumer increments x-delivery-attempt by 1 and set this header to message copy
Consumer puts message copy to the tail of queue
Scenario A or Scenario B

Scenario C explanation:
Consumer should not handle redelivered messages because previous attempt finished unexpectedly and there is no guarantee it will not happen again.

4.4. Dead-Letter-Exchange Configuration

As described in RabbitMQ documentation, there is an ability to configure this option using console mode or via administrative web-portal. The queue will detect dead letters automatically based on TTL, Max-length, or consumer rejects with require argument = false

4.5. Limitation

As you can see, each time when something goes wrong, the message goes to the tail of queue.

Thus, message order breaks and this solution can be unacceptable if you need the message order.
For case of long queue and time-consuming operations, the message will reach the consumer once again with significant delay. This also can be unacceptable for some systems.