Dead Letter Queues and Retry Strategies

Series: Designing a Microservice-Friendly Datahub

In event-driven systems, failure is not an edge case—it’s the default state waiting to happen. Networks drop packets. Services restart. Code contains bugs. Messages arrive twice, late, or malformed.

The question is not “How do we prevent failure?”
The real question is “What do we do when failure inevitably happens?”

Dead Letter Queues (DLQs) and retry strategies are the answer—but only when used deliberately. This article explains why they exist, how they work, and how to implement them correctly using the same tools you’ve seen throughout this series: PHP, Redis Streams, .NET, RabbitMQ, and Node.js.

The Core Principle: Failure Is Normal

In synchronous systems, failure is immediate and obvious.
In asynchronous systems, failure is deferred—and that makes it more dangerous.

Without explicit failure handling:

Messages disappear
Pipelines stall silently
State diverges slowly
Bugs surface days later

DLQs exist to make failure visible, contained, and reversible.

What Is a Dead Letter Queue (Really)?

A Dead Letter Queue is not a trash bin.

It is:

A quarantine zone for unprocessable messages
A pressure-release valve for pipelines
A forensic record of what went wrong

Messages land in a DLQ when:

They fail processing too many times
They are structurally invalid
They are “poison messages” (always fail)

Nothing should land in a DLQ silently.

Retry vs DLQ: Know the Difference

Retries are for temporary failures

Network timeouts
Service restarts
Rate limits
Brief unavailability

DLQs are for deterministic failures

Invalid payloads
Unsupported event versions
Missing required data
Logic bugs

Retrying deterministic failures forever is how queues die.
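One way to keep this split from becoming accidental is to put it in a single function that every consumer calls. The sketch below is illustrative only: the error codes are standard Node.js network codes, but the `status` property and the choice of HTTP statuses are assumptions about how your client library reports failures.

```javascript
// Network-level error codes that usually indicate a temporary condition.
const TRANSIENT_CODES = new Set(["ETIMEDOUT", "ECONNRESET", "ECONNREFUSED", "EAI_AGAIN"]);
// Assumed: the HTTP client attaches a numeric `status` to its errors.
const TRANSIENT_HTTP_STATUS = new Set([429, 502, 503, 504]);

// Returns "retry" for failures that may succeed later, "dlq" for everything else.
function classifyFailure(err) {
  if (err && TRANSIENT_CODES.has(err.code)) return "retry";
  if (err && TRANSIENT_HTTP_STATUS.has(err.status)) return "retry";
  // Unknown errors are treated as deterministic so they cannot loop forever.
  return "dlq";
}
```

Defaulting unknown errors to the DLQ is a deliberate choice here: a misclassified transient error costs one manual replay, while a misclassified poison message can stall the queue.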

Retry Strategy Fundamentals

A good retry strategy answers three questions:

When should we retry?
How often should we retry?
When do we stop retrying?

The correct answers are never “immediately,” “forever,” and “we’ll see.”
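As a rough illustration, the three questions can be answered by one small function: wait before retrying, wait longer each time, and give up after a fixed number of attempts. The option names and default values below are assumptions chosen for the example, not recommendations from this series.

```javascript
// Returns the delay in milliseconds before retry number `attempt` (1-based),
// or null when the retry budget is exhausted and the message should be dead-lettered.
function nextRetryDelayMs(attempt, options = {}) {
  const {
    baseMs = 1000,        // delay ceiling for the first retry
    factor = 2,           // growth per attempt
    maxDelayMs = 60000,   // upper bound on any single delay
    maxAttempts = 5,      // after this many retries, stop
    random = Math.random, // injectable so the function can be tested
  } = options;

  if (attempt > maxAttempts) return null;

  const ceiling = Math.min(maxDelayMs, baseMs * Math.pow(factor, attempt - 1));
  // "Equal jitter": half the delay is fixed, half is random, so consumers
  // that failed together do not all retry at the same instant.
  return Math.floor(ceiling / 2 + random() * (ceiling / 2));
}
```

With the defaults, the first retry waits between 0.5 and 1 second and the fifth between 8 and 16 seconds; a sixth attempt returns `null`.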

RabbitMQ: DLQs and Retries in Practice

RabbitMQ makes DLQs explicit—and that’s a good thing.

Step 1: Declare a Dead Letter Exchange

channel.ExchangeDeclare(
    exchange: "events.dlx",
    type: ExchangeType.Topic,
    durable: true
);

Step 2: Configure the Main Queue With DLQ Rules

var args = new Dictionary<string, object>
{
    { "x-dead-letter-exchange", "events.dlx" },
    { "x-dead-letter-routing-key", "user.updated.failed" }
};

channel.QueueDeclare(
    queue: "user.updated.consumer",
    durable: true,
    exclusive: false,
    autoDelete: false,
    arguments: args
);

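RabbitMQ dead-letters a message automatically when a consumer rejects or nacks it with requeue set to false, when its TTL expires, or when the queue exceeds its length limit. The message only survives if a queue is bound to events.dlx with a matching routing key.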
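The two steps above declare the exchange and point the main queue at it, but they do not declare the queue that actually holds dead-lettered messages. A minimal sketch of the full topology follows, written against a channel object in the style of the amqplib Node.js client (`assertExchange`, `assertQueue`, `bindQueue`). The queue name `user.updated.dlq` is hypothetical.

```javascript
// `channel` is assumed to behave like an amqplib channel.
async function declareDeadLetterTopology(channel) {
  await channel.assertExchange("events.dlx", "topic", { durable: true });

  // Hypothetical queue name: this is where dead-lettered messages accumulate.
  await channel.assertQueue("user.updated.dlq", { durable: true });

  // Bind with the same routing key the main queue dead-letters with.
  await channel.bindQueue("user.updated.dlq", "events.dlx", "user.updated.failed");

  // Main queue. These options map to the x-dead-letter-exchange and
  // x-dead-letter-routing-key queue arguments.
  await channel.assertQueue("user.updated.consumer", {
    durable: true,
    deadLetterExchange: "events.dlx",
    deadLetterRoutingKey: "user.updated.failed",
  });
}
```

Without the binding, the exchange has nowhere to route the message and it is discarded, which is exactly the silent loss a DLQ is meant to prevent.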

Step 3: Consumer Logic With Explicit Failure

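consumer.Received += (sender, ea) =>
{
    try
    {
        HandleMessage(ea.Body);
        // success: acknowledge this delivery only (multiple: false)
        channel.BasicAck(ea.DeliveryTag, false);
    }
    catch (TransientException)
    {
        // temporary failure: requeue for immediate redelivery
        channel.BasicNack(ea.DeliveryTag, false, true);
    }
    catch (Exception)
    {
        // poison message: requeue false routes it to the dead letter exchange
        channel.BasicNack(ea.DeliveryTag, false, false);
    }
};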

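This is the critical distinction:

requeue = true → the message returns to the queue and is redelivered right away, with no delay
requeue = false → the message is dead-lettered to events.dlx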

Adding Retry Backoff (Without Melting the System)

Immediate retries are dangerous. They create retry storms.

A common RabbitMQ pattern uses delay queues.

Retry Queue With TTL

var retryArgs = new Dictionary<string, object>
{
    { "x-message-ttl", 10000 }, // 10 seconds
    { "x-dead-letter-exchange", "events" },
    { "x-dead-letter-routing-key", "user.updated" }
};

channel.QueueDeclare(
    queue: "user.updated.retry",
    durable: true,
    exclusive: false,
    autoDelete: false,
    arguments: retryArgs
);

Flow:

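Failure → consumer publishes the message to the retry queue and acks the original
TTL expires → RabbitMQ dead-letters it to the events exchange
Routing key user.updated delivers it back to the main queue (assuming the main queue is bound to that key)
The consumer retries after the delay

This gives a fixed 10-second delay without timers in consumer code. For true backoff, declare several retry queues with increasing TTLs and choose one based on the attempt number.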
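A delay queue on its own has no exit: a message that always fails will circle between the main queue and the retry queue indefinitely. One hedged way to cap the loop is for the consumer to carry an attempt counter in a message header when it republishes. The header name `x-retry-count` and the limit of 5 are assumptions made for this sketch; this is an application-level convention, not a RabbitMQ feature.

```javascript
// Hypothetical application header, not a RabbitMQ built-in.
const RETRY_HEADER = "x-retry-count";

// Decides what to do with a failed message based on its headers.
// Returns the headers to use when republishing to the retry queue.
function planRetry(headers = {}, maxAttempts = 5) {
  const attempts = Number(headers[RETRY_HEADER]) || 0;

  if (attempts >= maxAttempts) {
    // Retry budget spent: the caller should nack with requeue = false.
    return { action: "dead-letter", attempts };
  }

  return {
    action: "retry",
    attempts: attempts + 1,
    headers: { ...headers, [RETRY_HEADER]: attempts + 1 },
  };
}
```

On `"retry"`, the consumer would publish the message to `user.updated.retry` with the returned headers and then ack the original. On `"dead-letter"`, it would nack with requeue set to false so the message lands in the DLQ.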

Redis Streams: Handling Poison Messages

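Redis Streams have no built-in DLQ. What they do have is a pending entries list per consumer group: every message that was delivered but not yet acknowledged. That list is the raw material for building one.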

Consumer Group Pending Entries

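If a consumer crashes before acknowledging, its messages stay in the pending list. The extended form of XPENDING shows them one by one:

XPENDING csl:events csl-group - + 10

For each pending entry you'll see:

Message ID
Assigned consumer
Idle time
Delivery count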

Claiming Stuck Messages

XCLAIM csl:events csl-group processor-2 60000 1689745230000-0

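This reassigns the message to processor-2, provided it has been idle for at least 60000 ms, and increments its delivery count. If a message keeps failing after being claimed and retried, you stop retrying it.

At that point:

Log it
Alert on it
Move it to a separate inspection stream, then XACK the original so it leaves the pending list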

Redis forces you to think, not automate blindly.
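The thinking can still be written down as a rule. The sketch below takes entries shaped like rows of the extended XPENDING reply and sorts them into messages worth claiming and messages to park. The field names, the thresholds, and the function itself are assumptions for illustration. It makes no Redis calls, so the caller decides how to run XCLAIM, XADD, and XACK.

```javascript
// Each entry mirrors one row of the extended XPENDING reply:
// { id, consumer, idleMs, deliveries }.
function triagePending(entries, { minIdleMs = 60000, maxDeliveries = 5 } = {}) {
  const claim = [];
  const park = [];

  for (const entry of entries) {
    if (entry.deliveries >= maxDeliveries) {
      // Poison candidate: copy to an inspection stream, then XACK the original.
      park.push(entry.id);
    } else if (entry.idleMs >= minIdleMs) {
      // Stuck with a dead or slow consumer: XCLAIM it for another one.
      claim.push(entry.id);
    }
    // Otherwise the entry is still being worked on; leave it alone.
  }

  return { claim, park };
}
```

Checking the delivery count before the idle time matters: a poison message is usually idle as well, and claiming it again would only add one more failed attempt.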

Node.js Consumer Example With Retry Guard

try {
  await handleEvent(event);
} catch (err) {
  if (isTransient(err)) {
    throw err; // retry via requeue
  }

  await publishToDLQ(event);
}

Retry logic lives in code, but exit paths are explicit.
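For a fuller picture, here is a hedged sketch of the same guard with acknowledgement included. Every collaborator is injected, so `handleEvent`, `isTransient`, `publishToDLQ`, `ack`, and `requeue` are placeholders for whatever your broker client provides.

```javascript
// All collaborators are injected; the names are placeholders, not a real client API.
async function consume(msg, { handleEvent, isTransient, publishToDLQ, ack, requeue }) {
  try {
    await handleEvent(msg);
    ack(msg);
  } catch (err) {
    if (isTransient(err)) {
      requeue(msg); // temporary failure: try again later
      return;
    }
    // Deterministic failure: quarantine the message with its error.
    await publishToDLQ(msg, err);
    // Ack only after the DLQ publish succeeded, so the message is never
    // removed from the main queue without existing somewhere else.
    ack(msg);
  }
}
```

If `publishToDLQ` itself throws, the message stays unacknowledged and the broker will redeliver it, which is the safer failure mode.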

Poison Messages: The Silent Queue Killers

A poison message:

Always fails
Always retries
Always blocks progress

Symptoms:

Queue lag never decreases
Consumers look “healthy”
Throughput collapses

DLQs exist to sacrifice poison messages so healthy traffic survives.

Visibility Is Non-Negotiable

DLQs without monitoring are mass graves.

You must:

Alert on DLQ growth
Inspect payloads
Track failure reasons
Decide: replay, fix, discard

A DLQ should trigger human attention—not be ignored.
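Inspection is much easier when the failure reason travels with the message. A minimal sketch of a dead-letter record is shown below; the field names and the `context` parameter are assumptions for this example rather than a format defined earlier in the series.

```javascript
// Wraps a failed event with the metadata a human needs to triage it.
function buildDeadLetterRecord(event, err, context = {}) {
  return {
    event, // the original payload, untouched, so it can be replayed
    failure: {
      reason: err && err.message ? err.message : String(err),
      errorType: err && err.name ? err.name : "Unknown",
      attempts: context.attempts ?? 0,
      sourceQueue: context.sourceQueue ?? null,
      failedAt: (context.now ?? new Date()).toISOString(),
    },
  };
}
```

Grouping DLQ entries by `errorType` and `sourceQueue` is usually enough to tell a single bad payload from a broken deployment.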

Idempotency Makes Retries Safe

Retries are only safe if processing is idempotent: handling the same event twice must leave the system in the same state as handling it once.

INSERT INTO processed_events (event_id)
VALUES (:event_id)
ON DUPLICATE KEY UPDATE event_id = event_id;

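This MySQL statement relies on a unique key on event_id: it records a new event and does nothing for one already seen. The handler should run its side effects only when a new row was inserted, ideally in the same transaction.

Without this, retries create duplicate side effects.
With this, retries become boring.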

Boring is reliability.
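The same idea can be sketched in application code. The wrapper below uses an in-memory Set as a stand-in for the processed_events table, and it assumes every event carries a unique `id`. It is a sketch under those assumptions: a real store must be durable, shared between consumers, and able to check and insert atomically.

```javascript
// In-memory stand-in for the processed_events table.
function makeIdempotent(handler, seen = new Set()) {
  return async function handleOnce(event) {
    if (seen.has(event.id)) {
      return false; // duplicate delivery: skip side effects
    }
    await handler(event);
    seen.add(event.id); // record only after the handler succeeded
    return true;
  };
}
```

Recording the ID after the handler succeeds means a crashed attempt is retried rather than skipped. The trade-off is that the handler's own side effects must tolerate running again after a partial failure.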

Common Anti-Patterns (Avoid These)

Infinite retries
No DLQ
Silent message drops
Treating DLQ as “done”
Retrying deterministic failures

Every one of these turns small bugs into outages.

A Simple Mental Model

Retry = “This might work later”
DLQ = “This needs attention”
Drop = “We accept data loss” (rarely acceptable)

Make the decision explicit. Never let it be accidental.

Closing Thought

Dead Letter Queues are not about pessimism.
They’re about engineering humility.

They acknowledge:

Code is imperfect
Systems fail
Humans need visibility

A system that fails loudly, visibly, and recoverably is far safer than one that pretends failure won’t happen.

Retries keep systems alive.
DLQs keep systems honest.

Both are required for real-world reliability.
