Dead Letter Queue (DLQ)

A holding area for messages that keep failing so they can be inspected, fixed, and replayed safely.

A dead letter queue stores messages that cannot be processed after multiple attempts. It isolates bad events from the main flow so they do not block or corrupt processing.
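The isolation idea can be sketched in Python. This is a minimal sketch that uses an in-memory `deque` in place of a real broker, and `drain` is a hypothetical helper. A message that fails `max_attempts` times in a row is parked in the DLQ, and the messages behind it are still processed.

```python
from collections import deque
from typing import Callable, List


def drain(queue: deque, handler: Callable[[dict], None], dlq: List[dict],
          max_attempts: int = 3) -> List[int]:
    """Process every message; park messages that keep failing in `dlq`.

    Assumption: `queue` is an in-memory stand-in for a real broker and each
    message is a dict with an "id" key.
    """
    processed = []
    while queue:
        msg = queue.popleft()
        for attempt in range(1, max_attempts + 1):
            try:
                handler(msg)
                processed.append(msg["id"])
                break  # success: move on to the next message
            except Exception as exc:
                if attempt == max_attempts:
                    # Retries exhausted: set the message aside so the
                    # messages behind it are not blocked.
                    dlq.append({"message": msg, "error": repr(exc), "attempts": attempt})
    return processed
```

For brevity the sketch retries immediately. A real consumer would wait between attempts and would persist the DLQ rather than keep it in memory.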

In business systems, DLQs catch problematic webhooks, malformed records, and messages whose downstream calls keep timing out. Operators inspect and fix these messages before replaying them.

DLQs fit into event-driven workflows as a safety net. They improve reliability by preventing endless retries and by providing a clear path to remediate and reprocess failures.

Frequently Asked Questions

When should a message go to the DLQ?

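After a defined number of retries, or immediately on fatal errors that retrying cannot fix (e.g., schema violations). Do not route messages that failed with transient errors, such as timeouts or rate limits, until their retries are exhausted.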
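One way to encode that rule, using hypothetical `FatalError` and `TransientError` classes. A real system would map broker or HTTP errors onto categories like these.

```python
class FatalError(Exception):
    """Hypothetical: errors retrying cannot fix, e.g. schema violations."""


class TransientError(Exception):
    """Hypothetical: errors likely to clear on their own, e.g. timeouts."""


def route(error: Exception, attempt: int, max_attempts: int = 5) -> str:
    """Decide what to do after a failed delivery attempt (attempt is 1-based).

    Returns "dlq" to dead-letter the message now, or "retry" to try again.
    """
    if isinstance(error, FatalError):
        return "dlq"    # malformed input: another attempt will fail the same way
    if attempt >= max_attempts:
        return "dlq"    # retry budget exhausted
    return "retry"      # transient or unknown: keep retrying for now
```

Unrecognized errors are treated as retryable here, which avoids dead-lettering too early. Whether that default suits you depends on how expensive a retry is.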

What should I store with each DLQ message?

Original payload, error messages, retry count, timestamps, and correlation IDs. Include the stack trace or status code when possible.
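Here is one possible shape for a DLQ entry, written as a Python dataclass. The class and field names are hypothetical and simply mirror the list above.

```python
import json
import traceback
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class DeadLetter:
    """One DLQ entry: the untouched payload plus enough context to triage it."""
    payload: dict                       # original message, unmodified so it can be replayed
    error_message: str
    retry_count: int
    correlation_id: str                 # ties the entry back to logs and traces
    first_failed_at: str                # ISO-8601 UTC timestamps
    last_failed_at: str
    status_code: Optional[int] = None   # e.g. HTTP status from a downstream call
    stack_trace: Optional[str] = None

    @classmethod
    def from_exception(cls, payload: dict, exc: BaseException, retry_count: int,
                       correlation_id: str, first_failed_at: Optional[str] = None,
                       status_code: Optional[int] = None) -> "DeadLetter":
        now = datetime.now(timezone.utc).isoformat()
        trace = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
        return cls(payload=payload, error_message=str(exc), retry_count=retry_count,
                   correlation_id=correlation_id,
                   first_failed_at=first_failed_at or now, last_failed_at=now,
                   status_code=status_code, stack_trace=trace)

    def to_json(self) -> str:
        """Serialize for storage in the DLQ."""
        return json.dumps(asdict(self))
```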

How do I replay DLQ messages safely?

Fix the root cause, then replay in small batches with idempotent processing. Monitor closely and stop if errors recur.
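Below is a minimal replay loop. It assumes each message carries a unique `id` and uses `seen_ids` as a stand-in for a durable record of already-applied messages, which serves as the idempotency guard. `replay`, `batch_size`, and `max_failures_per_batch` are hypothetical names.

```python
from typing import Callable, List, Set


def replay(dead_letters: List[dict], handler: Callable[[dict], None],
           seen_ids: Set[str], batch_size: int = 10,
           max_failures_per_batch: int = 1) -> List[dict]:
    """Replay DLQ messages in small batches and stop if errors recur.

    Returns the messages that still need attention: those that failed again
    plus any that were never attempted because replay stopped early.
    """
    failed: List[dict] = []
    remaining = list(dead_letters)
    while remaining:
        batch, remaining = remaining[:batch_size], remaining[batch_size:]
        batch_failures = 0
        for msg in batch:
            if msg["id"] in seen_ids:
                continue  # already applied: skipping keeps replay idempotent
            try:
                handler(msg)
                seen_ids.add(msg["id"])
            except Exception:
                failed.append(msg)
                batch_failures += 1
        if batch_failures >= max_failures_per_batch:
            break  # errors recur: stop instead of pushing more into a broken path
    return failed + remaining
```

Returning the unreplayed messages, rather than dropping them, keeps them available once the root cause is actually fixed. In production the idempotency check and the side effect should be made atomic, for example with a unique constraint in the target store.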

How big should the DLQ be?

Size it based on peak failure scenarios and retention policies. Set alerts on backlog growth and age of oldest message.
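The two backlog checks can be sketched as follows, assuming you can read the enqueue time of each DLQ message. The function name and the thresholds (`max_depth`, `max_growth`, `max_age`) are illustrative assumptions, not recommended values.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional


def dlq_alerts(enqueued_at: List[datetime], now: datetime,
               previous_depth: Optional[int] = None, max_depth: int = 1000,
               max_growth: int = 100,
               max_age: timedelta = timedelta(hours=4)) -> List[str]:
    """Return alert messages for DLQ depth, backlog growth, and oldest-message age."""
    alerts = []
    depth = len(enqueued_at)
    if depth > max_depth:
        alerts.append(f"depth {depth} exceeds {max_depth}")
    if previous_depth is not None and depth - previous_depth > max_growth:
        alerts.append(f"backlog grew from {previous_depth} to {depth}")
    if enqueued_at:
        age = now - min(enqueued_at)  # age of the oldest message
        if age > max_age:
            alerts.append(f"oldest message is {age} old")
    return alerts
```

Age of the oldest message catches a DLQ that nobody is triaging even when its depth stays small.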

How do I prevent DLQs from hiding systemic issues?

Alert on the DLQ inflow rate and on recurring error patterns. If similar errors spike, pause producers or add a circuit breaker instead of endlessly offloading messages to the DLQ.
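A sliding-window spike detector can provide that signal. This sketch assumes each dead-lettered message is tagged with an error type. `DeadLetterSpikeDetector` and its thresholds are hypothetical, and its `spiking` result is the kind of input that could pause producers or trip a circuit breaker.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class DeadLetterSpikeDetector:
    """Counts dead-lettered messages per error type over a sliding time window."""

    def __init__(self, threshold: int = 50, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.threshold = threshold
        self.window_s = window_s
        self.clock = clock  # injectable so tests can control time
        self.events: Dict[str, Deque[float]] = defaultdict(deque)

    def _prune(self, q: Deque[float], now: float) -> None:
        # Drop timestamps that have fallen out of the window.
        while q and now - q[0] > self.window_s:
            q.popleft()

    def record(self, error_type: str) -> None:
        """Call once for every message routed to the DLQ."""
        now = self.clock()
        q = self.events[error_type]
        q.append(now)
        self._prune(q, now)

    def spiking(self, error_type: str) -> bool:
        """True when this error type hit the threshold within the window."""
        q = self.events[error_type]
        self._prune(q, self.clock())
        return len(q) >= self.threshold
```

Grouping by error type means a burst of one specific failure is flagged even when the overall DLQ rate looks normal.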

Can DLQs hold sensitive data?

They often do. Secure them with access controls, encryption at rest, and masking where possible. Limit who can read and replay.
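Here is a minimal masking sketch, assuming sensitive fields can be identified by key name. `SENSITIVE_KEYS` and `mask_payload` are hypothetical. Masking is destructive, so this sketch produces a redacted copy for operators to inspect, while the original payload, which replay still needs, stays in encrypted storage.

```python
from typing import AbstractSet, Any

# Hypothetical list; in practice this comes from your data classification policy.
SENSITIVE_KEYS = frozenset({"email", "card_number", "ssn", "password"})


def mask_payload(payload: Any, sensitive: AbstractSet[str] = SENSITIVE_KEYS) -> Any:
    """Return a copy of `payload` with sensitive values replaced by '***'.

    Walks nested dicts and lists; the input object is not modified.
    """
    if isinstance(payload, dict):
        return {
            k: "***" if isinstance(k, str) and k.lower() in sensitive
            else mask_payload(v, sensitive)
            for k, v in payload.items()
        }
    if isinstance(payload, list):
        return [mask_payload(v, sensitive) for v in payload]
    return payload  # scalars pass through unchanged
```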

What policies should govern DLQ handling?

Define owners, SLAs for triage, replay procedures, and when to drop irreparable messages. Document steps in a runbook.

Do I need separate DLQs per stream?

Prefer per-stream DLQs to isolate issues and simplify ownership. Shared DLQs complicate triage and widen the blast radius of a single failure.

How does a DLQ interact with retries?

The DLQ is the final stop after retries are exhausted. Tune retry counts and backoff to balance resilience against timely DLQ routing.
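Here is a sketch of an exponential backoff schedule with an optional "full jitter" variant. `backoff_delays` and its defaults are assumptions. With the defaults, the consumer waits 1, 2, 4, and 8 seconds between five attempts, so a persistently failing message reaches the DLQ after roughly 15 seconds of waiting. Raising `max_attempts` or `cap_s` buys resilience at the cost of slower DLQ routing.

```python
import random
from typing import List


def backoff_delays(max_attempts: int = 5, base_s: float = 1.0,
                   cap_s: float = 60.0, jitter: bool = False) -> List[float]:
    """Seconds to wait before each retry.

    There are max_attempts - 1 waits; if the final attempt also fails,
    the message is routed to the DLQ.
    """
    delays = []
    for retry in range(max_attempts - 1):
        delay = min(cap_s, base_s * 2 ** retry)  # exponential growth, capped
        if jitter:
            delay = random.uniform(0, delay)     # full jitter spreads retries out
        delays.append(delay)
    return delays
```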
