
Event-Driven Architecture in Practice: When and How to Use It

Event-driven architecture solves real problems. It also introduces real complexity. This is how we decide when to use it, which message broker to choose, and how to avoid the operational pitfalls that kill event-driven systems in production.

Brihat Team

Engineering Team

23 March 2026 · 12 min read

The Problem EDA Actually Solves

Event-driven architecture is not a universal solution. It is a solution to a specific set of problems: letting services react to each other's state changes without being directly coupled, handling high-throughput asynchronous workloads, and building audit trails of system activity.

The wrong reason to use EDA: because microservices and Kafka appear in every architecture diagram you see online. If you are building a system where Service A needs to call Service B and wait for the result, that is a synchronous RPC pattern — REST or gRPC. Converting it to events adds latency, complexity, and a new failure mode without solving the underlying problem.

The Right Use Cases

EDA is the right architectural choice when:

  • Multiple services need to react to the same event. An order placed event should trigger inventory reservation, payment processing, notification sending, and analytics ingestion — independently, in parallel, without the order service knowing about any of them.
  • You need temporal decoupling. The downstream service might be temporarily unavailable. The event can be retained in the broker and processed when the service recovers.
  • You need an audit trail. An event log is one by construction: every state change in your system is represented as an immutable event.
  • You have high write throughput and asynchronous processing is acceptable. Writing 10,000 records per second to a database directly is expensive. Writing 10,000 events per second to a Kafka topic and processing them in batches is tractable.

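The fan-out case in the first bullet can be sketched in-process. This is a toy, not a broker: the `EventBus` class and the handler names are illustrative, and a real system would deliver to each consumer in parallel with per-consumer retries.

```python
from typing import Callable

class EventBus:
    """Minimal in-process pub/sub to illustrate fan-out."""

    def __init__(self):
        self._handlers: dict[str, list[Callable[[dict], None]]] = {}

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._handlers.setdefault(event_type, []).append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        # The publisher knows nothing about its subscribers; each handler
        # reacts to the same event independently.
        for handler in self._handlers.get(event_type, []):
            handler(payload)

bus = EventBus()
triggered = []
bus.subscribe("order.placed", lambda e: triggered.append(("inventory", e["orderId"])))
bus.subscribe("order.placed", lambda e: triggered.append(("payment", e["orderId"])))
bus.publish("order.placed", {"orderId": "A-1001"})
```

The key property is the last line: the order service publishes once, and adding a new consumer (analytics, notifications) requires no change to the publisher.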
Message Broker Selection

Kafka: Use when you need high throughput (millions of messages per second), long retention (days to weeks), replay capability, and ordered processing within a partition. The operational complexity is significant — Kafka requires ZooKeeper or KRaft, and managing a Kafka cluster requires dedicated expertise. Use managed Kafka (AWS MSK, Confluent Cloud) unless you have a dedicated infrastructure team.

RabbitMQ: Use when you need flexible routing (topic exchanges, direct exchanges, fanout), smaller message volumes, and lower operational overhead. RabbitMQ's queue model is simpler than Kafka's partition model and better suited to work queue patterns, where each message is delivered to a single consumer rather than broadcast to all of them.

AWS SQS + SNS: Use when you are already in AWS, need managed infrastructure with zero operational overhead, and your throughput is under a few thousand messages per second. SNS (fanout) + SQS (queue per consumer) implements the pub/sub pattern with AWS managing availability, scaling, and durability.

Event Schema Design

Every event should have a consistent envelope structure:

{
  "id": "uuid",
  "type": "order.placed",
  "version": "1.0",
  "timestamp": "2024-01-15T10:30:00Z",
  "source": "order-service",
  "correlationId": "request-uuid",
  "data": {
    // event-specific payload
  }
}

The version field is critical. When you need to add a field to an event, you increment the version. Consumers declare which versions they can handle. This allows multiple versions of an event to coexist while consumers are upgraded, enabling zero-downtime schema evolution.
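One way consumers can declare the versions they handle is a dispatch table keyed on event type and version. This is a sketch, not a specific framework's API; the `handles` decorator and handler names are made up for illustration.

```python
# Registry mapping (event type, version) to a handler function.
HANDLERS = {}

def handles(event_type: str, version: str):
    """Register a handler for one version of one event type."""
    def register(fn):
        HANDLERS[(event_type, version)] = fn
        return fn
    return register

@handles("order.placed", "1.0")
def handle_v1(data: dict) -> str:
    return f"v1 order {data['orderId']}"

@handles("order.placed", "2.0")
def handle_v2(data: dict) -> str:
    # Hypothetical: v2 added a currency field that v1 events never carry.
    return f"v2 order {data['orderId']} in {data['currency']}"

def dispatch(event: dict) -> str:
    handler = HANDLERS.get((event["type"], event["version"]))
    if handler is None:
        raise ValueError(f"unsupported version {event['version']}")
    return handler(event["data"])
```

Because both versions are registered at once, old producers can keep emitting 1.0 events while upgraded producers emit 2.0, and the consumer handles both.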

Use a schema registry (Confluent Schema Registry for Kafka, or a simpler JSON Schema validation layer) to enforce event schema contracts. A producer publishing malformed events is a system outage for every consumer.
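A "simpler JSON Schema validation layer" can start as little more than an envelope check at the producer boundary. The sketch below validates the envelope fields shown earlier using only the standard library; a real deployment would validate `data` against a registered schema as well.

```python
import uuid
from datetime import datetime

# Required envelope fields, matching the structure shown above.
REQUIRED = ("id", "type", "version", "timestamp", "source", "data")

def validate_envelope(event: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the envelope is valid."""
    errors = [f"missing field: {f}" for f in REQUIRED if f not in event]
    if errors:
        return errors
    try:
        uuid.UUID(event["id"])
    except ValueError:
        errors.append("id is not a valid UUID")
    try:
        # fromisoformat does not accept a trailing Z directly on older Pythons.
        datetime.fromisoformat(event["timestamp"].replace("Z", "+00:00"))
    except ValueError:
        errors.append("timestamp is not ISO 8601")
    if not isinstance(event["data"], dict):
        errors.append("data must be an object")
    return errors
```

Rejecting a malformed event at publish time turns what would be an outage for every consumer into an error for one producer.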

The Exactly-Once Processing Problem

Message brokers guarantee at-least-once delivery. This means your consumer will sometimes receive the same message twice — during network errors, consumer restarts, or broker failovers. Your event handlers must be idempotent.

The pattern: store the event ID in a processed_events table before processing. At the start of each handler, check if the event ID has already been processed. If it has, skip processing and acknowledge the message. This turns at-least-once delivery into effectively-once processing.
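The `processed_events` pattern can be sketched with SQLite standing in for the real database; the table name follows the text, everything else is illustrative. Inserting the event ID first, with a primary key constraint, makes the duplicate check atomic, so two concurrent deliveries cannot both pass it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")

side_effects = []

def handle_once(event_id: str, payload: dict) -> bool:
    """Process an event idempotently. Returns False for duplicate deliveries."""
    try:
        # INSERT before processing: the PRIMARY KEY turns the dedup
        # check into an atomic operation.
        conn.execute("INSERT INTO processed_events (event_id) VALUES (?)", (event_id,))
    except sqlite3.IntegrityError:
        return False  # already processed: acknowledge and skip
    side_effects.append(payload)  # real business logic goes here
    conn.commit()
    return True
```

In production, the insert and the business logic should share a transaction so a crash between them cannot mark an unprocessed event as done.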

The Kafka exactly-once guarantee (using transactions) is available but adds significant complexity and throughput overhead. For most use cases, idempotent handlers with at-least-once delivery are the right trade-off.

Dead Letter Queues and Error Handling

Every event consumer should have a dead letter queue (DLQ) configured. When a message fails processing after a configured number of retries, it moves to the DLQ instead of blocking the queue. An alert fires when messages appear in the DLQ. An operations team member reviews the DLQ messages, fixes the underlying issue, and replays them.

Without a DLQ, a single malformed message can block your entire queue, stopping all processing for the affected consumers. The DLQ is not optional — it is the safety valve that makes event-driven systems operable.
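The retry-then-DLQ flow looks roughly like this in-process sketch. `MAX_RETRIES` and the bare list standing in for the dead letter queue are illustrative, not tied to any particular broker's configuration.

```python
MAX_RETRIES = 3
dead_letter_queue = []

def consume(message: dict, handler) -> bool:
    """Attempt to process a message; park it in the DLQ after repeated failures."""
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            handler(message)
            return True  # processed: acknowledge the message
        except Exception:
            continue  # a real consumer would back off between attempts
    # Retries exhausted: move the message aside instead of blocking the queue.
    dead_letter_queue.append(message)
    return False
```

The essential behavior is the last step: a poison message ends up parked and alerting, while the rest of the queue keeps flowing.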

Observability

Debugging distributed event-driven systems requires distributed tracing. Each event should carry a correlationId (also called a trace ID) that is passed downstream through every service that processes the event. When something goes wrong, you can find all log entries and traces associated with a single business transaction by filtering on the correlation ID.

OpenTelemetry with Jaeger or AWS X-Ray gives you distributed traces across services. Configure your message broker consumers to extract the trace context from the event envelope and propagate it through the processing chain.
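Correlation ID propagation can be sketched with a context variable: the consumer reads `correlationId` from the envelope once, and every log line (or downstream call) in the processing chain picks it up implicitly. The `log` helper is a stand-in for a real logging filter.

```python
import contextvars

correlation_id = contextvars.ContextVar("correlation_id", default="-")

log_lines = []

def log(msg: str) -> None:
    # Every log line is tagged with the current correlation ID.
    log_lines.append(f"[{correlation_id.get()}] {msg}")

def process(event: dict) -> None:
    token = correlation_id.set(event.get("correlationId", "-"))
    try:
        log(f"handling {event['type']}")
        # ...call downstream services, passing correlation_id.get() in headers
    finally:
        correlation_id.reset(token)

process({"type": "order.placed", "correlationId": "req-42"})
```

Filtering logs on `req-42` now returns every line produced while handling that business transaction, which is exactly the debugging workflow described above.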

The Brihat Infotech engineering team builds enterprise-grade digital systems — platforms, SaaS products, AI integrations, and workflow automations for clients across healthcare, fintech, edtech, and logistics.
