Message Queues Demystified - Part 3: Technologies Deep Dive
Understanding the "what" is not enough. You must understand the "why" and "how" of each
technology to make informed architectural decisions and hold your own in any technical discussion.
Table of Contents
- Apache Kafka
- RabbitMQ
- AWS SQS and SNS
- Redis Streams
- Azure Service Bus
- Google Cloud Pub/Sub
- Comprehensive Comparison Table
- How to Choose the Right Technology
1. Apache Kafka
What Kafka Really Is
Kafka is fundamentally a distributed, replicated, persistent commit log. This is not just a message queue - it is an append-only ledger of events that consumers can read from at any point, any number of times.
Think of it like a database transaction log, except the log itself IS the system, not a side effect.
Core Architecture
Kafka Cluster (3+ brokers for production)
Broker 1 (Leader for Partition 0) Broker 2 (Leader for Partition 1)
+-----------------------------+ +-----------------------------+
| Topic: order-events | | Topic: order-events |
| Partition 0: [msg0, msg1, | | Partition 1: [msg3, msg4, |
| msg2, ...] | | msg5, ...] |
| Partition 2 Replica | | Partition 0 Replica |
+-----------------------------+ +-----------------------------+
Broker 3 (Leader for Partition 2)
+-----------------------------+
| Topic: order-events |
| Partition 2: [msg6, msg7, |
| msg8, ...] |
| Partition 1 Replica |
+-----------------------------+
ZooKeeper/KRaft: Cluster coordination, leader election, metadata
Topics and Partitions - The Core Concepts
Topic: A logical category of messages. Think of it as a "table name" in the Kafka world.
Partition: A topic is split into partitions. Each partition is a physically separate, ordered, immutable log file stored on disk.
Topic: order-events
|
+-- Partition 0: offset[0]=OrderPlaced(101), offset[1]=OrderPlaced(102), offset[2]=OrderPaid(101)
+-- Partition 1: offset[0]=OrderPlaced(103), offset[1]=OrderShipped(101), offset[2]=OrderPlaced(104)
+-- Partition 2: offset[0]=OrderPaid(102), offset[1]=OrderShipped(102), offset[2]=OrderPaid(103)
Why partitions matter:
- Parallelism: Each partition can be consumed by one consumer in a consumer group. 3 partitions = maximum 3 consumers working in parallel.
- Ordering: Messages within one partition are strictly ordered. Messages across partitions have no ordering guarantee.
- Throughput: More partitions = more parallelism = higher aggregate throughput.
Partition assignment by key:
// All events for orderId "12345" go to the same partition
// This guarantees ordering for that specific order
kafkaTemplate.send("order-events",
orderId.toString(), // Partition key - determines which partition
event
);
// Kafka computes: partition = hash(key) % numPartitions
// Same key ALWAYS goes to same partition (unless partition count changes)Offsets - The Bookmark System
Every message in a partition has a unique, monotonically increasing number called an offset.
Partition 0:
Offset: 0 1 2 3 4
Message: [msg-A] [msg-B] [msg-C] [msg-D] [msg-E]
^ ^
| |
Consumer A Consumer A's current
read to here committed offset
Consumers track their own offset (called committed offset). This means:
- A consumer can re-read from any historical offset (replay)
- If a consumer crashes, it restarts from the last committed offset
- Different consumer groups have completely independent offsets for the same partition
Auto-commit vs Manual commit:
// Auto-commit (NOT recommended for production)
@KafkaListener(topics = "orders")
public void handleOrder(OrderEvent event) {
// Offset is auto-committed periodically, regardless of whether processing succeeded
// If processing fails AFTER auto-commit, message is lost
processOrder(event);
}
// Manual commit (recommended)
@KafkaListener(topics = "orders")
public void handleOrder(OrderEvent event, Acknowledgment ack) {
try {
processOrder(event);
ack.acknowledge(); // Commit offset ONLY after successful processing
} catch (Exception e) {
// Do NOT acknowledge - message will be reprocessed
throw e;
}
}Consumer Groups - The Scaling Mechanism
A consumer group is a set of consumers that together consume all partitions of a topic. Kafka ensures:
- Each partition is assigned to exactly one consumer in the group
- If a consumer joins or leaves, partitions are rebalanced
Topic: order-events (6 partitions)
Consumer Group: "order-processing-group" (3 consumers)
Partition 0 --> Consumer A
Partition 1 --> Consumer A
Partition 2 --> Consumer B
Partition 3 --> Consumer B
Partition 4 --> Consumer C
Partition 5 --> Consumer C
Scale up to 6 consumers:
Partition 0 --> Consumer A (one partition each)
Partition 1 --> Consumer B
...
Partition 5 --> Consumer F
Scale up to 7 consumers:
Partitions 0-5 assigned to Consumers A-F
Consumer G is IDLE - no partition to consume (more consumers than partitions = waste)
Key insight: The maximum parallelism of a consumer group equals the number of partitions. You cannot have more active consumers than partitions.
Multiple consumer groups get all messages independently:
Topic: order-events
Consumer Group "email-service": gets all messages (for email processing)
Consumer Group "warehouse-service": gets all messages (for warehouse)
Consumer Group "analytics-service": gets all messages (for analytics)
Each group has its own independent offsets. Adding a new group does not affect existing groups.
Replication and Durability
Every Kafka partition has one leader and zero or more replicas (followers).
- All reads and writes go to the leader
- Replicas passively replicate the leader's log
- If the leader broker fails, one of the in-sync replicas becomes the new leader
ISR (In-Sync Replicas): Replicas that are caught up with the leader within a configurable lag threshold. Only ISR members are eligible to become leader.
Producer acknowledgment settings (acks):
| Setting | Meaning | Durability | Latency | Risk |
|---|---|---|---|---|
acks=0 | Fire and forget, no acknowledgment | Lowest | Lowest | Message loss if broker crashes |
acks=1 | Leader acknowledges write | Medium | Medium | Message loss if leader crashes before replica replicates |
acks=all | All ISR must acknowledge | Highest | Highest | No message loss as long as at least one replica survives |
For financial, payment, or critical data: always use acks=all.
@Configuration
public class KafkaProducerConfig {
@Bean
public ProducerFactory<String, Object> producerFactory() {
Map<String, Object> config = new HashMap<>();
config.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");
config.put(ProducerConfig.ACKS_CONFIG, "all"); // Wait for all ISR
config.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);
config.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 1); // Preserve ordering during retries
config.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true); // Prevent duplicate messages
config.put(ProducerConfig.LINGER_MS_CONFIG, 5); // Batch for 5ms
config.put(ProducerConfig.BATCH_SIZE_CONFIG, 32768); // 32KB batch
config.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy");
return new DefaultKafkaProducerFactory<>(config);
}
}Log Compaction
Kafka supports two retention policies:
1. Time-based retention (default):
Messages are deleted after a configurable time period (e.g., 7 days)
Regardless of whether consumers have read them
2. Log compaction:
For each unique message key, only the LATEST message is kept
Older messages with the same key are deleted during compaction
Before compaction:
Key=user-123: [email=old@example.com] [email=temp@example.com] [email=new@example.com]
Key=user-456: [email=bob@example.com]
After compaction:
Key=user-123: [email=new@example.com] (latest value only)
Key=user-456: [email=bob@example.com]
Use log compaction for:
- User profile updates (latest state of each user)
- Configuration data (latest config per key)
- Any "update" pattern where you only need the current state per entity
KRaft - Replacing ZooKeeper
Historically, Kafka used ZooKeeper for cluster coordination (leader election, metadata management). This was a pain point because it meant running a separate ZooKeeper cluster.
KRaft (Kafka Raft) - introduced in Kafka 2.8, production-ready in Kafka 3.3 - replaces ZooKeeper entirely. Kafka now manages its own consensus using the Raft protocol, stored in a special metadata topic.
Benefits of KRaft:
- Simpler deployment (one fewer system to manage)
- Faster metadata operations
- Supports more partitions per cluster (from ~200K to millions)
- Better recovery from controller failures
Kafka Configuration Quick Reference
# Producer - balanced for reliability and throughput
acks=all
retries=2147483647
max.in.flight.requests.per.connection=5
enable.idempotence=true
compression.type=lz4
linger.ms=5
batch.size=65536
buffer.memory=33554432
# Consumer - manual commit for reliability
enable.auto.commit=false
auto.offset.reset=earliest
max.poll.records=500
fetch.min.bytes=1
fetch.max.wait.ms=500
session.timeout.ms=30000
heartbeat.interval.ms=10000
# Topic configuration
retention.ms=604800000 # 7 days
replication.factor=3
min.insync.replicas=2 # At least 2 ISR must acknowledge writes2. RabbitMQ
What RabbitMQ Really Is
RabbitMQ is a message broker implementing the AMQP (Advanced Message Queuing Protocol). It is a push-based system with sophisticated routing capabilities. The key abstraction is the Exchange-Queue-Binding model, which gives you powerful message routing without consumer-side filtering.
Core Architecture
Producer --> Exchange --> (Binding Rules) --> Queue --> Consumer
Producer never writes directly to a Queue.
The Exchange is a routing agent.
Bindings define the rules for routing.
Exchange Types - Deep Dive
Direct Exchange
Routes messages to queues whose binding key exactly matches the routing key.
Routing key: "order.created"
|
[Direct Exchange]
|
binding: "order.created" --> [Queue: order-processing]
binding: "order.shipped" --> [Queue: shipping-notifications] (no match - not delivered)
binding: "payment.done" --> [Queue: payment-audit] (no match - not delivered)
Use case: Routing specific event types to specific queues.
Fanout Exchange
Routes to ALL bound queues, completely ignoring the routing key.
[Fanout Exchange: broadcast]
|
+----|----+
| | |
v v v
Queue Queue Queue
(A) (B) (C)
All three queues receive every message.
Use case: Broadcasting notifications, cache invalidation, system-wide events.
Topic Exchange (Most Flexible)
Routes based on pattern matching using wildcards.
Routing keys and which queues they match:
"order.us-east.placed" -->
* Queue "us-east-orders" (binding: "order.us-east.*") MATCH
* Queue "all-orders" (binding: "order.#") MATCH
* Queue "placed-events" (binding: "#.placed") MATCH
* Queue "payments" (binding: "payment.#") NO MATCH
"payment.eu-west.failed" -->
* Queue "payment-alerts" (binding: "payment.*.failed") MATCH
* Queue "eu-west-events" (binding: "*.eu-west.#") MATCH
* Queue "all-events" (binding: "#") MATCH
Pattern syntax:
*matches exactly one word (segment between dots)#matches zero or more words
Headers Exchange
Routes based on message headers instead of routing key. More flexible but slower than other types.
// Producer - sets headers
MessageProperties props = new MessageProperties();
props.setHeader("region", "us-east");
props.setHeader("priority", "high");
props.setHeader("x-match", "all"); // "all" = AND logic, "any" = OR logic
// Consumer binding - match specific header values
@RabbitListener(bindings = @QueueBinding(
value = @Queue("high-priority-us-east"),
exchange = @Exchange(value = "headers-exchange", type = ExchangeTypes.HEADERS),
arguments = {
@Argument(name = "region", value = "us-east"),
@Argument(name = "priority", value = "high"),
@Argument(name = "x-match", value = "all")
}
))
public void handleHighPriorityUsEastMessage(MyMessage message) { ... }Queue Properties and Arguments
@Bean
public Queue orderProcessingQueue() {
return QueueBuilder.durable("order-processing")
// Dead letter configuration
.withArgument("x-dead-letter-exchange", "dlx")
.withArgument("x-dead-letter-routing-key", "order-processing.dead")
// Message expiry
.withArgument("x-message-ttl", 300000) // Messages expire after 5 minutes
// Queue depth limit - reject new messages when queue is full
.withArgument("x-max-length", 100000)
.withArgument("x-overflow", "reject-publish") // Or "drop-head" to drop oldest
// Priority queue support
.withArgument("x-max-priority", 10)
// Quorum queue (replicated, persistent - recommended for production)
.withArgument("x-queue-type", "quorum")
.build();
}Queue types comparison:
| Type | Persistence | Replication | Performance | Use Case |
|---|---|---|---|---|
| Classic | Optional (durable flag) | Mirrored policies (deprecated) | High | Simple use cases, backwards compat |
| Quorum | Always persistent | Raft-based, automatic | Medium-High | Production workloads, reliability |
| Stream | Always persistent | Log-based (like Kafka) | Very High | Event streaming, replay needed |
Use Quorum Queues in production. Classic mirrored queues are deprecated.
Consumer Prefetch (QoS)
Prefetch controls how many unacknowledged messages a consumer can hold at once.
// Without prefetch:
// Broker delivers ALL queued messages to the first available consumer
// One fast consumer gets all work; slow consumer gets nothing
// If fast consumer crashes, hundreds of messages must be redelivered
// With prefetch=10:
// Each consumer receives at most 10 unacknowledged messages
// Work is distributed fairly
// Crash recovery involves at most 10 messages
@Configuration
public class RabbitMQListenerConfig {
@Bean
public SimpleRabbitListenerContainerFactory rabbitListenerContainerFactory(
ConnectionFactory connectionFactory) {
SimpleRabbitListenerContainerFactory factory = new SimpleRabbitListenerContainerFactory();
factory.setConnectionFactory(connectionFactory);
factory.setPrefetchCount(10); // Hold at most 10 unACKed messages
factory.setAcknowledgeMode(AcknowledgeMode.MANUAL);
factory.setConcurrentConsumers(3); // 3 consumer threads
factory.setMaxConcurrentConsumers(10); // Scale up to 10 under load
return factory;
}
}Prefetch tuning guidelines:
prefetch=1- Maximum fairness, minimum throughput (one by one)prefetch=10-50- Good balance for most workloadsprefetch=100+- High throughput, less fairness, larger redelivery batch on crash
Acknowledgment Modes
@Component
public class OrderConsumer {
@RabbitListener(queues = "orders")
public void processOrder(Order order, Channel channel, Message message) throws IOException {
long deliveryTag = message.getMessageProperties().getDeliveryTag();
try {
orderService.process(order);
// ACK: message successfully processed, delete from queue
channel.basicAck(deliveryTag, false);
} catch (BusinessLogicException e) {
// NACK + requeue=false: permanent failure, send to DLQ
channel.basicNack(deliveryTag, false, false);
} catch (TransientException e) {
// NACK + requeue=true: temporary failure, try again
// WARNING: Be careful with this - can create infinite loops
// Always pair with max retry logic
channel.basicNack(deliveryTag, false, true);
}
}
}3. AWS SQS and SNS
AWS SQS - Simple Queue Service
SQS is a fully managed, serverless message queue service. There is no broker to manage, no cluster to provision.
Standard Queue vs FIFO Queue
| Feature | Standard Queue | FIFO Queue |
|---|---|---|
| Throughput | Unlimited | 300 TPS (3,000 with batching) |
| Ordering | Best-effort (approximate) | Strict FIFO within a message group |
| Delivery | At-least-once (occasional duplicates) | Exactly-once processing |
| Deduplication | Not built-in | Built-in (5-minute dedup window) |
| Price | Lower | Higher |
| Use case | High throughput, order not critical | Payment processing, order sequencing |
Visibility Timeout - The Most Important SQS Concept
When a consumer receives a message from SQS, the message is NOT deleted. Instead, it becomes invisible to other consumers for the visibility timeout period.
Message arrives in queue
|
Consumer A calls ReceiveMessage
Message becomes invisible for 30 seconds (visibility timeout)
|
Two possible outcomes:
Outcome 1 - Success:
Consumer A processes successfully
Consumer A calls DeleteMessage
Message is permanently deleted
(within the 30 second window)
Outcome 2 - Consumer crashes or processing takes too long:
30 seconds pass without DeleteMessage
Message becomes VISIBLE again
Consumer B picks it up and processes it
Setting appropriate visibility timeout:
Rule: Visibility Timeout > (Maximum processing time for one message)
If processing takes up to 5 minutes: set visibility timeout to 6-10 minutes
If processing varies, consumer can extend timeout programmatically
// Extend visibility timeout if processing is taking longer than expected
sqsClient.changeMessageVisibility(ChangeMessageVisibilityRequest.builder()
.queueUrl(queueUrl)
.receiptHandle(receiptHandle)
.visibilityTimeout(300) // Extend by another 5 minutes
.build());
Long Polling vs Short Polling
Short Polling (default, wasteful):
Consumer calls ReceiveMessage
SQS checks a subset of servers immediately
Returns 0 results if no messages (even if messages exist on other servers)
Consumer must poll again immediately
Result: Many empty API calls, higher cost, unnecessary latency
Long Polling (recommended):
Consumer calls ReceiveMessage with WaitTimeSeconds=20
SQS waits up to 20 seconds for a message to arrive
Returns as soon as a message is available, or after 20 seconds
Result: Far fewer API calls, lower cost, faster response
@Service
public class SQSMessagePoller {
public void pollMessages() {
ReceiveMessageRequest request = ReceiveMessageRequest.builder()
.queueUrl(queueUrl)
.maxNumberOfMessages(10) // Batch up to 10 messages per call
.waitTimeSeconds(20) // Long polling - wait up to 20 seconds
.visibilityTimeout(300) // 5 minute visibility timeout
.messageAttributeNames("All")
.build();
ReceiveMessageResponse response = sqsClient.receiveMessage(request);
for (Message message : response.messages()) {
processMessage(message);
// Delete after successful processing
sqsClient.deleteMessage(DeleteMessageRequest.builder()
.queueUrl(queueUrl)
.receiptHandle(message.receiptHandle())
.build());
}
}
}SQS Dead Letter Queue Configuration
// Configure DLQ in SQS
@Bean
public Queue orderProcessingQueue() {
// First create the DLQ
CreateQueueResponse dlqResponse = sqsClient.createQueue(
CreateQueueRequest.builder()
.queueName("order-processing-dlq")
.build()
);
String dlqArn = sqsClient.getQueueAttributes(
GetQueueAttributesRequest.builder()
.queueUrl(dlqResponse.queueUrl())
.attributeNames(QueueAttributeName.QUEUE_ARN)
.build()
).attributes().get(QueueAttributeName.QUEUE_ARN);
// Create main queue with redrive policy pointing to DLQ
RedrivePolicy redrivePolicy = RedrivePolicy.builder()
.deadLetterTargetArn(dlqArn)
.maxReceiveCount(3) // Move to DLQ after 3 failed attempts
.build();
CreateQueueResponse mainQueueResponse = sqsClient.createQueue(
CreateQueueRequest.builder()
.queueName("order-processing")
.attributes(Map.of(
QueueAttributeName.REDRIVE_POLICY,
objectMapper.writeValueAsString(redrivePolicy)
))
.build()
);
return ...; // Return your Queue configuration object
}AWS SNS - Simple Notification Service
SNS is a pub/sub messaging service for fan-out patterns.
Core concept: Topics in SNS are for broadcasting. You subscribe HTTP endpoints, SQS queues, Lambda functions, mobile push, SMS, and email to an SNS topic.
The SNS + SQS Fan-out Pattern
This is the standard AWS pattern for event-driven architectures:
Order Service
|
| Publishes ONE message
v
[SNS Topic: order-placed-events]
|
+---+---+---+
| | | |
v v v v
SQS SQS SQS Lambda
(Email) (Analytics) (Warehouse) (Fraud Check)
Each SQS queue can be consumed by its own service at its own pace.
The SNS topic fans out to all subscriptions simultaneously.
SQS provides durability - if the consumer is down, messages wait in the queue.
Why not just SNS to HTTP directly? Because if your service is down when SNS delivers, the message is lost. The SQS buffer provides durability.
SNS Message Filtering
SNS supports filter policies so each SQS queue only receives relevant messages:
// SQS Queue "high-value-orders" subscription filter:
{
"FilterPolicy": {
"orderValue": [{ "numeric": [">=", 1000] }],
"orderStatus": ["PLACED"],
"region": ["us-east-1", "eu-west-1"]
}
}
// Only messages matching ALL conditions are delivered to this subscription.
// Other subscriptions (without filters) receive all messages.4. Redis Streams
What Redis Streams Is
Redis Streams (introduced in Redis 5.0) is an append-only log data structure within Redis. It brings Kafka-like consumer group semantics to Redis.
Key Features
- Append-only log: Messages are added with
XADD, never removed except by explicit trimming - Consumer groups: Multiple consumers can process the stream independently (like Kafka consumer groups)
- Acknowledgment: Consumers must ACK messages, or they remain in the Pending Entries List (PEL)
- Message ID: Format is
timestamp-sequence(e.g.,1685000000000-0) - At-least-once delivery: Unacknowledged messages can be reclaimed
Basic Commands
# Producer: Add message to stream
XADD order-events * orderId "12345" customerId "CUST-001" totalAmount "149.97"
# Returns: "1685000000000-0" (auto-generated message ID: timestamp-sequence)
# Consumer: Read new messages (blocking for up to 5 seconds)
XREAD COUNT 10 BLOCK 5000 STREAMS order-events $
# Create consumer group
XGROUP CREATE order-events email-service-group $ MKSTREAM
# Consumer group read (only unprocessed messages for this group)
XREADGROUP GROUP email-service-group consumer-1 COUNT 10 BLOCK 5000 STREAMS order-events >
# ACK a message (remove from pending list)
XACK order-events email-service-group 1685000000000-0
# View pending (unacknowledged) messages
XPENDING order-events email-service-group - + 10
# Claim abandoned messages (if consumer-1 died, consumer-2 takes over)
XCLAIM order-events email-service-group consumer-2 60000 1685000000000-0Spring Boot Integration
@Service
public class RedisStreamProducer {
private final StringRedisTemplate redisTemplate;
public void publishOrderEvent(OrderEvent event) {
Map<String, String> fields = new LinkedHashMap<>();
fields.put("orderId", event.getOrderId());
fields.put("customerId", event.getCustomerId());
fields.put("eventType", event.getEventType());
fields.put("payload", objectMapper.writeValueAsString(event));
RecordId recordId = redisTemplate.opsForStream()
.add("order-events", fields);
log.info("Published to Redis Stream: {}", recordId);
}
}
@Component
public class RedisStreamConsumer {
@StreamListener
public void consumeOrderEvents() {
// Using Spring Data Redis Stream support
// This handles consumer group management and ACKs automatically
redisTemplate.opsForStream().read(
Consumer.from("email-service-group", "consumer-1"),
StreamReadOptions.empty().count(10).block(Duration.ofSeconds(5)),
StreamOffset.create("order-events", ReadOffset.lastConsumed())
).forEach(record -> {
processEvent(record);
redisTemplate.opsForStream().acknowledge("order-events",
"email-service-group", record.getId());
});
}
}When to Use Redis Streams
Use Redis Streams when:
- You already use Redis and want to avoid adding another broker
- Lower operational overhead is a priority
- Message volumes are moderate (millions per day, not billions)
- Low latency is critical (Redis is in-memory first)
- You need consumer groups with ACK semantics but not Kafka's scale
Do NOT use Redis Streams for:
- Mission-critical data that cannot tolerate any loss (Redis persistence has trade-offs)
- Extremely high throughput (Kafka is purpose-built for this)
- Long retention requirements (Redis memory cost vs Kafka disk cost)
5. Azure Service Bus
Overview
Azure Service Bus is Microsoft's enterprise-grade messaging service in Azure. It supports both queues and topics, with rich features for enterprise integration.
Key features:
- Queues: Standard point-to-point messaging with at-least-once delivery
- Topics and Subscriptions: Pub/Sub with server-side filter rules (SQL-like expressions)
- Sessions: Ordered processing within a session ID (like Kafka partition key)
- Scheduled messages: Deliver a message at a specific future time
- Dead-lettering: Built-in DLQ per queue and per subscription
- Duplicate detection: Built-in 10-minute deduplication window
- Transaction support: Atomic send/receive/complete operations
Pricing tiers:
- Basic: Queues only, simple messages
- Standard: Topics, sessions, duplicate detection
- Premium: Dedicated capacity, higher throughput, VNET isolation
Use Cases
Best fit when:
- You are already in the Azure ecosystem
- You need enterprise messaging features (sessions, scheduled messages, transactions)
- You want fully managed infrastructure with SLA guarantees
- You have .NET/Java enterprise integration requirements
6. Google Cloud Pub/Sub
Overview
Google Cloud Pub/Sub is a fully managed, globally distributed messaging service. It is designed for massive scale with automatic sharding and replication.
Key features:
- Global routing: Messages published in one region can be consumed in another
- At-least-once delivery: Automatic retry until ACK
- Push and pull: Consumers can register a push endpoint (HTTP callback) or pull messages
- Ordered delivery: Optional per-key ordering within a topic
- Exactly-once processing: Available in certain configurations
- Schema support: Avro and Protobuf schema validation
- Filtering: Server-side message filtering per subscription
- Snapshots: Create a point-in-time snapshot for replay
Scale:
- Google uses Pub/Sub internally for systems processing trillions of events per day
- No partition management required - fully automatic scaling
Best fit when:
- You are in the GCP ecosystem
- You want truly serverless, no-ops messaging
- You have global distribution requirements
- You want the simplest possible operational model
7. Comprehensive Comparison Table
| Feature | Apache Kafka | RabbitMQ | AWS SQS | AWS SQS FIFO | Redis Streams | Azure Service Bus |
|---|---|---|---|---|---|---|
| Type | Event streaming | Message broker | Message queue | Message queue | Stream log | Message broker |
| Protocol | Custom TCP | AMQP | HTTP/HTTPS | HTTP/HTTPS | RESP | AMQP 1.0 |
| Throughput | Millions/sec | Tens of thousands/sec | High (unlimited) | 3,000/sec | Very high | Thousands/sec |
| Message retention | Days/weeks (configurable) | Until consumed | 4 days (up to 14) | 4 days (up to 14) | Until trimmed | Until consumed (up to 14 days) |
| Replay | Yes - any offset | No | No | No | Yes - by ID | No |
| Ordering | Per partition (guaranteed) | Per queue (approximate) | Best-effort | Strict FIFO | Per stream | Per session |
| Delivery | At-least-once | At-least-once | At-least-once | Exactly-once | At-least-once | At-least-once |
| Push vs Pull | Pull | Push | Pull (long polling) | Pull | Both | Both |
| Routing | Consumer-side filtering | Exchange bindings | Message filtering | Message group | Consumer groups | SQL-like filter rules |
| Self-hosted | Yes | Yes | No (managed) | No (managed) | Yes | No (managed) |
| Managed cloud | Confluent Cloud, MSK | CloudAMQP | Native AWS | Native AWS | Redis Cloud | Native Azure |
| Schema support | Via Schema Registry | None built-in | None | None | None | Yes (Preview) |
| Best for | Event streaming, analytics, audit log | Complex routing, tasks | AWS-native work queues | Ordered AWS processing | Low-latency, Redis-native | Azure enterprise integration |
8. How to Choose the Right Technology
Decision Framework
Question 1: Do you need to replay historical events?
- YES: Use Kafka (or Azure Event Hubs for Azure)
- NO: Continue to Question 2
Question 2: Is strict message ordering required?
- YES and high-throughput: Kafka (partition key for ordering)
- YES and moderate throughput: SQS FIFO, Azure Service Bus with sessions, RabbitMQ single queue
- NO: Continue to Question 3
Question 3: Are you on a managed cloud with limited ops capacity?
- AWS: Use SQS/SNS
- Azure: Use Azure Service Bus or Event Hubs
- GCP: Use Cloud Pub/Sub
- On-premises or cloud-agnostic: Continue to Question 4
Question 4: Do you need complex routing without consumer-side filtering?
- YES: RabbitMQ (Exchange-based routing is unmatched for flexibility)
- NO: Continue to Question 5
Question 5: Do you need extreme throughput (millions per second)?
- YES: Kafka
- NO: RabbitMQ, SQS, or Redis Streams are all fine
Common Technology Choices by Use Case
| Use Case | Recommended Technology | Reason |
|---|---|---|
| Event sourcing and event replay | Kafka | Long retention, replayable log |
| Microservices integration (AWS shop) | SNS + SQS | Managed, reliable, fan-out |
| Complex routing rules | RabbitMQ | Exchange-based routing |
| Activity stream/audit log | Kafka | Immutable append-only log |
| High-volume IoT ingestion | Kafka | Extreme throughput |
| Background job processing | SQS Standard or RabbitMQ | Simple, reliable, competing consumers |
| Strictly ordered payment processing | SQS FIFO or Kafka (single partition) | Ordering guarantee |
| Cache invalidation broadcast | Redis Pub/Sub or Fanout Exchange | Broadcast, low latency |
| Real-time analytics pipeline | Kafka + Kafka Streams | Stateful stream processing |
| Enterprise integration (.NET) | Azure Service Bus | Native Azure, enterprise features |
| Low-latency in-app events | Redis Streams | In-memory speed |
| Data change capture (CDC) | Kafka + Debezium | Log-based CDC, replay |
The Hybrid Architecture
Most production systems use multiple messaging technologies for different purposes:
Production Architecture Example:
User Service --> [Kafka Topic: user-events]
|
+-----+------+
| |
Email Service Analytics Pipeline
(Kafka Consumer) (Kafka Streams)
Payment Service --> [SQS FIFO: payment-jobs]
|
Payment Worker
(ordered, exactly-once)
Image Upload --> [SQS Standard: image-resize-jobs]
|
+-----+-----+
| | |
Worker1 Worker2 Worker3
(competing consumers)
All Services --> [Redis Pub/Sub: cache-invalidation]
|
All instances receive and invalidate cache
Each technology is doing what it is best at:
- Kafka: event streaming, replay, analytics
- SQS FIFO: ordered payment processing
- SQS Standard: distributed background jobs
- Redis: low-latency cache invalidation broadcast
Previous: Part 2 - Patterns and Architecture
Next: Part 4 - Advanced Concepts
Index: Message Queues Demystified - Index