Message Queues Demystified - Part 3: Technologies Deep Dive

Understanding the "what" is not enough. You must understand the "why" and "how" of each
technology to make informed architectural decisions and hold your own in any technical discussion.

Apache Kafka
RabbitMQ
AWS SQS and SNS
Redis Streams
Azure Service Bus
Google Cloud Pub/Sub
Comprehensive Comparison Table
How to Choose the Right Technology

1. Apache Kafka

What Kafka Really Is

Kafka is fundamentally a distributed, replicated, persistent commit log. This is not just a message queue - it is an append-only ledger of events that consumers can read from at any point, any number of times.

Think of it like a database transaction log, except the log itself IS the system, not a side effect.

Core Architecture

Kafka Cluster (3+ brokers for production)

  Broker 1 (Leader for Partition 0)    Broker 2 (Leader for Partition 1)
  +-----------------------------+       +-----------------------------+
  | Topic: order-events         |       | Topic: order-events         |
  | Partition 0: [msg0, msg1,   |       | Partition 1: [msg3, msg4,   |
  |               msg2, ...]    |       |               msg5, ...]    |
  | Partition 2 Replica         |       | Partition 0 Replica         |
  +-----------------------------+       +-----------------------------+

  Broker 3 (Leader for Partition 2)
  +-----------------------------+
  | Topic: order-events         |
  | Partition 2: [msg6, msg7,   |
  |               msg8, ...]    |
  | Partition 1 Replica         |
  +-----------------------------+

ZooKeeper/KRaft: Cluster coordination, leader election, metadata

Topics and Partitions - The Core Concepts

Topic: A logical category of messages. Think of it as a "table name" in the Kafka world.

Partition: A topic is split into partitions. Each partition is a physically separate, ordered, immutable log file stored on disk.

Topic: order-events
|
+-- Partition 0: offset[0]=OrderPlaced(101), offset[1]=OrderPlaced(102), offset[2]=OrderPaid(101)
+-- Partition 1: offset[0]=OrderPlaced(103), offset[1]=OrderShipped(101), offset[2]=OrderPlaced(104)
+-- Partition 2: offset[0]=OrderPaid(102), offset[1]=OrderShipped(102), offset[2]=OrderPaid(103)

Why partitions matter:

Parallelism: Each partition can be consumed by one consumer in a consumer group. 3 partitions = maximum 3 consumers working in parallel.
Ordering: Messages within one partition are strictly ordered. Messages across partitions have no ordering guarantee.
Throughput: More partitions = more parallelism = higher aggregate throughput.

Partition assignment by key:

// All events for orderId "12345" go to the same partition
// This guarantees ordering for that specific order
kafkaTemplate.send("order-events",
    orderId.toString(),  // Partition key - determines which partition
    event
);
 
// Kafka computes: partition = hash(key) % numPartitions
// Same key ALWAYS goes to same partition (unless partition count changes)

Offsets - The Bookmark System

Every message in a partition has a unique, monotonically increasing number called an offset.

Partition 0:
Offset:  0          1          2          3          4
Message: [msg-A]   [msg-B]   [msg-C]   [msg-D]   [msg-E]
          ^                              ^
          |                              |
     Consumer A                   Consumer A's current
     read to here                 committed offset

Consumers track their own offset (called committed offset). This means:

A consumer can re-read from any historical offset (replay)
If a consumer crashes, it restarts from the last committed offset
Different consumer groups have completely independent offsets for the same partition

Auto-commit vs Manual commit:

// Auto-commit (NOT recommended for production)
@KafkaListener(topics = "orders")
public void handleOrder(OrderEvent event) {
    // Offset is auto-committed periodically, regardless of whether processing succeeded
    // If processing fails AFTER auto-commit, message is lost
    processOrder(event);
}
 
// Manual commit (recommended)
@KafkaListener(topics = "orders")
public void handleOrder(OrderEvent event, Acknowledgment ack) {
    try {
        processOrder(event);
        ack.acknowledge();  // Commit offset ONLY after successful processing
    } catch (Exception e) {
        // Do NOT acknowledge - message will be reprocessed
        throw e;
    }
}

Consumer Groups - The Scaling Mechanism

A consumer group is a set of consumers that together consume all partitions of a topic. Kafka ensures:

Each partition is assigned to exactly one consumer in the group
If a consumer joins or leaves, partitions are rebalanced

Topic: order-events (6 partitions)
Consumer Group: "order-processing-group" (3 consumers)

Partition 0 --> Consumer A
Partition 1 --> Consumer A
Partition 2 --> Consumer B
Partition 3 --> Consumer B
Partition 4 --> Consumer C
Partition 5 --> Consumer C

Scale up to 6 consumers:
Partition 0 --> Consumer A (one partition each)
Partition 1 --> Consumer B
...
Partition 5 --> Consumer F

Scale up to 7 consumers:
Partitions 0-5 assigned to Consumers A-F
Consumer G is IDLE - no partition to consume (more consumers than partitions = waste)

Key insight: The maximum parallelism of a consumer group equals the number of partitions. You cannot have more active consumers than partitions.

Multiple consumer groups get all messages independently:

Topic: order-events
Consumer Group "email-service":     gets all messages (for email processing)
Consumer Group "warehouse-service": gets all messages (for warehouse)
Consumer Group "analytics-service": gets all messages (for analytics)

Each group has its own independent offsets. Adding a new group does not affect existing groups.

Replication and Durability

Every Kafka partition has one leader and zero or more replicas (followers).

All reads and writes go to the leader
Replicas passively replicate the leader's log
If the leader broker fails, one of the in-sync replicas becomes the new leader

ISR (In-Sync Replicas): Replicas that are caught up with the leader within a configurable lag threshold. Only ISR members are eligible to become leader.

Producer acknowledgment settings (acks):

Setting	Meaning	Durability	Latency	Risk
`acks=0`	Fire and forget, no acknowledgment	Lowest	Lowest	Message loss if broker crashes
`acks=1`	Leader acknowledges write	Medium	Medium	Message loss if leader crashes before replica replicates
`acks=all`	All ISR must acknowledge	Highest	Highest	No message loss as long as at least one replica survives

For financial, payment, or critical data: always use acks=all.

@Configuration
public class KafkaProducerConfig {
 
    @Bean
    public ProducerFactory<String, Object> producerFactory() {
        Map<String, Object> config = new HashMap<>();
        config.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");
        config.put(ProducerConfig.ACKS_CONFIG, "all");              // Wait for all ISR
        config.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);
        config.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 1); // Preserve ordering during retries
        config.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true); // Prevent duplicate messages
        config.put(ProducerConfig.LINGER_MS_CONFIG, 5);             // Batch for 5ms
        config.put(ProducerConfig.BATCH_SIZE_CONFIG, 32768);        // 32KB batch
        config.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy");
        return new DefaultKafkaProducerFactory<>(config);
    }
}

Log Compaction

Kafka supports two retention policies:

1. Time-based retention (default):

Messages are deleted after a configurable time period (e.g., 7 days)
Regardless of whether consumers have read them

2. Log compaction:

For each unique message key, only the LATEST message is kept
Older messages with the same key are deleted during compaction

Before compaction:
Key=user-123: [email=old@example.com] [email=temp@example.com] [email=new@example.com]
Key=user-456: [email=bob@example.com]

After compaction:
Key=user-123: [email=new@example.com]  (latest value only)
Key=user-456: [email=bob@example.com]

Use log compaction for:

User profile updates (latest state of each user)
Configuration data (latest config per key)
Any "update" pattern where you only need the current state per entity

KRaft - Replacing ZooKeeper

Historically, Kafka used ZooKeeper for cluster coordination (leader election, metadata management). This was a pain point because it meant running a separate ZooKeeper cluster.

KRaft (Kafka Raft) - introduced in Kafka 2.8, production-ready in Kafka 3.3 - replaces ZooKeeper entirely. Kafka now manages its own consensus using the Raft protocol, stored in a special metadata topic.

Benefits of KRaft:

Simpler deployment (one fewer system to manage)
Faster metadata operations
Supports more partitions per cluster (from ~200K to millions)
Better recovery from controller failures

Kafka Configuration Quick Reference

# Producer - balanced for reliability and throughput
acks=all
retries=2147483647
max.in.flight.requests.per.connection=5
enable.idempotence=true
compression.type=lz4
linger.ms=5
batch.size=65536
buffer.memory=33554432
 
# Consumer - manual commit for reliability
enable.auto.commit=false
auto.offset.reset=earliest
max.poll.records=500
fetch.min.bytes=1
fetch.max.wait.ms=500
session.timeout.ms=30000
heartbeat.interval.ms=10000
 
# Topic configuration
retention.ms=604800000      # 7 days
replication.factor=3
min.insync.replicas=2       # At least 2 ISR must acknowledge writes

2. RabbitMQ

What RabbitMQ Really Is

RabbitMQ is a message broker implementing the AMQP (Advanced Message Queuing Protocol). It is a push-based system with sophisticated routing capabilities. The key abstraction is the Exchange-Queue-Binding model, which gives you powerful message routing without consumer-side filtering.

Core Architecture

Producer --> Exchange --> (Binding Rules) --> Queue --> Consumer

Producer never writes directly to a Queue.
The Exchange is a routing agent.
Bindings define the rules for routing.

Exchange Types - Deep Dive

Direct Exchange

Routes messages to queues whose binding key exactly matches the routing key.

Routing key: "order.created"
                |
          [Direct Exchange]
                |
   binding: "order.created" --> [Queue: order-processing]
   binding: "order.shipped" --> [Queue: shipping-notifications] (no match - not delivered)
   binding: "payment.done"  --> [Queue: payment-audit]          (no match - not delivered)

Use case: Routing specific event types to specific queues.

Fanout Exchange

Routes to ALL bound queues, completely ignoring the routing key.

[Fanout Exchange: broadcast]
        |
   +----|----+
   |    |    |
   v    v    v
Queue  Queue  Queue
(A)   (B)   (C)
All three queues receive every message.

Use case: Broadcasting notifications, cache invalidation, system-wide events.

Topic Exchange (Most Flexible)

Routes based on pattern matching using wildcards.

Routing keys and which queues they match:

"order.us-east.placed" -->
  * Queue "us-east-orders" (binding: "order.us-east.*")    MATCH
  * Queue "all-orders"     (binding: "order.#")             MATCH
  * Queue "placed-events"  (binding: "#.placed")            MATCH
  * Queue "payments"       (binding: "payment.#")           NO MATCH

"payment.eu-west.failed" -->
  * Queue "payment-alerts" (binding: "payment.*.failed")    MATCH
  * Queue "eu-west-events" (binding: "*.eu-west.#")         MATCH
  * Queue "all-events"     (binding: "#")                   MATCH

Pattern syntax:

* matches exactly one word (segment between dots)
# matches zero or more words

Headers Exchange

Routes based on message headers instead of routing key. More flexible but slower than other types.

// Producer - sets headers
MessageProperties props = new MessageProperties();
props.setHeader("region", "us-east");
props.setHeader("priority", "high");
props.setHeader("x-match", "all");  // "all" = AND logic, "any" = OR logic
 
// Consumer binding - match specific header values
@RabbitListener(bindings = @QueueBinding(
    value = @Queue("high-priority-us-east"),
    exchange = @Exchange(value = "headers-exchange", type = ExchangeTypes.HEADERS),
    arguments = {
        @Argument(name = "region", value = "us-east"),
        @Argument(name = "priority", value = "high"),
        @Argument(name = "x-match", value = "all")
    }
))
public void handleHighPriorityUsEastMessage(MyMessage message) { ... }

Queue Properties and Arguments

@Bean
public Queue orderProcessingQueue() {
    return QueueBuilder.durable("order-processing")
        // Dead letter configuration
        .withArgument("x-dead-letter-exchange", "dlx")
        .withArgument("x-dead-letter-routing-key", "order-processing.dead")
 
        // Message expiry
        .withArgument("x-message-ttl", 300000)           // Messages expire after 5 minutes
 
        // Queue depth limit - reject new messages when queue is full
        .withArgument("x-max-length", 100000)
        .withArgument("x-overflow", "reject-publish")    // Or "drop-head" to drop oldest
 
        // Priority queue support
        .withArgument("x-max-priority", 10)
 
        // Quorum queue (replicated, persistent - recommended for production)
        .withArgument("x-queue-type", "quorum")
 
        .build();
}

Queue types comparison:

Type	Persistence	Replication	Performance	Use Case
Classic	Optional (durable flag)	Mirrored policies (deprecated)	High	Simple use cases, backwards compat
Quorum	Always persistent	Raft-based, automatic	Medium-High	Production workloads, reliability
Stream	Always persistent	Log-based (like Kafka)	Very High	Event streaming, replay needed

Use Quorum Queues in production. Classic mirrored queues are deprecated.

Consumer Prefetch (QoS)

Prefetch controls how many unacknowledged messages a consumer can hold at once.

// Without prefetch:
// Broker delivers ALL queued messages to the first available consumer
// One fast consumer gets all work; slow consumer gets nothing
// If fast consumer crashes, hundreds of messages must be redelivered
 
// With prefetch=10:
// Each consumer receives at most 10 unacknowledged messages
// Work is distributed fairly
// Crash recovery involves at most 10 messages
 
@Configuration
public class RabbitMQListenerConfig {
 
    @Bean
    public SimpleRabbitListenerContainerFactory rabbitListenerContainerFactory(
            ConnectionFactory connectionFactory) {
 
        SimpleRabbitListenerContainerFactory factory = new SimpleRabbitListenerContainerFactory();
        factory.setConnectionFactory(connectionFactory);
        factory.setPrefetchCount(10);           // Hold at most 10 unACKed messages
        factory.setAcknowledgeMode(AcknowledgeMode.MANUAL);
        factory.setConcurrentConsumers(3);      // 3 consumer threads
        factory.setMaxConcurrentConsumers(10);  // Scale up to 10 under load
 
        return factory;
    }
}

Prefetch tuning guidelines:

prefetch=1 - Maximum fairness, minimum throughput (one by one)
prefetch=10-50 - Good balance for most workloads
prefetch=100+ - High throughput, less fairness, larger redelivery batch on crash

Acknowledgment Modes

@Component
public class OrderConsumer {
 
    @RabbitListener(queues = "orders")
    public void processOrder(Order order, Channel channel, Message message) throws IOException {
        long deliveryTag = message.getMessageProperties().getDeliveryTag();
 
        try {
            orderService.process(order);
 
            // ACK: message successfully processed, delete from queue
            channel.basicAck(deliveryTag, false);
 
        } catch (BusinessLogicException e) {
            // NACK + requeue=false: permanent failure, send to DLQ
            channel.basicNack(deliveryTag, false, false);
 
        } catch (TransientException e) {
            // NACK + requeue=true: temporary failure, try again
            // WARNING: Be careful with this - can create infinite loops
            // Always pair with max retry logic
            channel.basicNack(deliveryTag, false, true);
        }
    }
}

3. AWS SQS and SNS

AWS SQS - Simple Queue Service

SQS is a fully managed, serverless message queue service. There is no broker to manage, no cluster to provision.

Standard Queue vs FIFO Queue

Feature	Standard Queue	FIFO Queue
Throughput	Unlimited	300 TPS (3,000 with batching)
Ordering	Best-effort (approximate)	Strict FIFO within a message group
Delivery	At-least-once (occasional duplicates)	Exactly-once processing
Deduplication	Not built-in	Built-in (5-minute dedup window)
Price	Lower	Higher
Use case	High throughput, order not critical	Payment processing, order sequencing

Visibility Timeout - The Most Important SQS Concept

When a consumer receives a message from SQS, the message is NOT deleted. Instead, it becomes invisible to other consumers for the visibility timeout period.

Message arrives in queue
        |
Consumer A calls ReceiveMessage
Message becomes invisible for 30 seconds (visibility timeout)
        |
Two possible outcomes:

Outcome 1 - Success:
Consumer A processes successfully
Consumer A calls DeleteMessage
Message is permanently deleted
(within the 30 second window)

Outcome 2 - Consumer crashes or processing takes too long:
30 seconds pass without DeleteMessage
Message becomes VISIBLE again
Consumer B picks it up and processes it

Setting appropriate visibility timeout:

Rule: Visibility Timeout > (Maximum processing time for one message)

If processing takes up to 5 minutes: set visibility timeout to 6-10 minutes
If processing varies, consumer can extend timeout programmatically

// Extend visibility timeout if processing is taking longer than expected
sqsClient.changeMessageVisibility(ChangeMessageVisibilityRequest.builder()
    .queueUrl(queueUrl)
    .receiptHandle(receiptHandle)
    .visibilityTimeout(300)  // Extend by another 5 minutes
    .build());

Long Polling vs Short Polling

Short Polling (default, wasteful):
Consumer calls ReceiveMessage
SQS checks a subset of servers immediately
Returns 0 results if no messages (even if messages exist on other servers)
Consumer must poll again immediately
Result: Many empty API calls, higher cost, unnecessary latency

Long Polling (recommended):
Consumer calls ReceiveMessage with WaitTimeSeconds=20
SQS waits up to 20 seconds for a message to arrive
Returns as soon as a message is available, or after 20 seconds
Result: Far fewer API calls, lower cost, faster response

@Service
public class SQSMessagePoller {
 
    public void pollMessages() {
        ReceiveMessageRequest request = ReceiveMessageRequest.builder()
            .queueUrl(queueUrl)
            .maxNumberOfMessages(10)       // Batch up to 10 messages per call
            .waitTimeSeconds(20)           // Long polling - wait up to 20 seconds
            .visibilityTimeout(300)        // 5 minute visibility timeout
            .messageAttributeNames("All")
            .build();
 
        ReceiveMessageResponse response = sqsClient.receiveMessage(request);
 
        for (Message message : response.messages()) {
            processMessage(message);
 
            // Delete after successful processing
            sqsClient.deleteMessage(DeleteMessageRequest.builder()
                .queueUrl(queueUrl)
                .receiptHandle(message.receiptHandle())
                .build());
        }
    }
}

SQS Dead Letter Queue Configuration

// Configure DLQ in SQS
@Bean
public Queue orderProcessingQueue() {
    // First create the DLQ
    CreateQueueResponse dlqResponse = sqsClient.createQueue(
        CreateQueueRequest.builder()
            .queueName("order-processing-dlq")
            .build()
    );
 
    String dlqArn = sqsClient.getQueueAttributes(
        GetQueueAttributesRequest.builder()
            .queueUrl(dlqResponse.queueUrl())
            .attributeNames(QueueAttributeName.QUEUE_ARN)
            .build()
    ).attributes().get(QueueAttributeName.QUEUE_ARN);
 
    // Create main queue with redrive policy pointing to DLQ
    RedrivePolicy redrivePolicy = RedrivePolicy.builder()
        .deadLetterTargetArn(dlqArn)
        .maxReceiveCount(3)  // Move to DLQ after 3 failed attempts
        .build();
 
    CreateQueueResponse mainQueueResponse = sqsClient.createQueue(
        CreateQueueRequest.builder()
            .queueName("order-processing")
            .attributes(Map.of(
                QueueAttributeName.REDRIVE_POLICY,
                objectMapper.writeValueAsString(redrivePolicy)
            ))
            .build()
    );
 
    return ...; // Return your Queue configuration object
}

AWS SNS - Simple Notification Service

SNS is a pub/sub messaging service for fan-out patterns.

Core concept: Topics in SNS are for broadcasting. You subscribe HTTP endpoints, SQS queues, Lambda functions, mobile push, SMS, and email to an SNS topic.

The SNS + SQS Fan-out Pattern

This is the standard AWS pattern for event-driven architectures:

Order Service
      |
      | Publishes ONE message
      v
[SNS Topic: order-placed-events]
      |
  +---+---+---+
  |   |   |   |
  v   v   v   v
SQS  SQS  SQS  Lambda
(Email) (Analytics) (Warehouse) (Fraud Check)

Each SQS queue can be consumed by its own service at its own pace.
The SNS topic fans out to all subscriptions simultaneously.
SQS provides durability - if the consumer is down, messages wait in the queue.

Why not just SNS to HTTP directly? Because if your service is down when SNS delivers, the message is lost. The SQS buffer provides durability.

SNS supports filter policies so each SQS queue only receives relevant messages:

// SQS Queue "high-value-orders" subscription filter:
{
  "FilterPolicy": {
    "orderValue": [{ "numeric": [">=", 1000] }],
    "orderStatus": ["PLACED"],
    "region": ["us-east-1", "eu-west-1"]
  }
}
 
// Only messages matching ALL conditions are delivered to this subscription.
// Other subscriptions (without filters) receive all messages.

4. Redis Streams

What Redis Streams Is

Redis Streams (introduced in Redis 5.0) is an append-only log data structure within Redis. It brings Kafka-like consumer group semantics to Redis.

Key Features

Append-only log: Messages are added with XADD, never removed except by explicit trimming
Consumer groups: Multiple consumers can process the stream independently (like Kafka consumer groups)
Acknowledgment: Consumers must ACK messages, or they remain in the Pending Entries List (PEL)
Message ID: Format is timestamp-sequence (e.g., 1685000000000-0)
At-least-once delivery: Unacknowledged messages can be reclaimed

Basic Commands

# Producer: Add message to stream
XADD order-events * orderId "12345" customerId "CUST-001" totalAmount "149.97"
# Returns: "1685000000000-0" (auto-generated message ID: timestamp-sequence)
 
# Consumer: Read new messages (blocking for up to 5 seconds)
XREAD COUNT 10 BLOCK 5000 STREAMS order-events $
 
# Create consumer group
XGROUP CREATE order-events email-service-group $ MKSTREAM
 
# Consumer group read (only unprocessed messages for this group)
XREADGROUP GROUP email-service-group consumer-1 COUNT 10 BLOCK 5000 STREAMS order-events >
 
# ACK a message (remove from pending list)
XACK order-events email-service-group 1685000000000-0
 
# View pending (unacknowledged) messages
XPENDING order-events email-service-group - + 10
 
# Claim abandoned messages (if consumer-1 died, consumer-2 takes over)
XCLAIM order-events email-service-group consumer-2 60000 1685000000000-0

Spring Boot Integration

@Service
public class RedisStreamProducer {
 
    private final StringRedisTemplate redisTemplate;
 
    public void publishOrderEvent(OrderEvent event) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("orderId", event.getOrderId());
        fields.put("customerId", event.getCustomerId());
        fields.put("eventType", event.getEventType());
        fields.put("payload", objectMapper.writeValueAsString(event));
 
        RecordId recordId = redisTemplate.opsForStream()
            .add("order-events", fields);
 
        log.info("Published to Redis Stream: {}", recordId);
    }
}
 
@Component
public class RedisStreamConsumer {
 
    @StreamListener
    public void consumeOrderEvents() {
        // Using Spring Data Redis Stream support
        // This handles consumer group management and ACKs automatically
        redisTemplate.opsForStream().read(
            Consumer.from("email-service-group", "consumer-1"),
            StreamReadOptions.empty().count(10).block(Duration.ofSeconds(5)),
            StreamOffset.create("order-events", ReadOffset.lastConsumed())
        ).forEach(record -> {
            processEvent(record);
            redisTemplate.opsForStream().acknowledge("order-events",
                "email-service-group", record.getId());
        });
    }
}

When to Use Redis Streams

Use Redis Streams when:

You already use Redis and want to avoid adding another broker
Lower operational overhead is a priority
Message volumes are moderate (millions per day, not billions)
Low latency is critical (Redis is in-memory first)
You need consumer groups with ACK semantics but not Kafka's scale

Do NOT use Redis Streams for:

Mission-critical data that cannot tolerate any loss (Redis persistence has trade-offs)
Extremely high throughput (Kafka is purpose-built for this)
Long retention requirements (Redis memory cost vs Kafka disk cost)

5. Azure Service Bus

Overview

Azure Service Bus is Microsoft's enterprise-grade messaging service in Azure. It supports both queues and topics, with rich features for enterprise integration.

Key features:

Queues: Standard point-to-point messaging with at-least-once delivery
Topics and Subscriptions: Pub/Sub with server-side filter rules (SQL-like expressions)
Sessions: Ordered processing within a session ID (like Kafka partition key)
Scheduled messages: Deliver a message at a specific future time
Dead-lettering: Built-in DLQ per queue and per subscription
Duplicate detection: Built-in 10-minute deduplication window
Transaction support: Atomic send/receive/complete operations

Pricing tiers:

Basic: Queues only, simple messages
Standard: Topics, sessions, duplicate detection
Premium: Dedicated capacity, higher throughput, VNET isolation

Use Cases

Best fit when:

You are already in the Azure ecosystem
You need enterprise messaging features (sessions, scheduled messages, transactions)
You want fully managed infrastructure with SLA guarantees
You have .NET/Java enterprise integration requirements

6. Google Cloud Pub/Sub

Overview

Google Cloud Pub/Sub is a fully managed, globally distributed messaging service. It is designed for massive scale with automatic sharding and replication.

Key features:

Global routing: Messages published in one region can be consumed in another
At-least-once delivery: Automatic retry until ACK
Push and pull: Consumers can register a push endpoint (HTTP callback) or pull messages
Ordered delivery: Optional per-key ordering within a topic
Exactly-once processing: Available in certain configurations
Schema support: Avro and Protobuf schema validation
Filtering: Server-side message filtering per subscription
Snapshots: Create a point-in-time snapshot for replay

Scale:

Google uses Pub/Sub internally for systems processing trillions of events per day
No partition management required - fully automatic scaling

Best fit when:

You are in the GCP ecosystem
You want truly serverless, no-ops messaging
You have global distribution requirements
You want the simplest possible operational model

7. Comprehensive Comparison Table

Feature	Apache Kafka	RabbitMQ	AWS SQS	AWS SQS FIFO	Redis Streams	Azure Service Bus
Type	Event streaming	Message broker	Message queue	Message queue	Stream log	Message broker
Protocol	Custom TCP	AMQP	HTTP/HTTPS	HTTP/HTTPS	RESP	AMQP 1.0
Throughput	Millions/sec	Tens of thousands/sec	High (unlimited)	3,000/sec	Very high	Thousands/sec
Message retention	Days/weeks (configurable)	Until consumed	4 days (up to 14)	4 days (up to 14)	Until trimmed	Until consumed (up to 14 days)
Replay	Yes - any offset	No	No	No	Yes - by ID	No
Ordering	Per partition (guaranteed)	Per queue (approximate)	Best-effort	Strict FIFO	Per stream	Per session
Delivery	At-least-once	At-least-once	At-least-once	Exactly-once	At-least-once	At-least-once
Push vs Pull	Pull	Push	Pull (long polling)	Pull	Both	Both
Routing	Consumer-side filtering	Exchange bindings	Message filtering	Message group	Consumer groups	SQL-like filter rules
Self-hosted	Yes	Yes	No (managed)	No (managed)	Yes	No (managed)
Managed cloud	Confluent Cloud, MSK	CloudAMQP	Native AWS	Native AWS	Redis Cloud	Native Azure
Schema support	Via Schema Registry	None built-in	None	None	None	Yes (Preview)
Best for	Event streaming, analytics, audit log	Complex routing, tasks	AWS-native work queues	Ordered AWS processing	Low-latency, Redis-native	Azure enterprise integration

8. How to Choose the Right Technology

Decision Framework

Question 1: Do you need to replay historical events?

YES: Use Kafka (or Azure Event Hubs for Azure)
NO: Continue to Question 2

Question 2: Is strict message ordering required?

YES and high-throughput: Kafka (partition key for ordering)
YES and moderate throughput: SQS FIFO, Azure Service Bus with sessions, RabbitMQ single queue
NO: Continue to Question 3

Question 3: Are you on a managed cloud with limited ops capacity?

AWS: Use SQS/SNS
Azure: Use Azure Service Bus or Event Hubs
GCP: Use Cloud Pub/Sub
On-premises or cloud-agnostic: Continue to Question 4

Question 4: Do you need complex routing without consumer-side filtering?

YES: RabbitMQ (Exchange-based routing is unmatched for flexibility)
NO: Continue to Question 5

Question 5: Do you need extreme throughput (millions per second)?

YES: Kafka
NO: RabbitMQ, SQS, or Redis Streams are all fine

Common Technology Choices by Use Case

Use Case	Recommended Technology	Reason
Event sourcing and event replay	Kafka	Long retention, replayable log
Microservices integration (AWS shop)	SNS + SQS	Managed, reliable, fan-out
Complex routing rules	RabbitMQ	Exchange-based routing
Activity stream/audit log	Kafka	Immutable append-only log
High-volume IoT ingestion	Kafka	Extreme throughput
Background job processing	SQS Standard or RabbitMQ	Simple, reliable, competing consumers
Strictly ordered payment processing	SQS FIFO or Kafka (single partition)	Ordering guarantee
Cache invalidation broadcast	Redis Pub/Sub or Fanout Exchange	Broadcast, low latency
Real-time analytics pipeline	Kafka + Kafka Streams	Stateful stream processing
Enterprise integration (.NET)	Azure Service Bus	Native Azure, enterprise features
Low-latency in-app events	Redis Streams	In-memory speed
Data change capture (CDC)	Kafka + Debezium	Log-based CDC, replay

The Hybrid Architecture

Most production systems use multiple messaging technologies for different purposes:

Production Architecture Example:

User Service -->  [Kafka Topic: user-events]
                  |
            +-----+------+
            |            |
  Email Service     Analytics Pipeline
  (Kafka Consumer)  (Kafka Streams)

Payment Service --> [SQS FIFO: payment-jobs]
                    |
              Payment Worker
              (ordered, exactly-once)

Image Upload --> [SQS Standard: image-resize-jobs]
                 |
           +-----+-----+
           |     |     |
         Worker1 Worker2 Worker3
         (competing consumers)

All Services --> [Redis Pub/Sub: cache-invalidation]
                 |
         All instances receive and invalidate cache

Each technology is doing what it is best at:

Kafka: event streaming, replay, analytics
SQS FIFO: ordered payment processing
SQS Standard: distributed background jobs
Redis: low-latency cache invalidation broadcast

Previous: Part 2 - Patterns and Architecture
Next: Part 4 - Advanced Concepts
Index: Message Queues Demystified - Index

Series: Message Queues Demystified

Message Queues Demystified - Part 3: Technologies Deep Dive

Table of Contents

1. Apache Kafka

What Kafka Really Is

Core Architecture

Topics and Partitions - The Core Concepts

Offsets - The Bookmark System

Consumer Groups - The Scaling Mechanism

Replication and Durability

Log Compaction

KRaft - Replacing ZooKeeper

Kafka Configuration Quick Reference

2. RabbitMQ

What RabbitMQ Really Is

Core Architecture

Exchange Types - Deep Dive

Direct Exchange

Fanout Exchange

Topic Exchange (Most Flexible)

Headers Exchange

Queue Properties and Arguments

Consumer Prefetch (QoS)

Acknowledgment Modes

3. AWS SQS and SNS

AWS SQS - Simple Queue Service

Standard Queue vs FIFO Queue

Visibility Timeout - The Most Important SQS Concept

Long Polling vs Short Polling

SQS Dead Letter Queue Configuration

AWS SNS - Simple Notification Service

The SNS + SQS Fan-out Pattern

SNS Message Filtering

4. Redis Streams

What Redis Streams Is

Key Features

Basic Commands

Spring Boot Integration

When to Use Redis Streams

5. Azure Service Bus

Overview

Use Cases

6. Google Cloud Pub/Sub

Overview

7. Comprehensive Comparison Table

8. How to Choose the Right Technology

Decision Framework

Common Technology Choices by Use Case

The Hybrid Architecture