6/6/2026 - Admin Post


Message Queues Demystified - Part 3: Technologies Deep Dive

Understanding the "what" is not enough. You must understand the "why" and "how" of each
technology to make informed architectural decisions and hold your own in any technical discussion.


Table of Contents

  1. Apache Kafka
  2. RabbitMQ
  3. AWS SQS and SNS
  4. Redis Streams
  5. Azure Service Bus
  6. Google Cloud Pub/Sub
  7. Comprehensive Comparison Table
  8. How to Choose the Right Technology

1. Apache Kafka

What Kafka Really Is

Kafka is fundamentally a distributed, replicated, persistent commit log. This is not just a message queue - it is an append-only ledger of events that consumers can read from at any point, any number of times.

Think of it like a database transaction log, except the log itself IS the system, not a side effect.

Core Architecture

Kafka Cluster (3+ brokers for production)

  Broker 1 (Leader for Partition 0)    Broker 2 (Leader for Partition 1)
  +-----------------------------+       +-----------------------------+
  | Topic: order-events         |       | Topic: order-events         |
  | Partition 0: [msg0, msg1,   |       | Partition 1: [msg3, msg4,   |
  |               msg2, ...]    |       |               msg5, ...]    |
  | Partition 2 Replica         |       | Partition 0 Replica         |
  +-----------------------------+       +-----------------------------+

  Broker 3 (Leader for Partition 2)
  +-----------------------------+
  | Topic: order-events         |
  | Partition 2: [msg6, msg7,   |
  |               msg8, ...]    |
  | Partition 1 Replica         |
  +-----------------------------+

ZooKeeper/KRaft: Cluster coordination, leader election, metadata

Topics and Partitions - The Core Concepts

Topic: A logical category of messages. Think of it as a "table name" in the Kafka world.

Partition: A topic is split into partitions. Each partition is a physically separate, ordered, immutable log file stored on disk.

Topic: order-events
|
+-- Partition 0: offset[0]=OrderPlaced(101), offset[1]=OrderPlaced(102), offset[2]=OrderPaid(101)
+-- Partition 1: offset[0]=OrderPlaced(103), offset[1]=OrderShipped(101), offset[2]=OrderPlaced(104)
+-- Partition 2: offset[0]=OrderPaid(102), offset[1]=OrderShipped(102), offset[2]=OrderPaid(103)

Why partitions matter:

  • Parallelism: Each partition can be consumed by one consumer in a consumer group. 3 partitions = maximum 3 consumers working in parallel.
  • Ordering: Messages within one partition are strictly ordered. Messages across partitions have no ordering guarantee.
  • Throughput: More partitions = more parallelism = higher aggregate throughput.

Partition assignment by key:

// All events for orderId "12345" go to the same partition
// This guarantees ordering for that specific order
kafkaTemplate.send("order-events",
    orderId.toString(),  // Partition key - determines which partition
    event
);
 
// Kafka computes: partition = hash(key) % numPartitions
// Same key ALWAYS goes to same partition (unless partition count changes)
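To make the "hash, then modulo" rule concrete, here is a minimal sketch. It is not Kafka's partitioner: Kafka's default partitioner hashes the serialized key with murmur2, while this sketch uses a simple polynomial hash as a stand-in. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Simplified stand-in for key-based partitioning. Assumption: a toy hash
// replaces the murmur2 hash Kafka really uses; only the principle is shown.
final class KeyPartitionSketch {

    static int partitionFor(String key, int numPartitions) {
        int hash = 0;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            hash = 31 * hash + b;
        }
        // floorMod keeps the result in [0, numPartitions) even if the hash is negative
        return Math.floorMod(hash, numPartitions);
    }

    public static void main(String[] args) {
        // The same key maps to the same partition for a fixed partition count...
        System.out.println("3 partitions: " + partitionFor("12345", 3));
        System.out.println("3 partitions: " + partitionFor("12345", 3));
        // ...but the mapping can change when the partition count changes.
        System.out.println("4 partitions: " + partitionFor("12345", 4));
    }
}
```

The last call illustrates why adding partitions to a topic can break per-key ordering: events written for a key before the change may sit in a different partition from those written after it.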

Offsets - The Bookmark System

Every message in a partition has a unique, monotonically increasing number called an offset.

Partition 0:
Offset:  0          1          2          3          4
Message: [msg-A]   [msg-B]   [msg-C]   [msg-D]   [msg-E]
          ^                              ^
          |                              |
     Consumer A                   Consumer A's current
     read to here                 committed offset

Each consumer group tracks its own position in every partition (the committed offset). This means:

  • A consumer can re-read from any historical offset (replay)
  • If a consumer crashes, it restarts from the last committed offset
  • Different consumer groups have completely independent offsets for the same partition
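The three properties above can be illustrated with a small in-memory model. This is only a sketch of the bookkeeping, not of Kafka itself: `PartitionLogSketch` and its methods are hypothetical names, and the committed offset is stored as "the next offset to read", which matches Kafka's convention.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory model of one partition with per-group committed offsets.
final class PartitionLogSketch {
    private final List<String> log = new ArrayList<>();
    private final Map<String, Integer> committed = new HashMap<>();

    // Appends a message and returns its offset.
    long append(String message) {
        log.add(message);
        return log.size() - 1;
    }

    // Reads up to max messages starting at the group's committed offset.
    // Reading does NOT move the committed offset.
    List<String> poll(String group, int max) {
        int from = committed.getOrDefault(group, 0);
        int to = Math.min(log.size(), from + max);
        return new ArrayList<>(log.subList(from, to));
    }

    // Records the next offset this group should read after a restart.
    void commit(String group, int nextOffset) {
        committed.put(group, nextOffset);
    }

    // Replay: move the group's position back (or forward) to any offset.
    void seek(String group, int offset) {
        committed.put(group, offset);
    }

    public static void main(String[] args) {
        PartitionLogSketch partition = new PartitionLogSketch();
        for (String m : List.of("msg-A", "msg-B", "msg-C", "msg-D", "msg-E")) {
            partition.append(m);
        }
        System.out.println(partition.poll("email-service", 2));
        partition.commit("email-service", 2);
        System.out.println(partition.poll("email-service", 2));
        System.out.println(partition.poll("analytics-service", 10));
    }
}
```

Polling twice without committing returns the same messages, which is the behavior a consumer sees when it crashes before committing and then restarts.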

Auto-commit vs Manual commit:

// Auto-commit (NOT recommended for production)
@KafkaListener(topics = "orders")
public void handleOrder(OrderEvent event) {
    // Offset is auto-committed periodically, regardless of whether processing succeeded
    // If processing fails AFTER auto-commit, message is lost
    processOrder(event);
}
 
// Manual commit (recommended); requires the listener container's AckMode to be MANUAL or MANUAL_IMMEDIATE
@KafkaListener(topics = "orders")
public void handleOrder(OrderEvent event, Acknowledgment ack) {
    try {
        processOrder(event);
        ack.acknowledge();  // Commit offset ONLY after successful processing
    } catch (Exception e) {
        // Do NOT acknowledge - message will be reprocessed
        throw e;
    }
}

Consumer Groups - The Scaling Mechanism

A consumer group is a set of consumers that together consume all partitions of a topic. Kafka ensures:

  • Each partition is assigned to exactly one consumer in the group
  • If a consumer joins or leaves, partitions are rebalanced

Topic: order-events (6 partitions)
Consumer Group: "order-processing-group" (3 consumers)

Partition 0 --> Consumer A
Partition 1 --> Consumer A
Partition 2 --> Consumer B
Partition 3 --> Consumer B
Partition 4 --> Consumer C
Partition 5 --> Consumer C

Scale up to 6 consumers:
Partition 0 --> Consumer A (one partition each)
Partition 1 --> Consumer B
...
Partition 5 --> Consumer F

Scale up to 7 consumers:
Partitions 0-5 assigned to Consumers A-F
Consumer G is IDLE - no partition to consume (more consumers than partitions = waste)

Key insight: The maximum parallelism of a consumer group equals the number of partitions. You cannot have more active consumers than partitions.
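The assignments shown above can be reproduced with a short sketch of a range-style assignment, where partitions are handed out in contiguous blocks. This is an illustration only: the names are hypothetical, and Kafka's real assignors also handle multiple topics and rebalancing, which are omitted here.

```java
import java.util.ArrayList;
import java.util.List;

// Range-style partition assignment for a single topic.
final class RangeAssignmentSketch {

    // result.get(i) holds the partitions owned by consumer i.
    static List<List<Integer>> assign(int numPartitions, int numConsumers) {
        List<List<Integer>> result = new ArrayList<>();
        int base = numPartitions / numConsumers;   // every consumer gets at least this many
        int extra = numPartitions % numConsumers;  // the first 'extra' consumers get one more
        int next = 0;
        for (int c = 0; c < numConsumers; c++) {
            int count = base + (c < extra ? 1 : 0);
            List<Integer> owned = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                owned.add(next++);
            }
            result.add(owned);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("6 partitions, 3 consumers: " + assign(6, 3));
        System.out.println("6 partitions, 7 consumers: " + assign(6, 7));
    }
}
```

With 7 consumers and 6 partitions, the last consumer ends up with an empty list, which is the idle Consumer G from the example.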

Multiple consumer groups get all messages independently:

Topic: order-events
Consumer Group "email-service":     gets all messages (for email processing)
Consumer Group "warehouse-service": gets all messages (for warehouse)
Consumer Group "analytics-service": gets all messages (for analytics)

Each group has its own independent offsets. Adding a new group does not affect existing groups.

Replication and Durability

Every Kafka partition has one leader and zero or more replicas (followers).

  • By default, all reads and writes go to the leader (since Kafka 2.4, consumers can optionally fetch from a follower)
  • Replicas passively replicate the leader's log
  • If the leader broker fails, one of the in-sync replicas becomes the new leader

ISR (In-Sync Replicas): Replicas that are caught up with the leader within a configurable lag threshold. Only ISR members are eligible to become leader.

Producer acknowledgment settings (acks):

Setting | Meaning | Durability | Latency | Risk
-----------------------------------------------------------------
acks=0 | Fire and forget, no acknowledgment | Lowest | Lowest | Message loss if broker crashes
acks=1 | Leader acknowledges write | Medium | Medium | Message loss if leader crashes before replica replicates
acks=all | All ISR must acknowledge | Highest | Highest | No message loss as long as at least one replica survives

For financial, payment, or critical data: always use acks=all.

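import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.DefaultKafkaProducerFactory;
import org.springframework.kafka.core.ProducerFactory;
import org.springframework.kafka.support.serializer.JsonSerializer;

@Configuration
public class KafkaProducerConfig {
 
    @Bean
    public ProducerFactory<String, Object> producerFactory() {
        Map<String, Object> config = new HashMap<>();
        config.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");
        // Serializers are required; JSON values are an assumption for this example
        config.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        config.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, JsonSerializer.class);
        config.put(ProducerConfig.ACKS_CONFIG, "all");              // Wait for all ISR
        config.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);
        // With idempotence enabled, ordering is preserved during retries for up to 5 in-flight requests
        config.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 5);
        config.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true); // Prevent duplicates caused by producer retries
        config.put(ProducerConfig.LINGER_MS_CONFIG, 5);             // Batch for 5ms
        config.put(ProducerConfig.BATCH_SIZE_CONFIG, 32768);        // 32KB batch
        config.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy");
        return new DefaultKafkaProducerFactory<>(config);
    }
}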
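How `acks` interacts with the size of the ISR is easier to see as a small decision function. The sketch below is a simplified model with hypothetical names; it assumes the broker or topic setting `min.insync.replicas`, which Kafka enforces only for `acks=all` writes.

```java
// Simplified model of producer acknowledgment rules.
final class AckSketch {

    enum Acks { NONE, LEADER, ALL }

    // How many replica acknowledgments the producer waits for.
    static int acksAwaited(Acks acks, int isrSize) {
        switch (acks) {
            case NONE:   return 0;        // fire and forget
            case LEADER: return 1;        // leader only
            default:     return isrSize;  // every in-sync replica, leader included
        }
    }

    // With acks=all, the write is rejected when the ISR has shrunk below
    // min.insync.replicas. Other acks settings ignore that limit.
    static boolean rejectedForTooFewReplicas(Acks acks, int isrSize, int minInSyncReplicas) {
        return acks == Acks.ALL && isrSize < minInSyncReplicas;
    }

    public static void main(String[] args) {
        int minInSync = 2; // replication factor 3, min.insync.replicas 2
        for (int isr = 3; isr >= 1; isr--) {
            System.out.println("ISR=" + isr
                + " awaited=" + acksAwaited(Acks.ALL, isr)
                + " rejected=" + rejectedForTooFewReplicas(Acks.ALL, isr, minInSync));
        }
    }
}
```

The design point is that `acks=all` alone does not promise a fixed number of copies: if the ISR shrinks to just the leader, "all" means one. Pairing it with `min.insync.replicas=2` is what turns that into a floor.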

Log Compaction

Kafka supports two retention policies:

1. Time-based retention (default):

Messages are deleted after a configurable time period (e.g., 7 days), regardless of whether consumers have read them.

2. Log compaction:

For each unique message key, only the LATEST message is kept. Older messages with the same key are deleted during compaction.

Before compaction:
Key=user-123: [email=old@example.com] [email=temp@example.com] [email=new@example.com]
Key=user-456: [email=bob@example.com]

After compaction:
Key=user-123: [email=new@example.com]  (latest value only)
Key=user-456: [email=bob@example.com]

Use log compaction for:

  • User profile updates (latest state of each user)
  • Configuration data (latest config per key)
  • Any "update" pattern where you only need the current state per entity
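A rough sketch of what compaction does to a log, using the user-profile example from above. The names are hypothetical and this is a simplification: real compaction runs in the background on older log segments and also handles delete markers (tombstones), which are left out here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Keeps only the latest entry per key, preserving offsets and offset order.
final class CompactionSketch {

    static final class Entry {
        final long offset;
        final String key;
        final String value;

        Entry(long offset, String key, String value) {
            this.offset = offset;
            this.key = key;
            this.value = value;
        }
    }

    static List<Entry> compact(List<Entry> log) {
        Map<String, Entry> latest = new LinkedHashMap<>();
        for (Entry e : log) {
            latest.remove(e.key);   // drop the older entry for this key, if any
            latest.put(e.key, e);   // re-insert so iteration order follows surviving offsets
        }
        return new ArrayList<>(latest.values());
    }

    public static void main(String[] args) {
        List<Entry> log = List.of(
            new Entry(0, "user-123", "email=old@example.com"),
            new Entry(1, "user-456", "email=bob@example.com"),
            new Entry(2, "user-123", "email=temp@example.com"),
            new Entry(3, "user-123", "email=new@example.com"));
        for (Entry e : compact(log)) {
            System.out.println("offset " + e.offset + ": " + e.key + " -> " + e.value);
        }
    }
}
```

Surviving entries keep their original offsets, so consumers that committed an offset before compaction are not thrown off by the gaps.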

KRaft - Replacing ZooKeeper

Historically, Kafka used ZooKeeper for cluster coordination (leader election, metadata management). This was a pain point because it meant running a separate ZooKeeper cluster.

KRaft (Kafka Raft) - introduced as early access in Kafka 2.8, production-ready since Kafka 3.3 - replaces ZooKeeper entirely. Kafka now manages its own consensus using the Raft protocol, with cluster metadata stored in an internal metadata topic.

Benefits of KRaft:

  • Simpler deployment (one fewer system to manage)
  • Faster metadata operations
  • Supports more partitions per cluster (from ~200K to millions)
  • Better recovery from controller failures

Kafka Configuration Quick Reference

# Producer - balanced for reliability and throughput
acks=all
retries=2147483647
max.in.flight.requests.per.connection=5
enable.idempotence=true
compression.type=lz4
linger.ms=5
batch.size=65536
buffer.memory=33554432
 
# Consumer - manual commit for reliability
enable.auto.commit=false
auto.offset.reset=earliest
max.poll.records=500
fetch.min.bytes=1
fetch.max.wait.ms=500
session.timeout.ms=30000
heartbeat.interval.ms=10000
 
# Topic configuration
# 7 days
retention.ms=604800000
replication.factor=3
# At least 2 ISR must acknowledge writes (enforced when acks=all)
min.insync.replicas=2

2. RabbitMQ

What RabbitMQ Really Is

RabbitMQ is a message broker that implements AMQP 0-9-1 (Advanced Message Queuing Protocol). It is a push-based system with sophisticated routing capabilities. The key abstraction is the Exchange-Queue-Binding model, which gives you powerful message routing without consumer-side filtering.

Core Architecture

Producer --> Exchange --> (Binding Rules) --> Queue --> Consumer

Producer never writes directly to a Queue.
The Exchange is a routing agent.
Bindings define the rules for routing.

Exchange Types - Deep Dive

Direct Exchange

Routes messages to queues whose binding key exactly matches the routing key.

Routing key: "order.created"
                |
          [Direct Exchange]
                |
   binding: "order.created" --> [Queue: order-processing]
   binding: "order.shipped" --> [Queue: shipping-notifications] (no match - not delivered)
   binding: "payment.done"  --> [Queue: payment-audit]          (no match - not delivered)

Use case: Routing specific event types to specific queues.

Fanout Exchange

Routes to ALL bound queues, completely ignoring the routing key.

[Fanout Exchange: broadcast]
        |
   +----|----+
   |    |    |
   v    v    v
Queue  Queue  Queue
(A)   (B)   (C)
All three queues receive every message.

Use case: Broadcasting notifications, cache invalidation, system-wide events.

Topic Exchange (Most Flexible)

Routes based on pattern matching using wildcards.

Routing keys and which queues they match:

"order.us-east.placed" -->
  * Queue "us-east-orders" (binding: "order.us-east.*")    MATCH
  * Queue "all-orders"     (binding: "order.#")             MATCH
  * Queue "placed-events"  (binding: "#.placed")            MATCH
  * Queue "payments"       (binding: "payment.#")           NO MATCH

"payment.eu-west.failed" -->
  * Queue "payment-alerts" (binding: "payment.*.failed")    MATCH
  * Queue "eu-west-events" (binding: "*.eu-west.#")         MATCH
  * Queue "all-events"     (binding: "#")                   MATCH

Pattern syntax:

  • * matches exactly one word (segment between dots)
  • # matches zero or more words
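The two wildcard rules are enough to write a small matcher. The sketch below is an illustration of the matching semantics only, with hypothetical names; it is not RabbitMQ's implementation, which uses a more efficient structure.

```java
// Matches a routing key against a topic-exchange binding pattern.
final class TopicPatternSketch {

    static boolean matches(String bindingPattern, String routingKey) {
        return match(bindingPattern.split("\\."), 0, routingKey.split("\\."), 0);
    }

    // p = pattern words, k = routing key words; i and j are the current positions.
    private static boolean match(String[] p, int i, String[] k, int j) {
        if (i == p.length) {
            return j == k.length;              // pattern used up: key must be used up too
        }
        if (p[i].equals("#")) {
            // '#' may absorb zero or more words: try every possible split
            for (int skip = j; skip <= k.length; skip++) {
                if (match(p, i + 1, k, skip)) {
                    return true;
                }
            }
            return false;
        }
        if (j == k.length) {
            return false;                      // pattern still needs a word, key has none left
        }
        if (p[i].equals("*") || p[i].equals(k[j])) {
            return match(p, i + 1, k, j + 1);  // '*' or a literal consumes exactly one word
        }
        return false;
    }

    public static void main(String[] args) {
        String key = "order.us-east.placed";
        for (String pattern : new String[] {"order.us-east.*", "order.#", "#.placed", "payment.#"}) {
            System.out.println(pattern + " vs " + key + ": " + matches(pattern, key));
        }
    }
}
```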

Headers Exchange

Routes based on message headers instead of routing key. More flexible but slower than other types.

// Producer - sets headers on the message
MessageProperties props = new MessageProperties();
props.setHeader("region", "us-east");
props.setHeader("priority", "high");
// Note: "x-match" ("all" = AND logic, "any" = OR logic) is a binding argument,
// not a message header, so it is set on the consumer binding below
 
// Consumer binding - match specific header values
@RabbitListener(bindings = @QueueBinding(
    value = @Queue("high-priority-us-east"),
    exchange = @Exchange(value = "headers-exchange", type = ExchangeTypes.HEADERS),
    arguments = {
        @Argument(name = "region", value = "us-east"),
        @Argument(name = "priority", value = "high"),
        @Argument(name = "x-match", value = "all")
    }
))
public void handleHighPriorityUsEastMessage(MyMessage message) { ... }

Queue Properties and Arguments

@Bean
public Queue orderProcessingQueue() {
    return QueueBuilder.durable("order-processing")
        // Dead letter configuration
        .withArgument("x-dead-letter-exchange", "dlx")
        .withArgument("x-dead-letter-routing-key", "order-processing.dead")
 
        // Message expiry
        .withArgument("x-message-ttl", 300000)           // Messages expire after 5 minutes
 
        // Queue depth limit - reject new messages when queue is full
        .withArgument("x-max-length", 100000)
        .withArgument("x-overflow", "reject-publish")    // Or "drop-head" to drop oldest
 
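        // Priority queue support (a classic-queue argument; quorum queues handle
        // priorities differently, so verify against your RabbitMQ version before
        // combining this with x-queue-type=quorum)
        .withArgument("x-max-priority", 10)
 
        // Quorum queue (replicated, persistent - recommended for production)
        .withArgument("x-queue-type", "quorum")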
 
        .build();
}

Queue types comparison:

Type | Persistence | Replication | Performance | Use Case
-----------------------------------------------------------------
Classic | Optional (durable flag) | Mirrored policies (deprecated) | High | Simple use cases, backwards compat
Quorum | Always persistent | Raft-based, automatic | Medium-High | Production workloads, reliability
Stream | Always persistent | Log-based (like Kafka) | Very High | Event streaming, replay needed

Use Quorum Queues in production. Classic mirrored queues are deprecated.


Consumer Prefetch (QoS)

Prefetch controls how many unacknowledged messages a consumer can hold at once.

// Without prefetch:
// Broker delivers ALL queued messages to the first available consumer
// One fast consumer gets all work; slow consumer gets nothing
// If fast consumer crashes, hundreds of messages must be redelivered
 
// With prefetch=10:
// Each consumer receives at most 10 unacknowledged messages
// Work is distributed fairly
// Crash recovery involves at most 10 messages
 
@Configuration
public class RabbitMQListenerConfig {
 
    @Bean
    public SimpleRabbitListenerContainerFactory rabbitListenerContainerFactory(
            ConnectionFactory connectionFactory) {
 
        SimpleRabbitListenerContainerFactory factory = new SimpleRabbitListenerContainerFactory();
        factory.setConnectionFactory(connectionFactory);
        factory.setPrefetchCount(10);           // Hold at most 10 unACKed messages
        factory.setAcknowledgeMode(AcknowledgeMode.MANUAL);
        factory.setConcurrentConsumers(3);      // 3 consumer threads
        factory.setMaxConcurrentConsumers(10);  // Scale up to 10 under load
 
        return factory;
    }
}

Prefetch tuning guidelines:

  • prefetch=1 - Maximum fairness, minimum throughput (one by one)
  • prefetch=10-50 - Good balance for most workloads
  • prefetch=100+ - High throughput, less fairness, larger redelivery batch on crash
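The effect of a prefetch limit can be simulated without a broker. The sketch below is a toy model with hypothetical names: it hands messages to consumers in turn and stops delivering to any consumer that already holds `prefetch` unacknowledged messages.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy dispatcher that honors a per-consumer prefetch limit.
final class PrefetchSketch {
    private final Deque<String> ready = new ArrayDeque<>();
    private final int[] unacked;   // unacknowledged messages held by each consumer
    private final int prefetch;
    private int next = 0;          // round-robin cursor

    PrefetchSketch(int consumers, int prefetch) {
        this.unacked = new int[consumers];
        this.prefetch = prefetch;
    }

    void publish(String message) {
        ready.add(message);
        dispatch();
    }

    // An ACK frees one slot, so the broker can push another message.
    void ack(int consumer) {
        unacked[consumer]--;
        dispatch();
    }

    private void dispatch() {
        int fullInARow = 0;
        while (!ready.isEmpty() && fullInARow < unacked.length) {
            int c = next;
            next = (next + 1) % unacked.length;
            if (unacked[c] < prefetch) {
                ready.poll();
                unacked[c]++;
                fullInARow = 0;
            } else {
                fullInARow++;   // this consumer is at its limit, try the next one
            }
        }
    }

    int unackedFor(int consumer) { return unacked[consumer]; }

    int readyCount() { return ready.size(); }

    public static void main(String[] args) {
        PrefetchSketch broker = new PrefetchSketch(2, 10);
        for (int i = 0; i < 25; i++) {
            broker.publish("msg-" + i);
        }
        System.out.println("consumer 0 holds " + broker.unackedFor(0));
        System.out.println("consumer 1 holds " + broker.unackedFor(1));
        System.out.println("still queued: " + broker.readyCount());
    }
}
```

Messages beyond the limit stay in the queue, where they remain available to any consumer that frees a slot or joins later. That is also why a crash affects at most `prefetch` messages per consumer.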

Acknowledgment Modes

@Component
public class OrderConsumer {
 
    @RabbitListener(queues = "orders")
    public void processOrder(Order order, Channel channel, Message message) throws IOException {
        long deliveryTag = message.getMessageProperties().getDeliveryTag();
 
        try {
            orderService.process(order);
 
            // ACK: message successfully processed, delete from queue
            channel.basicAck(deliveryTag, false);
 
        } catch (BusinessLogicException e) {
            // NACK + requeue=false: permanent failure, send to DLQ
            channel.basicNack(deliveryTag, false, false);
 
        } catch (TransientException e) {
            // NACK + requeue=true: temporary failure, try again
            // WARNING: Be careful with this - can create infinite loops
            // Always pair with max retry logic
            channel.basicNack(deliveryTag, false, true);
        }
    }
}

3. AWS SQS and SNS

AWS SQS - Simple Queue Service

SQS is a fully managed, serverless message queue service. There is no broker to manage, no cluster to provision.

Standard Queue vs FIFO Queue

Feature | Standard Queue | FIFO Queue
-----------------------------------------------------------------
Throughput | Nearly unlimited | 300 TPS (3,000 with batching)
Ordering | Best-effort (approximate) | Strict FIFO within a message group
Delivery | At-least-once (occasional duplicates) | Exactly-once processing
Deduplication | Not built-in | Built-in (5-minute dedup window)
Price | Lower | Higher
Use case | High throughput, order not critical | Payment processing, order sequencing
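The 5-minute deduplication window of FIFO queues can be modeled in a few lines. This is a sketch of the idea only, with hypothetical names: it tracks a deduplication ID per message and treats a repeat inside the window as a duplicate. Timestamps are passed in explicitly so the behavior is easy to test.

```java
import java.util.HashMap;
import java.util.Map;

// Model of a time-windowed deduplication check.
final class DedupWindowSketch {
    static final long WINDOW_MILLIS = 5 * 60 * 1000L;

    // deduplication ID -> time the ID was last accepted
    private final Map<String, Long> acceptedAt = new HashMap<>();

    // Returns true if the message should be delivered, false if it is a
    // duplicate of a message accepted within the last 5 minutes.
    boolean accept(String deduplicationId, long nowMillis) {
        Long previous = acceptedAt.get(deduplicationId);
        if (previous != null && nowMillis - previous < WINDOW_MILLIS) {
            return false;
        }
        acceptedAt.put(deduplicationId, nowMillis);
        return true;
    }

    public static void main(String[] args) {
        DedupWindowSketch queue = new DedupWindowSketch();
        System.out.println(queue.accept("payment-42", 0L));
        System.out.println(queue.accept("payment-42", 60_000L));
        System.out.println(queue.accept("payment-42", 300_000L));
    }
}
```

A producer that retries a send after a network timeout reuses the same deduplication ID, so the retry is not delivered twice. A retry that arrives after the window has passed is treated as a new message.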

Visibility Timeout - The Most Important SQS Concept

When a consumer receives a message from SQS, the message is NOT deleted. Instead, it becomes invisible to other consumers for the visibility timeout period.

Message arrives in queue
        |
Consumer A calls ReceiveMessage
Message becomes invisible for 30 seconds (visibility timeout)
        |
Two possible outcomes:

Outcome 1 - Success:
Consumer A processes successfully
Consumer A calls DeleteMessage
Message is permanently deleted
(within the 30 second window)

Outcome 2 - Consumer crashes or processing takes too long:
30 seconds pass without DeleteMessage
Message becomes VISIBLE again
Consumer B picks it up and processes it

Setting appropriate visibility timeout:

Rule: Visibility Timeout > (Maximum processing time for one message)

If processing takes up to 5 minutes: set visibility timeout to 6-10 minutes
If processing varies, consumer can extend timeout programmatically

// Extend visibility timeout if processing is taking longer than expected
sqsClient.changeMessageVisibility(ChangeMessageVisibilityRequest.builder()
    .queueUrl(queueUrl)
    .receiptHandle(receiptHandle)
    .visibilityTimeout(300)  // New timeout: 5 minutes, counted from this call
    .build());
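The visibility timeout lifecycle can be modeled in memory to make both outcomes concrete. This is a sketch with hypothetical names, not the SQS API: time is passed in explicitly, and the message body doubles as the receipt handle to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory model of a queue with a visibility timeout.
final class VisibilityTimeoutSketch {

    private static final class Entry {
        final String body;
        long invisibleUntil;   // message is visible when now >= invisibleUntil

        Entry(String body) { this.body = body; }
    }

    private final List<Entry> entries = new ArrayList<>();
    private final long timeoutMillis;

    VisibilityTimeoutSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    void send(String body) {
        entries.add(new Entry(body));
    }

    // Returns the first visible message and hides it, or null if none is visible.
    String receive(long now) {
        for (Entry e : entries) {
            if (now >= e.invisibleUntil) {
                e.invisibleUntil = now + timeoutMillis;
                return e.body;
            }
        }
        return null;
    }

    // Successful processing: the message is removed for good.
    void delete(String body) {
        entries.removeIf(e -> e.body.equals(body));
    }

    // Replaces the remaining timeout with a new one counted from 'now'.
    void changeVisibility(String body, long now, long newTimeoutMillis) {
        for (Entry e : entries) {
            if (e.body.equals(body)) {
                e.invisibleUntil = now + newTimeoutMillis;
            }
        }
    }

    public static void main(String[] args) {
        VisibilityTimeoutSketch queue = new VisibilityTimeoutSketch(30_000L);
        queue.send("order-1");
        System.out.println("t=0s:  " + queue.receive(0L));
        System.out.println("t=10s: " + queue.receive(10_000L));
        System.out.println("t=30s: " + queue.receive(30_000L));
    }
}
```

The second receive finds nothing because the message is hidden. The third one gets the message again because nobody deleted it in time, which is exactly how duplicates arise when processing outlasts the timeout.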

Long Polling vs Short Polling

Short Polling (default, wasteful):
Consumer calls ReceiveMessage
SQS checks a subset of servers immediately
Returns 0 results if no messages (even if messages exist on other servers)
Consumer must poll again immediately
Result: Many empty API calls, higher cost, unnecessary latency

Long Polling (recommended):
Consumer calls ReceiveMessage with WaitTimeSeconds=20
SQS waits up to 20 seconds for a message to arrive
Returns as soon as a message is available, or after 20 seconds
Result: Far fewer API calls, lower cost, faster response

@Service
public class SQSMessagePoller {
 
    public void pollMessages() {
        ReceiveMessageRequest request = ReceiveMessageRequest.builder()
            .queueUrl(queueUrl)
            .maxNumberOfMessages(10)       // Batch up to 10 messages per call
            .waitTimeSeconds(20)           // Long polling - wait up to 20 seconds
            .visibilityTimeout(300)        // 5 minute visibility timeout
            .messageAttributeNames("All")
            .build();
 
        ReceiveMessageResponse response = sqsClient.receiveMessage(request);
 
        for (Message message : response.messages()) {
            processMessage(message);
 
            // Delete after successful processing
            sqsClient.deleteMessage(DeleteMessageRequest.builder()
                .queueUrl(queueUrl)
                .receiptHandle(message.receiptHandle())
                .build());
        }
    }
}

SQS Dead Letter Queue Configuration

// Configure DLQ in SQS
@Bean
public Queue orderProcessingQueue() {
    // First create the DLQ
    CreateQueueResponse dlqResponse = sqsClient.createQueue(
        CreateQueueRequest.builder()
            .queueName("order-processing-dlq")
            .build()
    );
 
    String dlqArn = sqsClient.getQueueAttributes(
        GetQueueAttributesRequest.builder()
            .queueUrl(dlqResponse.queueUrl())
            .attributeNames(QueueAttributeName.QUEUE_ARN)
            .build()
    ).attributes().get(QueueAttributeName.QUEUE_ARN);
 
    // Create main queue with redrive policy pointing to DLQ.
    // SQS expects the RedrivePolicy attribute as a JSON string.
    // maxReceiveCount=3: move to DLQ after 3 failed receives
    String redrivePolicy = String.format(
        "{\"deadLetterTargetArn\":\"%s\",\"maxReceiveCount\":\"3\"}",
        dlqArn);
 
    CreateQueueResponse mainQueueResponse = sqsClient.createQueue(
        CreateQueueRequest.builder()
            .queueName("order-processing")
            .attributes(Map.of(
                QueueAttributeName.REDRIVE_POLICY,
                redrivePolicy
            ))
            .build()
    );
 
    return ...; // Return your Queue configuration object
}

AWS SNS - Simple Notification Service

SNS is a pub/sub messaging service for fan-out patterns.

Core concept: Topics in SNS are for broadcasting. You subscribe HTTP endpoints, SQS queues, Lambda functions, mobile push, SMS, and email to an SNS topic.

The SNS + SQS Fan-out Pattern

This is the standard AWS pattern for event-driven architectures:

Order Service
      |
      | Publishes ONE message
      v
[SNS Topic: order-placed-events]
      |
  +---+---+---+
  |   |   |   |
  v   v   v   v
SQS  SQS  SQS  Lambda
(Email) (Analytics) (Warehouse) (Fraud Check)

Each SQS queue can be consumed by its own service at its own pace.
The SNS topic fans out to all subscriptions simultaneously.
SQS provides durability - if the consumer is down, messages wait in the queue.

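Why not just SNS to HTTP directly? Because if your service is down when SNS delivers, SNS only retries according to its delivery policy. Once retries are exhausted, the message is dropped unless a dead-letter queue is attached to the subscription. The SQS buffer provides durability.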

SNS Message Filtering

SNS supports filter policies so each SQS queue only receives relevant messages:

// SQS Queue "high-value-orders" subscription filter:
{
  "FilterPolicy": {
    "orderValue": [{ "numeric": [">=", 1000] }],
    "orderStatus": ["PLACED"],
    "region": ["us-east-1", "eu-west-1"]
  }
}
 
// Only messages matching ALL conditions are delivered to this subscription.
// Other subscriptions (without filters) receive all messages.
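The matching rule (every key in the policy must be satisfied, and any one of the listed values is enough for a key) can be sketched as a small evaluator. This is not the SNS implementation: the names are hypothetical, and only exact string matching and a numeric `>=` condition are modeled.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Evaluates a simplified filter policy against message attributes.
final class FilterPolicySketch {

    // AND across policy keys: every key must be present and satisfied.
    static boolean matches(Map<String, Predicate<Object>> policy, Map<String, Object> attributes) {
        for (Map.Entry<String, Predicate<Object>> rule : policy.entrySet()) {
            Object value = attributes.get(rule.getKey());
            if (value == null || !rule.getValue().test(value)) {
                return false;
            }
        }
        return true;
    }

    // OR within one key: the attribute may equal any of the allowed values.
    static Predicate<Object> anyOf(String... allowed) {
        List<String> values = List.of(allowed);
        return values::contains;
    }

    // Numeric condition, equivalent to {"numeric": [">=", min]}.
    static Predicate<Object> atLeast(double min) {
        return v -> v instanceof Number && ((Number) v).doubleValue() >= min;
    }

    // The "high-value-orders" policy from the example above.
    static Map<String, Predicate<Object>> highValueOrders() {
        return Map.of(
            "orderValue", atLeast(1000),
            "orderStatus", anyOf("PLACED"),
            "region", anyOf("us-east-1", "eu-west-1"));
    }

    public static void main(String[] args) {
        Map<String, Object> message = Map.of(
            "orderValue", 1500, "orderStatus", "PLACED", "region", "eu-west-1");
        System.out.println("delivered: " + matches(highValueOrders(), message));
    }
}
```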

4. Redis Streams

What Redis Streams Is

Redis Streams (introduced in Redis 5.0) is an append-only log data structure within Redis. It brings Kafka-like consumer group semantics to Redis.

Key Features

  • Append-only log: Messages are added with XADD, never removed except by explicit trimming
  • Consumer groups: Multiple consumers can process the stream independently (like Kafka consumer groups)
  • Acknowledgment: Consumers must ACK messages, or they remain in the Pending Entries List (PEL)
  • Message ID: Format is timestamp-sequence (e.g., 1685000000000-0)
  • At-least-once delivery: Unacknowledged messages can be reclaimed

Basic Commands

# Producer: Add message to stream
XADD order-events * orderId "12345" customerId "CUST-001" totalAmount "149.97"
# Returns: "1685000000000-0" (auto-generated message ID: timestamp-sequence)
 
# Consumer: Read new messages (blocking for up to 5 seconds)
XREAD COUNT 10 BLOCK 5000 STREAMS order-events $
 
# Create consumer group
XGROUP CREATE order-events email-service-group $ MKSTREAM
 
# Consumer group read (only unprocessed messages for this group)
XREADGROUP GROUP email-service-group consumer-1 COUNT 10 BLOCK 5000 STREAMS order-events >
 
# ACK a message (remove from pending list)
XACK order-events email-service-group 1685000000000-0
 
# View pending (unacknowledged) messages
XPENDING order-events email-service-group - + 10
 
# Claim abandoned messages (if consumer-1 died, consumer-2 takes over)
XCLAIM order-events email-service-group consumer-2 60000 1685000000000-0
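The interplay of XREADGROUP, XACK, and XCLAIM is mostly bookkeeping on the Pending Entries List. The sketch below models that bookkeeping in plain Java, with hypothetical names and explicit timestamps; it leaves out details such as delivery counters.

```java
import java.util.HashMap;
import java.util.Map;

// Model of a consumer group's Pending Entries List (PEL).
final class PendingEntriesSketch {

    private static final class Pending {
        String consumer;
        long lastDelivered;

        Pending(String consumer, long lastDelivered) {
            this.consumer = consumer;
            this.lastDelivered = lastDelivered;
        }
    }

    private final Map<String, Pending> pel = new HashMap<>();

    // XREADGROUP: the delivered message is recorded as pending for that consumer.
    void delivered(String id, String consumer, long now) {
        pel.put(id, new Pending(consumer, now));
    }

    // XACK: removes the entry; returns false if it was not pending.
    boolean ack(String id) {
        return pel.remove(id) != null;
    }

    // XCLAIM: transfers ownership only if the entry has been idle long enough.
    boolean claim(String id, String newConsumer, long minIdleMillis, long now) {
        Pending p = pel.get(id);
        if (p == null || now - p.lastDelivered < minIdleMillis) {
            return false;
        }
        p.consumer = newConsumer;
        p.lastDelivered = now;   // claiming resets the idle time
        return true;
    }

    String ownerOf(String id) {
        Pending p = pel.get(id);
        return p == null ? null : p.consumer;
    }

    public static void main(String[] args) {
        PendingEntriesSketch group = new PendingEntriesSketch();
        group.delivered("1685000000000-0", "consumer-1", 0L);
        System.out.println("claim at 30s: " + group.claim("1685000000000-0", "consumer-2", 60_000L, 30_000L));
        System.out.println("claim at 60s: " + group.claim("1685000000000-0", "consumer-2", 60_000L, 60_000L));
        System.out.println("owner: " + group.ownerOf("1685000000000-0"));
    }
}
```

The minimum idle time is what keeps two healthy consumers from stealing work from each other: a message can only be claimed once its current owner has been silent for that long.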

Spring Boot Integration

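@Service
public class RedisStreamProducer {
 
    private final StringRedisTemplate redisTemplate;
    private final ObjectMapper objectMapper;
 
    public RedisStreamProducer(StringRedisTemplate redisTemplate, ObjectMapper objectMapper) {
        this.redisTemplate = redisTemplate;
        this.objectMapper = objectMapper;
    }
 
    // writeValueAsString throws the checked JsonProcessingException
    public void publishOrderEvent(OrderEvent event) throws JsonProcessingException {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("orderId", event.getOrderId());
        fields.put("customerId", event.getCustomerId());
        fields.put("eventType", event.getEventType());
        fields.put("payload", objectMapper.writeValueAsString(event));
 
        RecordId recordId = redisTemplate.opsForStream()
            .add("order-events", fields);
 
        log.info("Published to Redis Stream: {}", recordId);
    }
}
 
@Component
public class RedisStreamConsumer {
 
    private final StringRedisTemplate redisTemplate;
 
    public RedisStreamConsumer(StringRedisTemplate redisTemplate) {
        this.redisTemplate = redisTemplate;
    }
 
    // Reads one batch per call; invoke it from a polling loop or a scheduler.
    // The consumer group must already exist (see XGROUP CREATE above).
    public void consumeOrderEvents() {
        // Using Spring Data Redis Stream support
        // ACKs are NOT automatic here: each record is acknowledged explicitly below
        redisTemplate.opsForStream().read(
            Consumer.from("email-service-group", "consumer-1"),
            StreamReadOptions.empty().count(10).block(Duration.ofSeconds(5)),
            StreamOffset.create("order-events", ReadOffset.lastConsumed())
        ).forEach(record -> {
            processEvent(record);
            redisTemplate.opsForStream().acknowledge("order-events",
                "email-service-group", record.getId());
        });
    }
}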

When to Use Redis Streams

Use Redis Streams when:

  • You already use Redis and want to avoid adding another broker
  • Lower operational overhead is a priority
  • Message volumes are moderate (millions per day, not billions)
  • Low latency is critical (Redis is in-memory first)
  • You need consumer groups with ACK semantics but not Kafka's scale

Do NOT use Redis Streams for:

  • Mission-critical data that cannot tolerate any loss (Redis persistence has trade-offs)
  • Extremely high throughput (Kafka is purpose-built for this)
  • Long retention requirements (Redis memory cost vs Kafka disk cost)

5. Azure Service Bus

Overview

Azure Service Bus is Microsoft's enterprise-grade messaging service in Azure. It supports both queues and topics, with rich features for enterprise integration.

Key features:

  • Queues: Standard point-to-point messaging with at-least-once delivery
  • Topics and Subscriptions: Pub/Sub with server-side filter rules (SQL-like expressions)
  • Sessions: Ordered processing within a session ID (like Kafka partition key)
  • Scheduled messages: Deliver a message at a specific future time
  • Dead-lettering: Built-in DLQ per queue and per subscription
  • Duplicate detection: Built-in deduplication over a configurable time window (10 minutes by default)
  • Transaction support: Atomic send/receive/complete operations

Pricing tiers:

  • Basic: Queues only, simple messages
  • Standard: Topics, sessions, duplicate detection
  • Premium: Dedicated capacity, higher throughput, VNET isolation

Use Cases

Best fit when:

  • You are already in the Azure ecosystem
  • You need enterprise messaging features (sessions, scheduled messages, transactions)
  • You want fully managed infrastructure with SLA guarantees
  • You have .NET/Java enterprise integration requirements

6. Google Cloud Pub/Sub

Overview

Google Cloud Pub/Sub is a fully managed, globally distributed messaging service. It is designed for massive scale with automatic sharding and replication.

Key features:

  • Global routing: Messages published in one region can be consumed in another
  • At-least-once delivery: Automatic retry until ACK
  • Push and pull: Consumers can register a push endpoint (HTTP callback) or pull messages
  • Ordered delivery: Optional per-key ordering within a topic
  • Exactly-once delivery: Available for pull subscriptions, within a single cloud region
  • Schema support: Avro and Protobuf schema validation
  • Filtering: Server-side message filtering per subscription
  • Snapshots: Create a point-in-time snapshot for replay

Scale:

  • Google uses Pub/Sub internally for systems processing trillions of events per day
  • No partition management required - fully automatic scaling

Best fit when:

  • You are in the GCP ecosystem
  • You want truly serverless, no-ops messaging
  • You have global distribution requirements
  • You want the simplest possible operational model

7. Comprehensive Comparison Table

Feature | Apache Kafka | RabbitMQ | AWS SQS | AWS SQS FIFO | Redis Streams | Azure Service Bus
-----------------------------------------------------------------
Type | Event streaming | Message broker | Message queue | Message queue | Stream log | Message broker
Protocol | Custom TCP | AMQP | HTTP/HTTPS | HTTP/HTTPS | RESP | AMQP 1.0
Throughput | Millions/sec | Tens of thousands/sec | High (unlimited) | 3,000/sec | Very high | Thousands/sec
Message retention | Days/weeks (configurable) | Until consumed | 4 days (up to 14) | 4 days (up to 14) | Until trimmed | Until consumed (up to 14 days)
Replay | Yes - any offset | No | No | No | Yes - by ID | No
Ordering | Per partition (guaranteed) | Per queue (approximate) | Best-effort | Strict FIFO | Per stream | Per session
Delivery | At-least-once | At-least-once | At-least-once | Exactly-once | At-least-once | At-least-once
Push vs Pull | Pull | Push | Pull (long polling) | Pull | Both | Both
Routing | Consumer-side filtering | Exchange bindings | Message filtering | Message group | Consumer groups | SQL-like filter rules
Self-hosted | Yes | Yes | No (managed) | No (managed) | Yes | No (managed)
Managed cloud | Confluent Cloud, MSK | CloudAMQP | Native AWS | Native AWS | Redis Cloud | Native Azure
Schema support | Via Schema Registry | None built-in | None | None | None | Yes (Preview)
Best for | Event streaming, analytics, audit log | Complex routing, tasks | AWS-native work queues | Ordered AWS processing | Low-latency, Redis-native | Azure enterprise integration

8. How to Choose the Right Technology

Decision Framework

Question 1: Do you need to replay historical events?

  • YES: Use Kafka (or Azure Event Hubs for Azure)
  • NO: Continue to Question 2

Question 2: Is strict message ordering required?

  • YES and high-throughput: Kafka (partition key for ordering)
  • YES and moderate throughput: SQS FIFO, Azure Service Bus with sessions, RabbitMQ single queue
  • NO: Continue to Question 3

Question 3: Are you on a managed cloud with limited ops capacity?

  • AWS: Use SQS/SNS
  • Azure: Use Azure Service Bus or Event Hubs
  • GCP: Use Cloud Pub/Sub
  • On-premises or cloud-agnostic: Continue to Question 4

Question 4: Do you need complex routing without consumer-side filtering?

  • YES: RabbitMQ (Exchange-based routing is unmatched for flexibility)
  • NO: Continue to Question 5

Question 5: Do you need extreme throughput (millions per second)?

  • YES: Kafka
  • NO: RabbitMQ, SQS, or Redis Streams are all fine
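The five questions form a decision tree, and writing it out as code shows the order in which the questions take priority. The sketch below encodes the framework as stated, with hypothetical names. One simplification is an assumption: a single `highThroughput` flag stands in for both the "high-throughput" branch of Question 2 and the "extreme throughput" test of Question 5.

```java
// Encodes the decision framework as a single function.
final class BrokerChoiceSketch {

    enum Cloud { AWS, AZURE, GCP, NONE }

    static String recommend(boolean needsReplay, boolean strictOrdering, boolean highThroughput,
                            Cloud managedCloud, boolean complexRouting) {
        // Question 1: replay of historical events
        if (needsReplay) {
            return "Kafka";
        }
        // Question 2: strict ordering
        if (strictOrdering) {
            return highThroughput
                ? "Kafka"
                : "SQS FIFO, Azure Service Bus with sessions, or a single RabbitMQ queue";
        }
        // Question 3: managed cloud with limited ops capacity
        switch (managedCloud) {
            case AWS:   return "SQS/SNS";
            case AZURE: return "Azure Service Bus or Event Hubs";
            case GCP:   return "Cloud Pub/Sub";
            default:    break;   // on-premises or cloud-agnostic: keep going
        }
        // Question 4: complex routing without consumer-side filtering
        if (complexRouting) {
            return "RabbitMQ";
        }
        // Question 5: extreme throughput
        return highThroughput ? "Kafka" : "RabbitMQ, SQS, or Redis Streams";
    }

    public static void main(String[] args) {
        System.out.println(recommend(false, true, false, Cloud.NONE, false));
        System.out.println(recommend(false, false, false, Cloud.GCP, true));
    }
}
```

Because earlier questions return first, a team on GCP that also wants complex routing is still pointed at Cloud Pub/Sub. If routing matters more than operational simplicity in your case, reorder the checks accordingly.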

Common Technology Choices by Use Case

Use Case | Recommended Technology | Reason
-----------------------------------------------------------------
Event sourcing and event replay | Kafka | Long retention, replayable log
Microservices integration (AWS shop) | SNS + SQS | Managed, reliable, fan-out
Complex routing rules | RabbitMQ | Exchange-based routing
Activity stream/audit log | Kafka | Immutable append-only log
High-volume IoT ingestion | Kafka | Extreme throughput
Background job processing | SQS Standard or RabbitMQ | Simple, reliable, competing consumers
Strictly ordered payment processing | SQS FIFO or Kafka (single partition) | Ordering guarantee
Cache invalidation broadcast | Redis Pub/Sub or Fanout Exchange | Broadcast, low latency
Real-time analytics pipeline | Kafka + Kafka Streams | Stateful stream processing
Enterprise integration (.NET) | Azure Service Bus | Native Azure, enterprise features
Low-latency in-app events | Redis Streams | In-memory speed
Data change capture (CDC) | Kafka + Debezium | Log-based CDC, replay

The Hybrid Architecture

Most production systems use multiple messaging technologies for different purposes:

Production Architecture Example:

User Service -->  [Kafka Topic: user-events]
                  |
            +-----+------+
            |            |
  Email Service     Analytics Pipeline
  (Kafka Consumer)  (Kafka Streams)

Payment Service --> [SQS FIFO: payment-jobs]
                    |
              Payment Worker
              (ordered, exactly-once)

Image Upload --> [SQS Standard: image-resize-jobs]
                 |
           +-----+-----+
           |     |     |
         Worker1 Worker2 Worker3
         (competing consumers)

All Services --> [Redis Pub/Sub: cache-invalidation]
                 |
         All instances receive and invalidate cache

Each technology is doing what it is best at:

  • Kafka: event streaming, replay, analytics
  • SQS FIFO: ordered payment processing
  • SQS Standard: distributed background jobs
  • Redis: low-latency cache invalidation broadcast

Previous: Part 2 - Patterns and Architecture
Next: Part 4 - Advanced Concepts
Index: Message Queues Demystified - Index