WebSockets Demystified - Part 6: Interview Questions

Most Frequently Asked Questions
Implementation and Code Questions
System Design Questions
Tricky and Deep Questions
Production and AWS Questions
Security Questions
Technical Architect Level Questions (2024-2026)
Common Follow-Up Questions and How to Handle Them

How to Use This Section

Questions are ordered by frequency and importance. For each question:

ANSWER - Core answer to give
DEPTH - Additional depth to show expertise
FOLLOW-UP - Common follow-up questions and how to answer them

1. Most Frequently Asked Questions

Q1. What is WebSocket and how does it differ from HTTP?

ANSWER:

WebSocket is a full-duplex, bidirectional, persistent communication protocol that operates over a single TCP connection. It starts as an HTTP request and upgrades to the WebSocket protocol using the HTTP 101 Switching Protocols response.

Key differences:

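HTTP is request-response: the client asks, the server answers, and the server cannot send anything the client did not request. The connection may be reused via keep-alive, but every exchange is still client-initiated.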
WebSocket is persistent: connection stays open; either side can send data at any time.
HTTP has large header overhead (200-800 bytes per request).
WebSocket has minimal framing overhead (2-14 bytes per frame).
HTTP is stateless; WebSocket is stateful.

DEPTH: HTTP/2 introduced server push, but it only lets the server preload resources it expects the client to request (like CSS/JS), and major browsers have since removed support for it. It cannot push arbitrary application data to page scripts. WebSocket fills this gap.

FOLLOW-UP: "Can you use WebSocket over HTTP/2?"

Not directly. The WebSocket upgrade mechanism (101 Switching Protocols) is an HTTP/1.1 concept. HTTP/2 uses a different mechanism, the extended CONNECT method defined in RFC 8441, which browsers and servers adopted only gradually and which every intermediary on the path must also support. In practice, most production WebSocket deployments still use HTTP/1.1 for the upgrade. AWS ALB terminates TLS and can forward HTTP/1.1 for WebSocket connections even when HTTP/2 is enabled for other traffic.

Q2. Explain the WebSocket handshake.

ANSWER:

The handshake is an HTTP upgrade process:

Client sends an HTTP GET with special headers:

GET /ws HTTP/1.1
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13

Server validates the request, computes the accept key:

Accept = Base64(SHA1(clientKey + "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"))

Server responds with 101 Switching Protocols:

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=

After this, the TCP connection is no longer HTTP. Both sides communicate using WebSocket frames.

DEPTH: The magic GUID 258EAFA5-E914-47DA-95CA-C5AB0DC85B11 prevents non-WebSocket servers from accidentally completing a WebSocket handshake. The Sec-WebSocket-Key prevents caching proxies from replaying responses.

FOLLOW-UP: "Why does the server compute an accept key? Why not just echo the client key?"

Echoing the key would allow an HTTP server that does not understand WebSocket to accidentally complete the handshake (it would just echo back whatever headers it received). The SHA1+GUID computation means only a server that knows the WebSocket spec and specifically implements the computation will return the correct accept key. This provides a positive confirmation that the server genuinely supports WebSocket.
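
The accept-key computation above is small enough to check by hand. The following is a minimal sketch using only the Java standard library; the class and method names (WebSocketAccept, compute) are illustrative and not part of any framework. It uses the sample key from the handshake shown earlier.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

final class WebSocketAccept {
    // Fixed GUID defined by RFC 6455 for the opening handshake.
    private static final String MAGIC_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

    /** Returns the Sec-WebSocket-Accept value for a given Sec-WebSocket-Key. */
    static String compute(String clientKey) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] digest = sha1.digest((clientKey + MAGIC_GUID).getBytes(StandardCharsets.US_ASCII));
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a mandatory algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Prints s3pPLMBiTxaQ9kYGzzhZRbK+xOo= (the sample value from RFC 6455)
        System.out.println(compute("dGhlIHNhbXBsZSBub25jZQ=="));
    }
}
```

A server that merely echoed headers could never produce this value, which is the point of the follow-up answer.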

Q3. What is STOMP and why is it used with WebSocket?

ANSWER:

STOMP (Simple Text Oriented Messaging Protocol) is a text-based messaging protocol that runs on top of WebSocket. It adds higher-level features that raw WebSocket lacks:

Destinations: Structured addressing (/topic/prices, /queue/orders).
Message routing: A broker routes messages from senders to subscribers.
Acknowledgments: Clients can confirm message receipt.
Transactions: Group multiple send/ack operations atomically.
Sub-protocol negotiation: During the handshake, client and server agree on STOMP through the Sec-WebSocket-Protocol header.

Without STOMP, you have only opaque text or binary messages. WebSocket frames them for you, but you must define your own message format, routing, and pub/sub logic. STOMP gives you this for free.

Spring uses STOMP because Spring's @MessageMapping controller model mirrors the Spring MVC model - it maps destinations to methods, handles payloads, provides dependency injection.

FOLLOW-UP: "What are other WebSocket sub-protocols besides STOMP?"

MQTT - Lightweight protocol popular in IoT. Designed for constrained devices with limited bandwidth. Many IoT platforms use WebSocket as transport with MQTT as protocol.

WAMP (Web Application Messaging Protocol) - Combines RPC and pub/sub over WebSocket.

Socket.IO protocol - Custom protocol by the Socket.IO library, not a standard.

Raw JSON - Many applications define their own simple JSON message format without a formal sub-protocol.

Q4. How do you scale WebSocket servers horizontally?

ANSWER:

This is the core challenge of WebSocket at scale. The approach:

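Sticky sessions on the load balancer: The ALB routes a client's requests to the same server instance. An established WebSocket connection already stays on one instance for its lifetime; stickiness matters for the HTTP requests around it, such as SockJS fallback transports that span several requests.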
Redis Pub/Sub for cross-instance messaging: When Server A needs to send a message to a client connected to Server B:
- Server A publishes to a Redis channel.
- Server B (subscribed to that channel) receives the message.
- Server B delivers it to its locally connected client.
Stateless application logic: Business state (rooms, messages, presence) is stored in Redis or MySQL, not in server memory. When a client reconnects to a different server after a failure, the new server can reconstruct the session state from shared storage.

DEPTH: Sticky sessions help during connection establishment (particularly with SockJS, whose fallback transports span several HTTP requests) but are not sufficient alone. If Server A dies, all its clients reconnect. They may connect to Server B. Redis ensures Server B can handle them. Sticky sessions only work while the server lives.

FOLLOW-UP: "What if the sticky session server goes down?"

When a server dies, all its connected clients receive disconnect events and begin reconnection. With exponential backoff and jitter, they reconnect to surviving servers over 5-60 seconds. Because all state (room membership, message history, presence) is in Redis and MySQL, the new server they connect to can immediately resume their session. The sticky session was for efficiency, not correctness.
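
The cross-instance relay can be illustrated without Redis. The sketch below is an in-memory model, not a production design: FakePubSub stands in for a Redis channel, WsNode for one server instance, and a list stands in for the client's socket. All names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiConsumer;

/** Demo entry point: a message sent via node A reaches a user connected to node B. */
final class RelayDemo {
    public static void main(String[] args) {
        FakePubSub bus = new FakePubSub();
        WsNode serverA = new WsNode(bus);
        WsNode serverB = new WsNode(bus);
        serverB.connect("bob");
        serverA.sendToUser("bob", "hello from A");
        System.out.println(serverB.localInboxes.get("bob"));
    }
}

/** In-memory stand-in for a Redis channel: every subscriber sees every published message. */
final class FakePubSub {
    private final List<BiConsumer<String, String>> subscribers = new CopyOnWriteArrayList<>();

    void subscribe(BiConsumer<String, String> listener) {
        subscribers.add(listener);
    }

    void publish(String userId, String payload) {
        for (BiConsumer<String, String> s : subscribers) {
            s.accept(userId, payload);
        }
    }
}

/** One WebSocket server instance; it only knows about its own local connections. */
final class WsNode {
    private final FakePubSub bus;
    // userId -> messages delivered to that user's local connection (stands in for a real socket)
    final Map<String, List<String>> localInboxes = new ConcurrentHashMap<>();

    WsNode(FakePubSub bus) {
        this.bus = bus;
        bus.subscribe(this::onBusMessage);
    }

    void connect(String userId) {
        localInboxes.put(userId, new CopyOnWriteArrayList<>());
    }

    /** Any node can send to any user: it publishes and lets the owning node deliver. */
    void sendToUser(String userId, String payload) {
        bus.publish(userId, payload);
    }

    private void onBusMessage(String userId, String payload) {
        List<String> inbox = localInboxes.get(userId);
        if (inbox != null) {
            inbox.add(payload); // deliver only if the user is connected to this node
        }
    }
}
```

Every node receives every published message and silently ignores those for users it does not host, which is why this pattern keeps working when a client reconnects to a different server.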

Q5. How do you authenticate WebSocket connections?

ANSWER:

Authentication happens at two points:

1. During the HTTP Handshake (HandshakeInterceptor):

The JWT token is passed in the query string (since browsers don't allow custom headers in WebSocket connections):

wss://api.example.com/ws?token=eyJhbGciOiJIUzI1NiJ9...

A HandshakeInterceptor.beforeHandshake() validates the token. If invalid, it sets a 401 status on the response and returns false, which rejects the handshake. If valid, it stores the userId in the session attributes.

2. During STOMP CONNECT (ChannelInterceptor):

When the client sends the STOMP CONNECT frame, a ChannelInterceptor.preSend() sets the Spring Security Principal from the session attributes. This makes @AuthenticationPrincipal work in @MessageMapping methods.

DEPTH: Cookie-based authentication is preferred in some organizations because:

The cookie is not visible in logs (unlike query param tokens).
Browsers automatically send cookies with WebSocket upgrade requests.
This works with SockJS fallbacks.
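However, cookie-based auth requires same-domain constraints and strict Origin checking on the handshake. Browsers do not apply CORS to WebSocket connections, so without an allowed-origins check any site could open an authenticated socket using the user's cookies (cross-site WebSocket hijacking).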

FOLLOW-UP: "What happens when the JWT expires during an active WebSocket session?"

Good question. The JWT is validated only at connection time in a naive implementation. The user's JWT could expire while the connection remains open, but they continue to send messages. Solutions:

Validate JWT claims on every SEND frame in the ChannelInterceptor. If token is expired, throw a SecurityException which sends an ERROR frame and closes the connection.

Implement JWT refresh: client sends a refresh request before expiry; server validates and returns a new token via a WebSocket message to /user/queue/token.refresh.

Set short connection durations matching token lifetime. After JWT expires, server closes connection gracefully. Client reconnects with a fresh token.

Q6. What is SockJS and when do you need it?

ANSWER:

SockJS is a JavaScript library that provides a WebSocket-like API but falls back to HTTP-based transports when WebSocket is not available or blocked:

WebSocket (best case)
HTTP Streaming (chunked transfer encoding, one-way from server)
XHR Polling (long polling via XMLHttpRequest)
JSONP Polling (cross-domain polling via script tags)

SockJS is needed when:

Corporate firewalls or proxies strip WebSocket upgrade headers.
Older browsers don't support WebSocket (mostly historical now, IE11 era).
Reverse proxies (old Nginx, HAProxy configs) don't forward upgrade headers.
The application is deployed in environments where network policies restrict WebSocket.

DEPTH: In modern environments (2024+), native WebSocket is supported everywhere. SockJS is primarily valuable as insurance against enterprise network restrictions. The overhead of SockJS negotiation (an extra /info request to determine transport capability) is acceptable.

Q7. What is the difference between /topic and /queue in STOMP?

ANSWER:

These represent different messaging semantics:

/topic - Publish/Subscribe:

Multiple subscribers can subscribe to the same topic.
Every message sent to the topic is delivered to ALL current subscribers.
If no one is subscribed when a message is sent, the message is dropped.
Used for: broadcast messages, room chat, live price feeds, system announcements.

/queue - Point-to-Point:

Typically one consumer per queue.
In Spring's usage with STOMP, /queue is used with /user prefix for user-specific delivery.
messagingTemplate.convertAndSendToUser(userId, "/queue/notifications", msg) routes to that user's personal queue.
Used for: private messages, personal notifications, per-user updates.

DEPTH: The distinction between topic and queue is more meaningful with a full external broker like RabbitMQ (which enforces these semantics at the broker level). With Spring's simple in-memory broker, /topic and /queue are primarily naming conventions. The user destination prefix (/user/queue/...) is where Spring adds actual routing logic.

Q8. How do you handle disconnections and reconnections?

ANSWER:

Server side:

SessionDisconnectEvent fires for any disconnection (graceful or abrupt).
Clean up presence, subscriptions, in-flight state.
Rely on heartbeat to detect dead connections that don't send close frames.

Client side:

Exponential backoff with jitter:
  attempt 1: wait 1 second + random(0-1s)
  attempt 2: wait 2 seconds + random(0-1s)
  attempt 3: wait 4 seconds + random(0-1s)
  ...
  max: 60 seconds + random
  max retries: 10
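
The schedule above can be written as a single function. This is a sketch of that schedule only (1 second base, doubling, 60 second cap, up to 1 second of jitter, 10 retries); the class name and constants are illustrative, and a real client would more likely be written in JavaScript.

```java
import java.util.concurrent.ThreadLocalRandom;

final class ReconnectBackoff {
    static final long BASE_MS = 1_000;
    static final long MAX_MS = 60_000;
    static final long JITTER_MS = 1_000;
    static final int MAX_RETRIES = 10;

    /** Delay before the given 1-based attempt: min(base * 2^(attempt-1), max) plus jitter in [0, 1s). */
    static long delayMillis(int attempt) {
        if (attempt < 1 || attempt > MAX_RETRIES) {
            throw new IllegalArgumentException("attempt must be in 1.." + MAX_RETRIES);
        }
        long exponential = Math.min(BASE_MS << (attempt - 1), MAX_MS);
        long jitter = ThreadLocalRandom.current().nextLong(JITTER_MS);
        return exponential + jitter;
    }

    public static void main(String[] args) {
        for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {
            System.out.println("attempt " + attempt + ": wait " + delayMillis(attempt) + " ms");
        }
    }
}
```

The jitter is what prevents thousands of clients from retrying in the same instant after a server restart.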

Message replay on reconnect:

Client stores last received message timestamp/sequence.
On reconnect, client sends "reconnected with last_seen_sequence=X".
Server replays all messages after sequence X (stored in Redis sorted set as replay buffer).

FOLLOW-UP: "How long should you keep the replay buffer?"

This depends on business requirements and cost constraints. Typically 24-48 hours for chat messages. After that, the user is expected to fetch history via REST API rather than a WebSocket replay. Redis sorted sets work well if the score is the message timestamp: a Redis TTL applies to the whole key rather than to individual members, so a periodic ZREMRANGEBYSCORE trims entries older than the retention period, with a key-level EXPIRE as a backstop for idle rooms. Important messages (notifications) are persisted in MySQL and always available.
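
To make the replay idea concrete, here is an in-memory sketch of a per-room buffer keyed by sequence number. It only models the behavior of the Redis sorted set described above; ReplayBuffer and its methods are hypothetical names, and timestamps are passed in explicitly so the logic is easy to test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

/** In-memory stand-in for a per-room sorted set used as a replay buffer. */
final class ReplayBuffer {
    private static final class Entry {
        final String payload;
        final long storedAtMillis;

        Entry(String payload, long storedAtMillis) {
            this.payload = payload;
            this.storedAtMillis = storedAtMillis;
        }
    }

    private final ConcurrentSkipListMap<Long, Entry> bySequence = new ConcurrentSkipListMap<>();
    private final long retentionMillis;

    ReplayBuffer(long retentionMillis) {
        this.retentionMillis = retentionMillis;
    }

    void append(long sequence, String payload, long nowMillis) {
        bySequence.put(sequence, new Entry(payload, nowMillis));
    }

    /** Messages with a sequence strictly greater than lastSeen, in sequence order. */
    List<String> replayAfter(long lastSeen) {
        List<String> out = new ArrayList<>();
        for (Entry e : bySequence.tailMap(lastSeen, false).values()) {
            out.add(e.payload);
        }
        return out;
    }

    /** Explicit trim of entries older than the retention period. */
    void evictExpired(long nowMillis) {
        bySequence.values().removeIf(e -> nowMillis - e.storedAtMillis > retentionMillis);
    }

    public static void main(String[] args) {
        ReplayBuffer room = new ReplayBuffer(48L * 60 * 60 * 1000); // 48 hours
        long now = System.currentTimeMillis();
        room.append(1, "hello", now);
        room.append(2, "are you there?", now);
        System.out.println(room.replayAfter(1));
    }
}
```

On reconnect the client sends its last seen sequence and the server answers with replayAfter(lastSeen); anything older than the retention window is served by the REST history endpoint instead.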

2. Implementation and Code Questions

Q9. Walk me through a Spring Boot WebSocket configuration.

ANSWER (code walkthrough):

Three key pieces:

1. @EnableWebSocketMessageBroker on @Configuration class
2. WebSocketMessageBrokerConfigurer implementation with two methods:
   - registerStompEndpoints(): where clients connect
   - configureMessageBroker(): how messages are routed

3. @Controller with @MessageMapping methods to handle incoming messages

Show the key choices:

setApplicationDestinationPrefixes("/app") - messages to /app/** go to @MessageMapping handlers.
enableSimpleBroker("/topic", "/queue") - in-memory broker for development.
setUserDestinationPrefix("/user") - prefix for user-specific destinations.
.withSockJS() - adds SockJS fallback.

FOLLOW-UP: "When would you replace enableSimpleBroker with a full external broker?"

When deploying multiple instances. The simple broker is in-memory and server-local. Messages sent on Server A's simple broker never reach clients on Server B. For multi-instance, you use:

enableStompBrokerRelay() pointing to RabbitMQ or ActiveMQ - these are real STOMP brokers that all server instances connect to.

Or a custom Redis Pub/Sub relay (lighter weight, no STOMP broker needed).
The choice depends on whether you need message durability (RabbitMQ) or just delivery (Redis).

Q10. How does @SubscribeMapping differ from @MessageMapping?

ANSWER:

@MessageMapping("/chat.send"):
  Triggered when a client SENDS a message to /app/chat.send.
  The return value (if any) is sent to the default response destination
  (/topic/chat.send, the inbound destination with /topic in place of /app),
  or to the destination named in @SendTo.
  Primarily for processing client-to-server commands.

@SubscribeMapping("/chat.history/{roomId}"):
  Triggered when a client SUBSCRIBES to /app/chat.history/{roomId}.
  The return value is sent DIRECTLY and ONLY to the subscribing client.
  No broadcast.
  No persistence to broker.
  Ideal for: sending initial state to a newly subscribing client.

Real-world example: When a user opens a chat room, they subscribe to /app/chat.history/room1. The @SubscribeMapping immediately returns the last 50 messages, delivered only to that user. Then they also subscribe to /topic/room.room1 for new messages.

Q11. How do you send a message to a specific user?

ANSWER:

// Method 1: SimpMessagingTemplate
messagingTemplate.convertAndSendToUser(
    "userId123",           // The user's Principal name
    "/queue/private",      // Queue destination (without /user prefix)
    messagePayload         // The message object
);
// Internally routes to: /user/userId123/queue/private
 
// Method 2: From a @MessageMapping controller return value
@MessageMapping("/chat.send")
@SendToUser("/queue/response")   // Sends response to sender only
public ResponseDto handleMessage(@Payload RequestDto request, Principal principal) {
    return processRequest(request, principal.getName());
}

The Principal.getName() must match what CustomHandshakeHandler.determineUser() returns. Spring maintains an internal session registry mapping userId to active WebSocket sessions.

FOLLOW-UP: "What if the user has multiple browser tabs open (multiple sessions)?"

By default, convertAndSendToUser delivers to ALL active sessions for that user. This is the correct behavior for notifications (all tabs should see it). If you want to target one specific session (e.g., only the tab the user is actively using), you need to use the session-specific destination format and know the sessionId. This is rarely needed in practice.

3. System Design Questions

Q12. Design a real-time chat system for 1 million concurrent users.

ANSWER (structured approach):

Requirements clarification:

Message delivery: at-most-once vs at-least-once?
Message ordering: per room or globally?
Message persistence: yes (chat history)
Presence: online/offline status
Message types: text, media

High-level design:

[Mobile/Web Clients]
         |
    [AWS CloudFront + WAF]
         |
    [AWS ALB - sticky sessions]
         |
[WebSocket Cluster - 100 ECS tasks, 10,000 connections each]
         |
    [Redis Cluster - Pub/Sub]
         |
    [Kafka - durable message stream]
         |
    [Message Persistence Service]
         |
    [Aurora MySQL - messages, users, rooms]

Key decisions:

Separate WebSocket servers from message processing. WS servers only relay messages. Heavy processing (moderation, notifications) is done by separate services consuming from Kafka.
Redis for pub/sub within a data center. Each WS server subscribes to room channels on Redis.
Kafka for durable message ordering. All messages go to Kafka, partitioned by room ID. Consumer service persists to MySQL. This guarantees ordering within each room's partition and allows replay.
Presence via Redis with TTL. Heartbeat refreshes TTL. Expired = offline.
Message ID generation. Use Snowflake IDs (k-sortable, distributed). This gives natural ordering without DB auto-increment.
Fan-out on read vs fan-out on write. For rooms with <1000 members: fan-out on write (push to all members). For large rooms (>1000): fan-out on read (broadcast to topic, clients pull).
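
For the message ID decision, a Snowflake-style generator is short enough to sketch. The bit layout below (41-bit timestamp, 10-bit worker, 12-bit sequence) follows the commonly described scheme; the custom epoch and class name are assumptions chosen for illustration, and this version borrows the next millisecond instead of sleeping when the sequence overflows.

```java
final class SnowflakeIds {
    private static final long EPOCH_MILLIS = 1_700_000_000_000L; // hypothetical custom epoch
    private static final int WORKER_BITS = 10;
    private static final int SEQUENCE_BITS = 12;
    private static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;

    private final long workerId;
    private long lastMillis = -1;
    private long sequence = 0;

    SnowflakeIds(long workerId) {
        if (workerId < 0 || workerId >= (1L << WORKER_BITS)) {
            throw new IllegalArgumentException("workerId must fit in " + WORKER_BITS + " bits");
        }
        this.workerId = workerId;
    }

    /** Returns an ID that is strictly greater than every ID this instance returned before. */
    synchronized long next() {
        // Never let the logical timestamp move backwards, even if the wall clock does.
        long now = Math.max(System.currentTimeMillis(), lastMillis);
        if (now == lastMillis) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                now = lastMillis + 1; // sequence exhausted: borrow the next millisecond
            }
        } else {
            sequence = 0;
        }
        lastMillis = now;
        return ((now - EPOCH_MILLIS) << (WORKER_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }

    public static void main(String[] args) {
        SnowflakeIds ids = new SnowflakeIds(1);
        for (int i = 0; i < 3; i++) {
            System.out.println(ids.next());
        }
    }
}
```

IDs from one worker are strictly increasing; IDs from different workers are only roughly time-ordered, which is what "k-sortable" means in practice.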

Scale estimation:

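1M users at peak; assume 10% are connected at a typical moment = 100,000 WebSocket connections
At 10,000 connections per ECS task: 10 ECS tasks normally, auto-scaling to 100 tasks for the 1M-connection peak
Inbound message rate: 1 msg per connected user per minute = 100,000 / 60 ≈ 1,700 messages/second at typical load
Redis pub/sub handles this inbound rate comfortably; the outbound rate is higher because each message fans out to every member of its room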

Q13. How would you implement read receipts in a distributed WebSocket system?

ANSWER:

Data model:

CREATE TABLE message_receipts (
    message_id VARCHAR(36),
    user_id VARCHAR(36),
    status ENUM('DELIVERED', 'READ'),
    timestamp DATETIME(6),
    PRIMARY KEY (message_id, user_id)
);

Flow:

Server delivers message to client via WebSocket.
Client automatically sends ACK: SEND /app/message.ack/MSG_ID.
Server marks as DELIVERED in DB.
When user views the message (focus + scroll into view), client sends: SEND /app/message.read/MSG_ID.
Server marks as READ, notifies the original sender.
Sender receives: messagingTemplate.convertAndSendToUser(senderId, "/queue/receipts", receipt).

Distributed consideration: The sender might be on a different server. Use Redis pub/sub to relay the read receipt notification: publish to ws:user.{senderId}, all servers check if they have that user connected.

Optimization: Batch read receipts. Don't send one WS message per read. Accumulate for 500ms, then send batch: "Messages M1, M2, M3 read by user U at T".
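
The batching optimization might look like the sketch below. It only shows the accumulate-and-flush part; wiring flush() to a 500 ms timer and sending the result over WebSocket are left out, and ReadReceiptBatcher is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ReadReceiptBatcher {
    // readerId -> message IDs read since the last flush, in arrival order
    private final Map<String, List<String>> pending = new LinkedHashMap<>();

    synchronized void recordRead(String readerId, String messageId) {
        pending.computeIfAbsent(readerId, k -> new ArrayList<>()).add(messageId);
    }

    /** Returns and clears everything accumulated so far; intended to be called from a timer. */
    synchronized Map<String, List<String>> flush() {
        Map<String, List<String>> batch = new LinkedHashMap<>(pending);
        pending.clear();
        return batch;
    }

    public static void main(String[] args) {
        ReadReceiptBatcher batcher = new ReadReceiptBatcher();
        batcher.recordRead("U", "M1");
        batcher.recordRead("U", "M2");
        batcher.recordRead("U", "M3");
        // One outbound notification instead of three
        System.out.println(batcher.flush());
    }
}
```

The trade-off is up to one flush interval of extra latency on the "read" indicator in exchange for far fewer outbound frames.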

4. Tricky and Deep Questions

Q14. Why must WebSocket frames from client to server be masked? What attack does this prevent?

ANSWER:

This is a security requirement from RFC 6455, Section 10.3, specifically to prevent cache poisoning attacks against HTTP intermediaries (proxies, CDNs).

The attack scenario:

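An attacker's script opens a WebSocket to a server the attacker controls and sends bytes that a WebSocket-unaware intermediary could misread as an HTTP request for a target URL. The attacker's server then replies with bytes that look like an HTTP response: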
```
HTTP/1.1 200 OK
Content-Type: text/html
[evil content]
```
A transparent caching proxy sitting between client and server might interpret this WebSocket data as an HTTP response.
The proxy caches the "response" for the target URL.
Future legitimate HTTP requests for that URL receive the attacker's poisoned content.

The fix (masking):
By requiring client frames to be XOR-masked with a random 32-bit key, the bytes on the wire become unpredictable to the script that sent them. Even if an attacker crafts a message to look like HTTP, the masking key (chosen by the browser, fresh for each frame) means the attacker cannot control what an intermediary actually sees.

Why server-to-client frames are NOT masked:
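The attack depends on untrusted script choosing the exact bytes a client puts on the wire, which the intermediary then misreads as an HTTP request. Masking client frames removes that control, so the poisoning cannot start. Nothing comparable applies in the other direction, and RFC 6455 requires that servers do not mask the frames they send.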

FOLLOW-UP: "Does masking provide confidentiality?"

No. Masking is NOT encryption. The masking key is sent in plaintext in the frame header. Anyone who can read the frame can unmask it. For confidentiality, you need WSS (WebSocket Secure = WebSocket over TLS). Masking's purpose is purely anti-cache-poisoning.
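
A few lines of code make both points visible: masking is a plain XOR with a 4-byte key, and applying it twice restores the original bytes. The sketch below is illustrative only (FrameMasking is not a library class) and operates on the payload, not on a full frame.

```java
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;

final class FrameMasking {
    /** XORs each payload byte with key[i % 4]; applying it twice with the same key restores the input. */
    static byte[] applyMask(byte[] payload, byte[] key) {
        if (key.length != 4) {
            throw new IllegalArgumentException("masking key must be 4 bytes");
        }
        byte[] out = new byte[payload.length];
        for (int i = 0; i < payload.length; i++) {
            out[i] = (byte) (payload[i] ^ key[i % 4]);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] key = new byte[4];
        new SecureRandom().nextBytes(key); // a fresh key per frame, chosen by the client
        byte[] payload = "GET /index.html HTTP/1.1".getBytes(StandardCharsets.US_ASCII);
        byte[] onTheWire = applyMask(payload, key);   // what an intermediary sees
        byte[] unmasked = applyMask(onTheWire, key);  // what the server recovers
        System.out.println(new String(unmasked, StandardCharsets.US_ASCII));
    }
}
```

Because the key travels in the frame header, anyone who can read the frame can run the same function, which is why masking offers no confidentiality.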

Q15. What is the "WebSocket C10K problem" and how do you solve it?

ANSWER:

The C10K problem (originally from 1999 by Dan Kegel) asks: how do you handle 10,000 concurrent connections on a single server? For WebSocket, it is even more challenging because connections are persistent.

The problem:

Thread-per-connection model: 10,000 threads × 1MB stack = 10 GB RAM just for thread stacks. Impractical.
Blocking I/O: Each thread blocks waiting for data. 10,000 mostly idle threads consume no CPU while blocked, but they tie up memory and add scheduler and context-switch overhead.

The solution: Non-blocking I/O + Event Loop

Tomcat (Spring Boot's default) has offered an NIO (non-blocking I/O) connector since Tomcat 6, and NIO has been the default connector since Tomcat 8. With NIO:

A small thread pool handles many connections via Java's Selector.
When a WebSocket message arrives, a selector thread picks it up and dispatches to a worker thread pool.
WebSocket connections are maintained in memory structures, not threads.

Practical Spring Boot tuning for C10K:

server:
  tomcat:
    max-connections: 10000   # Tomcat NIO can handle this with few threads
    threads:
      max: 200               # 200 threads serve 10,000 connections (NIO multiplexing)

Memory per connection:

Each WebSocket session: ~50-100 KB (buffers, session object, subscription list)
10,000 connections: 500 MB - 1 GB
Plus JVM heap for application objects
Practical limit on a 4 GB container: ~30,000 connections safely

Q16. Explain the difference between WebSocket heartbeat and TCP keepalive. Can you rely on TCP keepalive alone?

ANSWER:

TCP Keepalive:

Operates at the TCP layer (OS level)
Periodically sends ACK packets to verify the underlying TCP connection is alive
Default on Linux: the first probe is sent after 2 hours of idle time (tcp_keepalive_time = 7200 seconds), configurable
Detects: network path is still valid at TCP level
Does NOT detect: application-level issues (e.g., application deadlocked, not processing messages)

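Application-Level Heartbeat (WebSocket Ping/Pong or STOMP heart-beat):

Operates above TCP, at the WebSocket or STOMP layer
WebSocket level: one side sends a Ping frame, the peer must respond with a Pong
STOMP level: client and broker exchange heart-beat frames at intervals negotiated in CONNECT/CONNECTED
Configured on Spring's simple broker: setHeartbeatValue(new long[]{25000, 25000}) (this sets STOMP heart-beats and needs a TaskScheduler)
Detects: application is alive and responsive (processing frames)
Configurable interval (much shorter than TCP keepalive)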

Can you rely on TCP keepalive alone?

No, for several reasons:

Default TCP keepalive (2 hours) is far too long. Zombie sessions accumulate for hours.
Some firewalls and load balancers (including AWS ALB with its 60-second idle timeout) drop connections before TCP keepalive triggers.
TCP keepalive tells you the network path is alive, but NOT whether the application is processing messages. A WebSocket server could have a deadlock or memory issue - TCP keepalive would still succeed.
The ALB idle timer is reset by application data. TCP keepalive probes carry no payload, so do not rely on them to keep a connection open through the load balancer.

Best practice: Use WebSocket heartbeat at 25-30 second intervals. This is application-level, reliable, and detects both network failures and application issues.
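
On the server, heartbeat-based cleanup reduces to tracking the last time each session was heard from. The sketch below assumes a periodic sweep and explicit timestamps; HeartbeatMonitor and its methods are hypothetical names rather than Spring APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class HeartbeatMonitor {
    private final Map<String, Long> lastSeenMillis = new ConcurrentHashMap<>();
    private final long timeoutMillis;

    HeartbeatMonitor(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Call whenever a heartbeat or any other frame arrives from the session. */
    void touch(String sessionId, long nowMillis) {
        lastSeenMillis.put(sessionId, nowMillis);
    }

    /** Removes and returns sessions that have been silent for longer than the timeout. */
    List<String> reapDead(long nowMillis) {
        List<String> dead = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastSeenMillis.entrySet()) {
            if (nowMillis - e.getValue() > timeoutMillis) {
                dead.add(e.getKey());
            }
        }
        for (String sessionId : dead) {
            lastSeenMillis.remove(sessionId);
        }
        return dead;
    }

    public static void main(String[] args) {
        HeartbeatMonitor monitor = new HeartbeatMonitor(40_000);
        monitor.touch("session-1", 0);
        monitor.touch("session-2", 30_000);
        System.out.println(monitor.reapDead(45_000)); // only session-1 has been silent too long
    }
}
```

A timeout somewhat longer than the heartbeat interval (for example 40 seconds for a 25 second heartbeat) tolerates one delayed beat without keeping zombie sessions around.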

Q17. What happens if a WebSocket server crashes mid-broadcast? Is the message partially delivered?

ANSWER:

Yes, partial delivery is possible and is a real production concern.

Scenario:

Server has 1,000 clients subscribed to /topic/room.123.
Server starts broadcasting a message.
Server delivers to 500 clients successfully.
Server crashes (OOM, power failure).
Remaining 500 clients never receive the message.

This is a known limitation of at-most-once delivery.

Solutions:

At-least-once delivery with acknowledgment:

1. Server generates message ID: MSG-456
2. Server stores MSG-456 in Redis sorted set "pending:room.123" with timestamp score
3. Server broadcasts to all clients
4. Each client receives message, sends ACK to /app/message.ack/MSG-456
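5. Server records each client's ACK and removes MSG-456 from the pending set once every expected recipient has acknowledged (or a retention limit passes)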
6. On reconnect, client says "last seen: MSG-455"
7. Server replays MSG-456 from pending set
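
The steps above can be sketched with an in-memory tracker. This version assumes the server tracks acknowledgments per recipient and uses numeric sequence numbers in place of IDs like MSG-456; in the design described here the same state would live in Redis so it survives a server crash. PendingDeliveries is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class PendingDeliveries {
    // sequence -> payload, ordered so that replay comes out in send order
    private final TreeMap<Long, String> messages = new TreeMap<>();
    // sequence -> recipients that have not acknowledged yet
    private final Map<Long, Set<String>> awaitingAck = new HashMap<>();

    synchronized void recordBroadcast(long sequence, String payload, Set<String> recipients) {
        messages.put(sequence, payload);
        awaitingAck.put(sequence, new HashSet<>(recipients));
    }

    /** Marks one recipient's ACK; the message is dropped once nobody is left waiting. */
    synchronized void ack(long sequence, String clientId) {
        Set<String> waiting = awaitingAck.get(sequence);
        if (waiting == null) {
            return; // unknown or already fully acknowledged
        }
        waiting.remove(clientId);
        if (waiting.isEmpty()) {
            awaitingAck.remove(sequence);
            messages.remove(sequence);
        }
    }

    /** Messages after lastSeen that this client has not acknowledged, in order. */
    synchronized List<String> replayFor(String clientId, long lastSeen) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Long, String> e : messages.tailMap(lastSeen, false).entrySet()) {
            if (awaitingAck.get(e.getKey()).contains(clientId)) {
                out.add(e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        PendingDeliveries pending = new PendingDeliveries();
        Set<String> room = new HashSet<>(List.of("alice", "bob"));
        pending.recordBroadcast(456, "price update", room);
        pending.ack(456, "alice");
        System.out.println(pending.replayFor("bob", 455)); // bob reconnects with "last seen: 455"
    }
}
```

Clients must tolerate duplicates under this scheme, since a message can be replayed after it was delivered but before its ACK arrived.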

For truly critical messages (financial, medical):

Use Kafka as the message backbone. WS is delivery mechanism only.
Kafka guarantees the message is persisted.
On reconnect, fetch missed messages from Kafka consumer offset.
WS delivery is best-effort. Kafka is the source of truth.

Practical reality: Most chat applications accept at-most-once delivery (as WhatsApp's web client does). Users accept that messages sent during a network hiccup may need to be resent. At-least-once adds complexity. Choose based on your business requirements.

Q18. How does Spring's UserDestinationMessageHandler work internally?

ANSWER:

When you call messagingTemplate.convertAndSendToUser("user123", "/queue/notify", msg):

Spring prepends the user destination prefix: /user/user123/queue/notify
The UserDestinationMessageHandler intercepts this message.
It looks up all WebSocket sessions for "user123" in the SimpUserRegistry.
SimpUserRegistry maintains a mapping: {userId -> Set<SimpSession>}.
For each session, it transforms the destination to a session-specific address:
/queue/notify-user{sessionId}
Delivers the message to each session's subscription.

The internal session registry:

// SimpUserRegistry maps:
// "user123" --> [session "abc123", session "def456"]
// (multiple sessions if user has multiple tabs/devices)
 
// For each session, UserDestinationMessageHandler creates:
// /queue/notify-userabc123
// /queue/notify-userdef456
// and sends to each

In a clustered environment: The SimpUserRegistry is per-server. Server A doesn't know about sessions on Server B. This is why you need Redis pub/sub relay. When you publish to Redis with userId, every server checks its local SimpUserRegistry and delivers if it has a session for that user.

5. Production and AWS Questions

Q19. How do you configure AWS ALB for WebSocket? What are the critical settings?

ANSWER:

Critical settings:

Idle timeout: 3600 seconds (not the default 60 seconds)
- Default 60 seconds kills WebSocket connections after 60 seconds of no TCP data.
- Heartbeats count as data and reset the timer, but set a generous timeout anyway as a safety margin.
- AWS Console: EC2 > Load Balancers > Attributes > Idle timeout.
Stickiness (session affinity): enabled
- Type: lb_cookie
- Duration: 86400 (24 hours)
- Ensures WebSocket connections consistently route to the same ECS task.
Health check path: /actuator/health
- HTTP 200 response required.
- Interval: 15 seconds.
- Unhealthy threshold: 3 (allows brief hiccups).
HTTPS listener on 443
- ALB terminates TLS. Backend is plain HTTP/8080.
- WSS from client connects to ALB.
- ALB decrypts and forwards as WS to ECS container.
Security group: allow 443 inbound only
- ECS tasks' security group allows only traffic from ALB security group.

FOLLOW-UP: "Why doesn't ALB support WebSocket natively without sticky sessions?"

ALB does support WebSocket natively (it passes through the upgrade headers). Sticky sessions are not needed for an established connection: once the handshake completes, every frame on that TCP connection goes to the same backend, so the connection is already "stuck" to a server. Stickiness matters for the HTTP requests around the connection, such as SockJS fallback transports and reconnection attempts, where you want the client to land on the server that holds its session context. Without Redis relay, sticky sessions are essential. With Redis relay, they're still useful for efficiency but not for correctness.

Q20. How do you perform zero-downtime deployments with WebSocket servers?

ANSWER:

Standard blue/green deployment drops all active WebSocket connections. To achieve zero downtime:

Strategy: Preemptive drain

Signal clients before deregistration:
Send a shutdown notice to all connected clients:

messagingTemplate.convertAndSend("/topic/server.events",
    new ShutdownNotice("reconnect", 15));
// Clients receive this and prepare to reconnect in 15 seconds

Deregister from load balancer gradually:
Don't remove all tasks at once. Remove one at a time, wait for clients to reconnect.
ECS rolling update with minimumHealthyPercent=100, maximumPercent=200 ensures old tasks stay until new ones are healthy.
ECS task stop timeout:
Set stopTimeout: 60 in task definition. ECS waits 60 seconds for graceful shutdown before SIGKILL.

Spring graceful shutdown:

server:
  shutdown: graceful
spring:
  lifecycle:
    timeout-per-shutdown-phase: 30s

Client reconnection with jitter:
Clients reconnect over a spread of 1-60 seconds, not all at once.

Result: Old connections drain slowly. New connections go to new tasks. No visible disruption for most users.

6. Security Questions

Q21. How do you prevent WebSocket-based DDoS?

ANSWER:

Multiple layers of defense:

1. AWS WAF rules:

Rate limit WebSocket connection establishment per IP.
Block connections from known malicious IPs.
Geographic restrictions if applicable.

2. Connection limits per user:

// In ChannelInterceptor, on CONNECT:
// Check how many sessions this userId already has
SimpUser user = userRegistry.getUser(userId);   // null if the user has no active sessions
int activeSessions = (user == null) ? 0 : user.getSessions().size();
if (activeSessions >= MAX_SESSIONS_PER_USER) {
    throw new MaxSessionsExceededException();
}

3. Rate limiting per connection:
Token bucket limiter: 30 messages per 10 seconds per user.

4. Message size limits:

registration.setMessageSizeLimit(64 * 1024); // 64 KB max

5. Authentication requirement:
No anonymous WebSocket connections accepted. JWT required.

6. Heartbeat timeouts:
Dead connections cleaned up quickly (within 40 seconds via heartbeat).

7. CloudWatch alarm on abnormal connection counts:
Auto-trigger WAF rules or notify on-call when connections spike abnormally.
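
The token bucket limiter from layer 3 (30 messages per 10 seconds) can be sketched as follows. This is one possible implementation with a continuously refilling bucket; the class name is illustrative, and the clock is passed in so the behavior is deterministic. In practice you would keep one bucket per user or per session.

```java
final class TokenBucket {
    private final double capacity;
    private final double refillPerMilli;
    private double tokens;
    private long lastRefillMillis;

    /** capacity tokens are available up front and refill evenly over windowMillis. */
    TokenBucket(int capacity, long windowMillis, long nowMillis) {
        this.capacity = capacity;
        this.refillPerMilli = (double) capacity / windowMillis;
        this.tokens = capacity;
        this.lastRefillMillis = nowMillis;
    }

    /** Returns true and spends one token if the message is allowed, false if it should be rejected. */
    synchronized boolean tryConsume(long nowMillis) {
        if (nowMillis > lastRefillMillis) {
            long elapsed = nowMillis - lastRefillMillis;
            tokens = Math.min(capacity, tokens + elapsed * refillPerMilli);
            lastRefillMillis = nowMillis;
        }
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        TokenBucket perUser = new TokenBucket(30, 10_000, now);
        int allowed = 0;
        for (int i = 0; i < 50; i++) {
            if (perUser.tryConsume(now)) {
                allowed++;
            }
        }
        System.out.println(allowed + " of 50 burst messages allowed");
    }
}
```

A bucket allows short bursts up to its capacity while still enforcing the average rate, which suits chat traffic better than a fixed per-second cap.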

Q22. Is it safe to pass a JWT in the WebSocket URL (query parameter)?

ANSWER:

It is a trade-off with valid concerns on both sides.

Risks:

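JWT appears in server access logs, ALB access logs, and the logs of any proxy that records request URLs.
Referer leakage is a risk for tokens in page URLs; it applies here only if the same token also appears in the page's own URL.
Shoulder surfing is a minor risk: the WebSocket URL is not shown in the address bar, only in developer tools.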

Why it is often acceptable:

WebSocket URL is typically not typed by humans - it's generated by code.
Browser history rarely shows WebSocket URLs.
ALB access logs should be in a secured S3 bucket with restricted access.
The JWT has a short expiry (15-30 minutes) in well-designed systems.
The browser WebSocket API cannot set custom headers such as Authorization, so a header-based bearer token is not an option.

Mitigation:

Use short-lived tokens (15 minutes max) specifically for WebSocket connection establishment.
Rotate the WS token independently of the API JWT.
Disable access log storage of full URLs, or redact the token from logs.

Better alternatives (when possible):

HttpOnly, Secure cookie containing the session token. Browser sends cookie automatically with WebSocket upgrade. Works if WebSocket server is same-domain as login page.
Two-step: client fetches a one-time WebSocket ticket via authenticated REST API, uses that ticket in the WS URL. The ticket is single-use (invalidated after first use) so even if logged, it cannot be replayed.
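
The two-step ticket flow can be sketched with an in-memory store. This is an assumption-laden illustration: WsTicketStore is a hypothetical class, and in a multi-instance deployment the tickets would live in a shared store such as Redis rather than a local map.

```java
import java.security.SecureRandom;
import java.util.Base64;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

final class WsTicketStore {
    private static final class Ticket {
        final String userId;
        final long expiresAtMillis;

        Ticket(String userId, long expiresAtMillis) {
            this.userId = userId;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Ticket> tickets = new ConcurrentHashMap<>();
    private final SecureRandom random = new SecureRandom();
    private final long ttlMillis;

    WsTicketStore(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Called from the authenticated REST endpoint; returns an opaque, unguessable ticket. */
    String issue(String userId, long nowMillis) {
        byte[] raw = new byte[32];
        random.nextBytes(raw);
        String ticketId = Base64.getUrlEncoder().withoutPadding().encodeToString(raw);
        tickets.put(ticketId, new Ticket(userId, nowMillis + ttlMillis));
        return ticketId;
    }

    /** Called during the WebSocket handshake; a ticket can be redeemed at most once. */
    Optional<String> redeem(String ticketId, long nowMillis) {
        Ticket ticket = tickets.remove(ticketId); // atomic remove makes the ticket single-use
        if (ticket == null || nowMillis > ticket.expiresAtMillis) {
            return Optional.empty();
        }
        return Optional.of(ticket.userId);
    }

    public static void main(String[] args) {
        WsTicketStore store = new WsTicketStore(30_000);
        long now = System.currentTimeMillis();
        String ticket = store.issue("user123", now);
        System.out.println(store.redeem(ticket, now).isPresent()); // first use succeeds
        System.out.println(store.redeem(ticket, now).isPresent()); // replay is rejected
    }
}
```

Because the ticket carries no claims and dies on first use, a copy found later in an access log is worthless.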

7. Technical Architect Level Questions (2024-2026)

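Q23. When would you choose AWS API Gateway WebSocket APIs over self-managed Spring Boot WebSocket servers?

ANSWER (architect perspective):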

This is a build vs buy decision with these factors:

Choose API Gateway WebSocket when:

Scale target > 500,000 concurrent connections (hard to manage ECS fleet at this scale).
Team has limited DevOps expertise for ECS + Redis + ALB management.
Architecture is already serverless (Lambda functions).
Variable, unpredictable traffic patterns (usage-based billing per message and per connection-minute is cost-efficient).
Compliance requires fully managed infrastructure with AWS SLAs.
Time-to-market is critical and operational complexity must be minimized.

Choose self-managed Spring Boot when:

Team already has Spring Boot expertise (lower learning curve).
Existing Spring microservices ecosystem (easier integration, shared libraries).
STOMP protocol features needed (subscription management, user destinations).
Cost optimization at medium scale: 10,000-100,000 connections is cheaper on ECS than API Gateway billing.
Need for SockJS fallback (API Gateway only supports pure WebSocket).
Message broadcast patterns are more natural with STOMP topics.

API Gateway limitations to highlight:

Fan-out requires storing all connectionIds and iterating over them to send to each.
No native pub/sub or topic model.
Cold starts with Lambda can add latency to first message after inactivity.
No STOMP support.

My recommendation in practice:
Up to 100,000 concurrent connections: Spring Boot on ECS. Well-understood, STOMP-native, cost-effective. Beyond that: API Gateway or custom Go/Rust-based WS servers.

Q24. How would you design a WebSocket-based system that guarantees message ordering even across multiple server instances?

ANSWER:

This is a distributed systems ordering problem. Approaches ranked by complexity:

Option 1: Single-writer per partition (Kafka-based)

1. Each chat room maps to exactly one Kafka partition (many rooms can share a partition).
2. All messages for a room go to the same partition (room_id as partition key).
3. Kafka maintains total order within a partition.
4. WS delivery service consumes from Kafka in order.
5. Delivers to room subscribers in guaranteed order.

Tradeoff: Latency increases by ~10-50ms (Kafka batch flushing).
Best for: Large systems where ordering guarantee is worth the latency.
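
The key-to-partition idea in Option 1 can be sketched without a broker. This is an illustration only: `RoomPartitioner` and its in-memory logs are hypothetical names, and it hashes with `String.hashCode()`, whereas Kafka's default partitioner hashes the serialized key with murmur2. The point it shows is that a stable hash of room_id always lands on the same partition, so per-room order follows from per-partition order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a keyed topic: one append-only log per partition. */
final class RoomPartitioner {
    private final int partitionCount;
    private final Map<Integer, List<String>> logs = new HashMap<>();

    RoomPartitioner(int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        this.partitionCount = partitionCount;
    }

    /** The same roomId always yields the same partition index. */
    int partitionFor(String roomId) {
        return Math.floorMod(roomId.hashCode(), partitionCount);
    }

    /** Appends to the room's partition log and returns the partition used. */
    int publish(String roomId, String message) {
        int partition = partitionFor(roomId);
        logs.computeIfAbsent(partition, k -> new ArrayList<>()).add(roomId + ":" + message);
        return partition;
    }

    /** Messages of one partition in append order. */
    List<String> read(int partition) {
        return List.copyOf(logs.getOrDefault(partition, List.of()));
    }
}
```

Because several rooms can hash to the same partition, a consumer sees rooms interleaved but never sees one room's messages out of order.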

Option 2: Sequence numbers with client-side reordering

1. Redis INCR generates monotonically increasing sequence numbers per room.
   sequence = INCR seq:room:{roomId}
2. Server assigns sequence to each message before broadcasting.
3. Client receives messages and buffers out-of-order ones.
4. Client renders messages in sequence order when gap fills.

Tradeoff: Client complexity. Gaps can cause delayed rendering if a message is lost.
Best for: Medium-scale systems, can tolerate at-most-once with sequence-ordered display.
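
A minimal sketch of the client-side buffering in steps 3 and 4, written in Java for consistency with the rest of this series (a browser client would do the same in JavaScript). `ReorderBuffer` is a hypothetical name, and payloads are plain strings for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Client-side buffer that releases messages strictly in sequence order. */
final class ReorderBuffer {
    private final TreeMap<Long, String> pending = new TreeMap<>();
    private long nextExpected;

    ReorderBuffer(long firstSequence) {
        this.nextExpected = firstSequence;
    }

    /**
     * Accepts one message and returns every message that is now renderable,
     * in order. Duplicates and already-rendered sequences are ignored.
     */
    List<String> accept(long sequence, String payload) {
        List<String> ready = new ArrayList<>();
        if (sequence < nextExpected) {
            return ready; // already rendered, drop the duplicate
        }
        pending.putIfAbsent(sequence, payload);
        // Drain while the head of the buffer is the next expected sequence.
        while (!pending.isEmpty() && pending.firstKey() == nextExpected) {
            ready.add(pending.pollFirstEntry().getValue());
            nextExpected++;
        }
        return ready;
    }

    /** Number of messages still waiting for a gap to fill. */
    int buffered() {
        return pending.size();
    }
}
```

A real client would also need a gap timeout (skip ahead or refetch history) so that one lost message does not stall rendering indefinitely.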

Option 3: Distributed lock per room (conservative)

1. Per-room distributed lock in Redis (SET key token NX PX ttl; plain SETNX sets no expiry, so a crashed holder would keep the lock forever).
2. Only one server processes a message for a given room at a time.
3. Strict FIFO within the lock.

Tradeoff: Serialization limits throughput. Lock acquisition adds latency.
Best for: Low-volume, high-correctness requirements (financial order books).
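
The lock semantics can be illustrated with an in-memory stand-in. Everything here is an assumption for illustration: `RoomLockTable` is a hypothetical class, not a Redis client, and time is passed in explicitly. It mimics the usual Redis pattern of acquiring with `SET key token NX PX ttl` and releasing only if the stored token still matches the caller.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for a per-room lock with an owner token and a TTL. */
final class RoomLockTable {
    private static final class Entry {
        final String owner;
        final long expiresAtMillis;

        Entry(String owner, long expiresAtMillis) {
            this.owner = owner;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> locks = new HashMap<>();

    /** Succeeds only if the room is unlocked or the previous lock has expired. */
    synchronized boolean tryAcquire(String roomId, String owner, long nowMillis, long ttlMillis) {
        Entry current = locks.get(roomId);
        if (current != null && current.expiresAtMillis > nowMillis) {
            return false;
        }
        locks.put(roomId, new Entry(owner, nowMillis + ttlMillis));
        return true;
    }

    /** Releases only if the caller still owns the lock (compare-and-delete). */
    synchronized boolean release(String roomId, String owner) {
        Entry current = locks.get(roomId);
        if (current == null || !current.owner.equals(owner)) {
            return false;
        }
        locks.remove(roomId);
        return true;
    }
}
```

The owner check on release matters: without it, a server whose lock already expired could delete a lock that another server has since acquired.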

What large chat platforms such as Facebook and Slack are often described as doing: logical timestamps (or vector clocks) per room, with the client resolving conflicts. This allows high throughput while maintaining "approximately" correct ordering with last-write-wins conflict resolution.

Q25. How do you handle WebSocket in a microservices architecture where the service emitting events and the service maintaining WebSocket connections are different?

ANSWER:

This is a common real-world architecture problem. The WebSocket gateway is a separate service from the business logic services.

Architecture:

OrderService (Spring MVC REST)
    --> publishes "ORDER_SHIPPED" event to Kafka
         --> WebSocketGatewayService consumes from Kafka
              --> pushes to connected client via WebSocket

Implementation options:

Option A: Event-Driven via Kafka (recommended):

// OrderService publishes:
kafkaTemplate.send("order-events", orderId,
    new OrderEvent(orderId, userId, "SHIPPED", ...));
 
// WebSocketGatewayService consumes:
@KafkaListener(topics = "order-events")
public void onOrderEvent(OrderEvent event) {
    messagingTemplate.convertAndSendToUser(
        event.getUserId(), "/queue/order.status", event);
}

Option B: Direct HTTP call:

// OrderService calls the WS Gateway's internal REST endpoint:
wsGatewayClient.push(new PushRequest(userId, "/queue/order.status", event));

// WS Gateway exposes (accepting the same PushRequest type the caller sends):
@PostMapping("/internal/push/user/{userId}")
public void pushToUser(@PathVariable String userId, @RequestBody PushRequest request) {
    messagingTemplate.convertAndSendToUser(userId, request.getDestination(), request.getPayload());
}

Challenges to highlight:

The WS Gateway must be scaled separately from business logic services.
With multiple WS Gateway instances, the push endpoint alone is insufficient (which instance has the user's connection?). Need Redis relay.
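Kafka approach decouples services cleanly and buffers events while the gateway is slow or restarting. With several gateway instances, either give each instance its own consumer group so every instance sees every event, or forward through the same relay. Users who are offline still need a persistent store to catch up from.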
Use idempotency keys to prevent duplicate deliveries if Kafka consumer retries.
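
One way to apply the idempotency-key point is a bounded set of recently seen event ids, checked before pushing to the client. This is a sketch under assumptions: `RecentEventIds` is a hypothetical helper, the event is assumed to carry a unique id (the `OrderEvent` above does not show one), and the set is local to one gateway instance. A multi-instance gateway would need a shared store instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Bounded set of recently seen event ids; the oldest ids are evicted first. */
final class RecentEventIds {
    private final Map<String, Boolean> seen;

    RecentEventIds(final int capacity) {
        // Insertion-ordered map that drops its eldest entry once over capacity.
        this.seen = new LinkedHashMap<String, Boolean>(16, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Returns true the first time an id is seen, false for a redelivery. */
    synchronized boolean firstDelivery(String eventId) {
        return seen.putIfAbsent(eventId, Boolean.TRUE) == null;
    }
}
```

In the listener, the push would be guarded by something like `if (recentIds.firstDelivery(eventId)) { ... }`. The capacity bounds memory, at the cost of forgetting ids older than the window.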

Q26. How do you design a WebSocket system that handles regional failover?

ANSWER:

Multi-region WebSocket is architecturally complex because WebSocket connections are inherently long-lived and region-bound.

Architecture:

Global DNS (Route53) with latency-based routing
    |
    +-- us-east-1: WS Cluster + ElastiCache + RDS Aurora
    |
    +-- eu-west-1: WS Cluster + ElastiCache + RDS Aurora
    |
    Aurora Global Database (cross-region replication, <1s lag)

Failover design:

Clients connect to nearest region via Route53 latency routing.
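Message persistence uses Aurora Global Database. Both regions can read. Only the primary region writes, so if the failed region was the primary, a secondary cluster must be promoted before writes resume.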
Cross-region pub/sub: NOT recommended (network latency). Room traffic is region-local. Users in EU talk to EU servers only.
On regional failure:
- Route53 health check detects outage.
- DNS failover routes EU clients to US region (within 60-120 seconds DNS TTL).
- Clients reconnect to US region.
- US region's WebSocket servers pull message history from Aurora Global Database (which already has EU data via replication).
- Users can resume conversations.
Presence across regions: Difficult. Options:
- Accept that presence is region-local in normal operation.
- Use DynamoDB Global Tables for presence (built-in multi-region sync).

RPO/RTO estimates:

RPO: Near zero for messages (Aurora Global Database syncs continuously).
RTO: 2-5 minutes for full failover (DNS propagation + client reconnection).
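
When a region fails, every affected client reconnects at roughly the same time, so the surviving region sees a connection spike. A common mitigation is exponential backoff with jitter on the client. The sketch below is illustrative: `ReconnectBackoff` and its parameters are hypothetical, and it is written in Java to match the rest of the series even though the logic usually lives in the browser or mobile client.

```java
import java.util.Random;

/** Exponential backoff with full jitter for client reconnect attempts. */
final class ReconnectBackoff {
    private final long baseMillis;
    private final long capMillis;
    private final Random random;

    ReconnectBackoff(long baseMillis, long capMillis, Random random) {
        this.baseMillis = baseMillis;
        this.capMillis = capMillis;
        this.random = random;
    }

    /** Upper bound for the given 0-based attempt: min(cap, base * 2^attempt). */
    long ceilingMillis(int attempt) {
        // Limit the shift so typical millisecond bases cannot overflow a long.
        int shift = Math.min(Math.max(attempt, 0), 30);
        return Math.min(capMillis, baseMillis << shift);
    }

    /** Random delay in [0, ceiling], so clients do not reconnect in lockstep. */
    long nextDelayMillis(int attempt) {
        long ceiling = ceilingMillis(attempt);
        return (long) (random.nextDouble() * (ceiling + 1));
    }
}
```

With a base of 1 second and a cap of 30 seconds, the ceilings grow 1s, 2s, 4s, 8s and then flatten at 30s, while the random draw spreads individual clients across each window.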

8. Common Follow-Up Questions and How to Handle Them

Follow-Up Pattern 1: "How would you test this?"

Framework for answering:

Three levels of testing:

1. Unit tests:
   - Test @MessageMapping methods in isolation (mock ChatService).
   - Test ChannelInterceptor security logic.
   - Test message serialization/deserialization.

2. Integration tests:
   - Use @SpringBootTest with RANDOM_PORT.
   - WebSocketStompClient to connect, subscribe, send, and verify.
   - Test authentication flow.
   - Test error handling.

3. Load tests:
   - Artillery or Gatling for 10,000+ connection simulations.
   - Test broadcast performance.
   - Test reconnection behavior under load.
   - Test memory profile over time (detect leaks).

Follow-Up Pattern 2: "What monitoring would you put in place?"

Standard answer:

Four categories:

1. Connection metrics:
   - Active connections (gauge, per instance and total)
   - Connection establishment rate (connections/second)
   - Connection error rate
   - Connection duration distribution

2. Message metrics:
   - Messages received per second
   - Messages sent per second
   - Message processing latency (P50, P95, P99)
   - Message size distribution

3. Infrastructure metrics:
   - JVM heap usage and GC pause times
   - Thread pool utilization
   - Redis pub/sub throughput and latency
   - ALB unhealthy host count

4. Business metrics:
   - Messages per room per hour
   - Active users per room
   - Message delivery success rate
   - Reconnection frequency per user

Alerts: Active connections nearing capacity, error rate > 1%, P99 latency > 1 second, unhealthy ECS tasks.
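
For the latency figures above, it helps to be able to say how P50/P95/P99 are computed. The sketch below uses the nearest-rank definition over a window of samples. `LatencyPercentiles` is a hypothetical helper for illustration; in production a metrics library would normally maintain these with histograms rather than sorting raw samples.

```java
import java.util.Arrays;

/** Nearest-rank percentiles over a window of latency samples. */
final class LatencyPercentiles {
    private LatencyPercentiles() {}

    /** Returns the smallest sample such that at least p percent of samples are <= it. */
    static long percentile(long[] samplesMillis, double p) {
        if (samplesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = samplesMillis.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

An alert such as "P99 latency > 1 second" then reduces to checking whether `LatencyPercentiles.percentile(window, 99)` exceeds 1000.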

Follow-Up Pattern 3: "What would you do differently if starting over?"

Show architectural maturity:

1. Start with SSE for notifications - simpler, works everywhere.
   Add WebSocket only for bidirectional use cases.

2. Design the presence system earlier as a separate microservice.
   Presence is hard to retrofit.

3. Use Kafka from day one for message persistence.
   WebSocket as a delivery layer only, not a data layer.

4. Invest in client-side reconnection logic early.
   It's often an afterthought and causes production issues.

5. Set strict message size limits and rate limits from day one.
   Retrofitting these without breaking clients is painful.

6. Build observability (metrics, tracing) before features, not after.
   WebSocket debugging without metrics is painful.
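
As an illustration of point 5, a per-connection rate limit is often implemented as a token bucket. This is a sketch under assumptions: `TokenBucket` is a hypothetical class, one bucket would be kept per WebSocket session, and the clock is passed in so the behaviour is easy to test.

```java
/** Per-connection token bucket; the caller supplies the current time. */
final class TokenBucket {
    private final double capacity;
    private final double refillPerMilli;
    private double tokens;
    private long lastRefillMillis;

    TokenBucket(int capacity, double tokensPerSecond, long nowMillis) {
        this.capacity = capacity;
        this.refillPerMilli = tokensPerSecond / 1000.0;
        this.tokens = capacity; // start full so short bursts are allowed
        this.lastRefillMillis = nowMillis;
    }

    /** Returns true if the message is allowed, consuming one token. */
    synchronized boolean tryConsume(long nowMillis) {
        // Refill in proportion to elapsed time, never beyond capacity.
        long elapsed = Math.max(0, nowMillis - lastRefillMillis);
        tokens = Math.min(capacity, tokens + elapsed * refillPerMilli);
        lastRefillMillis = Math.max(lastRefillMillis, nowMillis);
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

The capacity sets the allowed burst and the refill rate sets the sustained messages per second. What to do on rejection (drop, send an error frame, or close the session) is a separate policy decision.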

Rapid-Fire Terminology Check

Question	Answer
What HTTP status code means WebSocket upgrade succeeded?	101 Switching Protocols
What is the WebSocket RFC number?	RFC 6455
What is the default ALB idle timeout?	60 seconds (change to 3600 for WebSocket)
What port does WSS use?	443
What port does WS use?	80
What is the STOMP heartbeat format?	`heart-beat:send-interval,receive-interval` in milliseconds
What close code means normal closure?	1000
What close code means abnormal closure (no close frame)?	1006
What masking key size does WebSocket use?	32 bits (4 bytes)
What is the minimum WebSocket frame size?	2 bytes (header only, for an unmasked server-to-client ping or empty frame; client-to-server frames add a 4-byte masking key)
What Spring annotation maps a STOMP destination to a method?	`@MessageMapping`
What fires when a WebSocket session disconnects?	`SessionDisconnectEvent`
What Spring class pushes messages to WebSocket clients?	`SimpMessagingTemplate`
What prefix is used for user-specific destinations?	`/user` (configurable)
What does SockJS fall back to?	HTTP long-polling or streaming

End of Series. Return to Index
