WebSockets Demystified - Part 1: Fundamentals
Table of Contents
- What Is a WebSocket?
- The Problem WebSockets Solve
- HTTP vs WebSocket - A Deep Comparison
- How the WebSocket Handshake Works
- RFC 6455 - The WebSocket Protocol
- WebSocket Frames and Data Transfer
- WebSocket vs SSE vs Long Polling vs Short Polling
- Where WebSockets Fit in System Design
- When to Use and When NOT to Use WebSockets
- Core Terminology Reference
1. What Is a WebSocket?
The Simple Explanation
Imagine you are watching a cricket match. With traditional HTTP, you would be like a person who runs to the stadium, asks "what is the score?", gets an answer, runs home, waits 5 seconds, and runs back again to ask. This is exhausting, slow, and wastes enormous resources.
WebSocket is like installing a telephone line directly to the stadium. Once the call is established, both you and the commentator can speak to each other at any time without needing to re-dial. The connection stays open. Either side can send information whenever they want.
The Technical Definition
A WebSocket is a communication protocol that provides a full-duplex, bidirectional, persistent communication channel over a single TCP connection. It was standardized by the IETF as RFC 6455 in 2011 and is supported by all modern browsers and servers.
Key properties:
- Full-duplex: Both client and server can send messages at the same time without waiting for the other to finish.
- Bidirectional: Data flows in both directions - client-to-server AND server-to-client.
- Persistent: The connection remains open until explicitly closed by either party.
- Low overhead: After the initial handshake, each message has only 2-14 bytes of framing overhead (vs hundreds of bytes for HTTP headers).
- Same ports as HTTP: Uses port 80 (ws://) or 443 (wss://) by default, so it usually passes through firewalls that would block other ports.
What It Is NOT
- WebSocket is not HTTP. It starts as HTTP but then upgrades.
- WebSocket is not a REST API. There are no resources, methods, or status codes per message.
- WebSocket is not automatically encrypted. You need WSS (WebSocket Secure) for that.
- WebSocket is not always the right tool. Many use cases are better served by SSE or polling.
2. The Problem WebSockets Solve
The Traditional Web Model
The web was built on HTTP, which is a request-response protocol. A client asks, the server answers, and the exchange is over. Even when the underlying connection is kept alive for reuse, only the client can start the next exchange. This is perfect for loading web pages and fetching data. However, it creates severe limitations when you need real-time, server-initiated communication.
Real-World Problems Without WebSockets
Problem 1: Chat Applications
A user sends a message. Another user should see it instantly. With HTTP:
User A sends message --> POST /messages --> Server saves message
User B checks for messages --> GET /messages --> Server returns messages
User B only sees the message when they refresh or poll. If they poll every second, that is 60 HTTP requests per minute, per user, and in a typical conversation almost all of them return nothing new.
Problem 2: Live Stock Prices
A trading dashboard shows 200 stock prices. Prices change hundreds of times per second. If each price is polled with its own request once per second, you need 200 GET requests per second, per client, and you still miss every change that happens between polls. With 1000 clients, that is 200,000 requests per second just for price updates.
Problem 3: Online Gaming
A multiplayer game needs player positions updated 30 times per second. Each update needs to be received within 50 milliseconds. HTTP round trips (including headers and connection overhead) easily exceed this latency. The game becomes unplayable.
Problem 4: Collaborative Editing
Google Docs-style editing requires that every keystroke from one user appears instantly on all other users' screens. With HTTP, you cannot push a single character without creating a full HTTP transaction.
What WebSockets Give You
| Need | Without WebSocket | With WebSocket |
|---|---|---|
| Server pushes data | Impossible without polling | Native, direct |
| Low latency | 50-300ms per round trip, plus polling delay | One-way network delay only (often a few ms) |
| Header overhead | 200-800 bytes per message | 2-14 bytes per frame |
| Bi-directional flow | Two separate HTTP calls | Single connection |
| Real-time feel | Simulated (expensive) | Genuine real-time |
3. HTTP vs WebSocket - A Deep Comparison
Connection Model
```
HTTP Request-Response Model:
-----------------------------------------
Client                           Server
  |                                 |
  |-- GET /data ------------------> |
  |                                 | (processes)
  | <------------------- 200 OK ----|
  |   (response done)               |
  |                                 |
  |-- GET /data ------------------> | (next request: client must ask again)
  |                                 |
  | <------------------- 200 OK ----|
  |   (response done)               |
```
```
WebSocket Full-Duplex Model:
-----------------------------------------
Client                           Server
  |                                 |
  |-- HTTP Upgrade ---------------> |
  | <------------ 101 Switching ----| (ONE handshake, connection stays open)
  |                                 |
  |-- message --------------------> | (client to server)
  | <-------------------- message --| (server to client, SAME connection)
  |-- message --------------------> | (client again)
  | <-------------------- message --| (server pushes without being asked)
  |              ...                |
  |-- CLOSE ----------------------> | (explicitly closed)
```
Header Size Comparison
Typical HTTP request headers (300+ bytes):
GET /api/live-price HTTP/1.1
Host: api.example.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36
Accept: application/json
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
Cache-Control: no-cache
Connection: keep-alive
Cookie: session_id=abc123; preferences=dark_mode
WebSocket data frame (6 bytes of framing for a small client-to-server payload: a 2-byte header plus a 4-byte masking key):
FIN=1, RSV=0, Opcode=0x1, Mask=1, Payload Length=13
Masking Key: [4 bytes]
Payload: "Hello, World!"
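The 2-14 byte range quoted earlier follows directly from the length encoding in RFC 6455 Section 5.2. A minimal sketch that computes the framing overhead for a given payload size (the class and method names are illustrative, not from any library):

```java
/** Framing overhead of a single WebSocket frame, per RFC 6455 Section 5.2 (illustrative). */
final class FrameOverhead {

    /** Bytes that precede the payload in a frame carrying payloadLength bytes. */
    static int headerSize(long payloadLength, boolean masked) {
        if (payloadLength < 0) {
            throw new IllegalArgumentException("negative payload length");
        }
        int size = 2;                     // FIN/RSV/opcode byte + MASK/7-bit length byte
        if (payloadLength > 0xFFFF) {
            size += 8;                    // length field 127: 64-bit extended length
        } else if (payloadLength >= 126) {
            size += 2;                    // length field 126: 16-bit extended length
        }
        if (masked) {
            size += 4;                    // masking key on every client-to-server frame
        }
        return size;
    }

    public static void main(String[] args) {
        System.out.println(headerSize(13, true));      // client -> server, "Hello, World!": 6
        System.out.println(headerSize(13, false));     // server -> client: 2
        System.out.println(headerSize(70_000, true));  // largest possible header: 14
    }
}
```

For the 13-byte "Hello, World!" payload, that is 19 bytes on the wire from the client and 15 bytes from the server.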
Latency Comparison
HTTP Polling (every 1 second):
Timeline: ----[req]--[resp]--[wait]----[req]--[resp]--[wait]----
Typical event detection lag: 0ms to 1000ms (average 500ms)
WebSocket:
Timeline: [handshake]---[msg]---[msg]---[msg]---[msg]---
Typical event detection lag: 1ms to 10ms
Scalability Comparison
Scenario: 10,000 users receiving 1 update per second
HTTP Polling (1 second interval):
- Requests per second: 10,000
- Avg request size: 400 bytes headers + 50 bytes body = 450 bytes
- Total bandwidth IN to server: 4.5 MB/s JUST for requests
- Total bandwidth OUT from server: similar
WebSocket:
- Active connections: 10,000 (persistent, but TCP is cheap after establishment)
- Per message overhead: 2 bytes framing (server-to-client frames are unmasked, and payloads under 126 bytes need no extended length)
- Total bandwidth IN (client to server heartbeats): negligible
- Total bandwidth OUT (server pushing updates): 10,000 * 52 bytes = 520 KB/s
Feature Comparison Table
| Feature | HTTP/1.1 | HTTP/2 | WebSocket | SSE |
|---|---|---|---|---|
| Server push | No | Limited (push promises preload the browser cache; removed from Chrome and Firefox) | Yes | Yes |
| Client push | Yes | Yes | Yes | No |
| Full duplex | No | No | Yes | No |
| Overhead per message | High | Medium | Very Low | Low |
| Connection reuse | Keep-Alive | Multiplexed | Persistent | Persistent |
| Firewall friendly | Yes | Yes | Yes (port 80/443) | Yes |
| Browser support | Universal | Modern | Modern | Modern |
| Binary support | Yes | Yes | Yes | Text only |
| Auto reconnect | N/A | N/A | Manual | Built-in |
| Ideal use case | APIs, CRUD | Web pages | Chat, gaming | Notifications |
4. How the WebSocket Handshake Works
The Upgrade Process Step by Step
WebSocket does not create a completely new connection. It reuses the existing HTTP connection and upgrades it. This is clever engineering - it means WebSocket works everywhere HTTP works, including through most firewalls and proxies.
Step 1 - Client sends HTTP Upgrade Request
GET /ws/chat HTTP/1.1
Host: example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
Sec-WebSocket-Extensions: permessage-deflate
Sec-WebSocket-Protocol: chat, superchat
Origin: https://example.com
Key headers explained:
| Header | Purpose |
|---|---|
| Upgrade: websocket | Tells the server this is a protocol upgrade request |
| Connection: Upgrade | Must be present for the Upgrade to be processed |
| Sec-WebSocket-Key | A base64-encoded random 16-byte value generated by the client |
| Sec-WebSocket-Version: 13 | Must be 13 per RFC 6455 |
| Sec-WebSocket-Protocol | Optional: application-level sub-protocols the client supports |
| Sec-WebSocket-Extensions | Optional: extensions like compression |
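Clients normally let their WebSocket library build this request. To make the headers concrete, here is a minimal sketch in plain Java that generates a fresh Sec-WebSocket-Key from 16 random bytes and assembles the request text. ClientHandshake and its methods are hypothetical names; a real client would also write this to a socket and validate the reply.

```java
import java.security.SecureRandom;
import java.util.Base64;

/** Builds a raw RFC 6455 opening handshake request (illustrative sketch). */
final class ClientHandshake {
    private static final SecureRandom RANDOM = new SecureRandom();

    /** A new random nonce for every connection: 16 bytes, base64-encoded (24 characters). */
    static String newKey() {
        byte[] nonce = new byte[16];
        RANDOM.nextBytes(nonce);
        return Base64.getEncoder().encodeToString(nonce);
    }

    static String buildRequest(String host, String path, String origin, String key) {
        // HTTP/1.1 uses CRLF line endings and a blank line after the last header
        return "GET " + path + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "Upgrade: websocket\r\n"
                + "Connection: Upgrade\r\n"
                + "Sec-WebSocket-Key: " + key + "\r\n"
                + "Sec-WebSocket-Version: 13\r\n"
                + "Origin: " + origin + "\r\n"
                + "\r\n";
    }

    public static void main(String[] args) {
        System.out.print(buildRequest("example.com", "/ws/chat", "https://example.com", newKey()));
    }
}
```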
Step 2 - Server validates and responds
The server must:
- Verify it can handle WebSocket upgrades
- Validate that Sec-WebSocket-Version is 13
- Compute the accept key
- Return a 101 Switching Protocols response
Accept key computation:
Accept Key = Base64( SHA1( Sec-WebSocket-Key + "258EAFA5-E914-47DA-95CA-C5AB0DC85B11" ) )
The magic string 258EAFA5-E914-47DA-95CA-C5AB0DC85B11 is a fixed GUID from RFC 6455. This prevents non-WebSocket HTTP servers from accidentally accepting WebSocket connections.
Java code for key computation (for understanding):
```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

public class WebSocketHandshakeUtil {
    private static final String MAGIC_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

    /**
     * Computes the Sec-WebSocket-Accept value from the client-provided key.
     * This is done internally by all WebSocket servers.
     * Shown here for educational purposes per RFC 6455 Section 4.2.2.
     */
    public static String computeAcceptKey(String clientKey) throws NoSuchAlgorithmException {
        String combined = clientKey + MAGIC_GUID;
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        byte[] hash = sha1.digest(combined.getBytes(StandardCharsets.UTF_8));
        return Base64.getEncoder().encodeToString(hash);
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        String clientKey = "dGhlIHNhbXBsZSBub25jZQ==";
        String acceptKey = computeAcceptKey(clientKey);
        System.out.println("Accept key: " + acceptKey);
        // Expected: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
    }
}
```
Server response:
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
Sec-WebSocket-Protocol: chat
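The server-side checks from Step 2 can be sketched the same way. This version assumes the request headers have already been parsed into a map with lower-cased names (real servers do this in their HTTP layer), skips Origin and subprotocol negotiation, and returns the raw response text; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.Locale;
import java.util.Map;

/** Validates an opening handshake and builds the server's reply (illustrative sketch). */
final class ServerHandshake {
    private static final String MAGIC_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

    /** Header names in the map are assumed to be lower-cased already. */
    static String respond(Map<String, String> headers) {
        String upgrade = headers.getOrDefault("upgrade", "");
        String connection = headers.getOrDefault("connection", "").toLowerCase(Locale.ROOT);
        String key = headers.get("sec-websocket-key");
        if (!upgrade.equalsIgnoreCase("websocket") || !connection.contains("upgrade") || key == null) {
            return "HTTP/1.1 400 Bad Request\r\n\r\n";
        }
        if (!"13".equals(headers.get("sec-websocket-version"))) {
            // RFC 6455 Section 4.4: advertise the version(s) this server supports
            return "HTTP/1.1 426 Upgrade Required\r\nSec-WebSocket-Version: 13\r\n\r\n";
        }
        return "HTTP/1.1 101 Switching Protocols\r\n"
                + "Upgrade: websocket\r\n"
                + "Connection: Upgrade\r\n"
                + "Sec-WebSocket-Accept: " + acceptKey(key.trim()) + "\r\n"
                + "\r\n";
    }

    static String acceptKey(String clientKey) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-1")
                    .digest((clientKey + MAGIC_GUID).getBytes(StandardCharsets.US_ASCII));
            return Base64.getEncoder().encodeToString(hash);
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform is required to support SHA-1
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(respond(Map.of(
                "upgrade", "websocket",
                "connection", "Upgrade",
                "sec-websocket-key", "dGhlIHNhbXBsZSBub25jZQ==",
                "sec-websocket-version", "13")));
    }
}
```

Given the sample key from Step 1, it produces the same Sec-WebSocket-Accept value, s3pPLMBiTxaQ9kYGzzhZRbK+xOo=.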
Step 3 - Protocol switches
After the 101 response, the TCP connection is no longer HTTP. The same TCP socket is now used for WebSocket frames. No more HTTP parsing happens on either end.
Before handshake: TCP socket carrying HTTP/1.1
After handshake: Same TCP socket, now carrying WebSocket frames
Common Handshake Failure Reasons
| HTTP Response Code | Reason | Fix |
|---|---|---|
| 400 Bad Request | Missing required headers, wrong version | Check client code |
| 401 Unauthorized | Missing or invalid authentication | Pass token in query param or via cookie |
| 403 Forbidden | Invalid origin, access denied | Configure allowed origins |
| 404 Not Found | Wrong WebSocket endpoint path | Check endpoint registration |
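| 426 Upgrade Required | Plain HTTP request sent to a WebSocket-only endpoint, or unsupported Sec-WebSocket-Version | Expected for non-WebSocket requests; otherwise make the client send version 13 |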
| 502 Bad Gateway | Proxy not forwarding upgrade headers | Configure proxy for WebSocket upgrade |
5. RFC 6455 - The WebSocket Protocol
Why RFC 6455 Matters
RFC 6455, published in December 2011, is the definitive specification for the WebSocket protocol. Mainstream implementations, whether Spring's, Node.js libraries such as ws, or a browser's built-in client, follow this standard. Understanding it helps you debug issues that show up at the protocol level, such as failed handshakes and unexpected close codes.
Key Sections of RFC 6455
Frame Format (Section 5)
Every piece of data sent over WebSocket is wrapped in a frame. A frame has this binary structure:
```
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-------+-+-------------+-------------------------------+
|F|R|R|R| opcode|M| Payload len |    Extended payload length    |
|I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
|N|V|V|V|       |S|             |   (if payload len==126/127)   |
| |1|2|3|       |K|             |                               |
+-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
|     Extended payload length continued, if payload len == 127  |
+ - - - - - - - - - - - - - - - +-------------------------------+
|                               |Masking-key, if MASK set to 1  |
+-------------------------------+-------------------------------+
| Masking-key (continued)       |          Payload Data         |
+-------------------------------- - - - - - - - - - - - - - - - +
:                     Payload Data continued ...                :
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
|                     Payload Data continued ...                |
+---------------------------------------------------------------+
```
Field explanations:
| Field | Bits | Description |
|---|---|---|
| FIN | 1 | 1 if this is the final fragment of the message |
| RSV1, RSV2, RSV3 | 3 | Reserved for extensions (e.g., RSV1 is used by compression) |
| Opcode | 4 | What type of data this frame carries |
| MASK | 1 | 1 if payload is masked (required for client-to-server) |
| Payload Length | 7 | 0-125: actual length; 126: next 2 bytes are length; 127: next 8 bytes are length |
| Masking Key | 32 | Present only if MASK=1; used to XOR the payload |
| Payload Data | variable | The actual data |
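The table maps directly onto bit operations. A minimal decoding sketch, assuming the whole header is already in a byte array (production code reads it incrementally from the socket and also rejects 64-bit lengths with the top bit set); FrameHeader is a hypothetical name. The sample bytes in main are the header of the masked "Hello" example in RFC 6455 Section 5.7.

```java
import java.util.Arrays;

/** Decoded fields of a WebSocket frame header (illustrative sketch). */
record FrameHeader(boolean fin, int rsv, int opcode, boolean masked,
                   long payloadLength, byte[] maskingKey, int headerLength) {

    /** Parses the header at the start of buf, following RFC 6455 Section 5.2. */
    static FrameHeader parse(byte[] buf) {
        int b0 = buf[0] & 0xFF;
        int b1 = buf[1] & 0xFF;
        boolean fin = (b0 & 0x80) != 0;
        int rsv = (b0 >> 4) & 0x07;          // RSV1..RSV3
        int opcode = b0 & 0x0F;
        boolean masked = (b1 & 0x80) != 0;
        long len = b1 & 0x7F;
        int pos = 2;
        if (len == 126) {                    // 16-bit extended length, network byte order
            len = ((buf[2] & 0xFF) << 8) | (buf[3] & 0xFF);
            pos = 4;
        } else if (len == 127) {             // 64-bit extended length, network byte order
            len = 0;
            for (int i = 0; i < 8; i++) {
                len = (len << 8) | (buf[2 + i] & 0xFF);
            }
            pos = 10;
        }
        byte[] key = null;
        if (masked) {                        // 4-byte masking key follows the length
            key = Arrays.copyOfRange(buf, pos, pos + 4);
            pos += 4;
        }
        return new FrameHeader(fin, rsv, opcode, masked, len, key, pos);
    }

    public static void main(String[] args) {
        byte[] header = {(byte) 0x81, (byte) 0x85, 0x37, (byte) 0xFA, 0x21, 0x3D};
        FrameHeader h = parse(header);
        System.out.println(h.fin() + " " + h.opcode() + " " + h.payloadLength() + " " + h.headerLength());
    }
}
```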
Opcodes
| Opcode | Value | Description |
|---|---|---|
| Continuation | 0x0 | Continuation frame of a fragmented message |
| Text | 0x1 | UTF-8 text data |
| Binary | 0x2 | Binary data |
| Close | 0x8 | Connection close |
| Ping | 0x9 | Heartbeat ping (either endpoint may send one) |
| Pong | 0xA | Heartbeat pong (must be sent in reply to a Ping; may also be sent unsolicited) |
Why Clients Must Mask Frames
A key requirement from RFC 6455: all frames sent from a client to the server MUST be masked, and the server MUST NOT mask frames it sends to clients. This is a security requirement. Because the client picks a fresh, unpredictable masking key for every frame, script code cannot control the exact bytes on the wire, which prevents cache poisoning attacks on proxies that might misinterpret WebSocket data as HTTP requests.
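Masking itself is a plain XOR: payload byte i is XORed with byte i mod 4 of the masking key (RFC 6455 Section 5.3). Applying the same operation twice restores the original, which is exactly how the server unmasks. A sketch with a hypothetical Masking class; the sample key and the expected masked bytes come from the "Hello" example in RFC 6455 Section 5.7.

```java
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;

/** XOR masking per RFC 6455 Section 5.3 (illustrative sketch). */
final class Masking {
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Clients must choose a fresh, unpredictable 4-byte key for every frame. */
    static byte[] newMaskingKey() {
        byte[] key = new byte[4];
        RANDOM.nextBytes(key);
        return key;
    }

    /** Returns a new array; the same call both masks and unmasks. */
    static byte[] apply(byte[] data, byte[] key) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ key[i % 4]);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] key = {0x37, (byte) 0xFA, 0x21, 0x3D};
        byte[] masked = apply("Hello".getBytes(StandardCharsets.UTF_8), key);
        for (byte b : masked) {
            System.out.printf("%02x ", b);   // 7f 9f 4d 51 58
        }
        System.out.println();
        System.out.println(new String(apply(masked, key), StandardCharsets.UTF_8)); // Hello
    }
}
```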
Closing Handshake
A clean WebSocket closure requires a closing handshake:
Either party: --[Close frame, code=1000]-->
Other party: <--[Close frame, code=1000]--
TCP connection closes
Common close codes:
| Code | Meaning |
|---|---|
| 1000 | Normal closure |
| 1001 | Going away (server shutdown or browser navigating away) |
| 1002 | Protocol error |
| 1003 | Unsupported data type |
| 1006 | Abnormal closure (connection dropped without a Close frame; reserved, never sent in a Close frame) |
| 1007 | Invalid data (e.g., non-UTF-8 in text frame) |
| 1008 | Policy violation |
| 1009 | Message too large |
| 1011 | Internal server error |
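Inside the Close frame, the code is sent as a 2-byte unsigned integer in network byte order, optionally followed by a UTF-8 reason. Because control-frame payloads are capped at 125 bytes, the reason can be at most 123 bytes. A minimal encode/decode sketch (ClosePayload is a hypothetical name); it also rejects 1005, 1006 and 1015, which RFC 6455 reserves for local reporting only.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Encodes and decodes Close frame payloads (illustrative sketch). */
final class ClosePayload {

    static byte[] encode(int code, String reason) {
        if (code < 1000 || code > 4999 || code == 1005 || code == 1006 || code == 1015) {
            // Outside the defined ranges, or reserved for local reporting only
            throw new IllegalArgumentException("close code not allowed on the wire: " + code);
        }
        byte[] text = reason.getBytes(StandardCharsets.UTF_8);
        if (text.length > 123) {
            throw new IllegalArgumentException("reason too long for a control frame");
        }
        return ByteBuffer.allocate(2 + text.length)
                .putShort((short) code)      // ByteBuffer is big-endian by default
                .put(text)
                .array();
    }

    static int decodeCode(byte[] payload) {
        // An empty payload carries no code; receivers report that locally as 1005
        return payload.length < 2 ? 1005 : ((payload[0] & 0xFF) << 8) | (payload[1] & 0xFF);
    }

    static String decodeReason(byte[] payload) {
        return payload.length <= 2 ? ""
                : new String(payload, 2, payload.length - 2, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] p = encode(1000, "bye");
        System.out.println(decodeCode(p) + " " + decodeReason(p)); // 1000 bye
    }
}
```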
6. WebSocket Frames and Data Transfer
Text vs Binary Messages
WebSocket natively supports two types of payloads:
Text frames (Opcode 0x1):
- Must be valid UTF-8
- Used for JSON, XML, plain text
- Most common in web applications
Binary frames (Opcode 0x2):
- Raw bytes, any format
- Used for images, audio, video, Protocol Buffers, MessagePack
- More efficient when you need to avoid JSON encoding overhead
Message Fragmentation
Large messages can be split into multiple frames (fragmentation). This allows sending large messages without buffering them entirely:
Frame 1: FIN=0, Opcode=0x1 (text), payload="Hello "
Frame 2: FIN=0, Opcode=0x0 (continuation), payload="World"
Frame 3: FIN=1, Opcode=0x0 (continuation), payload="!"
The receiver reassembles: "Hello World!"
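On the receiving side, reassembly is a small state machine: a Text or Binary frame with FIN=0 starts a message, Continuation frames append to it, and the frame with FIN=1 completes it. Control frames such as Ping may arrive between fragments and are not part of the message. A sketch, assuming frames have already been parsed and unmasked; Reassembler and its nested Frame record are hypothetical:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

/** Reassembles fragmented messages following RFC 6455 Section 5.4 (illustrative sketch). */
final class Reassembler {

    /** A parsed, already-unmasked frame (hypothetical type for this sketch). */
    record Frame(boolean fin, int opcode, byte[] payload) {}

    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private int messageOpcode = -1; // 0x1 (text) or 0x2 (binary) while a message is in progress

    /** Returns the complete payload once the final fragment arrives. */
    Optional<byte[]> accept(Frame f) {
        if (f.opcode() >= 0x8) {
            return Optional.empty();   // control frame: may interleave, handled elsewhere
        }
        if (f.opcode() == 0x0) {
            if (messageOpcode == -1) {
                throw new IllegalStateException("continuation frame without a message in progress");
            }
        } else if (messageOpcode != -1) {
            throw new IllegalStateException("new message started before the previous one finished");
        } else {
            messageOpcode = f.opcode();
        }
        buffer.writeBytes(f.payload());
        if (!f.fin()) {
            return Optional.empty();   // wait for more fragments
        }
        byte[] message = buffer.toByteArray();
        buffer.reset();
        messageOpcode = -1;
        return Optional.of(message);
    }

    public static void main(String[] args) {
        Reassembler r = new Reassembler();
        r.accept(new Frame(false, 0x1, "Hello ".getBytes(StandardCharsets.UTF_8)));
        r.accept(new Frame(false, 0x0, "World".getBytes(StandardCharsets.UTF_8)));
        byte[] done = r.accept(new Frame(true, 0x0, "!".getBytes(StandardCharsets.UTF_8))).orElseThrow();
        System.out.println(new String(done, StandardCharsets.UTF_8)); // Hello World!
    }
}
```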
Ping/Pong Heartbeat Mechanism
WebSocket has a built-in heartbeat mechanism:
Server --[Ping frame]--> Client
Client --[Pong frame]--> Server
If a server sends a Ping and does not receive a Pong within a timeout, it considers the connection dead and closes it. This detects network failures, crashed clients, and zombie connections (clients that disappeared without sending a Close frame).
Real-world scenario: A mobile user's phone goes into airplane mode.
- No Close frame is sent.
- TCP connection stays "open" on server (half-open connection).
- Without heartbeat: server has no idea, wastes memory for hours.
- With heartbeat: server detects dead connection in seconds, cleans up.
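The server-side bookkeeping is small: note when the last Ping went out and when the peer was last heard from, and declare the connection dead once a Ping has gone unanswered for longer than a timeout. The sketch below takes timestamps as parameters so the logic is easy to follow and test; in a real server a scheduled task would call it and close dead sockets. HeartbeatMonitor is a hypothetical name, and the 30-second interval and 10-second timeout in main are assumptions, since RFC 6455 leaves heartbeat timing to the application.

```java
import java.time.Duration;
import java.time.Instant;

/** Ping/Pong liveness tracking for one connection (illustrative sketch). */
final class HeartbeatMonitor {
    private final Duration pingInterval;
    private final Duration pongTimeout;
    private Instant lastPingSent;       // null until the first Ping
    private Instant lastHeardFrom;

    HeartbeatMonitor(Duration pingInterval, Duration pongTimeout, Instant connectedAt) {
        this.pingInterval = pingInterval;
        this.pongTimeout = pongTimeout;
        this.lastHeardFrom = connectedAt;
    }

    /** Call for every inbound frame, Pong included; any traffic counts as a sign of life. */
    void onFrameReceived(Instant now) {
        lastHeardFrom = now;
    }

    /** True when the next Ping is due. */
    boolean shouldPing(Instant now) {
        return lastPingSent == null || !now.isBefore(lastPingSent.plus(pingInterval));
    }

    void onPingSent(Instant now) {
        lastPingSent = now;
    }

    /** Dead if nothing has arrived since the last Ping and the timeout has passed. */
    boolean isDead(Instant now) {
        return lastPingSent != null
                && !lastHeardFrom.isAfter(lastPingSent)
                && now.isAfter(lastPingSent.plus(pongTimeout));
    }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2024-01-01T00:00:00Z");
        HeartbeatMonitor m = new HeartbeatMonitor(Duration.ofSeconds(30), Duration.ofSeconds(10), t0);
        m.onPingSent(t0);
        // The phone went into airplane mode, so no Pong ever arrives
        System.out.println(m.isDead(t0.plusSeconds(5)));   // false: still within the timeout
        System.out.println(m.isDead(t0.plusSeconds(11)));  // true: close the socket, free memory
    }
}
```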
7. WebSocket vs SSE vs Long Polling vs Short Polling
Short Polling
What it is: Client repeatedly asks the server "anything new?" at fixed intervals.
Client: GET /updates?since=1234 [every 5 seconds]
Server: "No" or "Yes, here is data"
Pros:
- Extremely simple to implement
- Works everywhere
- Easy to cache and load balance
Cons:
- High latency (up to interval length)
- Wasteful when nothing changes
- High server load at scale
When to use: Very infrequent updates (once per minute or less), simple use cases
Examples: Dashboard that updates every 30 seconds, cron-style status checks
Long Polling
What it is: Client asks the server, server holds the request open until data is available.
Client: GET /updates?since=1234 [sends request]
Server: [holds for up to 30 seconds, returns when data available or timeout]
Client: [immediately sends another request after receiving response]
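The server half of long polling amounts to "wait for data, but not forever." A minimal sketch that uses a BlockingQueue as a stand-in for one client's pending updates; LongPollMailbox is a hypothetical name. It blocks a thread per waiting request for simplicity, whereas a production server would park the request asynchronously (for example with Servlet async processing or Spring's DeferredResult).

```java
import java.time.Duration;
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Holds a poll request until an update arrives or the timeout expires (illustrative). */
final class LongPollMailbox {
    private final BlockingQueue<String> updates = new LinkedBlockingQueue<>();

    /** Called by whatever produces events for this client. */
    void publish(String update) {
        updates.offer(update);
    }

    /** Handles one GET /updates: data if any arrives in time, else empty (client re-polls either way). */
    Optional<String> waitForUpdate(Duration timeout) {
        try {
            return Optional.ofNullable(updates.poll(timeout.toMillis(), TimeUnit.MILLISECONDS));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // preserve the interrupt for the caller
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        LongPollMailbox box = new LongPollMailbox();
        new Thread(() -> {
            try {
                Thread.sleep(200);                // an event happens while the request is held
                box.publish("{\"type\": \"message\"}");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }).start();
        System.out.println(box.waitForUpdate(Duration.ofSeconds(30)));   // returns after ~200 ms
        System.out.println(box.waitForUpdate(Duration.ofMillis(100)));   // times out, empty
    }
}
```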
Pros:
- Lower latency than short polling
- Works through all firewalls/proxies
- Relatively simple
Cons:
- Still holds one HTTP connection (and one pending request) per client
- Server must manage suspended requests
- Reconnection adds latency
- Heavy memory usage for many concurrent clients
When to use: When WebSocket is blocked, moderate real-time requirements
Examples: Legacy systems, some chat apps, notification systems
Server-Sent Events (SSE)
What it is: One-way, server-to-client stream over HTTP.
Client: GET /events (Accept: text/event-stream)
Server: [keeps connection open, pushes events as text/event-stream format]
data: {"type": "price", "value": 145.50}
data: {"type": "price", "value": 145.55}
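On the wire, each SSE event is a group of field lines ended by a blank line; the blank line is what tells the browser's EventSource that the event is complete, and a multi-line payload becomes several data: lines. The optional id: field is what EventSource sends back in the Last-Event-ID header when it reconnects. A minimal formatter sketch following the text/event-stream format; SseFormatter is a hypothetical name.

```java
/** Formats Server-Sent Events in text/event-stream format (illustrative sketch). */
final class SseFormatter {

    /** id and event may be null; the trailing blank line terminates the event. */
    static String format(String id, String event, String data) {
        StringBuilder sb = new StringBuilder();
        if (id != null) {
            sb.append("id: ").append(id).append('\n');
        }
        if (event != null) {
            sb.append("event: ").append(event).append('\n');
        }
        for (String line : data.split("\n", -1)) {   // one data: line per payload line
            sb.append("data: ").append(line).append('\n');
        }
        return sb.append('\n').toString();
    }

    public static void main(String[] args) {
        System.out.print(format("1", null, "{\"type\": \"price\", \"value\": 145.50}"));
        System.out.print(format("2", null, "{\"type\": \"price\", \"value\": 145.55}"));
    }
}
```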
Pros:
- Built into browsers natively (EventSource API)
- Automatic reconnection built in
- Works over HTTP/2 (can multiplex many streams)
- Simpler than WebSocket for server-to-client use case
- HTTP headers and cookies work normally (authentication is easy)
Cons:
- One direction only (server to client)
- Text only (binary requires base64 encoding)
- Browsers allow only about 6 concurrent HTTP/1.1 connections per domain, shared across tabs (not an issue with HTTP/2 multiplexing)
When to use: Live feeds, notifications, dashboards where client only reads
Examples: Twitter feed, stock tickers, news alerts, deployment logs
WebSocket
What it is: Full-duplex, bidirectional, persistent TCP channel.
Client <--> Server: simultaneous, low-overhead, any-format messages
Pros:
- True full-duplex (both sides can send simultaneously)
- Extremely low overhead after handshake
- Native binary support
- Sub-5ms latency achievable on low-latency networks
- Supports sub-protocols (STOMP, MQTT, etc.)
Cons:
- Complex, stateful infrastructure: harder to load balance than HTTP, and needs sticky sessions or a pub/sub layer across servers
- Some proxies and firewalls block WebSocket upgrades
- No built-in reconnection (must implement in client code)
- More complex to implement correctly
When to use: Chat, gaming, collaborative tools, financial trading, live sports
Decision Chart
Do you need the client to send data to the server frequently?
YES --> WebSocket (then answer the final question)
NO --> Continue...
Does the server need to push data to clients?
NO --> Regular HTTP REST (no real-time needed)
YES --> Continue...
Is one-way (server to client) sufficient?
YES --> SSE (simpler, works over HTTP/2, built-in reconnect)
NO --> WebSocket (then answer the final question)
Final question, if you landed on WebSocket: is it blocked by network/proxies?
YES --> Long polling or another HTTP fallback (for example, via SockJS)
NO --> WebSocket
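The chart can also be read as a small function, which makes the order of its questions explicit. In this sketch (the enum and method names are hypothetical), the proxy check applies whenever the answer would otherwise be WebSocket:

```java
/** The transport decision chart as a function (illustrative sketch). */
final class TransportChooser {
    enum Transport { REST, SSE, WEBSOCKET, HTTP_FALLBACK }

    static Transport choose(boolean clientSendsOften, boolean serverPushes,
                            boolean oneWayEnough, boolean webSocketBlocked) {
        Transport wanted;
        if (clientSendsOften) {
            wanted = Transport.WEBSOCKET;
        } else if (!serverPushes) {
            wanted = Transport.REST;           // no real-time requirement
        } else if (oneWayEnough) {
            wanted = Transport.SSE;            // simpler, built-in reconnect
        } else {
            wanted = Transport.WEBSOCKET;
        }
        // Final question: only relevant when the answer so far is WebSocket
        if (wanted == Transport.WEBSOCKET && webSocketBlocked) {
            return Transport.HTTP_FALLBACK;    // e.g. long polling via SockJS
        }
        return wanted;
    }

    public static void main(String[] args) {
        // A live price ticker: the server pushes, the client only reads
        System.out.println(choose(false, true, true, false)); // SSE
    }
}
```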
8. Where WebSockets Fit in System Design
Architecture Overview
```
+------------------+     WSS     +-------------------+   Redis Pub/Sub    +------------------+
|  Browser Client  | <---------> | WebSocket Server  | <----------------> |  Message Broker  |
+------------------+             |   (Spring Boot)   |                    |  (ElastiCache)   |
                                 +-------------------+                    +------------------+
                                           |                                       |
                                           | JPA / JDBC                            |
                                           v                                       v
                                 +-------------------+                    +------------------+
                                 |  MySQL Database   |                    | Other WS Server  |
                                 +-------------------+                    |  (horizontal     |
                                                                          |   scaling)       |
                                                                          +------------------+
```
Where WebSocket Servers Live
WebSocket servers are stateful by nature - they hold open connections. This has major implications:
- They cannot be trivially load-balanced like stateless HTTP servers.
- Their memory use grows with the number of active connections (socket buffers plus per-session state), so 1 million connections require significant memory.
- They need a cross-server messaging mechanism (like Redis Pub/Sub) so that a message sent to Server A can be delivered to a client connected to Server B.
- They require graceful shutdown to not abruptly drop all client connections.
The Role of a Message Broker
When you have multiple WebSocket server instances (horizontal scaling), you need a way for them to share messages:
User A [connected to Server 1] sends message to User B [connected to Server 2]
Without broker: Message arrives at Server 1, Server 2 never knows.
With Redis Pub/Sub:
1. Server 1 receives message from User A
2. Server 1 publishes to Redis channel "room:general"
3. Server 2 is subscribed to "room:general"
4. Server 2 receives the message from Redis
5. Server 2 delivers it to User B
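The five steps can be simulated in one process to make the data flow concrete. In this sketch an InMemoryBroker class stands in for Redis Pub/Sub (it is not the Redis client API), and each ChatServer keeps an inbox per locally connected user in place of real WebSocket sessions; all names are hypothetical. With Redis, publish and subscribe cross the network, but the fan-out logic is the same: every subscribed server receives the message and delivers it only to the users it holds.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

/** One WebSocket server instance; user inboxes stand in for open sessions. */
final class ChatServer {
    private final Map<String, List<String>> localUsers = new ConcurrentHashMap<>();
    private final InMemoryBroker broker;
    private final String channel;

    ChatServer(InMemoryBroker broker, String channel) {
        this.broker = broker;
        this.channel = channel;
        broker.subscribe(channel, this::deliverLocally); // step 3: every instance subscribes
    }

    void connect(String user) {
        localUsers.put(user, new CopyOnWriteArrayList<>());
    }

    /** Steps 1-2: a message from a local client is published, not delivered directly. */
    void onClientMessage(String text) {
        broker.publish(channel, text);
    }

    /** Steps 4-5: deliver only to users connected to THIS instance. */
    private void deliverLocally(String text) {
        localUsers.values().forEach(inbox -> inbox.add(text));
    }

    List<String> inbox(String user) {
        return localUsers.getOrDefault(user, List.of());
    }

    public static void main(String[] args) {
        InMemoryBroker redis = new InMemoryBroker();
        ChatServer server1 = new ChatServer(redis, "room:general");
        ChatServer server2 = new ChatServer(redis, "room:general");
        server1.connect("userA");
        server2.connect("userB");
        server1.onClientMessage("hi from A");
        System.out.println(server2.inbox("userB")); // [hi from A]
    }
}

/** In-memory stand-in for Redis Pub/Sub: channel name -> subscriber callbacks. */
final class InMemoryBroker {
    private final Map<String, List<Consumer<String>>> channels = new ConcurrentHashMap<>();

    void subscribe(String channel, Consumer<String> handler) {
        channels.computeIfAbsent(channel, c -> new CopyOnWriteArrayList<>()).add(handler);
    }

    void publish(String channel, String message) {
        channels.getOrDefault(channel, List.of()).forEach(handler -> handler.accept(message));
    }
}
```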
9. When to Use and When NOT to Use WebSockets
Perfect Use Cases for WebSockets
| Use Case | Why WebSocket Is Right |
|---|---|
| Real-time chat | Bidirectional, low latency, persistent session |
| Multiplayer gaming | Sub-10ms latency required, constant bidirectional data |
| Collaborative editing | Every keystroke must propagate instantly to all users |
| Live financial trading | Price updates, order status, market depth must be millisecond-fast |
| Live sports scores | Frequent server pushes, clients react to events |
| IoT device dashboards | Devices push telemetry; dashboard pushes commands |
| Real-time notifications | Instant push, no polling required |
| Customer support chat | Same as real-time chat |
| Live auction systems | Bid updates, countdown timers, competitive real-time state |
Cases Where WebSocket Is Overkill
| Use Case | Better Alternative | Why |
|---|---|---|
| Order status tracking (hourly updates) | Short polling or email | Frequency does not justify connection overhead |
| Report generation status | SSE or polling | One-way push only needed |
| News feed updates | SSE | Server-to-client only, SSE is simpler |
| Search suggestions | HTTP | Request-response, stateless, cacheable |
| File upload progress | SSE | One-way status push |
| Low-traffic notification | FCM/APNs (push notifications) | Server-initiated, works even when app closed |
The "Real-Time" Trap
Many developers reach for WebSocket whenever they hear "real-time." Ask these questions first:
- Does the update frequency actually require sub-second latency? If updates come once per minute, polling every 30 seconds is perfectly fine.
- Does the client need to send data back, or just receive? If receive only, SSE is simpler.
- How many concurrent users do you expect? At small scale (say, 100 users), polling once per second adds only modest load, and stateless HTTP may be simpler to operate than stateful WebSocket servers.
- Can the data be cached? HTTP responses can be cached by browsers, CDNs, and proxies. WebSocket messages cannot.
10. Core Terminology Reference
| Term | Definition |
|---|---|
| WebSocket | Full-duplex TCP-based protocol for real-time communication |
| WSS | WebSocket Secure - WebSocket over TLS/SSL |
| Handshake | The HTTP-to-WebSocket upgrade exchange |
| Frame | Smallest unit of WebSocket data |
| Opcode | Type identifier in a WebSocket frame |
| Masking | XOR obfuscation applied to client-to-server frames |
| STOMP | Simple Text Oriented Messaging Protocol - a higher-level messaging protocol that runs over WebSocket |
| SockJS | JavaScript library providing WebSocket-like API with HTTP fallbacks |
| Heartbeat | Periodic ping/pong to keep connection alive and detect failures |
| Broker | Component (e.g., in-memory or Redis) that routes messages between clients |
| Destination | STOMP concept - a path like /topic/prices or /queue/notifications that messages are sent to |
| Topic | A pub/sub destination where all subscribers receive each message |
| Queue | A point-to-point destination where only one subscriber receives each message |
| SimpMessagingTemplate | Spring class for sending messages to WebSocket clients from server-side code |
| Sticky Session | Load balancer configuration that routes all requests from a client to the same server instance |
| Redis Pub/Sub | Redis messaging mechanism used to coordinate messages across multiple WebSocket server instances |
| Connection Upgrade | The HTTP mechanism that converts an HTTP connection to a WebSocket connection |
| RFC 6455 | The IETF standard that defines the WebSocket protocol |
| permessage-deflate | A WebSocket extension for per-message compression |
| Subprotocol | An application-level protocol layered over WebSocket (e.g., STOMP, MQTT, chat) |