6/6/2026 | Admin Post


WebSockets Demystified - Part 1: Fundamentals

Series: Index | Part 1 | Part 2 | Part 3 | Part 4 | Part 5 | Part 6


Table of Contents

  1. What Is a WebSocket?
  2. The Problem WebSockets Solve
  3. HTTP vs WebSocket - A Deep Comparison
  4. How the WebSocket Handshake Works
  5. RFC 6455 - The WebSocket Protocol
  6. WebSocket Frames and Data Transfer
  7. WebSocket vs SSE vs Long Polling vs Short Polling
  8. Where WebSockets Fit in System Design
  9. When to Use and When NOT to Use WebSockets
  10. Core Terminology Reference

1. What Is a WebSocket?

The Simple Explanation

Imagine you are watching a cricket match. With traditional HTTP, you would be like a person who runs to the stadium, asks "what is the score?", gets an answer, runs home, waits 5 seconds, and runs back again to ask. This is exhausting, slow, and wastes enormous resources.

WebSocket is like installing a telephone line directly to the stadium. Once the call is established, both you and the commentator can speak to each other at any time without needing to re-dial. The connection stays open. Either side can send information whenever they want.

The Technical Definition

A WebSocket is a communication protocol that provides a full-duplex, bidirectional, persistent communication channel over a single TCP connection. It was standardized by the IETF as RFC 6455 in 2011 and is supported by all modern browsers and servers.

Key properties:

  • Full-duplex: Both client and server can send messages at the same time without waiting for the other to finish.
  • Bidirectional: Data flows in both directions - client-to-server AND server-to-client.
  • Persistent: The connection remains open until explicitly closed by either party.
  • Low overhead: After the initial handshake, each message has only 2-14 bytes of framing overhead (vs hundreds of bytes for HTTP headers).
  • Same port: Uses port 80 (WS) or 443 (WSS), so it passes through firewalls that would block other ports.
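The 2-14 byte range quoted above can be checked with a little arithmetic. The sketch below is illustrative only: the class and method names are made up for this article, and it assumes a single unfragmented frame laid out as RFC 6455 describes (2 base bytes, an optional 2- or 8-byte extended length, and a 4-byte masking key on client-to-server frames).

```java
/** Illustrative helper (hypothetical name, not part of any library). */
final class FrameOverhead {

    /**
     * Returns the number of header bytes for one unfragmented frame.
     *
     * @param payloadLength payload size in bytes
     * @param masked        true for client-to-server frames, which carry a 4-byte masking key
     */
    static int headerBytes(long payloadLength, boolean masked) {
        if (payloadLength < 0) {
            throw new IllegalArgumentException("payload length must not be negative");
        }
        int bytes = 2;              // FIN/RSV/opcode byte + MASK/length byte
        if (payloadLength > 65_535) {
            bytes += 8;             // 64-bit extended payload length
        } else if (payloadLength > 125) {
            bytes += 2;             // 16-bit extended payload length
        }
        if (masked) {
            bytes += 4;             // masking key
        }
        return bytes;
    }

    public static void main(String[] args) {
        System.out.println("13-byte masked payload:   " + headerBytes(13, true));
        System.out.println("13-byte unmasked payload: " + headerBytes(13, false));
        System.out.println("1 MiB masked payload:     " + headerBytes(1 << 20, true));
    }
}
```

The smallest case (a short, unmasked server frame) costs 2 bytes and the largest (a masked frame above 65,535 bytes) costs 14, which is where the range comes from.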

What It Is NOT

  • WebSocket is not HTTP. It starts as HTTP but then upgrades.
  • WebSocket is not a REST API. There are no resources, methods, or status codes per message.
  • WebSocket is not automatically encrypted. You need WSS (WebSocket Secure) for that.
  • WebSocket is not always the right tool. Many use cases are better served by SSE or polling.

2. The Problem WebSockets Solve

The Traditional Web Model

The web was built on HTTP, which is a request-response protocol. A client asks, the server answers, and the exchange is over: even when the underlying connection is kept alive, the server cannot send anything further until the client asks again. This is perfect for loading web pages and fetching data. However, it creates severe limitations when you need real-time, server-initiated communication.

Real-World Problems Without WebSockets

Problem 1: Chat Applications

A user sends a message. Another user should see it instantly. With HTTP:

User A sends message  -->  POST /messages  -->  Server saves message
User B checks for messages  -->  GET /messages  -->  Server returns messages

User B only sees the message when they refresh or poll. If they poll every second, that is 60 HTTP requests per minute, per user, and in a quiet conversation nearly all of them return nothing new.

Problem 2: Live Stock Prices

A trading dashboard shows 200 stock prices. Prices change hundreds of times per second. With HTTP polling every second and one request per symbol, each client issues 200 GET requests per second. With 1000 clients, that is 200,000 requests per second just for price updates.

Problem 3: Online Gaming

A multiplayer game needs player positions updated 30 times per second. Each update needs to be received within 50 milliseconds. HTTP round trips (including headers and connection overhead) easily exceed this latency. The game becomes unplayable.

Problem 4: Collaborative Editing

Google Docs-style editing requires that every keystroke from one user appears instantly on all other users' screens. With HTTP, you cannot push a single character without creating a full HTTP transaction.

What WebSockets Give You

Need                | Without WebSocket          | With WebSocket
--------------------+----------------------------+------------------------
Server pushes data  | Impossible without polling | Native, direct
Low latency         | 50-300ms per round trip    | 1-5ms after connection
Header overhead     | 200-800 bytes per message  | 2-14 bytes per frame
Bi-directional flow | Two separate HTTP calls    | Single connection
Real-time feel      | Simulated (expensive)      | Genuine real-time

3. HTTP vs WebSocket - A Deep Comparison

Connection Model

HTTP Request-Response Model:
-----------------------------------------
Client                    Server
  |                          |
  |-- GET /data -----------> |
  |                          | (processes)
  | <-------- 200 OK --------|
  |    (connection closed)   |
  |                          |
  |-- GET /data -----------> |   (next request, new connection)
  |                          |
  | <-------- 200 OK --------|
  |    (connection closed)   |

WebSocket Full-Duplex Model:
-----------------------------------------
Client                    Server
  |                          |
  |-- HTTP Upgrade --------> |
  | <------- 101 Switching---|   (ONE handshake, connection stays open)
  |                          |
  |-- message -------------> |   (client to server)
  | <-------- message -------|   (server to client, SAME connection)
  |-- message -------------> |   (client again)
  | <-------- message -------|   (server pushes without being asked)
  |          ...             |
  |-- CLOSE ---------------> |   (explicitly closed)

Header Size Comparison

Typical HTTP request headers (300+ bytes):

GET /api/live-price HTTP/1.1
Host: api.example.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36
Accept: application/json
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
Cache-Control: no-cache
Connection: keep-alive
Cookie: session_id=abc123; preferences=dark_mode

WebSocket data frame (6 bytes of framing overhead for a small masked payload, 19 bytes in total here):

FIN=1, RSV=0, Opcode=0x1, Mask=1, Payload Length=13
Masking Key: [4 bytes]
Payload: "Hello, World!"

Latency Comparison

HTTP Polling (every 1 second):
Timeline: ----[req]--[resp]--[wait]----[req]--[resp]--[wait]----
Typical event detection lag: 0ms to 1000ms (average 500ms)

WebSocket:
Timeline: [handshake]---[msg]---[msg]---[msg]---[msg]---
Typical event detection lag: 1ms to 10ms (bounded by the one-way network latency)

Scalability Comparison

Scenario: 10,000 users receiving 1 update per second

HTTP Polling (1 second interval):
- Requests per second: 10,000
- Avg request size: 400 bytes headers + 50 bytes body = 450 bytes
- Total bandwidth IN to server: 4.5 MB/s JUST for requests
- Total bandwidth OUT from server: similar

WebSocket:
- Active connections: 10,000 (persistent, but TCP is cheap after establishment)
- Per message overhead: 2 bytes framing (server-to-client frames are unmasked; a masked client frame would need 6)
- Total bandwidth IN (client to server heartbeats): negligible
- Total bandwidth OUT (server pushing updates): 10,000 * 52 bytes = 520 KB/s
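The comparison above is plain multiplication, so it is easy to rerun with your own numbers. The following is a minimal sketch with hypothetical names; it takes the per-message size as an input so that the framing overhead (2 bytes for an unmasked server-to-client frame of this size, 6 bytes for a masked one) can be varied.

```java
/** Back-of-the-envelope bandwidth estimate (hypothetical helper for this article). */
final class BandwidthEstimate {

    /** Bytes per second when each client transfers one message of the given size per second. */
    static long bytesPerSecond(long clients, long bytesPerMessage) {
        return clients * bytesPerMessage;
    }

    public static void main(String[] args) {
        long clients = 10_000;
        long body = 50;                 // payload bytes, as in the scenario above

        long pollingRequests = bytesPerSecond(clients, 400 + body); // HTTP headers + body
        long pushUnmasked = bytesPerSecond(clients, 2 + body);      // server-to-client frame
        long pushMasked = bytesPerSecond(clients, 6 + body);        // frame with masking key

        System.out.println("Polling requests: " + pollingRequests + " bytes/s");
        System.out.println("Push (unmasked):  " + pushUnmasked + " bytes/s");
        System.out.println("Push (masked):    " + pushMasked + " bytes/s");
    }
}
```

Either way the pushed traffic is roughly an order of magnitude smaller than the polling requests alone, before polling responses are even counted.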

Feature Comparison Table

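Feature              | HTTP/1.1   | HTTP/2                  | WebSocket         | SSE
---------------------+------------+-------------------------+-------------------+--------------
Server push          | No         | Limited (push promises) | Yes               | Yes
Client push          | Yes        | Yes                     | Yes               | No
Full duplex          | No         | No                      | Yes               | No
Overhead per message | High       | Medium                  | Very Low          | Low
Connection reuse     | Keep-Alive | Multiplexed             | Persistent        | Persistent
Firewall friendly    | Yes        | Yes                     | Yes (port 80/443) | Yes
Browser support      | Universal  | Modern                  | Modern            | Modern
Binary support       | Yes        | Yes                     | Yes               | Text only
Auto reconnect       | N/A        | N/A                     | Manual            | Built-in
Ideal use case       | APIs, CRUD | Web pages               | Chat, gaming      | Notifications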

4. How the WebSocket Handshake Works

The Upgrade Process Step by Step

WebSocket does not create a completely new connection. It reuses the existing HTTP connection and upgrades it. This is clever engineering - it means WebSocket works everywhere HTTP works, including through most firewalls and proxies.

Step 1 - Client sends HTTP Upgrade Request

GET /ws/chat HTTP/1.1
Host: example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
Sec-WebSocket-Extensions: permessage-deflate
Sec-WebSocket-Protocol: chat, superchat
Origin: https://example.com

Key headers explained:

Header                    | Purpose
--------------------------+--------------------------------------------------------------
Upgrade: websocket        | Tells server this is a protocol upgrade request
Connection: Upgrade       | Must be present for the Upgrade to be processed
Sec-WebSocket-Key         | A base64-encoded random 16-byte value generated by the client
Sec-WebSocket-Version: 13 | Must be 13 per RFC 6455
Sec-WebSocket-Protocol    | Optional: application-level sub-protocols the client supports
Sec-WebSocket-Extensions  | Optional: extensions like compression
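To make the client side of Step 1 concrete, here is a minimal sketch of how a client might generate the Sec-WebSocket-Key and assemble the required headers. The class and method names are hypothetical, the host and path are supplied by the caller, and real clients (browsers, libraries) do all of this for you.

```java
import java.security.SecureRandom;
import java.util.Base64;

/** Sketch of the client half of the handshake (hypothetical helper, not a library API). */
final class HandshakeRequest {

    private static final SecureRandom RANDOM = new SecureRandom();

    /** Generates a fresh Sec-WebSocket-Key: 16 random bytes, base64-encoded. */
    static String newKey() {
        byte[] nonce = new byte[16];
        RANDOM.nextBytes(nonce);
        return Base64.getEncoder().encodeToString(nonce);
    }

    /** Builds a minimal upgrade request containing only the required headers. */
    static String build(String host, String path, String key) {
        return "GET " + path + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "Upgrade: websocket\r\n"
                + "Connection: Upgrade\r\n"
                + "Sec-WebSocket-Key: " + key + "\r\n"
                + "Sec-WebSocket-Version: 13\r\n"
                + "\r\n";               // blank line ends the HTTP header block
    }

    public static void main(String[] args) {
        System.out.print(build("example.com", "/ws/chat", newKey()));
    }
}
```

Because the key is random per connection, the printed request differs on every run; only its shape stays the same.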

Step 2 - Server validates and responds

The server must:

  1. Verify it can handle WebSocket upgrades
  2. Validate Sec-WebSocket-Version is 13
  3. Compute the accept key
  4. Return a 101 Switching Protocols response
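The first two checks in the list above could look roughly like the sketch below. It is a simplified illustration with hypothetical names: it reports why a request is not a valid upgrade and leaves the choice of HTTP status code, origin checks, and authentication to the surrounding server.

```java
import java.util.Base64;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Simplified server-side handshake validation (hypothetical helper). */
final class UpgradeValidator {

    /** Returns an empty Optional for a valid upgrade request, otherwise the reason it was rejected. */
    static Optional<String> validate(Map<String, String> requestHeaders) {
        // HTTP header names are case-insensitive, so copy into a case-insensitive map.
        Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        headers.putAll(requestHeaders);

        if (!"websocket".equalsIgnoreCase(headers.get("Upgrade"))) {
            return Optional.of("Upgrade header is missing or not 'websocket'");
        }
        boolean hasUpgradeToken = false;
        for (String token : headers.getOrDefault("Connection", "").split(",")) {
            if (token.trim().equalsIgnoreCase("upgrade")) {
                hasUpgradeToken = true;
            }
        }
        if (!hasUpgradeToken) {
            return Optional.of("Connection header does not include 'Upgrade'");
        }
        if (!"13".equals(headers.get("Sec-WebSocket-Version"))) {
            return Optional.of("Sec-WebSocket-Version is not 13");
        }
        String key = headers.get("Sec-WebSocket-Key");
        try {
            if (key == null || Base64.getDecoder().decode(key.trim()).length != 16) {
                return Optional.of("Sec-WebSocket-Key must decode to 16 bytes");
            }
        } catch (IllegalArgumentException notBase64) {
            return Optional.of("Sec-WebSocket-Key is not valid base64");
        }
        return Optional.empty();
    }
}
```

A request that passes these checks would then move on to the accept key computation and the 101 response.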

Accept key computation:

Accept Key = Base64( SHA1( Sec-WebSocket-Key + "258EAFA5-E914-47DA-95CA-C5AB0DC85B11" ) )

The magic string 258EAFA5-E914-47DA-95CA-C5AB0DC85B11 is a fixed GUID from RFC 6455. This prevents non-WebSocket HTTP servers from accidentally accepting WebSocket connections.

Java code for key computation (for understanding):

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

public class WebSocketHandshakeUtil {

    // Fixed GUID defined by RFC 6455; it is the same for every handshake.
    private static final String MAGIC_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

    /**
     * Computes the Sec-WebSocket-Accept value from the client-provided key.
     * This is done internally by all WebSocket servers.
     * Shown here for educational purposes per RFC 6455 Section 4.2.2.
     */
    public static String computeAcceptKey(String clientKey) throws NoSuchAlgorithmException {
        // Concatenate the key exactly as received (still base64 text) with the GUID.
        String combined = clientKey + MAGIC_GUID;
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        byte[] hash = sha1.digest(combined.getBytes(StandardCharsets.UTF_8));
        // The 20-byte SHA-1 digest is base64-encoded for the response header.
        return Base64.getEncoder().encodeToString(hash);
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        String clientKey = "dGhlIHNhbXBsZSBub25jZQ==";
        String acceptKey = computeAcceptKey(clientKey);
        System.out.println("Accept key: " + acceptKey);
        // Expected: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
    }
}

Server response:

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
Sec-WebSocket-Protocol: chat

Step 3 - Protocol switches

After the 101 response, the TCP connection is no longer HTTP. The same TCP socket is now used for WebSocket frames. No more HTTP parsing happens on either end.

Before handshake:  TCP socket carrying HTTP/1.1
After handshake:   Same TCP socket, now carrying WebSocket frames

Common Handshake Failure Reasons

HTTP Response Code   | Reason                                  | Fix
---------------------+-----------------------------------------+----------------------------------------
400 Bad Request      | Missing required headers, wrong version | Check client code
401 Unauthorized     | Missing or invalid authentication       | Pass token in query param or via cookie
403 Forbidden        | Invalid origin, access denied           | Configure allowed origins
404 Not Found        | Wrong WebSocket endpoint path           | Check endpoint registration
426 Upgrade Required | Server only accepts WebSocket, not HTTP | Correct, expected behavior
502 Bad Gateway      | Proxy not forwarding upgrade headers    | Configure proxy for WebSocket upgrade

5. RFC 6455 - The WebSocket Protocol

Why RFC 6455 Matters

RFC 6455, published in December 2011, is the definitive specification for the WebSocket protocol. Every WebSocket implementation you use - whether Spring's, Node.js's, or a browser's built-in - conforms to this standard. Understanding it helps you debug issues that appear at the network layer.

Key Sections of RFC 6455

Frame Format (Section 5)

Every piece of data sent over WebSocket is wrapped in a frame. A frame has this binary structure:

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-------+-+-------------+-------------------------------+
     |F|R|R|R| opcode|M| Payload len |    Extended payload length    |
     |I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
     |N|V|V|V|       |S|             |   (if payload len==126/127)   |
     | |1|2|3|       |K|             |                               |
     +-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
     |     Extended payload length continued, if payload len == 127  |
     + - - - - - - - - - - - - - - -+-------------------------------+
     |                               |Masking-key, if MASK set to 1  |
     +-------------------------------+-------------------------------+
     | Masking-key (continued)       |          Payload Data         |
     +-------------------------------- - - - - - - - - - - - - - - -+
     :                     Payload Data continued ...                :
     + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
     |                     Payload Data continued ...                |
     +---------------------------------------------------------------+

Field explanations:

Field            | Bits       | Description
-----------------+------------+----------------------------------------------------------------------------------
FIN              | 1          | 1 if this is the final fragment of the message
RSV1, RSV2, RSV3 | 3 (1 each) | Reserved for extensions (e.g., RSV1 is used by compression)
Opcode           | 4          | What type of data this frame carries
MASK             | 1          | 1 if payload is masked (required for client-to-server)
Payload Length   | 7          | 0-125: actual length; 126: next 2 bytes are length; 127: next 8 bytes are length
Masking Key      | 32         | Present only if MASK=1; used to XOR the payload
Payload Data     | variable   | The actual data
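To see how the fields fit together, the sketch below encodes a single unfragmented text frame byte by byte. It is a teaching aid with hypothetical names, not a production encoder: it handles only text frames, and it takes the masking key as a parameter (pass null for an unmasked, server-style frame) so the output is reproducible.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Minimal text-frame encoder following the layout above (hypothetical helper). */
final class FrameEncoder {

    /** Encodes one unfragmented text frame; maskingKey is 4 bytes, or null for no masking. */
    static byte[] encodeText(String text, byte[] maskingKey) {
        byte[] payload = text.getBytes(StandardCharsets.UTF_8);
        boolean masked = maskingKey != null;
        if (masked && maskingKey.length != 4) {
            throw new IllegalArgumentException("masking key must be exactly 4 bytes");
        }
        int extended = payload.length <= 125 ? 0 : (payload.length <= 65_535 ? 2 : 8);
        // ByteBuffer is big-endian by default, which matches network byte order.
        ByteBuffer buf = ByteBuffer.allocate(2 + extended + (masked ? 4 : 0) + payload.length);

        buf.put((byte) 0x81);                        // FIN=1, RSV1-3=0, opcode=0x1 (text)
        int maskBit = masked ? 0x80 : 0x00;
        if (extended == 0) {
            buf.put((byte) (maskBit | payload.length));
        } else if (extended == 2) {
            buf.put((byte) (maskBit | 126));
            buf.putShort((short) payload.length);    // 16-bit extended length
        } else {
            buf.put((byte) (maskBit | 127));
            buf.putLong(payload.length);             // 64-bit extended length
        }
        if (masked) {
            buf.put(maskingKey);
            for (int i = 0; i < payload.length; i++) {
                buf.put((byte) (payload[i] ^ maskingKey[i % 4]));
            }
        } else {
            buf.put(payload);
        }
        return buf.array();
    }
}
```

Encoding "Hello" without a mask yields the 7 bytes 81 05 48 65 6c 6c 6f, which matches the single-frame unmasked text example given in RFC 6455 Section 5.7.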

Opcodes

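Opcode       | Value | Description
-------------+-------+---------------------------------------------
Continuation | 0x0   | Continuation frame of a fragmented message
Text         | 0x1   | UTF-8 text data
Binary       | 0x2   | Binary data
Close        | 0x8   | Connection close
Ping         | 0x9   | Heartbeat ping (either endpoint can send this)
Pong         | 0xA   | Heartbeat pong (must be sent in response to a ping)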
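In code, the opcode is simply the low four bits of a frame's first byte. The enum below is a small sketch of that lookup (the type and method names are illustrative); it also flags control frames, which RFC 6455 defines as opcodes with the highest opcode bit set (0x8 and above).

```java
/** The opcodes from the table above, with a lookup from a frame's first byte (illustrative sketch). */
enum Opcode {
    CONTINUATION(0x0), TEXT(0x1), BINARY(0x2), CLOSE(0x8), PING(0x9), PONG(0xA);

    final int value;

    Opcode(int value) {
        this.value = value;
    }

    /** Control frames (close, ping, pong) have the top bit of the 4-bit opcode set. */
    boolean isControl() {
        return (value & 0x8) != 0;
    }

    /** Extracts the opcode from the first byte of a frame, ignoring FIN and RSV bits. */
    static Opcode fromFirstByte(byte firstByte) {
        int code = firstByte & 0x0F;
        for (Opcode op : values()) {
            if (op.value == code) {
                return op;
            }
        }
        throw new IllegalArgumentException("reserved opcode: 0x" + Integer.toHexString(code));
    }

    /** True when the FIN bit (the highest bit of the first byte) is set. */
    static boolean isFinal(byte firstByte) {
        return (firstByte & 0x80) != 0;
    }
}
```

For example, a first byte of 0x81 decodes to a final TEXT frame, and 0x89 to a final PING.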

Why Clients Must Mask Frames

A key requirement from RFC 6455: all frames sent from a client (browser or otherwise) to the server MUST be masked. The server MUST NOT mask frames it sends to clients. This is a security requirement to prevent cache poisoning attacks on HTTP caches and proxies that might misinterpret WebSocket data as HTTP traffic.
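Masking itself is a plain XOR of each payload byte with one of the four masking-key bytes, so the same operation both masks and unmasks. A minimal sketch, using a hypothetical helper name:

```java
import java.nio.charset.StandardCharsets;

/** XOR masking as applied to client-to-server payloads (hypothetical helper). */
final class Masking {

    /** XORs byte i of the data with key[i % 4]; applying it twice restores the original bytes. */
    static byte[] apply(byte[] data, byte[] key) {
        if (key.length != 4) {
            throw new IllegalArgumentException("masking key must be exactly 4 bytes");
        }
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ key[i % 4]);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] key = {0x37, (byte) 0xfa, 0x21, 0x3d};
        byte[] masked = apply("Hello".getBytes(StandardCharsets.UTF_8), key);
        // A server unmasks with the key taken from the frame header.
        System.out.println(new String(apply(masked, key), StandardCharsets.UTF_8));
    }
}
```

Note that masking is not encryption: the key travels in the same frame as the payload. Its purpose is only to make the bytes on the wire unpredictable to intermediaries.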

Closing Handshake

A clean WebSocket closure requires a closing handshake:

Either party:  --[Close frame, code=1000]-->
Other party:   <--[Close frame, code=1000]--
TCP connection closes

Common close codes:

Code | Meaning
-----+---------------------------------------------------------
1000 | Normal closure
1001 | Going away (server shutdown or browser navigating away)
1002 | Protocol error
1003 | Unsupported data type
1006 | Abnormal closure (no close frame was received)
1007 | Invalid data (e.g., non-UTF-8 in text frame)
1008 | Policy violation
1009 | Message too large
1011 | Internal server error
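On the wire, a close code travels as the first two bytes of the Close frame's payload (an unsigned 16-bit integer in network byte order), optionally followed by a UTF-8 reason. The sketch below shows that encoding with hypothetical helper names; the 123-byte limit on the reason follows from the 125-byte cap on control frame payloads.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Encodes and decodes the payload of a Close frame (hypothetical helper). */
final class CloseFrame {

    /** Builds the payload: 2-byte big-endian status code followed by a UTF-8 reason. */
    static byte[] payload(int code, String reason) {
        byte[] reasonBytes = reason.getBytes(StandardCharsets.UTF_8);
        if (reasonBytes.length > 123) {
            throw new IllegalArgumentException("reason must fit in 123 bytes");
        }
        return ByteBuffer.allocate(2 + reasonBytes.length)
                .putShort((short) code)
                .put(reasonBytes)
                .array();
    }

    /** Reads the status code back from a Close frame payload. */
    static int code(byte[] payload) {
        if (payload.length < 2) {
            throw new IllegalArgumentException("payload carries no status code");
        }
        return ByteBuffer.wrap(payload).getShort() & 0xFFFF;
    }

    /** Reads the optional reason text that follows the status code. */
    static String reason(byte[] payload) {
        return new String(payload, 2, payload.length - 2, StandardCharsets.UTF_8);
    }
}
```

Code 1006 never appears in such a payload: it is what an endpoint reports locally when the connection dropped without any Close frame at all.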

6. WebSocket Frames and Data Transfer

Text vs Binary Messages

WebSocket natively supports two types of payloads:

Text frames (Opcode 0x1):

  • Must be valid UTF-8
  • Used for JSON, XML, plain text
  • Most common in web applications

Binary frames (Opcode 0x2):

  • Raw bytes, any format
  • Used for images, audio, video, Protocol Buffers, MessagePack
  • More efficient when you need to avoid JSON encoding overhead

Message Fragmentation

Large messages can be split into multiple frames (fragmentation). This allows sending large messages without buffering them entirely:

Frame 1: FIN=0, Opcode=0x1 (text), payload="Hello "
Frame 2: FIN=0, Opcode=0x0 (continuation), payload="World"
Frame 3: FIN=1, Opcode=0x0 (continuation), payload="!"

The receiver reassembles: "Hello World!"
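A receiver's reassembly logic can be sketched as follows. This is a simplified illustration with hypothetical types: it handles one text message at a time and ignores control frames, which a real implementation must accept in between fragments. The bytes are decoded only after the final fragment, because a multi-byte UTF-8 character may be split across frames.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Reassembles a fragmented text message (simplified, hypothetical helper). */
final class Reassembler {

    /** A decoded frame reduced to the fields that matter for reassembly. */
    static final class Frame {
        final boolean fin;
        final int opcode;
        final byte[] payload;

        Frame(boolean fin, int opcode, String text) {
            this.fin = fin;
            this.opcode = opcode;
            this.payload = text.getBytes(StandardCharsets.UTF_8);
        }
    }

    static String reassembleText(List<Frame> frames) {
        ByteArrayOutputStream message = new ByteArrayOutputStream();
        for (int i = 0; i < frames.size(); i++) {
            Frame frame = frames.get(i);
            // The first frame carries the real opcode; the rest must be continuations.
            int expectedOpcode = (i == 0) ? 0x1 : 0x0;
            if (frame.opcode != expectedOpcode) {
                throw new IllegalStateException("unexpected opcode in frame " + i);
            }
            message.write(frame.payload, 0, frame.payload.length);
            if (frame.fin) {
                if (i != frames.size() - 1) {
                    throw new IllegalStateException("frames found after the final fragment");
                }
                return new String(message.toByteArray(), StandardCharsets.UTF_8);
            }
        }
        throw new IllegalStateException("message incomplete: no frame with FIN=1");
    }
}
```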

Ping/Pong Heartbeat Mechanism

WebSocket has a built-in heartbeat mechanism:

Server --[Ping frame]--> Client
Client --[Pong frame]--> Server

If a server sends a Ping and does not receive a Pong within a timeout, it considers the connection dead and closes it. This detects network failures, crashed clients, and zombie connections (clients that disappeared without sending a Close frame).

Real-world scenario: A mobile user's phone goes into airplane mode.
- No Close frame is sent.
- TCP connection stays "open" on server (half-open connection).
- Without heartbeat: server has no idea, wastes memory for hours.
- With heartbeat: server detects dead connection in seconds, cleans up.
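The server-side bookkeeping for this can be very small. The sketch below is one possible shape, with hypothetical names: it records when the last Pong arrived and reports the connection as dead once a configurable timeout has passed. Timestamps are passed in explicitly (for example from System.nanoTime()) so the logic is easy to test; sending the Ping frames and closing the socket are left to the surrounding server.

```java
import java.time.Duration;

/** Tracks the last Pong per connection and flags dead peers (hypothetical helper). */
final class HeartbeatMonitor {

    private final long timeoutNanos;
    private long lastPongNanos;

    /** Starts the clock at connection time so a peer that never answers is also detected. */
    HeartbeatMonitor(Duration timeout, long nowNanos) {
        this.timeoutNanos = timeout.toNanos();
        this.lastPongNanos = nowNanos;
    }

    /** Call whenever a Pong frame arrives from the peer. */
    void onPong(long nowNanos) {
        lastPongNanos = nowNanos;
    }

    /** True when no Pong has been seen for longer than the timeout. */
    boolean isDead(long nowNanos) {
        return nowNanos - lastPongNanos > timeoutNanos;
    }
}
```

A server would typically check isDead on a timer for every open connection and release the resources of any connection that fails the check.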

7. WebSocket vs SSE vs Long Polling vs Short Polling

Short Polling

What it is: Client repeatedly asks the server "anything new?" at fixed intervals.

Client: GET /updates?since=1234  [every 5 seconds]
Server: "No" or "Yes, here is data"

Pros:
- Extremely simple to implement
- Works everywhere
- Easy to cache and load balance

Cons:
- High latency (up to interval length)
- Wasteful when nothing changes
- High server load at scale

When to use: Very infrequent updates (once per minute or less), simple use cases
Examples: Dashboard that updates every 30 seconds, cron-style status checks

Long Polling

What it is: Client asks the server, server holds the request open until data is available.

Client: GET /updates?since=1234 [sends request]
Server: [holds for up to 30 seconds, returns when data available or timeout]
Client: [immediately sends another request after receiving response]
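The "hold the request open" step can be modelled without any web framework. The sketch below is a deliberately simplified, in-process illustration with hypothetical names: a blocking queue stands in for one client's pending updates, and awaitUpdate plays the role of the suspended HTTP request handler.

```java
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** In-process model of a long-poll endpoint for a single client (hypothetical helper). */
final class LongPollEndpoint {

    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();

    /** Called by the application when new data is available for this client. */
    void publish(String update) {
        pending.add(update);
    }

    /**
     * Blocks until an update is available or the timeout expires.
     * An empty result corresponds to the "timeout, ask again" response.
     */
    Optional<String> awaitUpdate(long timeout, TimeUnit unit) {
        try {
            return Optional.ofNullable(pending.poll(timeout, unit));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // preserve the interrupt for the caller
            return Optional.empty();
        }
    }
}
```

Each waiting client ties up a blocked call (or a suspended request in an async server) for the whole wait, which is where the memory cost of long polling at scale comes from.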

Pros:
- Lower latency than short polling
- Works through all firewalls/proxies
- Relatively simple

Cons:
- One HTTP connection per client still
- Server must manage suspended requests
- Reconnection adds latency
- Heavy memory usage for many concurrent clients

When to use: When WebSocket is blocked, moderate real-time requirements
Examples: Legacy systems, some chat apps, notification systems

Server-Sent Events (SSE)

What it is: One-way, server-to-client stream over HTTP.

Client: GET /events (Accept: text/event-stream)
Server: [keeps connection open, pushes events as text/event-stream format]

data: {"type": "price", "value": 145.50}

data: {"type": "price", "value": 145.55}
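On the server, producing this stream is mostly string formatting. The helper below is a minimal sketch (the class name is made up): it prefixes every line of the payload with "data: " and ends the event with a blank line, which is how the text/event-stream format delimits events.

```java
/** Formats payloads as text/event-stream events (hypothetical helper). */
final class SseFormatter {

    /** Returns one event; multi-line payloads become several consecutive "data:" lines. */
    static String event(String data) {
        StringBuilder out = new StringBuilder();
        for (String line : data.split("\n", -1)) {
            out.append("data: ").append(line).append('\n');
        }
        out.append('\n');   // blank line terminates the event
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(event("{\"type\": \"price\", \"value\": 145.50}"));
        System.out.print(event("{\"type\": \"price\", \"value\": 145.55}"));
    }
}
```

The server writes these strings to the open response and flushes after each event; the browser's EventSource splits the stream back into messages.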

Pros:
- Built into browsers natively (EventSource API)
- Automatic reconnection built in
- Works over HTTP/2 (can multiplex many streams)
- Simpler than WebSocket for server-to-client use case
- HTTP headers and cookies work normally (authentication is easy)

Cons:
- One direction only (server to client)
- Text only (binary requires base64 encoding)
- Browsers allow only about 6 connections per domain over HTTP/1.1, and each open SSE stream uses one (solved in HTTP/2)

When to use: Live feeds, notifications, dashboards where client only reads
Examples: Twitter feed, stock tickers, news alerts, deployment logs

WebSocket

What it is: Full-duplex, bidirectional, persistent TCP channel.

Client <--> Server: simultaneous, low-overhead, any-format messages

Pros:
- True full-duplex (both sides can send simultaneously)
- Extremely low overhead after handshake
- Native binary support
- Sub-5ms latency achievable
- Supports sub-protocols (STOMP, MQTT, etc.)

Cons:
- Stateful connections complicate infrastructure (harder to load balance than HTTP, needs sticky sessions or pub/sub)
- Some proxies and firewalls block WebSocket upgrades
- No built-in reconnection (must implement in client code)
- More complex to implement correctly

When to use: Chat, gaming, collaborative tools, financial trading, live sports

Decision Chart

Do you need the client to send data to the server frequently?
  YES --> WebSocket
  NO  --> Continue...

Does the server need to push data to clients?
  NO  --> Regular HTTP REST (no real-time needed)
  YES --> Continue...

Is one-way (server to client) sufficient?
  YES --> SSE (simpler, works over HTTP/2, built-in reconnect)
  NO  --> WebSocket

Is WebSocket blocked by network/proxies?
  YES --> Long Polling (with SockJS as fallback)
  NO  --> WebSocket
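The chart can be written down as a small function. The sketch below is one possible reading of it, with hypothetical names: the first three questions pick a transport, and the last question downgrades a WebSocket choice to long polling when upgrades are blocked.

```java
/** One reading of the decision chart above as code (hypothetical helper). */
final class TransportChooser {

    enum Transport { REST, SSE, WEBSOCKET, LONG_POLLING }

    static Transport choose(boolean clientSendsFrequently,
                            boolean serverMustPush,
                            boolean oneWayIsEnough,
                            boolean webSocketBlocked) {
        Transport choice;
        if (clientSendsFrequently) {
            choice = Transport.WEBSOCKET;
        } else if (!serverMustPush) {
            choice = Transport.REST;            // no real-time requirement at all
        } else if (oneWayIsEnough) {
            choice = Transport.SSE;
        } else {
            choice = Transport.WEBSOCKET;
        }
        // Final question: fall back when the network blocks WebSocket upgrades.
        if (choice == Transport.WEBSOCKET && webSocketBlocked) {
            choice = Transport.LONG_POLLING;
        }
        return choice;
    }
}
```

For instance, a read-only dashboard (no frequent client sends, server push needed, one-way sufficient) comes out as SSE, while a chat application behind a proxy that blocks upgrades comes out as long polling.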

8. Where WebSockets Fit in System Design

Architecture Overview

+------------------+     WSS     +-------------------+    Redis Pub/Sub    +------------------+
|   Browser Client | <---------> |  WebSocket Server | <------------------> |  Message Broker  |
+------------------+             |   (Spring Boot)   |                      |  (ElastiCache)   |
                                 +-------------------+                      +------------------+
                                          |                                          |
                                          | JPA / JDBC                               |
                                          v                                          v
                                 +-------------------+                      +------------------+
                                 |   MySQL Database  |                      |  Other WS Server |
                                 +-------------------+                      |   (horizontal    |
                                                                            |    scaling)      |
                                                                            +------------------+

Where WebSocket Servers Live

WebSocket servers are stateful by nature - they hold open connections. This has major implications:

  1. They cannot be trivially load-balanced like stateless HTTP servers.
  2. They have memory proportional to active connections. 1 million connections = significant memory.
  3. They need a cross-server messaging mechanism (like Redis Pub/Sub) so that a message sent to Server A can be delivered to a client connected to Server B.
  4. They require graceful shutdown to not abruptly drop all client connections.

The Role of a Message Broker

When you have multiple WebSocket server instances (horizontal scaling), you need a way for them to share messages:

User A [connected to Server 1] sends message to User B [connected to Server 2]

Without broker: Message arrives at Server 1, Server 2 never knows.
With Redis Pub/Sub:
  1. Server 1 receives message from User A
  2. Server 1 publishes to Redis channel "room:general"
  3. Server 2 is subscribed to "room:general"
  4. Server 2 receives the message from Redis
  5. Server 2 delivers it to User B
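The five steps above can be simulated in a few lines. In the sketch below an in-memory class stands in for Redis Pub/Sub, and all names are hypothetical; it only demonstrates the fan-out pattern, not a real Redis client or a real WebSocket server.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** In-memory stand-in for a pub/sub broker such as Redis (illustration only). */
final class InMemoryBroker {

    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

    void subscribe(String channel, Consumer<String> handler) {
        subscribers.computeIfAbsent(channel, c -> new ArrayList<>()).add(handler);
    }

    void publish(String channel, String message) {
        for (Consumer<String> handler : subscribers.getOrDefault(channel, List.of())) {
            handler.accept(message);
        }
    }
}

/** Stand-in for one WebSocket server instance with locally connected clients. */
final class WsServerInstance {

    /** Messages this instance has pushed to its own connected clients. */
    final List<String> deliveredToLocalClients = new ArrayList<>();

    private final InMemoryBroker broker;

    WsServerInstance(InMemoryBroker broker, String channel) {
        this.broker = broker;
        // Every instance subscribes, so it hears messages that arrived at other instances.
        broker.subscribe(channel, deliveredToLocalClients::add);
    }

    /** A local client sent a message: publish it instead of delivering only locally. */
    void onClientMessage(String channel, String message) {
        broker.publish(channel, message);
    }
}
```

Note that the originating instance receives its own publication too, which is convenient: local and remote clients are then served by the same delivery path.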

9. When to Use and When NOT to Use WebSockets

Perfect Use Cases for WebSockets

Use Case                | Why WebSocket Is Right
------------------------+-------------------------------------------------------------------
Real-time chat          | Bidirectional, low latency, persistent session
Multiplayer gaming      | Sub-10ms latency required, constant bidirectional data
Collaborative editing   | Every keystroke must propagate instantly to all users
Live financial trading  | Price updates, order status, market depth must be millisecond-fast
Live sports scores      | Frequent server pushes, clients react to events
IoT device dashboards   | Devices push telemetry; dashboard pushes commands
Real-time notifications | Instant push, no polling required
Customer support chat   | Same as real-time chat
Live auction systems    | Bid updates, countdown timers, competitive real-time state

Cases Where WebSocket Is Overkill

Use Case                               | Better Alternative            | Why
---------------------------------------+-------------------------------+-----------------------------------------------
Order status tracking (hourly updates) | Short polling or email        | Frequency does not justify connection overhead
Report generation status               | SSE or polling                | One-way push only needed
News feed updates                      | SSE                           | Server-to-client only, SSE is simpler
Search suggestions                     | HTTP                          | Request-response, stateless, cacheable
File upload progress                   | SSE                           | One-way status push
Low-traffic notification               | FCM/APNs (push notifications) | Server-initiated, works even when app closed

The "Real-Time" Trap

Many developers reach for WebSocket whenever they hear "real-time." Ask these questions first:

  1. Does the update frequency actually require sub-second latency? If updates come once per minute, polling every 30 seconds is perfectly fine.
  2. Does the client need to send data back, or just receive? If receive only, SSE is simpler.
  3. How many concurrent users do you expect? For around 100 users, polling every few seconds is often cheaper to build and operate than maintaining 100 stateful WebSocket connections.
  4. Can the data be cached? HTTP responses can be cached. WebSocket messages cannot.

10. Core Terminology Reference

Term                  | Definition
----------------------+--------------------------------------------------------------------------------------------------
WebSocket             | Full-duplex TCP-based protocol for real-time communication
WSS                   | WebSocket Secure - WebSocket over TLS/SSL
Handshake             | The HTTP-to-WebSocket upgrade exchange
Frame                 | Smallest unit of WebSocket data
Opcode                | Type identifier in a WebSocket frame
Masking               | XOR obfuscation applied to client-to-server frames
STOMP                 | Simple Text Oriented Messaging Protocol - a higher-level messaging protocol that runs over WebSocket
SockJS                | JavaScript library providing WebSocket-like API with HTTP fallbacks
Heartbeat             | Periodic ping/pong to keep connection alive and detect failures
Broker                | Component (e.g., in-memory or Redis) that routes messages between clients
Destination           | STOMP concept - a path like /topic/prices or /queue/notifications that messages are sent to
Topic                 | A pub/sub destination where all subscribers receive each message
Queue                 | A point-to-point destination where only one subscriber receives each message
SimpMessagingTemplate | Spring class for sending messages to WebSocket clients from server-side code
Sticky Session        | Load balancer configuration that routes all requests from a client to the same server instance
Redis Pub/Sub         | Redis messaging mechanism used to coordinate messages across multiple WebSocket server instances
Connection Upgrade    | The HTTP mechanism that converts an HTTP connection to a WebSocket connection
RFC 6455              | The IETF standard that defines the WebSocket protocol
permessage-deflate    | A WebSocket extension for per-message compression
Subprotocol           | An application-level protocol layered over WebSocket (e.g., STOMP, MQTT, chat)

Next: Part 2 - Spring Boot Implementation