Coursify

System Design for Software Engineers

Horizontal Scaling and Statelessness

To scale a system to millions of users, you must be able to add more servers to your pool. This is Horizontal Scaling. However, adding servers is only half the battle. Your application must be designed to work correctly when a user's requests are distributed across many different servers.

The key to seamless horizontal scaling is Statelessness.

Stateless vs. Stateful Architecture

Stateful Architecture

In a stateful architecture, a server remembers data about a client's session (like a shopping cart or login state) in its local memory (RAM).

  • The Problem: If a user logs in on Server A, but their next request is routed to Server B (by a load balancer), Server B won't know who they are. The user is effectively logged out or loses their progress.

Stateless Architecture

In a stateless architecture, servers do not store any client session data. Every request must contain all the information needed to process it, or the server must fetch the session data from a shared external store.

  • The Solution: Use a shared data store (like Redis or a Database) to keep session data. Now, no matter which server receives the request, it can fetch the necessary state from the shared store.
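
The difference between the two designs can be sketched in a few lines of Python. This is a minimal illustration, not a production setup: the class names and the `route_login_then_request` helper are hypothetical, and a plain dictionary stands in for the shared store (Redis or a database) so the example runs without any external service.

```python
from typing import Dict, Optional


class StatefulServer:
    """Keeps sessions in its own process memory (the problematic design)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._sessions: Dict[str, str] = {}  # lives only on this server

    def login(self, session_id: str, user: str) -> None:
        self._sessions[session_id] = user

    def whoami(self, session_id: str) -> Optional[str]:
        return self._sessions.get(session_id)


class StatelessServer:
    """Holds no session data; reads and writes a store shared by all servers."""

    def __init__(self, name: str, shared_store: Dict[str, str]) -> None:
        self.name = name
        self._store = shared_store  # stand-in for Redis or a database

    def login(self, session_id: str, user: str) -> None:
        self._store[session_id] = user

    def whoami(self, session_id: str) -> Optional[str]:
        return self._store.get(session_id)


def route_login_then_request() -> Dict[str, Optional[str]]:
    # Stateful: the user logs in on A, then the load balancer picks B.
    a, b = StatefulServer("A"), StatefulServer("B")
    a.login("sess-1", "alice")
    stateful_result = b.whoami("sess-1")  # B has never seen this session

    # Stateless: both servers consult the same external store.
    store: Dict[str, str] = {}
    c, d = StatelessServer("A", store), StatelessServer("B", store)
    c.login("sess-1", "alice")
    stateless_result = d.whoami("sess-1")

    return {"stateful": stateful_result, "stateless": stateless_result}


if __name__ == "__main__":
    print(route_login_then_request())
    # {'stateful': None, 'stateless': 'alice'}
```

In the stateful case Server B returns `None` because the session exists only in Server A's memory. In the stateless case it does not matter which server handled the login.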

Challenges of Horizontal Scaling

  1. Database Bottlenecks: While you can easily add 10 more web servers, scaling a single relational database is much harder. We will cover this in the "Data Management" module.
  2. Data Consistency: When multiple servers are updating the same shared state, you need to handle race conditions and locking.
  3. Internal Communication: As the number of servers grows, the overhead of them talking to each other (or to shared services) increases.
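
The data consistency challenge can be made concrete with a small sketch. One common way to avoid lost updates is optimistic locking: read a value together with its version, then write only if the version has not changed, and retry otherwise. The `VersionedStore` class below is a hypothetical in-memory stand-in for a shared store, and threads stand in for separate servers. Real stores expose their own mechanisms for this (Redis, for example, offers optimistic transactions via WATCH), so treat this as an illustration of the idea rather than any particular store's API.

```python
import threading
from typing import Dict, Tuple


class VersionedStore:
    """In-memory stand-in for a shared store that supports compare-and-set."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[int, int]] = {}  # key -> (version, value)
        self._lock = threading.Lock()  # models the store's own atomicity

    def get(self, key: str) -> Tuple[int, int]:
        with self._lock:
            return self._data.get(key, (0, 0))

    def compare_and_set(self, key: str, expected_version: int, value: int) -> bool:
        """Write only if nobody else has written since the caller's read."""
        with self._lock:
            version, _ = self._data.get(key, (0, 0))
            if version != expected_version:
                return False  # another writer got there first
            self._data[key] = (version + 1, value)
            return True


def add_to_cart(store: VersionedStore, key: str, quantity: int) -> None:
    """Read-modify-write with retry, so concurrent updates are not lost."""
    while True:
        version, current = store.get(key)
        if store.compare_and_set(key, version, current + quantity):
            return


def run_concurrent_updates(workers: int = 8, per_worker: int = 200) -> int:
    """Simulate several servers incrementing the same cart counter."""
    store = VersionedStore()

    def work() -> None:
        for _ in range(per_worker):
            add_to_cart(store, "cart:alice:items", 1)

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return store.get("cart:alice:items")[1]


if __name__ == "__main__":
    print(run_concurrent_updates())  # 1600
```

A naive "read, add one, write back" without the version check could silently drop increments when two servers read the same value at the same time. With the check, a conflicting write is rejected and retried.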

Transitioning to a Stateless Architecture

  1. Audit your code for any data stored in global variables, local files, or in-memory caches that isn't replicated. Common examples: session objects, uploaded temp files, or local image thumbnails.
  2. Move session storage from application memory to a fast, external key-value store like Redis or Memcached. Most web frameworks (like Express, Django, or Spring) have plugins to do this with a few lines of configuration.
  3. Instead of saving user uploads to the local disk of a server, use a Distributed File System or Object Storage like Amazon S3 or Google Cloud Storage. Ensure all servers have access to this shared storage.
  4. Consider using JWTs (JSON Web Tokens). Because the token carries the user's identity and is cryptographically signed, any server that holds the verification key can authenticate the request without querying a database or session store. The trade-off is that a token stays valid until it expires, so keep lifetimes short if you cannot check revocation.
  5. Once your app is stateless, you can disable 'Sticky Sessions' (session affinity, often implemented with IP Hash or a cookie) on your load balancer. Use a simpler algorithm like Round Robin, which spreads requests evenly across servers.

Why Statelessness Wins

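  • Elastic Scalability: Because any server can handle any request, adding the 1,000th application server works the same way as adding the 2nd. The shared store and the database then become the limits to watch.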
  • Resilience: If Server A crashes, the user's next request goes to Server B, and they never notice because their state is safe in the shared Redis store.
  • Simplified Deployment: You can take servers down for maintenance or updates without worrying about "draining" active sessions. A server only needs to finish its in-flight requests.

Common Mistakes

  • Thinking 'Local': Assuming that a file written to /tmp/ on one request will be there for the next request.
  • Sticky Session Crutch: Using sticky sessions to avoid fixing a stateful bug. This leads to uneven load, and removing a server during scale-down discards the sessions pinned to it.
  • Overwhelming the Shared Store: If every single request hits Redis for session data, Redis can eventually become your new bottleneck. Use local caching for non-sensitive, static data.
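
The last point can be sketched as a small read-through cache with a time-to-live (TTL) that sits in front of the shared store. This is one possible approach, with hypothetical names throughout (`TTLCache`, `count_store_reads`, the `config:banner` key), and a dictionary standing in for the shared store. The clock is injectable only so the example behaves deterministically.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Small per-server read-through cache with time-based expiry."""

    def __init__(
        self,
        loader: Callable[[str], str],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader  # fetches from the shared store on a miss
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the demo is deterministic
        self._entries: Dict[str, Tuple[float, str]] = {}  # key -> (expires_at, value)

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh local copy, no network round trip
        value = self._loader(key)  # miss or expired: go to the shared store
        self._entries[key] = (now + self._ttl, value)
        return value


def count_store_reads() -> int:
    """Serve 101 requests and report how many reached the shared store."""
    shared_store = {"config:banner": "Free shipping this week"}
    calls = {"count": 0}

    def load_from_shared_store(key: str) -> str:
        calls["count"] += 1
        return shared_store[key]

    fake_time = [0.0]
    cache = TTLCache(load_from_shared_store, ttl_seconds=30, clock=lambda: fake_time[0])

    for _ in range(100):  # 100 requests inside the TTL window
        cache.get("config:banner")
    fake_time[0] = 31.0  # TTL elapsed, so the next read refreshes
    cache.get("config:banner")
    return calls["count"]


if __name__ == "__main__":
    print(count_store_reads())  # 2
```

This does not reintroduce statefulness as long as the cached data is shared, read-mostly, and safe to lose: if the server restarts, the cache simply refills from the shared store. Per-user session data should still live in the shared store.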

Recap

  • Horizontal Scaling means adding more machines, not just bigger machines.
  • Statelessness means no client session data is stored on individual application servers.
  • Use Shared Data Stores (Redis) or Tokens (JWT) to manage state across a cluster.
  • Stateless apps are more resilient and easier to manage at scale.

Knowledge Check

Question 1 of 3 (single choice)

What is the main problem with a stateful architecture in a horizontally scaled system?