System Design for Software Engineers

Caching Fundamentals: Why, Where, and How

In system design, caching is one of the most effective ways to improve application performance and scalability. A cache is a high-speed data storage layer that stores a subset of data, typically transient, so that future requests for that data are served faster than they could be from the data's primary storage location.

Why Cache?

Reduce Latency: Accessing data from RAM (cache) is orders of magnitude faster than fetching it from a disk (database).
Reduce Load on Origin: By serving frequent requests from the cache, you prevent your database or API from being overwhelmed.
Save Costs: Reducing the load on expensive database instances can lead to significant cost savings.
Improve Availability: Even if the primary database is briefly down, the cache might still be able to serve the most common requests.

Where Can You Cache?

Caching can happen at almost every layer of a modern application stack:

Client-side: Browsers cache HTML, CSS, JS, and images.
CDN (Edge): As discussed in previous modules, CDNs cache static assets closer to the user.
Load Balancer: Some load balancers can cache responses for specific URL paths.
Application Layer: In-memory caches (like a local hash map) within your server code.
Distributed Cache: A separate service like Redis or Memcached shared by all application servers.
Database Layer: Databases have internal caches for frequently accessed rows and query results.
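
As a minimal sketch of the Application Layer option above, Python's standard functools.lru_cache can keep recent results in the server process's own memory. The slow_lookup function and the CALLS counter are hypothetical stand-ins for a real database or API call; they are not part of any library.

```python
from functools import lru_cache

CALLS = {"count": 0}  # counts how often the slow path actually runs


def slow_lookup(product_id: int) -> dict:
    """Hypothetical stand-in for a database or API call."""
    CALLS["count"] += 1
    return {"id": product_id, "name": f"product-{product_id}"}


@lru_cache(maxsize=1024)  # keep at most 1024 results in this process's memory
def get_product(product_id: int) -> dict:
    return slow_lookup(product_id)


get_product(7)  # miss: runs slow_lookup
get_product(7)  # hit: served from the in-process cache
info = get_product.cache_info()
print(info.hits, info.misses, CALLS["count"])  # prints: 1 1 1
```

A cache like this lives inside one process, so every application server holds its own copy, and lru_cache has no expiration time. Those two limits are the usual reasons to move shared or time-sensitive data into a Distributed Cache instead.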

Identifying Caching Opportunities

Step 1: Identify your slowest queries. If a query takes 200 ms and is called 1,000 times a minute, it is a prime candidate for caching.
Step 2: Ask, 'How often does this data change?' Static content (like a blog post) is great for caching. Highly dynamic content (like a stock price or a user's bank balance) is much harder to cache safely.
Step 3: For simple static files, use a CDN. For frequently accessed API responses or session data, use a Distributed Cache like Redis.
Step 4: Choose a unique and consistent key for your cached data (e.g., user_profile:123). Ensure the key includes all parameters that affect the output.
Step 5: Never cache data indefinitely. Set an expiration time (TTL) based on how 'stale' the data is allowed to be. 5 minutes? 1 hour? 24 hours?
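
The key and TTL steps above can be sketched as a read-through helper. This is an illustration under stated assumptions, not a production design: TTLCache is a small in-memory stand-in for a distributed cache such as Redis, and make_key, get_or_load, and the key layout are hypothetical names chosen for this example.

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """Tiny in-memory stand-in for a distributed cache (assumption, not Redis)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: treat it as a miss
            return None
        return value

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._store[key] = (self._clock() + ttl_seconds, value)


def make_key(prefix: str, **params: Any) -> str:
    """Build a key from every parameter that affects the output, in sorted order."""
    parts = [f"{name}={params[name]}" for name in sorted(params)]
    return ":".join([prefix, *parts])


def get_or_load(cache: TTLCache, key: str, loader: Callable[[], Any],
                ttl_seconds: float) -> Any:
    """Return the cached value, or call loader() and cache its result with a TTL.

    Note: a loader that returns None is never cached in this sketch.
    """
    value = cache.get(key)
    if value is None:
        value = loader()
        cache.set(key, value, ttl_seconds)
    return value


cache = TTLCache()
key = make_key("user_profile", user_id=123, locale="en")
profile = get_or_load(cache, key, lambda: {"name": "Ada"}, ttl_seconds=300)
```

Sorting the parameters keeps the key consistent no matter what order the caller passes them in. With the numbers from Step 1, and assuming every call shares one key, a 5-minute TTL would mean the 200 ms query runs about once per 5 minutes instead of roughly 5,000 times.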

The Two Hard Things in Computer Science

As Phil Karlton famously said: "There are only two hard things in Computer Science: cache invalidation and naming things."

Cache Invalidation is the process of updating or deleting a cached value when the source data changes. If you get this wrong, your users will see old, incorrect data.
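
One common way to invalidate is to delete the cached entry whenever the source data is written, so the next read reloads a fresh copy. The sketch below assumes plain dictionaries as hypothetical stand-ins for the database and the cache; the function names are illustrative only.

```python
database = {123: {"name": "Ada"}}  # hypothetical source of truth
cache = {}                         # hypothetical cache, keyed by string


def profile_key(user_id: int) -> str:
    return f"user_profile:{user_id}"


def read_profile(user_id: int) -> dict:
    key = profile_key(user_id)
    if key not in cache:
        cache[key] = dict(database[user_id])  # miss: copy from the source
    return cache[key]


def update_profile(user_id: int, name: str) -> None:
    database[user_id] = {"name": name}     # 1. write to the source first
    cache.pop(profile_key(user_id), None)  # 2. then drop the cached copy


before = read_profile(123)["name"]  # "Ada", now cached
update_profile(123, "Grace")
after = read_profile(123)["name"]   # "Grace": the stale entry was removed
```

If step 2 were skipped, read_profile would keep returning "Ada" until the entry expired or was evicted, which is exactly the stale-data problem described above.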

Common Caching Terms

Cache Hit: The requested data is found in the cache.
Cache Miss: The requested data is NOT found in the cache and must be fetched from the source.
Cache Hit Ratio: The percentage of requests served from the cache (Hits / (Hits + Misses)). A higher ratio is usually better.
Stale Data: Data in the cache that no longer matches the source data.
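
To make the hit ratio concrete, here is a small worked calculation. The 200 ms source latency comes from the earlier query example; the 1 ms cache latency and the hit and miss counts are assumptions chosen for illustration.

```python
def hit_ratio(hits: int, misses: int) -> float:
    """Hits / (Hits + Misses), or 0.0 when there were no requests."""
    total = hits + misses
    return hits / total if total else 0.0


def average_latency_ms(ratio: float, cache_ms: float, source_ms: float) -> float:
    """Expected latency per request: hits pay cache_ms, misses pay source_ms."""
    return ratio * cache_ms + (1.0 - ratio) * source_ms


ratio = hit_ratio(hits=900, misses=100)
avg = average_latency_ms(ratio, cache_ms=1.0, source_ms=200.0)
print(f"hit ratio: {ratio:.0%}, average latency: {avg:.1f} ms")
# prints: hit ratio: 90%, average latency: 20.9 ms
```

Even at a 90% hit ratio, the misses dominate the average in this example, which is why small improvements in the ratio can matter a great deal.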

Common Mistakes

Caching Sensitive Data: Accidentally storing a user's PII (Personally Identifiable Information) in a shared cache without proper encryption or isolation.
Indefinite Caching: Forgetting to set a TTL, so the cache fills up with old data that is never used and stale entries are never refreshed.
Thundering Herd: A popular cache key expires and thousands of concurrent requests all hit the database at the same time to refresh it.
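
One possible mitigation for the Thundering Herd case is to let only one caller per key reload the value while the others wait and then reuse the result. The sketch below shows this inside a single process using per-key locks; load_from_database and the other names are hypothetical, and a fleet of servers would need a different mechanism (for example a distributed lock), which is not shown here.

```python
import threading
import time

cache = {}
key_locks = {}
key_locks_guard = threading.Lock()  # protects the key_locks dictionary
db_calls = 0


def lock_for(key: str) -> threading.Lock:
    with key_locks_guard:
        return key_locks.setdefault(key, threading.Lock())


def load_from_database(key: str) -> str:
    """Hypothetical slow query; only ever called while holding the key's lock."""
    global db_calls
    db_calls += 1
    time.sleep(0.05)
    return f"value-for-{key}"


def get(key: str) -> str:
    value = cache.get(key)
    if value is not None:
        return value  # fast path: cache hit
    with lock_for(key):
        value = cache.get(key)  # re-check: another thread may have refilled it
        if value is None:
            value = load_from_database(key)
            cache[key] = value
        return value


threads = [threading.Thread(target=get, args=("popular",)) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(db_calls)  # prints: 1
```

Fifty concurrent misses produce a single database call because the re-check inside the lock sees the value the first thread stored.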

Recap

Caching improves performance by keeping frequently requested data in fast storage, typically RAM.
Caching can be applied at the Client, Edge, Load Balancer, Application, Distributed Cache, and Database layers.
The Cache Hit Ratio is the key metric for measuring cache effectiveness.
Cache Invalidation is the hardest part of caching.

Knowledge Check

Which layer of caching is closest to the end-user?

Distributed Cache (Redis)

CDN (Edge)

Client-side (Browser)

Database Buffer Pool
