Performance work often begins with the assumption that code must be made faster. A more powerful idea is that some code need not run at all. Caching stores the result of an expensive computation or fetch so that subsequent requests return the stored answer rather than repeating the work. The principle is simple, but applying it well requires understanding where caches live, how clients and servers negotiate freshness, and how to avoid serving stale or incorrect data.
Where Caches Live
A cache is any layer that holds a previously computed result close to where it is needed. In a web stack, caches appear at several levels.
The browser cache stores responses on the user’s device, eliminating the network request entirely on repeat visits while a stored response is still fresh. A shared cache, such as a CDN edge node or a reverse proxy, sits between many clients and the origin server and serves a single stored copy to all of them. On the server, an application cache such as Redis or Memcached holds computed values, query results, or rendered fragments in memory so the application avoids recomputation. Each layer trades a small amount of staleness for a large reduction in latency and load.
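The cache-aside pattern behind an application cache fits in a few lines. This is a minimal sketch that uses an in-process dictionary as a stand-in for Redis or Memcached. The `TTLCache` class and its `get_or_compute` method are hypothetical names and are not the API of either product. The injectable clock exists only so expiry can be demonstrated without waiting.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """In-process stand-in for an application cache with per-entry expiry."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]  # hit: the expensive work is skipped
        value = compute()  # miss or expired entry: do the work once
        self._store[key] = (now + self.ttl, value)
        return value
```

In a real deployment the dictionary lookup would become a network call to the shared cache server. The key would need to encode everything the computed value depends on.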
How HTTP Caching Works
HTTP defines a precise vocabulary for caching, and understanding it is essential because much caching happens automatically once the correct headers are present.
The central header is Cache-Control. A response marked Cache-Control: max-age=3600 may be reused for one hour without consulting the server. The public directive permits shared caches to store the response, while private restricts storage to a private cache such as the user’s browser. The no-cache directive is frequently misunderstood: it does not forbid storage but requires the cache to revalidate with the server before every reuse. To prevent storage altogether, no-store is the correct directive.
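The next example models how a cache might interpret these directives. It is a simplified sketch and does not conform fully to the specification. The function names are hypothetical. It ignores s-maxage, Expires, heuristic freshness, and field-qualified forms such as private="Set-Cookie". It also treats a response with no max-age as stale.

```python
from typing import Dict, Optional

Directives = Dict[str, Optional[str]]


def parse_cache_control(header: str) -> Directives:
    """Split a Cache-Control value into {directive: argument or None}."""
    directives: Directives = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition("=")
        # Directive names are case-insensitive; quoted arguments lose their quotes.
        directives[name.strip().lower()] = value.strip().strip('"') if sep else None
    return directives


def may_store(directives: Directives, shared_cache: bool) -> bool:
    """Whether this cache may keep the response at all."""
    if "no-store" in directives:
        return False
    if shared_cache and "private" in directives:
        return False  # private responses belong only in the user's own cache
    return True


def reuse_decision(directives: Directives, age_seconds: float) -> str:
    """Classify a stored response as 'reuse', 'revalidate', or 'fetch'."""
    if "no-store" in directives:
        return "fetch"  # should never have been stored
    if "no-cache" in directives:
        return "revalidate"  # storable, but must be checked before every reuse
    max_age = directives.get("max-age")
    if max_age is not None and max_age.isdigit() and age_seconds < int(max_age):
        return "reuse"  # still fresh: no request to the server
    return "revalidate"  # stale, or no explicit lifetime in this simplified model
```

A no-cache response passes `may_store` but always yields "revalidate" from `reuse_decision`. That is the distinction from no-store.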
When a cached response expires, the client need not always download a fresh copy. Validation allows a cache to ask whether its stored copy is still current. With an ETag, the server returns an opaque identifier for the content. On revalidation the client sends that identifier back in an If-None-Match header, and if the content is unchanged the server replies 304 Not Modified with an empty body. The Last-Modified and If-Modified-Since pair serves the same purpose using timestamps. Validation saves bandwidth even after freshness has lapsed, because an unchanged resource is confirmed rather than retransmitted.
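The next example shows the ETag exchange from the server’s side. It is a minimal sketch built around a hypothetical, framework-agnostic `respond` function that returns a (status, headers, body) tuple. Deriving the tag from a truncated SHA-256 digest of the body is an assumed choice. Clients treat the tag as opaque, so any scheme works as long as the tag changes when the content does.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    # Quoted opaque tag derived from the bytes; any change to the content changes it.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def _opaque(tag: str) -> str:
    # If-None-Match uses weak comparison, so a W/ prefix is ignored.
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def respond(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match is not None:
        candidates = [_opaque(t) for t in if_none_match.split(",")]
        if "*" in candidates or _opaque(etag) in candidates:
            return 304, headers, b""  # unchanged: confirm instead of retransmitting
    return 200, headers, body
```

On the first request the client receives a 200 with the body and the ETag. When it echoes that tag back after expiry, it receives a 304 with no body. If the content has changed in the meantime, it receives a fresh 200.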
The Vary header deserves attention. It tells caches which request headers cause the response to differ, for example Vary: Accept-Encoding. Omitting it can cause a cache to serve a compressed response to a client that cannot decode it. When content is negotiated on Accept-Language, omitting it can also cause a cache to serve the wrong language to a user.
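Conceptually, a cache that honors Vary stores one variant per combination of the listed request-header values. The next example builds such a key with a hypothetical `cache_key` helper. It is a minimal sketch: it lower-cases header names and compares values verbatim. Real caches may normalize values, and they must treat Vary: * as never matching.

```python
from typing import Dict, Tuple


def cache_key(url: str, request_headers: Dict[str, str], vary_header: str) -> Tuple:
    """Key a stored response by URL plus the request headers its Vary names."""
    # Header names are case-insensitive, so normalize both sides to lower case.
    lowered = {name.lower(): value for name, value in request_headers.items()}
    varied = sorted({n.strip().lower() for n in vary_header.split(",") if n.strip()})
    # A request lacking a varied header gets "" so it still maps to one variant.
    return (url, tuple((name, lowered.get(name, "")) for name in varied))
```

With Vary: Accept-Encoding, a gzip-capable client and one that sent Accept-Encoding: identity land on different keys. Without Vary, both collapse onto one key, which is exactly the failure described above.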
Invalidation and Its Difficulties
The hard part of caching is not storing data but knowing when to discard it. A cache that never expires will eventually serve incorrect content; one that expires too eagerly provides little benefit.
Several strategies address this. Time-based expiry sets a lifetime and accepts bounded staleness, which suits content that changes predictably. Event-based invalidation removes or updates a cached entry when the underlying data changes. It is precise, but the code performing the write must know every cache key it affects. A particularly robust technique is cache busting through versioned URLs: static assets are named with a content hash, so a change produces a new URL and the old cached copy is simply never requested again. This permits very long lifetimes with no risk of staleness.
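Content-hashed asset names are usually produced by a build tool. The next example is a minimal sketch of the idea, using a hypothetical `hashed_name` helper that inserts a truncated SHA-256 digest before the file extension. The eight-character default length is an arbitrary assumption.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_name(filename: str, content: bytes, length: int = 8) -> str:
    """Return e.g. static/app.<digest>.js for static/app.js."""
    digest = hashlib.sha256(content).hexdigest()[:length]
    path = PurePosixPath(filename)
    # Any change to the bytes yields a new digest, hence a new URL.
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))
```

Assets renamed this way can be served with a long lifetime, for example Cache-Control: public, max-age=31536000, immutable. The HTML page that references them should keep a short lifetime or be revalidated, because that page is what points clients at the new names.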
A common pitfall is letting a shared cache store a response that is specific to one user, which can leak that user’s content to others. Marking such responses private, or excluding them from shared caching entirely, prevents this class of error.
Conclusion
Caching is among the most effective performance techniques available because it removes work rather than accelerating it. Its value depends on placing caches at the right layers, expressing freshness and validation correctly through HTTP headers such as Cache-Control and ETag, and choosing an invalidation strategy matched to how the data changes. The discipline lies less in storing results than in retiring them at the right moment, ensuring that the fastest code, the code never run, is also the correct code.