HTTP Caching
with Varnish
Drupal Mountain Camp, Davos
March 8th, 2024
© David Buchmann
David Buchmann - david@liip.ch
PHP Engineer, Liip AG, Switzerland
But... why?
- Response time
- Scale with load
- Save resources and money
What is a reverse proxy again?
What could possibly go wrong?
- Nothing gets cached (HTTP Headers)
- Too much gets cached (also HTTP Headers)
- Editors see no changes (cache invalidation)
- Caches get mixed up (personalized content)
httpstatusdogs.com
Overview
- HTTP refresher
- HTTP cache control
- Varnish
- Advanced Topics
- Wrap-Up
HTTP Refresher
HTTP: Browser chats with webserver
Request
GET /path
Accept: text/html
Response
HTTP/1.1 200 OK
Content-Type: text/html
<html>...</html>
HTTP verbs
- GET
- HEAD
- POST
- PUT
- DELETE
- ...
HTTP response codes
- 1xx hold on
- 2xx here you go
- 3xx go away
- 4xx you fucked up
- 5xx I fucked up
twitter.com/stevelosh/status/372740571749572610
HTTP Cache Control
Cache control headers
- Cache Expiration Model
- Cache Validation Model
HTTP/1.1, RFC 2616, Sections 13.2 and 13.3 (superseded by RFC 9111)
Cache Expiration
Cache-Control: s-maxage=3600, max-age=900
Expires: Sun, 10 Mar 2024 08:00:00 GMT
- s-maxage
- max-age
- Expires (HTTP 1.0 - avoid!)
- Default to default_ttl if nothing specified
max-age
- max-age per block and other page elements
- Application layer cache for individual elements
- Drupal builds the HTTP header from lowest encountered max-age
- http_response_header module for rule based HTTP headers (but mostly the Drupal caching framework provides enough tools)
Cache validation
- Application specific hash value on response
- On request, cheap check if hash changed
ETag: 82901821233
If-None-Match: 82901821233
304 Not Modified
ETag
- Drupal core sets an ETag on cacheable responses (using the last modified timestamp)
- Drupal handles the If-None-Match header
- Drupal redundantly also does Last-Modified and If-Modified-Since
Do not cache
Cache-Control: s-maxage=0, private, no-cache
- s-maxage=0: Do not keep in cache
- private: For specific user, do not cache on proxies
- no-cache: Needs to be validated each time
- no-store: May never be placed in cache storage
Surrogate Control
Header specific for your reverse proxy, different from third party caches
Cache-Control: no-store
Surrogate-Control: max-age=3600
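A minimal VCL sketch of how a proxy-only lifetime could be honoured, assuming the backend sends `Surrogate-Control: max-age=<seconds>` as in the example above. It is a fragment to merge into an existing VCL 4.x file with a backend definition, not a complete configuration.

```vcl
import std;

sub vcl_backend_response {
    # Assumption: the header looks like "Surrogate-Control: max-age=3600".
    if (beresp.http.Surrogate-Control ~ "max-age=[0-9]+") {
        # Extract the seconds and convert them to a VCL duration.
        # Falls back to 0s if the value cannot be parsed.
        set beresp.ttl = std.duration(
            regsub(beresp.http.Surrogate-Control,
                   "^.*max-age=([0-9]+).*$", "\1") + "s",
            0s);
    }
}

sub vcl_deliver {
    # The header is meant for the reverse proxy only; hide it from clients.
    # It is removed here rather than in vcl_backend_response so that later
    # backend-response logic can still see it.
    unset resp.http.Surrogate-Control;
}
```

Clients and third party caches still see `Cache-Control: no-store`, while Varnish keeps the object for the lifetime given in the surrogate header.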
Resilience
Response
Cache-Control: stale-while-revalidate=3600
Cache-Control: stale-if-error=3600
Request
Cache-Control: must-revalidate
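In Varnish, serving stale content is called grace. The following fragment is a sketch of how similar behaviour could be configured directly in VCL. The durations are arbitrary examples, and `std.healthy()` is only meaningful when the backend has a health probe configured.

```vcl
import std;

sub vcl_backend_response {
    # Keep objects for up to one hour past their TTL so that stale
    # content can be served while a fresh copy is fetched, or when
    # the backend is down.
    set beresp.grace = 1h;
}

sub vcl_recv {
    # While the backend is healthy, accept only slightly stale objects.
    # When it is sick, the full grace period from above applies.
    if (std.healthy(req.backend_hint)) {
        set req.grace = 10s;
    }
}
```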
Default Varnish behaviour
- Only attempt to cache GET and HEAD requests
- Never cache requests with cookies / authorization
- Never cache responses with Set-Cookie
- Only cache safe responses (status 200, 203, 300, 301, 302, 307, 404, 410)
Keep variants apart
Response content depends on request headers
Requests
GET /resource
Accept: application/json
GET /resource
Accept: text/xml
Response
Vary: Accept
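Browsers send many different `Accept` values, and each distinct value creates its own variant in the cache. A possible way to limit this is to normalize the header before lookup. The sketch below assumes that `/resource` only exists as JSON and XML; the path and the XML default are illustrative choices.

```vcl
sub vcl_recv {
    if (req.url ~ "^/resource") {
        # Reduce the many possible Accept values to the two
        # representations the application can actually produce.
        if (req.http.Accept ~ "application/json") {
            set req.http.Accept = "application/json";
        } else {
            set req.http.Accept = "text/xml";
        }
    }
}
```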
Varnish does what you tell it
Think carefully and test thoroughly
Varnish Configuration Language
- Read and write header values
- If conditions, but no loops
- Regular expression and invalidation instructions
- No return values, only state changes
- No variables, only store information in headers
- Varnish modules, Inline C code
- //, # and /* foo */ for comments
VCL: Debug time to live
sub vcl_backend_response {
    set beresp.http.TTL = beresp.ttl;
}
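A similar debugging aid, sketched here with a hypothetical `X-Cache` header name, reports whether a response was served from the cache:

```vcl
sub vcl_deliver {
    # obj.hits counts how often this cached object has been delivered;
    # it is 0 when the response was just fetched from the backend.
    if (obj.hits > 0) {
        set resp.http.X-Cache = "HIT";
    } else {
        set resp.http.X-Cache = "MISS";
    }
}
```

Debug headers like these are best removed or restricted in production.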
VCL: Two applications
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}
backend legacy {
    .host = "127.0.0.1";
    .port = "8000";
}
sub vcl_recv {
    if (req.url ~ "^/archive/") {
        set req.backend_hint = legacy;
    } else {
        set req.backend_hint = default;
    }
}
VCL can do a lot of things
- Add, alter and remove headers from request or response
- Decide when and how to cache
- Rewrite request URLs
But first make your application behave correctly!
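To make the list above concrete, here is a small sketch with one example per item. All paths and header names are hypothetical and only serve as illustration.

```vcl
sub vcl_recv {
    # Rewrite request URLs: map a hypothetical old path to a new one.
    set req.url = regsub(req.url, "^/old-blog/", "/blog/");

    # Decide when to cache: never use the cache for a hypothetical
    # admin area.
    if (req.url ~ "^/admin/") {
        return (pass);
    }
}

sub vcl_deliver {
    # Remove a header from the response before it reaches the client.
    unset resp.http.X-Powered-By;
}
```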
Advanced topics
- Cache Invalidation
- Cache Tagging
- Edge Side Includes
Cache Invalidation
There are two hard things in computer science:
- Naming things
- Cache invalidation
- Off by one errors
Cache busting
- Very long cache lifetime for assets
- Append ?version to asset links
- Query string to miss the cache
<link rel="stylesheet" href="/css/style.css?v1" type="text/css"/>
...
<script src="/js/scripts.js?v1"></script>
Explicit cache invalidation
- Long cache lifetime on Varnish
- Explicitly tell Varnish to invalidate cached URLs
- For changes that are not trackable: short lifetime
Invalidation flavors
- Purge: URL and all variants
- Refresh: forced cache miss, then update cache
- Ban: batch invalidation with regular expression
- Tagging: batch invalidation based on tags
Communicating invalidation
- varnishadm command line tool
- Custom VCL and web requests
- Messaging: Unknown Varnish instances (cloud)
- Drupal: the purge module invalidates external caches
- Plugins for specific reverse proxies and CDNs, e.g. varnish_purge
Custom configuration for purge
acl invalidators {
    "localhost";
}
sub vcl_recv {
    if (req.method == "PURGE") {
        # Only trusted clients may invalidate
        if (!client.ip ~ invalidators) {
            return (synth(405, "Not allowed"));
        }
        return (purge);
    }
    ...
}
Custom configuration for refresh
acl invalidators {
    "localhost";
}
sub vcl_recv {
    # A trusted client sending no-cache forces a fresh fetch
    if (req.http.Cache-Control ~ "no-cache"
        && client.ip ~ invalidators
    ) {
        set req.hash_always_miss = true;
    }
    ...
}
Banning
- Regular expression matching
- On any request headers, not only path
- Too many ban instructions will overload Varnish
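sub vcl_backend_response {
    # Store URL and host on the object so that bans can match on them
    set beresp.http.X-Url = bereq.url;
    set beresp.http.X-Host = bereq.http.host;
}
sub vcl_recv {
    if (req.method == "BAN") {
        if (!client.ip ~ invalidators) {
            return (synth(405, "Not allowed"));
        }
        ban("obj.http.X-Host ~ " + req.http.X-Host
            + " && obj.http.X-Url ~ " + req.http.X-Url
        );
        # Answer directly instead of sending the BAN request to the backend
        return (synth(200, "Banned"));
    }
}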
Cache Tagging
- xkey vmod
- xkey headers for each content item with the id
- Custom VCL to invalidate with xkey.purge()
$response->withHeader('xkey', 'node:2 node:44');
xkey.purge(req.http.xkey-purge);
You can also use BAN, but it's much less efficient
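A sketch of what the custom VCL around `xkey.purge()` could look like, assuming the xkey vmod is installed. The `PURGEKEYS` method name and the `n-gone` header are assumptions for illustration; the `xkey-purge` request header carries the space-separated tags to invalidate.

```vcl
import xkey;

acl invalidators {
    "localhost";
}

sub vcl_recv {
    if (req.method == "PURGEKEYS") {
        if (!client.ip ~ invalidators) {
            return (synth(405, "Not allowed"));
        }
        if (!req.http.xkey-purge) {
            return (synth(400, "Missing xkey-purge header"));
        }
        # xkey.purge() returns the number of invalidated objects.
        set req.http.n-gone = xkey.purge(req.http.xkey-purge);
        return (synth(200, "Invalidated " + req.http.n-gone + " objects"));
    }
}
```

A request with `xkey-purge: node:44` would then invalidate every cached response that was tagged with `node:44`, regardless of its URL.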
Cache Tagging
- The purge module can handle cache tags.
- The varnish purge module documents tag invalidation with BAN - somebody should contribute xkey documentation ;-)
Edge Side Includes
Use Edge Side Includes
Like server side include, but on Varnish:
- Content embeds URLs to fragments
- Varnish fetches and caches elements separately
- Individual caching rules per fragment
- E.g. only some elements vary on cookie, different TTL, ...
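A sketch of the Varnish side of ESI, following the Surrogate-Capability / Surrogate-Control handshake used by Symfony among others. It assumes the application embeds tags such as `<esi:include src="/fragment/menu" />` (a hypothetical fragment URL) and marks responses that contain them with `Surrogate-Control: content="ESI/1.0"`. The `abc` token is an arbitrary identifier.

```vcl
sub vcl_recv {
    # Tell the application that this proxy understands ESI.
    set req.http.Surrogate-Capability = "abc=ESI/1.0";
}

sub vcl_backend_response {
    # Only parse responses that the application flagged as containing
    # ESI tags.
    if (beresp.http.Surrogate-Control ~ "ESI/1.0") {
        unset beresp.http.Surrogate-Control;
        set beresp.do_esi = true;
    }
}
```

Each included fragment is requested and cached like any other URL, so it can carry its own Cache-Control and Vary headers.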
ESI error handling
- Resilience and ESI combine
- ESI spec: esi:try, esi:attempt, esi:except
- Supported by neither Symfony Cache nor Varnish
- Varnish: handle errors in VCL
Use Edge Side Includes
- You could build something with the placeholder mechanism
- The Advanced Varnish module provides ESI blocks (among other nice features)
- There is an unmaintained ESI module.
Wrap-Up
Take-Aways
- Know your HTTP
- Varnish is powerful
- Varnish is dangerous
- KISS VCL!
- Make your application behave correctly first
- Understand what you do, test what you do
Thank you!
@dbu@phpc.social
Caching lists of content
Element | weight | <-> | Element | weight |
A | 9 | | D | 22 |
B | 8 | | A | 9 |
C | 7 | | B | 8 |
D | 6 | | C | 7 |
E | 5 | | E | 5 |
F | 4 | | F | 4 |
G | 3 | | G | 3 |
H | 2 | | H | 2 |
I | 1 | | I | 1 |
Element | weight | <-> | Element | weight |
A | 9 | | A | 9 |
B | 8 | | B | 8 |
C | 7 | | D | 6 |
D | 6 | | E | 5 |
E | 5 | | F | 4 |
F | 4 | | G | 3 |
G | 3 | | H | 2 |
H | 2 | | I | 1 |
I | 1 | | | |
Caching and Sessions
Strategies when Caching with Sessions
- Avoid sessions; remove the session when it is no longer needed
- Cache lookup despite cookies
- Prevent caching when the response is user-specific
- Vary on the Cookie header
- User Context: Cache by group
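As a small illustration of the first strategy, the following sketch drops cookies on requests that should never depend on the session. The list of file extensions is an assumption and would need to match the actual application.

```vcl
sub vcl_recv {
    # Static assets do not depend on the user: remove the cookie so
    # that Varnish performs a cache lookup for them.
    if (req.url ~ "\.(css|js|png|jpg|svg)(\?.*)?$") {
        unset req.http.Cookie;
    }
}
```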