HTTP Caching 
with Varnish
            
            Drupal Mountain Camp, Davos
March 8th, 2024
            
            © David Buchmann
         
        
            
            
David Buchmann - david@liip.ch
            PHP Engineer, Liip AG, Switzerland
        
        
            But... why?
            
                - Response time
- Scale with load
- Save resources and money
 
        
        
            
                What is a reverse proxy again?
            
             
         
        
            What could possibly go wrong?
            
                - Nothing gets cached (HTTP Headers)
- Too much gets cached (also HTTP Headers)
- Editors see no changes (cache invalidation)
- Caches get mixed up (personalized content)
 httpstatusdogs.com
         
        
            Overview
            
                - HTTP refresher
- HTTP cache control
- Varnish
- Advanced Topics
- Wrap-Up
 
        
            
HTTP Refresher
        
        
            HTTP: Browser chats with webserver
            Request
            
GET /path HTTP/1.1
Accept: text/html
            
            Response
            
HTTP/1.1 200 OK
Content-Type: text/html

<html>...</html>
            
         
        
        
            HTTP verbs
            
                - GET
- HEAD
- POST
- PUT
- DELETE
- ...
 
        
            HTTP response codes
            
                - 1xx hold on
- 2xx here you go
- 3xx go away
- 4xx you fucked up
- 5xx I fucked up
twitter.com/stevelosh/status/372740571749572610
         
        
            
HTTP Cache Control
        
        
            Cache control headers
            
                - Cache Expiration Model
- Cache Validation Model
HTTP 1.1, RFC 2616, Sections 13.2 and 13.3 (now superseded by RFC 9111, HTTP Caching)
         
        
            Cache Expiration
            
Cache-Control: s-maxage=3600, max-age=900
Expires: Sun, 10 Mar 2024 08:00:00 GMT
            
            
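                - s-maxage: lifetime in shared caches (Varnish, CDNs)
- max-age: lifetime in the browser (and in shared caches if s-maxage is missing)
- Expires (HTTP 1.0, avoid!)
- Varnish falls back to its default_ttl parameter if nothing is specified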
 
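Varnish computes beresp.ttl from s-maxage, max-age or Expires before vcl_backend_response runs, and only uses the default_ttl parameter when none of them is present. A minimal sketch, assuming you prefer a per-site fallback in VCL over changing default_ttl; the one-hour value and the backend address are examples:

```vcl
vcl 4.1;

backend default { .host = "127.0.0.1"; .port = "8080"; }

sub vcl_backend_response {
    # The application sent no expiration information at all:
    # use our own fallback lifetime instead of default_ttl.
    if (!beresp.http.Cache-Control && !beresp.http.Expires) {
        set beresp.ttl = 1h;
    }
}
```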
        
            max-age
            
            
                - max-age per block and other page elements
                
- Application layer cache for individual elements
                
- Drupal builds the HTTP header from the lowest max-age encountered
                
- http_response_header module for rule-based HTTP headers (though the Drupal caching framework usually provides enough tools)
            
 
         
        
            Cache validation
            
                - Application-specific hash value (ETag) on the response
- On the next request, a cheap check whether the hash changed
 
            
            
                ETag: "82901821233"
                
                If-None-Match: "82901821233"
                
                HTTP/1.1 304 Not Modified
            
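Varnish (4.0 and later) can use the same validators towards the backend: when an expired object is still within its keep period, Varnish sends a conditional request (If-None-Match / If-Modified-Since) and reuses the stored body if the backend answers 304. A sketch; the one-day keep value is an assumption:

```vcl
vcl 4.1;

backend default { .host = "127.0.0.1"; .port = "8080"; }

sub vcl_backend_response {
    # Keep expired objects for a day so Varnish can revalidate them
    # with a conditional backend request instead of a full fetch.
    if (beresp.http.ETag || beresp.http.Last-Modified) {
        set beresp.keep = 1d;
    }
}
```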
            
         
        
            ETag
            
            
                - Drupal core sets an ETag on cacheable responses (based on the last modified timestamp)
                
- Drupal handles the If-None-Match header
                
- Drupal redundantly also handles Last-Modified and If-Modified-Since
            
 
         
        
            Do not cache
            
Cache-Control: s-maxage=0, private, no-cache, no-store
            
            
                - s-maxage=0: expires immediately in shared caches
- private: meant for one specific user, shared caches (proxies) must not store it
- no-cache: may be stored, but must be revalidated with the server before each use
- no-store: must never be written to any cache storage
 
        
            Surrogate Control
            Header meant only for your own reverse proxy, separate from the rules for browsers and third-party caches
            
Cache-Control: no-store
Surrogate-Control: max-age=3600
            
         
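Stock Varnish takes its TTL from Cache-Control and Expires, not from Surrogate-Control, so honoring the example above needs a little VCL. In the builtin vcl_backend_response, a present Surrogate-Control header makes Varnish skip its Cache-Control no-cache/no-store/private check, so the object stays cacheable while browsers still see no-store. A sketch, assuming Varnish 6 or later with vmod std:

```vcl
vcl 4.1;
import std;

backend default { .host = "127.0.0.1"; .port = "8080"; }

sub vcl_backend_response {
    # Take the Varnish TTL from "Surrogate-Control: max-age=N".
    # Cache-Control is left untouched and still applies to browsers.
    if (beresp.http.Surrogate-Control ~ "max-age=[0-9]+") {
        set beresp.ttl = std.duration(
            regsub(beresp.http.Surrogate-Control, ".*max-age=([0-9]+).*", "\1") + "s",
            0s);
    }
}

sub vcl_deliver {
    # The header is addressed to Varnish only, do not leak it to clients.
    unset resp.http.Surrogate-Control;
}
```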
        
            Resilience
            Response
            
Cache-Control: stale-while-revalidate=3600
Cache-Control: stale-if-error=3600
            
            Response forbidding stale content
            
Cache-Control: must-revalidate
            
         
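In Varnish, serving stale content is called grace mode. A sketch of a common pattern, assuming Varnish 6.0 or later (for req.grace) and a hypothetical health check on "/": stale objects are served for up to an hour while the backend is down, but only briefly while it is healthy:

```vcl
vcl 4.1;
import std;

backend default {
    .host = "127.0.0.1";
    .port = "8080";
    # Health probe so that std.healthy() reflects the backend state
    .probe = {
        .url = "/";
        .interval = 5s;
        .timeout = 1s;
        .window = 5;
        .threshold = 3;
    }
}

sub vcl_recv {
    if (std.healthy(req.backend_hint)) {
        # Backend is fine: accept at most 10 seconds of staleness
        set req.grace = 10s;
    }
    # Backend is sick: req.grace stays unset, the object's grace applies
}

sub vcl_backend_response {
    # Allow stale delivery for up to one hour after the TTL expired
    set beresp.grace = 1h;
}
```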
        
            Default Varnish behaviour
            
                - Only attempt to cache GET and HEAD requests
- Never cache requests with Cookie or Authorization headers
- Never cache responses with Set-Cookie
- Only cache safe responses (status 200, 203, 300, 301, 302, 307, 404, 410)
 
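The cookie rule is the most common reason why nothing gets cached: a single analytics cookie turns every request into a pass. A sketch for static files, assuming they never depend on the session; the extension list is an example:

```vcl
vcl 4.1;

backend default { .host = "127.0.0.1"; .port = "8080"; }

sub vcl_recv {
    # Static files do not depend on the user: drop cookies so the
    # builtin vcl_recv looks them up in the cache instead of passing.
    if (req.url ~ "\.(css|js|png|jpg|gif|svg|woff2?)(\?.*)?$") {
        unset req.http.Cookie;
    }
}
```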
        
            Keep variants apart
            Response content depends on request headers
            Requests
            
GET /resource
Accept: application/json
            
            
GET /resource
Accept: text/xml
            
            Response
            
Vary: Accept
            
         
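Every distinct value of a header named in Vary gets its own cache entry, and clients send many slightly different Accept values. A sketch that normalizes the header before the lookup, assuming the application only offers the JSON and XML representations from the example and serves XML for anything else:

```vcl
vcl 4.1;

backend default { .host = "127.0.0.1"; .port = "8080"; }

sub vcl_recv {
    # Collapse all Accept variants into the two representations
    # the application offers, so Vary: Accept yields two cache entries.
    if (req.http.Accept ~ "application/json") {
        set req.http.Accept = "application/json";
    } else {
        set req.http.Accept = "text/xml";
    }
}
```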
        
        
        
            Varnish does what you tell it
            
            
                Think carefully and test thoroughly
            
         
        
            Varnish Configuration Language
            
                - Read and write header values
- If conditions, but no loops
- Regular expressions and invalidation instructions
- No return values, only state changes
- No variables, only store information in headers
- Varnish modules, Inline C code
- //, # and /* foo */ for comments
 
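A small sketch of several of these points together: a regular expression, a header used in place of a variable, and the comment styles. The X-Device header and the mobile detection are hypothetical and only illustrate the mechanics:

```vcl
vcl 4.1;

backend default { .host = "127.0.0.1"; .port = "8080"; }

sub vcl_recv {
    # No local variables: park derived values in a request header.
    set req.http.X-Device = "desktop";
    if (req.http.User-Agent ~ "(?i)mobile") {  // case-insensitive regex
        set req.http.X-Device = "mobile";
    }
}

sub vcl_hash {
    /* Add the parked value to the cache key. The builtin vcl_hash
       still adds URL and host afterwards. */
    hash_data(req.http.X-Device);
}
```

The header is also forwarded to the backend, so the application can render the matching variant.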
        
            VCL: Debug time to live
            
sub vcl_backend_response {
    set beresp.http.TTL = beresp.ttl;
}
            
         
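A companion debug helper, assuming a hypothetical X-Cache header name: it shows whether a response came from the cache or from the backend:

```vcl
vcl 4.1;

backend default { .host = "127.0.0.1"; .port = "8080"; }

sub vcl_deliver {
    # obj.hits counts how often this object was served from cache
    if (obj.hits > 0) {
        set resp.http.X-Cache = "HIT";
    } else {
        set resp.http.X-Cache = "MISS";
    }
}
```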
        
            VCL: Two applications
            
backend default {
    .host = "127.0.0.1"; .port = "8080";}
backend legacy {
    .host = "127.0.0.1"; .port = "8000";}
sub vcl_recv {
    if (req.url ~ "^/archive/") {
        set req.backend_hint = legacy;
    } else {
        set req.backend_hint = default;
    }
}
            
         
        
            
                VCL can do a lot of things
            
            
                - Add, alter and remove headers from request or response
- Decide when and how to cache
- Rewrite request URLs
But first make your application behave correctly!
         
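As an example of rewriting request URLs: tracking parameters such as utm_source do not change the page, but every distinct value would become its own cache entry. A sketch, assuming your application ignores all utm_* parameters:

```vcl
vcl 4.1;

backend default { .host = "127.0.0.1"; .port = "8080"; }

sub vcl_recv {
    # Remove utm_* parameters (the lookbehind keeps the ? or & before them)
    set req.url = regsuball(req.url, "(?<=[?&])utm_[a-z]+=[^&]*&?", "");
    # Clean up a dangling "?" or "&" at the end of the URL
    set req.url = regsub(req.url, "[?&]$", "");
}
```

With this, /page?utm_source=mail&id=5 is looked up as /page?id=5.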
        
            Advanced topics
            
                - Cache Invalidation
- Cache Tagging
- Edge Side Includes
 
        
            Cache Invalidation
            There are two hard things in computer science:
            
                - Naming things
- Cache invalidation
- Off by one errors
 
        
            Cache busting
            
                - Very long cache lifetime for assets
- Append ?version to asset links
- A changed query string misses the cache, so the new file is fetched
<link rel="stylesheet" href="/css/style.css?v1" type="text/css"/>
...
<script src="/js/scripts.js?v1"></script>
            
         
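If the application does not already send a long lifetime for versioned assets, Varnish can enforce it. A sketch, assuming the ?v&lt;number&gt; convention from the example above:

```vcl
vcl 4.1;

backend default { .host = "127.0.0.1"; .port = "8080"; }

sub vcl_backend_response {
    # A versioned URL never changes content: a new version gets a new
    # URL, so the cached copy can live for a long time.
    if (bereq.url ~ "\.(css|js)\?v[0-9]+$") {
        set beresp.ttl = 365d;
    }
}
```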
        
            Explicit cache invalidation
            
                - Long cache lifetime on Varnish
- Explicitly tell Varnish to invalidate cached URLs
- Content whose changes cannot be tracked: short lifetime
 
        
            Invalidation flavors
            
                - Purge: URL and all variants
- Refresh: forced cache miss, then update cache
- Ban: batch invalidation with regular expression
- Tagging: batch invalidation based on tags
 
        
            Communicating invalidation
            
                - varnishadm command line tool
- Custom VCL and web requests
- Messaging: for Varnish instances not known in advance (cloud)
                The Purge module invalidates external caches.
                Plugins exist for specific reverse proxies and CDNs, such as varnish_purge.
            
         
        
            Custom configuration for purge
            
acl invalidators {
    "localhost";
}
sub vcl_recv {
    if (req.method == "PURGE") {
        if (!client.ip ~ invalidators) {
            return (synth(405, "Not allowed"));
        }
        return (purge);
    }
}
...
            
         
        
            Custom configuration for refresh
            
acl invalidators {
    "localhost";
}
sub vcl_recv {
    if (req.http.Cache-Control ~ "no-cache"
        && client.ip ~ invalidators
    ) {
        # Skip the lookup, fetch from the backend, replace the cached copy
        set req.hash_always_miss = true;
    }
}
...
            
         
        
            Banning
            
                - Regular expression matching
- On any header of the cached object or the request, not only the path
- Too many active ban instructions will overload Varnish
 
        
            
sub vcl_backend_response {
    # Store URL and host on the object so bans can match on obj.*
    set beresp.http.X-Url = bereq.url;
    set beresp.http.X-Host = bereq.http.host;
}
sub vcl_recv {
  if (req.method == "BAN") {
    if (!client.ip ~ invalidators) {
      return (synth(405, "Not allowed"));
    }
    ban("obj.http.X-Host ~ " + req.http.X-Host
      + " && obj.http.X-Url ~ " + req.http.X-Url
    );
    return (synth(200, "Ban added"));
  }
}
            
         
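X-Url and X-Host are copied onto the object so that the ban expression only references obj.*, which lets Varnish's ban lurker process bans in the background. Clients have no use for them; a sketch that hides them on delivery without touching the stored object:

```vcl
vcl 4.1;

backend default { .host = "127.0.0.1"; .port = "8080"; }

sub vcl_deliver {
    # Helper headers exist only for ban matching
    unset resp.http.X-Url;
    unset resp.http.X-Host;
}
```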
        
            Cache Tagging
            
                - xkey vmod
- An xkey header on each response, listing the IDs of the content items it contains
- Custom VCL to invalidate with xkey.purge()
$response->withHeader('xkey', 'node:2 node:44');
            
            
set req.http.n-gone = xkey.purge(req.http.xkey-purge);
            
            You can also use BAN, but it's much less efficient
         
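A fuller sketch of the invalidation endpoint, assuming the xkey vmod from the varnish-modules package is installed and reusing the invalidators ACL and the xkey-purge request header from the slides. xkey.purge() takes a space-separated list of tags and returns the number of invalidated objects:

```vcl
vcl 4.1;
import xkey;

backend default { .host = "127.0.0.1"; .port = "8080"; }

acl invalidators {
    "localhost";
}

sub vcl_recv {
    if (req.method == "PURGE" && req.http.xkey-purge) {
        if (!client.ip ~ invalidators) {
            return (synth(405, "Not allowed"));
        }
        # e.g. "xkey-purge: node:2 node:44"
        set req.http.n-gone = xkey.purge(req.http.xkey-purge);
        return (synth(200, "Invalidated " + req.http.n-gone + " objects"));
    }
}

sub vcl_deliver {
    # The tag list is meant for Varnish only
    unset resp.http.xkey;
}
```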
        
            Cache Tagging
            
                
                - The purge module can handle cache tags.
                
- The varnish purge module documents tag invalidation with BAN - somebody should contribute xkey documentation ;-)
            
 
         
        
        
            
Edge Side Includes
        
        
            
                Use Edge Side Includes
            
            Like server side include, but on Varnish:
            
                - Content embeds URLs to fragments
- Varnish fetches and caches elements separately
- Individual caching rules per fragment
- E.g. only some elements vary on cookie, different TTL, ...
 
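Varnish only parses ESI tags when told to. A sketch, assuming the application follows the Surrogate-Capability / Surrogate-Control handshake from the Edge Architecture specification (as Symfony's HttpCache does) and only marks pages that actually contain tags:

```vcl
vcl 4.1;

backend default { .host = "127.0.0.1"; .port = "8080"; }

sub vcl_recv {
    # Tell the application that this proxy can process ESI tags
    set req.http.Surrogate-Capability = "varnish=ESI/1.0";
}

sub vcl_backend_response {
    # Parse only pages announced as containing ESI, such as
    # <esi:include src="/fragment"/> in the HTML body
    if (beresp.http.Surrogate-Control ~ "ESI/1.0") {
        unset beresp.http.Surrogate-Control;
        set beresp.do_esi = true;
    }
}
```

Each included fragment passes through vcl_recv as its own request, which is what allows individual caching rules per fragment.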
        
            ESI error handling
            
                - Resilience and ESI combine
                
- ESI spec esi:try, esi:attempt, esi:except
                
- Supported neither by Symfony's HttpCache nor by Varnish
                
- Varnish: handle fragment errors in VCL
            
 
        
            
                Use Edge Side Includes
            
            
            
                - You could build something with the placeholder mechanism
                
- The Advanced Varnish module provides ESI blocks (among other nice features)
                
- There is an unmaintained ESI module.
            
 
         
        
            
Wrap-Up
        
        
            Take-Aways
            
                - Know your HTTP
- Varnish is powerful
- Varnish is dangerous
- KISS VCL!
- Make your application behave correctly first
- Understand what you do, test what you do
 
        
            Thank you!
            
            @dbu@phpc.social
            
            
         
        
            
Caching lists of content
        
        
            
                | Element | weight | <-> | Element | weight |
|---|---|---|---|---|
| A | 9 |  | D | 22 |
| B | 8 |  | A | 9 |
| C | 7 |  | B | 8 |
| D | 6 |  | C | 7 |
| E | 5 |  | E | 5 |
| F | 4 |  | F | 4 |
| G | 3 |  | G | 3 |
| H | 2 |  | H | 2 |
| I | 1 |  | I | 1 |
            
         
        
            
                | Element | weight | <-> | Element | weight |
|---|---|---|---|---|
| A | 9 |  | A | 9 |
| B | 8 |  | B | 8 |
| C | 7 |  | D | 6 |
| D | 6 |  | E | 5 |
| E | 5 |  | F | 4 |
| F | 4 |  | G | 3 |
| G | 3 |  | H | 2 |
| H | 2 |  | I | 1 |
| I | 1 |  |  |  |
            
         
        
            
Caching and Sessions
        
        
            Strategies when Caching with Sessions
            
                - Avoid sessions, remove the session when no longer needed
- Cache lookup despite cookies
    - Prevent caching when the response is user-specific
    - Vary on the Cookie header
- User Context: cache by group
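"Cache lookup despite cookies" can be sketched in vcl_recv. This is only safe under an assumption: the application marks user-specific responses with Cache-Control: private (which the builtin vcl_backend_response treats as uncacheable) or sends Vary: Cookie. Without that, users get each other's pages:

```vcl
vcl 4.1;

backend default { .host = "127.0.0.1"; .port = "8080"; }

sub vcl_recv {
    if (req.http.Authorization) {
        return (pass);
    }
    if (req.method == "GET" || req.method == "HEAD") {
        # Returning here skips the builtin rule that passes every
        # request carrying a Cookie header.
        return (hash);
    }
}
```

Vary: Cookie keeps one variant per distinct cookie string, so stripping irrelevant cookies first (analytics, consent banners) keeps the number of variants down.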