HTTP Caching with Varnish
Confoo Vancouver
December 7, 2016
© David Buchmann
What is a reverse proxy again?
What could possibly go wrong?
- Nothing gets cached (HTTP Headers)
- Too much gets cached (also HTTP Headers)
- Editors see no changes (cache invalidation)
- Caches get mixed up (personalized content)
httpstatusdogs.com
Overview
- HTTP refresher
- HTTP cache control
- Varnish
- Advanced Topics
- Wrap-Up
HTTP Refresher
HTTP is simple
Request
GET /path
Accept: text/html
Response
HTTP/1.1 200 OK
Content-Type: text/html
<html>...</html>
HTTP verbs
- GET
- HEAD
- POST
- PUT
- DELETE
- ...
HTTP response codes
- 1xx hold on
- 2xx here you go
- 3xx go away
- 4xx you fucked up
- 5xx I fucked up
twitter.com/stevelosh/status/372740571749572610
HTTP Cache Control
Cache control headers
- Cache Expiration Model
- Cache Validation Model
HTTP 1.1, RFC 2616, Sections 13.2 and 13.3
Cache Expiration
Cache-Control: s-maxage=3600, max-age=900
Expires: Thu, 15 May 2014 08:00:00 GMT
- s-maxage
- max-age
- Expires (HTTP 1.0 - avoid!)
- Varnish falls back to default_ttl if nothing is specified
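A minimal sketch of handling the "nothing specified" case explicitly, meant as a fragment for an existing VCL 4.0 file. The 60 second value is an assumption for illustration, not a recommendation from the talk.

```vcl
sub vcl_backend_response {
    # The backend sent no caching headers: pick an explicit, short TTL
    # instead of silently relying on the global default_ttl parameter.
    # (60s is an arbitrary example value.)
    if (!beresp.http.Cache-Control && !beresp.http.Expires) {
        set beresp.ttl = 60s;
    }
}
```

Fixing the headers in the application is usually the better option; a fallback like this only limits the damage when they are missing.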
Cache validation
- Application specific hash value on response
- On request, cheap check if hash changed
ETag: 82901821233
If-None-Match: 82901821233
304 Not Modified
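Varnish can use the validation model towards the backend as well. The fragment below is a sketch for a VCL 4.0 file; the one hour keep time is an assumed example value.

```vcl
sub vcl_backend_response {
    # Keep objects for one extra hour after their TTL has expired.
    # While an expired object is kept, Varnish can send a conditional
    # request (If-None-Match / If-Modified-Since) to the backend and
    # reuse the stored body when the backend answers 304 Not Modified.
    set beresp.keep = 1h;
}
```

This only pays off if the application can answer the conditional request much more cheaply than it can render the full response.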
Default Varnish behaviour
- Only attempt to cache GET and HEAD requests
- Never cache requests with cookies / authorization
- Never cache responses with Set-Cookie
- Only cache safe responses (status 200, 203, 300, 301, 302, 307, 404, 410)
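Expressed as VCL, the rules above look roughly like the following. This is a simplified paraphrase of the built-in logic for illustration, not the exact built-in VCL, and it does not need to be added to your own configuration.

```vcl
sub vcl_recv {
    # Only GET and HEAD are candidates for a cache lookup.
    if (req.method != "GET" && req.method != "HEAD") {
        return (pass);
    }
    # Requests carrying credentials or cookies bypass the cache.
    if (req.http.Authorization || req.http.Cookie) {
        return (pass);
    }
    return (hash);
}

sub vcl_backend_response {
    # Responses that set a cookie are marked as not cacheable.
    if (beresp.http.Set-Cookie) {
        set beresp.uncacheable = true;
    }
    return (deliver);
}
```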
Keep variants apart
Content depending on request headers
GET /resource
Accept: application/json
GET /resource
Accept: text/xml
Vary: Accept
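Browsers send many different Accept values, and each distinct value becomes its own variant. A possible way to keep the number of variants small is to normalize the header before lookup. The sketch below assumes a backend that serves only the two formats shown above.

```vcl
sub vcl_recv {
    # Collapse all Accept values into the two variants the
    # (assumed) backend actually produces.
    if (req.http.Accept ~ "application/json") {
        set req.http.Accept = "application/json";
    } else {
        set req.http.Accept = "text/xml";
    }
}
```

Note that the backend also sees the normalized header, so this only works if it does not need the original value.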
Varnish does what you tell it
Think carefully and test thoroughly
Varnish Configuration Language
- Read and write header values
- If conditions, but no loops
- Functions: state change, regexp and invalidation
- No return values, only state changes
- No variables, only store information in headers
- Inline C code
- //, # and /* foo */ for comments
VCL: Debug time to live
sub vcl_backend_response {
    set beresp.http.TTL = beresp.ttl;
}
VCL: Debug cache hit / miss
sub vcl_deliver {
    # If X-Varnish contains only 1 id, we have
    # a miss, if it contains more (and
    # therefore a space), we have a hit.
    if (resp.http.X-Varnish ~ " ") {
        set resp.http.Debug-Cache = "HIT";
    } else {
        set resp.http.Debug-Cache = "MISS";
    }
}
VCL: Two applications
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}
backend legacy {
    .host = "127.0.0.1";
    .port = "8000";
}
sub vcl_recv {
    if (req.url ~ "^/archive/") {
        set req.backend_hint = legacy;
    } else {
        set req.backend_hint = default;
    }
}
VCL can do a lot of things
- Add, alter and remove headers from request or response
- Decide when and how to cache
- Rewrite request URLs
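A short sketch of what these three points can look like in a VCL 4.0 file. All paths and the header name are hypothetical examples.

```vcl
sub vcl_recv {
    # Decide when to cache: never cache the (hypothetical) admin area.
    if (req.url ~ "^/admin/") {
        return (pass);
    }
    # Alter a request header: treat "www.example.com" and
    # "example.com" as the same host in the cache.
    set req.http.Host = regsub(req.http.Host, "^www\.", "");
    # Rewrite the request URL (hypothetical paths).
    set req.url = regsub(req.url, "^/old-blog/", "/blog/");
}

sub vcl_deliver {
    # Remove a response header before it reaches the client.
    unset resp.http.X-Powered-By;
}
```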
But first make your application behave correctly!
Advanced topics
- Cache Invalidation
- Cache Tagging
- Edge Side Includes
- Caching and Sessions
Cache Invalidation
There are two hard things in computer science:
- Naming things
- Cache invalidation
- Off by one errors
Cache busting
- Very long cache lifetime for assets
- Append ?version to asset links
- Query string to miss the cache
<link rel="stylesheet" href="/css/style.css?v1" type="text/css"/>
...
<script src="/js/scripts.js?v1"></script>
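On the Varnish side, the long lifetime for assets could be enforced as sketched below. The /css/ and /js/ prefixes match the links above; the one year lifetime is an assumed value.

```vcl
sub vcl_backend_response {
    # Asset URLs change whenever the ?version changes, so each
    # version can be cached for a very long time.
    if (bereq.url ~ "^/(css|js)/") {
        set beresp.ttl = 365d;
        # Let browsers cache just as long (365 days in seconds).
        set beresp.http.Cache-Control = "public, max-age=31536000";
    }
}
```

Because the query string is part of the URL, /css/style.css?v1 and /css/style.css?v2 are separate cache entries and no invalidation is needed.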
Explicit cache invalidation
- Long cache lifetime on Varnish
- Explicitly tell Varnish to invalidate cached URLs
- For changes that are not trackable: short lifetime
Invalidation flavors
- Purge: URL and all variants
- Refresh: remove cache for this exact request and warm cache
- Ban: batch invalidation with regular expression, e.g. subpath
- Tagging: batch invalidation based on tags
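Purge and ban are commonly configured with custom request methods. Refresh can be sketched with req.hash_always_miss, as below. Triggering it through a "Cache-Control: no-cache" request header is an assumption made for this example; the ACL restricts it to trusted clients.

```vcl
acl invalidators {
    "localhost";
}

sub vcl_recv {
    # A trusted client asking for "no-cache" forces a backend fetch.
    # The fresh response replaces the cached object for this exact
    # request, so the cache is warm again immediately.
    if (req.http.Cache-Control ~ "no-cache" && client.ip ~ invalidators) {
        set req.hash_always_miss = true;
    }
}
```

Unlike a purge, this only refreshes the variant that matches the request headers sent.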
Communicating invalidation
- varnishadm command line tool
- Custom VCL and web requests
- Messaging: Unknown Varnish instances (cloud)
Custom configuration for purge
acl invalidators {
    "localhost";
}
sub vcl_recv {
    if (req.method == "PURGE") {
        if (!client.ip ~ invalidators) {
            return (synth(405, "Not allowed"));
        }
        return (purge);
    }
    ...
}
Banning
- Regular expression matching
- On any request headers, not only path
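sub vcl_backend_response {
    # Copy URL and host onto the object so bans can match on them
    set beresp.http.X-Url = bereq.url;
    set beresp.http.X-Host = bereq.http.host;
}
sub vcl_recv {
    if (req.method == "BAN") {
        if (!client.ip ~ invalidators) {
            return (synth(405, "Not allowed"));
        }
        ban("obj.http.X-Host ~ " + req.http.X-Host
            + " && obj.http.X-Url ~ " + req.http.X-Url
        );
        # Answer the BAN request here instead of sending it to the backend
        return (synth(200, "Banned"));
    }
}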
Cache Tagging
- Custom header with id of each content item
- BAN request on that header for changed id
$response = $response->withHeader('X-Cache-Tags', 'id-42');
ban("obj.http.x-cache-tags ~ "
+ req.http.x-cache-tags
);
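Put into context, the tag ban could look like the sketch below. It assumes the invalidators ACL from the purge configuration and a BAN request that carries the tag expression in an X-Cache-Tags request header; the vcl_deliver part is an addition for illustration.

```vcl
sub vcl_recv {
    if (req.method == "BAN" && req.http.X-Cache-Tags) {
        # "invalidators" is the ACL from the purge configuration.
        if (!client.ip ~ invalidators) {
            return (synth(405, "Not allowed"));
        }
        # Ban every cached object whose tag header matches,
        # e.g. a request with "X-Cache-Tags: id-42".
        ban("obj.http.X-Cache-Tags ~ " + req.http.X-Cache-Tags);
        return (synth(200, "Banned"));
    }
}

sub vcl_deliver {
    # The tags only matter to Varnish; hide them from clients.
    unset resp.http.X-Cache-Tags;
}
```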
Edge Side Includes
Use Edge Side Includes
Like server side include, but on Varnish:
- Content embeds URLs to parts of the content
- Varnish fetches and caches elements separately
- Individual caching rules per fragment
- E.g. only some elements vary on cookie, different TTL, ...
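ESI processing has to be switched on per response. One common pattern, sketched here, is to let the backend announce ESI content through the Surrogate-Control header. The capability token "varnish" and the fragment path in the comment are assumptions.

```vcl
sub vcl_recv {
    # Tell the backend that this proxy understands ESI.
    set req.http.Surrogate-Capability = "varnish=ESI/1.0";
}

sub vcl_backend_response {
    # Only parse responses that announce ESI content, i.e. pages
    # containing tags such as <esi:include src="/fragment/sidebar" />
    # (hypothetical path).
    if (beresp.http.Surrogate-Control ~ "ESI/1.0") {
        unset beresp.http.Surrogate-Control;
        set beresp.do_esi = true;
    }
}
```

Each included fragment is a normal request of its own, so it can carry its own Cache-Control and Vary headers.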
Caching and Sessions
Strategies when Caching with Sessions
- Avoid sessions; remove them when no longer needed
- Cache lookup despite cookies
- Prevent caching when specific
- Vary on Cookie header
- User Context: Cache by group
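The first strategy is the simplest to illustrate. The sketch below drops cookies where a session is never needed; the path prefixes are hypothetical and must match your own application.

```vcl
sub vcl_recv {
    # Static assets never need a session: drop the cookies so the
    # default "never cache requests with cookies" rule does not apply.
    if (req.url ~ "^/(css|js|images)/") {
        unset req.http.Cookie;
    }
}

sub vcl_backend_response {
    # Do not let the backend start a session on static assets either.
    if (bereq.url ~ "^/(css|js|images)/") {
        unset beresp.http.Set-Cookie;
    }
}
```

The other strategies need more care, because a mistake there can serve one user's personalized content to another.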
Wrap-Up
Take-Aways
- Varnish is powerful
- Varnish is dangerous
- KISS VCL!
- Make your application behave correctly first
- Understand what you do, test what you do
Outlook: Use libraries
- Use Varnish plugins (vmods) for special things like basic authentication
- Frameworks provide better models for HTTP request/response, see PSR-7
- FOSHttpCache: Invalidation, integration tests with Varnish
- If you use Symfony, FOSHttpCacheBundle makes your life easier
Outlook: Where to go from here
Outlook: There is more than caching