php, webdev

ZendCon 2010 – Caching on the Edge

PHP Caching on the Edge

The goal is to never generate the same response twice.
This is another ‘notes post’ – nothing too organized here (for now).

Changing the HTTP headers with PHP:
header('Content-Type: text/plain');

The HTTP caching spec (HTTP/1.1, RFC 2616, section 13) – read it on a night when you can’t sleep.

1. HTTP expiration

Is the data fresh? Only when the cached version is stale do we go back to the server to get new data.
HTTP headers for expiration: Expires and Cache-Control.

2. HTTP validation

Last-Modified / If-Modified-Since
ETag / If-None-Match

(!) HTTP cache headers only work with ‘safe’ HTTP methods (GET/HEAD), meaning methods that do not change the application state.
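
To make the Last-Modified / If-Modified-Since pair and the safe-method rule concrete, here is a minimal sketch. The function name `isNotModified()` and the idea of passing in a `$_SERVER`-like array are my own assumptions for illustration, not something shown in the talk.

```php
<?php
// Hypothetical helper: decide whether a 304 Not Modified can be sent.
// $server is expected to look like $_SERVER; $lastModified is a Unix
// timestamp the application knows (a file mtime, a database column, ...).
function isNotModified(array $server, int $lastModified): bool
{
    $method = $server['REQUEST_METHOD'] ?? 'GET';
    if ($method !== 'GET' && $method !== 'HEAD') {
        return false; // validation only applies to safe methods
    }
    if (empty($server['HTTP_IF_MODIFIED_SINCE'])) {
        return false; // first visit, nothing to validate against
    }
    $since = strtotime($server['HTTP_IF_MODIFIED_SINCE']);

    // Not modified when the client's copy is at least as new as ours.
    return $since !== false && $since >= $lastModified;
}

// Usage sketch:
// header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $lastModified) . ' GMT');
// if (isNotModified($_SERVER, $lastModified)) { http_response_code(304); exit; }
```

The point of the check is to answer with an empty 304 response instead of sending the whole body again.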

Expires – gives the date/time after which the response is considered stale.
$expires = gmdate('D, d M Y H:i:s', time() + 5) . ' GMT'; // expires in 5 seconds
header('Expires: ' . $expires);

* With an expiration shorter than a few days you might hit problems, because the clock on the web server and the clock on the client may not show the same time.
* The HTTP/1.1 spec says that servers should not send an Expires date more than one year in the future. WHY?

A better way:
* Use Cache-Control
header('Cache-Control: max-age=5');

(!) So use Expires only for things you want to cache for a very long time (more than 48h).
In other cases, use Cache-Control.
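
The rule of thumb above could be wrapped in a small helper. This is only a sketch under my own assumptions: the name `expirationHeaders()` and the optional `$now` parameter are hypothetical, and the one-year cap simply follows the spec note mentioned earlier.

```php
<?php
// Hypothetical helper: return the header lines to send for a lifetime
// given in seconds. $now is only there to make the result predictable.
function expirationHeaders(int $ttl, ?int $now = null): array
{
    $now = $now ?? time();
    $oneYear = 365 * 24 * 3600;
    $ttl = max(0, min($ttl, $oneYear)); // never more than one year ahead

    if ($ttl > 48 * 3600) {
        // Long-lived: an absolute date is fine, small clock drift is harmless.
        return ['Expires: ' . gmdate('D, d M Y H:i:s', $now + $ttl) . ' GMT'];
    }

    // Short-lived: a relative lifetime does not depend on the client's clock.
    return ['Cache-Control: max-age=' . $ttl];
}

// Usage sketch:
// foreach (expirationHeaders(5) as $line) { header($line); }
```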

Usage of ETag

You can compute the ETag on your own or use a tool that does the work for you (a good framework, for example).
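
If you compute the ETag yourself, it might look roughly like the sketch below. The function names and the choice of an MD5 hash of the body are assumptions of mine; any value that changes whenever the content changes would do.

```php
<?php
// Hypothetical helper: build an ETag from the response body.
// The value must be wrapped in double quotes in the header.
function computeEtag(string $body): string
{
    return '"' . md5($body) . '"';
}

// True when the client's If-None-Match header already lists this ETag.
function etagMatches(array $server, string $etag): bool
{
    $header = trim($server['HTTP_IF_NONE_MATCH'] ?? '');
    if ($header === '') {
        return false;
    }
    if ($header === '*') {
        return true; // "*" matches any current representation
    }
    foreach (explode(',', $header) as $candidate) {
        $candidate = trim($candidate);
        if (strncmp($candidate, 'W/', 2) === 0) {
            $candidate = substr($candidate, 2); // ignore the weak marker
        }
        if ($candidate === $etag) {
            return true;
        }
    }
    return false;
}

// Usage sketch:
// $etag = computeEtag($body);
// header('ETag: ' . $etag);
// if (etagMatches($_SERVER, $etag)) { http_response_code(304); exit; }
```

Hashing the body means the page is still generated on every request, so this version only saves bandwidth, not server work.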

* Expiration wins over validation – the cache checks expiration first.
* Expiration allows you to scale, as fewer requests hit your server.
* Validation saves bandwidth.

PHP and cache

$_SESSION['foo'] = 'bla';

(!) By default, when you start a session (and so set a session cookie), PHP sends Cache-Control: no-store, no-cache, must-revalidate for YOU.
Because if the response depends on a cookie, you don’t want the page to be cached.
It’s a ‘safe’ default, so that no one else can get this information from a shared cache.

header('Cache-Control: private, max-age=5'); // or 'public, max-age=5'
private – prevents any proxy (shared cache) from caching this page; public allows it.
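
A small sketch of how the private/public choice could be made explicit in code. The helper name `cacheControl()` is hypothetical; `session_cache_limiter()` is the standard PHP function that controls the headers PHP sends on its own when a session starts.

```php
<?php
// Hypothetical helper: build a Cache-Control value.
// $shared = true  -> 'public'  (proxies and gateway caches may store it)
// $shared = false -> 'private' (only the end user's browser may store it)
function cacheControl(bool $shared, int $maxAge): string
{
    return ($shared ? 'public' : 'private') . ', max-age=' . max(0, $maxAge);
}

// Usage sketch with a session:
// session_cache_limiter('');   // stop PHP from sending its own cache headers
// session_start();
// header('Cache-Control: ' . cacheControl(false, 5));
```

Only switch to public when the response really is the same for every visitor.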

Types of Caches

1. Browser cache – for example it’s good for images.

2. Proxy cache – inside a company/organization. It is still on the ‘client side’ and it masks all the clients that sit behind it.
We can find it in big companies and at lots of ISPs. It’s a public (shared) cache.

3. Gateway cache (= reverse proxy, HTTP accelerator or surrogate cache) – this is just like a proxy, BUT it sits on the server side.
It is a shared cache that makes your site more scalable and reliable and improves performance.

Gateway cache

In practice, most caches avoid anything with: a restrictive Cache-Control (such as private), Cookie, WWW-Authenticate, POST/PUT, 302/307 status codes.
The Cache-Control default in PHP saves us here: it is a safe one (‘private’).

A gateway cache won’t cache anything ‘private’ or carrying a cookie – so how can we use Google Analytics and still have caching?
Use Varnish, or any other proxy that can remove the cookies before the request hits the server (and add them back afterwards).

In cases where you want your code to run behind a reverse proxy that supports ESI tags, like Akamai or Varnish, these are the headers to use:

Surrogate-Capability: abc="Surrogate/1.0 ESI/1.0"   <- request header, added by the proxy to advertise ESI support
Surrogate-Control: content="ESI/1.0"   <- response header from your application; it tells the reverse proxy to parse the ESI tags

(!) You need to enable these features in your Varnish configuration (they are not on by default).
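
On the application side, the two Surrogate headers could be used roughly as in the sketch below. The function names, the `/sidebar` URL and the inline fallback are assumptions for illustration; the only parts taken from the notes are the header names and the ESI/1.0 token.

```php
<?php
// True when the request came through a surrogate that advertises ESI/1.0.
function surrogateSupportsEsi(array $server): bool
{
    $capability = $server['HTTP_SURROGATE_CAPABILITY'] ?? '';
    return strpos($capability, 'ESI/1.0') !== false;
}

// Hypothetical helper: emit an ESI include tag when the proxy can process it,
// otherwise render the fragment inline through the given callable.
function renderFragment(array $server, string $url, callable $fallback): string
{
    if (surrogateSupportsEsi($server)) {
        return '<esi:include src="' . htmlspecialchars($url, ENT_QUOTES) . '" />';
    }
    return $fallback();
}

// Usage sketch:
// if (surrogateSupportsEsi($_SERVER)) {
//     header('Surrogate-Control: content="ESI/1.0"');
// }
// echo renderFragment($_SERVER, '/sidebar', function () { return '<p>sidebar</p>'; });
```

With this split, the page and the fragment can each carry their own cache lifetime, and the site still works when no ESI-capable proxy sits in front of it.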

Symfony2 ships with a reverse proxy written in PHP. You might want to use it if you can’t use Varnish (Varnish is free, but let’s say your hosting company won’t install it).

At the end of the day, you want to hit the application as little as possible.
You can do it with HTTP headers and ESI.

Last but not least, please remember: the one who picks his targets carefully, acts quietly, and achieves his objectives wins wars.

