Caching is crucial to making our application run fast. There’s no faster form of data retrieval than client-side caching, and server caching is often far superior to requesting and calculating the same information all over again. There are many places in our SPA where we can cache data and thus speed up that part of our application. We’ll go through them all:
■ Web storage
■ HTTP caching
■ Server caching
■ Database caching
It’s crucial to think about data freshness when caching. We don’t want to serve stale data to our application’s users, but at the same time we want to respond to requests as quickly as possible.
9.3.1 Caching opportunities
Each of these caches has different responsibilities and interacts with the client to speed up the application in different ways.
■ Web storage stores strings in the client and is accessible to the application. Use it to store finished HTML built from data already retrieved from the server and processed.
■ HTTP caching is client-side caching that stores responses from the server. There’s a lot of detail to learn in order to properly control this style of caching, but after learning and implementing it, we’ll get a lot of caching almost for free.
■ Server caching with tools such as Memcached and Redis is often used to cache processed server responses. This is the first form of caching that can store data for different users: if one user requests some information, it’s already cached the next time someone else requests it, saving a trip to the database.
■ Database caching, or query caching, is used by databases to cache the results of a query, so that if it’s turned on, subsequent identical queries return the cached result instead of gathering the data again.
Figure 9.2 shows a typical request/response cycle with all of the caching opportunities. We can see how each level of caching can speed up the response by shortcutting the cycle at various stages. HTTP caching and database caching are the simplest to implement, usually requiring only some configuration settings, whereas web storage and server caching are more involved, requiring more effort on the part of the developer.
323 Caching and cache busting
9.3.2 Web storage
Web storage, also known as DOM storage, comes in two types: local and session storage.
They’re supported by all modern browsers, including IE8+. They’re simple key/value stores where both the key and the value must be a string. Session storage only stores the data for the current tab session—closing the tab will close the session and clear the data.
Local storage will keep the data cached with no expiration date. In either case, the data is only available to the web page that stored it. For the SPA, this means that the entire site has access to the storage. One excellent way to use web storage is to store processed HTML strings, enabling a request to bypass the entire request/response cycle and proceed directly to displaying the result. Figure 9.3 shows the details.
We use local storage to store non-sensitive information that we want to persist beyond the current browser session. We use session storage to store data that won’t persist beyond the current session.
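Both stores expose the same simple key/value API. Here’s a quick sketch of its use; the in-memory fallback exists only so the snippet can run outside a browser, and the key name and HTML string are invented for illustration:

```javascript
// Web storage sketch: setItem/getItem/removeItem on a string-only
// key/value store. Fall back to a plain object outside the browser.
var storage = ( typeof localStorage !== 'undefined' ) ? localStorage
  : (function () {
      var data = {};
      return {
        setItem    : function ( k, v ) { data[ k ] = String( v ); },
        getItem    : function ( k ) { return k in data ? data[ k ] : null; },
        removeItem : function ( k ) { delete data[ k ]; }
      };
    }());

storage.setItem( 'user-list-html', '<li>Fred</li>' );
storage.getItem( 'user-list-html' );   // '<li>Fred</li>'
storage.getItem( 'missing-key' );      // null
```

Note that getItem returns null, not undefined, for a missing key, and any non-string value passed to setItem is coerced to a string.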
Figure 9.2 Shortcutting the request/response cycle with caching. A user action first checks local/session storage (cached for the individual, fastest), then the HTTP cache, then the server-side cache and database cache (cached for everyone, slower), falling through to querying the database and processing the response (no cache, slowest).
Figure 9.3 Web storage. The same request/response flowchart as figure 9.2, highlighting the local/session storage check as the fastest path.
Since web storage can only save string values, typically JSON or HTML is saved. Saving JSON is redundant with using an HTTP cache in an SPA, which we’ll discuss in the next section, and still requires some processing to be used. Often it’s better practice to store an HTML string so we can save the client the processing required to create it in the first place. This kind of storage can be abstracted into a JavaScript object, which handles the particulars for us.
Session storage only stores data for the current session, so we can sometimes get away with not thinking too much about the stale data problem—but not always. When we do need to worry about stale data, one method used to force a data refresh is to encode the time into the cache key. If we want data to expire every day, we can include the day’s date in the key. If we want the data to expire every hour, we can encode the hour in there as well. This won’t handle every scenario, but is probably the simplest in terms of execution, as shown in listing 9.2:
SPA.storage = (function () {
  // Append the current date (year-month-day) to the key so cached
  // data expires daily. getDate() gives the day of the month
  // (getDay() would give the day of the week); the parts are joined
  // with dashes to keep keys for different dates from colliding.
  var generateKey = function ( key ) {
    var date = new Date(),
      dateKey = [ date.getFullYear(), date.getMonth() + 1,
        date.getDate() ].join( '-' );
    return key + dateKey;
  };

  return {
    'set' : function ( key, value ) {
      sessionStorage.setItem( generateKey( key ), value );
    },
    'get' : function ( key ) {
      return sessionStorage.getItem( generateKey( key ) );
    },
    'remove' : function ( key ) {
      sessionStorage.removeItem( generateKey( key ) );
    },
    'clear' : function () {
      sessionStorage.clear();
    }
  };
}());
9.3.3 HTTP caching
HTTP caching occurs when the browser caches data sent to it from the server, according to attributes the server set in the header, or according to an industry-standard set of default caching guidelines. Though it can be slower than web storage
Listing 9.2 Encoding the time in the cache key
The generateKey function appends the current date to the key, forcing the session to cache the data for only one day; it’s a quick trick to make sure that cached data isn’t returned after a certain interval. The public methods abstract sessionStorage so we can replace it with localStorage (or anything else) at a later date without having to change all of our code. They also call generateKey to append the date, so we don’t have to code that into every storage usage.
because the results still need to be processed, it’s often much simpler and still faster than server-side caching. Figure 9.4 shows where HTTP caching sits in the request/response cycle.
HTTP caching is used to store server responses in the client, to keep from doing another round trip. There are two patterns that it can follow:
1 Serve directly from cache without checking the server for freshness.
2 Check the server for freshness and serve from cache if fresh, and from server response if stale.
Serving directly from cache without checking for freshness is the quickest, because we forgo a round trip to the server. This is safest for images, CSS, and JavaScript files, but we can also set our application up to cache data for a length of time. For example, if we have an application that only updates some kinds of data once a day at midnight, then we could direct clients to cache data until just after midnight.
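As a sketch of that midnight scenario, the server could compute the max-age value with a small helper. The name secondsUntilMidnight is our own invention, not part of Express or the HTTP spec:

```javascript
// Compute a max-age (in seconds) that expires just after local
// midnight, for data that's only updated once a day.
function secondsUntilMidnight( now ) {
  var next = new Date( now );
  next.setHours( 24, 0, 0, 0 );   // rolls over to the start of tomorrow
  return Math.round( ( next - now ) / 1000 );
}

// At 23:00, the client would be told to cache for one hour:
secondsUntilMidnight( new Date( 2020, 0, 1, 23, 0, 0 ) );   // 3600
```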
Sometimes that doesn’t provide up-to-date enough information. In those cases the browser can be instructed to check back with the server to see if the data is still fresh.
Let’s get down to the nitty-gritty and see how this caching works: the client inspects the headers of the response sent from the server.
There are three primary attributes that the client looks for: max-age, no-cache, and last-modified. Each of these contributes toward telling the client how long to cache the data.
MAX-AGE
In order for the client to use data from its cache without attempting to contact the server, the header of the initial response must have max-age set in the Cache-Control header. This value tells the client how long to cache the data before making another request. The max-age value is in seconds. This is both a powerful capability and a potentially dangerous one. It’s powerful because it’s the quickest possible way to access data;
Figure 9.4 HTTP caching. The same request/response flowchart as figure 9.2, highlighting the HTTP cache check (cached for the individual, fastest) ahead of the server-side and database caches.
apps running with data cached in this way will be very fast once the data has been loaded.
It’s dangerous because the client no longer checks with the server for changes, so we’ll have to be deliberate when using it.
When using Express, we can set the Cache-Control header with the max-age attribute:
res.header("Cache-Control", "max-age=28800");
Once the cache is set in this way, the only way to bust the cache and force the client to make a new request is to change the name of the file.
Obviously, changing the names of files every time we push to production isn’t desirable. Fortunately, changing parameters passed in to the file will bust the cache.
This is typically done by appending a version number or some integer that our build system increments with every deployment. There are many ways to accomplish this, but the one we prefer is to have a separate file that has our incrementing value in it and append that number onto the end of our filename. Because the index page is static, we can set up our deployment tool to generate the finished HTML file and include the version number on the end of our includes. Let’s take a look at listing 9.3 for an example of what the cache buster would look like in the finished HTML.
<html>
<head>
<link rel="stylesheet" type="text/css"
  href="/path/to/css/file?version=1.1" />
<script src="/path/to/js/file?version=1.1"></script>
</head>
<body>
</body>
</html>
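The version suffix itself is trivial to generate at build time. A hypothetical helper (bustCache is our own name, not part of any particular build tool) might look like this:

```javascript
// Append a build version to an asset URL to bust the max-age cache
// on each deployment.
function bustCache( url, version ) {
  var separator = url.indexOf( '?' ) === -1 ? '?' : '&';
  return url + separator + 'version=' + encodeURIComponent( version );
}

bustCache( '/path/to/js/file', '1.1' );   // '/path/to/js/file?version=1.1'
```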
Another use of max-age is to set it to 0, which tells the client that the content should always be revalidated. When this is set, the client will always check with the server to make sure that the content is still valid, but the server is still free to reply with a 304 response, informing the client that the data isn’t stale and should be served from cache. A side effect of setting max-age=0 is that intermediate servers (those servers sitting between the client and the end server) can still respond with a stale cache as long as they also set a warning flag on the response.
Now, if we wish to prevent intermediate servers from ever using their caches, then we’ll want to look into the no-cache attribute.
NO-CACHE
The no-cache attribute, according to the spec, works in a manner similar enough to setting max-age=0 to be confusing. It tells the client to revalidate with the server before using the data in cache, but it also tells intermediate servers that they can’t serve up stale content, even with a warning message. An interesting situation has come
Listing 9.3 Bust the max-age cache
The query string version=1.1 on each URL is the cache buster.
up in the last few years: IE and Firefox have started to interpret this setting to mean they shouldn’t cache the data under any circumstances. That means the client won’t even ask the server whether the data it last received is still fresh before requesting it again; the client won’t ever store the data in its cache. That can make resources loaded with the no-cache header unnecessarily slow. If the desired behavior is to prevent clients from caching the resource, then the no-store attribute should be used instead.
NO-STORE
The no-store attribute informs clients and intermediate servers never to store any information about this request/response in their caches. Though this helps improve the privacy of such transmissions, it’s by no means a perfect form of security. In properly implemented systems, any trace of the data will be gone, but the data could still pass through improperly or maliciously coded systems and be vulnerable to eavesdropping.
LAST-MODIFIED
If no Cache-Control is set, then the client depends on an algorithm based on the last-modified date to determine how long to cache the data. Typically this is one-third of the time since the last-modified date. So, if an image file was last modified three days ago, when it’s requested, the client will default to serving it from cache for one day before checking with the server again. This results in a largely arbitrary length of time that a resource will be served from cache, dependent on how long it has been since the file was last pushed to production.
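The one-third heuristic can be expressed as a quick sketch. The function name is our own; browsers implement this internally, and implementations vary in the exact fraction used:

```javascript
// Heuristic freshness lifetime when no Cache-Control header is set:
// one-third of the time since the resource was last modified.
function heuristicFreshnessMs( lastModifiedMs, nowMs ) {
  return ( nowMs - lastModifiedMs ) / 3;
}

var DAY_MS = 24 * 60 * 60 * 1000;
// Last modified three days ago => served from cache for about a day.
heuristicFreshnessMs( Date.now() - 3 * DAY_MS, Date.now() );
```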
There are many other attributes dealing with caches, but mastering these basic ones will significantly speed up application load time. HTTP caching enables clients of our application to serve up resources they’ve seen before without needing to request the information again, or with a minimum of overhead in asking the server whether the resource is still fresh. This speeds up our application on subsequent requests, but what about identical requests made by other clients? HTTP caching doesn’t help there; instead, the data will need to be cached on the server.
9.3.4 Server caching
The fastest way for a server to respond to a client-side request with dynamic data is to serve it from a cache. This removes the processing time it takes to query the database and marshal the query response into a JSON string. Figure 9.5 shows where server caching fits into the request/response cycle.
Two popular methods of caching data on the server are Memcached and Redis.
According to memcached.org, “Memcached is an in-memory key-value store for small chunks of arbitrary data.” It’s purpose-built as a temporary cache of data retrieved from a database, API call, or processed HTML. When the server runs out of memory, it’ll automatically start dropping data based on a least recently used (LRU) algorithm.
Redis is an advanced key-value store that can be used to store more complex data structures, such as strings, hashes, lists, sets, and sorted sets.
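The LRU eviction Memcached uses can be illustrated with a tiny in-memory sketch (a deliberate simplification, not Memcached’s actual implementation):

```javascript
// Minimal LRU cache: when full, evict the least recently used entry.
// A Map keeps insertion order, so the first key is always the LRU one.
function LruCache( capacity ) {
  this.capacity = capacity;
  this.map = new Map();
}
LruCache.prototype.get = function ( key ) {
  if ( ! this.map.has( key ) ) { return undefined; }
  var value = this.map.get( key );
  this.map.delete( key );       // re-insert to mark this key
  this.map.set( key, value );   //   as the most recently used
  return value;
};
LruCache.prototype.set = function ( key, value ) {
  if ( this.map.has( key ) ) {
    this.map.delete( key );
  }
  else if ( this.map.size >= this.capacity ) {
    this.map.delete( this.map.keys().next().value );   // evict LRU entry
  }
  this.map.set( key, value );
};
```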
The overall idea for the cache is to reduce server load and speed response time. When a request for data is received, the application first checks whether the response for this query has been stored in cache. If the application finds the data, it serves it to the client. If the data isn’t cached, it instead makes a comparatively expensive database query and transforms the data into JSON. It then stores the data in the cache and replies to the client with the results.
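The flow just described is often called the cache-aside pattern. Here is a minimal sketch, with a plain object standing in for Memcached or Redis and an injected query function standing in for the database call:

```javascript
// Cache-aside: check the cache first; on a miss, run the expensive
// query, marshal the result to JSON, cache it, and return it.
var responseCache = {};
function getCachedJson( key, queryDatabase ) {
  if ( responseCache[ key ] !== undefined ) {
    return responseCache[ key ];                  // cache hit
  }
  var json = JSON.stringify( queryDatabase() );   // expensive on a miss
  responseCache[ key ] = json;                    // store for next time
  return json;
}
```

On the first call the query function runs; subsequent calls with the same key are served straight from the cache.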
When we use a cache, we must consider when the cache needs to be “busted.” If only our application writes to the cache, then it can either clear or regenerate the cache when the data changes. If other applications also write to the cache, then we need them to update the cache as well. There are a few methods to work around this:
1 We can invalidate caches after a set length of time and force a refresh of the data.
If we do this once an hour, then up to 24 requests throughout the day will have to be answered without the cache. Obviously, this won’t work for all applications.
2 We can check the last-updated time of the data and serve from cache only if it’s the same as or earlier than the cache timestamp. This takes longer to process than the first option, but it may not take as long as a complex request, and we’ll be assured that the data is fresh.
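Option 2 boils down to a freshness check like the following sketch (the helper and field names here are our own, not from any library):

```javascript
// Trust a cache entry only if the underlying data hasn't been
// updated since the entry was written.
function isCacheFresh( entry, dataLastUpdatedMs ) {
  return entry !== undefined && dataLastUpdatedMs <= entry.timestampMs;
}

isCacheFresh( { timestampMs : 100 }, 50 );    // true  - data unchanged
isCacheFresh( { timestampMs : 100 }, 150 );   // false - data updated
isCacheFresh( undefined, 50 );                // false - nothing cached
```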
Which option we choose is dependent on the needs of our application.
Server caching is overkill for our SPA. MongoDB offers excellent performance for our sample data set. And we don’t process the MongoDB response—we just pass it along to the client.
So when should we consider adding server caching to our web application? When we find our database or web server is becoming a bottleneck. Usually it’ll reduce the load on both the server and the database, and improve response time. It’s certainly worth trying before purchasing an expensive new server. But remember that server caching requires another service (such as Memcached or Redis) that will need to be monitored and maintained, and it also adds complexity to our application.
Figure 9.5 Server caching. The same request/response flowchart as figure 9.2, highlighting the server-side cache check (cached for everyone, slower than the client-side caches but faster than querying the database).
Node.js has drivers for both Memcached and Redis. Let’s add Redis to our application and use it to cache data about our users. We can visit http://redis.io and follow the instructions to install Redis on our system. Once it’s installed and running, we can confirm it’s available by starting the Redis shell with the command redis-cli.
Let’s update the npm manifest to install the Redis driver as shown in listing 9.4.
The change is the addition of the redis dependency:
{
  "name"    : "SPA",
  "version" : "0.0.3",
  "private" : true,
  "dependencies" : {
    "express"   : "3.2.x",
    "mongodb"   : "1.3.x",
    "socket.io" : "0.9.x",
    "JSV"       : "4.0.x",
    "redis"     : "0.8.x"
  }
}
Before we get started, let’s think about what we’ll need to be able to do with a cache.
Two things that come to mind are setting a cache key-value pair and getting the cache value by key. We’ll probably also want to be able to delete a cache key. With that, let’s set up the node module by creating a cache.js file in the lib directory and filling it in with the node module pattern and methods to get, set, and delete from the cache.
See listing 9.5 for how to connect Node.js to Redis and set up the skeleton of the cache file.
/*
 * cache.js - Redis cache implementation
*/
/*jslint  node : true, continue : true, devel : true,
  indent : 2, maxerr : 50, newcap : true, nomen : true,
  plusplus : true, regexp : true, sloppy : true,
  vars : false, white : true
*/
/*global */

// --- BEGIN MODULE SCOPE VARIABLES ---
'use strict';
var
  redisDriver = require( 'redis' ),
  redisClient = redisDriver.createClient(),
  makeString, deleteKey, getValue, setValue;
// --- END MODULE SCOPE VARIABLES ---

// --- BEGIN PUBLIC METHODS ---
deleteKey = function ( key ) {};
Listing 9.4 Update the npm manifest to include redis—webapp/package.json
Listing 9.5 Start the redis cache—webapp/cache.js
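From here, the get, set, and delete stubs might be filled in along the following lines. This is only a sketch assuming the node_redis callback-style get/set/del API; the client is passed in as a parameter here so the idea can be exercised without a running Redis server, whereas listing 9.5 closes over redisClient instead:

```javascript
// Sketch of the cache methods over a redis-style client
// (get/set/del with Node-style callbacks).
function makeString( value ) {
  // Redis stores strings, so non-string values are serialized.
  return typeof value === 'string' ? value : JSON.stringify( value );
}
function setValue( client, key, value, callback ) {
  client.set( key, makeString( value ), callback );
}
function getValue( client, key, callback ) {
  client.get( key, callback );
}
function deleteKey( client, key, callback ) {
  client.del( key, callback );
}
```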