The days of static HTML websites are slowly coming to an end. This is, of course, good news for businesses that want to simplify the process of managing their sites. But one thing even the least tech-savvy user may notice is the longer load times.
These load times are (usually, though not always) a result of a page being built dynamically every time the page is loaded. Rather than grabbing a single file and showing it in the user's browser, the system needs to jump through several hoops before being able to show anything. Imagine, for example, that you have a page that shows a list of products including a title, image, description and a quick testimonial from a satisfied customer. These products may be pre-determined by you, or they may be tailored to each user.
The steps to build this look something like this:
And this is just an overview. There are plenty of intermediate steps involving more database queries, template files, etc.
To combat this, we have a technique called caching. Caching is essentially just remembering what your page, or even parts of your page, looked like the last time they were loaded. It's a bit like me and my wi-fi password. I had a particular device that wouldn't store my wi-fi password if I used a different connection, so every time I needed to reconnect this device to my home's wi-fi, I'd have to go upstairs and ask my sister for the password. She'd write it down for me, I'd connect my device, then lose that piece of paper and have to ask my sister for the password again later.
Now, if I had just remembered the damn password, I wouldn't need to keep asking my sister, so I could skip all the steps of going upstairs, bothering my sister, having her look up the wi-fi password and going back downstairs. Of course, if we ever changed the wi-fi password, I'd need to go through the process again. But at that point I could just remember the new password, so time would still be saved.
The situation is similar with web pages that have dynamic content. If your system knows that a particular piece of the site hasn't changed, it can simply 'remember' what it looked like before by checking its cache. This sounds simple enough, and conceptually it is. Those who have done web development, though, will realise that it gets quite complex fairly quickly.
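Here's a minimal Python sketch of that "remember it until it changes" idea. How a real system knows a piece hasn't changed varies a lot; this sketch simply assumes (hypothetically) that each piece of content carries a version number that gets bumped whenever it's edited, much like the wi-fi password changing.

```python
from typing import Callable, Dict, Tuple


class VersionedCache:
    """Remembers built output until the underlying content changes.

    Each entry stores the content version it was built from. If the caller
    asks with a different version, the entry is treated as stale and rebuilt.
    """

    def __init__(self) -> None:
        self._store: Dict[str, Tuple[int, str]] = {}
        self.builds = 0  # how many times the expensive work actually ran

    def get(self, key: str, version: int, build: Callable[[], str]) -> str:
        entry = self._store.get(key)
        if entry is not None and entry[0] == version:
            return entry[1]  # cache hit: reuse what we remembered
        # Miss or stale entry: do the expensive work once and remember it.
        value = build()
        self.builds += 1
        self._store[key] = (version, value)
        return value
```

For example, two requests for `"product_list"` at version 1 build it once; after an edit bumps it to version 2, the next request rebuilds it and the new result is remembered from then on.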
What should we cache?
The first thing we need to know is what data is appropriate to cache and what isn't. As a general rule, the best pages to cache are the ones that change the least. If you have something that updates every hour, or even every five minutes, you can still cache it between updates for better performance. Pages that are different every time they are loaded cannot be cached. This could be caused by something like a timestamp, which updates every second. A less obvious example is the same page being loaded by different users: in the example above, if the products are tailored to each user, every user sees a different list even though they are all loading the same page.
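For content that changes on a schedule, a common approach is to keep each cached copy for a fixed time (a "time to live") and rebuild it after that. Below is a small Python sketch of that idea; the class and its `clock` parameter are illustrative, not taken from any particular framework.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Keeps each value for a fixed number of seconds, then rebuilds it."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[str, Tuple[float, str]] = {}  # key -> (expires_at, value)

    def get(self, key: str, build: Callable[[], str]) -> str:
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]  # still fresh: serve the remembered copy
        value = build()  # expired or missing: rebuild and restart the timer
        self._store[key] = (now + self.ttl, value)
        return value
```

Something that updates every five minutes might use `TTLCache(300)`. A per-second timestamp is the degenerate case: any TTL long enough to save work would also show a stale time.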
A few caching strategies
Cache the entire page
The simplest way to use the cache is to cache the page's entire HTML source. When you do this, you basically cut out all steps after "Load the page itself", so there's no jumping through hoops to get all the pieces you need. This obviously sounds nice, but it's often not realistic. Remember that whenever a change is made to the page you are caching, the cache has to be refreshed. When you cache an entire page, everything down to the last letter in every comment is saved, and once any of it changes, the page needs to be cached again. This means that if a user named Harry loads the page and the greeting at the top says "Hello Harry," then when Sally loads the same page, the greeting changes to "Hello Sally" and the page needs to be re-cached. Likewise, if your products are different for each user, that alone will prevent you from caching the whole page.
Still, this provides us with a place to start.
Another simple solution: cache anonymous pages
Although user-specific pages are one of the major thorns in the side of page caching, there's no reason we can't cache these pages for anonymous users, who all see the same thing. Essentially, when a page is loaded, if the user is not logged in, the system serves the cached version; otherwise, it builds the page without looking at the cache. This sounds pretty good in theory, and in practice it actually makes a huge difference.
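The decision itself fits in a few lines. Below is a rough Python sketch of anonymous page caching; it only illustrates the logic and is not how Drupal implements it internally. The `serve` function and its `render` callback are hypothetical, and `user is None` stands in for "not logged in".

```python
from typing import Callable, Dict, Optional

anon_cache: Dict[str, str] = {}  # path -> HTML shared by all anonymous visitors


def serve(path: str, user: Optional[str],
          render: Callable[[str, Optional[str]], str]) -> str:
    """Serve cached HTML to anonymous visitors; build fresh pages for everyone else."""
    if user is None:
        # Anonymous: every visitor sees the same page, so one copy per path is safe.
        if path not in anon_cache:
            anon_cache[path] = render(path, None)
        return anon_cache[path]
    # Logged in: the page may be personalised, so skip the cache entirely.
    return render(path, user)
```

The trade-off is that logged-in users get no benefit at all, which is the problem the next strategy tries to address.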
Recently, one of our trusty co-op students performed some load testing on the home page of our own Drupal Framework. Below are the results for the home page whilst it was being bombarded with requests from sixty concurrent users, first without anonymous page caching, then with it:
| | JMeter response time (ms) | Siege transactions/sec |
|---|---|---|
| No caching | 17,658 | 2.90 (16 failed) |
| Anonymous page caching | 2,046 | 28.28 |
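Clearly, anonymous page caching makes a pretty big difference: the JMeter response time drops by a factor of about 8.6 (from 17,658 ms to 2,046 ms), and Siege throughput rises nearly tenfold (from 2.90 to 28.28 transactions per second).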
The more general strategy
Although it's great to improve performance for your anonymous users, you'd probably like to speed up your site for all users. After all, is it really fair to seduce newcomers with your blazing fast site, then banish them to the doghouse of a slow site once they log in? Naturally, there are solutions available even for those pages that are harder to cache. You may have been wondering, "Why don't we just cache the parts that don't change and load the rest of the page normally?" And this is exactly what you'd want to do. Going back to the example product page, let's say we cached each of the products and loaded just the header with the "Hello User" greeting normally. Now our tree looks something like this (the boxes bordered in blue are cached):
So here we see that the work our server has to do is greatly reduced. This is also where it starts to get complicated. You could go through every section of every page on your site, picking out bits to cache, but since this isn't really practical, you'll need to come up with a more general solution. If you're using Drupal, there are a few existing solutions you can make use of, but for now I'll just go over the problem.
Going back to the product page, let's first assume that the products are chosen by the site's administrator.
No problem. But let's say that the products are specific to each user. We obviously can't cache them together, but if we were to cache every product on the site individually, we'd only need to find out which products to print on the page. So if we have 50 products in total, and products 2, 8 and 27 are recommended for a particular user, the graph would look like this:
So that's not so bad either. But let's make things even more complicated by assuming this is a list of products recently purchased by the user, and we want to show the user how many times he's purchased each product. Now the work for loading each product is...
Now we'd have to load the number of times purchased on each page load and cache the rest. I'm sure I could make this even more complicated, but it's pretty clear by now that caching is an endeavor whose complexity grows along with that of your site.
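Here's a rough Python sketch of that split: the user-independent part of each product is cached, while the purchase count is looked up fresh on every load. `PURCHASES` is a hypothetical stand-in for the purchase history table.

```python
from typing import Dict, List, Tuple

static_part_cache: Dict[int, str] = {}  # product id -> user-independent HTML

# Hypothetical purchase history: (username, product id) -> times purchased.
PURCHASES: Dict[Tuple[str, int], int] = {("harry", 2): 3, ("harry", 27): 1}


def render_product_static(product_id: int) -> str:
    """Expensive, user-independent part of a product (title, image, description)."""
    return f"<h2>Product {product_id}</h2>"


def purchase_count(username: str, product_id: int) -> int:
    # User-specific, so it is looked up on every page load and never cached.
    return PURCHASES.get((username, product_id), 0)


def render_recent_purchases(username: str, product_ids: List[int]) -> str:
    parts = []
    for pid in product_ids:
        if pid not in static_part_cache:
            static_part_cache[pid] = render_product_static(pid)
        count = purchase_count(username, pid)
        parts.append(f"<div>{static_part_cache[pid]}<p>Bought {count} time(s)</p></div>")
    return "".join(parts)
```

Because the count lives outside the cached fragment, a new purchase shows up immediately without invalidating anything, but every extra dynamic detail like this adds another piece you have to keep apart from the cache.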
Do you need this at all?
Perhaps by now you feel like all this caching stuff isn't much fun to solve. Well, another lesson you might take away from this is that it may be best to leave certain things off your site if they really aren't necessary. Do you really need a counter to inform your user how many times he's bought your product when it's causing you so much grief? Maybe that "Hello user" greeting sounds nice, but in the grand scheme of things it just doesn't need to be there.