201: HTML: Caching Generated Output For Speed.

Now that you can generate HTML, why would you ever want to go back to the old way of serving unchanging HTML?

It takes some amount of work to generate HTML. I’m not talking about the work to program this; make sure to listen to the previous episode for more information on that. I’m talking about the work that the computer itself needs to do in order to generate HTML. You might think your computer is busy when it’s starting up an application or when it’s uploading or downloading files, but that kind of work is called input/output, or just IO for short. The computer might be busy, but the processor itself will usually be just sitting around waiting for the application to be read from your hard drive or for your files to be sent over the communication lines. IO can slow down what you’re trying to do, but it still leaves the processor with time to cool down.

Generating HTML has the potential to be heavily IO bound, but it does need some amount of work from the processor. It shouldn’t require too much work. The processor also spends time waiting for database calls to complete so that it has the basic information it needs to put the HTML together, and that waiting is more than enough to let it cool down.

Your website visitors certainly won’t notice a few extra microseconds as your web server figures out who the visitor is and what content should be returned as HTML.

It seems like a perfect solution. You can put the text of your website articles and pages in a database along with information about how you want your website to look such as what each menu item will do. And you can also store information about the users so your site will know which visitors should get which content.

This whole system is known as a content management system, or CMS for short. That’s because you can build up your website by focusing on the content and how it should behave instead of creating all the HTML files yourself. You can let the CMS software running on your web server do all the work. You probably won’t have to create a single HTML file yourself.

For a normal request, the CMS will generate HTML and send it to the visitor. The web server then forgets all about the HTML that was just sent. If another visitor lands on the same page, the same work needs to be done all over again. Any time you find yourself writing or using code that does the same thing over and over again, that’s a good opportunity to cache the results. And setting up a reasonable cache system in your CMS will help your web server survive a massive increase in visitors. The extra work needed to generate HTML pages may not be much, but if your site suddenly gets popular and you get millions of visitors, then your web server just won’t be able to keep up with the total extra work and your whole website will go down. Caching HTML pages for regular visitors will save a lot of work, since they would all have gotten the same HTML anyway.
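
To make the idea concrete, here is a minimal sketch in Python of what caching generated HTML could look like. It is not how any particular CMS does it. The names PageCache and render_page, and the example path, are made up for illustration, and render_page is only a stand-in for the database calls and HTML generation a real CMS would perform.

```python
from typing import Callable, Dict


class PageCache:
    """Keeps generated HTML in memory, keyed by page path (illustrative only)."""

    def __init__(self, render: Callable[[str], str]) -> None:
        self._render = render            # the expensive step: database calls plus HTML generation
        self._pages: Dict[str, str] = {}
        self.render_count = 0            # how many times HTML was actually generated

    def get(self, path: str) -> str:
        # Generate the page only if there is no stored copy yet.
        if path not in self._pages:
            self._pages[path] = self._render(path)
            self.render_count += 1
        return self._pages[path]

    def invalidate(self, path: str) -> None:
        # Drop the stored copy so the next visitor gets freshly generated HTML.
        self._pages.pop(path, None)


def render_page(path: str) -> str:
    # Hypothetical stand-in for the CMS; a real one would query the database here.
    return f"<html><body><h1>Article at {path}</h1></body></html>"


cache = PageCache(render_page)
for _ in range(1000):                    # a thousand visitors land on the same page
    html = cache.get("/articles/caching")
print(cache.render_count)                # prints 1: the HTML was generated only once
```

A sketch like this only helps for visitors who would all receive the same HTML. Pages that depend on who the visitor is would need the visitor to be part of the cache key, or would need to skip the cache entirely.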

Listen to the full episode for more details including advice on when and how to invalidate the cache so that visitors will be able to get updated web pages.
