Cache Only, Please

If you wanted a bag of Ruffles potato chips, chances are you wouldn’t drive to Frito-Lay headquarters in Plano, Texas. Unless you live in Plano, or are really bad with driving directions, you’d probably wind up in aisle four of the local supermarket, snooping around the pretzels.

Yet if potato chips were like web pages, most of us would find ourselves wandering the northern suburbs of Dallas whenever we had a hankering for something to go with our onion dip. As an artifact of the web’s original design, nearly all requests for web content require a direct connection with the original source, or origin server.

Up until now, this content distribution model has been problematic but sufficient for many Internet businesses. But as Internet traffic and the demand for higher-bandwidth content grow exponentially, it will not scale for businesses counting on greater levels of site accessibility and uptime.

Would you like some help back to your car with that MPEG?

Like many other companies with global markets, Frito-Lay recognized the efficiency of shipping goods to a network of distribution points long ago. In this model, consumers need only pick up the last mile in the delivery chain by dealing with the most convenient distribution point — e.g., local supermarkets and retailers.

Recognizing the potential for similar efficiencies on the web, a growing number of content providers are looking to distributed caching solutions to better consolidate and manage aggregate demand for Internet content. By caching content at distribution points “closer” to the end user, web sites can improve performance and availability while reducing bottlenecks at their origin servers. Web sites can also better manage traffic spikes — offloading event-driven traffic surges to a wide network of servers.

Mirror, Mirror on the Web

Some distributed caching solutions include web site mirroring, which was globally implemented as early as 1994 when Sun Microsystems sponsored the first Winter Olympics on the web. Mirroring involves hosting replicated versions of a web site at various points of presence.

While mirroring is one of the oldest forms of distributed content caching, it can also be one of the most difficult and expensive. Mirroring solutions are frequently proprietary, requiring custom development and the overhead of running remote data centers. This isn’t good if you’re still hocking grandma’s jewelry to cover just one web hosting facility.

Recognizing a market opportunity, a number of vendors have entered the fray with a variety of competitive (and collaborative) offerings. These include software-based solutions such as Inktomi’s Traffic Server, appliances such as Cisco’s Cache Engine, and services such as Akamai’s FreeFlow and Sandpiper’s Footprint. Of these, business web sites will relate most to the services rather than the products.

The Full-Service Network

In the past year, distributed caching services have evolved from invasive applications requiring custom software installation to transparent services with supporting tools for monitoring and control. Only a few sites have adopted these services so far, but Yahoo now delivers all of its advertisements through Akamai.

The better services run a sophisticated version of the Domain Name System, or DNS. These systems direct web page requests towards the best distribution points for the user based on up-to-the-minute conditions, such as network traffic and server load.

Returning to our Ruffles analogy, it’s as if your request for potato chips directed you to the ideal supermarket based on location, traffic conditions, and the current length of the checkout lines.
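For the technically curious, here is a rough sketch of how such a “pick the best supermarket” decision might look. It is not how Akamai or Sandpiper actually do it; the EdgeServer fields, the load_weight penalty, and the max_load cutoff are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class EdgeServer:
    name: str
    latency_ms: float  # measured network delay between the user and this server
    load: float        # fraction of capacity in use, from 0.0 to 1.0


def score(server, load_weight=100.0):
    """Lower is better: network delay plus a penalty for a busy server."""
    return server.latency_ms + load_weight * server.load


def pick_server(servers, max_load=0.95):
    """Return the best-scoring server, skipping overloaded ones when possible."""
    candidates = [s for s in servers if s.load < max_load]
    if not candidates:
        candidates = list(servers)  # everything is busy, so consider them all
    return min(candidates, key=score)


# Hypothetical distribution points and measurements.
servers = [
    EdgeServer("dallas", latency_ms=12.0, load=0.97),
    EdgeServer("chicago", latency_ms=35.0, load=0.40),
    EdgeServer("london", latency_ms=110.0, load=0.10),
]
best = pick_server(servers)
print(best.name)  # prints "chicago"
```

Dallas is the closest store, but its checkout lines are too long, so the request goes to Chicago instead. A real service would refresh these measurements continuously and hand the answer back as a DNS response.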

After the content is initially requested from the origin server, caching servers located at these distribution points retain a copy for local redistribution. Content updates can also be pushed out to these points in anticipation of local demand when these services are alternatively run in what’s called reverse proxy mode. In either case, the caching server then delivers this local copy to the subset of Internet users who connect best to that server.
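The two modes described above can be sketched in a few lines. This is a toy model under stated assumptions: the EdgeCache class, its ttl_seconds expiry rule, and the fake origin function are hypothetical, and real caching servers follow far more elaborate freshness rules.

```python
import time


class EdgeCache:
    """A toy caching server that keeps local copies of origin content."""

    def __init__(self, fetch_from_origin, ttl_seconds=300.0, clock=time.monotonic):
        self.fetch_from_origin = fetch_from_origin  # callable: url -> content
        self.ttl_seconds = ttl_seconds              # assumed freshness window
        self.clock = clock
        self.store = {}                             # url -> (content, time stored)
        self.origin_hits = 0

    def get(self, url):
        """Serve the local copy if it is fresh; otherwise go back to the origin."""
        now = self.clock()
        entry = self.store.get(url)
        if entry is not None and now - entry[1] < self.ttl_seconds:
            return entry[0]
        content = self.fetch_from_origin(url)
        self.origin_hits += 1
        self.store[url] = (content, now)
        return content

    def push(self, url, content):
        """Accept an update pushed out ahead of local demand."""
        self.store[url] = (content, self.clock())


def fake_origin(url):
    # Stand-in for the origin server, for illustration only.
    return "page for " + url


cache = EdgeCache(fake_origin)
cache.get("/chips")                     # first request travels to the origin
cache.get("/chips")                     # second request is served locally
cache.push("/dip", "fresh onion dip")   # pushed copy never touches the origin
print(cache.origin_hits)                # prints 1
```

However many local users ask for the same page within the freshness window, the origin server sees only one request.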

But Can It Slice a Tomato?

Despite these promising benefits, distributed caching services have their limitations. While the latest generation supports streaming media and consolidated (and advertising-friendly) logging, in practice, these services don’t always measure up to their anticipated performance gains.

Another major drawback is the cost. Many charge three times as much as the local ISP for the equivalent bandwidth, justifying it with improved user experience and a reduced load on your origin servers. (“Sorry, grandma, I never heard of that pawn shop.”)

Furthermore, these services are typically much less practical for dynamic or personalized content than for static content shared among many users. For example, distributed caching couldn’t have saved eBay from its recent meltdowns. And while data cached as XML files might find more flexible reuse within personalized applications, XML proliferation is still a ways off.

However, the market is improving in this regard. Sandpiper currently supports cookie proxying to mediate a level of targeting between the user and the origin server. A future release of their service promises to support the assembly of customized web page components at the caching server.

There are also satellite-based caching services, such as SkyCache and iBeam. But these services are founded on the heavily oversold idea of using satellites for Internet traffic. They don’t offer much practical value to content providers for the expense, and they typically appeal more to ISPs.

Take it from Greg, who developed satellite communication systems in the late 1980s: Satellites are not the future of two-way media like the Internet.

Cache Test Dummies

Internet Service Providers were the first to use distributed caching — primarily to save on bandwidth connecting the ISP to the outside world. Now the major online content providers are poised to become the next adopters.

Broad adoption of distributed caching will require a subtle shift in the value web sites place on their visitors. Web sites must be willing to pay for improved access to their customers — not unlike Internet users who pay for improved access to content via DSL or cable modems.

So what does this mean for your web site? Distributed caching isn’t for everyone, but it becomes attractive for sites that meet one or more of the following criteria:

  • Heavy traffic loads
  • Widely distributed user base (especially international users)
  • Broadband content
  • Support for wireless devices
  • Frequent online events and webcasts

“How come My Yahoo looks just like Your Yahoo?”

Distributed caching is far from new to the Internet. Internet newsgroups have used a similar model of content distribution for nearly two decades. Therefore, it’s no surprise that the web has evolved to where there’s a real demand for caching.

Until now, most web sites have experienced network caches as problems to work around. Problems include unrecorded ad impressions and customers who receive someone else’s personalized page. Like the email hoaxes that circle the Internet and return periodically like comets, your web content is at the mercy of the Internet’s distribution and caching mechanisms once it leaves your servers.
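One common way to keep a personalized page out of the wrong hands is the HTTP/1.1 Cache-Control header, which tells shared caches what they may keep. The sketch below is only an illustration: the cache_headers helper and its one-hour default are assumptions, not a recommendation for any particular site.

```python
def cache_headers(personalized, max_age_seconds=3600):
    """Pick a Cache-Control header for a response.

    "private" asks shared network caches not to store the page at all.
    "public, max-age=N" lets any cache reuse the page for N seconds.
    """
    if personalized:
        return {"Cache-Control": "private"}
    return {"Cache-Control": f"public, max-age={max_age_seconds}"}


print(cache_headers(personalized=True))
print(cache_headers(personalized=False, max_age_seconds=600))
```

The first call is the setting you would want on My Yahoo; the second suits a logo or a press release that looks the same to everyone.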

Even if you don’t take advantage of network caching, prepare to at least learn the rules of how it is used and plan for it in your site development. As more of these network caches are implemented across the Internet, web sites will be better off learning how to work with them rather than against them.
