How to Make Robots Cry With Faceted Navigation

Usability engineers have spoken at length about faceted navigation. Humans do not enjoy navigating through many levels of intermediate pages on category-based websites – where, at any point, they may click the wrong thing and reach a dead end. Marketers enjoy this as much as they enjoy a bad case of the flu, and a faceted navigation scheme may be just the panacea.

Unfortunately, much less has been said about the search marketing and optimization problems that arise with this form of navigation.

Faceted navigation is an almost-universally positive experience for humans. Two of the most important benefits are:

  1. Facets permit users to combine selections to zero in on results.
  2. Facets permit users to make those selections in any order.

Many of the big box stores have already implemented it. This includes well-known brands such as Amazon, Home Depot, and B&H. may also be remembered as one of the earliest to take the plunge. Just as the so-called “Web directories” have all but evaporated from the thread of the Web, category-only navigation might soon do the same. Does anyone actually use a directory in real life? And not just for the links!

It turns out that many decision-making processes are not facilitated by a rigid category-only navigation scheme. A category tree assumes an exact process in an exact order. It can only be walked in that order – for example, “Home > Televisions > Flat Screen > LCD > 50″.” The user might select different options, such as LED instead of LCD, but the path for each decision would be the same.

That looks like it makes sense on the surface. But perhaps size matters more than underlying technology for some users – 60″-plus or no deal. The user might not care whether the television is LCD or LED – or DLP. And here, faceted navigation rises to the task. The combination of a shallow category tree for basic decisions – with a series of facets – may be the best approximation of a talented salesman discovered yet.

Humans don’t like to conform this way – we like choice. The problem is that while humans exhibit intelligence and intent, and click only the combinations relevant to them, robots do not. A robot has nothing better to do, and often no other choice, than to visit almost everything. That’s a problem when a robot can combine selections and do so in any order. Do the math. Technologies that are often great for users, such as intra-facet multiple-value selection, only make the numbers far worse.
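To make “do the math” concrete, here is a rough Python sketch that counts the facet URLs a robot could reach from a single category page. The facet sizes and function names are hypothetical, chosen only for illustration. Real implementations will differ in how they encode selections.

```python
from itertools import combinations
from math import factorial, prod

def unordered_single(values_per_facet):
    """One value per facet at most; selection order does NOT change the URL."""
    # Each facet is either unselected or set to one of its values.
    return prod(v + 1 for v in values_per_facet) - 1  # minus the "nothing selected" page

def ordered_single(values_per_facet):
    """One value per facet at most; every selection order yields its own URL."""
    total = 0
    for k in range(1, len(values_per_facet) + 1):
        for chosen in combinations(values_per_facet, k):
            # prod(chosen) value combinations, each reachable in k! click orders
            total += factorial(k) * prod(chosen)
    return total

def unordered_multi(values_per_facet):
    """Several values per facet allowed (intra-facet multiple-value selection)."""
    # Any subset of a facet's values may be selected.
    return prod(2 ** v for v in values_per_facet) - 1

# Hypothetical category with four facets of 5, 4, 6 and 3 values.
facets = [5, 4, 6, 3]
print(unordered_single(facets))  # 839
print(ordered_single(facets))    # 10948
print(unordered_multi(facets))   # 262143
```

Even this small, made-up example grows from a few hundred pages to a quarter of a million once multiple values per facet are allowed, and that is for one category only.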

The result: a massive spider trap on any website with over a few thousand products.

Even a seasoned search marketer might immediately begin by throwing new technologies such as rel="canonical" at the problem, thinking they’re the latest and greatest. Or she may surrender and conclude there is simply no way except to exclude the entire trap. Neither conclusion is right. As is so often the case, throwing technology at a problem – or making drastic decisions – doesn’t lead to the best answer.

One option is a creative application of traditional robots.txt-based exclusion. Exclusion makes sense because the content generated by facet pages is only similar, not duplicate. We have many thousands of combinations of unique but undesirable pages.

One might generate undesirable URLs with a prefix such as “/noindex/,” but not do so for the desirable ones. For example: would not be spidered, but: would be. This works because the robot knows a priori – based on the URL – which pages are useful and which are not. The method used to determine which links should be built which way is not the concern of the robot, but one reasonably simple way is to allow no more than a few selections. This greatly reduces the size of the spider trap.
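A minimal sketch of this idea in Python follows. Only the “/noindex/” prefix comes from the text above. The URL layout, the two-selection threshold, the `facet_url` helper, and the sample facet names are assumptions made for illustration. The robots.txt rule is checked with the standard library's `urllib.robotparser`.

```python
from urllib.parse import quote
from urllib.robotparser import RobotFileParser

# Assumption: links with more than this many facet selections are undesirable.
MAX_CRAWLABLE_SELECTIONS = 2

# A single rule keeps robots out of every prefixed URL.
ROBOTS_TXT = """\
User-agent: *
Disallow: /noindex/
"""

def facet_url(category, selections):
    """Build the link for a category page with the given facet selections.

    `selections` maps facet name -> chosen value (hypothetical URL scheme).
    """
    parts = [f"{quote(name)}-{quote(value)}" for name, value in selections.items()]
    path = "/" + "/".join([quote(category)] + parts)
    if len(selections) > MAX_CRAWLABLE_SELECTIONS:
        return "/noindex" + path  # excluded by the robots.txt rule above
    return path

def is_crawlable(url):
    """Ask a robots.txt parser whether a generic robot may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch("*", url)

shallow = facet_url("televisions", {"size": "60in", "type": "led"})
deep = facet_url("televisions", {"size": "60in", "type": "led", "brand": "acme"})
print(shallow, is_crawlable(shallow))
print(deep, is_crawlable(deep))
```

Humans can still click through to the deeper combinations. Only the links a robot is invited to follow change.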

Note that meta-exclusion, though similar in purpose, is almost certainly the wrong decision – and for this same reason, canonicalization cannot work as well. Since it is on-page, a bot must crawl that same spider trap of offending pages to know about the exclusions. In fact, any on-page methodology is well-suited only for small-to-medium quantities of content to be excluded (or canonicalized).
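The difference in crawl cost can be illustrated with a small Python sketch. It assumes the “/noindex/” prefix convention described earlier and a made-up set of 1,000 undesirable facet URLs. A robot honoring robots.txt skips those pages without requesting them, whereas with an on-page directive it must fetch every page before it can read the directive.

```python
from urllib.robotparser import RobotFileParser

def pages_fetched(urls, robots_lines):
    """Count the URLs a well-behaved robot would actually request."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return sum(1 for url in urls if parser.can_fetch("*", url))

# Hypothetical site: three desirable pages plus 1,000 undesirable combinations.
desirable = [
    "/televisions/",
    "/televisions/size-60in",
    "/televisions/size-60in/type-led",
]
trap = [f"/noindex/televisions/combo-{i}" for i in range(1000)]
all_urls = desirable + trap

# robots.txt exclusion: the robot decides from the URL alone.
with_robots_txt = pages_fetched(all_urls, ["User-agent: *", "Disallow: /noindex/"])

# On-page exclusion (meta tag or canonical link): robots.txt allows everything,
# so each page must be downloaded before the directive can be seen.
with_on_page_only = pages_fetched(all_urls, [])

print(with_robots_txt)    # 3
print(with_on_page_only)  # 1003
```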

Implemented with these concerns in mind, a website with a faceted navigation scheme can help to create a set of legitimate landing pages that may complement subcategory pages – for users as well as search engines and PPC campaigns. In fact, one could probably debate whether some facets are subcategories or vice versa. If this is true, it stands to reason that excluding everything wholesale cannot be optimal.

Implemented carefully, exposing many facets to robots will create high-quality landing pages for long-tail searches. Without care, however, faceted navigation will weave a giant spider trap with seemingly-infinite URL permutations of the same products. In fact, some faceted search implementations – B&H included – do “address” this concern by excluding their faceted navigation pages from search engines entirely.

Unfortunately, this is not a solution, and it is anything but optimal – as one might expect after observing the overlapping functionality and purpose of subcategories and facets. There may be no one universal solution, but the above approach may keep a robot from shedding blood, sweat, and tears while spidering. A best solution is elusive, and I’m open to new ideas and suggestions.

This column was originally published in SES Magazine, March 2011.
