Get Real About Faux Precision in Web Analytics

Following my last column, many industry folks shared their pet peeves about Web and audience analytics systems. All raised important issues, but one really struck a chord, so I decided to make it the topic for this column. I'll call it the Significant Figures Problem. It concerns the disconnect between the proclaimed precision in the tracking of Web behaviors and the incredible lack of precision in the underlying systems and methodologies.

Too many people in this industry use and talk about the output from their Web and audience analytics systems in misleading ways. It’s not uncommon to hear site publishers speak about their Web traffic with a purported precision: “We had 2.47 unique visitors per second last month on the Breaking News section in the 10-11 a.m. weekday daypart, up from 1.675 average uniques in the same month the year before.”

That’s not only mind-numbing, but it’s unfounded and unnecessary. They use decimal points and detailed trending analyses to express Web visitor counts, pages consumed, and time spent on pages in a manner that gives the impression Web consumers can be tracked that accurately. Get real!

Before we unleash spurious numbers on senior management teams, analysts, investors, and the trade press, let's remember our middle-school science lessons and the concept of significant figures.

Remember significant figures? For those fortunate souls who've banished the notion from their brains, here's a simple refresher (disclaimer: I was trained as a political scientist and lawyer, not a statistician).

Significant figures is a scientific rule for rounding numbers. It's about honesty when working with numbers that are already rounded up or down. Although the exact result of multiplying 2.5 by 2.5 is 6.25, the significant-figures rule says you should drop the digit in the hundredths place before relying on the result. Why? Because neither of the factors (the numbers multiplied) was that precise: each carries only two significant figures, so the product can't legitimately carry three. Round the answer to 6.3 and forget the hundredths. (This will be on the final quiz!)
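That schoolbook rule can be sketched in a few lines of code. This is a minimal illustration, not part of any analytics package; the function name `round_sig` and the choice to round halves up (as the classroom rule does) are my own assumptions.

```python
from decimal import Decimal, ROUND_HALF_UP
import math

def round_sig(x: float, sig: int) -> float:
    """Round x to `sig` significant figures, rounding halves up."""
    if x == 0:
        return 0.0
    exponent = math.floor(math.log10(abs(x)))        # position of the leading digit
    quantum = Decimal(1).scaleb(exponent - sig + 1)  # e.g. 0.1 for two sig figs of 6.25
    return float(Decimal(str(x)).quantize(quantum, rounding=ROUND_HALF_UP))

# Each factor carries two significant figures, so report the product with two.
print(round_sig(2.5 * 2.5, 2))  # 6.3, not the spuriously precise 6.25
```

The same function would trim a claimed "2.47 unique visitors per second" back to a defensible 2.5.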

Too many people in our industry are too carried away with the incredibly precise numbers our Web and audience analytics systems generate. They don’t have any real sense of whether the perceived precision is justified or, more important, whether it matters.

Much so-called precision is entirely unjustified. The underlying systems and processes behind serving Web pages to browsers aren’t accurate enough to provide that degree of precision. Network latency and rogue spiders alone can skew server counts 10 to 20 percent. Who, with a straight face, can talk about counts and trending lines with decimal-point precision?

Much of the numerical precision in audience analytics doesn't even matter. Even if such precise numbers could be proven true and correct, there's not much you can do with them that you can't do with far less precise numbers. Giving an ad agency inventory projections with single-unit precision for an ad campaign's delivery doesn't mean much when the counts from the agency's ad servers and the publisher's servers typically differ by 10 percent.

In throwing about numbers with so much claimed precision, we as an industry are inflicting irreparable harm upon ourselves. We establish expectations we can never live up to. Moreover, we risk antagonizing those who follow simpler, more reliable rules related to delivering value in media and marketing (and that’s almost everyone with money).

Numbers you should never throw around loosely, and other issues to watch:

  • Unique visitors. This statistic appears to indicate how many different people visited a site in a given period of time. Yet most log-file analytics systems count one person who visits four times a day as four unique visitors. Most cookie-based systems can’t tell how many people use the same browser or how many people access a site from several different browsers. The former counts too few visitors, the latter too many.

  • Time on page. This statistic appears to indicate how much time a person spends on a page consuming content. Yet most server-log systems only indicate how much time passed since the last page was called. They never know if the user keeps the browser open in the background, under other active pages; makes a phone call; or hops downstairs to buy a coffee. Browsers such as NetCaptor often keep numerous pages open and tabbed as a convenience to the user.
  • Spiders and crawlers. The Internet’s full of ’em. They grab pages like humans but are robots on a mission. They can increase the Web page and click counts of some sites more than tenfold. They’re particularly prevalent in college towns, where every student seems to be in on the act. New spiders appear 10 times faster than industry groups and software filters can find them and figure out how to remove them from counts. Of course, many publishers would prefer to keep this dirty little secret and never filter them.
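The unique-visitor ambiguity in the first bullet is easy to demonstrate. Below is a toy sketch with made-up log entries, not output from any real analytics product: the same person visiting four times registers as four "uniques" under session-style counting, or as one under naive IP counting (which would also collapse a whole office behind a shared proxy into a single visitor).

```python
# Hypothetical log entries (time, ip, browser) for ONE person visiting four times.
log = [
    ("09:01", "10.0.0.7", "Mozilla"),
    ("11:15", "10.0.0.7", "Mozilla"),
    ("14:30", "10.0.0.7", "Mozilla"),
    ("16:45", "10.0.0.7", "Mozilla"),
]

# Session-style counting: visits separated by more than 30 minutes each
# start a new session, so one person registers as four "unique visitors".
session_uniques = len(log)

# IP-style counting: one IP, one visitor -- but everyone behind a shared
# proxy would also collapse to one, undercounting instead of overcounting.
ip_uniques = len({ip for _, ip, _ in log})

print(session_uniques, ip_uniques)  # 4 1 -- same traffic, two different "truths"
```

Neither number deserves a decimal point, which is the column's point: the measurement method swings the count by a factor of four before precision even enters the picture.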

With the Internet, we have the most addressable and measurable marketing medium of all time. Let’s not screw it up by pretending it’s more exact than it is — particularly while it’s still in its early phase of development.
