We’ve finally come to a subject in this series that is near and dear to my heart: traffic analysis.
What has always fascinated me about traffic analysis is how much knowledge one can extract from the information generated when someone surfs a web site, and how one turns that knowledge into executable strategy to further a marketing effort.
Back when I got into this business, the metrics were pretty simple. We looked at things like how many times a page was viewed, or how many unique hosts (individual computers) accessed the site. Today, traffic analysis software can be so complex, and return such a volume of information, that it takes a professional to make sense of it all.
Fortunately, there is an affordable middle ground. But let’s back up a minute and discuss the nature of the data we are interpreting. It helps if we understand how the data is created in the first place.
Whenever a web browser requests a file from a web site, certain pieces of information are passed back and forth. A record of that “conversation” is kept in a document on the server called the log file. That record is in the form of a single line of information, which consists of:
- The name of the computer retrieving the file.
- The date and time the request occurred.
- The name of the file requested.
- A numerical code indicating if the transaction was successful or not.
- How many bytes of data were actually transferred.
- The referrer (where the web browser was on the Internet before it made the request to the server; not recorded about 50 percent of the time).
- The type and version of web browser making the request.
- The operating system of the computer making the request.
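To make the list above concrete, here is a minimal Python sketch that pulls those fields out of a single log line. It assumes the server writes the widely used Apache-style "combined" log format, which is an assumption on my part; your server may be configured differently. The sample line, host name, and URLs are invented for illustration. Note that in this format the browser and the operating system arrive together in one user-agent string.

```python
import re
from datetime import datetime

# Assumed layout: Apache-style "combined" log format.
# host ident user [time] "request" status bytes "referrer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    # A "-" in the bytes column means no body was sent.
    record["bytes"] = 0 if record["bytes"] == "-" else int(record["bytes"])
    # A "-" or empty referrer means the browser did not send one.
    if record["referrer"] in ("", "-"):
        record["referrer"] = None
    return record

# Invented sample line, for illustration only.
sample = (
    'host1.example.com - - [10/Oct/2000:13:55:36 -0700] '
    '"GET /index.html HTTP/1.0" 200 2326 '
    '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I)"'
)
record = parse_line(sample)
print(record["host"], record["path"], record["status"], record["bytes"])
```

A line that does not fit the assumed layout simply returns None, so a real script would want to count and inspect those rejects rather than silently drop them.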
And when I say this happens for each file, I mean EACH file a web browser asks for to build a web page. So if a particular web page has four graphics on it, that means FIVE lines get recorded in the log file — one line for each of the four graphics and one line for the actual page containing the HTML code that makes up the web page.
In common parlance, each one of these lines is called a “hit.” You can probably now see why, when someone spouts off that their site gets such-and-such number of “hits,” they are spewing out meaningless drivel. The number of hits a site gets depends more on how the site was designed than on how many people visit it.
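The page-with-four-graphics example can be worked through in a few lines of Python. This is only a sketch: the rule used here to decide what counts as a "page" (a path ending in .html, .htm, or a slash) is my own simplifying assumption, and the file names are invented.

```python
# Assumed rule of thumb: only these requests count as page views.
PAGE_EXTENSIONS = (".html", ".htm")

def is_page(path):
    """Treat HTML files and directory requests as pages; everything else is a supporting file."""
    path = path.split("?", 1)[0].lower()  # ignore any query string
    return path.endswith("/") or path.endswith(PAGE_EXTENSIONS)

def count_hits_and_pages(paths):
    """Every requested file is a hit; only some hits are page views."""
    hits = len(paths)
    pages = sum(1 for p in paths if is_page(p))
    return hits, pages

# One page with four graphics on it (invented file names).
requests = [
    "/index.html",
    "/img/logo.gif",
    "/img/nav.gif",
    "/img/photo1.jpg",
    "/img/photo2.jpg",
]
hits, pages = count_hits_and_pages(requests)
print(hits, "hits,", pages, "page view")  # 5 hits, 1 page view
```

Add a fifth graphic to the same page and the hit count goes up while the page view count stays at one, which is the whole problem with quoting hits.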
One more thing that’s important to understand about this data is that your web server has no idea how each line in the log file relates to the others around it. All the server is saying to itself is: “Someone wants this file. I don’t know why they want it, or how they will use it. I’m just going to give it to them and record who asked for it, and when they asked for it.”
Drawing relationships between the lines in a log file is what log analysis software does. The software does this by grouping together lines in a log file it thinks were made by the same web browser. (That’s called a session, or more popularly, a visit.) From that point on, the software bases all of its analyses on these groupings of lines, using them to extrapolate the number of people visiting a site.
OK, now here is your reality check. The way the software draws that relationship is a bit of a black box. Each piece of software makes different assumptions. Thus, the exact same log file run through two different log analysis packages will produce two different sets of statistics, guaranteed. It is also important to note that the data itself has artifacts in it that affect the accuracy of any final report. How and why these artifacts occur could fill 100 ClickZ articles. Suffice it to say that there is always a margin of error in any report.
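As a rough illustration of that grouping, here is one common heuristic sketched in Python: treat requests from the same host and browser string as the same visitor, and start a new visit after 30 minutes of silence. Both the grouping key and the 30-minute timeout are assumptions of this sketch, not a description of how any particular package works. The records are invented.

```python
from datetime import datetime, timedelta

# Assumed inactivity gap after which a returning browser counts as a new visit.
SESSION_TIMEOUT = timedelta(minutes=30)

def count_visits(records, timeout=SESSION_TIMEOUT):
    """Count visits by grouping requests on (host, agent) and splitting on long gaps."""
    last_seen = {}  # (host, agent) -> time of that visitor's most recent request
    visits = 0
    for rec in sorted(records, key=lambda r: r["time"]):
        key = (rec["host"], rec["agent"])
        previous = last_seen.get(key)
        if previous is None or rec["time"] - previous > timeout:
            visits += 1  # first request from this visitor, or the gap was too long
        last_seen[key] = rec["time"]
    return visits

def at(hour, minute):
    """Helper for building invented timestamps on a single day."""
    return datetime(2000, 10, 10, hour, minute)

records = [
    {"host": "a.example.com", "agent": "BrowserA", "time": at(13, 0)},
    {"host": "a.example.com", "agent": "BrowserA", "time": at(13, 5)},
    {"host": "a.example.com", "agent": "BrowserA", "time": at(14, 0)},
    {"host": "b.example.com", "agent": "BrowserB", "time": at(13, 2)},
]
print(count_visits(records))                         # 3
print(count_visits(records, timedelta(minutes=60)))  # 2
```

The same four log lines yield three visits with a 30-minute timeout and two with a 60-minute one. Nothing in the data says which answer is right; the number depends entirely on the assumption baked into the software.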
Glen Fleishman, a friend of mine who’s studied this extensively, made a great statement regarding this issue. He says that even though the Internet is computerized from end-to-end and records every transaction, all we know for sure is the date and time a file was requested from our server.
So what do you do? Well, you accept it and standardize on one log analysis package you like, so that your margin of error stays consistent over time.
While looking at your access logs is one way to understand what’s going on at your site, there are plenty of others. A big trend in traffic analysis is to put JavaScript on individual pages. More on this next week, as well as other traffic analysis techniques and my recommendations for your best traffic analysis solution.