AC.productions - The Zen of Serving Web Pages - The Philosophy of Hits, Views, and Visits

The Zen of Serving Web Pages

Web Server Know How © 2003 Christian Treber

The Philosophy of Hits, Views, and Visits

Christian Treber, Senior IT Consultant and Internet Services Specialist

What sound does a web server make when it gets hit? What is a hit anyway, and what is a view, or a visit? The first question I can't answer for sure, but for the others I have some definite answers.

Hits. When you type a URL into your browser and hit return, the browser requests the specified file from the web server. This access to a single file is commonly referred to as a "hit". But hits alone don't tell you that much about what is going on. Why?

Example: Page A contains three images. Page B consists of three frames, loads three style sheets, and contains 34 images. When a user accesses page A, a total of 4 hits results: one for the page (the HTML file) and three for the embedded images. One access to page B creates 41 single hits! One for the main HTML file, three for the frames, three for the style sheets, and 34 for the images.
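The arithmetic of the example can be checked in a few lines of Python. This is only a sketch: the function name and its parameters are made up for the illustration and do not come from any log analysis tool.

```python
def hits_for_page(frames=0, style_sheets=0, images=0):
    """Count the hits caused by one page access.

    One hit for the page (the HTML file) itself, plus one hit
    for every frame, style sheet, and image it pulls in.
    """
    return 1 + frames + style_sheets + images


page_a = hits_for_page(images=3)
page_b = hits_for_page(frames=3, style_sheets=3, images=34)
print(page_a, page_b)  # 4 41
```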

Views. In both cases just one page was viewed, which is the really interesting information. So a "page view", or "view" for short, is a hit on a page. Most often a page will be a static HTML file. Since more and more pages are generated dynamically, other extensions to look for (apart from ".html" and ".htm") are ".jsp", ".php", ".pl", and ".asp".
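Telling views from other hits by extension could look roughly like the following sketch. The extension list is the one given above; the function name is an assumption of mine, and requests without any extension (such as a plain "/") are not treated as views here and would need a rule of their own.

```python
import posixpath
from urllib.parse import urlsplit

# Extensions named in the text; extend the set to match your own site.
PAGE_EXTENSIONS = {".html", ".htm", ".jsp", ".php", ".pl", ".asp"}


def is_page_view(url):
    """Return True if the requested URL looks like a page, not an embedded file."""
    path = urlsplit(url).path  # drop any query string such as "?item=3"
    extension = posixpath.splitext(path)[1].lower()
    return extension in PAGE_EXTENSIONS


requests = ["/index.html", "/images/logo.gif", "/shop/cart.php?item=3"]
views = [url for url in requests if is_page_view(url)]
print(views)  # ['/index.html', '/shop/cart.php?item=3']
```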

In the case of pages that use frames, the HTML file with the frame set definition is the page we're looking for. It's a bit hard to filter out the individual frames because they simply are HTML documents as well. Only some naming convention could help, for example appending "-top", "-left", "-right" etc. to the file name.
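If your site follows such a naming convention, the frame documents can be filtered out with a check like the one sketched below. It only knows the three suffixes mentioned above, and the function name is invented for the example.

```python
import posixpath

# Suffixes from the naming convention above; add whatever your site uses.
FRAME_SUFFIXES = ("-top", "-left", "-right")


def is_frame_document(path):
    """Return True if the file name (without extension) ends in a frame suffix."""
    stem = posixpath.splitext(posixpath.basename(path))[0]
    return stem.endswith(FRAME_SUFFIXES)


print(is_frame_document("/news/index-left.html"))  # True
print(is_frame_document("/news/index.html"))       # False
```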

The hit information becomes interesting again when you are looking at transport volume. You will probably find that most of the volume is generated by images, and not by (text) pages.
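To see where the volume goes, the transferred bytes can be summed up per file extension. The sketch below assumes each hit has already been reduced to a (path, bytes) pair; the sample values are invented purely to show the mechanics and are not measurements.

```python
import posixpath
from collections import defaultdict


def volume_by_extension(hits):
    """Sum the transferred bytes per file extension.

    `hits` is an iterable of (path, size_in_bytes) pairs.
    """
    totals = defaultdict(int)
    for path, size in hits:
        extension = posixpath.splitext(path)[1].lower() or "(none)"
        totals[extension] += size
    return dict(totals)


# Invented sample data, for illustration only.
sample_hits = [
    ("/index.html", 4000),
    ("/images/logo.gif", 12000),
    ("/images/photo.jpg", 48000),
    ("/about.html", 3000),
]
print(volume_by_extension(sample_hits))
```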

Visits. You might be interested in how many people have been visiting, and what they did during their visit. A visit, or session, comprises all actions between the first and the last access to a page. "Page visits", or "visits" for short, record which pages have been viewed during a visit. Each page is only counted once per visit, which is what distinguishes a page visit from a view.
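The difference between views and page visits shows up as soon as someone returns to a page within the same visit. A small sketch, assuming the pages viewed during one visit are already available as a list (the function name is mine):

```python
from collections import Counter


def views_and_page_visits(pages_viewed):
    """Count views and page visits for the pages seen during ONE visit.

    Views count every access; page visits count each page only once.
    """
    views = Counter(pages_viewed)
    page_visits = Counter(set(pages_viewed))
    return views, page_visits


visit = ["/index.html", "/news.html", "/index.html"]
views, page_visits = views_and_page_visits(visit)
print(views["/index.html"], page_visits["/index.html"])  # 2 1
```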

Since more than one person might be browsing your pages at the same time, their individual hits are nicely "mixed" over time in the web log. How do you determine which hits belong to one individual?

One solution is session tracking. This method is based on the assumption that accesses from the same address within a certain time span are coming from the same person. Session tracking provides some interesting data such as entry page, page transitions (from/to page), page view time, exit page, session duration, page views per session and more.
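A minimal sketch of this kind of session tracking follows. It assumes the page views have already been extracted from the log as (address, timestamp in seconds, page) tuples, and it uses a 30 minute time span; that value is my assumption, pick whatever fits your site.

```python
SESSION_TIMEOUT = 30 * 60  # seconds; an assumed value, not a standard


def track_sessions(hits, timeout=SESSION_TIMEOUT):
    """Group page views into sessions by address and time span.

    `hits` is an iterable of (address, timestamp, page) tuples.
    A view joins the open session of its address unless more than
    `timeout` seconds have passed since that session's last view.
    """
    sessions = []
    open_sessions = {}  # address -> session currently open for that address
    for address, timestamp, page in sorted(hits, key=lambda hit: hit[1]):
        session = open_sessions.get(address)
        if session is None or timestamp - session["last"] > timeout:
            session = {"address": address, "first": timestamp,
                       "last": timestamp, "pages": []}
            sessions.append(session)
            open_sessions[address] = session
        session["last"] = timestamp
        session["pages"].append(page)
    return sessions


hits = [
    ("10.0.0.1", 0, "/index.html"),
    ("10.0.0.2", 10, "/index.html"),
    ("10.0.0.1", 60, "/news.html"),
    ("10.0.0.1", 4000, "/index.html"),  # too late: starts a new session
]
for s in track_sessions(hits):
    # entry page, exit page, session duration, page views per session
    print(s["address"], s["pages"][0], s["pages"][-1],
          s["last"] - s["first"], len(s["pages"]))
```

Entry page, exit page, session duration, and page views per session fall out of each session record directly; page transitions and page view times can be derived by looking at neighbouring entries in the same way.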

But session identification is based on the assumption "one host, one person", and that often is not true. In reality many users access the web through proxies, and all user requests that get served by the same proxy seem to be coming from that proxy address.

The result: many requests from different users get interpreted as coming from a single user, rendering the session information useless. Still, if you're not getting too many hits (through one proxy), the session information probably is quite good.

How could we identify the specific user behind all these proxies? Since the original address might get obscured we would need some information specific to the browser of that user that does not get changed on the way. This could be accomplished through cookies or URL "decoration" with a session ID (known as "URL mangling").
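If the requested URLs did carry such a session ID, reading it back out of a logged URL would be simple. The sketch below assumes the ID travels as a query parameter called "sid"; both the parameter name and this form of decoration are hypothetical, since servers differ in how they decorate URLs.

```python
from urllib.parse import parse_qs, urlsplit


def session_id_from_url(url, parameter="sid"):
    """Return the session ID carried in the URL's query string, or None.

    The parameter name "sid" is a made-up default for this example.
    """
    query = parse_qs(urlsplit(url).query)
    values = query.get(parameter)
    return values[0] if values else None


print(session_id_from_url("/news.html?sid=abc123"))  # abc123
print(session_id_from_url("/news.html"))             # None
```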

But, alas, cookies don't get logged in the web server log by default (actually, why not?), and URL mangling requires tinkering with the server, and that might not be under your control. So we're left with the flawed method of associating host addresses with users. Beware!

© 2003 Christian Treber, www.ctreber.com
