How can you tell man and machine apart when looking at web logs? All these hits: was it somebody browsing your website, or was it a crawler collecting information for a search engine? An automated tool scavenging email addresses for the next spam attack?
I used to think of a web log as a simple record of what people have downloaded from my server, and how often. It's not quite so easy!
First off, not every request is a download. Maybe just the header was requested (caching proxies do that to check whether something has changed), or form data was posted, or maybe someone used the web server as a proxy. This all depends on the operation and, in the case of proxy traffic, on the URL (if it starts with "http://", it is a proxy request).
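These distinctions can be read straight off a log line. A minimal sketch, assuming Apache's common log format; the classification names and field handling are illustrative, not from the article:

```python
def classify_request(log_line):
    """Classify a common-log-format line by operation and proxy use."""
    # The quoted request line sits between the first pair of double quotes,
    # e.g. 'GET /index.html HTTP/1.0'.
    fields = log_line.split('"')
    method, url = fields[1].split()[:2]
    if url.startswith("http://"):
        return "proxy"        # absolute URL: the server was used as a proxy
    if method == "HEAD":
        return "header-only"  # e.g. a caching proxy checking for changes
    if method == "POST":
        return "form-post"    # form data was posted
    return "download" if method == "GET" else "other"
```

For example, a plain `GET /index.html` line classifies as "download", while a `GET http://example.com/` line classifies as "proxy".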
Even if the request was a download of a URL (a GET operation), it might not have been successful. Maybe the URL did not exist, the user wasn't properly authorized, or the server had a bad day. The result code tells us how things went.
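Checking the result code is a one-liner once the line is split. A sketch, again assuming common log format (the field position is the one right after the quoted request line); treating only 2xx codes as success is my assumption here:

```python
def is_successful(log_line):
    """Return True if the result code indicates a successful request (2xx)."""
    # The status code is the first field after the closing quote of the
    # request line, e.g. '... "GET / HTTP/1.0" 200 2326'.
    status = int(log_line.split('"')[2].split()[0])
    return 200 <= status < 300
```

So a 404 ("the URL did not exist"), a 401 ("not properly authorized"), or a 500 ("the server had a bad day") would all be filtered out.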
And after all that, the URL might not have been requested by a person (with a browser), but by a machine. Search services use crawlers to automatically download whole web sites and index them. Link checkers might probe for the correctness of external links to your site in other web pages. Spammers might try to extract email addresses from your pages.
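One common heuristic for spotting machines is the user-agent string. A sketch; the token list is illustrative and a real filter needs a maintained list, since some robots do not identify themselves at all:

```python
# Substrings that commonly appear in crawler user-agent strings
# (illustrative selection, not exhaustive).
BOT_TOKENS = ("bot", "crawler", "spider", "slurp")

def looks_like_machine(user_agent):
    """Guess whether a user-agent string belongs to an automated client."""
    ua = user_agent.lower()
    return any(token in ua for token in BOT_TOKENS)
```

This would flag "Googlebot/2.1" but pass an ordinary "Mozilla/5.0 (Windows)" browser string; link checkers and address harvesters that masquerade as browsers slip through such a filter.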
If we want to answer the question "what have people been looking at?", we need to filter for requests that are GET operations (not proxy requests), were successful, and were made by a person with a browser.
This is what "user filtered" reports are about. Of course we are also interested in requests that used other operations, employed the web server as a proxy, failed, or were initiated by a machine - but those are the subject of other, equally interesting reports!
© 2003 Christian Treber, www.ctreber.com