The Zen of Serving Web Pages

Web Server Know How © 2003 Christian Treber

Words and Their Meaning

Christian Treber, Senior IT Consultant and Internet Services Specialist

Agent
An agent is any program that is used to access a web server. This includes browsers, crawlers, and link checkers.
Browser
A browser is a program that enables you to surf the web. On your behalf it requests URLs from web servers and displays the transmitted document in a window. Links to other web pages are usually displayed as underlined text and can be clicked.
Crawler
Search engine crawlers, or crawlers for short, mechanically try to read every page on a web site in order to add it to the database of a search engine such as Google or AltaVista. Other varieties exist: some crawlers try to extract email addresses for spamming purposes.
Deep linking
Deep links are direct links to content deep within a web site (pretty much everything but the start page) that bypass the usual "access route". Some site operators don't want that because, for example, advertisement pages are bypassed.
Domain
IP addresses can be broken down into host and domain. The domain identifies a network, while the host is an address on that network. While hosts with the same name might exist on different networks, the "fully qualified host name", consisting of host and domain, is always unique.

IP addresses in name form such as "" can be broken down into the host "www" (the first part) and the domain "" (all the rest). The domain itself is a hierarchy separated by dots. From the last to the first part of the domain, the specification gets finer and finer. For example, the IP address "" can be read as "the host archimedes at the educational institution of the University of Hawaii at the campus of Manoa, Maths department".
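The split into host and domain described above can be sketched in a few lines of Python. This is only an illustration: the name "www.example.com" and the function names are assumptions made for the example and do not come from the original text.

```python
def split_host_name(fqdn: str) -> tuple[str, str]:
    """Split a fully qualified host name into (host, domain).

    The host is the first dot-separated part, the domain is all the rest.
    """
    host, _, domain = fqdn.partition(".")
    return host, domain


def domain_hierarchy(domain: str) -> list[str]:
    """Return the parts of a domain from the coarsest (last) to the finest (first)."""
    return list(reversed(domain.split(".")))


# "www.example.com" is a placeholder name chosen for this sketch.
host, domain = split_host_name("www.example.com")
print(host)                      # www
print(domain)                    # example.com
print(domain_hierarchy(domain))  # ['com', 'example']
```

Reading the hierarchy list from left to right corresponds to reading the domain from its last part to its first.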

IP addresses in number form such as "" come in three different flavors: class A, B and C. Addresses with the first number in the 0 to 127 range are Class A networks. The first number is the domain, the rest is the host. The 128 to 191 range are Class B networks, with the first two numbers indicating the domain and the last two the host. Class C networks range from 192 to 223. The first three numbers make up the network, while the last number is the host.
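As a rough sketch, the class of an address and its network/host split can be derived from the first number alone. The code assumes the historical classful ranges (A: 0 to 127, B: 128 to 191, C: 192 to 223); the sample addresses and function names are made up for illustration.

```python
# Number of leading parts that form the network (domain) for each class.
NETWORK_PARTS = {"A": 1, "B": 2, "C": 3}


def address_class(address: str) -> str:
    """Return 'A', 'B' or 'C' for a dotted address, or 'other' outside those ranges."""
    first = int(address.split(".")[0])
    if 0 <= first <= 127:
        return "A"
    if 128 <= first <= 191:
        return "B"
    if 192 <= first <= 223:
        return "C"
    return "other"


def split_network_host(address: str) -> tuple[str, str]:
    """Split a class A, B or C address into (network, host)."""
    count = NETWORK_PARTS.get(address_class(address))
    if count is None:
        raise ValueError(f"{address} is not a class A, B or C address")
    parts = address.split(".")
    return ".".join(parts[:count]), ".".join(parts[count:])


# Sample addresses chosen only for illustration.
for sample in ("10.1.2.3", "172.16.5.9", "192.168.1.20"):
    print(sample, address_class(sample), split_network_host(sample))
```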

Download tool
A download tool, or downloader for short, allows the download of (typically large) files. The main advantage over downloads with most web browsers is that downloaders can stop and resume downloads. So even if you need to disconnect your Internet link or the connection gets broken, the download can be completed without having to start at the beginning.
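The text does not say how resuming works. One common mechanism is the HTTP range request, where the client asks only for the bytes it is still missing. The sketch below assumes that mechanism and shows just the header a downloader would send; the function name is hypothetical.

```python
def range_header(bytes_already_downloaded: int) -> dict[str, str]:
    """Build the extra request header needed to resume a download.

    Assumption: the server supports HTTP range requests. With nothing
    downloaded yet, no extra header is needed and the whole file is requested.
    """
    if bytes_already_downloaded <= 0:
        return {}
    # "bytes=N-" asks for everything from byte offset N to the end of the file.
    return {"Range": f"bytes={bytes_already_downloaded}-"}


print(range_header(0))     # {}
print(range_header(1024))  # {'Range': 'bytes=1024-'}
```

A server that honours the request answers with status 206 (Partial Content), and the downloader appends the received bytes to the partial file.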
Host
See domain
Link checker
A link checker is a program which scans all the pages of a web site for internal and external links that might be broken. Running such a program on your own web site creates a number of hits. When other people run link checkers on their web sites, the link checker might probe external links as well. This is how some of your pages get hit by a link checker even though you didn't use one: someone has tested their links to your pages.
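The first step of a link checker, collecting the links on a page and sorting them into internal and external ones, might look roughly like the following sketch. It uses only the Python standard library; the URLs and the class and function names are placeholders, and the actual probing of each link is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the absolute URLs of all <a href="..."> links on one page."""

    def __init__(self, page_url: str):
        super().__init__()
        self.page_url = page_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the URL of the page.
                    self.links.append(urljoin(self.page_url, value))


def classify_links(page_url: str, html: str) -> tuple[list[str], list[str]]:
    """Return (internal, external) links found in the HTML of page_url."""
    collector = LinkCollector(page_url)
    collector.feed(html)
    site = urlparse(page_url).netloc
    internal = [link for link in collector.links if urlparse(link).netloc == site]
    external = [link for link in collector.links if urlparse(link).netloc != site]
    return internal, external


# Placeholder page and links, chosen only for illustration.
page = '<a href="/about.html">About</a> <a href="http://other.example.org/page">Other</a>'
print(classify_links("http://www.example.com/index.html", page))
```

A full link checker would then request every collected URL. Requesting the external ones is what produces the hits on other people's sites mentioned above.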

Name form
IP addresses in name form consist of names separated by dots (e.g. ""). Valid IP addresses in name form can always be translated into the number form by using the Domain Name System (DNS), performing a "lookup".
Number form
IP addresses in number form consist of four numbers in the range from 0 to 255 that are commonly written separated by dots (e.g. ""). IP addresses in number form can often be translated into the name form by using the Domain Name System (DNS), performing a "reverse lookup".
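As an illustration, the sketch below checks whether a string is in number form (four numbers from 0 to 255 separated by dots) and wraps the lookup and reverse lookup in small helper functions built on Python's standard `socket` module. The helper names are assumptions. The two DNS helpers need a working resolver, so no output is shown for them.

```python
import socket
from typing import Optional


def is_number_form(address: str) -> bool:
    """True if the address consists of four dot-separated numbers in 0..255."""
    parts = address.split(".")
    return len(parts) == 4 and all(
        part.isascii() and part.isdigit() and 0 <= int(part) <= 255
        for part in parts
    )


def lookup(name: str) -> Optional[str]:
    """Translate name form into number form; None if the lookup fails."""
    try:
        return socket.gethostbyname(name)
    except OSError:
        return None


def reverse_lookup(address: str) -> Optional[str]:
    """Translate number form into name form; None if no name is registered."""
    try:
        return socket.gethostbyaddr(address)[0]
    except OSError:
        return None


print(is_number_form("192.0.2.1"))        # True
print(is_number_form("www.example.com"))  # False
```

The reverse lookup returns `None` more often than the forward lookup, because not every address has a name registered for it.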
Offline reader
Offline readers fetch pages on the user's behalf for later reading. Some browsers, such as Microsoft Internet Explorer, offer offline reading capabilities.
Platform
In this context "platform" is just another word for "operating system" (on the machine the browser or agent runs on).
Proxy
A proxy server is a server that sits between a browser and a web server. The browser sends a request to the proxy, which in turn forwards that request to a web server (or even to another proxy). The proxy can maintain an internal cache, making it a "caching proxy" or "cache" for short. When a request is made, the proxy checks whether it has the page in its cache. If it does, it might perform a short check with the web server to see whether the cached page is outdated. If the page is still current, the proxy serves it from the cache without having to transfer the whole page from the web server.

This procedure mainly serves two purposes: The requests can be served faster, and the network traffic from the proxy onwards is reduced.
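The caching procedure can be sketched as follows. This is a simplified model, not a real proxy: the web server is represented by two hypothetical callables, one for the full transfer and one for the short "is it outdated?" check, and a plain version number stands in for whatever freshness information a real server provides.

```python
from typing import Callable, Dict, Tuple


class CachingProxy:
    """Toy model of a caching proxy sitting in front of a web server."""

    def __init__(
        self,
        fetch: Callable[[str], Tuple[int, str]],
        get_version: Callable[[str], int],
    ):
        self.fetch = fetch              # full transfer: url -> (version, body)
        self.get_version = get_version  # short check: url -> current version
        self.cache: Dict[str, Tuple[int, str]] = {}
        self.full_transfers = 0

    def request(self, url: str) -> str:
        cached = self.cache.get(url)
        if cached is not None:
            version, body = cached
            # Short check with the server: is the cached copy still current?
            if self.get_version(url) == version:
                return body  # served from the cache, no full transfer
        # Not cached or outdated: transfer the whole page and remember it.
        version, body = self.fetch(url)
        self.full_transfers += 1
        self.cache[url] = (version, body)
        return body


# A dictionary plays the role of the web server in this sketch.
origin = {"/index.html": (1, "hello")}
proxy = CachingProxy(lambda url: origin[url], lambda url: origin[url][0])
proxy.request("/index.html")
proxy.request("/index.html")
print(proxy.full_transfers)  # 1
```

The second request is answered from the cache, so only one full transfer reaches the server.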

Robot
A robot is an automated agent that reads and processes web pages for purposes such as indexing for a search engine. By convention, robots check the contents of the file "robots.txt" to determine whether their presence is wanted, and to find out how they should behave.
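A well-behaved robot could perform this check with Python's standard `urllib.robotparser` module, as in the sketch below. The robots.txt contents, the robot name "ExampleBot" and the URLs are invented for the example. A real robot would download the file from the site it is about to visit.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: all robots are asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if the given robots.txt permits the agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


print(allowed(ROBOTS_TXT, "ExampleBot", "http://www.example.com/index.html"))         # True
print(allowed(ROBOTS_TXT, "ExampleBot", "http://www.example.com/private/data.html"))  # False
```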
Top Level Domain, TLD
The last part of an IP address in name form is the top level domain or TLD. It might denote a class (as in "com" for "commercial") or a country (as in "pf" for French Polynesia).