The WWW superservice supports four log file formats, which lets it handle logs from a wide range of web servers such as Apache, IIS or Boa.
Common Log Format (CLF) is a standard log format that was originally implemented in the CERN httpd web server but that is supported nowadays by most web servers. Apache, IIS and Boa can be configured to log in that format.
The Common Log Format has the following format:
remotehost rfc931 authuser [date] "request" status bytes
where the fields have the following meaning:
remotehost: The host that made the request. This can be given as an IP address or a hostname.
rfc931: The result of an ident lookup on the host. This is rarely used.
authuser: The authenticated username.
date: The timestamp of the request.
request: The first line of the request, usually in the form "method file protocol".
status: The result status of the request, e.g. 200, 301, 404, 500.
bytes: The size of the response sent back to the client.
Example of log lines in Common Log Format:
127.0.0.1 - - [11/03/2001 12:12:01 -0400] "GET / HTTP/1.0" 200 513
dsl1.myprovider.com - francis [11/03/2001 12:14:01 -0400] "GET /secret/ HTTP/1.0" 200 1256
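To make the field layout concrete, here is a minimal sketch that splits one Common Log Format line into its seven fields using only POSIX shell parameter expansion. The function name clf_parse and the clf_* variables are hypothetical and not part of Lire. The sketch assumes that the first three fields contain no spaces and that the line is well formed.

```shell
#!/bin/sh
# Sketch: split one Common Log Format line with POSIX parameter expansion.
# clf_parse and the clf_* variables are hypothetical names, not part of Lire.
clf_parse() {
    rest=$1
    # The first three fields are assumed to contain no spaces.
    clf_host=${rest%% *};  rest=${rest#* }
    clf_ident=${rest%% *}; rest=${rest#* }
    clf_user=${rest%% *};  rest=${rest#* }
    # The date is everything between '[' and the first ']'.
    clf_date=${rest#\[}
    clf_date=${clf_date%%\]*}
    rest=${rest#*\] }
    # status and bytes are the last two space-separated fields.
    clf_bytes=${rest##* };  rest=${rest% *}
    clf_status=${rest##* }; rest=${rest% *}
    # What remains is the quoted request; drop the surrounding quotes.
    clf_request=${rest#\"}
    clf_request=${clf_request%\"}
}

clf_parse 'dsl1.myprovider.com - francis [11/03/2001 12:14:01 -0400] "GET /secret/ HTTP/1.0" 200 1256'
echo "$clf_user@$clf_host requested '$clf_request': status $clf_status, $clf_bytes bytes"
```

Peeling status and bytes off the end of the line, rather than scanning the request from the left, keeps the sketch working when the request itself contains spaces.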
The combined log format is an extension to the Common Log Format. It adds information about the user agent and referer. It is also known as the extended common log format. It was first implemented in the NCSA httpd web server but is now supported by many web servers. Apache can be configured to use this log format.
Two fields are added at the end of the common log lines:
"referer" "useragent"
referer: The content of the Referer request header. This usually reflects the page the user visited before this request.
useragent: The content of the User-Agent request header. This usually reflects the browser that the user is using.
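Because the two extra fields are quoted, they can be pulled out by splitting each line on the double-quote character. The sketch below does this with awk. The name combined_extras is hypothetical, and the approach assumes that neither the request nor the two headers contain an embedded double quote.

```shell
#!/bin/sh
# Sketch: print the referer and user agent of each combined log line read
# from standard input, one value per output line.
# combined_extras is a hypothetical helper name, not part of Lire.
combined_extras() {
    # Splitting on '"' gives: $2 = request, $4 = referer, $6 = user agent.
    # Lines with fewer quoted fields (plain common log lines) are skipped.
    awk -F'"' 'NF >= 7 { print $4; print $6 }'
}
```

It could be used as `combined_extras < access.log`, where access.log stands for any log file in the combined format.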
Mod_gzip is another extension to the common log format. It is used by the mod_gzip Apache extension which can be used to compress the result of the requests before sending them to the client.
mod_gzip is a module developed by RemoteCommunications, Inc. Source code is freely available from http://www.RemoteCommunications.com/apache/mod_gzip/mod_gzip. More information can be found in their FAQ.
mod_gzip can log information about the compression of pages. To enable this, configure Apache to log using the 'gzip' format, which can be defined as follows:
LogFormat "%h %l %u %t \"%r\" %>s %b %{mod_gzip_result}n %{mod_gzip_compression_ratio}n" gzip
This adds two fields at the end of each common log line:
gzip_result compression_ratio
gzip_result: The gzip result code. Usually OK.
compression_ratio: The ratio by which the content was compressed. A number from 0 to 100.
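Since the two mod_gzip fields are the last two whitespace-separated fields on each line, they are easy to summarize. The following sketch averages the compression ratio over the responses whose result code is OK. The name gzip_average_ratio is hypothetical, and the sketch assumes that the log was written with the 'gzip' format defined above.

```shell
#!/bin/sh
# Sketch: average the compression ratio of the responses that mod_gzip
# reports as OK, reading log lines in the 'gzip' format from standard input.
# gzip_average_ratio is a hypothetical helper name, not part of Lire.
gzip_average_ratio() {
    # $(NF-1) is gzip_result and $NF is compression_ratio.
    awk 'NF >= 2 && $(NF-1) == "OK" { sum += $NF; n++ }
         END {
             if (n > 0) printf "%.1f\n", sum / n
             else print "no compressed responses"
         }'
}
```

It could be used as `gzip_average_ratio < gzip.log`, where gzip.log stands for a log file written in that format.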
The Referer log format is an old format that was implemented in the NCSA httpd server. It was used to log information about the request's referer in a separate log file. The combined log format has made this log format obsolete.
Referer log files have the following format:
uri -> document
uri: The referring URI. This is the content of the Referer request header, which usually reflects the page where the user was before that request.
document: The local document that was referenced by that URI. This is the requested file without any query string.
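As an illustration, the sketch below counts how many referrals each local document received. It assumes that the two fields are separated by " -> ", as in NCSA-style referer logs, and that neither field contains that separator. The name referrals_per_document is hypothetical.

```shell
#!/bin/sh
# Sketch: count referrals per local document in a referer log read from
# standard input. Assumes lines of the form "uri -> document".
# referrals_per_document is a hypothetical helper name, not part of Lire.
referrals_per_document() {
    # $1 is the referring URI and $2 is the local document.
    awk -F' -> ' 'NF == 2 { count[$2]++ }
                  END { for (doc in count) print count[doc], doc }' |
        sort -k 2
}
```

Each output line holds a count followed by the document, sorted by document name.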
You may encounter log files in which each line begins with a field naming the virtual host that the request was for. The rest of the line is usually in the common or combined log format. This kind of logging is typically seen on web servers hosting several virtual servers.
Example of such a line:
www.example.com 1.7.2.21 - - [13/Oct/2000:10:30:16 +0200] "GET / HTTP/1.0" 200 83
Although Lire doesn't directly support such logs, it is easy to split them into several log files in the common or combined log format, which can then be processed by Lire.
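Example doing this in a shell:
$ mkdir apache-common.log
$ # Write each line, minus its virtual host field, to a file named after that host.
$ while read -r virt rest; do \
      printf '%s\n' "$rest" >> "apache-common.log/$virt"; \
  done < /var/log/apache/common.log
$ # Process each per-host log file with Lire.
$ for f in apache-common.log/*; do \
      lr_log2mail -s "$f" www common joe@example.com < "$f"; \
  done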