Chapter 13. WWW Reports

Table of Contents

Supported Log Format
Common Log Format
Combined Log Format
CLF With mod_gzip Extensions
Referer Log Format
Logs With Virtual Host Information
W3C Extended Log Format
Reports' Descriptions and Configuration
Bytes By Day WWW Report
Bytes By Period WWW Report
Bytes Per Directory WWW Report
Bytes By HTTP Result By Day WWW Report
Bytes By HTTP Result By Period WWW Report
Bytes By HTTP Result WWW Report
Client Hosts by Day WWW Report
Client Hosts By Period WWW Report
Requests By Browser WWW Report
Number of Requests By Day WWW Report
Number of Requests By Period WWW Report
Requests By Browser Language WWW Report
Requests By HTTP Method WWW Report
Requests By OS WWW Report
Requests By Result By Day WWW Report
Requests By Result By Period WWW Report
Requests By HTTP Result WWW Report
Requests By Gzip Result WWW Report
Requests By Robot Report
Requests By Top Level Domain Report
Requests By Attack Report
Requests By Keywords Report
Requests By User Agent WWW Report
Number of Requests By Size WWW Report
Number of Requests By Timeslot WWW Report
Requests By HTTP Protocol Version WWW Report
Requests Summary WWW Report
Average Compression By File Type WWW Report
Most Averaged Compressed Requested File WWW Report
Top Client By HTTP Result WWW Report
Top Client WWW Report
Last Pages By Session WWW Report
First Pages By Session WWW Report
Most Requested Pages By Client Host WWW Report
Most Travelled Referer -> Page Connections WWW Report
Top Referring Pages By Requested Page WWW Report
Most Requested Pages WWW Report
Most Requested Tracked Pages By Client Host WWW Report
Requested Tracked Pages By Period WWW Report
Most Requested URLs By Client Host WWW Report
User Sessions By Period WWW Report
Finished and Unfinished Session WWW Report
Visit times User Session WWW Report
Page Counts User Session WWW Report
Filters' Descriptions and Configuration
Select URL Filter
Select Client Host Filter
Exclude URL Filter
Exclude Client Host Filter
Exclude Referer Filter

Supported Log Format

The WWW superservice supports four log file formats, which makes it possible to handle logs from a wide range of web servers such as Apache, IIS or Boa.

Common Log Format

Common Log Format (CLF) is a standard log format that was originally implemented in the CERN httpd web server and is now supported by most web servers. Apache, IIS and Boa can all be configured to log in this format.

A line in the Common Log Format has the following layout:

remotehost rfc931 authuser [date] "request" status bytes
	    
where the fields have the following meaning:
remotehost

The host that made the request. This can be given as an IP address or a hostname.

rfc931

The result of an ident lookup on the host. This is almost never used, so the field usually contains a dash (-).

authuser

The authenticated username.

date

The timestamp of the request.

request

The first line of the request, usually in the format "method file protocol".

status

The result status of the request, for example 200, 301, 404 or 500.

bytes

The size of the response sent back to the client.

Example of log lines in Common Log Format:

127.0.0.1 - - [11/Mar/2001:12:12:01 -0400] "GET / HTTP/1.0" 200 513
dsl1.myprovider.com - francis [11/Mar/2001:12:14:01 -0400] \
"GET /secret/ HTTP/1.0" 200 1256
	    
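As a rough illustration of the field layout described above, the following shell sketch splits a Common Log Format line into its seven fields with awk. The function name parse_clf and the sample line are made up for this example, and this is not how Lire itself parses logs. It assumes that each line ends with the status and bytes fields and that the request is the only quoted field.

```shell
# parse_clf: hypothetical helper, reads CLF lines on stdin and prints
# one "name=value" pair per field.
parse_clf() {
    awk '{
        # The date is the text between the first "[" and the first "]".
        lb = index($0, "[")
        rb = index($0, "]")
        # The request is the text between the first and the last double quote.
        if (lb == 0 || rb < lb || !match($0, /".*"/)) next   # skip malformed lines
        print "remotehost=" $1
        print "rfc931=" $2
        print "authuser=" $3
        print "date=" substr($0, lb + 1, rb - lb - 1)
        print "request=" substr($0, RSTART + 1, RLENGTH - 2)
        # In plain CLF the status and the size are the last two fields.
        print "status=" $(NF - 1)
        print "bytes=" $NF
    }'
}

# Sample line (illustrative data only).
printf '%s\n' '127.0.0.1 - - [11/Mar/2001:12:12:01 -0400] "GET / HTTP/1.0" 200 513' | parse_clf
```

Taking the date from the brackets and the request from the quotes, instead of relying on whitespace splitting alone, keeps the sketch working even though both fields contain spaces.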

Combined Log Format

The combined log format is an extension of the Common Log Format that adds information about the user agent and the referer. It is also known as the extended common log format. It was first implemented in the NCSA httpd web server and is now supported by many web servers. Apache can be configured to use this log format.

Two fields are added at the end of each common log line:

"referer" "useragent"
referer

The content of the Referer request header. This usually reflects the page the user visited before this request.

useragent

The content of the User-Agent request header. This usually reflects the browser that the user is using.
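To make the two extra fields concrete, here is a minimal shell sketch that extracts them from combined log lines. The helper name combined_extras and the sample line are hypothetical. The sketch assumes that neither the referer nor the user agent contains a double quote, which real logs do not always guarantee.

```shell
# combined_extras: hypothetical helper, prints the referer and the user agent
# of each combined log line read on stdin.
combined_extras() {
    # Split on double quotes and count pieces from the end of the line, so
    # that the request field is never inspected. A well-formed line yields
    # at least 7 pieces, the last one being empty.
    awk -F'"' 'NF >= 7 {
        print "referer=" $(NF - 3)
        print "useragent=" $(NF - 1)
    }'
}

# Sample line (illustrative data only).
printf '%s\n' '127.0.0.1 - - [11/Mar/2001:12:12:01 -0400] "GET / HTTP/1.0" 200 513 "http://www.example.com/links.html" "Mozilla/4.0 (compatible; MSIE 5.5)"' | combined_extras
```

Lines in the plain Common Log Format contain only one quoted field, so they produce fewer than 7 pieces and are silently skipped.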

CLF With mod_gzip Extensions

This format is another extension of the Common Log Format. It is used with mod_gzip, an Apache module that can compress responses before sending them to the client.

mod_gzip is a module developed by RemoteCommunications, Inc. Its source code is freely available from http://www.RemoteCommunications.com/apache/mod_gzip/mod_gzip. More information can be found in their FAQ.

mod_gzip can log information about the compression of pages. To enable this, configure Apache to log using the 'gzip' format, which can be defined as follows:

LogFormat "%h %l %u %t \"%r\" %>s %b %{mod_gzip_result}n \
          %{mod_gzip_compression_ratio}n" gzip
	    

This adds two fields at the end of each common log line:

gzip_result compression_ratio
gzip_result

The gzip result code, usually OK.

compression_ratio

The ratio by which the content was compressed, given as a number from 0 to 100.
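As an illustration of how these two trailing fields can be used, the shell sketch below averages the compression ratio over the responses whose result code is OK. The helper name gzip_average and the sample lines are invented for this example, the log is assumed to have been written with the gzip LogFormat shown above, and the calculation is not meant to reproduce any of Lire's reports.

```shell
# gzip_average: hypothetical helper, reads log lines written with the gzip
# LogFormat on stdin and averages the ratio of the compressed responses.
gzip_average() {
    # The result code is the next-to-last field and the ratio is the last one.
    awk 'NF >= 2 && $(NF - 1) == "OK" { sum += $NF; n++ }
         END {
             if (n > 0)
                 printf "%d compressed, average ratio %.1f\n", n, sum / n
         }'
}

# Sample lines (illustrative data only).
printf '%s\n' \
    '127.0.0.1 - - [11/Mar/2001:12:12:01 -0400] "GET / HTTP/1.0" 200 513 OK 75' \
    '127.0.0.1 - - [11/Mar/2001:12:12:05 -0400] "GET /a.html HTTP/1.0" 200 900 OK 50' \
    '127.0.0.1 - - [11/Mar/2001:12:12:09 -0400] "GET /logo.png HTTP/1.0" 200 2048 DECLINED:EXCLUDED 0' \
    | gzip_average
```

Responses with any result code other than OK are left out of the average, since they were not compressed.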

Referer Log Format

The Referer log format is an old format that was implemented in the NCSA httpd server. It was used to log information about the referer of each request in a separate log file. The combined log format has made this log format obsolete.

Referer log files have the following format:

uri -> document
uri

The referring URI. This is the content of the Referer request header, which usually reflects the page the user was on before that request.

document

The local document that was referenced by that URI. This is the requested file without any query string.
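A small shell sketch of how such a line can be taken apart follows. It assumes that the two fields are separated by the string " -> ", as in NCSA referer logs, and that the referring URI itself does not contain that string. The helper name split_referer and the sample line are hypothetical.

```shell
# split_referer: hypothetical helper, reads referer log lines on stdin and
# prints the referring URI and the local document of each line.
split_referer() {
    awk '{
        sep = index($0, " -> ")
        if (sep == 0) next                      # not a referer log line
        print "uri=" substr($0, 1, sep - 1)
        print "document=" substr($0, sep + 4)   # skip the 4-character separator
    }'
}

# Sample line (illustrative data only).
printf '%s\n' 'http://www.example.com/links.html -> /index.html' | split_referer
```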

Logs With Virtual Host Information

You may encounter log files in which each line begins with a field naming the virtual host that the request was for. The rest of the line is usually in the common or combined log format. This kind of logging is typically seen on web servers hosting several virtual servers.

Example of such a line:

www.example.com 1.7.2.21 - - [13/Oct/2000:10:30:16 +0200] \
    "GET / HTTP/1.0" 200 83
	    

Although Lire doesn't directly support such logs, it is easy to split them into one log file per virtual host, each in the common or combined log format, which can then be processed by Lire.

For example, in a shell:

$  mkdir apache-common.log
$  (while read -r virt rest; do printf '%s\n' "$rest" >> \
 "apache-common.log/$virt"; done) < /var/log/apache/common.log
$  for f in apache-common.log/*; do \
 lr_log2mail -s "$f" www common joe@example.com < "$f"; done
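Because the first field of each line ends up being used as a file name, it can be worth checking it before writing anything. The function below is a minimal sketch of the same splitting step with such a check added. The name split_by_vhost and its two arguments are hypothetical, and the set of characters accepted in a host name is an assumption that may need adjusting for your logs.

```shell
# split_by_vhost: hypothetical helper.
#   $1: log file whose lines start with the virtual host name
#   $2: directory receiving one log file per virtual host
split_by_vhost() {
    mkdir -p "$2" || return 1
    while read -r virt rest; do
        # Skip lines whose first field is empty, starts with a dot, or
        # contains anything other than letters, digits, dots and dashes.
        case $virt in
            ''|.*|*[!A-Za-z0-9.-]*) continue ;;
        esac
        # Append the rest of the line, unchanged, to the file of that host.
        printf '%s\n' "$rest" >> "$2/$virt"
    done < "$1"
}
```

It could be called as split_by_vhost /var/log/apache/common.log apache-common.log, after which each file in the output directory can be sent to Lire with the loop shown above.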