Feedback and Hints, January 2005

This month features a new layout and a blog. Also, the previous month's feedback column has been formatted with the new layout.

Feedback:

Hints for this month:


Message from iman

From: ssss@nodomain
Date: Sat,  1 Jan 2005 11:18:03 +1100 (EST)

Dear Webmaster,

 -- Message Follows --
thanks i have a picture sex
 -- Message Ends --
 From: 80.191.11.17
Agent: Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt; FunWebProducts)

Ed: Well, I'm glad to hear that one of our readers is happy! Enjoy your
pictures, wherever you got them from ... Just be careful you don't go
blind!

Well, he might be happy, but possibly a little confused, since we don't
offer such pictures at this website ...

Perhaps he intended this message for someone else?

Iman (if that is his name) sent this message from a netblock that is
owned by Alborz Computer Co., in Tehran. His browser also has the
FunWebProducts add-in, which according to most MS pundits is spyware. He
might be getting more than he bargained for ...

(Oh what a tangled web we weave ...)

Back To Index


Using awk in vi (again)

Date: Sat, 15 Jan 2005 01:07:33 +1100
From: Gerry Patterson

Ed: Recently I started a blog. The first entry contained a rather lengthy
programming tip, and then I thought it would be better to put it here
(rather than in the blog).

I was using vi (as I do most of the time when I am writing an email or
just about any document). I wanted to include a list of agents that had
visited the PGTS web site, and had the word 'Tiscali' in them. So I used
the following SQL to fetch a list of these in last_visit order:

        cust=> select agent_id, name, last_visit, coalesce(ip_addr,'-')
               from webagents
               where (agent_string like '%Tiscal%'
               or agent_string like '%tiscal%') and robot_ind = 0
               order by last_visit;

That SQL produces a report consisting of four columns separated with a '|'
character. This can easily be turned into a bunch of URLs with an awk
one-liner ... The URLs point to the PGTS agent_string database.

Here is what the output looked like:

 agent_id |  name   |       last_visit       |     coalesce
----------+---------+------------------------+-------------------
     1892 | MSIE    | 2002-10-02 00:12:18+10 | -
     3912 | MSIE    | 2003-01-19 20:21:33+11 | -
     5119 | MSIE    | 2003-03-17 02:25:37+11 | -
     6604 | MSIE    | 2003-05-12 21:50:35+10 | -
     6780 | MSIE    | 2003-05-19 06:51:31+10 | -
    13479 | MSIE    | 2004-02-03 09:51:46+11 | 81.131.120.3/32
    14380 | MSIE    | 2004-02-14 08:21:40+11 | 205.188.209.71/32
    14355 | MSIE    | 2004-02-14 15:46:20+11 | 195.93.34.10/32
    14347 | MSIE    | 2004-02-15 21:22:12+11 | 62.252.0.4/32
    16726 | MSIE    | 2004-03-15 10:25:49+11 | 82.3.65.27
    16901 | MSIE    | 2004-04-07 03:42:42+10 | 82.84.196.145
    20300 | Tiscali | 2004-06-06 20:45:30+10 | 80.46.188.101
    21189 | MSIE    | 2004-07-09 07:07:12+10 | 81.152.109.248
    23415 | MSIE    | 2004-11-01 09:14:17+11 | 193.60.159.61
    24192 | MSIE    | 2004-12-07 20:15:21+11 | 196.30.113.163
     1450 | MSIE    | 2004-12-08 02:07:34+11 | 62.11.130.225
     3844 | MSIE    | 2004-12-13 07:20:37+11 | 81.26.104.195
(17 rows)

I then cut and pasted the output from the SQL into the vi session I was
running, and filtered it with the following command:

!}!awk '{print "<a href=|http://www.pgts.com.au/cgi-bin/psql?w0302=" $1 "|>" $3,$5,$NF "</a>"}'

(The first two keystrokes, !}, tell vi to filter from the cursor to the
end of the paragraph through the command that follows; the script in
single quotes is an awk one-liner.)

This was then followed by the substitute command:

        :.,.+16s/|/"/g

(which changes every '|' character into a '"'), producing the following
text:

<a href="http://www.pgts.com.au/cgi-bin/psql?w0302=1892">MSIE 2002-10-02 -</a>
<a href="http://www.pgts.com.au/cgi-bin/psql?w0302=3912">MSIE 2003-01-19 -</a>
<a href="http://www.pgts.com.au/cgi-bin/psql?w0302=5119">MSIE 2003-03-17 -</a>
<a href="http://www.pgts.com.au/cgi-bin/psql?w0302=6604">MSIE 2003-05-12 -</a>
<a href="http://www.pgts.com.au/cgi-bin/psql?w0302=6780">MSIE 2003-05-19 -</a>
<a href="http://www.pgts.com.au/cgi-bin/psql?w0302=13479">MSIE 2004-02-03 81.131.120.3/32</a>
<a href="http://www.pgts.com.au/cgi-bin/psql?w0302=14380">MSIE 2004-02-14 205.188.209.71/32</a>
<a href="http://www.pgts.com.au/cgi-bin/psql?w0302=14355">MSIE 2004-02-14 195.93.34.10/32</a>
<a href="http://www.pgts.com.au/cgi-bin/psql?w0302=14347">MSIE 2004-02-15 62.252.0.4/32</a>
<a href="http://www.pgts.com.au/cgi-bin/psql?w0302=16726">MSIE 2004-03-15 82.3.65.27</a>
<a href="http://www.pgts.com.au/cgi-bin/psql?w0302=16901">MSIE 2004-04-07 82.84.196.145</a>
<a href="http://www.pgts.com.au/cgi-bin/psql?w0302=20300">Tiscali 2004-06-06 80.46.188.101</a>
<a href="http://www.pgts.com.au/cgi-bin/psql?w0302=21189">MSIE 2004-07-09 81.152.109.248</a>
<a href="http://www.pgts.com.au/cgi-bin/psql?w0302=23415">MSIE 2004-11-01 193.60.159.61</a>
<a href="http://www.pgts.com.au/cgi-bin/psql?w0302=24192">MSIE 2004-12-07 196.30.113.163</a>
<a href="http://www.pgts.com.au/cgi-bin/psql?w0302=1450">MSIE 2004-12-08 62.11.130.225</a>
<a href="http://www.pgts.com.au/cgi-bin/psql?w0302=3844">MSIE 2004-12-13 81.26.104.195</a>

(Try it yourself ... Go back and cut the report out, and then enter the
commands into vi.)
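Incidentally, you can avoid the substitute command altogether by escaping
the double quotes inside the awk string. This is standard awk/shell
quoting, though I have not run this variation against the data above, so
treat it as a sketch:

!}!awk '{print "<a href=\"http://www.pgts.com.au/cgi-bin/psql?w0302=" $1 "\">" $3,$5,$NF "</a>"}'

Because the whole script is wrapped in single quotes for the shell, the
\" sequences survive intact for awk, which prints them as literal double
quotes, so the anchors come out fully formed in one pass.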

Back To Index


CGI postgres/perl script

From: Sudheer
Date: Sat, 15 Jan 2005 17:08:08 +1100 (EST)

Dear Webmaster,

 -- Message Follows --
How do I connect my postgres database to perl?
I have used the following script in my perl program.
#!/usr/bin/perl
use CGI qw(:standard);
use DBI;
use strict;

use CGI::Carp qw/fatalsToBrowser/;
my $query = new CGI;
print $query->header;
# my $dbname = 'test';

# Connect to database (the commented-out lines are the old Pg module style)
# my $conn = Pg::connectdb("dbname=$dbname");
# if ($conn->status != PGRES_CONNECTION_OK) {
#    die "ERR: Couldn't open connection: ".$conn->errorMessage."\n";
# }

my $dbh = DBI->connect("dbi:Pg:dbname=test", "postgres", "",
                       {AutoCommit => 1, RaiseError => 1});
print $query->start_html("FIREWALL");
print "<center><H1>FIREWALL</H1></center>\n";
&print_prompt($query);
&do_work($query);
&print_tail;
$dbh->disconnect;
print $query->end_html;

sub print_prompt {
   my($query) = @_;
   print $query->start_multipart_form;
   print "<BR>IP Address:";
   print $query->popup_menu(-name=>'IP Address',
                            -Values=>['10.0.0.1','10.0.0.2','10.0.0.3',
                                      '10.0.0.4','10.0.0.5','10.0.0.6'],
                           );
   print "Source IP Address:";
   print $query->textfield('numeric');
   print "Destination IP Address:<BR>";
   print $query->textfield('Dest');
   print $query->radio_group(-name=>'Disable Enable',
                             -Values=>['Disable','Enable'],
                            );
   print "<P><center>",$query->submit('Action','save');
   print $query->submit('Action','cancel');
   print $query->endform;
}


 -- Message Ends --
 From: 202.153.35.138
Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20040206
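
Ed: The DBI->connect line in Sudheer's script is essentially the right
way to do it. For anyone else wondering, a minimal self-contained
connection looks something like this (a sketch only; the database name,
user and table names are placeholders, not anything from Sudheer's
system):

        #!/usr/bin/perl
        use strict;
        use DBI;

        # connect to a local postgres database called 'test' as user 'postgres'
        my $dbh = DBI->connect("dbi:Pg:dbname=test", "postgres", "",
                               {AutoCommit => 1, RaiseError => 1});

        # run a query and print each row
        my $sth = $dbh->prepare("select id, name from mytable");
        $sth->execute;
        while (my ($id, $name) = $sth->fetchrow_array) {
                print "$id\t$name\n";
        }
        $dbh->disconnect;

With RaiseError set, DBI dies with a useful message if the connection or
the query fails, which saves a lot of manual status checking.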

Back To Index


Sqworm -- what is it?

Date: Tue, 18 Jan 2005 11:39:49 +1100
From: Jeremy Hill

Ed: The following message was sent from the PGTS Postmaster for update
of the robots file.

Dear Webmaster,

---------------
    Name:Sqworm
 Version:2.9.85
   Email:
Comments: 'what is Sqworm? it was poking around my server.'
     URL:
---------------

Ed: I replied as follows:

G'day,

Not really sure what Sqworm is. When did you see it?

The last time I saw it was 2004-08-04 00:16:37 GMT.

The netblock 66.194.6/24 is currently owned by Websense (delegated by
Time Warner Telecom).

This is all I can discover about them:
	OrgName:    Websense
	OrgID:      WEBSEN-1
	Address:    10240 Sorrento Valley Rd
	City:       San Diego
	StateProv:  CA
	PostalCode: 92121
	Country:    US

	NetRange:   66.194.6.0 - 66.194.6.255
	CIDR:       66.194.6.0/24
	NetName:    TWTC-NETBLK-4
	NetHandle:  NET-66-194-6-0-1
	Parent:     NET-66-192-0-0-1
	NetType:    Reassigned
	RegDate:    2004-01-26
	Updated:    2004-01-26

As far as I can tell, the agent they dispatched is a robot. It didn't
read robots.txt. It is not aggressive and does not appear impolite.

There have been far more hits to my site from people searching for
information about Sqworm than there have been from Sqworm itself.

I find nothing suspicious about this organisation or their robot (except
the name, which is a bit peculiar).

Cheers,

Gerry Patterson

For more information, see below.

Back To Index


Re: Sqworm -- what is it?

From: Jeremy Hill
Date: Tue, 18 Jan 2005 04:13:35 GMT

Ed: Jeremy replied as follows:

Gerry,

I am new to web hosting. I set up an Abyss server on my pc 3 days ago,
and am just learning to look over my logs. Here is an excerpt from my
log file when Sqworm showed up. Directly after that hit I had some
unidentified hits poking around looking for scripts and executables.
This is somewhat disturbing to me.

-------------

66.194.6.4 - - [16/Jan/2005:08:37:31 -0600] "GET / HTTP/1.0" 200 9961 "" "Sqworm/2.9.85-BETA (beta_release; 20011115-775; i686-pc-linux-gnu)"
68.44.229.242 - - [16/Jan/2005:09:45:06 -0600] "GET /scripts/root.exe?/c+dir HTTP/1.0" 404 223 "" ""
68.44.229.242 - - [16/Jan/2005:09:45:06 -0600] "GET /MSADC/root.exe?/c+dir HTTP/1.0" 404 223 "" ""
68.44.229.242 - - [16/Jan/2005:09:45:06 -0600] "GET /c/winnt/system32/cmd.exe?/c+dir HTTP/1.0" 404 223 "" ""
68.44.229.242 - - [16/Jan/2005:09:45:06 -0600] "GET /d/winnt/system32/cmd.exe?/c+dir HTTP/1.0" 404 223 "" ""
68.44.229.242 - - [16/Jan/2005:09:45:07 -0600] "GET /scripts/..%255c../winnt/system32/cmd.exe?/c+dir HTTP/1.0" 400 225 "" ""
68.44.229.242 - - [16/Jan/2005:09:45:07 -0600] "GET /_vti_bin/..%255c../..%255c../..%255c../winnt/system32/cmd.exe?/c+dir HTTP/1.0" 400 225 "" ""
68.44.229.242 - - [16/Jan/2005:09:45:07 -0600] "GET /_mem_bin/..%255c../..%255c../..%255c../winnt/system32/cmd.exe?/c+dir HTTP/1.0" 400 225 "" ""
68.44.229.242 - - [16/Jan/2005:09:45:07 -0600] "GET /msadc/..%255c../..%255c../..%255c/..%c1%1c../..%c1%1c../..%c1%1c../winnt/system32/cmd.exe?/c+dir HTTP/1.0" 400 225 "" ""
68.44.229.242 - - [16/Jan/2005:09:45:07 -0600] "GET /scripts/..%c1%1c../winnt/system32/cmd.exe?/c+dir HTTP/1.0" 404 223 "" ""
68.44.229.242 - - [16/Jan/2005:09:45:08 -0600] "GET /scripts/..%c0%2f../winnt/system32/cmd.exe?/c+dir HTTP/1.0" 404 223 "" ""
68.44.229.242 - - [16/Jan/2005:09:45:08 -0600] "GET /scripts/..%c0%af../winnt/system32/cmd.exe?/c+dir HTTP/1.0" 404 223 "" ""
68.44.229.242 - - [16/Jan/2005:09:45:08 -0600] "GET /scripts/..%c1%9c../winnt/system32/cmd.exe?/c+dir HTTP/1.0" 404 223 "" ""
68.44.229.242 - - [16/Jan/2005:09:45:08 -0600] "GET /scripts/..%25%35%63../winnt/system32/cmd.exe?/c+dir HTTP/1.0" 400 225 "" ""
68.44.229.242 - - [16/Jan/2005:09:45:09 -0600] "GET /scripts/..%252f../winnt/system32/cmd.exe?/c+dir HTTP/1.0" 400 225 "" ""

-----------

I don't know if the 2 IPs are related. I traced the Sqworm IP (using the
'Tracert' tool at www.DNSstuff.com) to a record for "Office of the
Future", apparently last registered in 1992. The IP goes to a UNIX
server {core-02-so-0-0-0-0.phnx.twtelecom.net.}, where it appears there
is a firewall blocking the 'Tracert' tool.

The other IP goes to a Microsoft server on Comcast
{pcp0010510861pcs.limstn01.de.comcast.net.}

As I said before, I don't know if these two entries are related, but they
appear in my log exactly as recorded above.

Hopefully this is useful information.

Thanks,

Jeremy Hill

Ed: Jeremy,

It is unlikely that these entries are related.

The hits from 68.44.229.242 are from a worm which infects only IIS
servers. I believe it may be a variant of Nimda.

For more information about the worm see below.

Sqworm, despite the name, is actually a robot. On my site it behaves
like one.

It appears that someone else sent some feedback about Sqworm here.
Unfortunately, at the time I was neglecting my website. So ... my
apologies to Martha. I have since done some research on Sqworm and
investigated Websense.

It appears that the robot Sqworm is ridgy-didge (that's Australian for
OK). The netblock is owned by Websense, as previously noted. Websense are
a medium to large US corporation who sell web-filtering software for
proxy servers. They claim to be able to categorise websites, and this
robot must be part of that categorisation effort; they probably use it to
build their website database.

The name "worm" has come into considerable disrepute. And this no doubt
accounts for the general sense of unease that surrounds the Sqworm
robot. It is nevertheless, unfounded. Sqworm is only guilty of being
christened with what, at best, could be described as "an unfortunate
choice of name".

Also I believe that Websense could do a better job of explaining what
their robot does and how their categories work. The whole enterprise
seems too secretive.

If anyone knows different, please let me know.

There is more information about Sqworm in the PGTS blog.

Nimda:
The second IP address in Jeremy's email was from a real worm. This
netblock is owned by Comcast Cable Communications, Inc., Cherry Hill NJ,
a large US cable company. The nethandle is NET-68-44-0-0-1, CIDR
68.44.0.0/15.

Someone is running an IIS server on one of Comcast's addresses and it has
been infected.

This IP address returns 7 positives from the openRBL.

So their Internet experience is likely to be somewhat diminished. They
probably don't know any better.

Blars has listed them ... He's a cruel man. Cruel but fair.

Back To Index


Just Testing

From: Anonymous
Date: Wed, 19 Jan 2005 11:24:26 +1100 (EST)

Ed: I received the following notification from the PGTS postmaster:

Dear Webmaster,

My agent string is: Interwebthingy Exploder 8.0000001 (standards compatible; W3C 4.01; [en-GB])
My IP address is: 217.44.1.8
---------------
    Name:unknown
 Version:
Comments: This is a fake string I entered because I felt like it. And
          because it helps me analyse my own logfiles, since I can
          filter out my own visits.
   Owner:
      OS:

Ed: I use a combination of agent_string and IP address. That way
webmaster hits do not get counted in any stats.
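
In a Perl log filter, that amounts to something like the following
minimal sketch (the IP address and agent pattern here are made up for
illustration; substitute your own):

        #!/usr/bin/perl
        use strict;

        # hypothetical values standing in for the webmaster's real ones
        my $MY_IP    = '203.0.113.7';
        my $MY_AGENT = qr/Interwebthingy Exploder/;

        my %hits;
        while (<>) {
                # combined log format: the IP is the first field and the
                # agent is the last quoted string on the line
                my ($ip, $agent) = m/^(\S+) .* "([^"]*)"$/ or next;
                next if $ip eq $MY_IP && $agent =~ $MY_AGENT;  # skip my own visits
                $hits{$ip}++;
        }
        print "$hits{$_}\t$_\n" for sort { $hits{$b} <=> $hits{$a} } keys %hits;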

BTW: Coming soon ... An updated version of the agent_string information
panels that will enable readers to add their own comments.

Back To Index


PGTS website

From: Stephen Hampson
Date: Wed, 19 Jan 2005 13:06:14 +1100

Hi Gerry,

I'm not sure if you are aware, but a lot of the time when I try to reach
your website it is unavailable. The DNS seems not to resolve. When I try
to ping your hostname I get:

    "Ping request could not find host www.pgts.com.au. Please check the
    name and try again."

Thanks

Steve

Ed: I thought that this might be an upstream network problem, since I
have accessed my website many times from within the Melbourne
metropolitan area without any obvious problems. It turned out that it was
indeed a network problem, and the problem was my network!

More below ...

Back To Index


PGTS website (2)

Date: Thu, 20 Jan 2005 13:14:19 +1100
From: Stephen Hampson

Gerry,

I'm not sure what you are looking for with the pings, but I'll forward
them to you.

As I said before, the DNS server on 203.213.17.10 is either not
running or is unreachable.

If you log into that box and run:

dig @127.0.0.1 www.pgts.com.au

What is the output?

Thanks
Steve

Ed: Unfortunately I did not read Steve's email carefully. I checked my
primary DNS, saw that it was working (of course) ... But he was asking
about my secondary DNS!

He was away the next day, and did not get my email till Monday ...

Fortunately he persisted ... and in one of those awful (dohhh!) moments,
I realised that my secondary DNS was not running! I had altered some of
the startup scripts and restarted the server when I upgraded to postgres
7.4.5. Deep embarrassment here! I started named.

(see Steve's next email here)

Back To Index


MSNbot from 207.46.98.35

From: Chemzoo
Date: Thu, 20 Jan 2005 17:51:25 +1100 (EST)

Dear Webmaster,

Previous Values
---------------
    Name:Msnbot
 Version:0.3
   Email:
Comments: I got ip address of msnbot as 207.46.98.35, which is not listed

Ed: Chemzoo, the address 207.46.98.35 belongs to Microsoft Corp, as does
the MSNbot.

Back To Index


Linknzbot

From: "Dave"
Date: Sun, 23 Jan 2005 14:10:31 +1300

> ----- Original Message -----
> From: "PGTS Webmaster" <webmaster@pgts.com.au>
> To: <webmaster@linknz.co.nz>
> Sent: Sunday, January 23, 2005 12:10 PM
> Subject: Link With PGTS.com.au

Hi, and thanks for listing my search engine on your web site. I have
just sent in my web spidering robot (linknzbot) to index your web site
into the search engine. It reported the following searchable links from
your web site:
links found : 17
http://www.pgts.com.au/
http://www.pgts.com.au/page01.html
http://www.pgts.com.au/page02.html
http://www.pgts.com.au/page03.html
http://www.pgts.com.au/download/
http://www.pgts.com.au/page04.php
http://www.pgts.com.au/pgtsj/
http://www.pgts.com.au/cgi-bin/pgtsj?file=pgtsj0412
http://www.pgts.com.au/cgi-bin/pgtsj?file=pgtsj0211a
http://www.pgts.com.au/pgtsj/0501.html
http://www.pgts.com.au/cgi-bin/pgtsj?file=pgtsj0204
http://www.pgts.com.au/cgi-bin/pgtsj?file=pgtsj0204a
http://www.pgts.com.au/cgi-bin/pgtsj?file=pgtsj0204b
http://www.pgts.com.au/cgi-bin/psql?feedback
http://www.pgts.com.au/cgi-bin/pgtsj?file=pgtsj0309b
http://www.pgts.com.au/cgi-bin/psql?blog
http://www.pgts.com.au/cgi-bin/pgtsj?file=pgtsj0302a
Optimizing tables...
Indexing complete!

People who use the linknz search engine can now find and visit your web
site.

Many regards

Dave Andrews

P.S. You should find a new USER AGENT STRING in your log files for the
linknzbot, which is our web spidering robot, written in C++.

Ed: G'day Dave,

Thanks for your feedback.

I started the Robots Collections a couple of years ago, as a sort of
hobby, because I was looking for things to write about when I started
publishing articles online. It has turned out to be a very popular topic
with webmasters, and for that reason I am trying to categorise Bots when
I see them. I have now settled on the following categories (not mutually
exclusive):

	Aggressive
	Hard-hitting
	Impolite
	Suspicious

The default category is blank; this implies that the Bot is polite,
obeys robots.txt instructions, does not attempt to cloak or disguise its
behaviour, and is not associated with criminal activities or spamming
(which these days looks increasingly as if it will also be considered
criminal). Over the next couple of months, I will be updating the
collection to reflect these categories.

I almost classified your Bot as Hard-hitting. 366 hits in 12 minutes is
sort of borderline. However, since they were mostly HEADs rather than
GETs, I just made a note in the comments section.

Might I suggest that if you intend to make your crawl sessions any
longer, you should introduce a delay.
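
Something along these lines would do it, using LWP (a minimal sketch;
the agent name, URL list and one-second delay are illustrative only):

        #!/usr/bin/perl
        use strict;
        use LWP::UserAgent;

        my $ua = LWP::UserAgent->new(agent => 'examplebot/0.1');
        my @urls = ('http://www.pgts.com.au/',
                    'http://www.pgts.com.au/page01.html');

        for my $url (@urls) {
                my $res = $ua->head($url);  # HEAD request, as linknzbot mostly sends
                print $url, " => ", $res->status_line, "\n";
                sleep 1;  # pause between requests to go easy on the server
        }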

Regards,

Gerry Patterson
SysAdmin PGTS Pty Ltd

Back To Index


PGTS website (3)

Date: Mon, 24 Jan 2005 10:13:57 +1100
From: "Stephen Hampson"

Yeah, no worries. That was what I was trying to tell you. It looks like
Windows DNS does not fail over properly. So if it hits the secondary DNS
server (a 50% chance, because it works in a round-robin way), then there
is a fair chance it will never try the primary DNS server and will just
fail.

Just tried your site then and it works. It wasn't working about 30 mins
ago due to this issue. I'll let you know if I experience problems again.

Ed: And I am very grateful that Steve persisted ... despite me being so
relentlessly thick-headed!

Of course it helps if named is running on your DNS server!

I hadn't realised that Windows DNS servers worked that way ... Obviously
I am going to have to fix up those startup scripts. And to all those
other people with Windows DNS: you may find that my site works more than
50% of the time now!

Back To Index


"bad file" error in coldarch script

From: Grant September
Date: Mon, 24 Jan 2005 22:31:08 +1100 (EST)

Dear Webmaster,

 -- Message Follows --
Hi there, I tried to use your coldarch.pl file on NT, but kept on
getting a 'Bad file' error. I think this is due to my path having a
space in the name. Do you have a workaround for this, as I have no Perl
programming skills?

Ed: I'm sorry Grant, but I have not allowed for spaces in data_file
names.
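
For what it's worth, the usual culprit on Windows is a path that gets
handed to the shell unquoted. I haven't dug into coldarch.pl to confirm
that is what is happening here, but in Perl the general fix looks like
this hypothetical example:

        # quote a Windows path that contains spaces before shelling out
        my $path = 'C:\Program Files\logs\access.log';
        system(qq{type "$path"});  # the quotes stop the shell splitting on the space

Without the embedded quotes, cmd.exe would treat C:\Program and
Files\logs\access.log as two separate arguments.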

Back To Index


New Robot LimeBot

From: Kenn Sippell
Date: Tue, 25 Jan 2005 05:08:31 +1100 (EST)

Dear Webmaster,

 -- Message Follows --
I would like you to be aware of the web robot known as LimeBot - written
as part of the free enterprise initiative Cruise Lime. The bot
identifies itself with the User-Agent:

'LimeBot/1.0 (+www.cruiselime.com/LimeBot.php)'.

I would appreciate it if you included it in your databases for the
convenience of those who use your search.

 -- Message Ends --


Ed: I replied as follows ...

Kenn,

The agent_strings are added automatically when visitors come to this
site. If you are part of the Cruise Lime Enterprise, then I suggest you
send the Limebot to my site. It will be included within 24 hours. I will
approve the entry when I see it.

The LimeBot was first seen in the PGTS logs at 2005-01-25 11:12 GMT.

Back To Index


Getting file types from log file

From: Khoa Nguyen
Date: Fri, 28 Jan 2005 12:31:08 +1100 (EST)

Dear Webmaster,

 -- Message Follows --
Hi, thanks for the info posted on the site. It has been useful for a
better understanding of the log file.

On the web page, you print out 4 columns: IP, Date, Time, and Target. I
have trouble separating the file name from the file type. For example,
for the Target part, what if I only want to print out the type of each
file request (txt, html, ...) instead of the whole file name (data.html)?

What if I only want to count how many hits each file type has, instead
of how many times the file is requested? When searching through the log
file, I could search for the file request, but couldn't search for the
file type.

I have something like "/\/[\w\d\-\.]+\s/". Please advise. Much
appreciated. Thanks.


Ed: Khoa,

Sorry not to answer this sooner.

I have been trying to find time to re-write some of the scripts, and I
was hoping to come up with a script that answers this.

Unfortunately I did not get around to doing it.

I do have a script (which is hooked into the PGTS database), and the way
I try to determine the type is by examining the file extensions.
Obviously this will be site dependent. However, you could come up with a
general solution which uses standard MIME types.

The script I have is actually rather complex. It examines each IP
address and looks for "Robot" behaviour.

e.g. suppose we have a subroutine called parse_log which splits each
line into its individual components; then:

        # assign the components of the log line
        my ($IP,$htime,$cmd,$status,$bytes,$referer,$agent) = parse_log($_);

        # extract the requested page from the command,
        # e.g. "GET /index.html HTTP/1.0"
        my ($page) = $cmd =~ m{^\w+\s+(\S+)};

        # then we can test for the MIME types
        unless ($page){
                $page_type = $NULL;
        }
        elsif ($page =~ /\.jpg$/ || $page =~ /\.gif$/ ){
                $page_type = $IMAGE;
        }
        elsif ($page =~ /\.exe$/ || $page =~ /\.gz$/ ||
               $page =~ /\.msi$/ || $page =~ /\.zip$/){
                $page_type = $BIN;
        }
        elsif ($page =~ /robots\.txt$/ ){
                $page_type = $ROBOT;
        }
        elsif ($page =~ /\.txt$/ || $page =~ /-HOWTO$/ ){
                $page_type = $TXT;
        }

etc ... etc

This is rather cumbersome, but it allows you to tailor the results for
your site. You may have particular standards (you may only use gif, jpg
and png), or you may do as I do and put all images in the same folder,
in which case you would test the first part of $page to see if it was in
the folder called /image, etc.
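
A more compact way to do the same thing, which also answers the counting
question directly, is to drive it from a hash of extensions. This is a
sketch only; extend the table and the log regex to suit your own site:

        #!/usr/bin/perl
        use strict;

        # map file extensions to broad types -- tailor this to your site
        my %type_of = (
                jpg => 'IMAGE', gif => 'IMAGE', png => 'IMAGE',
                exe => 'BIN',   gz  => 'BIN',   msi => 'BIN', zip => 'BIN',
                txt => 'TXT',   html => 'HTML', htm => 'HTML',
        );

        my %hits;
        while (<>) {
                # pull the target out of a common/combined format log line
                my ($page) = m{"\w+\s+(\S+)} or next;
                $page =~ s/\?.*//;                # drop any query string
                my ($ext) = $page =~ /\.(\w+)$/;  # file extension, if any
                my $type = $page =~ m{robots\.txt$} ? 'ROBOT'
                         : defined $ext ? ($type_of{lc $ext} || 'OTHER')
                         : 'NULL';
                $hits{$type}++;                   # count hits per file type
        }
        print "$_\t$hits{$_}\n" for sort keys %hits;

Run it as: perl count_types.pl access.log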

Hope this helps.

Back To Index