PGTS Pty. Ltd.   ACN: 007 008 568

PGTS Humble Blog

Thread: Perl Programming

Gerry Patterson. The world's most humble blogger
Edited and endorsed by PGTS, Home of the world's most humble blogger

Installing A Local Copy Of The W3C Validator


Date: Mon, 30 Mar 2009 20:50:45 +1000

In my Humble Opinion, one of the most important websites to aid webmasters in the production of syntactically correct HTML is the W3C Validation site. However, depending on network performance, submitting your pages to the validator can be slow and tedious. For that reason you might want to get a local copy of your own validator, and run it as required.

This is straightforward because the validator is written in Perl, using the libwww-perl library, and it is all open source. About all you need is Perl 5.8 or later with the libwww-perl (LWP) modules and the OpenSP SGML parser. (NOTE: This is recommended only for developers, programmers and technically proficient webmasters.)
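Before installing anything, it can be worth confirming that the prerequisites are present. The following is a minimal sketch, not part of the validator itself. The helper names have_module and find_on_path are my own, and the program name onsgmls is an assumption (it is the name the OpenSP parser is usually installed under, but your distribution may differ).

```perl
use strict;
use warnings;
use File::Spec;

# Return 1 if the named module can be loaded, otherwise 0.
sub have_module {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # LWP::UserAgent -> LWP/UserAgent.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# Return the full path of an executable found on PATH, or nothing.
sub find_on_path {
    my ($program) = @_;
    for my $dir (File::Spec->path) {
        my $candidate = File::Spec->catfile($dir, $program);
        return $candidate if -f $candidate && -x _;
    }
    return;
}

# Report on the three prerequisites mentioned in the text.
printf "%-16s %s\n", "Perl $]", ($] >= 5.008 ? 'ok' : 'too old');
printf "%-16s %s\n", 'LWP::UserAgent',
    (have_module('LWP::UserAgent') ? 'ok' : 'MISSING');

# 'onsgmls' is an assumed name for the OpenSP parser executable.
my $parser = find_on_path('onsgmls');
printf "%-16s %s\n", 'onsgmls', (defined $parser ? $parser : 'not on PATH');
```

If a line reports MISSING or "not on PATH", install that component before going any further.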

Details of the W3C validator and associated software can be found at www.w3.org/Status.

Instructions for installing the W3C validator can be found at validator.w3.org/docs/install.html.

If you have Ubuntu, you can easily install a Debian version of the package with this one simple command:

sudo apt-get install w3c-markup-validator

However, when I tried installing the validator package on a Kubuntu workstation, I discovered a few little undocumented features. I had to carry out some additional steps before the validator worked in the manner described in the installation instructions linked above.

The steps I took, including the changes I had to make to the Perl CGI script that checks URLs for the validator, are listed below:

  1. First, I checked the config folder /etc/w3c. This isn't, strictly speaking, an additional step. It is one of the steps outlined in the installation instructions, and the config file (validator.conf) is well commented and easy to modify. However, this is a reminder that you should read the installation instructions first. There will be a few things that must be done, since the package uses the Debian defaults (base: /usr/share/w3c-markup-validator), and the root folder is different from the online installation default. In this particular instance, I wanted to validate local IP addresses, so I looked for the line that specified whether private RFC1918 addresses were allowed, and made sure that it was set to yes:
    Allow Private IPs = yes
    
  2. As part of the apt-get installation, a script called /usr/lib/cgi-bin/check had been created. This is the primary Apache CGI script for checking HTML code. However, it required some minor adjustments. First, the paths didn't work. Rather than change all the paths in the config file, I used this command to create a symbolic link:
    sudo ln -s /usr/share/w3c-markup-validator /usr/local/validator
    
  3. The first time I ran the CGI script (as in the installation guide), these warnings were generated:
    [Mon Mar 30 00:38:06 2009] check: Unrecognized escape \H passed through at /usr/lib/cgi-bin/check line 1025.
    [Mon Mar 30 00:38:06 2009] check: Use of uninitialized value in string eq at /usr/lib/cgi-bin/check line 1981.
    [Mon Mar 30 00:38:08 2009] check: Use of uninitialized value in concatenation (.) or string at /usr/lib/cgi-bin/check line 685.
    

    If I had been running the check script as part of a collection of CGI scripts, these warnings would have ended up in the Apache error log.

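    The warning at line 1025 seemed to arise because the author wanted to match a literal dollar character followed by HOMEPAGE, but the installed script contained \H, which Perl did not recognise as an escape sequence. Dollar signs are awkward in Perl patterns because an unescaped $ either interpolates a variable or anchors the match at the end of the string, and outside a pattern \$something is a reference to $something. To sidestep the escaping altogether, I matched the dollar by its hex code inside a character class. I changed line 1025 from this: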
    $cfg->{Badge}->{URI} =~ s|^\HOMEPAGE/|$CFG->{'Home Page'}|;

    To this:
    $cfg->{Badge}->{URI} =~ s|^[\x24]HOMEPAGE/|$CFG->{'Home Page'}|;

    Line 1981 was changed from this:
    if ($q->request_method eq 'POST' and not $File->{'Is Upload'}) {

    To this:
    if ($q->request_method && $q->request_method eq 'POST' and not $File->{'Is Upload'}) {

    The third warning (reported at line 685, presumably where the statement begins) seemed to be caused by a configuration key that had been renamed. I altered line 686 from this:
    $CFG->{Paths}->{SGML}->{Catalog}.":".

    To this:
    $CFG->{Paths}->{SGML}->{CatalogDir}.":".
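The substitution in step 3 can be tried in isolation. The sketch below uses a made-up home page URL in place of $CFG->{'Home Page'}, and the two helper subs are my own names, not part of the check script. It shows that a hex escape in a character class and a backslash-escaped dollar both match a literal dollar sign.

```perl
use strict;
use warnings;

# Hypothetical home page URL, standing in for $CFG->{'Home Page'}.
my $home = 'http://localhost/w3c-validator/';

# Replace a leading literal "$HOMEPAGE/" using a hex escape in a character class.
sub expand_with_hex {
    my ($uri, $base) = @_;
    $uri =~ s|^[\x24]HOMEPAGE/|$base|;
    return $uri;
}

# The same substitution with a backslash-escaped dollar sign.
sub expand_with_backslash {
    my ($uri, $base) = @_;
    $uri =~ s|^\$HOMEPAGE/|$base|;
    return $uri;
}

# Single quotes keep the dollar sign literal in the sample data.
for my $uri ('$HOMEPAGE/images/valid.png', 'http://example.org/badge.png') {
    printf "%-30s -> %s\n", $uri, expand_with_hex($uri, $home);
}
```

Either form should work. I find the character class a little easier to read, since there is no doubt about whether the dollar is being interpolated.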

After making these changes the validator no longer wrote messages to STDERR when it was called.
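To see why the guard added at line 1981 silences the "uninitialized value" warning, here is a small self-contained sketch. It does not use the CGI module. The variable $method simply stands in for the value returned by $q->request_method when the script is run from the command line, and warnings_from is a helper of my own for catching warnings before they reach STDERR.

```perl
use strict;
use warnings;

# Run a code block and return any warnings it raised,
# instead of letting them go to STDERR.
sub warnings_from {
    my ($code) = @_;
    my @seen;
    local $SIG{__WARN__} = sub { push @seen, @_ };
    $code->();
    return @seen;
}

# Undefined, as the request method is when there is no web server.
my $method;

# The original comparison: warns, because undef is compared as a string.
my @unguarded = warnings_from(sub {
    my $is_post = ($method eq 'POST');
});

# The guarded comparison: the test for truth short-circuits first.
my @guarded = warnings_from(sub {
    my $is_post = ($method && $method eq 'POST');
});

printf "unguarded: %d warning(s)\n", scalar @unguarded;
printf "guarded:   %d warning(s)\n", scalar @guarded;
```

The same approach (check that the value is set before comparing it) applies anywhere a CGI script may also be run outside a web server.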

The next step is to hook it up with the Apache CGI scripts. This may require customisation of the template scripts (look in the template folder). If you are happy with the generic W3C pages you can leave the templates as they are. If you are doing this to aid development on your own network or to supply feedback to clients who post HTML to your site, then you will need to modify the templates.


Copyright     2009, Gerry Patterson. All Rights Reserved.