PGTS Humble Blog
Thread: Perl Programming

Gerry Patterson. The world's most humble blogger
Edited and endorsed by PGTS, Home of the world's most humble blogger

Installing A Local Copy Of W3C
Date: Mon, 30 Mar 2009 20:50:45 +1000
In my Humble Opinion, one of the most important websites to aid webmasters in the production of syntactically correct HTML is the W3C Validation site. However, depending on network performance, submitting your pages to the validator can be slow and tedious. For that reason you might want to install a local copy of the validator and run it as required.
This can be done easily since the Validator is written in perl, using the libwww library. Of course it is all open source. About all you need is a 5.8+ version of perl with the libwww modules and the OpenSP SGML parser. (NOTE: It is recommended that only developers, programmers and/or technically proficient webmasters should try this.)
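As a rough sketch, the prerequisites mentioned above could be checked from a shell before installing anything. The function names check and check_prereqs are made up for illustration, and the assumption that OpenSP installs its parser under the command name onsgmls should be verified on your own system.

```shell
#!/bin/sh
# Sketch: report whether the prerequisites named above appear to be present.

# check LABEL CMD...: run CMD quietly and print "LABEL: ok" or "LABEL: MISSING".
check() {
    label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "$label: ok"
    else
        echo "$label: MISSING"
    fi
}

check_prereqs() {
    check "perl" command -v perl
    # $] is perl's numeric version; 5.008 corresponds to 5.8.0
    check "perl 5.8+" perl -e 'exit($] >= 5.008 ? 0 : 1)'
    # LWP is the top-level module of libwww-perl
    check "libwww (LWP)" perl -MLWP -e 1
    # assumption: the OpenSP parser is installed as "onsgmls"
    check "OpenSP (onsgmls)" command -v onsgmls
}

check_prereqs
```

Any line reported as MISSING points at something to install before trying the validator itself.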
Details of w3c and associated software can be found at www.w3.org/Status.
Instructions for installing the w3c validator can be found at validator.w3.org/docs/install.html.
If you have Ubuntu, you can easily install a Debian version of the package with this one simple command:
sudo apt-get install w3c-markup-validator
However, when I tried installing the validator package on a Kubuntu workstation, I discovered a few little undocumented features. I had to carry out some additional steps in order to get the validator working in the manner described in the installation instructions linked above.
The changes that I had to make, including those to the perl CGI script which checks URLs for the validator, are listed below:
- First, I checked the config folder /etc/w3c. This isn't strictly speaking
an additional step. It is one of the steps outlined in the installation
instructions and the config file (validator.conf) is well commented and easy
to modify. However, this is just a reminder that you should read the
installation instructions first. There will be a few things that must be done,
since the package uses the Debian defaults (base:
/usr/share/w3c-markup-validator), and the root folder is different from the
online installation default. In this particular instance, I wanted to validate
local IP addresses, so I looked for the line that specified whether private
RFC1918 addresses were allowed, and made sure that it was set to yes:
Allow Private IPs = yes
- As part of the package installation, a script called
/usr/lib/cgi-bin/check had been created. This is the primary
Apache CGI script for checking HTML code. However, it required some minor adjustments.
First, the paths didn't work. Rather than change all the paths in the config file, I used
this command to create a symbolic link:
sudo ln -s /usr/share/w3c-markup-validator /usr/local/validator
- The first time I ran the CGI script (as in the installation guide), these warnings were generated:
[Mon Mar 30 00:38:06 2009] check: Unrecognized escape \H passed through at /usr/lib/cgi-bin/check line 1025.
[Mon Mar 30 00:38:06 2009] check: Use of uninitialized value in string eq at /usr/lib/cgi-bin/check line 1981.
[Mon Mar 30 00:38:08 2009] check: Use of uninitialized value in concatenation (.) or string at /usr/lib/cgi-bin/check line 685.
If I had been running the check script as part of a collection of CGI scripts, these warnings would have ended up in the Apache error log.
The warning at line 1025 seemed to be due to the fact that the author wanted to match a literal dollar character. This can be awkward in a regular expression, since perl treats $something inside a pattern as a variable to interpolate (and, outside a pattern, \$something is a reference to $something). I changed line 1025 from this:
$cfg->{Badge}->{URI} =~ s|^\HOMEPAGE/|$CFG->{'Home Page'}|;
To this:
$cfg->{Badge}->{URI} =~ s|^[\x24]HOMEPAGE/|$CFG->{'Home Page'}|;
Line 1981 was changed from this:
if ($q->request_method eq 'POST' and not $File->{'Is Upload'}) {
To this:
if ($q->request_method && $q->request_method eq 'POST' and not $File->{'Is Upload'}) {
The third error seemed to be an environment variable that had changed. I altered line 686 from this:
$CFG->{Paths}->{SGML}->{Catalog}.":".
To this:
$CFG->{Paths}->{SGML}->{CatalogDir}.":".
After making these changes the validator no longer wrote messages to STDERR when it was called.
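The adjustments listed above can be re-checked with a few small shell functions. This is only a sketch: the function names are hypothetical, the paths are passed in as arguments rather than hard-coded, and it assumes the config line has the same "Allow Private IPs = yes" form shown earlier and that readlink is available on your system.

```shell
#!/bin/sh
# Sketch: helpers for re-checking the adjustments described above.

# private_ips_allowed CONF: succeed if CONF sets "Allow Private IPs = yes".
private_ips_allowed() {
    grep -Eiq '^[[:space:]]*Allow Private IPs[[:space:]]*=[[:space:]]*yes' "$1"
}

# link_points_to LINK TARGET: succeed if LINK is a symbolic link to TARGET.
link_points_to() {
    [ -L "$1" ] && [ "$(readlink "$1")" = "$2" ]
}

# stderr_is_quiet CMD...: run CMD, discard its stdout, and succeed only
# if it wrote nothing to STDERR (its exit status is ignored).
stderr_is_quiet() {
    err=$("$@" 2>&1 >/dev/null) || true
    [ -z "$err" ]
}
```

With the paths from this post, the three checks would be private_ips_allowed /etc/w3c/validator.conf, link_points_to /usr/local/validator /usr/share/w3c-markup-validator, and stderr_is_quiet /usr/lib/cgi-bin/check. Each returns a zero exit status on success, so they can be combined with && in a script.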
The next step is to hook it up with the Apache CGI scripts. This may require customisation of the template scripts (look in the template folder). If you are happy with the generic W3C pages you can leave the templates as they are. If you are doing this to aid development on your own network or to supply feedback to clients who post HTML to your site, then you will need to modify the templates.