W3C HTML Validator

The following are instructions for getting the W3C HTML Validator running on Microsoft Windows NT (XP Pro 2002). I've never seen instructions elsewhere for running the Validator on Windows, nor is Windows on W3C's list of supported platforms, so I figured this might be useful.

This is about my fourth attempt over the years to get the W3C HTML Validator running on Microsoft Windows. Previous attempts were on Windows 98SE; I haven't tried these instructions on Windows 98SE yet.

These instructions are based on the following software versions:

Apache (win32 native build, NOT cygwin)
10:34pm [ord@chimera ~] > apache -v
Server version: Apache/1.3.29 (Win32)
Server built: Oct 29 2003 08:39:07

Cygwin
10:34pm [ord@chimera ~] > uname -a
CYGWIN_NT-5.1 chimera 1.5.5(0.94/3/2) 2003-09-20 16:31 i686 unknown unknown Cygwin

Perl
10:34pm [ord@chimera ~] > perl -v

This is perl, v5.8.0 built for cygwin-multi-64int

Note: My first recent attempt at getting the W3C HTML Validator working on Windows NT succeeded, but the validator would not recognise any charset except utf-8. I don't know how I fixed this; I simply tried again from scratch and had success.

Note: I typically run Apache as a service under the SYSTEM account, but so far I can't get the SGML parser to run properly in that environment. These instructions assume Apache is run from the command line (from my user account).

Perl requirements:

CGI.pm - Already installed, either with Perl itself or from an earlier manual install. No specific comments.
CGI::Carp - Ditto.
File::Spec - Ditto.
HTML::Parser - Manually installed previously.
LWP::UserAgent - Manually installed previously.
Set::IntSpan - Used Set-IntSpan-1.07. Built OOTB.
Text::Iconv - Used Text-Iconv-1.2. This is where I've gotten stuck in the past. http://www.cygwin.com/ml/cygwin/2002-08/msg01384.html solves the problem: patch the Perl makefile (Makefile.PL) to link against the iconv library ( 'LIBS' => ['-liconv'], # e.g., '-lm').
URI::Escape - Previously installed.

Basically, the only two Perl modules I built specifically for the validator were Set::IntSpan and Text::Iconv. Any others (in case I've missed any) were previously built or came with Perl.
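
To confirm which of the modules above Perl can actually load, a small shell check can help. This is a minimal sketch, not part of the validator: the function name missing_perl_modules is made up for illustration, and it assumes perl is on the PATH of the cygwin shell.

```shell
#!/bin/sh
# missing_perl_modules MODULE...
# Hypothetical helper: prints the name of every listed module that
# "perl -MModule -e 1" fails to load, one per line. Prints nothing
# when all of them load.
missing_perl_modules() {
    for mod in "$@"; do
        perl -M"$mod" -e 1 >/dev/null 2>&1 || echo "$mod"
    done
}
```

For example, running "missing_perl_modules Set::IntSpan Text::Iconv" should print nothing once both modules are built and installed.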

SGML Requirements

Used OpenSP 1.5 - compiles OOTB on cygwin without a hitch: ./configure, make, make install.
One note is that onsgml will be installed to /usr/local/bin. For some really strange reason, onsgml can't find cygwin1.dll when run from Perl via the web server, despite c:\cygwin\bin being in the %PATH%. (It runs fine from /usr/local/bin in a shell.) Either install onsgml to /usr/bin (or /bin, since /usr/bin and /bin are generally the same directory on cygwin), or copy cygwin1.dll to /usr/local/bin (which is a Bad Idea(TM)).
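
One way to act on the first option, sketched under assumptions: OpenSP's configure script is autoconf-generated, so the standard --bindir option should control where onsgml is installed. Both function names and the source directory argument are hypothetical, and this exact invocation is untested against OpenSP 1.5.

```shell
#!/bin/sh
# build_opensp SRCDIR
# Hypothetical helper: configure, build and install OpenSP from SRCDIR so
# that the binaries (including onsgml) land in /usr/bin. --bindir is a
# standard autoconf option; assumed to be honoured by OpenSP's configure.
build_opensp() {
    (
        cd "$1" || exit 1
        ./configure --bindir=/usr/bin && make && make install
    )
}

# dll_beside PARSER [DLLNAME]
# Succeeds if PARSER exists and DLLNAME (default cygwin1.dll) sits in the
# same directory, which is the layout recommended above.
dll_beside() {
    parser=$1
    dll_name=${2:-cygwin1.dll}
    [ -f "$parser" ] && [ -f "$(dirname "$parser")/$dll_name" ]
}
```

After installing, "dll_beside /usr/bin/onsgml" gives a quick yes/no answer (via its exit status) on whether the parser sits next to cygwin1.dll.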

Apache Requirements:

I run the validator from a named virtual host, therefore no main Apache config changes were required.

Note that some of these directives may be redundant; they are there because, for example, I don't have cgi-script handlers set in the main server configuration. Similarly, note the rewrite directives: the rewrite module needs to be loaded in the main server configuration, if it isn't already. Further comments are in the virtual host configuration itself.

The virtual host looks like this:

<VirtualHost *>
	# Note: /vhosts/validator.home/ is an example path.
	# Replace every instance of it with the correct path (which on Apache win32
	# won't start with a forward slash; it starts with a drive specification
	# or network path).
	ServerName www.validator.home
	ServerAdmin webmaster@www.validator.home
	DocumentRoot "/vhosts/validator.home/htdocs"
	ErrorLog logs/www.validator.home-error.log
	CustomLog logs/www.validator.home-access.log common
	AddHandler cgi-script .pl
	# Given that I'm using "ScriptInterpreterSource registry" instead of
	# the shebang line, I can't find a way to make check execute. A quick fix
	# is to rename it to check.pl and rewrite requests for check to check.pl.
	#
	# Also, to save messing around with validator.home/htdocs/ (document root)
	# and validator.home/httpd/cgi-bin/ (location of check script in distribution)
	# (because I can't get it to work, given Apache win32's lack of support for
	# symlinks), I've just copied httpd/cgi-bin/check to htdocs/check.pl. This
	# is the only file whose location needs to change.
	RewriteEngine on
	RewriteRule   ^/check /check.pl
	# The following comes from httpd/conf/httpd.conf in
	# the distribution.
	#
	# ExecCGI at least is required.
	<Directory /vhosts/validator.home/htdocs>
		Options              ExecCGI IncludesNOEXEC Indexes MultiViews
#		AllowOverride        None
		AddHandler           server-parsed .html
		AddCharset           utf-8         .html
	</Directory>
</VirtualHost>
		

Validator Setup

Extract the source distribution of the validator to the directory specified in the virtual host configuration.

Copy httpd/cgi-bin/check to htdocs/check.pl.

The next step is to patch check.pl, since it doesn't work OOTB on cygwin.

For a quick rundown of the changes required:

The -T switch in the shebang needs to be removed. Since things work by simply removing it, I haven't looked into why (yet).
The pragmas (use strict and use warnings) need to be disabled. I haven't looked into it much; commenting them out works, and that was good enough for me.

The location of the main configuration file is changed. This is a personal preference: since I'm not using the cygwin build of Apache, I prefer to keep all of the validator files outside of the cygwin directory tree. The change loads the configuration file from htdocs/config.

The check that the SGML parser is executable needs to be turned off. (Of course, on an NTFS disk, ensure that the parser is executable by the account that runs it.) I'm not sure exactly why this check fails, but it does.

The configuration files must have UNIX line endings (assuming binary mounts) for Perl to parse them correctly. This certainly applies to check.cfg (or /etc/w3c/validator.conf if you install it as such); otherwise the SGML library path will end in a <CR>, which breaks things when filenames are appended to it. I've converted all configuration files to UNIX line endings, whether or not the others require it.
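
The conversion itself can be done from the cygwin shell. The following is a minimal sketch rather than the exact method used here; the function name to_unix_eol is hypothetical, and it assumes tr and mktemp are installed.

```shell
#!/bin/sh
# to_unix_eol FILE
# Hypothetical helper: strips every carriage return from FILE in place,
# turning CRLF line endings into plain LF.
to_unix_eol() {
    file=$1
    tmp=$(mktemp) || return 1
    # tr deletes each CR byte; LF bytes are left untouched. Writing the
    # result back with cat keeps the original file's permissions.
    tr -d '\r' < "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}
```

For example, "to_unix_eol htdocs/config/check.cfg" would convert the main configuration file.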

The largest problem is that open3() doesn't seem to work on cygwin. At least, I can't get it to work. The solution is to use system() to run the SGML parser through a shell command that redirects stdout and stderr to files, and to read those files back in after the SGML parser exits.

Currently I just use the fixed names /tmp/smgl.stdin, /tmp/smgl.stdout and /tmp/smgl.stderr (spelled as in the patch below), though creating proper temporary filenames for each stream would be better.
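
As an illustration of what proper temporary filenames could look like on the shell side, here is a sketch under assumptions: the function name run_captured is hypothetical, mktemp is assumed to be installed, and this is not what the patch below does.

```shell
#!/bin/sh
# run_captured CMD [ARG...]
# Hypothetical helper: saves standard input to a unique temp file, runs CMD
# with that file as its input, and captures stdout and stderr in two more
# unique temp files. Prints the three paths (stdin, stdout, stderr), one per
# line, and returns CMD's exit status. The caller removes the files.
run_captured() {
    in=$(mktemp /tmp/sgml.stdin.XXXXXX) || return 1
    out=$(mktemp /tmp/sgml.stdout.XXXXXX) || return 1
    err=$(mktemp /tmp/sgml.stderr.XXXXXX) || return 1
    cat > "$in"
    # "$@" keeps each argument intact, so no command string has to be
    # assembled and re-parsed by the shell.
    "$@" < "$in" > "$out" 2> "$err"
    status=$?
    printf '%s\n' "$in" "$out" "$err"
    return $status
}
```

Because each call gets its own files, two validation requests running at the same time would no longer overwrite each other's streams.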

		--- check	2002-12-01 10:18:00.000000000 +1100
		+++ check.pl	2004-01-04 19:47:20.265625000 +1100
		@@ -1,4 +1,4 @@
		-#!/usr/bin/perl -T
		+#!/usr/bin/perl
		 #
		 # W3C MarkUp Validation Service
		 # A CGI script to retrieve and validate a MarkUp file
		@@ -25,8 +25,8 @@ use 5.006;

		 #
		 # Pragmas.
		-use strict;
		-use warnings;
		+#use strict;
		+#use warnings;

		 #
		 # Modules.
		@@ -84,16 +84,19 @@ use constant O_DOCTYPE =>  4; # 0000 010
		 # Define global variables.
		 use vars qw($DEBUG $CFG $VERSION);

		-
		 #
		 # Things inside BEGIN don't happen on every request in persistent
		 # environments, such as mod_perl.  So let's do globals, eg. read config here.
		 BEGIN {

		+	# ord:
		+	# Override the location of the configuration file.
		+	$ENV{W3C_VALIDATOR_CFG} = "/vhosts/validator.home/htdocs/config/check.cfg";
		+
		   #
		   # Read Config Files.
		   $CFG = &read_cfg($ENV{W3C_VALIDATOR_CFG} || '/etc/w3c/validator.conf');
		-  if (! -x $CFG->{'SGML Parser'}) {
		+  if ( 0 && ! -x $CFG->{'SGML Parser'}) {
		     die("Configured SGML Parser '$CFG->{'SGML Parser'}' not executable!");
		   }

		@@ -533,9 +536,13 @@ if ($DEBUG) {

		 #
		 # Temporary filehandles.
		-my $spin  = IO::File->new_tmpfile;
		-my $spout = IO::File->new_tmpfile;
		-my $sperr = IO::File->new_tmpfile;
		+#my $spin  = IO::File->new_tmpfile; # ord: We create a different one.
		+#my $spout = IO::File->new_tmpfile;
		+#my $sperr = IO::File->new_tmpfile;
		+
		+# ord:
		+#   Write to our own temp file.
		+my $spin  = new IO::File " > /tmp/smgl.stdin";

		 #
		 # Dump file to a temp file for parsing.
		@@ -549,18 +556,44 @@ seek $spin, 0, 0;

		 #
		 # Run it through SP, redirecting output to temporary files.
		-my $pid = do {
		-  no warnings 'once';
		-  local(*SPIN, *SPOUT, *SPERR)  = ($spin, $spout, $sperr);
		-  open3("<&SPIN", ">&SPOUT", ">&SPERR", @cmd);
		-};
		+# ord:
		+#  open3() doesn't work on cygwin
		+#my $pid = do {
		+#  no warnings 'once';
		+#  local(*SPIN, *SPOUT, *SPERR)  = ($spin, $spout, $sperr);
		+#  open3("<&SPIN", ">&SPOUT", ">&SPERR", @cmd);
		+#};
		+
		+# ord:
		+#  Create the command line, and run with system();
		+#my $cmd;
		+for(my $i = 0; $i < scalar(@cmd); $i++){
		+	$cmd .= $cmd[$i] . " ";
		+}
		+
		+$cmd = 'cat /tmp/smgl.stdin | ' . $cmd;
		+$cmd .= ' > /tmp/smgl.stdout 2> /tmp/smgl.stderr';
		+#$cmd .= '-b /tmp/smgl.stdout -f /tmp/smgl.stderr';
		+# open(HANDLE, '|' . $cmd);
		+system($cmd);
		+
		+# ord:
		+#  Open the temp files we created, and set them to be deleted
		+#  when closed.
		+my $spout  = new IO::File "< /tmp/smgl.stdout";
		+my $sperr  = new IO::File "< /tmp/smgl.stderr";
		+# Delete all the files when closed.
		+#unlink("/tmp/smgl.stdout");
		+#unlink("/tmp/smgl.stderr");
		+#unlink("/tmp/smgl.stdin");


		 #
		 # Close input file, reap the kid, and rewind temporary filehandles.
		 undef $spin;
		-waitpid $pid, 0;
		-seek $_, 0, 0 for $spout, $sperr;
		+#my $pid = 0;
		+#waitpid $pid, 0;  #ord: don't wait.
		+seek $_,0, 0 for $spout,$sperr;

		 $File = &parse_errors($File, $sperr); # Parse error output.
		 undef $sperr; # Get rid of no longer needed filehandle.

		

Copy the SGML library to htdocs/sgml-lib.

Edit the validator configuration file so that all the file locations point where they should. It should be pretty self-explanatory how to do so.

Optional Extras

That should be all that's required to get the W3C Validator working on Microsoft Windows XP.

When using the service locally, it's worthwhile changing some links to use the local copies, rather than the remote copies.

A lot of the changes can simply be made in the config/check.cfg file. The Msg FAQ URI option helps for checking errors when not connected to the internet.

It's also worthwhile modifying htdocs/footer.html to reference http://<local_host>/images/vxhtm10.png rather than http://www.w3.org/Icons/valid-xhtml10.
