This page collects some thoughts on XML and links to some software.
<strong>It dates from 1997 and is not currently maintained.</strong>


This variant of XML is based on the XML draft and documents written in it
look very similar to those written in the language of the draft. But there
are a few important differences. The goals are similar to those of XML,
but I want to stress the following:


I'm thinking of adding another goal: there must be an associated machine-readable
format for expressing restrictions on the format. Such a set of restrictions
(similar to the `DTD' of SGML) allows generic tools to be written that can
check whether an XML file is suitable for a particular application. Maybe
this format should itself be an application of XML.
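To make the idea concrete, here is a minimal Java sketch under a deliberately narrow assumption. It treats the restrictions as nothing more than a table that lists, for each element name, which child elements it may contain. That is much less than a DTD can say. The checker, its Node class and the example rules are all hypothetical and are not part of the software on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;

class RestrictionChecker {
    // Hypothetical element tree for the example; not the tree package from the zip file.
    static final class Node {
        final String name;
        final List<Node> children;

        Node(String name, Node... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    // Restrictions: element name -> names of the elements allowed as its children.
    // An element whose name is not a key in the map may not have any children.
    static List<String> check(Node root, Map<String, Set<String>> allowed) {
        List<String> errors = new ArrayList<>();
        if (!allowed.containsKey(root.name)) {
            errors.add("unknown element: " + root.name);
        } else {
            walk(root, allowed, errors);
        }
        return errors;
    }

    private static void walk(Node n, Map<String, Set<String>> allowed, List<String> errors) {
        Set<String> ok = allowed.getOrDefault(n.name, Set.of());
        for (Node child : n.children) {
            if (ok.contains(child.name)) {
                walk(child, allowed, errors);
            } else {
                // Report the offending child once and do not descend into it.
                errors.add(child.name + " not allowed inside " + n.name);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Set<String>> rules = Map.of(
            "address", Set.of("name", "city"),
            "name", Set.of(),
            "city", Set.of());
        Node doc = new Node("address", new Node("name"), new Node("zip"));
        System.out.println(check(doc, rules));   // [zip not allowed inside address]
    }
}
```

The rules are data rather than code, so one generic checker can serve every application that supplies its own table.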


Some examples of XML files are available on <a HREF="simple-XML.html">a separate
page.</a> The program packages below also include a few test files. The data
model of XML is described in `<a HREF="Datamodel.html">the XML data model</a>.'
There are also some thoughts on transporting the contents of
<a HREF="RDB.html">databases with XML</a>.


Here are some examples of programs that process (simple) XML. All Java software
is in <a HREF="xmllink.zip">xmllink.zip</a>. The
<a HREF="xmltest-doc/tree.html">documentation</a> is made with javadoc. The
software is in three packages: parser, tree and xptr. Included are a few
test programs:


The zip file contains both the source and the class files (compiled with
JDK 1.1; you'll need to recompile them for JDK 1.0). If you have a CLASSPATH variable,
the zip file can be added to it directly. For example, with the Bourne shell
under Unix:

CLASSPATH=$CLASSPATH:xmltest.zip
export CLASSPATH
java xmltest &lt;some-XML-file&gt;
java xmlpipe &lt;some-XML-file&gt;


(If you don't have a CLASSPATH variable or the above doesn't work, you might
try unzipping the file or asking a local guru.)


<a name="xml-in-c">A</a> <a HREF="9707/XML-in-C">Bison/Lex parser in C</a>
is also available; see its separate description. It shows an XML parser (core
syntax only, no linking, no validation) in just 13 productions and 12 tokens.


<a name="xmlbyhand" HREF="xmlbyhand.zip">xmlbyhand</a> (with
<a HREF="xmlbyhand-doc/tree.html">documentation</a>) is a (non-validating)
XML parser written in Java. It stores the parse tree in memory. The current
main program just dumps the parse tree again, in XML format. (The program
can read its own output.) The program may be useful as a `normalizer', but
the intention is really to provide some Java code that can be used in other
programs. [This program is `old', but still useful if you want to see a parser
that is not machine-generated.]
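The following is a much smaller sketch in Java that gives an idea of what such a hand-written parser looks like. It is not the xmlbyhand code. It handles only elements, double-quoted attributes and character data, with no comments, processing instructions, entities or DOCTYPE. It builds a tree in memory and writes it back out. The writer produces only constructs the reader accepts, so the output can be parsed again, and that is what makes such a program usable as a normalizer. The class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MiniXml {
    // A node is either an element (name set) or a run of character data (text set).
    static final class Node {
        String name, text;
        final Map<String, String> attrs = new LinkedHashMap<>();
        final List<Node> kids = new ArrayList<>();
    }

    private final String s;
    private int pos;

    private MiniXml(String s) { this.s = s; }

    // Document: optional white space, one element, optional white space.
    static Node parse(String input) {
        MiniXml p = new MiniXml(input);
        p.skipSpace();
        Node root = p.element();
        p.skipSpace();
        if (p.pos != p.s.length()) throw p.error("trailing data");
        return root;
    }

    private Node element() {
        expect('<');
        Node n = new Node();
        n.name = name();
        while (true) {                          // attributes, up to '>' or '/>'
            skipSpace();
            if (peek() == '/') { pos++; expect('>'); return n; }
            if (peek() == '>') { pos++; break; }
            String key = name();
            skipSpace(); expect('='); skipSpace(); expect('"');
            int end = s.indexOf('"', pos);
            if (end < 0) throw error("unterminated attribute value");
            n.attrs.put(key, s.substring(pos, end));
            pos = end + 1;
        }
        while (!s.startsWith("</", pos)) {      // content, up to the end tag
            if (peek() == '<') {
                n.kids.add(element());
            } else {
                int end = s.indexOf('<', pos);
                if (end < 0) throw error("missing end tag for " + n.name);
                Node t = new Node();
                t.text = s.substring(pos, end);
                n.kids.add(t);
                pos = end;
            }
        }
        pos += 2;
        if (!name().equals(n.name)) throw error("end tag does not match " + n.name);
        skipSpace();
        expect('>');
        return n;
    }

    private String name() {
        int start = pos;
        while (pos < s.length()
                && (Character.isLetterOrDigit(s.charAt(pos)) || "-_.:".indexOf(s.charAt(pos)) >= 0)) pos++;
        if (start == pos) throw error("name expected");
        return s.substring(start, pos);
    }

    private char peek() { return pos < s.length() ? s.charAt(pos) : '\0'; }

    private void expect(char c) {
        if (peek() != c) throw error("'" + c + "' expected");
        pos++;
    }

    private void skipSpace() { while (pos < s.length() && Character.isWhitespace(s.charAt(pos))) pos++; }

    private IllegalArgumentException error(String msg) {
        return new IllegalArgumentException(msg + " at offset " + pos);
    }

    // Serialize in the same subset the parser accepts, so the output can be read back.
    static String dump(Node n) {
        if (n.text != null) return n.text;
        StringBuilder b = new StringBuilder("<").append(n.name);
        n.attrs.forEach((k, v) -> b.append(' ').append(k).append("=\"").append(v).append('"'));
        if (n.kids.isEmpty()) return b.append("/>").toString();
        b.append('>');
        for (Node kid : n.kids) b.append(dump(kid));
        return b.append("</").append(n.name).append('>').toString();
    }

    public static void main(String[] args) {
        String once = dump(parse("<memo  lang=\"en\"><to>Ann</to><p></p></memo>"));
        System.out.println(once);                             // <memo lang="en"><to>Ann</to><p/></memo>
        System.out.println(once.equals(dump(parse(once))));   // true
    }
}
```

Normalizing changes the text in harmless ways. Extra white space inside tags disappears, and an element with no content comes back in its empty-element form. A second pass leaves the output unchanged.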


<a name="unix2coll" HREF="unix2coll.awk">unix2coll</a> is a small AWK script
that takes a Unix-style database (one record per line, fields separated by
a separator character) and outputs a "Web-collection". Web-collections will
probably use XML syntax, but the precise form is not yet decided. This is
just one of the possibilities, and probably not the best.
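For comparison with the AWK script, the same transformation can be sketched in Java. The Web-collection format was not fixed, so the output shape here is an assumption. It uses a collection element containing one record element per input line, and each record has one child element per field, named from a list the caller supplies. Those element names, the method names and the sample data are all hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;

class Unix2Coll {
    // Escape the characters that may not appear literally in XML character data.
    static String escape(String v) {
        return v.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // One record per input line, fields separated by sep; field names come from the caller.
    // The element names "collection" and "record" are hypothetical, not a settled format.
    static String toCollection(List<String> lines, char sep, List<String> fields) {
        StringBuilder out = new StringBuilder("<collection>\n");
        for (String line : lines) {
            // A limit of -1 keeps trailing empty fields instead of dropping them.
            String[] values = line.split(Pattern.quote(String.valueOf(sep)), -1);
            out.append("  <record>");
            for (int i = 0; i < fields.size() && i < values.length; i++) {
                String f = fields.get(i);
                out.append('<').append(f).append('>').append(escape(values[i]))
                   .append("</").append(f).append('>');
            }
            out.append("</record>\n");
        }
        return out.append("</collection>\n").toString();
    }

    public static void main(String[] args) {
        System.out.print(toCollection(List.of("ann:Ann:Boston", "lee:Lee:Oslo"),
                ':', List.of("login", "first", "city")));
    }
}
```

Run as is, it prints a collection element containing two records. Each record has login, first and city children.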


<a name="coll2unix" HREF="coll2unix.awk">coll2unix</a> is an AWK script that
does the opposite. It is meant to be used in a pipe after xmlpipe, and it
converts a Web-collection back into a table. Its arguments are the table
to extract (called `profile') and the field names to put into that table.
An <a HREF="example.html">example</a> shows how xmlpipe, unix2coll and coll2unix
work together.
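The reverse step can be sketched the same way. This Java version skips the parsing. It assumes the collection is already in memory as a list of records, each tagged with the table it belongs to and holding field name/value pairs. The Rec class, the tab separator and the sample data are illustrative assumptions. They do not describe how coll2unix itself reads the output of xmlpipe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class Coll2Unix {
    // Hypothetical in-memory form of one collection entry.
    static final class Rec {
        final String table;
        final Map<String, String> fields;

        Rec(String table, Map<String, String> fields) {
            this.table = table;
            this.fields = fields;
        }
    }

    // Keep only the records of the requested table and output the requested fields,
    // in the requested order, one line per record; a missing field becomes an empty cell.
    static List<String> extract(List<Rec> coll, String table, List<String> names, String sep) {
        List<String> rows = new ArrayList<>();
        for (Rec r : coll) {
            if (!r.table.equals(table)) continue;
            List<String> cells = new ArrayList<>();
            for (String n : names) cells.add(r.fields.getOrDefault(n, ""));
            rows.add(String.join(sep, cells));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Rec> coll = List.of(
            new Rec("profile", Map.of("login", "ann", "city", "Boston")),
            new Rec("group", Map.of("name", "staff")),
            new Rec("profile", Map.of("login", "lee")));
        extract(coll, "profile", List.of("city", "login"), "\t").forEach(System.out::println);
    }
}
```

Fields missing from a record come out as empty columns, so every output line has the same number of columns.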


The XML parsers above are very simple. They don't validate the input, and
they don't try to resolve a reference to a DTD. They rely on the well-formedness
of the input.
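Relying on well-formedness means trusting, among other things, that start and end tags nest properly. A tool that wants to check at least that much before trusting its input can use a stack, as in the crude Java sketch below. It has several limitations:

- It treats every run from a less-than sign to the next greater-than sign as a tag.
- It skips tags that start with a question mark or an exclamation mark.
- It ignores quoting, so a greater-than sign inside an attribute value or a comment will confuse it.

It is an illustration, not code from these packages. The grammar further down allows more than one top-level element, so the sketch does not insist on a single root.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class TagBalance {
    // True if every end tag closes the most recently opened element and nothing is left open.
    // Crude: every run from '<' to the next '>' counts as a tag, quoting is ignored,
    // and runs starting with '?' or '!' (PIs, comments, declarations) are skipped.
    static boolean balanced(String doc) {
        Deque<String> open = new ArrayDeque<>();
        int i = doc.indexOf('<');
        while (i >= 0) {
            int end = doc.indexOf('>', i);
            if (end < 0) return false;                         // unterminated tag
            String tag = doc.substring(i + 1, end);
            if (tag.startsWith("/")) {
                String name = tag.substring(1).trim();
                if (open.isEmpty() || !open.pop().equals(name)) return false;
            } else if (!tag.startsWith("?") && !tag.startsWith("!") && !tag.endsWith("/")) {
                open.push(tag.split("\\s", 2)[0]);             // start tag: remember its name
            }
            i = doc.indexOf('<', end);
        }
        return open.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(balanced("<a><b/><c>text</c></a>"));   // true
        System.out.println(balanced("<a><b></a></b>"));           // false
    }
}
```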


This is a variant of the Java-based parser above which may be more suitable
for certain kinds of XML data. It accepts the subset of XML 1.0 defined below,
and interprets certain constructs before passing the data on. The sources
are in a <a HREF="1998/03/xmlparser.zip">zip file</a>.


This is the grammar (compare the file Parser.ll1 in the zip file):


document
 : [ NEWLINE | misc ]*
 [ doctypedecl [ NEWLINE | misc ]* ]?
 [ element [ NEWLINE | misc ]* ]+
 ;
misc
 : COMMENT
 | PI
 | xmlinstruction
 ;
xmlinstruction
 : XML
   [ NAME
     [ %if (key.equals("version")) EQ LITERAL
     | %if (key.equals("encoding")) EQ qencoding
     | %if (key.equals("default")) defaultinfo
     ]
   ]*
   ENDPI
 | NAMESPACE attribute* ENDPI
 ;
doctypedecl
 : DOCTYPE NAME extid GT
 ;
attribute
 : NAME [ EQ LITERAL ]?
 ;
etag
 : [ ETAGO NAME? GT
 | ETAG
 ]
 ;
content
 : [ element
 | PCDATA
 | NEWLINE
 | ms
 | misc
 ]*
 ;
element
 : LT
 NAME
 attribute*
 [ GT content etag
 | EMPTY
 ]
 ;
extid
 : LITERAL
 ;
ms
 : MSSTART MSDATA MSEND
 ;
qencoding
 : LITERAL
 ;
quotedpairs
 : LITERAL
 ;
defaultinfo
 : NAME [ NAME EQ LITERAL ]*
 ;
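In an LL(1) grammar like this one, each rule becomes one method of a recursive-descent parser. Each choice between alternatives is decided by looking at the next token only. The Java sketch below translates element, attribute and etag by hand, plus a content rule reduced to elements and PCDATA, working on input that has already been tokenized. It is not Parser.ll1 or code generated from it. Several choices are made only for the example:

- the token class;
- the assumption that EMPTY stands for the end of an empty-element tag;
- a check that an end-tag name matches the start tag, which is a semantic check and not part of the grammar.

```java
import java.util.List;

class GrammarSketch {
    // Token types named after the grammar's terminals, plus EOF for end of input.
    enum T { LT, NAME, EQ, LITERAL, GT, EMPTY, ETAGO, ETAG, PCDATA, EOF }

    static final class Tok {
        final T type;
        final String text;
        Tok(T type, String text) { this.type = type; this.text = text; }
    }

    static Tok t(T type, String text) { return new Tok(type, text); }

    private final List<Tok> toks;
    private int pos;

    GrammarSketch(List<Tok> toks) { this.toks = toks; }

    // One token of lookahead is all an LL(1) parser needs.
    private T la() { return pos < toks.size() ? toks.get(pos).type : T.EOF; }

    private Tok match(T expected) {
        if (la() != expected) throw new IllegalStateException("expected " + expected + ", got " + la());
        return toks.get(pos++);
    }

    // element : LT NAME attribute* [ GT content etag | EMPTY ] ;
    // Returns a bracketed rendering of the parse, such as (p class=x "Hi").
    String element() {
        match(T.LT);
        String name = match(T.NAME).text;
        StringBuilder out = new StringBuilder("(").append(name);
        while (la() == T.NAME) out.append(' ').append(attribute());
        if (la() == T.EMPTY) {
            match(T.EMPTY);
        } else {
            match(T.GT);
            out.append(content());
            etag(name);
        }
        return out.append(')').toString();
    }

    // attribute : NAME [ EQ LITERAL ]? ;
    private String attribute() {
        String key = match(T.NAME).text;
        if (la() != T.EQ) return key;
        match(T.EQ);
        return key + "=" + match(T.LITERAL).text;
    }

    // Reduced content : [ element | PCDATA ]* ;  (NEWLINE, ms and misc left out)
    private String content() {
        StringBuilder out = new StringBuilder();
        while (la() == T.LT || la() == T.PCDATA) {
            if (la() == T.LT) out.append(' ').append(element());
            else out.append(" \"").append(match(T.PCDATA).text).append('"');
        }
        return out.toString();
    }

    // etag : [ ETAGO NAME? GT | ETAG ] ;
    private void etag(String open) {
        if (la() == T.ETAG) { match(T.ETAG); return; }
        match(T.ETAGO);
        if (la() == T.NAME) {
            String name = match(T.NAME).text;
            // Semantic check added for the example; the grammar alone does not require it.
            if (!name.equals(open)) throw new IllegalStateException("end tag " + name + " closes " + open);
        }
        match(T.GT);
    }

    public static void main(String[] args) {
        // Tokens for <p class="x">Hi<br/></> under the assumptions stated above.
        List<Tok> toks = List.of(t(T.LT, "<"), t(T.NAME, "p"), t(T.NAME, "class"), t(T.EQ, "="),
                t(T.LITERAL, "x"), t(T.GT, ">"), t(T.PCDATA, "Hi"), t(T.LT, "<"), t(T.NAME, "br"),
                t(T.EMPTY, "/>"), t(T.ETAGO, "</"), t(T.GT, ">"));
        System.out.println(new GrammarSketch(toks).element());   // (p class=x "Hi" (br))
    }
}
```

Each bracketed choice in the grammar turns into an if on the lookahead token. Each starred item turns into a while loop that runs as long as the next token can start that item. This translation works only because one token of lookahead is always enough to decide, which is what LL(1) guarantees.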