predict what it will find. And many HTML extensions and modifications apply in one
place but do not apply in another. Although XML is extensible (you can add all the tag
types you wish), it is very strict about the way you do it. Like SGML, XML leaves
formatting to separate documents: the formatting of XML can be controlled by XSL
documents used for transformations.
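The XSL approach can be tried directly from Java. The sketch below assumes a current JDK, whose built-in `javax.xml.transform` package (part of JAXP) can apply an XSLT stylesheet. The class name `XslFormatting`, the `<book>` vocabulary, and the stylesheet itself are hypothetical, chosen only to show a stylesheet, rather than the XML document, deciding how the content is presented.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XslFormatting {

    // Hypothetical XSLT stylesheet: presents a <book> element as a simple HTML page.
    static final String STYLESHEET =
            "<xsl:stylesheet version=\"1.0\""
          + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
          + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
          + "<xsl:template match=\"/book\">"
          + "<html><body><h1><xsl:value-of select=\"title\"/></h1></body></html>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    // Applies the stylesheet to an XML string and returns the transformed text.
    static String format(String xml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // The XML carries only content; the stylesheet decides how it is laid out.
        System.out.println(format("<book><title>XML with JAXP</title></book>"));
    }
}
```

Changing the presentation means editing only `STYLESHEET`; the `<book>` data stays untouched.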
XHTML stands for Extensible Hypertext Markup Language. An XHTML document is
a hybrid of XML and HTML, written so that it is syntactically correct for both of
them. That is, although an XHTML document can be displayed by a Web browser, it
can also be parsed into its component parts by a SAX or DOM parser. XML is a subset
of SGML and HTML is an application of SGML, so the only problem in combining the two
into XHTML was dealing with the places where HTML had departed from the format that
XML requires. Most obvious are the many opening tags in HTML that have no matching
closing tags and the fact that proper nesting of tags is not enforced.
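The difference is easy to see with a SAX parser. The following is a minimal sketch, assuming a current JDK with JAXP built in; the class `XhtmlSaxCheck` and its method `elementNames` are hypothetical names. It returns the element names a SAX parser finds, or `null` when the parser rejects the markup because an HTML-style `<br>` is left unclosed or tags are improperly nested.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class XhtmlSaxCheck {

    // Parses markup with a SAX parser and returns the element names in document
    // order, or null if the markup is not well-formed XML.
    static List<String> elementNames(String markup) {
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) {
                names.add(qName); // record each opening tag as the parser reaches it
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(markup)), handler);
            return names;
        } catch (SAXException e) {
            // DefaultHandler.fatalError rethrows well-formedness errors, ending up here.
            return null;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<body><p>Line one<br/>Line two</p></body></html>";
        String html = "<html><body><p>Line one<br>Line two</p></body></html>";
        String badNesting = "<p><b>bold <i>both</b> italic</i></p>";
        System.out.println(elementNames(xhtml));      // [html, body, p, br]
        System.out.println(elementNames(html));       // null
        System.out.println(elementNames(badNesting)); // null
    }
}
```

A browser would render all three strings; only the first, with its self-closed `<br/>` and correct nesting, is also acceptable to the XML parser.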
XHTML was conceived so that, once Web browsers were capable of dealing with the
strict and standard forms required for XML, a more standardized form of Web page
could evolve. With XML it is relatively easy to introduce new forms by defining
additional elements and attributes, and because XHTML uses the same technique, new
features can be integrated smoothly with the existing ones. This capability is
particularly attractive because alternative ways of accessing the Internet are
constantly being developed. A standard, parsable Web page makes it easier to adapt the
display format to new demands, such as the special requirements of hand-held computers.
There is a fundamental difference between XML and HTML. XML is a simplified subset of
SGML, while HTML is an application of SGML. That is, SGML does not have any tag names
defined, and neither does XML. For both XML and SGML, it is a DTD that declares the
element names and the rules for using them. HTML, on the other hand, has a set of
element names already defined. The element names of HTML are the ones that have a
meaning to the Web browser attempting to format the page. This fundamental difference
between XML and HTML can be overcome by creating a DTD that defines the syntax for all
the elements used in HTML. With such a DTD in place, an XML document that adheres to
the DTD's definitions will also be an HTML document, and thus it can be displayed using
a Web browser.
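A validating parser shows how a DTD supplies the element definitions that XML itself lacks. This sketch assumes a current JDK; the class `DtdValidation` and its toy DTD, which declares only five HTML-like elements and is far smaller than the real XHTML DTD, are hypothetical. A document is accepted only if it is well-formed and matches the declared structure.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class DtdValidation {

    // Hypothetical toy DTD: declares a few HTML-like elements and how they may nest.
    static final String DTD =
            "<!DOCTYPE html [\n"
          + "  <!ELEMENT html (head, body)>\n"
          + "  <!ELEMENT head (title)>\n"
          + "  <!ELEMENT title (#PCDATA)>\n"
          + "  <!ELEMENT body (p*)>\n"
          + "  <!ELEMENT p (#PCDATA)>\n"
          + "]>\n";

    // Returns true if the document is well-formed and valid against its DTD.
    static boolean isValid(String document) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // check against the DOCTYPE, not just syntax
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                // Validity errors arrive here; treat them as failures.
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(document)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String good = DTD + "<html><head><title>T</title></head><body><p>Hi</p></body></html>";
        String noHead = DTD + "<html><body><p>Hi</p></body></html>";
        String unknown = DTD + "<html><head><title>T</title></head><body><blink>Hi</blink></body></html>";
        System.out.println(isValid(good));    // true
        System.out.println(isValid(noHead));  // false: html must contain head, then body
        System.out.println(isValid(unknown)); // false: blink is not declared
    }
}
```

All three documents are well-formed XML; only the DTD distinguishes the one that follows the declared vocabulary.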
JAXP stands for Java API for XML Processing. It is a set of Java classes and interfaces
that a program can use to read, manipulate, and write XML-formatted data.
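Those three activities map onto a short round trip. In this sketch, which assumes a current JDK and uses the hypothetical class `JaxpRoundTrip`, a `DocumentBuilder` reads text into a DOM tree, standard DOM calls add an element, and an identity `Transformer` writes the tree back out as text.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class JaxpRoundTrip {

    // Reads XML text into a DOM tree, appends one <item> to the root, and writes it back.
    static String addItem(String xml, String text)
            throws ParserConfigurationException, SAXException, IOException, TransformerException {
        // Reading: parse the text into a DOM Document.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Manipulating: build a new element and attach it under the root element.
        Element item = doc.createElement("item");
        item.appendChild(doc.createTextNode(text));
        doc.getDocumentElement().appendChild(item);

        // Writing: a Transformer with no stylesheet copies the DOM tree out as text.
        Transformer out = TransformerFactory.newInstance().newTransformer();
        out.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter writer = new StringWriter();
        out.transform(new DOMSource(doc), new StreamResult(writer));
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(addItem("<list><item>one</item></list>", "two"));
        // <list><item>one</item><item>two</item></list>
    }
}
```

The text node is created with `createTextNode`, a DOM Level 1 call, rather than the newer `setTextContent`, so the manipulation step stays within the DOM levels discussed here.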
It includes complete parsers for SAX1 and SAX2 and for two levels of DOM: DOM
Level 1 and DOM Level 2. Most of this book explores the use of parsers in extracting