predict what it will find. And many HTML extensions and modifications apply in one place but do not apply in another. Although XML is extensible (you can add all the tag types you wish), it is very strict in the way it allows you to do it. Like SGML, the formatting of XML can be controlled by XSL documents used for transformations.

XHTML

XHTML stands for Extensible Hypertext Markup Language. An XHTML document is a hybrid of XML and HTML, written so that it is syntactically correct for both. That is, although an XHTML document can be displayed by a Web browser, it can also be parsed into its component parts by a SAX or DOM parser. Both XML and HTML derive from SGML, so the only problem in combining the two into XHTML was in dealing with the places where HTML had departed from the standard format. Most obvious are the many opening tags in HTML that do not have closing tags to match them and the fact that proper tag nesting is not enforced.

XHTML was conceived so that, once Web browsers were capable of dealing with the strict and standard forms required for XML, a more standardized form of Web page could evolve. With XML it is relatively easy to introduce new forms by defining additional elements and attributes, and because this same technique is part of XHTML, it will allow the smooth integration of new features with the existing ones. This capability is particularly attractive because alternate ways of accessing the Internet are constantly being developed. The presence of a standard, parsable Web page will allow easier modification to the display format for new demands, such as the special requirements of hand-held computers.

There is a fundamental difference between XML and HTML. XML, like SGML, is a metalanguage, while HTML is an application of SGML. That is, SGML does not have any tag names defined and neither does XML. For both XML and SGML, a DTD is used to define the element names and how they may be combined.
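As a rough illustration of the point about DTDs, the following Java sketch parses a small XML document with the JAXP DOM parser and validation turned on. The class name DtdDemo, the element names inventory and item, and the sample document are all made up for this example. The element names have no meaning to the parser beyond what the internal DTD declares. The sketch also relies on getTextContent(), which comes from DOM Level 3 and so assumes a more recent Java runtime than the DOM levels discussed here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class DtdDemo {

    // Hypothetical document: "inventory" and "item" exist only because
    // the internal DTD subset declares them.
    static final String SAMPLE =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE inventory [\n"
        + "  <!ELEMENT inventory (item+)>\n"
        + "  <!ELEMENT item (#PCDATA)>\n"
        + "]>\n"
        + "<inventory><item>hammer</item><item>saw</item></inventory>";

    // Parses and validates the document, then returns the text of each item.
    // Checked parser exceptions are wrapped to keep the sketch short.
    static List<String> itemNames(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // check the document against its DTD
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Treat validity errors as failures instead of merely logging them.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList items = doc.getElementsByTagName("item");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                names.add(items.item(i).getTextContent());
            }
            return names;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("document rejected: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        System.out.println(itemNames(SAMPLE)); // prints [hammer, saw]
    }
}
```

If an element that the DTD does not declare is placed in the document, the validating parser reports an error and this sketch rejects the document.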
On the other hand, HTML has a set of element names already defined. The element names of HTML are the ones that have a meaning to the Web browser attempting to format the page. This fundamental difference between XML and HTML can be overcome by the creation of a DTD that defines the syntax for all the elements that are used in HTML. With such a DTD in place, an XML document that adheres to the DTD’s definitions will also be an HTML document, and thus it can be displayed using a Web browser.

JAXP

JAXP stands for Java API for XML Processing. It is a set of Java classes and interfaces specifically designed to be used in a program to make it capable of reading, manipulating, and writing XML-formatted data. It includes complete parsers for SAX1 and SAX2 and the two types of DOM: DOM Level 1 and DOM Level 2. Most of this book explores the use of parsers in extracting