The tree that a DOM parse builds in memory is convenient when you need to access document content out of order. If your program must rearrange the incoming data for its output, or move around the document and select data in random-access order, the DOM document tree provides what you need. You can search the tree and pull out data without regard to where it appeared in the input document.

One disadvantage of DOM is that a large document takes up a lot of space, because the entire document is held in memory. On modern operating systems, however, a document would need to be extremely large before this adversely affected anything. DOM is also more complicated to use than SAX: because it provides random access to the stored data, its API is necessarily more complex. In exchange, it can be used to do much more. For more details about how DOM works, see Chapters 3, 6, and 7.

Internally in JAXP, the DOM parser uses SAX as its lexical scanner. That is, a SAX parser reads the document and breaks it down into a stream of its components, and the DOM software takes this token stream and constructs a tree from it. This is why it is best to understand SAX before trying to get a clear idea of how JAXP DOM works. Even if you never use SAX directly, you will need to be familiar with the meaning of its error messages and how to process them in your application, which means knowing how SAX works and how it breaks down the incoming document. For more details, see Chapters 3, 4, and 5.

SGML

SGML stands for Standard Generalized Markup Language. It is the parent markup language of XML and HTML, both of which were derived from it as simpler, special-purpose languages.
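Returning briefly to the DOM discussion above, the following is a minimal Java sketch of two of its points: reading data from the tree in an order unrelated to the input, and receiving a SAX error from a DOM parse. The class name, the sample document, and the helper methods are hypothetical and chosen only for illustration; the JAXP, DOM, and SAX calls are the standard ones, and registering a `DefaultHandler` as the error handler is just one simple way to have fatal errors thrown to the caller.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class DomRandomAccess {
    // Hypothetical sample document: the customer appears AFTER the items.
    static final String XML =
        "<order><item>pen</item><item>ink</item><customer>Ada</customer></order>";

    // Build the whole DOM tree in memory from a string.
    static Document parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors as SAXParseException,
        // so the application decides how to report them.
        builder.setErrorHandler(new DefaultHandler());
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Read the customer first, then the items in reverse document order.
    static String summarize(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        Document doc = parse(xml);
        String customer =
            doc.getElementsByTagName("customer").item(0).getTextContent();
        NodeList items = doc.getElementsByTagName("item");
        StringBuilder sb = new StringBuilder(customer).append(':');
        for (int i = items.getLength() - 1; i >= 0; i--) {
            sb.append(' ').append(items.item(i).getTextContent());
        }
        return sb.toString();
    }

    // True when the DOM parse is rejected with a SAX parse error.
    static boolean failsWithSaxError(String xml) {
        try {
            parse(xml);
            return false;
        } catch (SAXParseException e) {
            return true;
        } catch (Exception e) {
            return false; // some other failure, not a SAX parse error
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(summarize(XML));
        // An opening <item> tag with no matching close is not well formed.
        System.out.println(failsWithSaxError("<order><item>pen</order>"));
    }
}
```

Even though only the DOM API is called here, the failure on the malformed input arrives as a `SAXParseException`, which is why familiarity with SAX error reporting is useful to a DOM programmer.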
The 500-page SGML specification includes a definition of the system for organizing and tagging elements in a document. It became an International Organization for Standardization (ISO) standard in 1986, but the specification had been in use for some time before that. It was designed to manage large documents that are frequently changed and also printed. The language definition is so large that a complete implementation is impractical, which led to the simpler XML and HTML.

XML works well as a subset of SGML because the full complexity of SGML isn’t necessary for the tagging and transforming that needs to be done. Being a practical subset makes it much easier to write a parser for XML. Because the language is less complex, XML documents are smaller and easier to create than the equivalent SGML documents would be. For example, where SGML always requires the presence of a DTD, in XML the DTD is largely optional: it is necessary if you are going to validate an XML document, but otherwise it can be omitted.

XML is a bit closer to SGML than HTML is. For one thing, HTML is filled with ambiguities because it allows things like an opening tag that has no closing tag to match it. This prevents any attempts to standardize HTML because a parser cannot