Because the document type declaration specifies the root element, this must be the first element the parser encounters. If any element other than the one identified by the DOCTYPE line appears first, the document is immediately invalid. Listing 3-1 shows a very simple XHTML 1.0 document. The DOCTYPE is html (not xhtml), so the document body begins with <html ....> and ends with </html>.

Listing 3-1 Simple XHTML 1.0 Document with XML Prolog and Document Body

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE html
        PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
      <head>
        <title>XHTML 1.0</title>
      </head>
      <body>
        <h1>Simple XHTML 1.0 Example</h1>
        <p>See the <a href="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">DTD</a>.</p>
      </body>
    </html>

Markup, Character Data, and Parsing

An XML document contains text characters that fall into two categories: they are either part of the document markup or part of the data content, usually called character data, which simply means all text that is not part of the markup. In other words, XML text consists of intermingled character data and markup. Let's revisit an earlier fragment.

    <Address>
      <Street>123 Milky Way</Street>
      <City>Columbia</City>
      <State>MD</State>
      <Zip>20777</Zip>
    </Address>

The character data comprises the four strings 123 Milky Way, Columbia, MD, and 20777; the markup comprises the start and end tags for the five elements Address, Street, City, State, and Zip. Note that this is similar, but not identical, to what we previously called content. For example, although each chunk of character data is the content of a particular element, the content of the Address element is all of its child elements. We can think of each piece of character data as belonging directly to the element that contains it and indirectly to Address.
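The split between markup and character data in the Address fragment can be checked with a parser. The following is a minimal sketch, not a definitive implementation: it uses Python's standard xml.etree.ElementTree module, and the names FRAGMENT and split_markup_and_data are hypothetical, introduced only for this illustration. It also assumes that whitespace-only text between tags may be ignored, which matches the count of four strings given above, although a parser does report that whitespace as text.

```python
import xml.etree.ElementTree as ET

# The Address fragment from the text (FRAGMENT is a hypothetical name).
FRAGMENT = """<Address>
  <Street>123 Milky Way</Street>
  <City>Columbia</City>
  <State>MD</State>
  <Zip>20777</Zip>
</Address>"""


def split_markup_and_data(xml_text):
    """Return (element names, character data strings, direct ownership map)."""
    root = ET.fromstring(xml_text)

    # Markup: the element names, in document order, starting with the root.
    element_names = [elem.tag for elem in root.iter()]

    # Character data reachable from the root, directly or indirectly.
    # Assumption: whitespace-only text between tags is dropped.
    character_data = [t.strip() for t in root.itertext() if t.strip()]

    # Direct ownership: the element that immediately contains each string.
    direct = {
        elem.tag: elem.text.strip()
        for elem in root.iter()
        if elem.text and elem.text.strip()
    }
    return element_names, character_data, direct


if __name__ == "__main__":
    names, data, direct = split_markup_and_data(FRAGMENT)
    print("Elements:      ", names)
    print("Character data:", data)
    print("Direct owners: ", direct)
```

In this sketch Address does not appear in the direct ownership map, because it directly contains only child elements and whitespace, yet all four strings are reachable from it. That is the sense in which the character data belongs to Address indirectly.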
(In fact, in some