XML document (which need not be a physical file; it can be a data stream) in order to split it into its markup and character data and, more specifically, into elements and their attributes. XML parsing reveals the structure of the information, since the nesting of elements implies a hierarchy. An XML document can fail to parse completely if it does not follow the well-formedness rules described in the XML 1.0 Recommendation. A successfully parsed XML document is at least well-formed and may also be valid, as discussed in detail later in this chapter and the next.

There is a subtlety about processing character data. During parsing, any entity references in the markup are replaced by the characters they stand for, so that markup becomes character data. A typical example from XHTML would be:

<p>&quot;AT&amp;T is a winning company,&quot; he said.</p>

After the parser substitutes for the entities, the resulting character data is:

"AT&T is a winning company," he said.

The character data that remains after parsing and entity substitution is parsed character data, which is referred to as #PCDATA in DTDs and always refers to the textual content of elements. Character data that is not parsed is called CDATA in DTDs; in a DTD, this term relates exclusively to attribute values.

XML Syntax Rules

In this section, we explain the various syntactic rules of XML. Documents that follow these rules are called well-formed, but they are not necessarily valid, as we'll see. If your document breaks any of these rules, any conforming XML parser will reject it.

Well-Formedness

The minimal requirement for an XML document is that it be well-formed, meaning that it adheres to a small number of syntax rules,⁶ which are summarized in Table 3-1 and explained in the following sections. However, a document can abide by all these rules and still be invalid.
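The difference between a document that parses and one that fails on a well-formedness rule can be sketched with Python's standard-library `xml.etree.ElementTree`, used here only as an example of a non-validating parser (the text does not prescribe any particular parser). The helper name `is_well_formed` and the sample documents are hypothetical. Because this parser never consults a DTD or schema, a `True` result means only "well-formed," not "valid."

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Hypothetical helper: True if `text` parses as XML, False otherwise.

    ElementTree (backed by expat) does not validate, so True means the
    document is well-formed, not that it is valid against a DTD or schema.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Hypothetical sample documents, each testing one well-formedness rule.
samples = {
    "properly nested": "<note><to>Tove</to><from>Jani</from></note>",
    "overlapping tags": "<b><i>text</b></i>",
    "unescaped ampersand": "<p>AT&T</p>",
    "escaped ampersand": "<p>AT&amp;T</p>",
    "two root elements": "<a/><b/>",
}

for label, doc in samples.items():
    print(f"{label}: {is_well_formed(doc)}")
# properly nested: True
# overlapping tags: False
# unescaped ampersand: False
# escaped ampersand: True
# two root elements: False
```

Only the properly nested document and the one that writes the ampersand as `&amp;` get past the parser; the others stop with a fatal error before any application code sees their content.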
To be valid, a document must be both well-formed and conform to the constraints imposed by a DTD or an XML Schema.

6. See the well-formedness discussion in the XML 1.0 Recommendation, http://www.w3.org/TR/REC-xml#sec-well-formed.

sall03.fm Page 115 Wednesday, April 24, 2002 11:34 AM
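Two parsing behaviors described at the start of this section, entity substitution and the hierarchy implied by element nesting, can be observed directly with Python's standard-library `xml.etree.ElementTree`. In this minimal sketch the parser is our illustrative choice, the document is read from a string rather than a file (echoing the point that input can be a data stream), and the `<div>` wrapper, the second paragraph, and the `show` helper are hypothetical additions around the XHTML example.

```python
import xml.etree.ElementTree as ET

# The document comes from a string, not a physical file.
source = (
    "<div>"
    "<p>&quot;AT&amp;T is a winning company,&quot; he said.</p>"
    '<p class="note">Nested <em>elements</em> imply a hierarchy.</p>'
    "</div>"
)

root = ET.fromstring(source)

# By the time the tree is built, &quot; and &amp; have been replaced
# by the characters they stand for: this is the parsed character data.
first_para = root[0]
print(first_para.text)  # "AT&T is a winning company," he said.


def show(elem: ET.Element, depth: int = 0) -> None:
    """Hypothetical helper: print each element's tag and attributes, indented by depth."""
    print("  " * depth + elem.tag, dict(elem.attrib))
    for child in elem:
        show(child, depth + 1)


show(root)
# div {}
#   p {}
#   p {'class': 'note'}
#     em {}
```

The application never sees the entity references themselves, only the substituted text, and the indentation printed by `show` mirrors how the nesting of the elements becomes a parent/child tree.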