XML document (which need not be a physical file; it can be a data stream) in order to split it into its various markup and character data, and more specifically, into elements and their attributes. XML parsing reveals the structure of the information, since the nesting of elements implies a hierarchy. It is possible for an XML document to fail to parse if it does not follow the well-formedness rules described in the XML 1.0 Recommendation. A successfully parsed XML document is, at a minimum, well-formed; it may also be valid, as discussed in detail later in this chapter and the next.

There is a subtlety about processing character data. During the parsing process, any entity references that appear in the markup are replaced by the character data they stand for. A typical example from XHTML would be:

    <p>&quot;AT&amp;T is a winning company,&quot; he said.</p>

After the parser substitutes for the entities, the resultant character data is:

    "AT&T is a winning company," he said.

The character data that remains after this substitution is parsed character data, which is referred to as #PCDATA in DTDs and always refers to the textual content of elements. Character data that is not parsed is called CDATA in DTDs; this relates exclusively to attribute values.

XML Syntax Rules

In this section, we explain the various syntactical rules of XML. Documents that follow these rules are called well-formed, but not necessarily valid, as we'll see. If your document breaks any of these rules, it will be rejected by most, if not all, XML parsers.

Well-Formedness

The minimal requirement for an XML document is that it be well-formed, meaning that it adheres to a small number of syntax rules,[6] which are summarized in Table 3-1 and explained in the following sections.
However, a document can abide by all these rules and still be invalid. To be valid, a document must both be well-formed and adhere to the constraints imposed by a DTD or XML Schema.

[6] See the well-formedness discussion in the XML 1.0 Recommendation, http://www.w3.org/TR/REC-xml#sec-well-formed.
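
The parsing behavior described above can be tried out with a short sketch. It assumes Python's standard-library xml.etree.ElementTree module, which is a non-validating parser chosen here only for illustration; the helper name is_well_formed is hypothetical. The sketch parses the XHTML fragment from the text to show entity substitution, then shows the parser rejecting markup that breaks the nesting rule. It checks well-formedness only; checking validity against a DTD or XML Schema would require a validating parser.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts text as well-formed XML.

    Hypothetical helper for illustration; it says nothing about validity.
    """
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# The XHTML fragment from the text, with entity references in the markup.
source = "<p>&quot;AT&amp;T is a winning company,&quot; he said.</p>"
paragraph = ET.fromstring(source)

# After parsing, each reference has been replaced by the character it names.
print(paragraph.text)  # "AT&T is a winning company," he said.

# Properly nested elements parse; overlapping elements are rejected.
print(is_well_formed(source))                # True
print(is_well_formed("<p><b>bold</p></b>"))  # False
```

Note that the element text returned by the parser is the parsed character data: the quotation marks and the ampersand appear as ordinary characters, not as references.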