116 Chapter 3 XML Syntax and Parsing Concepts

Legal XML Name Characters

An XML Name (sometimes called simply a Name) is a token that

• begins with a letter, underscore, or colon (but not other punctuation)
• continues with letters, digits, hyphens, underscores, colons, or full stops [periods], known as name characters.

Names beginning with the string "xml", or any string which would match (('X'|'x')('M'|'m')('L'|'l')), are reserved. Element and attribute names must be valid XML Names. (Attribute values need not be.) An NMTOKEN (name token) is any mixture of name characters (letters, digits, hyphens, underscores, colons, and periods).

The Namespaces in XML Recommendation assigns a meaning to names that contain colon characters. Therefore, authors should not use the colon in XML names except for namespace purposes (e.g., xsl:template).

Listing 3-2 illustrates a number of legal XML Names, followed by three that should be avoided but may or may not be identified as illegal, depending on the XML parser you use, and four that are definitely illegal. (This is file name-tests.xml on the CD; you can try this with your favorite parser, or with one of the ones provided on the CD.)

TABLE 3-1 XML Syntax Rules (Well-Formedness Constraints)

• The document must have a consistent, well-defined structure.
• All attribute values must be quoted (single or double quotes).
• White space in content, including line breaks, is significant by default.
• All start tags must have corresponding end tags (exception: empty elements).
• The root element must contain all others, which must nest properly by start/end tag pairing.
• Elements must not overlap; they may be nested, however. (This is also technically true for HTML. Browsers ignore overlapping in HTML, but not in XML.)
• Each element except the root element must have exactly one parent element that contains it.
" Element and attribute names are case-sensitive: Price and PRICE are different elements. " Keywords such as DOCTYPE and ENTITY must always appear in uppercase; similarly for other DTD keywords such as ELEMENT and ATTLIST. " Tags without content are called empty elements and must end in "/>". NOTE sall03.fm  Page 116  Wednesday, April 24, 2002  11:34 AM