Chapter 3 XML Syntax and Parsing Concepts

Legal XML Name Characters

An XML Name (sometimes called simply a Name) is a token that

• begins with a letter, underscore, or colon (but not other punctuation)
• continues with letters, digits, hyphens, underscores, colons, or full stops [periods], known as name characters.

Names beginning with the string "xml", or any string which would match (('X'|'x')('M'|'m')('L'|'l')), are reserved. Element and attribute names must be valid XML Names. (Attribute values need not be.)

An NMTOKEN (name token) is any mixture of name characters (letters, digits, hyphens, underscores, colons, and periods).

NOTE The Namespaces in XML Recommendation assigns a meaning to names that contain colon characters. Therefore, authors should not use the colon in XML names except for namespace purposes (e.g., xsl:template).

Listing 3-2 illustrates a number of legal XML Names, followed by three that should be avoided but may or may not be identified as illegal, depending on the XML parser you use, and four that are definitely illegal. (This is file name-tests.xml on the CD; you can try this with your favorite parser, or with one of the ones provided on the CD.)

TABLE 3-1 XML Syntax Rules (Well-Formedness Constraints)

• The document must have a consistent, well-defined structure.
• All attribute values must be quoted (single or double quotes).
• White space in content, including line breaks, is significant by default.
• All start tags must have corresponding end tags (exception: empty elements).
• The root element must contain all others, which must nest properly by start/end tag pairing.
• Elements must not overlap; they may be nested, however. (This is also technically true for HTML. Browsers ignore overlapping in HTML, but not in XML.)
• Each element except the root element must have exactly one parent element that contains it.
• Element and attribute names are case-sensitive: Price and PRICE are different elements.
• Keywords such as DOCTYPE and ENTITY must always appear in uppercase; similarly for other DTD keywords such as ELEMENT and ATTLIST.
• Tags without content are called empty elements and must end in "/>".