Chapter 3
XML Syntax and Parsing Concepts

In this chapter, we cover the rules of XML syntax that are stated or implied in the XML 1.0 Recommendation from the W3C. A considerable amount of XML terminology is introduced, including discussions of parsing, well-formedness, and validation. XML document structure, legal XML Names, and CDATA are also among the topics. The XML 1.0 specification also discusses rules for Document Type Definitions (DTDs), which we present in chapter 4. The material in chapters 3 and 4 is closely interrelated.

Elements, Tags, Attributes, and Content

To understand XML syntax, we must first be familiar with several basic terms drawn from HTML (and SGML). XML syntax, however, differs in some important ways from both HTML and SGML, as we'll see.

Elements are the essence of document structure. They represent pieces of information and may or may not contain nested elements that represent even more specific information, attributes, and/or textual content. In our employee directory example from chapter 2 (Listing 2-2), some of the elements were Employees, Employee, Name, First, Last, Project, and PhoneNumbers.

Tags are the way elements are indicated or marked up in a document. For each element,[1] there is typically a start tag that begins with < (less than) and ends with > (greater than), and an end tag that begins with </ and ends with >. Some of the start tags in our example were <Employees>, <Employee>, <Name>, and so forth. The corresponding end tags for these elements were </Employees>, </Employee>, and </Name>.

If an element has one or more attributes, they must appear between the < and > delimiters of the start tag. Attributes are qualifying pieces of information that add detail and further define an instance of an element. They are typically details that the language designer feels do not need to be nested elements themselves; the

[1] With the exception of something called an empty element, as we will soon discuss.
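To make the terms element, tag, attribute, and content concrete, the following minimal sketch parses a small fragment with Python's standard xml.etree.ElementTree module and reports what it finds. The fragment is an assumption made for illustration: it borrows element names from the employee directory, but it is not Listing 2-2, and the id attribute, the values, and the describe helper are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the element names come from the employee directory
# example, but the id attribute and every value are invented for illustration.
DOC = """<Employees>
  <Employee id="e1">
    <Name><First>Pat</First><Last>Doe</Last></Name>
    <Project>Intranet</Project>
  </Employee>
</Employees>"""

root = ET.fromstring(DOC)


def describe(element, depth=0):
    """Return one line per element: its name, attributes, and direct text."""
    text = (element.text or "").strip()
    lines = [f"{'  ' * depth}{element.tag} attrs={element.attrib} text={text!r}"]
    for child in element:  # nested elements, in document order
        lines.extend(describe(child, depth + 1))
    return lines


report = describe(root)
print("\n".join(report))
```

Each start tag and end tag pair in the fragment yields one element in the report. The attribute written inside the <Employee> start tag shows up in that element's attribute mapping, while First, Last, and Project carry textual content rather than attributes.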