perfectxml.com

15 minute guide to XML

Extensible Markup Language (XML) is a W3C Recommendation (http://www.w3.org/TR/REC-xml) for representing information in a text-based document.

To give you an analogy: if Java is portable code, then XML is portable data.

XML is extensible (it does not have a fixed set of tags); it uses markup (angle brackets, elements, attributes) to add meaning to the text; and it is a meta-language (a language used to create other languages; XML syntax is the basis of various other markup languages, such as SVG, XSLT, WML, and so on).

XML is not...

  • ...a programming language like C++, Visual Basic, ...
    Even though XML has the word "language" in its name, it is not a programming language; it is a meta-language. The syntax and rules defined by XML can be used to create other markup languages.
  • ...only for Internet/Web applications
    It is true that XML is a strong choice if you need to transfer data across platforms over the Internet. However, XML is not just for Internet/Web applications. Today, XML is used for a wide variety of applications, such as graphics (SVG), configuration files, code documentation, speech/multimodal applications, mathematics (MathML), and so on.
  • ...something to replace or in competition with HTML
    When you first look at XML, it might look very similar to HTML. That is no surprise: like HTML, XML is a markup language, and an XML document also has a hierarchical structure of elements (a start-tag and an end-tag, for example <name>Darshan</name>) and attributes. However, XML is not here to replace HTML; its goal and utility are quite different. It is true that XML is being used to make HTML better (XHTML), but XML does not exist to replace or compete with HTML. HTML is about presentation; XML is all about data.
  • ...some proprietary technology
    As noted earlier, XML was created by the same standards body (W3C) that standardized HTML. The XML standard has received excellent tool and vendor support: all major vendors support XML and provide tools and technologies that use XML or help in working with it.
  • ...a t-shirt size (Large, XL, XXL, ......XML)
  • ...an unusual Roman numeral
  • ...eXcellent Marketing Lingo
  • ...eXciting Modern Language
  • ...eXcessive aMount of pubLicity

Unlike HTML

  • XML is all about data; it does not provide any display/presentation details.
  • XML does not have a fixed set of tags
  • XML is case-sensitive
  • XML has strict rules
    • Each start tag must have a matching end tag (<Name>D</Name>), or the element must use the empty-element form (<Comments />)
    • Attribute values must be in single or double quotes (<Vendor id="1" /> or <Vendor id='1' />)
    • Tags cannot overlap (<a><b></a></b> is not allowed). They should be properly nested (<a><b></b></a>).
    • Only one top/root element is allowed.
    • Strict rules for element names (names such as <123 /> are not allowed, because a name cannot start with a digit)
    • No element may have two attributes with the same name
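
The rules above can be checked mechanically by any XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name is_well_formed is hypothetical, and the sample strings are taken from the rules listed above.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Each sample exercises one of the rules from the list above.
samples = {
    "matching tags": "<Name>D</Name>",
    "empty element": "<Comments />",
    "single-quoted attribute": "<Vendor id='1' />",
    "properly nested": "<a><b></b></a>",
    "overlapping tags": "<a><b></a></b>",
    "unquoted attribute": "<Vendor id=1 />",
    "two root elements": "<a/><b/>",
    "name starts with a digit": "<123 />",
    "duplicate attribute": '<Vendor id="1" id="2" />',
    "case mismatch": "<Name>D</name>",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

The first four samples are accepted and the remaining six are rejected, one for each rule they break.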

Well-Formed and Valid XML

All documents that conform to the XML 1.0 syntax rules (one root element, matching start and end tags, attribute values in quotation marks, ...) are known as well-formed XML documents. If, in addition to these rules, a document also follows structural rules that you have defined (such as the hierarchy of elements, the presence of attributes, value data types, and child-element occurrences), it is called a valid XML document. You write DTDs or XML Schemas (XSD) to express these rules and validate documents against them, which helps ensure that the XML structure is what you expect. In other words, a well-formed XML document that adheres to the structure defined by a DTD or XML Schema is a valid XML document. All valid XML documents are well-formed, but the converse is not necessarily true.
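
To illustrate the difference, here is a rough sketch in Python. The standard library parser only checks well-formedness, so the structural rules are hand-coded here as a stand-in for a real DTD or XML Schema; an actual project would normally use a validating parser (for example, the third-party lxml library). The Vendors/Vendor/Name vocabulary and the check_vendors helper are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET


def check_vendors(text):
    """Return a list of rule violations; an empty list means the document passed.

    Hand-coded stand-in for a schema with these (hypothetical) rules:
      - the root element is <Vendors>
      - it contains only <Vendor> children
      - each <Vendor> has a numeric id attribute and exactly one <Name> child
    """
    root = ET.fromstring(text)  # raises ParseError if not even well-formed
    problems = []
    if root.tag != "Vendors":
        problems.append("root element must be <Vendors>")
    for i, child in enumerate(root, start=1):
        if child.tag != "Vendor":
            problems.append(f"child {i}: unexpected element <{child.tag}>")
            continue
        if not child.get("id", "").isdigit():
            problems.append(f"child {i}: <Vendor> needs a numeric id attribute")
        if len(child.findall("Name")) != 1:
            problems.append(f"child {i}: <Vendor> needs exactly one <Name>")
    return problems


valid_doc = '<Vendors><Vendor id="1"><Name>Acme</Name></Vendor></Vendors>'
# Well-formed, but breaks the structural rules (no id attribute).
invalid_doc = "<Vendors><Vendor><Name>Acme</Name></Vendor></Vendors>"

print(check_vendors(valid_doc))
print(check_vendors(invalid_doc))
```

Both documents parse without error, so both are well-formed; only the first one also satisfies the structural rules.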

In a well-formed XML document:
  • entity references are used for five special characters (&amp; for &; &lt; for <; &gt; for >; &apos; for '; and &quot; for ")
  • character references are used for other special characters (for example: &#0174; or &#xAE; is used for ®; &#8486; or &#x2126; for Ω)
  • Characters with codes 0 through 31 (except tab, LF, and CR) are not allowed
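
The following sketch shows these three points using Python's standard library: escaping special characters with xml.sax.saxutils.escape, resolving entity and character references on parsing, and rejecting a disallowed control character. The sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# 1. Special characters in text must be written as entity references.
raw = "Tom & Jerry <friends>"
escaped = escape(raw)  # replaces &, < and > with entity references
print(escaped)

# The parser turns the references back into the original characters.
round_trip = ET.fromstring(f"<t>{escaped}</t>").text

# 2. Character references (decimal or hexadecimal) for other characters.
symbols = ET.fromstring("<t>&#0174; &#xAE; &#8486; &#x2126;</t>").text
print(symbols)

# 3. Control characters below 32 are rejected, except tab, LF, and CR.
def parses(text):
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(parses("<t>\t</t>"))    # tab is allowed
print(parses("<t>\x01</t>"))  # character 1 is not
```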

What’s so great about XML?

  • Self-describing data in text format
    XML's textual nature makes it highly portable, which is why XML is heavily used for cross-platform data integration. If meaningful tag and attribute names are used to enclose the data, the document becomes self-describing (compare this with comma-separated values (CSV) or fixed-length data).

  • Open, Standard, License-free, Platform-neutral with great tool and vendor support
    Many developers use XML in their application design and architecture because it is an open standard (which avoids vendor lock-in) that all major vendors nevertheless agree upon and support, and because it is surrounded by many supporting standards (XSLT, XPath, XML Schema, and so on) that help when implementing an XML solution.

  • Clean separation between Content & Presentation
    Once you have your data in XML format, you can transform the same XML document into HTML, text, SVG (Scalable Vector Graphics), WML (Wireless Markup Language), PDF, or any other format you need. This gives you a "document-view" architecture, with a clean separation between your content/data and its presentation.

  • Unicode support
    XML documents may contain Unicode characters (excluding the surrogate blocks, U+FFFE, and U+FFFF), so XML can readily be used for international applications.

  • Easy to transfer and transform
    XML can be easily transformed: XSLT stylesheets can be used to convert an XML document into other formats (such as HTML, CSV, PDF, and so on).

  • Easy to parse, process, and search
    XML parsers are used to parse and process XML. For instance, MSXML (Microsoft XML Core Services) or .NET can be used on the Microsoft platform to work with XML. Similarly, many Java-based XML processing APIs and tools (JAXP, Xerces, Xalan, and so on) are available from various vendors, including Sun, Apache, and Oracle.

  • (Machine and) Human-readable
    XML documents are text documents, so you don't need any special tools to read or write them. Notepad will do!

  • Hierarchical Structure
    XML documents are hierarchical in nature, with one top-level root element, so XML is an excellent choice for modeling hierarchical data in an easy-to-read fashion.

  • Enables many other technologies (for example: Web services)
    Web services refer to cross-platform messaging over the Internet, and they facilitate application integration. XML is one of the core building blocks of the Web services architecture. For instance, the caller (client) sends an XML-formatted message envelope to the server, and in return the server sends an XML-formatted response.
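
The "easy to parse, process, and search" and "easy to transfer and transform" points above can be sketched with Python's standard xml.etree.ElementTree module. The vendor document and its element names are hypothetical, and the CSV conversion is written in plain Python as a stand-in for an XSLT stylesheet, which the Python standard library does not provide.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data.
DOC = """<Vendors>
  <Vendor id="1"><Name>Acme</Name><City>Chicago</City></Vendor>
  <Vendor id="2"><Name>Globex</Name><City>Boston</City></Vendor>
  <Vendor id="3"><Name>Initech</Name><City>Chicago</City></Vendor>
</Vendors>"""

# Parse: turn the text into a tree of elements.
root = ET.fromstring(DOC)

# Search: ElementTree supports a small subset of XPath.
chicago_names = [n.text for n in root.findall("Vendor[City='Chicago']/Name")]
print(chicago_names)


def vendors_to_csv(vendors_root):
    """Transform the vendor tree into CSV text (plain-Python stand-in for XSLT)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["id", "name", "city"])
    for vendor in vendors_root.findall("Vendor"):
        writer.writerow(
            [vendor.get("id"), vendor.findtext("Name"), vendor.findtext("City")]
        )
    return buf.getvalue()


csv_text = vendors_to_csv(root)
print(csv_text)
```

The same parsed tree feeds both the search and the CSV output, which is the content/presentation separation described earlier: the data is stored once and rendered in whatever form is needed.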

What's NOT so great about XML?

  • XML is a space, processor, and bandwidth hog
    Suppose you are working on an application that will be used inside a corporate network, and good performance and/or low network bandwidth usage is a critical requirement. In such cases, it does not make sense to use XML for data transfer; a proprietary binary format can give more optimized results. XML's textual nature and markup requirements place more demand on space, bandwidth, and processor.
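
A rough illustration of the space overhead, assuming a made-up list of sensor readings and an equally made-up binary layout (a 16-bit id followed by a 32-bit float). The exact numbers depend entirely on the data and the chosen element names; the point is only the order of magnitude.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical data: (sensor id, measured value) pairs.
readings = [(1, 20.5), (2, 21.0), (3, 19.75)]


def to_xml(rows):
    """Serialize the rows as an XML document (bytes)."""
    root = ET.Element("Readings")
    for sensor_id, value in rows:
        reading = ET.SubElement(root, "Reading", id=str(sensor_id))
        reading.text = repr(value)
    return ET.tostring(root)


def to_binary(rows):
    """Serialize the rows in an ad hoc binary layout: 6 bytes per row."""
    # "<Hf" = little-endian unsigned 16-bit integer + 32-bit float
    return b"".join(struct.pack("<Hf", sid, val) for sid, val in rows)


xml_bytes = to_xml(readings)
bin_bytes = to_binary(readings)
decoded = list(struct.iter_unpack("<Hf", bin_bytes))

print(len(xml_bytes), "bytes as XML")
print(len(bin_bytes), "bytes as binary")
```

The binary form is far smaller, but it is no longer self-describing: a reader has to know the layout in advance, which is exactly the trade-off discussed above.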

What about binary data?

We now know that XML documents are text data. What if you need to transmit some binary data? There are primarily three options if you need to send some binary data along with XML:
  1. Provide a reference
    Instead of sending binary data along with XML, you can just include a reference to the binary data and let the client get to it separately.
  2. Send as an "Attachment" – MIME/DIME
  3. Base64/Hex encoding
    See www.perfectxml.com/articles/xml/binary.asp for more details.
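
As a minimal sketch of option 3, the following Python code embeds binary data in an XML element as Base64 text and recovers it again. The Attachment element, its attributes, and the helper names are hypothetical and chosen only for this example.

```python
import base64
import xml.etree.ElementTree as ET


def wrap_binary(payload, name):
    """Return an XML document (bytes) carrying `payload` as Base64 text."""
    root = ET.Element("Attachment", name=name, encoding="base64")
    root.text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(root)


def unwrap_binary(xml_bytes):
    """Extract and decode the Base64 payload from the document."""
    root = ET.fromstring(xml_bytes)
    return base64.b64decode(root.text)


payload = bytes(range(256))  # every possible byte value, including 0
document = wrap_binary(payload, "sample.bin")
restored = unwrap_binary(document)

encoded_length = len(base64.b64encode(payload))
print(len(payload), "raw bytes ->", encoded_length, "Base64 characters")
```

Base64 uses four text characters for every three bytes, so the encoded data is about one third larger than the original. That overhead is the price of keeping everything inside a single text document.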

Current Applications of XML

The following is just a short list of how XML is being used today:
  • “Data on the Move”
    This is the most common use of XML: transferring data from one machine to another, across platforms and networks, over the Internet.

  • Application Integration (EAI, legacy applications, eBusiness)

  • Content Management

  • Messaging (SOAP and Web Services)

  • XML as a file format (WordML, InfoPath, StarOffice)

  • Miscellaneous
    • Configuration Files
    • Code documentation
    • RSS (news/weblog syndication)
    • Graphics (SVG)
    • Multimodal Applications (SALT, VoiceXML, WAP, WML)
    • XForms and other data collection methods (InfoPath)

Sample XML Document
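
A small illustrative document that follows the rules described above is shown here inside a Python script that parses it and reads a few values back. The purchase-order vocabulary (Order, Customer, Item, and so on) is hypothetical and not part of any standard.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical purchase order -->
<Order id="1001" currency="USD">
  <Customer>
    <Name>Darshan</Name>
    <Email>darshan@example.com</Email>
  </Customer>
  <Items>
    <Item sku="A-1" qty="2">XML &amp; XSLT Handbook</Item>
    <Item sku="B-7" qty="1">Caf&#xE9; mug</Item>
  </Items>
  <Comments />
</Order>
"""

# Parse from bytes so the encoding declaration is honored by the parser.
order = ET.fromstring(SAMPLE.encode("utf-8"))

customer = order.findtext("Customer/Name")
titles = [item.text for item in order.iter("Item")]
total_qty = sum(int(item.get("qty")) for item in order.iter("Item"))

print(order.tag, order.get("id"), customer)
print(titles, total_qty)
```

The document has a single root element, properly nested and case-matched tags, quoted attribute values, an empty element, one entity reference, and one character reference.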

Next Steps

This was a quick 15-minute overview of XML. If you would like to learn more about XML, read the following sample chapters:
 
Learn more about XML

Chapter 1: Introduction to XML Technologies from the book Professional ASP.NET 1.0 XML with C#

Chapter 1: Essential XML from the book Real World XML

Chapter 2: Introducing XML from the book Professional XML 2nd Edition






©2004 perfectxml.com. All rights reserved.