
Introduction to XML

XML stands for eXtensible Markup Language, a data representation format capable of describing very complex data structures. It is commonly used for data transmission and storage.

For example, an XML document describing a book might look like this:

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE book SYSTEM "book.dtd">
<book id="1">
    <name>Core Java</name>
    <author>Cay S. Horstmann</author>
    <isbn lang="CN">1234567</isbn>
    <tags>
        <tag>Java</tag>
        <tag>Network</tag>
    </tags>
    <pubDate/>
</book>
```

XML has several characteristics: it is plain text, uses UTF-8 encoding by default, and supports nesting, making it suitable for representing structured data. If the XML content is stored as a file, it is called an XML file, such as book.xml. Additionally, XML content is often transmitted over the network as messages.
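Because XML is plain text with a nested structure, any XML parser can turn it back into a tree. The sketch below assumes Python's standard `xml.etree.ElementTree` module and reads the book example into a dictionary; the function name `summarize_book` is hypothetical, and the DOCTYPE line is left out because this parser does not load external DTDs.

```python
import xml.etree.ElementTree as ET

# The book document from above, minus the DOCTYPE line.
BOOK_XML = """<?xml version="1.0" encoding="UTF-8" ?>
<book id="1">
    <name>Core Java</name>
    <author>Cay S. Horstmann</author>
    <isbn lang="CN">1234567</isbn>
    <tags>
        <tag>Java</tag>
        <tag>Network</tag>
    </tags>
    <pubDate/>
</book>
"""


def summarize_book(xml_bytes: bytes) -> dict:
    """Hypothetical helper: walk the nested <book> element into a dict."""
    root = ET.fromstring(xml_bytes)          # the single root element, <book>
    isbn = root.find("isbn")
    return {
        "id": root.get("id"),                # attribute on the root element
        "name": root.findtext("name"),       # text of a direct child element
        "author": root.findtext("author"),
        "isbn": isbn.text,
        "isbn_lang": isbn.get("lang"),       # attribute on a child element
        # Nesting: iterating over <tags> yields its <tag> children in order.
        "tags": [tag.text for tag in root.find("tags")],
        # <pubDate/> is an empty element, so it has no text (None).
        "pub_date": root.find("pubDate").text,
    }


if __name__ == "__main__":
    print(summarize_book(BOOK_XML.encode("utf-8")))
```

Encoding the string to bytes first lets the parser honour the `encoding="UTF-8"` declaration in the first line.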

Structure of XML

An XML document has a fixed structure. It normally begins with an XML declaration such as <?xml version="1.0"?>, optionally with an encoding attribute; the declaration is technically optional in XML 1.0, but when present it must come first. It may be followed by an optional document type declaration such as <!DOCTYPE book SYSTEM "book.dtd">, which names the root element and points to the Document Type Definition (DTD) the document should follow. Then comes the document content itself. An XML document must have exactly one root element, which can contain any number of child elements. Elements can carry attributes, as in <isbn lang="CN">1234567</isbn>, and must be properly nested. An element with no content can be written in the short form <tag/>.
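The single-root, nesting, and empty-element rules can be observed directly with a parser. This is a small sketch assuming Python's `xml.etree.ElementTree`; the helper name `parses` is hypothetical and simply reports whether the parser accepts a string.

```python
import xml.etree.ElementTree as ET


def parses(text: str) -> bool:
    """Hypothetical helper: True if the standard-library parser accepts text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Exactly one root element: a second top-level element is rejected.
assert parses("<book><name>Core Java</name></book>")
assert not parses("<book/><book/>")

# Proper nesting: <b> opened inside <a> must also be closed inside <a>.
assert not parses("<a><b></a></b>")

# <pubDate/> is shorthand for <pubDate></pubDate>; both parse to the same tree.
short_form = ET.tostring(ET.fromstring("<pubDate/>"))
long_form = ET.tostring(ET.fromstring("<pubDate></pubDate>"))
assert short_form == long_form
```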

Because XML uses characters such as <, > and & as part of its markup, these characters (and quotation marks inside attribute values) must be written as entity references of the form &name; when they appear as data. For example, Java<tm> must be written as:

```xml
<name>Java&lt;tm&gt;</name>
```

Common special characters are listed below:

| Character | Representation |
|-----------|----------------|
| <         | &lt;           |
| >         | &gt;           |
| &         | &amp;          |
| "         | &quot;         |
| '         | &apos;         |
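In code you rarely write these entity references by hand. As a sketch assuming Python's standard library: `xml.sax.saxutils.escape` always replaces &, < and >, and quotes are only replaced when passed in its `entities` mapping. The wrapper name `escape_text` is hypothetical. ElementTree also escapes element text automatically when serializing and decodes it again when parsing.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

# escape() handles &, < and > itself; quotes must be added explicitly.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}
REVERSE_QUOTES = {v: k for k, v in QUOTE_ENTITIES.items()}


def escape_text(text: str) -> str:
    """Hypothetical helper: escape all five special characters from the table."""
    return escape(text, QUOTE_ENTITIES)


def unescape_text(text: str) -> str:
    """Reverse escape_text (&amp; is decoded last, so no double decoding)."""
    return unescape(text, REVERSE_QUOTES)


# ElementTree escapes text for you when writing XML ...
elem = ET.Element("name")
elem.text = "Java<tm>"
serialized = ET.tostring(elem)
# ... and turns entity references back into characters when reading it.
parsed_text = ET.fromstring(serialized).text
```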

Well-formed XML has correct syntax, so a parser can read it. Valid XML is well-formed and, in addition, conforms to the structural rules defined in a DTD or XSD.

A DTD document can specify a set of rules, such as:

  • The root element must be book
  • The book element must include specified elements like name and author
  • The isbn element must have a lang attribute
  • ...
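To make the difference between well-formed and valid concrete, the sketch below hand-codes the three example rules above. This is an assumption-laden illustration, not real DTD validation: Python's standard library parser does not validate against a DTD or XSD (a third-party library such as lxml is typically used for that), and the names `check_book_rules` and `REQUIRED_CHILDREN` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the DTD rules listed above.
REQUIRED_CHILDREN = ("name", "author")


def check_book_rules(xml_text: str) -> list[str]:
    """Return rule violations; an empty list means the document passes.

    Raises ET.ParseError first if the document is not even well-formed.
    """
    root = ET.fromstring(xml_text)
    problems = []
    # Rule 1: the root element must be <book>.
    if root.tag != "book":
        problems.append(f"root element must be book, got {root.tag}")
    # Rule 2: <book> must contain <name> and <author>.
    for child in REQUIRED_CHILDREN:
        if root.find(child) is None:
            problems.append(f"missing required element <{child}>")
    # Rule 3: every <isbn> element must carry a lang attribute.
    for isbn in root.iter("isbn"):
        if "lang" not in isbn.attrib:
            problems.append("<isbn> is missing the lang attribute")
    return problems
```

A document can therefore be well-formed (it parses) yet still fail these rules, which is exactly the gap that DTD or XSD validation closes.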

How can you check an XML file? The simplest way is to open it in a browser: drag the file into a browser window, and if the XML is not well-formed, the browser reports an error. Note that this only checks well-formedness; browsers do not validate the document against its DTD.

HTML looks structurally similar, but browsers are "fault tolerant" with HTML and will still render a page when closing tags are missing. XML allows no such leniency: a missing closing tag or an improperly nested element makes the whole document fail to parse.
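Outside the browser, the same strictness shows up as a parse error that reports where parsing stopped. A minimal sketch, assuming Python's `xml.etree.ElementTree`, whose `ParseError` exposes a `position` attribute holding a (line, column) pair; the helper name `first_error` is hypothetical.

```python
import xml.etree.ElementTree as ET


def first_error(text: str):
    """Hypothetical helper: None if text is well-formed, else (message, line, column)."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        line, column = err.position  # where the parser detected the problem
        return str(err), line, column
    return None


# HTML-style markup a browser would happily render: <p> is never closed,
# so the XML parser rejects the document when it reaches </div>.
html_like = "<div>\n<p>Hello\n</div>"
print(first_error(html_like))
```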

XML Technology Stack

XML is really a family of technologies. Besides the XML documents themselves, it includes:

  • DTD and XSD: Validating the structure and data of XML;
  • Namespace: qualifying element and attribute names with a URI so that names from different vocabularies do not clash;
  • XSLT: Transforming XML into another text format;
  • XPath: A language for querying XML nodes;
  • ...
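Two of these, namespaces and XPath-style queries, can be tried with Python's standard `xml.etree.ElementTree`, as in the sketch below. Assumptions: the namespace URI `http://example.com/book` is made up for illustration, ElementTree supports only a limited subset of XPath rather than the full language, and XSLT is not available in the standard library at all.

```python
import xml.etree.ElementTree as ET

# A namespaced variant of the book document (the URI is a made-up example).
NS_XML = """<book xmlns:b="http://example.com/book">
    <b:name>Core Java</b:name>
    <tags>
        <tag>Java</tag>
        <tag>Network</tag>
    </tags>
    <isbn lang="CN">1234567</isbn>
</book>"""

root = ET.fromstring(NS_XML)

# Namespaces: ElementTree stores b:name as "{http://example.com/book}name";
# a prefix-to-URI mapping lets queries use the short prefixed form.
ns = {"b": "http://example.com/book"}
name = root.find("b:name", ns)

# Limited XPath: ".//tag" matches <tag> at any depth below the root,
# and [@lang='CN'] filters elements by attribute value.
all_tags = [t.text for t in root.findall(".//tag")]
cn_isbn = root.find("isbn[@lang='CN']")
```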

In practice, these related technologies are fairly complex and are rarely used directly in everyday applications, so a basic understanding of them is usually sufficient.

Summary

XML uses a nested structure for data representation and supports format validation. It is commonly used for configuration files, network message transmission, and more.
