Using DOM

Because an XML document is tree-structured, there are two standard APIs for parsing it:

  • DOM: Reads the entire XML at once and represents it as a tree structure in memory;
  • SAX: Reads XML as a stream, using event callbacks.

Let's first look at how to use DOM to read XML.

DOM stands for Document Object Model. The DOM model treats the XML structure as a tree, starting from the root node, where each node can contain any number of child nodes.

Using the following XML as an example:

xml
<?xml version="1.0" encoding="UTF-8" ?>
<book id="1">
    <name>Core Java</name>
    <author>Cay S. Horstmann</author>
    <isbn lang="CN">1234567</isbn>
    <tags>
        <tag>Java</tag>
        <tag>Network</tag>
    </tags>
    <pubDate/>
</book>

If parsed into a DOM structure, it would look approximately like this:

                          ┌─────────┐
                          │document │
                          └─────────┘
                               │
                               ▼
                          ┌─────────┐
                          │  book   │
                          └─────────┘
                               │
         ┌──────────┬──────────┼──────────┬──────────┐
         ▼          ▼          ▼          ▼          ▼
    ┌─────────┐┌─────────┐┌─────────┐┌─────────┐┌─────────┐
    │  name   ││ author  ││  isbn   ││  tags   ││ pubDate │
    └─────────┘└─────────┘└─────────┘└─────────┘└─────────┘
                                          │
                                     ┌────┴────┐
                                     ▼         ▼
                                 ┌───────┐ ┌───────┐
                                 │  tag  │ │  tag  │
                                 └───────┘ └───────┘

Notice that the top-level document represents the XML document itself, which is the true "root". Although <book> is the root element, it is a child node of document.
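
This parent/child relationship can be checked in code. The following is a minimal sketch rather than part of the original example: the class name DomRootDemo and its parse helper are hypothetical, and the XML is read from an in-memory string so that it runs without a file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomRootDemo {
    // Hypothetical helper: parses an XML string held in memory.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Failed to parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<book id=\"1\"><name>Core Java</name></book>");
        Element root = doc.getDocumentElement();                    // the root element <book>
        System.out.println(doc.getNodeName());                      // #document
        System.out.println(root.getNodeName());                     // book
        System.out.println(root.getParentNode().isSameNode(doc));   // true
        System.out.println(doc.getParentNode());                    // null: nothing sits above the document
    }
}
```

getDocumentElement() returns the root element, while the Document node itself has no parent.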

Java provides the DOM API to parse XML, which uses the following objects to represent the XML content:

  • Document: Represents the entire XML document;
  • Element: Represents an XML element;
  • Attr: Represents an attribute of an element.
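
To make these three kinds of object concrete, here is a small sketch that reads attributes from a cut-down version of the book example. The class DomAttrDemo and the helpers parse and attributesOf are hypothetical names introduced for illustration. In the org.w3c.dom package the attribute interface is named Attr.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;

class DomAttrDemo {
    // Hypothetical helper: parses an XML string held in memory.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Failed to parse XML", e);
        }
    }

    // Collects the attributes of an element as "name=value" strings.
    static List<String> attributesOf(Element e) {
        List<String> result = new ArrayList<>();
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Attr a = (Attr) attrs.item(i);
            result.add(a.getName() + "=" + a.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Document doc = parse("<book id=\"1\"><isbn lang=\"CN\">1234567</isbn></book>");
        Element book = doc.getDocumentElement();
        Element isbn = (Element) book.getElementsByTagName("isbn").item(0);
        System.out.println(book.getAttribute("id"));                // 1
        System.out.println(attributesOf(isbn));                     // [lang=CN]
        System.out.println(book.getAttribute("missing").isEmpty()); // true: absent attributes read as ""
    }
}
```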

The code to parse an XML document using the DOM API is as follows:

java
InputStream input = Main.class.getResourceAsStream("/book.xml");
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(input);
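
The snippet above omits imports and exception handling, and assumes a book.xml resource on the classpath. A self-contained version might look like the sketch below. It makes several assumptions: the XML is kept in an in-memory string instead of a file, DomParseDemo and its methods are hypothetical names, and checked exceptions are wrapped in IllegalStateException purely to keep the example short.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DomParseDemo {
    // Sample document kept in memory so the sketch needs no book.xml file.
    static final String XML =
            "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>"
            + "<book id=\"1\"><name>Core Java</name></book>";

    static DocumentBuilder newBuilder() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No usable XML parser", e);
        }
    }

    // Parses from a byte stream, as in the snippet above.
    static Document fromStream(InputStream in) {
        try {
            return newBuilder().parse(in);
        } catch (SAXException | IOException e) {
            throw new IllegalStateException("Malformed or unreadable XML", e);
        }
    }

    // Parses from text that is already a String, via an InputSource.
    static Document fromString(String xml) {
        try {
            return newBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (SAXException | IOException e) {
            throw new IllegalStateException("Malformed or unreadable XML", e);
        }
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8));
        System.out.println(fromStream(in).getDocumentElement().getTagName());        // book
        System.out.println(fromString(XML).getDocumentElement().getAttribute("id")); // 1
    }
}
```

If the input is not well-formed, parse() throws a SAXException, which this sketch rethrows as an unchecked exception.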

DocumentBuilder.parse() parses an XML document; it accepts an InputStream, a File, a URI string, or an InputSource. If parsing succeeds, we obtain a Document object that represents the tree structure of the entire XML document. We need to traverse it to read the values of specific elements:

java
void printNode(Node n, int indent) {
    for (int i = 0; i < indent; i++) {
        System.out.print(' ');
    }
    switch (n.getNodeType()) {
    case Node.DOCUMENT_NODE: // Document node
        System.out.println("Document: " + n.getNodeName());
        break;
    case Node.ELEMENT_NODE: // Element node
        System.out.println("Element: " + n.getNodeName());
        break;
    case Node.TEXT_NODE: // Text node
        System.out.println("Text: " + n.getNodeName() + " = " + n.getNodeValue());
        break;
    case Node.ATTRIBUTE_NODE: // Attribute node (only if an Attr is passed in directly; attributes are not child nodes)
        System.out.println("Attr: " + n.getNodeName() + " = " + n.getNodeValue());
        break;
    default: // Other nodes
        System.out.println("NodeType: " + n.getNodeType() + ", NodeName: " + n.getNodeName());
    }
    for (Node child = n.getFirstChild(); child != null; child = child.getNextSibling()) {
        printNode(child, indent + 1);
    }
}

Calling printNode(doc, 0) on the parsed document prints the following structure:

Document: #document
 Element: book
  Text: #text = 
  
  Element: name
   Text: #text = Core Java
  Text: #text = 
  
  Element: author
   Text: #text = Cay S. Horstmann
  Text: #text = 
  ...

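In the structure produced by the DOM API, you can start from the root node, Document, and traverse its child nodes to reach every element, text node, and comment. The attributes of an element are not child nodes; they are obtained through the element's getAttributes() method. All of these are collectively referred to as Node, and each Node has a type that distinguishes whether it is an element, attribute, text, and so on. The blank Text nodes in the output above are the line breaks and indentation between elements, which DOM also keeps as text.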
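
Because every node reports its type, a traversal can keep only the kinds of node it is interested in. The sketch below filters the children of an element by getNodeType() so that only elements remain, skipping the whitespace-only Text nodes seen in the output above. The names DomWhitespaceDemo, parse, and childElementNames are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class DomWhitespaceDemo {
    // Hypothetical helper: parses an XML string held in memory.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Failed to parse XML", e);
        }
    }

    // Returns the names of the direct child elements, ignoring Text and other node types.
    static List<String> childElementNames(Node parent) {
        List<String> names = new ArrayList<>();
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                names.add(c.getNodeName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<book>\n    <name>Core Java</name>\n    <author>Cay S. Horstmann</author>\n</book>";
        Element book = parse(xml).getDocumentElement();
        System.out.println(book.getChildNodes().getLength()); // 5: text, name, text, author, text
        System.out.println(childElementNames(book));          // [name, author]
    }
}
```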

When using the DOM API, the text of an element is not stored on the Element itself: to read it, you must access the element's child node of type Text, which makes the API somewhat cumbersome to use.
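
As an illustration, the sketch below reads the text of <name> through its Text child. For comparison it also calls Node.getTextContent(), a standard DOM method that concatenates the text of a node and its descendants. The class DomTextDemo and the helpers parse and textViaChild are hypothetical names, and the helper assumes the text is the element's first child.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class DomTextDemo {
    // Hypothetical helper: parses an XML string held in memory.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Failed to parse XML", e);
        }
    }

    // Reads the text through the first child, which is assumed to be a Text node.
    static String textViaChild(Element e) {
        Node first = e.getFirstChild();
        if (first != null && first.getNodeType() == Node.TEXT_NODE) {
            return first.getNodeValue();
        }
        return ""; // empty element such as <pubDate/>, or first child is not text
    }

    public static void main(String[] args) {
        Document doc = parse("<book><name>Core Java</name><pubDate/></book>");
        Element name = (Element) doc.getElementsByTagName("name").item(0);
        Element pubDate = (Element) doc.getElementsByTagName("pubDate").item(0);
        System.out.println(textViaChild(name));              // Core Java
        System.out.println(name.getTextContent());           // Core Java
        System.out.println(textViaChild(pubDate).isEmpty()); // true
    }
}
```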

Exercise

Use DOM to parse XML.

Summary

  • Java's DOM API can parse XML into a DOM structure, represented by the Document object;
  • DOM can fully represent the XML data structure in memory;
  • Because DOM loads the entire document into memory before it can be used, parsing is comparatively slow and memory-intensive, especially for large documents.