Using SAX
Using DOM to parse XML is convenient, but its main drawback is high memory consumption, because the entire document is loaded into memory as a tree.
An alternative approach is SAX (Simple API for XML), a stream-based parsing method: the parser reads the XML sequentially and triggers event callbacks that hand data to the caller as it goes. Because the XML is parsed as it is read rather than loaded whole, the parser's memory consumption stays small regardless of the XML size.
SAX parsing triggers a series of events:
- startDocument: Begins reading the XML document;
- startElement: Reads a start element, such as <book>;
- characters: Reads character data;
- endElement: Reads an end element, such as </book>;
- endDocument: Ends reading the XML document.
To parse XML using the SAX API, the Java code looks like this:
java
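// Load book.xml from the classpath.
InputStream input = Main.class.getResourceAsStream("/book.xml");
// Create a SAX parser and let it drive the callbacks on our handler.
SAXParserFactory spf = SAXParserFactory.newInstance();
SAXParser saxParser = spf.newSAXParser();
saxParser.parse(input, new MyHandler());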
The key method, SAXParser.parse(), takes an InputStream and a callback object, which must extend DefaultHandler:
java
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class MyHandler extends DefaultHandler {
    @Override
    public void startDocument() throws SAXException {
        print("start document");
    }

    @Override
    public void endDocument() throws SAXException {
        print("end document");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        print("start element:", localName, qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        print("end element:", localName, qName);
    }

    // Called with a slice of the parser's buffer: only ch[start..start+length) is valid.
    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        print("characters:", new String(ch, start, length));
    }

    // Called for recoverable parse errors.
    @Override
    public void error(SAXParseException e) throws SAXException {
        print("error:", e);
    }

    // Helper: prints all arguments on one line, separated by spaces.
    void print(Object... objs) {
        for (Object obj : objs) {
            System.out.print(obj);
            System.out.print(" ");
        }
        System.out.println();
    }
}
Running the SAX parsing code produces output like the following:
start document
start element: book
characters:
start element: name
characters: Core Java
end element: name
characters:
start element: author
...
If you need to read the text of the <name> node, you have to keep track of the current node during parsing by using startElement() and endElement(). A stack works well for this: push on each startElement() and pop on each endElement(). That way, when characters() is called, you know which node's text is being read. As this shows, using the SAX API is still somewhat cumbersome.
Exercise
Use SAX to parse XML.
Summary
- SAX is a stream-based XML parsing API;
- SAX parses XML quickly and with minimal memory consumption, delivering data through event callbacks;
- The caller must obtain data during the parsing process through callback methods.