Appearance
XML
Although XML is more complex than JSON and not as commonly used in web applications today, it's still important to understand how to work with XML.
DOM vs SAX
There are two primary methods for parsing XML: DOM and SAX.
- DOM loads the entire XML document into memory, creating a tree structure. This consumes more memory and can be slower, but allows for easy traversal of nodes.
- SAX operates in a streaming fashion, parsing the XML as it reads it. This method is more memory-efficient and faster, but requires the programmer to handle events manually.
In most cases, it's preferable to use SAX due to its lower memory usage.
SAX Example
Using SAX to parse XML in Python is straightforward. You typically define three key event handlers: start_element
, end_element
, and char_data
. Here's a simple example:
python
from xml.parsers.expat import ParserCreate
class DefaultSaxHandler(object):
def start_element(self, name, attrs):
print('sax:start_element: %s, attrs: %s' % (name, str(attrs)))
def end_element(self, name):
print('sax:end_element: %s' % name)
def char_data(self, text):
print('sax:char_data: %s' % text)
xml = r'''<?xml version="1.0"?>
<ol>
<li><a href="/python">Python</a></li>
<li><a href="/ruby">Ruby</a></li>
</ol>
'''
handler = DefaultSaxHandler()
parser = ParserCreate()
parser.StartElementHandler = handler.start_element
parser.EndElementHandler = handler.end_element
parser.CharacterDataHandler = handler.char_data
parser.Parse(xml)
Note: When reading large strings, the CharacterDataHandler
might be called multiple times, so you'll need to concatenate the data and process it in the EndElementHandler
.
Generating XML
Most of the time, generated XML structures are simple. The most straightforward way to create XML is by concatenating strings:
python
L = []
L.append(r'<?xml version="1.0"?>')
L.append(r'<root>')
L.append(encode('some & data'))
L.append(r'</root>')
return ''.join(L)
For more complex XML, it's advisable to use JSON instead.
Summary
When parsing XML, focus on the nodes of interest, save the data during event handling, and process it after parsing is complete.
Exercise
Here's an exercise to parse the XML format of weather forecasts from WeatherAPI using SAX:
python
from xml.parsers.expat import ParserCreate
from urllib import request
class WeatherSaxHandler(object):
def __init__(self):
self.city = ''
self.condition = ''
self.temperature = ''
self.wind = ''
self.current_tag = ''
def start_element(self, name, attrs):
self.current_tag = name
def end_element(self, name):
if name == 'location':
self.city = self.current_data
elif name == 'current':
pass # You could process the current weather here
def char_data(self, text):
if self.current_tag == 'name':
self.current_data = text.strip()
elif self.current_tag == 'condition':
self.condition = text.strip()
elif self.current_tag == 'temp_c':
self.temperature = text.strip()
elif self.current_tag == 'wind_kph':
self.wind = text.strip()
def parseXml(xml_str):
handler = WeatherSaxHandler()
parser = ParserCreate()
parser.StartElementHandler = handler.start_element
parser.EndElementHandler = handler.end_element
parser.CharacterDataHandler = handler.char_data
parser.Parse(xml_str)
return {
'city': handler.city,
'weather': {
'condition': handler.condition,
'temperature': handler.temperature,
'wind': handler.wind,
}
}
# Test
URL = 'https://api.weatherapi.com/v1/current.xml?key=b4e8f86b44654e6b86885330242207&q=Beijing&aqi=no'
with request.urlopen(URL, timeout=4) as f:
data = f.read()
result = parseXml(data.decode('utf-8'))
assert result['city'] == 'Beijing'
print(result)
This code defines a SAX handler for weather data and verifies that the parsed city matches "Beijing."