Reading an XML File:-
<?xml version="1.0" encoding="UTF-8"?>
<metadata>
<food>
<item name="breakfast">Idly</item>
<price>$2.5</price>
<description>
Two idly's with chutney
</description>
<calories>553</calories>
</food>
<food>
<item name="breakfast">Paper Dosa</item>
<price>$2.7</price>
<description>
Plain paper dosa with chutney
</description>
<calories>700</calories>
</food>
<food>
<item name="breakfast">Upma</item>
<price>$3.65</price>
<description>
Ravaupma with bajji
</description>
<calories>600</calories>
</food>
<food>
<item name="breakfast">BisiBele Bath</item>
<price>$4.50</price>
<description>
BisiBele Bath with sev
</description>
<calories>400</calories>
</food>
<food>
<item name="breakfast">Kesari Bath</item>
<price>$1.95</price>
<description>
Sweet rava with saffron
</description>
<calories>950</calories>
</food>
</metadata>
The above example shows the contents of a file which I have named ‘Sample.xml’.
Python XML Parsing Modules
Python allows parsing these XML documents using two modules, namely the
xml.etree.ElementTree module and xml.dom.minidom (Minimal DOM Implementation). Parsing means
reading information from a file and splitting it into pieces by identifying the parts of that particular XML file.
xml.etree.ElementTree Module:
This module helps us represent XML data as a tree structure, which is the most natural
representation of hierarchical data. The Element type allows storage of hierarchical data structures in
memory and has the following properties:
Property          Description
Tag               A string representing the type of data being stored
Attributes        A number of attributes stored as dictionaries
Text String       A text string having information that needs to be displayed
Tail String       Can also have tail strings if necessary
Child Elements    A number of child elements stored as sequences
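As a minimal sketch of these properties, the snippet below parses a small inline fragment (a made-up one-item menu, not the full ‘Sample.xml’) and reads each property from the resulting Element objects. The variable names and the fragment are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

# A small, hypothetical fragment used only to illustrate the Element properties.
snippet = '<food><item name="breakfast">Idly</item> served hot</food>'
food = ET.fromstring(snippet)
item = food[0]                 # child elements behave like a sequence

print(food.tag)                # tag of the parent element
print(len(food))               # number of child elements
print(item.tag)                # tag of the first child
print(item.attrib)             # attributes as a dictionary
print(item.text)               # text inside the element
print(item.tail)               # text that follows the element's closing tag
```

Note that the tail belongs to the element it follows (here, item), not to the parent element.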
ElementTree is a class that wraps the element structure and allows conversion to and from XML.
Let us now try to parse the above XML file using this Python module.
There are two ways to parse XML using the ‘ElementTree’ module. The first is the parse()
function and the second is the fromstring() function. The parse() function parses an XML document
that is supplied as a file, whereas fromstring() parses XML supplied as a string, for example a
literal within triple quotes.
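The difference between the two functions can be sketched as follows. To keep the sketch self-contained, io.StringIO stands in for a real file on disk (an assumption made here for illustration; the examples in this document use an actual file name), and the XML text is a shortened version of the sample.

```python
import io
import xml.etree.ElementTree as ET

xml_text = "<metadata><food><item name='breakfast'>Idly</item></food></metadata>"

# parse() accepts a file name or a file-like object and returns an ElementTree.
# io.StringIO stands in for a real file here so the sketch is self-contained.
tree = ET.parse(io.StringIO(xml_text))
root_from_file = tree.getroot()

# fromstring() accepts the XML text itself and returns the root Element directly.
root_from_string = ET.fromstring(xml_text)

print(root_from_file.tag, root_from_string.tag)
```

In other words, parse() gives back a tree object on which getroot() must be called, while fromstring() hands over the root element directly.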
Using parse() function:-
As mentioned earlier, this function takes the XML as a file and parses it. Take a look at the following
example:
import xml.etree.ElementTree as ET
mytree = ET.parse('Sample.xml')
myroot = mytree.getroot()
print(myroot)
As you can see, the first thing you need to do is import the xml.etree.ElementTree module.
Then, the parse() function parses the ‘Sample.xml’ file and returns an ElementTree object. The
getroot() method returns the root element of ‘Sample.xml’.
Printing the root element, as the last line of the code does, shows which element it is:
OUTPUT:
<Element 'metadata' at 0x033589F0>
The above output indicates that the root element in our XML document is ‘metadata’. The memory
address shown after ‘at’ will differ from run to run.
Using fromstring() function:
You can also use the fromstring() function to parse XML held in a string. To do this, pass
your XML as a string (triple quotes are convenient for multi-line text) as follows:
import xml.etree.ElementTree as ET
data='''<?xml version="1.0" encoding="UTF-8"?>
<metadata>
<food>
<item name="breakfast">Idly</item>
<price>$2.5</price>
<description>
Two idly's with chutney
</description>
<calories>553</calories>
</food>
</metadata>
'''
myroot = ET.fromstring(data)
#print(myroot)
print(myroot.tag)
You can also slice the tag string output by just specifying which part of the string you want to see
in your output.
EXAMPLE:
print(myroot.tag[0:4])
OUTPUT:
meta
As mentioned earlier, tags can have attributes, which are stored as a dictionary. To check if the root tag has any
attributes you can use the ‘attrib’ attribute as follows:
EXAMPLE:
print(myroot.attrib)
OUTPUT:
{}
As you can see, the output is an empty dictionary because our root tag has no attributes.
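For an element that does have attributes, a single value can be read with the get() method. The sketch below uses a shortened version of the sample data; the attribute name "cuisine" is hypothetical and is included only to show what happens when an attribute is missing.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring('<metadata><food><item name="breakfast">Idly</item></food></metadata>')
item = root[0][0]                  # first child of the first food element

print(item.attrib)                 # all attributes as a dictionary
print(item.get("name"))            # value of one attribute
print(item.get("cuisine"))         # missing attribute: returns None
print(item.get("cuisine", "n/a"))  # missing attribute with a fallback value
```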
Finding Elements of Interest:
The root has child elements as well. To retrieve the first child of the root tag, you can index
the root like a list:
print(myroot[0].tag)
OUTPUT:
food
Now, if you want to retrieve all the child tags of the first food element, you can iterate over it using the for loop
as follows:
for x in myroot[0]:
    print(x.tag, x.attrib)
OUTPUT:
item {'name': 'breakfast'}
price {}
description {}
calories {}
All the items returned are the child tags of the first food element, together with their attributes.
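The same idea extends to every food element, not only the first. The following is a small sketch that loops over the root and then over each child; the data is a shortened, two-item version of the sample, and the list child_tags is an illustrative name.

```python
import xml.etree.ElementTree as ET

data = """<metadata>
<food><item name="breakfast">Idly</item><price>$2.5</price></food>
<food><item name="breakfast">Upma</item><price>$3.65</price></food>
</metadata>"""
root = ET.fromstring(data)

# Iterating over an element yields its direct children in document order.
child_tags = []
for food in root:
    tags = [child.tag for child in food]
    child_tags.append(tags)
    print(food.tag, tags)
```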
To separate out the text from XML using ElementTree, you can make use of the text attribute. For
example, in case I want to retrieve all the information about the first food item, I should use the
following piece of code:
for x in myroot[0]:
    print(x.text)
OUTPUT:
Idly
$2.5
Two idly's with chutney
553
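As you can see, the text of each child of the first food element has been returned as the output.
(The description text also includes the line breaks that surround it in the file; calling strip()
on it removes them.) Now if you want to display all the items with their particular price, you can
make use of the findall() and find() methods. findall() returns every direct child with the given
tag, and find() returns the first one. The get() method, by contrast, accesses an element’s attributes.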
EXAMPLE:
for x in myroot.findall('food'):
    item = x.find('item').text
    price = x.find('price').text
    print(item, price)
OUTPUT:
Idly $2.5
Paper Dosa $2.7
Upma $3.65
BisiBele Bath $4.50
Kesari Bath $1.95
The above output shows all the required items along with the price of each of them. Using
ElementTree, you can also modify the XML files.
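A minimal sketch of such a modification is shown below, assuming a shortened two-item version of the sample data. The new price and the new attribute value are made up for illustration. Element text is changed by assigning to the text attribute, and an attribute value is changed with set().

```python
import xml.etree.ElementTree as ET

data = """<metadata>
<food><item name="breakfast">Idly</item><price>$2.5</price></food>
<food><item name="breakfast">Upma</item><price>$3.65</price></food>
</metadata>"""
root = ET.fromstring(data)

for food in root.findall("food"):
    if food.find("item").text == "Idly":
        food.find("price").text = "$3.0"        # change element text
        food.find("item").set("name", "snack")  # change an attribute value

new_price = root[0].find("price").text
new_name = root[0].find("item").get("name")
print(new_price, new_name)

# ET.tostring() serializes the modified tree (as bytes by default).
xml_bytes = ET.tostring(root)
```

To save the change back to disk, the root can be wrapped in an ElementTree object and written with its write() method.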
Writing XML Documents:-
Using ElementTree
ElementTree is also great for writing data to XML files. The code below shows how to create an
XML file with a structure similar to the file we used in the previous examples.
The steps are:
1. Create an element, which will act as our root element. In our case the tag for this element is "data".
2. Once we have our root element, we can create sub-elements by using the SubElement function. This
function has the syntax:
SubElement(parent, tag, attrib={}, **extra)
Here parent is the parent node to connect to, tag is the name of the new element, attrib is a
dictionary containing the element attributes, and extra are additional keyword arguments, which
are also stored as attributes. This function returns an element to us, which can be used to attach
other sub-elements, as we do in the following lines by passing it as the parent in further
SubElement calls.
3. Although the code below adds attributes through keyword arguments to the SubElement function,
we can also use the set() method of an element. The element text is assigned through the text
property of the Element object.
4. In the last two statements of the code below we wrap the root in an ElementTree object and
write the XML tree to a file with its write() method.
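Step 3 mentions set() as an alternative way to add attributes. A minimal sketch of that alternative follows, which also uses tostring() to inspect the result as a string instead of writing a file. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

root = ET.Element("data")
food = ET.SubElement(root, "food")

item = ET.SubElement(food, "item")
item.set("name", "breakfast")   # attribute added after creation with set()
item.text = "Idly"              # element text assigned through the text property

# encoding="unicode" makes tostring() return a str instead of bytes.
xml_string = ET.tostring(root, encoding="unicode")
print(xml_string)
```

The printed string is `<data><food><item name="breakfast">Idly</item></food></data>`, all on one line.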
Example code:
import xml.etree.ElementTree as ET
# Root element of the new document
root = ET.Element("data")
# First food entry; keyword arguments such as name= become attributes
doc = ET.SubElement(root, "food")
ET.SubElement(doc, "item", name="breakfast").text = "idly"
ET.SubElement(doc, "price").text = "25"
ET.SubElement(doc, "description").text = "Two idly's with chutney"
# Second food entry
doc = ET.SubElement(root, "food")
ET.SubElement(doc, "item", name="breakfast").text = "Dosa"
ET.SubElement(doc, "price").text = "35"
ET.SubElement(doc, "description").text = "one dosa with chutney"
# Wrap the root in an ElementTree and write it to a file
tree = ET.ElementTree(root)
tree.write("FILE3.xml")
Executing this code will result in a new file, "FILE3.xml", which
has a structure similar to the original "Sample.xml" file, although
with a different root tag and fewer elements. You'll probably notice that
the resulting string is only one line and contains no
indentation,