XML and Web Services
XML DOM & SAX
XML Processing - Parser to
Application Communication
Two dominant standards:
Document Object Model (DOM):
Tree-based model that passes a complete picture of the
document to the application once processing concludes,
Java, JavaScript, and IDL descriptions; a Perl
implementation is in independent development,
Simple API for XML (SAX):
Event-based model that reports the document to application
handlers as it is read,
Supported by nearly all Java XML parsers
R. LOGAMBIGAI, TA
December 8, 2016
DOM
DOM (Document Object Model)
It is an object model for representing XML documents
in your code.
Using DOM we can create or modify an XML document
programmatically.
The DOM defines the objects and properties of all
document elements, and the methods (interface) to
access them.
Parsing XML - The View from the
Application
XML Document (<?xml?>) -> Parser -> Application
The parser:
Loads the document
Parses declarations
Builds DTD
Interprets document against DTD
May validate
May build DOM tree
May provide XSL or XLink services
DOM Levels
Core DOM - standard model for any structured document
XML DOM - standard model for XML documents
HTML DOM - standard model for HTML documents
XML DOM
The XML DOM is:
A standard object model for XML
A standard programming interface for XML
Platform- and language-independent
A W3C standard
The XML DOM defines the objects and properties of all
XML elements, and the methods (interface) to access
them.
DOM Nodes
In the DOM, everything in an XML document is a node.
The DOM says:
The entire document is a document node
Every XML element is an element node
The text in an XML element is held in text nodes
Every attribute is an attribute node
Comments are comment nodes
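The node kinds listed above can be observed directly with the standard Java DOM API. The following is a minimal sketch, not a definitive implementation: the class name NodeKindsSketch, its helper methods, and the sample <note> document are assumptions made for illustration; only the javax.xml.parsers and org.w3c.dom calls are standard.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical class name; only the JAXP and org.w3c.dom calls are standard API.
class NodeKindsSketch {

    // Parses the XML text and returns one line per node: indentation, kind, node name.
    static String describe(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            walk(doc, 0, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    private static void walk(Node node, int depth, StringBuilder out) {
        line(out, depth, kind(node), node.getNodeName());
        // Attributes are not children; they are reached through getAttributes(),
        // which returns null for every node that is not an element.
        NamedNodeMap attrs = node.getAttributes();
        if (attrs != null) {
            for (int i = 0; i < attrs.getLength(); i++) {
                line(out, depth + 1, kind(attrs.item(i)), attrs.item(i).getNodeName());
            }
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            walk(c, depth + 1, out);
        }
    }

    private static void line(StringBuilder out, int depth, String kind, String name) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(kind).append(' ').append(name).append('\n');
    }

    private static String kind(Node node) {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:  return "document";
            case Node.ELEMENT_NODE:   return "element";
            case Node.ATTRIBUTE_NODE: return "attribute";
            case Node.TEXT_NODE:      return "text";
            case Node.COMMENT_NODE:   return "comment";
            default:                  return "other";
        }
    }

    public static void main(String[] args) {
        // Sample document invented for this sketch.
        System.out.print(describe("<note id=\"n1\"><!--draft-->Hello</note>"));
    }
}
```

For the sample input, the walk reports a document node, an element node with one attribute node, a comment node, and a text node, matching the list above.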
Generic Form
XML Document -> Parser -> DOM -> Application Programme
Parent, Children and Siblings
The nodes in the node tree have a hierarchical
relationship to each other.
The terms parent, child, and sibling are used to describe
the relationships.
In a node tree,
The top node is called the root
Every node, except the root, has exactly one parent
node
A node can have any number of children
A leaf is a node with no children
Siblings are nodes with the same parent
XML Document
Information to be represented in the DOM
structure.
<?xml version="1.0" encoding="UTF-8"?>
<entry id="Baker2005">
<author>Mark Baker and Amy W. Apon and Clayton Ferner and Jeff
Brown</author>
<title>Emerging Grid Standards</title>
<journal>IEEE Computer</journal>
<year>2005</year>
<volume>38</volume>
<pages>43-50</pages>
<number>4</number>
</entry>
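As a rough illustration of how an application might read this entry through the DOM, the sketch below parses the same XML from a string and pulls out the id attribute and a few child elements. The class name CitationDomSketch and the helpers summarize and text are hypothetical, and each element is assumed to contain a single text node.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical class; the entry is the citation document shown above.
class CitationDomSketch {

    static final String XML =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<entry id=\"Baker2005\">"
            + "<author>Mark Baker and Amy W. Apon and Clayton Ferner and Jeff Brown</author>"
            + "<title>Emerging Grid Standards</title>"
            + "<journal>IEEE Computer</journal>"
            + "<year>2005</year>"
            + "<volume>38</volume>"
            + "<pages>43-50</pages>"
            + "<number>4</number>"
            + "</entry>";

    // Builds a one-line summary from the attribute and three child elements.
    static String summarize(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element entry = doc.getDocumentElement();   // the root element <entry>
            return entry.getAttribute("id") + ": " + text(entry, "title")
                    + " (" + text(entry, "journal") + ", " + text(entry, "year") + ")";
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Text of the first descendant element with the given tag name.
    // Assumes the element exists and holds exactly one text node.
    private static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getFirstChild().getNodeValue();
    }

    public static void main(String[] args) {
        System.out.println(summarize(XML));
    }
}
```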
Information To Be Represented
Document <citation.xml>
  Root Element <entry>
    Attributes: id
    <author>
    <title>
    <journal>
    <year>
    <volume>
    <etc>
DOM Example
<?xml version="1.0"?>
<addressbook>
  <person>
    <name>John Doe</name>
    <email>jdoe@yahoo.com</email>
  </person>
  <person>
    <name>Jane Doe</name>
    <email>jdoe@mail.com</email>
  </person>
</addressbook>

The XML parser turns this document into the following node tree:

Node: addressbook
  Node: person
    Node: name = John Doe
    Node: email = jdoe@yahoo.com
  Node: person
    Node: name = Jane Doe
    Node: email = jdoe@mail.com
DOM Representation
<Parent>
  <Child id="123">Text here</Child>
</Parent>

Document Node (document root)
  NodeList
    Element Node <Parent>
      NodeList
        Element Node <Child>
          NamedNodeMap
            Attribute Node id="123"
          NodeList
            Text/CDATA Node "Text here"
Common DOM Methods
Node.getNodeType() - the type of the
underlying object, e.g.
Node.ELEMENT_NODE.
Node.getNodeName() - name of this node,
depending on its type, e.g. for elements the
tag name, for text nodes always the string
#text.
Node.getFirstChild() and
Node.getLastChild() - the first or last child
of a given node.
Node.getNextSibling() and
Node.getPreviousSibling() - the next or
previous sibling of a given node.
Node.getAttributes() - collection (a NamedNodeMap)
of the attributes of this node if it is an
element, null otherwise.
Common DOM methods (2)
Node.getNodeValue() - value of this node,
depending on its type, e.g. value of an
attribute but null in case of an element node.
Node.getChildNodes() - collection that
contains all children of this node.
Node.getParentNode() - parent of this node.
Element.getAttribute(name) - an attribute
value by name.
Element.getTagName() - name of the
element.
Element.getElementsByTagName() - collection
of all descendant Elements with a given tag name.
Common DOM methods (3)
Element.setAttribute(name, value) - adds
a new attribute; if an attribute with that
name is already present in the element, its
value is changed.
Attr.getValue() - the value of the
attribute.
Attr.getName() - the name of this attribute.
Document.getDocumentElement() - allows
direct access to the child node that is the
root element of the document.
Document.createElement(tagName) - creates an element of the type specified.
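Several of the methods listed above can be combined to build a small document from scratch. This is a minimal sketch under stated assumptions: the class name DomBuildSketch, its methods, and the id values p1 and p2 are invented for illustration, while the DOM calls themselves are the standard org.w3c.dom API.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical class; builds a one-person address book purely in memory.
class DomBuildSketch {

    static Document build() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element root = doc.createElement("addressbook");
            doc.appendChild(root);

            Element person = doc.createElement("person");
            person.setAttribute("id", "p1");   // adds the attribute
            person.setAttribute("id", "p2");   // same name: the value is changed, not duplicated

            Element name = doc.createElement("name");
            name.appendChild(doc.createTextNode("John Doe"));
            person.appendChild(name);
            root.appendChild(person);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
    }

    // Reads the tree back with the accessor methods from the slides.
    static String summary() {
        Element root = build().getDocumentElement();
        Element person = (Element) root.getFirstChild();
        Node name = person.getFirstChild();
        return root.getTagName() + "/" + person.getTagName()
                + "[id=" + person.getAttribute("id")
                + ", attrs=" + person.getAttributes().getLength() + "]/"
                + name.getNodeName() + "=" + name.getFirstChild().getNodeValue();
    }

    public static void main(String[] args) {
        System.out.println(summary());
    }
}
```

The second setAttribute call shows the replace-on-same-name behaviour: the element still has a single id attribute afterwards.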
Advantages & Disadvantages
Advantage:
(1) It is good when random access to widely
separated parts of a document is required
(2) It supports both read and write operations
Disadvantage:
(1) It is memory inefficient
(2) It seems complicated
Simple API for XML (SAX)
Event-driven processing of XML documents.
The parser sends events to the programmer's code (start and
end of every component).
The programmer decides what to do with every event.
A SAX parser does not build an object model of the document;
it simply delivers events.
SAX features
SAX API acts like a data stream.
Stateless.
Events are not permanent.
Data not stored in memory.
Impossible to move backward in XML data.
Impossible to modify document structure.
Typically the fastest and least memory-intensive way of
working with XML.
Basic SAX events
startDocument - receives notification of
the beginning of a document.
endDocument - receives notification of the
end of a document.
startElement - gives the name of the tag
and any attributes it might have.
endElement - receives notification of the
end of an element.
characters - the parser calls this method to
report each chunk of character data.
Additional SAX events
ignorableWhitespace - lets the application react to
(or ignore) whitespace in element content.
warning - reports conditions that are not
errors or fatal errors as defined by the XML
1.0 recommendation, e.g. if an element is
defined twice in a DTD.
error - a non-fatal error that occurs when an
XML document fails a validity constraint.
fatalError - a non-recoverable error, e.g.
the violation of a well-formedness
constraint; the document is unusable after
the parser has invoked this method.
SAX events in a simple example
XML document                    SAX event
<?xml version="1.0"?>           startDocument()
<xmlExample>                    startElement(): xmlExample
  <heading>                     startElement(): heading
    This is a simple example.   characters(): This is a simple example.
  </heading>                    endElement(): heading
  That is all folks.            characters(): That is all folks.
</xmlExample>                   endElement(): xmlExample
                                endDocument()
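The event sequence for this simple example can be reproduced with a small handler that records each callback. This is a sketch only: the class name SaxTraceSketch is hypothetical, and because a parser may deliver character data in several chunks (including whitespace-only ones), the sketch buffers text and reports it once, trimmed, before the next element event.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that writes one line per SAX event.
class SaxTraceSketch extends DefaultHandler {
    private final StringBuilder trace = new StringBuilder();
    private final StringBuilder text = new StringBuilder();

    // Emits the buffered character data as a single characters() line.
    private void flushText() {
        String t = text.toString().trim();
        if (!t.isEmpty()) {
            trace.append("characters(): ").append(t).append('\n');
        }
        text.setLength(0);
    }

    @Override public void startDocument() { trace.append("startDocument()\n"); }

    @Override public void endDocument() { trace.append("endDocument()\n"); }

    @Override public void startElement(String uri, String localName, String qName, Attributes atts) {
        flushText();
        trace.append("startElement(): ").append(qName).append('\n');
    }

    @Override public void endElement(String uri, String localName, String qName) {
        flushText();
        trace.append("endElement(): ").append(qName).append('\n');
    }

    @Override public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);   // may be called several times per text run
    }

    static String trace(String xml) {
        SaxTraceSketch handler = new SaxTraceSketch();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return handler.trace.toString();
    }

    public static void main(String[] args) {
        System.out.print(trace("<xmlExample><heading>This is a simple example.</heading>"
                + "That is all folks.</xmlExample>"));
    }
}
```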
SAX2 Handlers Interfaces
ContentHandler - receives notification of
the logical content of a document
(startDocument, startElement,
characters etc.).
ErrorHandler - XML processing errors are
reported as events (warning, error,
fatalError) instead of being thrown as exceptions
(whether to throw is up to the programmer).
DTDHandler - receives notification of
basic DTD-related events, reports notation
and unparsed entity declarations.
EntityResolver - handles the external
entities.
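To illustrate the ErrorHandler idea, the sketch below implements the interface directly and registers it on an XMLReader. The class name ErrorLogSketch and the method reportsFatalError are hypothetical. Rethrowing from fatalError is one possible choice, not the only one, and the sketch assumes the parser may abort after a fatal error even if the handler returns normally.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

// Hypothetical handler that records every reported problem as a log line.
class ErrorLogSketch implements ErrorHandler {
    final List<String> log = new ArrayList<>();

    @Override public void warning(SAXParseException e) {
        log.add("warning: " + e.getMessage());
    }

    @Override public void error(SAXParseException e) {
        log.add("error: " + e.getMessage());      // validity problem: keep going
    }

    @Override public void fatalError(SAXParseException e) throws SAXException {
        log.add("fatalError: " + e.getMessage());
        throw e;                                  // programmer's decision: stop parsing here
    }

    // True if parsing the text triggered at least one fatalError event.
    static boolean reportsFatalError(String xml) {
        ErrorLogSketch handler = new ErrorLogSketch();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setErrorHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException stop) {
            // Parsing was aborted; the handler has already recorded why.
        } catch (Exception e) {
            throw new IllegalStateException("parser setup or I/O failed", e);
        }
        for (String entry : handler.log) {
            if (entry.startsWith("fatalError: ")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(reportsFatalError("<a><b></a>"));   // mismatched tags, not well-formed
        System.out.println(reportsFatalError("<a><b/></a>"));  // well-formed
    }
}
```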
DefaultHandler class
Class
org.xml.sax.helpers.DefaultHandler:
Implements all four handler interfaces with
default (mostly no-op) methods,
The programmer can derive a class from
DefaultHandler and pass an instance of it
to a parser,
The programmer can override only the methods
responsible for some events and ignore the
rest.
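A minimal sketch of this pattern follows: a subclass of DefaultHandler that overrides only startElement and leaves every other event to the inherited defaults. The class name ElementCounterSketch and the counting task are assumptions chosen for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: counts how often one tag name occurs.
class ElementCounterSketch extends DefaultHandler {
    private final String wanted;
    private int count;

    ElementCounterSketch(String wanted) {
        this.wanted = wanted;
    }

    // The only overridden callback; all other events use the inherited defaults.
    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (qName.equals(wanted)) {
            count++;
        }
    }

    static int count(String xml, String tag) {
        ElementCounterSketch handler = new ElementCounterSketch(tag);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return handler.count;
    }

    public static void main(String[] args) {
        String xml = "<addressbook><person><name>John Doe</name></person>"
                + "<person><name>Jane Doe</name></person></addressbook>";
        System.out.println(count(xml, "person"));
    }
}
```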
How Does SAX work?
XML Document                     SAX Objects
<?xml version="1.0"?>            Parser: startDocument
<addressbook>                    Parser: startElement
<person>                         Parser: startElement
<name>John Doe</name>            Parser: startElement, characters & endElement
<email>jdoe@yahoo.com</email>    Parser: startElement, characters & endElement
</person>                        Parser: endElement
<person>                         Parser: startElement
<name>Jane Doe</name>            Parser: startElement, characters & endElement
<email>jdoe@mail.com</email>     Parser: startElement, characters & endElement
</person>                        Parser: endElement
</addressbook>                   Parser: endElement & endDocument
Advantages & Disadvantages
Advantage:
(1) It is simple
(2) It is memory efficient
(3) It works well in stream applications
Disadvantage:
The data is broken into pieces and
clients never have all the information as
a whole unless they create their own
data structure
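The disadvantage above means a SAX client has to assemble whatever structure it needs. The sketch below shows one way this might look for the address book example: the handler buffers character data and builds a list of people as elements close. The class name AddressBookSaxSketch and the "name <email>" output format are assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that turns SAX events into its own list of people.
class AddressBookSaxSketch extends DefaultHandler {
    private final List<String> people = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private String name;
    private String email;

    @Override public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0);   // start collecting text for the new element
    }

    @Override public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override public void endElement(String uri, String localName, String qName) {
        switch (qName) {
            case "name":
                name = text.toString().trim();
                break;
            case "email":
                email = text.toString().trim();
                break;
            case "person":
                // The events are gone once delivered, so the record is stored here.
                people.add(name + " <" + email + ">");
                name = null;
                email = null;
                break;
            default:
                break;
        }
    }

    static String load(String xml) {
        AddressBookSaxSketch handler = new AddressBookSaxSketch();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return String.join("; ", handler.people);
    }

    public static void main(String[] args) {
        System.out.println(load("<addressbook>"
                + "<person><name>John Doe</name><email>jdoe@yahoo.com</email></person>"
                + "<person><name>Jane Doe</name><email>jdoe@mail.com</email></person>"
                + "</addressbook>"));
    }
}
```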
SAX vs. DOM
DOM
Provides more information about the
structure of the document,
Allows you to create or modify
documents.
SAX
Suited to cases where the information
in the document is needed only once,
Uses less memory.
SAX vs. DOM
SAX Parser:
A SAX (Simple API for XML) parser does not create any
internal structure. Instead, it treats the occurrences of
components of an input document as events, and tells the client
what it reads as it reads through the input document.
A SAX parser only ever serves the client application with
pieces of the document at any given time.
A SAX parser is, however, much more space efficient for
a big input document (because it creates no internal structure).
What's more, it typically runs faster and is easier to learn than a DOM
parser because its API is really simple. But from the functionality point
of view, it provides fewer functions, which means that the users
themselves have to take care of more, such as creating their own
data structures.
SAX vs. DOM
DOM Parser
A DOM (Document Object Model) parser creates a tree
structure in memory from an input document and then waits for
requests from the client.
A DOM parser always serves the client application with the
entire document, no matter how much is actually needed by the client.
A DOM parser is rich in functionality. It creates a DOM tree in
memory, allows you to access any part of the document repeatedly,
and allows you to modify the DOM tree. But it is space inefficient
when the document is huge, and it takes a little longer to learn
how to work with it.