
5. XML Processing

The document discusses two dominant XML parsing standards: DOM and SAX. DOM builds a tree representation of the entire XML document in memory before providing it to the application. SAX is event-based and passes XML elements to the application as they are parsed without building an in-memory tree representation, making it more memory efficient than DOM.

Uploaded by Nivethitha
Copyright © All Rights Reserved

XML and Web Services

XML DOM & SAX

XML Processing - Parser to Application Communication

Two dominant standards:

Document Object Model (DOM):
Tree-based model; passes the complete picture of the document to the application when processing concludes.
Descriptions for Java, JavaScript and IDL; a Perl implementation is in independent development.

Simple API for XML (SAX):
Event-based model; reports the document to application handlers as it is read.
Supported by nearly all Java XML parsers.

R. LOGAMBIGAI, TA

December 8, 2016

DOM (Document Object Model)
It is an object model for representing XML documents in your code.
Using DOM, we can create or modify an XML document programmatically.
The DOM defines the objects and properties of all document elements, and the methods (interface) to access them.
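A minimal sketch of creating and then modifying a document programmatically, assuming Java with the JAXP classes bundled in the JDK (`javax.xml.parsers`, `javax.xml.transform`, `org.w3c.dom`). The element names reuse the address-book example from later slides; the `id` attribute and the class name `DomBuildDemo` are illustrative only.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomBuildDemo {
    // Creates an empty Document and builds a small tree in memory
    static Document build() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();
        Element root = doc.createElement("addressbook");
        doc.appendChild(root);

        Element person = doc.createElement("person");
        person.setAttribute("id", "1");               // hypothetical attribute, for illustration
        root.appendChild(person);

        Element name = doc.createElement("name");
        name.appendChild(doc.createTextNode("John Doe"));
        person.appendChild(name);
        return doc;
    }

    // Writes the tree back out as text (identity transform, no XML declaration)
    static String serialize(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = build();
        // Modify the existing tree: change the text of the first <name>
        doc.getElementsByTagName("name").item(0).setTextContent("Jane Doe");
        System.out.println(serialize(doc));
    }
}
```

The same in-memory tree is used for both creation and modification, which is what distinguishes DOM from the read-only, forward-only SAX model described later.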


Parsing XML - The View from the Application

XML document (<?xml?>) → Parser, which:
Loads the document
Parses declarations
Builds the DTD
Interprets the document against the DTD
May validate
May build a DOM tree
May provide XSL or XLink services
→ Application
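The parser steps above can be approximated in Java with JAXP, assuming a document that carries an inline DTD (the DTD and the class name `ValidatingParseDemo` are hypothetical). Turning on `setValidating(true)` makes the parser check the document against its DTD, and `parse` builds the DOM tree; validity problems arrive at the registered `ErrorHandler` instead of stopping the parse.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class ValidatingParseDemo {
    // Hypothetical inline DTD: a <person> must contain <name> followed by <email>
    static final String DTD = "<?xml version=\"1.0\"?>"
            + "<!DOCTYPE person ["
            + "<!ELEMENT person (name, email)>"
            + "<!ELEMENT name (#PCDATA)>"
            + "<!ELEMENT email (#PCDATA)>"
            + "]>";

    // Loads, parses, validates against the DTD and builds a DOM tree;
    // returns the validity problems reported along the way
    static List<String> parseAndValidate(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);                  // "May validate"
        DocumentBuilder builder = factory.newDocumentBuilder();
        List<String> problems = new ArrayList<>();
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { problems.add("warning: " + e.getMessage()); }
            @Override public void error(SAXParseException e) { problems.add("error: " + e.getMessage()); }
            @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        builder.parse(new InputSource(new StringReader(xml)));   // "May build DOM tree"
        return problems;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parseAndValidate(DTD
                + "<person><name>John Doe</name><email>jdoe@yahoo.com</email></person>"));
        System.out.println(parseAndValidate(DTD + "<person><name>John Doe</name></person>"));
    }
}
```

The first call should report no problems; the second is well-formed but lacks the `<email>` the DTD requires, so it should produce at least one `error:` entry.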

DOM Parts
Core DOM - standard model for any structured document
XML DOM - standard model for XML documents
HTML DOM - standard model for HTML documents

XML DOM
The XML DOM is:
A standard object model for XML
A standard programming interface for XML
Platform- and language-independent
A W3C standard
The XML DOM defines the objects and properties of all XML elements, and the methods (interface) to access them.

DOM Nodes

According to the DOM, everything in an XML document is a node.

The DOM says:
The entire document is a document node
Every XML element is an element node
The text in XML elements is held in text nodes
Every attribute is an attribute node
Comments are comment nodes
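To see these node kinds in practice, the Java sketch below (JAXP assumed; the class name `NodeKindsDemo` and the sample document are hypothetical) parses a tiny document and records each node's kind and `getNodeName()`. Attributes are listed separately because, in the DOM, they hang off their element in a `NamedNodeMap` rather than being children in the tree.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

class NodeKindsDemo {
    // Maps a DOM node-type constant to the node kinds listed on the slide
    static String kind(Node n) {
        switch (n.getNodeType()) {
            case Node.DOCUMENT_NODE:  return "document";
            case Node.ELEMENT_NODE:   return "element";
            case Node.TEXT_NODE:      return "text";
            case Node.ATTRIBUTE_NODE: return "attribute";
            case Node.COMMENT_NODE:   return "comment";
            default:                  return "other";
        }
    }

    // Depth-first walk recording "kind:nodeName"; attributes are listed right
    // after their element because they are not children in the DOM tree
    static void walk(Node n, StringBuilder out) {
        out.append(kind(n)).append(':').append(n.getNodeName()).append('\n');
        NamedNodeMap attrs = n.getAttributes();           // null unless n is an element
        if (attrs != null) {
            for (int i = 0; i < attrs.getLength(); i++) {
                Node a = attrs.item(i);
                out.append(kind(a)).append(':').append(a.getNodeName()).append('\n');
            }
        }
        for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
            walk(c, out);
        }
    }

    static String describe(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        StringBuilder out = new StringBuilder();
        walk(doc, out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(describe("<person id=\"1\"><!-- note --><name>John Doe</name></person>"));
    }
}
```

For the sample in `main`, the walk should list the document node, the `person` element, its `id` attribute, the comment, the `name` element and its text node, in that order.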

Generic Form

XML Document → Parser → DOM → Application Programme

Parent, Children and Siblings


The nodes in the node tree have a hierarchical
relationship to each other.
The terms parent, child, and sibling are used to describe
the relationships.
In a node tree,
The top node is called the root
Every node, except the root, has exactly one parent
node
A node can have any number of children
A leaf is a node with no children
Siblings are nodes with the same parent
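These relationships map directly onto DOM navigation calls. The sketch below assumes Java/JAXP and a compact address-book fragment with no whitespace between tags, so every child is an element or text node; `isLeaf`, `areSiblings` and the class name are made up for illustration. Note that in the DOM the topmost node is the document node, and the root element (`addressbook`) is its child.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class TreeRelationsDemo {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // A leaf is a node with no children
    static boolean isLeaf(Node n) {
        return !n.hasChildNodes();
    }

    // Siblings are distinct nodes that share the same parent
    static boolean areSiblings(Node a, Node b) {
        Node p = a.getParentNode();
        return a != b && p != null && p == b.getParentNode();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<addressbook><person><name>John Doe</name>"
                + "<email>jdoe@yahoo.com</email></person></addressbook>");
        Element root = doc.getDocumentElement();          // <addressbook>
        Node person = root.getFirstChild();
        Node name = person.getFirstChild();
        Node email = name.getNextSibling();

        System.out.println(doc.getParentNode() == null);    // the document node has no parent
        System.out.println(root.getParentNode() == doc);     // every other node has exactly one parent
        System.out.println(areSiblings(name, email));        // both are children of <person>
        System.out.println(isLeaf(name.getFirstChild()));    // the text node "John Doe" has no children
    }
}
```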

XML Document
Information to be represented in the DOM structure:
<?xml version="1.0" encoding="UTF-8"?>
<entry id="Baker2005">
  <author>Mark Baker and Amy W. Apon and Clayton Ferner and Jeff Brown</author>
  <title>Emerging Grid Standards</title>
  <journal>IEEE Computer</journal>
  <year>2005</year>
  <volume>38</volume>
  <pages>43-50</pages>
  <number>4</number>
</entry>

Information To Be Represented
Document: <citation.xml>
  Root element: <entry>
    Attribute: id
    Child elements: <author>, <title>, <journal>, <year>, <volume>, etc.
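Reading this structure back out of the DOM might look like the following Java sketch (JAXP assumed), which inlines the citation above instead of loading `citation.xml` from disk; the class and helper names are illustrative. It reads the root element's `id` attribute and a few child elements through `getElementsByTagName`.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class CitationDemo {
    // The citation entry from the slide, inlined as a string
    static final String XML = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<entry id=\"Baker2005\">"
            + "<author>Mark Baker and Amy W. Apon and Clayton Ferner and Jeff Brown</author>"
            + "<title>Emerging Grid Standards</title>"
            + "<journal>IEEE Computer</journal>"
            + "<year>2005</year><volume>38</volume><pages>43-50</pages><number>4</number>"
            + "</entry>";

    // Text content of the first descendant element with the given tag
    static String childText(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent();
    }

    // Reads the root element's id attribute plus a few child elements
    static Map<String, String> read(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element entry = doc.getDocumentElement();            // root element <entry>
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", entry.getAttribute("id"));
        for (String tag : new String[] {"author", "title", "journal", "year", "volume"}) {
            fields.put(tag, childText(entry, tag));
        }
        return fields;
    }

    public static void main(String[] args) throws Exception {
        read(XML).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```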

DOM Example
<?xml version="1.0"?>
<addressbook>
  <person>
    <name>John Doe</name>
    <email>jdoe@yahoo.com</email>
  </person>
  <person>
    <name>Jane Doe</name>
    <email>jdoe@mail.com</email>
  </person>
</addressbook>

After the XML parser runs, the resulting node tree is:
Node: addressbook
  Node: person
    Node: name = John Doe
    Node: email = jdoe@yahoo.com
  Node: person
    Node: name = Jane Doe
    Node: email = jdoe@mail.com
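A sketch of the same address book processed through a JAXP DOM parser in Java (the class name `AddressBookDom` is hypothetical). The indentation in the input produces whitespace-only text nodes, which `getElementsByTagName` simply skips because it returns elements only.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class AddressBookDom {
    static final String XML = "<?xml version=\"1.0\"?>\n"
            + "<addressbook>\n"
            + "  <person>\n    <name>John Doe</name>\n    <email>jdoe@yahoo.com</email>\n  </person>\n"
            + "  <person>\n    <name>Jane Doe</name>\n    <email>jdoe@mail.com</email>\n  </person>\n"
            + "</addressbook>\n";

    // Text of the first descendant element with the given tag
    static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent();
    }

    static List<String> contacts(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList people = doc.getElementsByTagName("person");   // both <person> nodes, in document order
        List<String> out = new ArrayList<>();
        for (int i = 0; i < people.getLength(); i++) {
            Element person = (Element) people.item(i);
            out.add(text(person, "name") + " <" + text(person, "email") + ">");
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        contacts(XML).forEach(System.out::println);
    }
}
```

Running `main` should print one line per person, `John Doe <jdoe@yahoo.com>` followed by `Jane Doe <jdoe@mail.com>`, because the whole tree is already in memory when the loop starts.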

DOM Representation
Document
Node

Document Root

<Parent>

NodeList

<Child id=123>Text here</Child>


Element
Node

<Parent>

</Parent>

NodeList
Element
Node

<Child>
NamedNodeMap
Attribute
Node

<id=123>

NodeList
Text CDATA
Node

13

R. LOGAMBIGAI, TA

Text here
December 8, 2016

Common DOM Methods

Node.getNodeType() - the type of the underlying object, e.g. Node.ELEMENT_NODE.
Node.getNodeName() - name of this node, depending on its type, e.g. for elements its tag name, for text nodes always the string #text.
Node.getFirstChild() and Node.getLastChild() - the first or last child of a given node.
Node.getNextSibling() and Node.getPreviousSibling() - the next or previous sibling of a given node.
Node.getAttributes() - collection containing the attributes of a given node.
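The navigation methods above can be exercised with a short Java sketch (JAXP assumed; the `type` attribute, the comment and the class name `NavigationDemo` are made up for illustration). Walking forward uses `getFirstChild`/`getNextSibling`, walking backward uses `getLastChild`/`getPreviousSibling`, and `getNodeType` filters out the non-element children.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

class NavigationDemo {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // getFirstChild() + getNextSibling(): children from first to last
    static List<String> forward(Node parent) {
        List<String> names = new ArrayList<>();
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            names.add(c.getNodeName());   // tag name for elements, "#comment" or "#text" otherwise
        }
        return names;
    }

    // getLastChild() + getPreviousSibling(): children from last to first
    static List<String> backward(Node parent) {
        List<String> names = new ArrayList<>();
        for (Node c = parent.getLastChild(); c != null; c = c.getPreviousSibling()) {
            names.add(c.getNodeName());
        }
        return names;
    }

    // getNodeType() separates element children from comment and text children
    static int countElements(Node parent) {
        int n = 0;
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) n++;
        }
        return n;
    }

    public static void main(String[] args) throws Exception {
        Node person = parse("<person id=\"1\" type=\"friend\"><name>John Doe</name>"
                + "<!-- work address --><email>jdoe@yahoo.com</email></person>").getDocumentElement();
        System.out.println(forward(person));
        System.out.println(backward(person));
        System.out.println(countElements(person));
        System.out.println(person.getAttributes().getLength());   // attributes as a NamedNodeMap
    }
}
```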

Common DOM methods (2)

Node.getNodeValue() - value of this node, depending on its type, e.g. the value of an attribute, but null in the case of an element node.
Node.getChildNodes() - collection that contains all children of this node.
Node.getParentNode() - parent of this node.
Element.getAttribute(name) - an attribute value by name.
Element.getTagName() - name of the element.
Element.getElementsByTagName() - collection of all descendant Elements with a given tag name.
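A small Java sketch of these accessors, assuming JAXP and a hypothetical `id` attribute on `person` (the class name is also illustrative). It shows that an element's own `getNodeValue()` is null while its text child carries the characters, and that `getAttribute` on a missing attribute returns an empty string rather than null.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class NodeValueDemo {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Summarizes the first element with the given tag using the accessors on the slide
    static String describe(Document doc, String tag) {
        Element e = (Element) doc.getElementsByTagName(tag).item(0);
        Node firstChild = e.getChildNodes().item(0);      // here, the element's text node
        return e.getTagName()
                + " nodeValue=" + e.getNodeValue()        // null for element nodes
                + " text=" + firstChild.getNodeValue()    // a text node's value is its characters
                + " parent=" + e.getParentNode().getNodeName();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<person id=\"7\"><name>John Doe</name></person>");
        System.out.println(describe(doc, "name"));
        Element person = doc.getDocumentElement();
        System.out.println(person.getAttribute("id"));                     // value by name
        System.out.println(person.getAttributeNode("id").getNodeValue());  // same value via the Attr node
        System.out.println(person.getAttribute("missing").isEmpty());      // absent attribute gives ""
    }
}
```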

Common DOM methods (3)

Element.setAttribute(name, value) - adds a new attribute; if an attribute with that name is already present in the element, its value is changed.
Attr.getValue() - the value of the attribute.
Attr.getName() - the name of this attribute.
Document.getDocumentElement() - allows direct access to the child node that is the root element of the document.
Document.createElement(tagName) - creates an element of the type specified.
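The methods above can be combined into a short modification sketch in Java (JAXP assumed; the `id` values and the class name `ModifyDemo` are illustrative). Calling `setAttribute` twice with the same name leaves a single attribute, and an element made by `createElement` only becomes part of the tree once it is inserted, here with `appendChild`.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class ModifyDemo {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    static Element update(Document doc) {
        Element root = doc.getDocumentElement();        // direct access to the root element
        root.setAttribute("id", "1");                   // adds a new attribute
        root.setAttribute("id", "2");                   // same name: the value is replaced
        Element email = doc.createElement("email");     // created, but not yet in the tree
        email.setTextContent("jdoe@yahoo.com");
        root.appendChild(email);                        // attached as the last child of the root
        return root;
    }

    public static void main(String[] args) throws Exception {
        Element root = update(parse("<person><name>John Doe</name></person>"));
        Attr id = root.getAttributeNode("id");
        System.out.println(id.getName() + "=" + id.getValue());   // Attr.getName() and Attr.getValue()
        System.out.println(root.getAttributes().getLength());
        System.out.println(root.getLastChild().getNodeName());
    }
}
```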

Advantages & Disadvantages

Advantages:
(1) It is good when random access to widely separated parts of a document is required.
(2) It supports both read and write operations.
Disadvantages:
(1) It is memory inefficient.
(2) It seems complicated, although

Simple API for XML (SAX)

Event-driven processing of XML documents.
The parser sends events to the programmer's code (start and end of every component).
The programmer decides what to do with every event.
A SAX parser does not create any objects representing the document; it simply delivers events.

SAX features
SAX API acts like a data stream.
Stateless.
Events are not permanent.
Data not stored in memory.
Impossible to move backward in XML data.
Impossible to modify document structure.
Fastest and least memory-intensive way of working with XML.

Basic SAX events

startDocument - receives notification of the beginning of a document.
endDocument - receives notification of the end of a document.
startElement - gives the name of the tag and any attributes it might have.
endElement - receives notification of the end of an element.
characters - the parser calls this method to report each chunk of character data.

Additional SAX events

ignorableWhitespace - allows the application to react to (usually ignore) whitespace in element content.
warning - reports conditions that are not errors or fatal errors as defined by the XML 1.0 recommendation, e.g. if an attribute is declared more than once in a DTD.
error - a non-fatal error, e.g. when an XML document fails a validity constraint.
fatalError - a non-recoverable error, e.g. the violation of a well-formedness constraint; the document is unusable after the parser has invoked this method.
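A minimal Java sketch of receiving these error events through a `DefaultHandler` subclass, assuming the JDK's built-in SAX parser (the class name `ErrorEventsDemo` and the sample documents are hypothetical). A validity error reaches `error()` and parsing continues, while a well-formedness error reaches `fatalError()`, which here rethrows and ends the parse.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class ErrorEventsDemo extends DefaultHandler {
    final List<String> log = new ArrayList<>();

    @Override public void warning(SAXParseException e) { log.add("warning"); }
    @Override public void error(SAXParseException e) { log.add("error"); }   // e.g. validity constraint failed

    @Override public void fatalError(SAXParseException e) throws SAXException {
        log.add("fatalError");
        throw e;                                   // the document is unusable from here on
    }

    static List<String> run(String xml, boolean validating) throws Exception {
        ErrorEventsDemo handler = new ErrorEventsDemo();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(validating);
        try {
            factory.newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (SAXException e) {
            handler.log.add("aborted");            // parse stopped by the rethrown fatal error
        }
        return handler.log;
    }

    public static void main(String[] args) throws Exception {
        String dtd = "<!DOCTYPE person [<!ELEMENT person (name)><!ELEMENT name (#PCDATA)>]>";
        System.out.println(run(dtd + "<person><email/></person>", true));   // invalid but well-formed
        System.out.println(run("<person><name></person>", false));          // not well-formed
    }
}
```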

SAX events in a simple example

<?xml version="1.0"?> → startDocument()
<xmlExample> → startElement(): xmlExample
<heading> → startElement(): heading
This is a simple example. → characters(): This is a simple example.
</heading> → endElement(): heading
That is all folks. → characters(): That is all folks.
</xmlExample> → endElement(): xmlExample, then endDocument()
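The trace above can be reproduced with a small Java `DefaultHandler` that records each callback (JAXP assumed; the class name `EventTrace` is hypothetical). Because a parser may split text across several `characters()` calls, the sketch buffers character data and reports it as one entry before the next element event. The input omits the slide's line breaks so that no extra whitespace-only `characters()` entries appear.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class EventTrace extends DefaultHandler {
    final List<String> events = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();   // characters() may arrive in chunks

    // Emits any buffered character data as a single characters() entry
    private void flushText() {
        if (text.length() > 0) {
            events.add("characters(): " + text);
            text.setLength(0);
        }
    }

    @Override public void startDocument() { events.add("startDocument()"); }
    @Override public void endDocument() { flushText(); events.add("endDocument()"); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flushText();
        events.add("startElement(): " + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flushText();
        events.add("endElement(): " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    static List<String> trace(String xml) throws Exception {
        EventTrace handler = new EventTrace();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?><xmlExample><heading>This is a simple example."
                + "</heading>That is all folks.</xmlExample>";
        trace(xml).forEach(System.out::println);
    }
}
```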

SAX2 Handler Interfaces

ContentHandler - receives notification of the logical content of a document (startDocument, startElement, characters, etc.).
ErrorHandler - receives XML processing errors as events (warning, error, fatalError) instead of the parser throwing an exception; whether to throw is up to the programmer.
DTDHandler - receives notification of basic DTD-related events; reports notation and unparsed entity declarations.
EntityResolver - handles the resolution of external entities.

DefaultHandler class
Class org.xml.sax.helpers.DefaultHandler:
Implements all four handler interfaces with empty (do-nothing) methods; the exception is fatalError, which throws the reported exception by default.
The programmer can derive his own class from DefaultHandler and pass an instance of it to a parser.
The programmer can override only the methods responsible for the events of interest and ignore the rest.
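For example, a handler that only needs element names can override `startElement` alone and inherit the empty implementations for everything else. A Java sketch, assuming the JDK's SAX parser; `ElementCounter` is a made-up name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class ElementCounter extends DefaultHandler {
    final Map<String, Integer> counts = new TreeMap<>();

    // The only override: every other callback keeps DefaultHandler's default behaviour
    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        counts.merge(qName, 1, Integer::sum);
    }

    static Map<String, Integer> count(String xml) throws Exception {
        ElementCounter handler = new ElementCounter();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.counts;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<addressbook>"
                + "<person><name>John Doe</name><email>jdoe@yahoo.com</email></person>"
                + "<person><name>Jane Doe</name><email>jdoe@mail.com</email></person>"
                + "</addressbook>";
        System.out.println(count(xml));
    }
}
```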

How Does SAX work?

XML document → events the parser sends to the SAX objects:
<?xml version="1.0"?> → startDocument
<addressbook> → startElement
<person> → startElement
<name>John Doe</name> → startElement, characters, endElement
<email>jdoe@yahoo.com</email> → startElement, characters, endElement
</person> → endElement
<person> → startElement
<name>Jane Doe</name> → startElement, characters, endElement
<email>jdoe@mail.com</email> → startElement, characters, endElement
</person> → endElement
</addressbook> → endElement, endDocument

Advantages & Disadvantages

Advantages:
(1) It is simple.
(2) It is memory efficient.
(3) It works well in streaming applications.
Disadvantage:
The data is broken into pieces, and clients never have all the information as a whole unless they create their own data structure.
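One way to handle this disadvantage is for the application to assemble its own objects as events arrive. The Java sketch below (JAXP assumed; `AddressBookSax` and its `Contact` class are hypothetical) rebuilds the address book from the earlier slides into a list, buffering character data because `characters()` may be called in several chunks.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class AddressBookSax extends DefaultHandler {
    // Application-defined structure: SAX keeps nothing once an event is delivered
    static final class Contact {
        String name = "";
        String email = "";
        @Override public String toString() { return name + " <" + email + ">"; }
    }

    final List<Contact> contacts = new ArrayList<>();
    private Contact current;                                  // <person> being assembled, if any
    private final StringBuilder text = new StringBuilder();   // character data of the current element

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("person")) current = new Contact();
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (current == null) return;                          // outside any <person>
        switch (qName) {
            case "name":   current.name = text.toString().trim(); break;
            case "email":  current.email = text.toString().trim(); break;
            case "person": contacts.add(current); current = null; break;
            default:       break;
        }
    }

    static List<Contact> read(String xml) throws Exception {
        AddressBookSax handler = new AddressBookSax();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.contacts;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?>\n<addressbook>\n"
                + "  <person>\n    <name>John Doe</name>\n    <email>jdoe@yahoo.com</email>\n  </person>\n"
                + "  <person>\n    <name>Jane Doe</name>\n    <email>jdoe@mail.com</email>\n  </person>\n"
                + "</addressbook>\n";
        read(xml).forEach(System.out::println);
    }
}
```

Unlike the DOM version, only the fields the application cares about are kept, which is where SAX's memory advantage comes from.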

SAX vs. DOM

DOM:
Gives more information about the structure of the document.
Allows you to create or modify documents.

SAX:
Suited to cases where you need to use the information in the document only once.
Uses less memory.

SAX vs. DOM

SAX Parser:
A SAX (Simple API for XML) parser does not create any internal structure. Instead, it treats the occurrences of components of an input document as events, and tells the client what it reads as it reads through the input document.
A SAX parser always serves the client application with only pieces of the document at any given time.
A SAX parser is, however, much more space-efficient for a big input document, because it creates no internal structure. What's more, it runs faster and is easier to learn than a DOM parser because its API is really simple. From the functionality point of view, however, it provides fewer functions, which means that users have to take care of more themselves, such as creating their own data structures.

SAX vs. DOM

DOM Parser:
A DOM (Document Object Model) parser creates a tree structure in memory from an input document and then waits for requests from the client.
A DOM parser always serves the client application with the entire document, no matter how much of it the client actually needs.
A DOM parser is rich in functionality. It creates a DOM tree in memory, allows you to access any part of the document repeatedly, and allows you to modify the DOM tree. But it is space-inefficient when the document is huge, and it takes a little longer to learn how to work with it.
