When To Store Data As XML ?
Philip Nelson
Lloyds Banking Group PLC
Agenda
Why Consider Storing As XML
Where XML Started Out
XML : Data In Transit Data At Rest
Experiences Before pureXML
Life After pureXML
Why Consider Storing as XML
Whats Wrong With Relational ?
Is XML The Answer ?
Lets consider the real world
Is Relational Wrong ?
NO
Still best for
Relatively simple and uniform structures
Non-evolving data structures
Typical data best in relational format -
Financial transactions
Reference data (e.g. actuarial rates)
Why Is Relational Ubiquitous ?
Traditional large scale business computing
Largely automating simple financial processes
Availability Of RDBMS Software
Many products / many platforms
Standards (e.g. SQL) help in update
Focus Of Education
Application development : SQL
Database design targets relational structures
Traditional (DB) World View
The World Is Not All Tabular
Only small percentage is naturally tabular
Much is unstructured
More is semi-structured
Some is hierarchical
When to Use XML
Volatile Data Structures
Naturally Hierarchical Data
Complete Business Objects
Sparse Attributes
Information Interchange
Volatile Data Structures
When database definitions change frequently
Not an excuse for poor database design
Often driven by external factors
Regulatory changes
Regular new product releases
Naturally Hierarchical Data
Some data in naturally hierarchical
Examples
Organisational structures
Bill of Materials
Often using relational tables for these
structures results in complex programming
and poor performance
Complete Business Objects
Collection of data used as a whole
Often represented as objects in application
Must be decomposed into and constructed
from many relational tables
Often this process is as expensive as the
real work done with the data
Sparse Attributes
In relational terms nullable columns
High percentage of null instances
Wasted storage (less with compression)
Unnecessary processing
Can't see the wood for the trees
Especially in BI contexts
Information Interchange
Traditional XML use case
But consider storing received data intact
Also store transmitted data intact
Useful as an audit trail
XMLs Heritage
SGML : dynamic information display (1980s)
W3C interest : solving internet problems
XML : first draft (1996)
XML : W3C recommendation (1998)
XML : First Uses
Information interchange
Became seen as a means of representing data in
transit
Many industry-specific dialects / schemas
produced to enable B2B processes
Web standards
Basis of, among other things, web services
Storing XML
Two initial approaches
Storage as LOBs
Every access requires parsing the textual format
Updates are full document replacements
Shredding to relational tables
XML is parsed and mapped to relational columns
Even simple XMLs need many tables
Result : large overheads !!!
Before pureXML : LOBs
Web services audit log
Key elements extracted before LOB storage
Only practically able to search on these few
elements
Useful information locked up in the LOB
eventually unlocked by moving to pureXML
Before pureXML : Shredding
Application Form from POS System
Shredded to 80 tables
Complex mapping documents
Shred time limits system throughput
With pureXML
Faster storage
Direct reads
Direct updates
Easier shredding (XMLTABLE)
Easier XML construction
PureXML : Faster Storage
Storing into XML column performs well
Even with validation as well as parsing
Faster than previous LOB storage solution
Even if some overhead
Massive benefits
Especially if high read to write ratio
PureXML : Direct Reads
Opens the XML content to full query access
In one case we were able to unlock the BI
potential of our ESB audit logs
Turning data exhaust into business benefit
PureXML : Direct Updates
XQuery update is different
But well worth learning
Key to treating XML as a first class citizen
Very efficient
PureXML : Easier Shredding
Often still need to get data into table format
Joining to existing relational data
Populating legacy systems
Presentation to users
XMLTABLE is king in this field
Easy to learn
Some basic concepts (e.g. setting the right
granularity) critical to success
Replaced old mapping document schemes
PureXML : Easier Construction
Old mapping document schemes for
construction replaced
SQL/XML construction functions are -
Standard
Easier to learn and maintain
Often matched with XQuery updates to -
Produce control totals and checksums
Perform sync with previous documents in series
Horses for Courses
In conclusion
Native XML storage is now an option to be
considered (we have the technology)
Not for everything : many data storage
requirements are better suitable to relational
Learn to know the signs of when XML is better
Other technologies (e.g. Big Data) will introduce
yet other data storage formats
The great thing is you can mix and match
Philip Nelson
(teamba@scotdb.com)