Unit IV XML Databases Adt
Structured data:
Structured data is data whose elements are addressable for effective analysis.
It is organized into a formatted repository, typically a database, and covers all
data that can be stored in an SQL database in tables with rows and columns. Such
data has relational keys and can easily be mapped into pre-designed fields. Today,
structured data is the most widely processed kind of data and the simplest to
manage. Example: Relational data.
Semi-Structured data:
Semi-structured data is information that does not reside in a relational
database but has some organizational properties that make it easier to analyze.
With some processing it can be stored in a relational database (although this can
be very hard for some kinds of semi-structured data), but the semi-structured form
exists to save space.
Example: XML data.
Unstructured data:
Unstructured data is data that is not organized in a predefined manner or
does not have a predefined data model, so it is not a good fit for a mainstream
relational database. Alternative platforms exist for storing and managing
unstructured data. It is increasingly prevalent in IT systems and is used by
organizations in a variety of business intelligence and analytics applications.
Example: Word, PDF, Text, Media logs.
XML Attributes
XML elements can have attributes. Attributes add information about the
element.
<book publisher='Tata McGraw Hill'></book>
Metadata should be stored as attributes and data should be stored as elements.
<book category="computer">
<author> A &amp; B </author>
</book>
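The split between metadata (attribute) and data (element) can be seen from a program's point of view in the following minimal Java sketch. It uses the DOM parser bundled with the JDK; the class name BookAttributeDemo and its helper methods are made up for this illustration, and the sample assumes the ampersand in the author text is escaped as &amp;, as a well-formed document requires.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class: reads metadata from an attribute and data from a child element.
class BookAttributeDemo {

    // Parses the XML text and returns its root element.
    static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            // Parser configuration, I/O and well-formedness errors all end up here.
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Metadata: the value of the category attribute of <book>.
    static String category(String xml) {
        return root(xml).getAttribute("category");
    }

    // Data: the text of the <author> child element, with surrounding spaces trimmed.
    static String author(String xml) {
        return root(xml).getElementsByTagName("author").item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String xml = "<book category=\"computer\"><author> A &amp; B </author></book>";
        System.out.println("category: " + category(xml));
        System.out.println("author:   " + author(xml)); // the parser turns &amp; back into &
    }
}
```

The attribute is read with getAttribute, while the element content is read as text, which mirrors the rule that metadata goes into attributes and data into elements.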
XML Comments
XML comments are written just like HTML comments. Comments make the
markup more understandable to other developers.
An XML comment is written as:
<!-- Write your comment -->
XML Validation
A well-formed XML document can be validated against a DTD or an XML Schema. A
well-formed XML document is an XML document with correct syntax. It helps to
know what a valid XML document is before looking at XML validation.
Valid XML document
● It must be well formed (satisfy all the basic syntax conditions).
● It must conform to a predefined DTD or XML Schema.
Description of DTD (for the employee example in this unit):
● <!DOCTYPE employee: It declares that the root element of the
document is employee.
● <!ELEMENT employee: It declares that the employee element contains 3
child elements, "firstname", "lastname" and "email", in that order.
● <!ELEMENT firstname: It declares that the firstname element contains #PCDATA
(parsed character data).
● <!ELEMENT lastname: It declares that the lastname element contains #PCDATA
(parsed character data).
● <!ELEMENT email: It declares that the email element contains #PCDATA
(parsed character data).
XML DTD
A DTD defines the legal elements of an XML document. In simple words, a DTD
defines the document structure with a list of legal elements and
attributes. XML Schema is an XML-based alternative to DTD. Both DTD and XML
Schema are used to check that a well-formed XML document is also valid. Errors
in XML documents should be avoided because a well-formedness error stops the
XML parser.
XML schema
XML Schema is itself written in XML. It uses namespaces to allow reuse of
existing definitions. It supports a large number of built-in data types and the
definition of derived data types.
Valid and well-formed XML document with External DTD
Let's take an example of a well-formed and valid XML document. It
follows all the rules of the DTD.
employee.xml
<?xml version="1.0"?>
<!DOCTYPE employee SYSTEM "employee.dtd">
<employee>
<firstname>vimal</firstname>
<lastname>jaiswal</lastname>
<email>vimal@javatpoint.com</email>
</employee>
In the above example, the DOCTYPE declaration refers to an external DTD
file. The content of the file is shown below.
employee.dtd
<!ELEMENT employee (firstname,lastname,email)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT email (#PCDATA)>
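As a rough illustration of what "valid against a DTD" means in practice, the following Java sketch turns on DTD validation in the JDK's DOM parser. To stay self-contained it embeds the rules of employee.dtd as an internal DTD subset instead of reading the external file; the class name DtdValidationDemo and the method isValid are assumptions made for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: checks documents against the employee DTD rules.
class DtdValidationDemo {

    // The rules of employee.dtd, embedded as an internal subset so that no files are needed.
    static final String DOCTYPE =
          "<!DOCTYPE employee [\n"
        + "  <!ELEMENT employee (firstname,lastname,email)>\n"
        + "  <!ELEMENT firstname (#PCDATA)>\n"
        + "  <!ELEMENT lastname (#PCDATA)>\n"
        + "  <!ELEMENT email (#PCDATA)>\n"
        + "]>\n";

    // Returns true when the body is well formed and valid against the DTD above.
    static boolean isValid(String body) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // switch on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXParseException {
                    throw e; // validity errors are ignored by DefaultHandler; make them fail
                }
            });
            builder.parse(new InputSource(new StringReader(DOCTYPE + body)));
            return true;
        } catch (SAXException e) {
            return false; // not well formed, or not valid
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("parser setup failed", e);
        }
    }

    public static void main(String[] args) {
        String valid = "<employee><firstname>vimal</firstname><lastname>jaiswal</lastname>"
                     + "<email>vimal@javatpoint.com</email></employee>";
        String wrongOrder = "<employee><lastname>jaiswal</lastname><firstname>vimal</firstname>"
                     + "<email>vimal@javatpoint.com</email></employee>";
        System.out.println("valid document: " + isValid(valid));
        System.out.println("wrong order:    " + isValid(wrongOrder));
    }
}
```

The second document is still well formed, but it breaks the content model (firstname,lastname,email), so it is reported as not valid.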
XML CSS
Purpose of CSS in XML
CSS (Cascading Style Sheets) can be used to add style and display
information to an XML document. It can format the whole XML document.
How to link XML file with CSS
To link XML files with CSS, you should use the following syntax:
<?xml-stylesheet type="text/css" href="cssemployee.css"?>
employee.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="cssemployee.css"?>
<!DOCTYPE employee SYSTEM "employee.dtd">
<employee>
<firstname>vimal</firstname>
<lastname>jaiswal</lastname>
<email>vimal@javatpoint.com</email>
</employee>
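CDATA
CDATA (Unparsed Character Data): text inside a CDATA section, written as
<![CDATA[ ... ]]>, is not parsed by the parser. If a CDATA section is placed just
after the opening tag of the employee element, around its content, the tags inside
it are treated as plain text rather than markup, so the value of employee is: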
<firstname>vimal</firstname>
<lastname>jaiswal</lastname>
<email>vimal@javatpoint.com</email>
PCDATA
PCDATA (Parsed Character Data): an XML parser parses all the text in
an XML document. PCDATA is the text that will be parsed by the parser. Tags
inside PCDATA are treated as markup and entities are expanded. In other words,
the XML parser examines the data and, where it contains an entity reference,
replaces the reference with the entity's value.
Let's take an example:
<?xml version="1.0"?>
<!DOCTYPE employee SYSTEM "employee.dtd">
<employee>
<firstname>vimal</firstname>
<lastname>jaiswal</lastname>
<email>vimal@javatpoint.com</email>
</employee>
In the above example, the employee element contains 3 more elements,
'firstname', 'lastname', and 'email', so the parser goes on to parse them and get
the text of firstname, lastname and email, giving the value of employee as: vimal
jaiswal vimal@javatpoint.com
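The difference between parsed and unparsed character data can be checked with a short Java sketch. It parses the same content twice, once as ordinary PCDATA and once wrapped in a CDATA section, and reports how many child elements the parser found. The class name CharacterDataDemo and its helpers are hypothetical, and the documents are shortened versions of the employee example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class: compares how PCDATA and CDATA content is seen after parsing.
class CharacterDataDemo {

    // Parses the XML text and returns its root element.
    static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Number of child elements the parser recognised directly under the root.
    static int childElements(String xml) {
        NodeList children = root(xml).getChildNodes();
        int count = 0;
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                count++;
            }
        }
        return count;
    }

    // Text value of the root element as an application sees it.
    static String value(String xml) {
        return root(xml).getTextContent();
    }

    public static void main(String[] args) {
        String parsed = "<employee><firstname>vimal</firstname></employee>";
        String unparsed = "<employee><![CDATA[<firstname>vimal</firstname>]]></employee>";
        // PCDATA: <firstname> is markup, so there is one child element.
        System.out.println(childElements(parsed) + " child element(s), value: " + value(parsed));
        // CDATA: <firstname> is plain text, so there are no child elements.
        System.out.println(childElements(unparsed) + " child element(s), value: " + value(unparsed));
    }
}
```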
XML Schema:
An XML schema language is used for expressing constraints on XML
documents. Several schema languages are in use today, for
example RELAX NG and XSD (XML Schema Definition). An XML schema is used to
define the structure of an XML document. It is like a DTD but provides more control
over the XML structure.
Example:
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.javatpoint.com" xmlns="http://www.javatpoint.com"
elementFormDefault="qualified">
<xs:element name="employee">
<xs:complexType>
<xs:sequence>
<xs:element name="firstname" type="xs:string"/>
<xs:element name="lastname" type="xs:string"/>
<xs:element name="email" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
Description of XML Schema:
<xs:element name="employee"> : It defines the element name employee.
<xs:complexType> : It defines that the element 'employee' is complex type.
<xs:sequence> : It defines that the complex type is a sequence of elements.
<xs:element name="firstname" type="xs:string"/> : It defines that the element
'firstname' is of string/text type.
<xs:element name="lastname" type="xs:string"/> : It defines that the element
'lastname' is of string/text type.
<xs:element name="email" type="xs:string"/> : It defines that the element 'email' is of
string/text type.
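One way to try this schema out is with the javax.xml.validation API that ships with the JDK. The sketch below keeps the schema in a string (using single quotes for attributes to keep the Java readable) and checks a few candidate documents; the class name XsdValidationDemo and the method isValid are invented for this example. Because the schema declares a targetNamespace with elementFormDefault="qualified", the instance documents are assumed to put their elements in the http://www.javatpoint.com namespace.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical demo class: validates documents against the employee schema.
class XsdValidationDemo {

    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='http://www.javatpoint.com' xmlns='http://www.javatpoint.com'"
        + " elementFormDefault='qualified'>"
        + "<xs:element name='employee'><xs:complexType><xs:sequence>"
        + "<xs:element name='firstname' type='xs:string'/>"
        + "<xs:element name='lastname' type='xs:string'/>"
        + "<xs:element name='email' type='xs:string'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Returns true when the document is valid against the schema above.
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // validation error (a broken schema would also land here)
        } catch (IOException e) {
            throw new IllegalStateException("could not read the in-memory XML", e);
        }
    }

    public static void main(String[] args) {
        String ns = " xmlns='http://www.javatpoint.com'";
        String valid = "<employee" + ns + "><firstname>vimal</firstname>"
                + "<lastname>jaiswal</lastname><email>vimal@javatpoint.com</email></employee>";
        String missingEmail = "<employee" + ns + "><firstname>vimal</firstname>"
                + "<lastname>jaiswal</lastname></employee>";
        System.out.println("complete employee: " + isValid(valid));
        System.out.println("missing email:     " + isValid(missingEmail));
    }
}
```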
XML Schema Data types:
There are two types of data types in XML schema.
1. SimpleType
2. ComplexType
SimpleType
The simpleType allows you to have text-based elements. An element of a simple type
holds only text: it cannot have attributes or child elements and, for most types such
as xs:int or xs:date, it cannot be left empty.
A simple type element is used only in the context of text. Some of the predefined
simple types are: xs:integer, xs:boolean, xs:string, xs:date. For example −
<xs:element name = "phone_number" type = "xs:int" />
ComplexType
The complexType allows you to hold multiple attributes and elements. It can contain
additional sub elements and can be left empty.
A complex type is a container for other element definitions. This allows you to specify
which child elements an element can contain and to provide some structure within
your XML documents. For example −
<xs:element name = "Address">
<xs:complexType>
<xs:sequence>
<xs:element name = "name" type = "xs:string" />
<xs:element name = "company" type = "xs:string" />
<xs:element name = "phone" type = "xs:int" />
</xs:sequence>
</xs:complexType>
</xs:element>
In the above example, the Address element consists of child elements. It is a container
for other <xs:element> definitions, which allows you to build a simple hierarchy of
elements in the XML document.
Global Types
With a global type, you can define a single type in your document that can be
used by all other references. For example, suppose you want to generalize
the person and company details for different addresses of the company. In that case,
you can define a general type as a named complexType, so that other elements can
refer to it through their type attribute, as follows −
<xs:complexType name = "AddressType">
<xs:sequence>
<xs:element name = "name" type = "xs:string" />
<xs:element name = "company" type = "xs:string" />
</xs:sequence>
</xs:complexType>
Now let us use this type in our example as follows −
<xs:element name = "Address1">
<xs:complexType>
<xs:sequence>
<xs:element name = "address" type = "AddressType" />
<xs:element name = "phone1" type = "xs:int" />
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name = "Address2">
<xs:complexType>
<xs:sequence>
<xs:element name = "address" type = "AddressType" />
<xs:element name = "phone2" type = "xs:int" />
</xs:sequence>
</xs:complexType>
</xs:element>
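The reuse of a global type can be exercised with a small Java sketch built on the JDK's javax.xml.validation API. It assumes AddressType is declared as a named complex type (xs:complexType name='AddressType'), which is what a type='AddressType' reference needs in order to resolve. The class name GlobalTypeDemo, the helper address1 and the sample values are made up for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical demo class: Address1 reuses the global AddressType and adds an xs:int phone.
class GlobalTypeDemo {

    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:complexType name='AddressType'><xs:sequence>"
        + "<xs:element name='name' type='xs:string'/>"
        + "<xs:element name='company' type='xs:string'/>"
        + "</xs:sequence></xs:complexType>"
        + "<xs:element name='Address1'><xs:complexType><xs:sequence>"
        + "<xs:element name='address' type='AddressType'/>"
        + "<xs:element name='phone1' type='xs:int'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Builds a sample Address1 document with the given phone text (sample values are invented).
    static String address1(String phone) {
        return "<Address1><address><name>Asha</name><company>Example Ltd</company></address>"
             + "<phone1>" + phone + "</phone1></Address1>";
    }

    // Returns true when the document is valid against the schema above.
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // validation error (a broken schema would also land here)
        } catch (IOException e) {
            throw new IllegalStateException("could not read the in-memory XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("numeric phone:     " + isValid(address1("5551234")));
        System.out.println("non-numeric phone: " + isValid(address1("not a number")));
    }
}
```

The same AddressType could be referenced from an Address2 element in exactly the same way, which is the point of declaring it once at the top level.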
Attributes
Attributes in XSD provide extra information within an element. An attribute
declaration has a name and a type property, as shown below −
<xs:attribute name = "x" type = "y"/>
DTD vs XSD
There are many differences between DTD (Document Type Definition) and XSD
(XML Schema Definition). In short, DTD provides less control over XML structure
whereas XSD (XML Schema) provides more control.
Sl.No | DTD | XSD
1. | DTD stands for Document Type Definition. | XSD stands for XML Schema Definition.
2. | DTDs are derived from SGML syntax. | XSDs are written in XML.
3. | DTD doesn't support datatypes. | XSD supports datatypes for elements and attributes.
4. | DTD doesn't support namespaces. | XSD supports namespaces.
5. | DTD can fix the order of child elements but gives only coarse control over how often they occur (?, *, +). | XSD controls both the order and the number of occurrences of child elements (xs:sequence, xs:all, minOccurs, maxOccurs).
6. | DTD is not extensible. | XSD is extensible.
7. | DTD is not simple to learn. | XSD is simple to learn because you don't need to learn a new language.
8. | DTD provides less control over XML structure. | XSD provides more control over XML structure.
XML Database:
An XML database is a data persistence software system used for storing huge
amounts of information in XML format. It provides a secure place to store XML
documents. You can query the stored data using XQuery, and export and serialize it
into the desired format. XML databases are usually associated with document-oriented
databases.
1. XML-enabled Database:
An XML-enabled database works just like a relational database. It is like an
extension provided for the conversion of XML documents. In this database, data is
stored in tables, in the form of rows and columns.
Symbol Description
// Selects nodes in the document from the current node that match the selection, no matter where they are
The XPath specification defines seven types of nodes that can be the result of
evaluating an XPath expression.
Root
Element
Text
Attribute
Comment
Processing Instruction
Namespace
XPath uses a path expression to select a node or a list of nodes
from an XML document.
The common types of nodes that XPath defines and handles are described below.
1. Root: Root element node of an XML document.
2. Element: Element node.
3. Text: Text of an element node.
4. Attribute: Attribute of an element node.
5. Comment: Comment node.
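The node types above can be selected from Java with the javax.xml.xpath API included in the JDK, which implements XPath 1.0. The following is a minimal sketch: the class name XPathNodeDemo, the method select and the shortened sample document (two employees plus a comment) are assumptions made for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class: selects element, attribute, text and comment nodes with XPath.
class XPathNodeDemo {

    // A shortened sample document; the comment is added only to have a comment node to select.
    static final String XML =
          "<class>"
        + "<!-- staff list -->"
        + "<employee id='001'><firstname>Aryan</firstname><salary>30000</salary></employee>"
        + "<employee id='024'><firstname>Sara</firstname><salary>25000</salary></employee>"
        + "</class>";

    // Evaluates the expression and returns the string value of every selected node.
    static List<String> select(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalArgumentException("bad XPath expression: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select("//firstname"));        // element nodes, found anywhere via //
        System.out.println(select("//employee/@id"));     // attribute nodes
        System.out.println(select("//firstname/text()")); // text nodes
        System.out.println(select("//comment()"));        // comment nodes
        System.out.println(select("/class/employee[salary > 26000]/firstname")); // with a predicate
    }
}
```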
XPath Operators
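XPath defines operators and functions on nodes. An XPath expression returns either a
node-set, a string, a Boolean, or a number.
The operators used in XPath expressions include comparison (=, !=, <, <=, >, >=),
arithmetic (+, -, *, div, mod), Boolean (and, or) and the node-set union operator (|).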
XPath Expression Example
Let's take an example to see the usage of XPath expressions. Here, we use an XML
file "employee.xml" and a stylesheet for that XML file named "employee.xsl". The XSL file
uses XPath expressions in the select attribute of various XSL tags to fetch the values
of id, firstname, lastname, nickname and salary of each employee node.
employee.xml
<?xml version = "1.0"?>
<?xml-stylesheet type = "text/xsl" href = "employee.xsl"?>
<class>
<employee id = "001">
<firstname>Aryan</firstname>
<lastname>Gupta</lastname>
<nickname>Raju</nickname>
<salary>30000</salary>
</employee>
<employee id = "024">
<firstname>Sara</firstname>
<lastname>Khan</lastname>
<nickname>Zoya</nickname>
<salary>25000</salary>
</employee>
<employee id = "056">
<firstname>Peter</firstname>
<lastname>Symon</lastname>
<nickname>John</nickname>
<salary>10000</salary>
</employee>
</class>
employee.xsl
<?xml version = "1.0" encoding = "UTF-8"?>
<xsl:stylesheet version = "1.0"
xmlns:xsl = "http://www.w3.org/1999/XSL/Transform">
<xsl:template match = "/">
<html>
<body>
<h2> Employees</h2>
<table border = "1">
<tr bgcolor = "pink">
<th> ID</th>
<th> First Name</th>
<th> Last Name</th>
<th> Nick Name</th>
<th> Salary</th>
</tr>
<xsl:for-each select = "class/employee">
<tr>
<td> <xsl:value-of select = "@id"/> </td>
<td> <xsl:value-of select = "firstname"/> </td>
<td> <xsl:value-of select = "lastname"/> </td>
<td> <xsl:value-of select = "nickname"/> </td>
<td> <xsl:value-of select = "salary"/> </td>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
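A browser applies the stylesheet automatically through the xml-stylesheet processing instruction. The same transformation can also be run from Java with the javax.xml.transform API in the JDK, as in this minimal sketch. The class name XsltDemo is hypothetical, and both the stylesheet and the data are cut down to two columns and two employees to keep the example short.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class: applies a reduced version of employee.xsl to employee data.
class XsltDemo {

    // Reduced stylesheet: one table row per employee, showing the id attribute and the first name.
    static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:template match='/'>"
        + "<table>"
        + "<xsl:for-each select='class/employee'>"
        + "<tr><td><xsl:value-of select='@id'/></td><td><xsl:value-of select='firstname'/></td></tr>"
        + "</xsl:for-each>"
        + "</table>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Reduced copy of the employee data.
    static final String EMPLOYEES =
          "<class>"
        + "<employee id='001'><firstname>Aryan</firstname></employee>"
        + "<employee id='024'><firstname>Sara</firstname></employee>"
        + "</class>";

    // Runs the stylesheet over the given XML and returns the generated markup.
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(EMPLOYEES));
    }
}
```

Each xsl:value-of call evaluates its select expression relative to the employee node chosen by xsl:for-each, which is why @id and firstname need no leading path.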
XQuery:
XQuery is a functional query language used to retrieve information stored in XML
format. XQuery is to XML what SQL is to relational databases. It was designed to query
XML data. XQuery is built on XPath expressions. It is a W3C recommendation that is
supported by many major database systems.
What does it do
XQuery is a functional language used for finding and extracting
elements and attributes from XML documents. It can be used for the following things:
● To extract information to use in a web service.
● To generate summary reports.
● To transform XML data to XHTML.
XQuery Features:
XQuery has many features. The main ones are listed
below:
● XQuery is a functional language used to retrieve and query XML-based data.
● XQuery is an expression-oriented programming language with a simple type
system.
● XQuery is analogous to SQL: SQL is a query language for relational databases,
and XQuery is a query language for XML.
● XQuery is XPath based and uses XPath expressions to navigate through XML
documents.
● XQuery is a W3C standard and is supported by many major databases.
Advantages of XQuery:
● XQuery can be used to retrieve both hierarchical and tabular data.
● XQuery can also be used to query tree and graph structures.
● XQuery can be used to build web pages.
● XQuery can be used to query web pages.
● XQuery is best suited to XML-based databases and object-based databases. Object
databases can be more flexible than purely tabular databases.
● XQuery can be used to transform XML documents into XHTML documents.
XQuery Environment Setup
Let's see how to create a local development environment. Here we use the jar
files of the Saxon XQuery processor. The Java-based Saxon XQuery processor is used to
test the ".xqy" file, a file containing an XQuery expression, against our sample XML
document. You need to load the Saxon XQuery processor jar files to run the Java application.
For an Eclipse project, add these jar files to the build path. Or, if you are running Java from
the command prompt, set the classpath to include these jar files or (on Java 8 and earlier)
put them inside the JRE/lib/ext directory.
How to Set CLASSPATH in Windows Using Command Prompt
Type the following command in your Command Prompt and press enter.
set CLASSPATH=%CLASSPATH%;C:\Program Files\Java\jre1.8\rt.jar;
courses.xqy
for $x in doc("courses.xml")/courses/course
where $x/fees>5000
return $x/title
This example will display the title elements of the courses whose fees are greater
than 5000.
Create a Java-based XQuery executor program that reads courses.xqy, passes it to the
XQuery expression processor, and executes the expression. The result is then
displayed.
XQueryTester.java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStream;
import javax.xml.xquery.XQConnection;
import javax.xml.xquery.XQDataSource;
import javax.xml.xquery.XQException;
import javax.xml.xquery.XQPreparedExpression;
import javax.xml.xquery.XQResultSequence;
import com.saxonica.xqj.SaxonXQDataSource;

public class XQueryTester
{
    public static void main(String[] args)
    {
        try
        {
            execute();
        }
        catch (FileNotFoundException e)
        {
            // courses.xqy was not found in the working directory
            e.printStackTrace();
        }
        catch (XQException e)
        {
            // the query failed to compile or to run
            e.printStackTrace();
        }
    }

    private static void execute() throws FileNotFoundException, XQException
    {
        // Read the query text from courses.xqy
        InputStream inputStream = new FileInputStream(new File("courses.xqy"));
        // Obtain a connection from the Saxon XQJ data source
        XQDataSource ds = new SaxonXQDataSource();
        XQConnection conn = ds.getConnection();
        // Compile the query, run it, and print each item of the result sequence
        XQPreparedExpression exp = conn.prepareExpression(inputStream);
        XQResultSequence result = exp.executeQuery();
        while (result.next())
        {
            System.out.println(result.getItemAsString(null));
        }
    }
}
Compile:
javac XQueryTester.java
Execute:
java XQueryTester
XQuery FLWOR
FLWOR is an acronym which stands for "For, Let, Where, Order by, Return".
● For - It is used to select a sequence of nodes.
● Let - It is used to bind a sequence to a variable.
● Where - It is used to filter the nodes.
● Order by - It is used to sort the nodes.
● Return - It is used to specify what to return (gets evaluated once for every node).
Let's take an XML document holding information on a collection of courses.
We will use a FLWOR expression to retrieve the titles of those courses whose fees are
greater than 2000.
courses.xml
<?xml version="1.0" encoding="UTF-8"?>
<courses>
<course category="JAVA">
<title lang="en">Learn Java in 3 Months.</title>
<trainer>Sonoo Jaiswal</trainer>
<year>2008</year>
<fees>10000.00</fees>
</course>
<course category="Dot Net">
<title lang="en">Learn Dot Net in 3 Months.</title>
<trainer>Vicky Kaushal</trainer>
<year>2008</year>
<fees>10000.00</fees>
</course>
<course category="C">
<title lang="en">Learn C in 2 Months.</title>
<trainer>Ramesh Kumar</trainer>
<year>2014</year>
<fees>3000.00</fees>
</course>
<course category="XML">
<title lang="en">Learn XML in 2 Months.</title>
<trainer>Ajeet Kumar</trainer>
<year>2015</year>
<fees>4000.00</fees>
</course>
</courses>
Let's take the XQuery document named "courses.xqy" that contains the query
expression to be executed on the above XML document.
courses.xqy
let $courses := (doc("courses.xml")/courses/course)
return <results>
{
for $x in $courses
where $x/fees>2000
order by $x/fees
return $x/title
}
</results>
To run courses.xqy, reuse the XQueryTester.java program listed earlier under XQuery
Environment Setup: it reads courses.xqy, passes it to the XQuery expression processor,
and prints the result.
Here, the FLWOR expression returns the title elements of the courses whose fees are
greater than 2000, ordered by fees and wrapped in a results element.
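For readers without an XQuery processor at hand, the steps of the FLWOR expression can be approximated with the XPath 1.0 API in the JDK plus ordinary Java. This is only a sketch of the same idea, not a replacement for XQuery: the class name FlworSketch and the method titlesWithFeesAbove are invented, the course data is a shortened copy of courses.xml, and fees are sorted as numbers (an XQuery processor working on untyped data may compare them as text unless the query converts them, for example with number($x/fees)).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: mimics the for / let / where / order by / return steps without XQuery.
class FlworSketch {

    // A shortened copy of courses.xml (three courses, trainer and year omitted).
    static final String COURSES =
          "<courses>"
        + "<course category='JAVA'><title lang='en'>Learn Java in 3 Months.</title><fees>10000.00</fees></course>"
        + "<course category='C'><title lang='en'>Learn C in 2 Months.</title><fees>3000.00</fees></course>"
        + "<course category='XML'><title lang='en'>Learn XML in 2 Months.</title><fees>4000.00</fees></course>"
        + "</courses>";

    static List<String> titlesWithFeesAbove(int minFees) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // for + where: select the course nodes whose fees exceed the threshold
            NodeList courses = (NodeList) xpath.evaluate(
                    "/courses/course[fees > " + minFees + "]",
                    new InputSource(new StringReader(COURSES)),
                    XPathConstants.NODESET);
            // let: bind the title and fees of each selected course to a small row
            List<String[]> rows = new ArrayList<>();
            for (int i = 0; i < courses.getLength(); i++) {
                Node course = courses.item(i);
                rows.add(new String[] { xpath.evaluate("title", course), xpath.evaluate("fees", course) });
            }
            // order by: sort the rows by fees, compared as numbers
            rows.sort(Comparator.comparingDouble((String[] row) -> Double.parseDouble(row[1])));
            // return: keep only the title text of each row
            List<String> titles = new ArrayList<>();
            for (String[] row : rows) {
                titles.add(row[0]);
            }
            return titles;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(titlesWithFeesAbove(2000));
        System.out.println(titlesWithFeesAbove(5000));
    }
}
```

Unlike the XQuery version, this sketch returns plain title strings rather than title elements inside a results element.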
Execute XQuery against XML
● Put the above three files in the same location. We put them on the desktop in a
folder named XQuery3.
● Compile XQueryTester.java using the console. You must have JDK 1.5 or later
installed on your computer and the classpath configured.
XQuery vs XPath: