Chapter 7 – Information representation and sharing – XML – 10 Marks
7.1 XML documents, DTD
Definition
Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding
documents in a format that is both human-readable and machine-readable. The design goals of
XML focus on simplicity, generality, and usability across the Internet. It is a textual data format
with strong support via Unicode for different human languages. Although the design of XML
focuses on documents, the language is widely used for the representation of arbitrary data
structures such as those used in web services.
XML stands for Extensible Markup Language
XML is a markup language like HTML
XML is designed to store and transport data
XML is designed to be self-descriptive
Significance/ Importance/ Benefits of XML in Web:
1. Ease
Simplicity is the biggest advantage of XML. It is plain text, so any computer can process
it and people can read and understand it. XML is a W3C standard and is endorsed by the
leading software vendors, so it is an open format rather than one tied to a single
product.
2. No limitation of tags
XML is not limited to a fixed set of tags. New tags can be defined whenever they are
needed.
3. Self-description
In a conventional database, the data administrator sets up schemas to describe the data
records. An XML document does not need such a separate definition, because its tags and
attributes carry the metadata along with the data. This also gives a basic foundation for
author identification and versioning: any XML tag can carry attributes such as version or
author.
4. Highly readable context information
One of the biggest advantages of XML over HTML or plain text is its context information.
Tags, attributes, and the element structure provide context that can be used to interpret
the meaning of the content. This makes intelligent data mining, software agents, and more
effective search engines possible.
5. Content is important, not how it is presented
XML describes the meaning of the content, not its presentation. If HTML stands for “how it
appears”, then XML stands for “what it signifies”. The look and feel of a document or
website built with XML can be changed without altering the content of the document, so
several presentations or views of the same content can be rendered easily. XML also
supports Unicode and multilingual documents, which is essential for applications that
follow international web development standards.
6. Assists in data assessment and aggregation
The structure of an XML document is designed so that documents can be efficiently assessed
and aggregated part by part. Another advantage of XML is that it can hold almost any type
of data, ranging from active components such as ActiveX controls and Java applets to
multimedia data such as video, images, and sound.
Differences between XML and HTML
XML and HTML were designed with different goals:
XML is designed to carry data, with the emphasis on what the data is.
HTML is designed to display data, with the emphasis on how the data looks.
XML tags are not predefined like HTML tags.
HTML is a markup language, whereas XML provides a framework for defining markup languages.
HTML is about displaying data, hence it is static, whereas XML is about carrying
information, which makes it dynamic.
EXAMPLE :
XML code for a note is given below
XML documents
An XML document is the basic unit of XML information, composed of elements and other markup in an
orderly package. An XML document can contain a wide variety of data, for example a database of
numbers, numbers representing a molecular structure, or a mathematical equation.
XML Document Example
A simple document is shown in the following example −
<?xml version = "1.0"?>
<contact-info>
  <name>Tanmay Patil</name>
  <company>TutorialsPoint</company>
  <phone>(011) 123-4567</phone>
</contact-info>
The following image depicts the parts of XML document.
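As a rough illustration of how a program might read the contact-info document, the following
sketch uses the xml.etree.ElementTree module from Python's standard library. Python is an
assumption here; these notes do not prescribe any language or parser, and the function name
read_contact is made up for the example.

```python
import xml.etree.ElementTree as ET

# The contact-info document from the example, held as a string.
CONTACT_XML = """<?xml version = "1.0"?>
<contact-info>
  <name>Tanmay Patil</name>
  <company>TutorialsPoint</company>
  <phone>(011) 123-4567</phone>
</contact-info>"""


def read_contact(xml_text):
    """Return the root tag and a dict mapping child tag names to their text."""
    root = ET.fromstring(xml_text)
    fields = {child.tag: child.text for child in root}
    return root.tag, fields


root_tag, contact = read_contact(CONTACT_XML)
print(root_tag)
for tag, text in contact.items():
    print(tag, "=", text)
```

The root element is contact-info, and each child element becomes one entry in the dictionary.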
XML Elements
XML elements are the basic building blocks of an XML document. An element is used as a
container to store text, other elements, attributes, media objects, etc. Every XML document
contains at least one element. The scope of an element is delimited by its start and end tags
or, in the case of an empty element, by a single empty-element tag.
Syntax:
<element-name attributes> Contents...</element-name>
element-name: It is the name of the element.
attributes: Attributes define properties of the XML element. Multiple attributes are
separated by white space. Each attribute associates a name with a value, which is a string
of characters.
Example:
name="Geeks"
Here, name is the attribute name and Geeks is the value of the attribute.
Rules to define XML elements: The rules for creating XML elements are given below:
An element name can contain alphanumeric characters. Only three special characters are
allowed in names: hyphen, underscore, and period.
Names are case sensitive, so the same name written in lower case and in upper case refers
to different elements. For example, address, Address, and aDDress are different names.
The start and end tags of an element must have the same name.
An element, which is a container, can contain text or other elements.
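These rules can be tried out with any XML parser. The sketch below assumes Python and its
standard xml.etree.ElementTree module; the helper name is_well_formed and the sample value
Kathmandu are made up for illustration. It only reports whether the parser accepts a piece
of markup.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False on a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Start and end tags match exactly.
matching = is_well_formed("<address>Kathmandu</address>")
# Names are case sensitive, so <Address> is not closed by </address>.
case_mismatch = is_well_formed("<Address>Kathmandu</address>")
# Hyphen, underscore and period are accepted inside a name.
special_chars = is_well_formed("<home-address_1.0>Kathmandu</home-address_1.0>")
# A container element holding another element.
nested = is_well_formed("<address><city>Kathmandu</city></address>")

print(matching, case_mismatch, special_chars, nested)
```

Only the second call is rejected, because the start tag and end tag differ in case.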
Empty Elements: An element in an XML document that does not have any content is known as an
empty element. The basic syntax of an empty element in XML is as follows:
<element-name attributes/>
Example 1: Following is the example of an XML document describing the address of a college
student using XML elements.
<?xml version = "1.0"?>
<contactinfo>
  <address category = "college">
    <name>G4G</name>
    <College>Geeksforgeeks</College>
    <mobile>2345456767</mobile>
  </address>
</contactinfo>
Output:
G4G
Geeksforgeeks
2345456767
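One way to obtain this output from the document in Example 1 is sketched below. It assumes
Python's standard xml.etree.ElementTree module and writes the attribute values with plain
double quotes, as XML requires. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

STUDENT_XML = """<?xml version = "1.0"?>
<contactinfo>
  <address category = "college">
    <name>G4G</name>
    <College>Geeksforgeeks</College>
    <mobile>2345456767</mobile>
  </address>
</contactinfo>"""

root = ET.fromstring(STUDENT_XML)
address = root.find("address")

# Attribute values are read with get(); element content is read with .text.
category = address.get("category")
values = [child.text for child in address]

for value in values:
    print(value)
```

The attribute category is not part of the printed output; it is available separately through
address.get("category").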
Example 2:
<?xml version = "1.0"?>
<student>
  <!-- The attribute name "title" is illustrative. -->
  <personal_details title = "Personal Details">
    <name>xyz</name>
    <father_name>abc</father_name>
  </personal_details>
  <edu_details title = "Educational Details">
    <hsc_perc>80%</hsc_perc>
    <ssc_perc>98%</ssc_perc>
  </edu_details>
</student>
Output:
xyz
abc
80%
98%
DTD:
DTD stands for Document Type Definition. It defines the structure of an XML document: which
elements and attributes may appear, and how they may be arranged. A DTD can be specified
inside the document (internal DTD) or in a separate file outside the document (external DTD).
A DTD is mainly used to check the grammar and validity of an XML document, that is, whether
the document has a valid structure or not.
Characteristics
It defines the compulsory and optional elements in the XML document.
It validates the structure of the XML document.
It checks the grammar of the XML document.
It describes the order in which the element occurs.
Advantages
We can define our own format for XML files with a DTD.
It helps in validating an XML file.
It serves as documentation of the document format.
It enables us to describe an XML document efficiently.
Disadvantages
DTDs are hard to read and maintain if they are large in size.
It is not object oriented.
The documentation support is limited.
DTD doesn’t support namespaces.
7.2 Stylesheet and transformation – XSLT
Definition
XSLT (eXtensible Stylesheet Language Transformations) is the recommended style sheet
language for XML.
XSLT is far more sophisticated than CSS. With XSLT you can add/remove elements and attributes
to or from the output file. You can also rearrange and sort elements, perform tests and make
decisions about which elements to hide and display, and a lot more.
XSLT uses XPath to find information in an XML document.
How XSLT Works
An XSLT stylesheet defines the transformation rules to be applied to the target XML document.
The stylesheet is itself written in XML. The XSLT processor takes the stylesheet, applies the
transformation rules to the target XML document, and generates a formatted document in XML,
HTML, or text format. This formatted document is then used by the XSLT formatter to generate
the actual output that is displayed to the end user.
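The rule-matching idea can be illustrated without an XSLT processor. The sketch below is plain
Python, not XSLT: it only mimics how a processor walks the source tree and applies one rule
per matching element. The rule table, the function names, and the shortened two-item menu are
assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

# Shortened source document (two entries of a breakfast menu).
MENU_XML = """<breakfast_menu>
  <food><name>Belgian Waffles</name><price>$5.95</price></food>
  <food><name>French Toast</name><price>$4.50</price></food>
</breakfast_menu>"""


def apply_rules(node, rules):
    """Apply the rule registered for node.tag; return '' if there is none."""
    rule = rules.get(node.tag)
    return rule(node, rules) if rule else ""


def menu_rule(node, rules):
    # Plays the role of a template for the root: wrap the output of the child rules.
    rows = "".join(apply_rules(child, rules) for child in node)
    return "<ul>" + rows + "</ul>"


def food_rule(node, rules):
    # Plays the role of a template for each <food>: pick values from the source.
    return "<li>" + node.findtext("name") + " - " + node.findtext("price") + "</li>"


RULES = {"breakfast_menu": menu_rule, "food": food_rule}

html = apply_rules(ET.fromstring(MENU_XML), RULES)
print(html)
```

Changing only the rule functions changes the output, while the source XML stays untouched.
This is the same separation of content and transformation that an XSLT stylesheet provides.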
Advantages
Here are the advantages of using XSLT −
Independent of programming. Transformations are written in a separate .xsl file, which is
itself an XML document.
The output can be altered simply by modifying the transformations in the .xsl file, without
changing any code. Web designers can therefore edit the stylesheet and see the change in
the output quickly.
XSLT Example
We will use the following XML document:
Example 1
<?xml version="1.0" encoding="UTF-8"?>
<breakfast_menu>

  <food>
    <name>Belgian Waffles</name>
    <price>$5.95</price>
    <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
    <calories>650</calories>
  </food>

  <food>
    <name>Strawberry Belgian Waffles</name>
    <price>$7.95</price>
    <description>Light Belgian waffles covered with strawberries and whipped cream</description>
    <calories>900</calories>
  </food>

  <food>
    <name>Berry-Berry Belgian Waffles</name>
    <price>$8.95</price>
    <description>Light Belgian waffles covered with an assortment of fresh berries and whipped cream</description>
    <calories>900</calories>
  </food>

  <food>
    <name>French Toast</name>
    <price>$4.50</price>
    <description>Thick slices made from our homemade sourdough bread</description>
    <calories>600</calories>
  </food>

  <food>
    <name>Homestyle Breakfast</name>
    <price>$6.95</price>
    <description>Two eggs, bacon or sausage, toast, and our ever-popular hash browns</description>
    <calories>950</calories>
  </food>

</breakfast_menu>
Example 2
<?xml version="1.0" ?>
<persons>
  <person username="JS1">
    <name>John</name>
    <family-name>Smith</family-name>
  </person>
  <person username="MI1">
    <name>Morka</name>
    <family-name>Ismincius</family-name>
  </person>
</persons>
7.3 Information Syndication RSS
Concern about data privacy has grown sharply in recent years. With well-publicized leaks and
frequent phishing and spamming, few people want to hand out their personal information freely,
for fear of becoming the next target. This makes it difficult to keep up with favorite content
across the World Wide Web, because there appears to be a trade-off between user control and
unobstructed access. To follow a preferred blog, for example one that regularly posts about top
travel destinations or the latest technical trends, one is often asked to give up personal
information in exchange for a basic subscription service. Social media is an alternative way to
stay updated, but it does not protect privacy either.
It seems like it's time to take hold of the steering wheel.
What is RSS?
RSS is an open method for delivering regularly changing web content. Many news-related sites,
weblogs, and other online publishers syndicate their content as an RSS Feed to whoever wants it.
Any time you want to retrieve the latest headlines from your favorite sites, you can access the
available RSS Feeds via a desktop RSS reader. You can also make an RSS Feed for your own
site if your content changes frequently.
In brief:
RSS is a web feed format that provides an open method of syndicating and aggregating web
content.
RSS is a standard for publishing regular updates to web-based content.
RSS is a Syndication Standard based on a type of XML file that resides on an Internet
server.
RSS is an XML application. Its RDF-based versions, such as RSS 1.0, conform to the W3C's RDF
specification and are extensible via XML.
You can also download RSS Feeds from other sites to display the updated news items on
your site, or use a desktop or online reader to access your favorite RSS Feeds.
What does RSS stand for? It depends on what version of RSS you are using.
RSS Version 0.9 - Rich Site Summary
RSS Version 1.0 - RDF Site Summary
RSS Versions 2.0, 2.0.1, and 0.9x - Really Simple Syndication
What is RSS Feed?
An RSS Feed is an XML text file that resides on an Internet server.
An RSS Feed file includes the basic information about a site (title, URL, description), plus
one or more item entries that include, at a minimum, a title (headline), a URL, and a
brief description of the linked content.
There are various flavors of RSS Feed, depending on the RSS version. Another XML feed
format is called Atom.
RSS Feeds can be registered with an RSS registry to make them more available to viewers
interested in your content area.
RSS Feeds can have links back to your website, which can bring more traffic to your
site.
Some RSS Feeds are updated hourly (Associated Press and News Groups), some are
updated daily, and others are updated weekly or irregularly.
How Does RSS Work?
This is how RSS works:
A website that wants to publish its content using RSS creates an RSS Feed and keeps it on
a web server. RSS Feeds can be created manually or with software.
A website visitor subscribes to the RSS Feed. The RSS Feed is read by an RSS Feed
reader.
The RSS Feed reader reads the RSS Feed file and displays it. The reader displays
only new items from the RSS Feed.
The RSS Feed reader can be customized to show content from one or more RSS Feeds,
based on the reader's own interests.
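The reader side of these steps can be sketched as follows, assuming Python's standard
xml.etree.ElementTree module. The feed text, the example.com URLs, and the function name
new_items are hypothetical. A real reader would download the feed from its URL at regular
intervals instead of using a string.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed content standing in for a file fetched from a web server.
FEED_XML = """<rss version="2.0">
  <channel>
    <title>My Blog</title>
    <link>http://example.com/</link>
    <description>Sample channel</description>
    <item>
      <title>First post</title>
      <link>http://example.com/1.html</link>
      <description>First article</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/2.html</link>
      <description>Second article</description>
    </item>
  </channel>
</rss>"""


def new_items(feed_text, seen_links):
    """Return (title, link) pairs for items whose link is not in seen_links.

    seen_links is updated in place so the next call skips these items.
    """
    channel = ET.fromstring(feed_text).find("channel")
    fresh = []
    for item in channel.findall("item"):
        link = item.findtext("link")
        if link not in seen_links:
            seen_links.add(link)
            fresh.append((item.findtext("title"), link))
    return fresh


seen = {"http://example.com/1.html"}  # the subscriber has already read this one
first_check = new_items(FEED_XML, seen)
second_check = new_items(FEED_XML, seen)
print(first_check)
print(second_check)
```

The first check reports only the second post, and the second check reports nothing, which is
how a reader shows only new items. Tracking items by link is a simplification chosen for this
sketch.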
News Aggregators and Feed Readers
RSS Feed readers and news aggregators are essentially the same thing; they are a piece of
software. Both are used for viewing RSS Feeds. News aggregators are designed specifically to
view news-related Feeds but technically, they can read any Feeds.
Who can Use RSS? (Uses Of RSS)
RSS started out with the intent of distributing news-related headlines. Its potential is
significantly larger, however, and it can be used anywhere in the world.
Consider using RSS for the following:
New Homes - Realtors can provide updated Feeds of new home listings on the market.
Job Openings - Placement firms and newspapers can provide a classified Feed of job
vacancies.
Auction Items - Auction vendors can provide Feeds containing items that have been
recently added to eBay or other auction sites.
Press Distribution - Listing of new releases.
Schools - Schools can relay homework assignments and quickly announce school
cancellations.
News & Announcements - Headlines, notices, and any list of announcements.
Entertainment - Listings of the latest TV programs or movies at local theatres.
RSS is growing in popularity. The reason is fairly simple. RSS is a free and easy way to promote
a site and its content without the need to advertise or create complicated content sharing
partnerships.
Benefits of RSS:
RSS forms a tunneled subscription service that is handled solely by the client.
It eliminates the need to disclose personal information to multiple websites, yet delivers
controlled, automated, and regular content without any additional baggage.
It can be used to increase traffic to a website.
Example code
Below is a sample RSS document.
<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0">

<channel>
  <title>RSS title</title>
  <link>https://mywebsitename/index.html</link>
  <description>My Blog</description>
  <item>
    <title>My First Feed</title>
    <link>http://mywebsitename/blog/article/1.html</link>
    <description>My new article</description>
  </item>
  <item>
    <title>My Second Feed</title>
    <link>http://mywebsitename/blog/article/2.html</link>
    <description>Another new article</description>
  </item>
</channel>

</rss>
Explanation of the code :
1. First comes the XML declaration, with the XML version and the encoding scheme.
2. The following line opens the rss tag and states the RSS version in use.
3. The next few lines show the channel tag, which marks the beginning of the RSS Feed. It
holds the title of the channel, a hyperlink to it, and a description of the channel.
4. Within the channel tag, one or more items are defined. Each item is essentially a piece
of content or a story, with its own title, link, and description. The channel can hold data
in any form, such as images, GIFs, or audio. Each has its own XML tag.
Once the XML is ready and validated, it is uploaded to the server. This allows a registered
aggregator to access the RSS document. The XML has to be updated constantly with new content.
This task is performed and managed by the website developers and owners. There are also
third-party automated RSS providers, such as Blogger and WordPress, that offer built-in
automated RSS services.
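Keeping the feed up to date by hand is tedious, so the file is usually produced by software.
A minimal sketch of such a generator is shown below, assuming Python's standard
xml.etree.ElementTree module. The function name build_feed and its parameters are made up for
illustration; it is not how Blogger or WordPress build their feeds.

```python
import xml.etree.ElementTree as ET


def build_feed(title, link, description, articles):
    """Build an RSS 2.0 document; articles is a list of (title, link, description)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for art_title, art_link, art_description in articles:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = art_title
        ET.SubElement(item, "link").text = art_link
        ET.SubElement(item, "description").text = art_description
    # encoding="unicode" returns a str rather than bytes.
    return ET.tostring(rss, encoding="unicode")


feed = build_feed(
    "RSS title",
    "https://mywebsitename/index.html",
    "My Blog",
    [("My First Feed", "http://mywebsitename/blog/article/1.html", "My new article")],
)
print(feed)
```

Each time a new article is published, the site would call the function again with the
extended article list and upload the result in place of the old file.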
On the aggregator side, the newly updated RSS document is fetched by an ‘RSS Reader’. The
RSS reader regularly checks the registered RSS Feeds for fresh content. The reader is
presented as a user interface and can be built into a website or installed on a device to be
made available to clients. There are several widely available RSS Feed readers, such as
QuiteRSS and FeedReader. The client can easily specify the Feed URLs the reader must check.
When needed, the client can just as easily opt out of this content delivery.
Example of XML code:
External DTD declaration:
To have the external DTD declaration in an XML document, we must include the reference to
the DTD file in the <!DOCTYPE> definition, as we have done in the following example.
<?xml version="1.0"?>
<!DOCTYPE beginnersbook SYSTEM "bb.dtd">
<beginnersbook>
  <to>My Readers</to>
  <from>Chaitanya</from>
  <subject>A Message to my readers</subject>
  <message>Welcome to beginnersbook.com</message>
</beginnersbook>
The <!DOCTYPE> definition in the above document contains the reference to “bb.dtd” file.
Here is the content of “bb.dtd” file that contains the DTD for above XML document –
<!ELEMENT beginnersbook (to,from,subject,message)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT subject (#PCDATA)>
<!ELEMENT message (#PCDATA)>
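Checking a document against this DTD needs a validating parser. Python's
xml.etree.ElementTree does not validate, so the sketch below only imitates by hand what the
declaration (to,from,subject,message) demands: exactly these children, in this order. The
function name matches_sequence and the hard-coded list are assumptions for illustration. This
is not a general DTD validator, and it does not check the (#PCDATA) declarations.

```python
import xml.etree.ElementTree as ET

# Child order required by: <!ELEMENT beginnersbook (to,from,subject,message)>
REQUIRED_CHILDREN = ["to", "from", "subject", "message"]


def matches_sequence(xml_text, root_name, required):
    """True if the root is root_name and its children are exactly `required`, in order."""
    root = ET.fromstring(xml_text)
    return root.tag == root_name and [child.tag for child in root] == required


GOOD = """<beginnersbook>
<to>My Readers</to>
<from>Chaitanya</from>
<subject>A Message to my readers</subject>
<message>Welcome to beginnersbook.com</message>
</beginnersbook>"""

# Same elements, but <from> comes before <to>, which the DTD does not allow.
BAD = """<beginnersbook>
<from>Chaitanya</from>
<to>My Readers</to>
<subject>A Message to my readers</subject>
<message>Welcome to beginnersbook.com</message>
</beginnersbook>"""

good_result = matches_sequence(GOOD, "beginnersbook", REQUIRED_CHILDREN)
bad_result = matches_sequence(BAD, "beginnersbook", REQUIRED_CHILDREN)
print(good_result, bad_result)
```

Both documents are well-formed XML, but only the first one has the structure the DTD
describes. That is the difference between a well-formed document and a valid one.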
Internal DTD Declaration:
<?xml version = "1.0" encoding = "UTF-8" standalone = "yes" ?>
<!DOCTYPE address [
  <!ELEMENT address (name,company,phone)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT company (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
]>

<address>
  <name>Tanmay Patil</name>
  <company>TutorialsPoint</company>
  <phone>(011) 123-4567</phone>
</address>
The DTD for the above document is:
<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
XML with Internal DTD declaration:
<?xml version="1.0"?>
<!-- XML DTD declaration starts here -->
<!DOCTYPE beginnersbook [
  <!ELEMENT beginnersbook (to,from,subject,message)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT subject (#PCDATA)>
  <!ELEMENT message (#PCDATA)>
]>
<!-- XML DTD declaration ends here-->
<beginnersbook>
  <to>My Readers</to>
  <from>Chaitanya</from>
  <subject>A Message to my readers</subject>
  <message>Welcome to beginnersbook.com</message>
</beginnersbook>
Old Question:
2074 Ashwin
2068 Chaitra: