DTD and XML, Part 1
Introduction
The purpose of this assignment was to:
Learn what a DTD is.
Learn how and why to put constraints on an XML document by using a DTD.
What is a DTD?
A DTD (Document Type Definition) defines the structure of a document with a list of allowed elements and attributes.
Why should/could a DTD be used?
The advantages of using a DTD become more obvious as the size and complexity of the XML code increase. Because much non-trivial software that uses XML benefits from a DTD, it is important for document authors to understand how to write one.
There are two main reasons for XML authors to use DTDs for their XML documents:
Documentation. A developer can look at the DTD of an XML document and quickly understand its structure. This makes it easy for independent groups to agree upon a common DTD for interchanging data.
Validation. Document validation means passing an XML document through a validating XML parser, which reads the DTD and compares it with the XML markup to ensure that elements appear in the correct order, that mandatory elements and attributes are in place, and that no undeclared elements or attributes have been inserted.
Working with validated data makes life much easier for a developer. If data is known to be valid, its structure is predictable. There is far less need to clutter the code with structural error checks or assertions; if the document validates, the elements can be expected to be there in the order the DTD prescribes.
DTD Declaration
A DTD can be declared as an internal reference (i.e., inline in your XML document) or as an external reference (pointing to a separate file).
Internal DOCTYPE declaration
If a DTD is included directly in the XML document, a DOCTYPE definition with the following syntax should be used:
<!DOCTYPE root-element [element-declarations]>
Example of an XML document with an internal DTD declaration:
<?xml version="1.0"?>
<!DOCTYPE message [
<!ELEMENT message (receiver,sender,subject,content)>
<!ELEMENT receiver (#PCDATA)>
<!ELEMENT sender (#PCDATA)>
<!ELEMENT subject (#PCDATA)>
<!ELEMENT content (#PCDATA)>
]>
<message>
<receiver>Buck</receiver>
<sender>Lenny</sender>
<subject>Welcome</subject>
<content>Welcome Buck!</content>
</message>
The DTD is interpreted by an XML parser like this:
!DOCTYPE message (second row) declares that the root element of this document is "message".
!ELEMENT message (third row) defines the message element to contain these four elements, in this order: receiver, sender, subject, content.
!ELEMENT receiver (fourth row) defines the receiver element to be of the type "#PCDATA".
!ELEMENT sender (fifth row) defines the sender element to be of the type "#PCDATA".
!ELEMENT subject (sixth row) defines the subject element to be of the type "#PCDATA".
!ELEMENT content (seventh row) defines the content element to be of the type "#PCDATA".
External DOCTYPE declaration
If the DTD is kept in a separate .dtd file (external), a DOCTYPE definition with the following syntax should be used:
<!DOCTYPE root-element SYSTEM "URI/URL or System path to .dtd file">
or
<!DOCTYPE root-element PUBLIC "Public identifier" "URI/URL or System path to .dtd file">
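As an illustration of the PUBLIC form, the document type declaration carried by XHTML 1.0 Strict documents pairs a public identifier with the URL of the DTD. It is shown here only as a familiar instance of the syntax and is not part of the message example:

```xml
<!-- PUBLIC form: a public identifier, then the location of the DTD -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
```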
The same message document as in the internal example, but now with an external DTD:
<?xml version="1.0"?>
<!DOCTYPE message SYSTEM "message.dtd">
<message>
<receiver>Buck</receiver>
<sender>Lenny</sender>
<subject>Welcome</subject>
<content>Welcome Buck!</content>
</message>
And this is a copy of the external .dtd file "message.dtd", containing the DTD:
<!ELEMENT message (receiver,sender,subject,content)>
<!ELEMENT receiver (#PCDATA)>
<!ELEMENT sender (#PCDATA)>
<!ELEMENT subject (#PCDATA)>
<!ELEMENT content (#PCDATA)>
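To make the difference between well-formed and valid concrete, the following sketch is a document that is well-formed XML but would not validate against message.dtd as declared above. A validating parser is expected to reject it; the exact error messages depend on the parser used.

```xml
<?xml version="1.0"?>
<!DOCTYPE message SYSTEM "message.dtd">
<!-- Well-formed, but NOT valid against message.dtd:
     1. sender appears before receiver (the DTD requires receiver first)
     2. the mandatory subject element is missing -->
<message>
  <sender>Lenny</sender>
  <receiver>Buck</receiver>
  <content>Welcome Buck!</content>
</message>
```

Restoring the order receiver, sender, subject, content makes the document valid again.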
This was part 1 of the DTD and XML assignment. In part 2 you will learn about the components of XML documents seen from a DTD perspective, and how to use them for the markup declarations in the DTD.
DTD and XML, Part 2
Components of XML documents from a DTD perspective
From a DTD perspective, XML documents are built from these five components:
Elements
Attributes
Entities
PCDATA
CDATA
Elements
Elements are the main components of XML documents as well as HTML and XHTML documents.
The elements "message", "subject", "sender", "receiver" and "content" from the message example in DTD and XML, Part 1 are examples of XML elements.
Elements can be empty, contain text, or contain other elements.
Declaring an Element
XML elements are declared with a DTD element declaration inside the DTD.
This is the syntax for an element declaration:
<!ELEMENT element-name content-keyword>
or
<!ELEMENT element-name (element-content)>
Empty elements
Element types with empty content are declared using the content keyword EMPTY:
<!ELEMENT element-name EMPTY>
For example:
<!ELEMENT break EMPTY>
In the XML document:
<break />
As the example shows, an empty element has no content between its start tag and its end tag. This is referred to as having empty content.
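A small sketch of the two ways an EMPTY element can be written in a document. The note element is hypothetical and is there only to give break a parent:

```xml
<?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note (break+)>
<!ELEMENT break EMPTY>
]>
<note>
  <!-- Empty-element tag: the usual way to write an EMPTY element -->
  <break/>
  <!-- A start tag immediately followed by an end tag is also valid -->
  <break></break>
  <!-- Not valid: <break>text</break> or <break> </break>, since even
       white space counts as content for an EMPTY element -->
</note>
```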
The "img" and "br" elements are examples of empty elements from HTML and XHTML.
Elements with only pure text
Element types with text (character data) only are declared using the content keyword #PCDATA inside round brackets, like this:
<!ELEMENT element-name (#PCDATA)>
example:
<!ELEMENT sender (#PCDATA)>
Element types with any content
Element types declared using the keyword ANY have no constraints on their content. They may contain text and any number of child elements of any declared type, in any order.
<!ELEMENT element-name ANY>
example:
<!ELEMENT message ANY>
Element types with child elements
Element types with one or more child elements are declared in a sequence using the name of the child elements inside round brackets:
<!ELEMENT element-name (child-element-name)> or <!ELEMENT element-name (child-element-name,another-child-element-name,.....)>
example: <!ELEMENT message (receiver,sender,subject,content)>
When child elements (subelements) are declared in a sequence separated by commas, the children must occur in the same sequence in the XML document. In a complete declaration, the children, and all of the children's children, and so on, must be declared as well.
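As a sketch of what "children's children" means in practice, suppose the message element grouped its addressing information in a header element. The header element is hypothetical and not part of the original example; the point is only that every element named in a content model needs its own declaration:

```xml
<?xml version="1.0"?>
<!DOCTYPE message [
<!-- message contains header and content, in that order -->
<!ELEMENT message (header,content)>
<!-- header (hypothetical) is itself a parent, so its children are declared too -->
<!ELEMENT header (receiver,sender,subject)>
<!-- the leaf elements hold text only -->
<!ELEMENT receiver (#PCDATA)>
<!ELEMENT sender (#PCDATA)>
<!ELEMENT subject (#PCDATA)>
<!ELEMENT content (#PCDATA)>
]>
<message>
  <header>
    <receiver>Buck</receiver>
    <sender>Lenny</sender>
    <subject>Welcome</subject>
  </header>
  <content>Welcome Buck!</content>
</message>
```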
For the original four-child example, the complete declaration of the "message" element would be:
<!ELEMENT message (receiver,sender,subject,content)>
<!ELEMENT receiver (#PCDATA)>
<!ELEMENT sender (#PCDATA)>
<!ELEMENT subject (#PCDATA)>
<!ELEMENT content (#PCDATA)>
Element types that can occur only once
<!ELEMENT element-name (child-name)>
example:
<!ELEMENT message (content)>
In the example declaration above, the child element (i.e., the content element) is constrained to occur exactly once inside the "message" element.
Element types that must occur at least once
<!ELEMENT element-name (child-name+)>
example:
<!ELEMENT message (content+)>
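A minimal sketch of a document that satisfies this example declaration, with the content element repeated. The second line of sample text is made up. Removing both content elements would make the document invalid, because at least one is required:

```xml
<?xml version="1.0"?>
<!DOCTYPE message [
<!ELEMENT message (content+)>
<!ELEMENT content (#PCDATA)>
]>
<message>
  <!-- one content element is required; further ones are allowed -->
  <content>Welcome Buck!</content>
  <content>See you on Monday.</content>
</message>
```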
The + sign in the example above declares that the child element (i.e., the content element) must occur at least once inside the "message" element (i.e., a one-to-many constraint).
Element types that don't have to occur, but could occur many times
<!ELEMENT element-name (child-name*)>
example:
<!ELEMENT message (content*)>
The * sign in the example above declares that the child element (i.e., the content element) doesn't have to occur, but can occur many times, within the "message" element (i.e., a zero-to-many constraint).
Element types that don't have to occur, but could occur one time
<!ELEMENT element-name (child-name?)>
example:
<!ELEMENT message (content?)>
The ? sign in the example above declares that the child element (i.e., the content element) can occur zero or one time within the "message" element.
Element types with either this or that content
example: <!ELEMENT message (receiver,sender,subject,(content|announcement))>
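As a sketch, a document that picks the announcement branch of this declaration could look as follows. Declaring announcement as #PCDATA is an assumption made for the illustration. A document with a content element in place of announcement would be equally valid, while one containing both, or neither, would not:

```xml
<?xml version="1.0"?>
<!DOCTYPE message [
<!ELEMENT message (receiver,sender,subject,(content|announcement))>
<!ELEMENT receiver (#PCDATA)>
<!ELEMENT sender (#PCDATA)>
<!ELEMENT subject (#PCDATA)>
<!ELEMENT content (#PCDATA)>
<!-- assumed to be text-only for this sketch -->
<!ELEMENT announcement (#PCDATA)>
]>
<message>
  <receiver>Buck</receiver>
  <sender>Lenny</sender>
  <subject>Welcome</subject>
  <announcement>Welcome Buck!</announcement>
</message>
```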
The example above declares that the "message" element must contain a "receiver" element, a "sender" element, a "subject" element, and either a "content" element or an "announcement" element, in that order.
Element types with mixed content
example: <!ELEMENT message (#PCDATA|receiver|sender|subject|content)*>
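A short sketch of a document that uses this mixed-content declaration. The sentence wrapped around the child elements is made up for the illustration:

```xml
<?xml version="1.0"?>
<!DOCTYPE message [
<!ELEMENT message (#PCDATA|receiver|sender|subject|content)*>
<!ELEMENT receiver (#PCDATA)>
<!ELEMENT sender (#PCDATA)>
<!ELEMENT subject (#PCDATA)>
<!ELEMENT content (#PCDATA)>
]>
<!-- Text and child elements are freely interleaved; sender and content
     are simply left out, which mixed content allows -->
<message>Dear <receiver>Buck</receiver>, this is about
<subject>Welcome</subject>. Welcome aboard!</message>
```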
The example above declares that the "message" element can contain zero or more occurrences of text content (parsed character data), "receiver" elements, "sender" elements, "subject" elements, or "content" elements, in any order.
Attributes
An attribute is used to give extra information about an element.
Attributes are inserted within an element's start tag. An attribute has an attribute name and an attribute value. The img element in HTML and XHTML, for example, uses the src attribute to give extra information:
<img src="hacker.jpg" />
The element name is "img". The attribute name is "src", and the attribute value is "hacker.jpg". The element itself, however, is empty (has empty content). In XML and XHTML, an empty element can be closed by a "/" at the end of its tag.
Entities
Entities are variables for defining shortcuts/macros to text.
They can be declared as:
Internal Entities (shortcuts/macros standing for an arbitrary piece of text)
External Entities (incorporation of content from other files, e.g., XML files)
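A minimal sketch of both kinds of entity declaration and how they are referenced. The entity names team and greeting, the replacement text, and the file greeting.txt are hypothetical. Note that some non-validating parsers do not load external entities:

```xml
<?xml version="1.0"?>
<!DOCTYPE message [
<!ELEMENT message (sender,content)>
<!ELEMENT sender (#PCDATA)>
<!ELEMENT content (#PCDATA)>
<!-- Internal entity: the replacement text is given inline -->
<!ENTITY team "The Welcome Committee">
<!-- External entity: the replacement text is read from another file -->
<!ENTITY greeting SYSTEM "greeting.txt">
]>
<message>
  <!-- &team; is replaced by: The Welcome Committee -->
  <sender>&team;</sender>
  <!-- &greeting; is replaced by the contents of greeting.txt -->
  <content>&greeting;</content>
</message>
```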
You probably know the HTML entity reference "&nbsp;" (non-breaking space), which is used in HTML to insert an extra space in a document. Entity references like "&nbsp;" are expanded when a document is parsed by a parser.
You can define your own entities within the DTD, but some common entities are already defined in XML:
<!ENTITY lt "&#38;#60;">
<!ENTITY gt "&#62;">
<!ENTITY amp "&#38;#38;">
<!ENTITY apos "&#39;">
<!ENTITY quot "&#34;">
PCDATA
PCDATA stands for Parsed Character DATA.
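A short sketch of what "parsed" means for the text of an element: the parser looks inside the character data for markup and entity references, so the predefined entities have to be used for characters such as < and &. The sample text is made up:

```xml
<?xml version="1.0"?>
<!DOCTYPE content [
<!ELEMENT content (#PCDATA)>
]>
<!-- The parser expands the references, so an application reading this
     element receives the text: Fish & chips cost < 5 -->
<content>Fish &amp; chips cost &lt; 5</content>
```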
Character data is the text between the start tag and the end tag of an XML element. This text will be parsed by a parser: markup and entity references inside it are recognized and processed.
CDATA
CDATA is character data (text) that will NOT be parsed by a parser.
Assignment Description
Put constraints on the cv-template.xml file from the XML Basics Assignment by using a DTD. The document should be well-formed (have correct XML syntax) and valid (follow the rules set up in the DTD).
My solution, Assignment Files
cv-template.xml
To validate cv-template.xml, you can use the W3C Markup Validator: W3C Validation Service