Identification and description
Full name
Extensible Markup Language (XML)
Description
Extensible Markup Language (XML) is a simple, very flexible text
format derived from SGML (ISO 8879). XML documents fall into
two broad categories: data-centric and document-centric.
Data-centric documents are those where XML is used as a data
transport. Examples include sales orders, patient records,
directory entries, and metadata records. One significant use of
data-centric XML is for manifests (lists) of digital content; another
is for metadata embedded into digital content files.
Document-centric documents are those in which XML is used for its
SGML-like capabilities, reflecting the structure of particular classes of
documents, such as books with chapters, user manuals,
newsfeeds, and articles incorporating explicit metadata in addition
to the text. An XML document's markup structure can be defined
by a schema language and validated against a definition in that
language. The initial, and as of 2008, most widely used schema
languages are the Document Type Definition (DTD) language and
W3C XML Schema. Other schema languages exist, including RDF
and RELAX-NG.
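The distinction between the two categories can be illustrated with a minimal Python sketch using the standard library's xml.etree.ElementTree. The element names (order, item, article, para, emph) and the helper functions read_order and paragraph_text are hypothetical, chosen only for illustration; they do not come from any schema named in this description, and no schema validation is performed.

```python
import xml.etree.ElementTree as ET

# Data-centric: XML used as a transport for regular, record-like data.
ORDER_XML = """<order id="A-100">
  <customer>Example Library</customer>
  <item sku="bk-1" quantity="2"/>
  <item sku="bk-7" quantity="1"/>
</order>"""

# Document-centric: markup reflects the structure of running text,
# including inline elements mixed with character data.
ARTICLE_XML = """<article>
  <title>On Markup</title>
  <para>Structure is <emph>explicit</emph> in the text.</para>
</article>"""


def read_order(xml_text):
    """Map a (hypothetical) order record onto a plain dictionary."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.get("id"),
        "customer": root.findtext("customer"),
        "items": [
            (item.get("sku"), int(item.get("quantity")))
            for item in root.findall("item")
        ],
    }


def paragraph_text(xml_text):
    """Return the text of each paragraph, flattening inline markup."""
    root = ET.fromstring(xml_text)
    return ["".join(para.itertext()) for para in root.iter("para")]


if __name__ == "__main__":
    print(read_order(ORDER_XML))
    print(paragraph_text(ARTICLE_XML))
```

In the data-centric case the element order and whitespace carry little meaning and the content maps cleanly onto fields; in the document-centric case the order of text and inline elements is itself the content.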
Production phase
Can be used as initial, middle, or final-state format.
Relationship to other formats
Has subtype
XML_1_0, XML (Extensible Markup Language) 1.0
Has subtype
XML_1_1, XML (Extensible Markup Language) 1.1
Has subtype
XML_DTD, Document Type Definition
Has subtype
XML_SCHEMA, W3C XML Schema Language
Local use
LC experience or existing holdings
Used by LC to represent metadata records (including
MARC bibliographic and authority records, MODS,
METS) for web-compatible interchange, in particular
using the Open Archives Initiative Protocol for Metadata
Harvesting and SRU (Search/Retrieval via URL).
LC preference
May be a preferred format for textual content, metadata
records, or as a wrapper format for complex digital
objects if conformant to an appropriate standard or
agreed DTD or schema that can be used for technical
validation. LC will express preferences based on specific
DTDs, W3C XML Schema instances, or instance
documents in other schema languages for defining
XML-based formats. LC will prefer XML that represents the
structure of documents rather than layout.
Sustainability factors
Disclosure
Open standard. Developed by W3C (World Wide
Web Consortium). To be useful for interoperability
or long-term content preservation, an XML
document must be associated with a schema
specification for the elements and tags it contains.
Such schema specifications
(see XML_DTD and XML_XSD) must also be
disclosed.
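As a rough illustration of how a document's association with a schema can be discovered, the following Python sketch reads the DOCTYPE declaration and the xsi:schemaLocation attribute using the standard library's xml.dom.minidom. The function name declared_schemas and the identifiers note.dtd, rec.xsd, and http://example.org/rec are hypothetical. The sketch only reports what a document declares; it does not fetch or validate against the DTD or schema.

```python
from xml.dom import minidom

# Namespace used for schema hints in instance documents (W3C XML Schema).
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def declared_schemas(xml_text):
    """Report the DTD and XML Schema references a document declares."""
    doc = minidom.parseString(xml_text)
    doctype = doc.doctype  # None when there is no DOCTYPE declaration
    root = doc.documentElement
    return {
        "dtd_system_id": doctype.systemId if doctype is not None else None,
        "dtd_public_id": doctype.publicId if doctype is not None else None,
        # schemaLocation holds whitespace-separated namespace/location pairs.
        "xsd_locations": root.getAttributeNS(XSI_NS, "schemaLocation").split(),
    }


if __name__ == "__main__":
    # Hypothetical documents, for illustration only.
    with_dtd = '<!DOCTYPE note SYSTEM "note.dtd"><note/>'
    with_xsd = (
        '<rec xmlns="http://example.org/rec" '
        'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
        'xsi:schemaLocation="http://example.org/rec rec.xsd"/>'
    )
    print(declared_schemas(with_dtd))
    print(declared_schemas(with_xsd))
```

A document that declares neither is still well-formed XML, but nothing in the file itself then points to the specification needed to interpret its elements.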
Documentation
Maintained by W3C [http://www.w3.org/XML/].
Specifications for the two versions as of 2008 are
at Extensible Markup Language (XML)
1.0 and Extensible Markup Language (XML) 1.1.
Adoption
Very widely adopted as the basis for interchange
of documents and data over the Web. Many
generic tools exist, including free and open source
software. Major software vendors have all
incorporated support for XML in some form.
Licensing and patents
None
Transparency
XML is human-readable and designed for
straightforward automatic parsing. For the contents
to be understood, a well-documented DTD, XML
Schema, or other specification is needed.
Human-comprehensible element tags are advantageous
for transparency.
Self-documentation
XML is widely used as a syntax for metadata, and
metadata for all purposes can be embedded in
XML documents with appropriate schema
specifications.
External dependencies
None
Technical protection considerations
None
Quality and functionality factors
Text
Normal rendering
XML can represent all Unicode characters, with
UTF-8 being the default character encoding. XML
tagging offers potential for explicitly representing
logical structure of text, such as paragraphs and
headings, and character emphasis (bold, italics,
etc.). Effective support for normal rendering is
dependent on an appropriate DTD or schema
specification.
Integrity of document structure
XML is ideal for representing document structure.
Integrity of layout and display
For textual content, best practice is to have the
XML represent the logical document structure
and use stylesheets to render the text in a form
appropriate for the end user.
Support for mathematics, formulae, etc.
Requires specialized markup (e.g., MathML) and a
corresponding rendering engine. Scholars in
many scientific disciplines are not satisfied with
the performance of such rendering engines.
Functionality beyond normal rendering
Depends on particular DTD or schema
specification.
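The note on normal rendering above states that XML can carry any Unicode character and that UTF-8 is the default encoding. A minimal Python sketch of that behaviour, using the standard library's xml.etree.ElementTree, follows. The title element, the sample string, and the helper read_title are hypothetical and serve only as an illustration.

```python
import xml.etree.ElementTree as ET

# Sample text mixing Greek, Japanese, and accented Latin characters.
TITLE = "Ελληνικά, 日本語, français"

# No XML declaration and no byte order mark: the parser falls back
# to the default encoding, UTF-8.
utf8_doc = f"<title>{TITLE}</title>".encode("utf-8")

# UTF-16: Python's "utf-16" codec prepends a byte order mark, and the
# declaration names the encoding explicitly.
utf16_doc = (
    f'<?xml version="1.0" encoding="UTF-16"?><title>{TITLE}</title>'
).encode("utf-16")


def read_title(raw: bytes) -> str:
    """Parse the raw bytes and return the text of the root element."""
    return ET.fromstring(raw).text


if __name__ == "__main__":
    # Both byte sequences decode to the same characters.
    print(read_title(utf8_doc) == read_title(utf16_doc) == TITLE)
```

The two serializations differ byte for byte but carry the same character content, which is why the encoding must be either declared or detectable from the first bytes of the file.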
File type signifiers
Tag: Filename extension
Value: xml
Note: Common practice for XML document instances is to use the
.xml extension. The particular schema or DTD should be declared
within the document. Some schemas specify the use of different
file extensions.
Tag: Internet Media Type
Value: text/xml, application/xml
Note: If an XML document is readable by casual users, text/xml is
preferred. See RFC 3023 for further details.
Tag: Magic numbers
Value: See note.
Note: Although no byte sequences can be counted on to always be
present, XML MIME entities in ASCII-compatible charsets (including
UTF-8) often begin with hexadecimal 3C 3F 78 6D 6C ("<?xml"), and
those in UTF-16 often begin with hexadecimal
FE FF 00 3C 00 3F 00 78 00 6D 00 6C or
FF FE 3C 00 3F 00 78 00 6D 00 6C 00 (the Byte Order Mark (BOM)
followed by "<?xml"). See RFC 3023 for further details.
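The byte sequences in the magic numbers note can be checked with a short Python sketch such as the one below. The names XML_SIGNATURES and sniff_xml are hypothetical. Because none of these sequences is guaranteed to be present, a result of None means only that the file does not start with an XML declaration in one of these encodings, not that it is not XML.

```python
# Leading byte sequences taken from the note above, keyed to a label.
XML_SIGNATURES = {
    bytes.fromhex("3C 3F 78 6D 6C"): "ASCII-compatible charset (e.g. UTF-8)",
    bytes.fromhex("FE FF 00 3C 00 3F 00 78 00 6D 00 6C"): "UTF-16, big-endian BOM",
    bytes.fromhex("FF FE 3C 00 3F 00 78 00 6D 00 6C 00"): "UTF-16, little-endian BOM",
}


def sniff_xml(data: bytes):
    """Return a label if data starts with a known '<?xml' signature, else None."""
    for signature, label in XML_SIGNATURES.items():
        if data.startswith(signature):
            return label
    return None


if __name__ == "__main__":
    declaration = '<?xml version="1.0"?><a/>'
    print(sniff_xml(declaration.encode("utf-8")))
    print(sniff_xml(b"\xfe\xff" + declaration.encode("utf-16-be")))
    print(sniff_xml(b"\xff\xfe" + declaration.encode("utf-16-le")))
    print(sniff_xml(b"<a/>"))  # well-formed XML without a declaration
```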
Notes
General
The original design goals for XML were:
1. XML shall be straightforwardly usable over the
Internet.
2. XML shall support a wide variety of applications.
3. XML shall be compatible with SGML.
4. It shall be easy to write programs which process
XML documents.
5. The number of optional features in XML is to be kept
to the absolute minimum, ideally zero.
6. XML documents should be human-legible and
reasonably clear.
7. The XML design should be prepared quickly.
8. The design of XML shall be formal and concise.
9. XML documents shall be easy to create.
10. Terseness in XML markup is of minimal
importance.
History
"XML is primarily intended to meet the requirements of
large-scale Web content providers for industry-specific
markup, vendor-neutral data exchange, media-independent
publishing, one-on-one marketing, workflow management in
collaborative authoring environments, and the processing of
Web documents by intelligent clients. It is also expected to
find use in certain metadata applications. XML is fully
internationalized for both European and Asian languages,
with all conforming processors required to support the
Unicode character set in both its UTF-8 and UTF-16
encodings. The language is designed for the quickest
possible client-side processing consistent with its primary
purpose as an electronic publishing and data interchange
format." [from 1997-12-08 W3C press release]
See http://www.w3.org/XML/hist2002.