XML Summary
reference: XML in a Nutshell by Elliotte Rusty Harold & W. Scott Means, published by O'Reilly, January 2001
XML is a meta-markup language for text documents. Data takes the form of text strings, and is described via text markup tags. XML allows tags to be defined as they are needed. The advantage of XML: portability.

History: XML is derived from SGML, which originated at IBM in the 1970s. XML is "SGML-lite". Development started in 1996, and XML 1.0 resulted in Feb 1998. Then came XML Namespaces, which allowed mixing XML components to form a larger document without name conflicts. Then came XSL, which allowed formatting of other XML documents. This split into XSLT, used for transforming one XML document into another, and XSL-FO, used for describing the layout of XML documents on a printer or in a web browser. Alternatively, CSS and DSSSL, which existed prior to XML, are used for formatting XML documents for print/web. XLL was introduced to connect documents in a hypertext network.

XLL then split into XLink, used to describe connections between documents, and XPointer, used to link components of an XML document. The addressing portion of XPointer and XSLT were consolidated into XPath. XML 1.0, XSLT, XML Schemas, and DOM are migrating to a common XML Information Set. A consistent API for XML was then introduced, the Simple API for XML, or SAX. SAX2 was released in 2000.

XML Syntax
Tag Syntax: empty tags begin with < and end with />. Tags with content must be closed, and cannot be closed outside their parent tag. Attribute values must be enclosed in quotes. All & and < characters in element content and attribute values must be written as &amp; and &lt;, but this is not necessary inside <![CDATA[...]]> sections. Tag names and attribute names are case-sensitive. Comments are of the form <!-- ... -->. The last character in a comment cannot be a -. Processing instructions are of the form <?processing-instruction ... ?>.
XML documents must be well formed. That is, they must conform to:
  • End tags must follow start tags
  • Elements cannot overlap
  • Only one root element
  • Attributes must be quoted
  • Cannot have 2 attributes with same name
  • No comments or processing instructions inside tags
  • No unescaped < or & signs inside element or attribute strings
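
The rules above can be tried out with any XML parser. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample strings are made up for illustration. A non-validating parser such as this one rejects documents that break the well-formedness rules, but it does not check validity against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical samples, one per rule from the list above.
samples = {
    "ok": "<a><b/></a>",
    "overlapping elements": "<a><b></a></b>",
    "two root elements": "<a/><b/>",
    "unquoted attribute": "<a x=1/>",
    "duplicate attribute": '<a x="1" x="2"/>',
    "unescaped ampersand": "<a>fish & chips</a>",
}

for label, text in samples.items():
    print(label, is_well_formed(text))
```

Only the first sample is accepted; each of the others violates one rule and makes the parser raise ParseError.
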

Simple Example:
<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<root-tag-name>content</root-tag-name>

The XML declaration is recommended and, if present, must be the first thing in the document. The version is likely to be 1.0 for quite a while. The default encoding is UTF-8. When standalone is "no", the document may depend on an external DTD. DTDs provide default values for omitted attributes. Valid documents must include a DTD.
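
To see why the encoding in the declaration matters, here is a small sketch, again assuming Python's xml.etree.ElementTree. The document content ("café") is made up; the byte 0xE9 is "é" in ISO-8859-1 but is not valid on its own in UTF-8, so the parser has to rely on the declared encoding.

```python
import xml.etree.ElementTree as ET

# Bytes encoded as ISO-8859-1, with a declaration that says so.
doc = (
    b'<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>\n'
    b"<root-tag-name>caf\xe9</root-tag-name>"
)

root = ET.fromstring(doc)
print(root.tag, root.text)
```

If the declaration is dropped, the same bytes are read as UTF-8 (the default) and the parser should reject them.
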

Document Type Definitions, or DTDs
DTDs are referenced in documents via <!DOCTYPE root-element SYSTEM "dtd-url-specification">. The DOCTYPE declaration must precede the root element and follow the XML declaration. PUBLIC "id-name" can be used instead of SYSTEM in rare cases. No network connection is necessary for known public IDs.

Usually DTDs are embedded in the document until debugged, and then moved to a separate file. To embed a DTD, use <!DOCTYPE root-name [ ... ]>

ELEMENT Declarations: <!ELEMENT element-name (content-model)>
The content model specifies what children the element may or must have, and in what order. There are 2 simple content models: #PCDATA, which is parsed character data, and child elements. These can be combined in comma-separated lists and logical expressions. Logic syntax includes parentheses and the following operators:

Operator   Type     Description
|          infix    or
+          suffix   1 or more
*          suffix   0 or more
?          suffix   0 or 1

Mixed content of elements and text is expressed as (#PCDATA | element-1 | element-2 | element-3 ...)*. Empty elements are expressed as <!ELEMENT element-name EMPTY>. Unrestricted content is defined via <!ELEMENT element-name ANY>.
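
The content-model operators behave much like regular-expression operators, which suggests a quick way to experiment with them. The sketch below is an illustration only, not how a validating parser is implemented: it translates an element-only content model into a Python regular expression and matches it against a list of child names. The functions model_to_regex and children_match and the sample model are hypothetical, and #PCDATA, EMPTY, and ANY are not handled.

```python
import re

# A token is either an element name or a run of commas/whitespace.
TOKEN = re.compile(r"(?P<name>[A-Za-z_][\w.-]*)|(?P<skip>[\s,]+)")


def model_to_regex(model):
    """Translate an element-only content model into a compiled regex.

    The operators | + * ? and parentheses carry over unchanged;
    the sequence comma is dropped because sequence is implicit in a regex.
    """
    def convert(match):
        if match.group("name"):
            # Each name matches itself followed by one separating space.
            return "(?:" + re.escape(match.group("name")) + " )"
        return ""

    return re.compile(TOKEN.sub(convert, model))


def children_match(model, child_names):
    """Check a list of child element names against a content model."""
    sequence = "".join(name + " " for name in child_names)
    return model_to_regex(model).fullmatch(sequence) is not None


model = "(title, (author | editor)+, chapter*, index?)"
print(children_match(model, ["title", "author", "chapter", "chapter"]))
print(children_match(model, ["author", "title"]))
```

The first call prints True because the children follow the declared order; the second prints False because title must come first.
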

ATTRIBUTE Declarations: <!ATTLIST element-name attribute-list>
Attributes are defined by the name, the type, and the default, each of these separated by whitespace in the ATTLIST declaration. This list is also separated by whitespace, and so there will be 3n components in a list of n attributes.

Possible defaults:

#IMPLIED: attribute is optional, no default value
#REQUIRED: attribute is required
#FIXED "value": attribute value is constant. The attribute is optional, but if present it must match the fixed value.
Literal: the actual default value as a quoted string
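
The effect of these defaults can be observed with a parser that reads the embedded DTD. This is a minimal sketch assuming Python's xml.etree.ElementTree, whose underlying parser applies attribute defaults declared in the internal DTD subset; the memo element and its attributes are made up for illustration. Because this parser does not validate, a missing #REQUIRED attribute would not be reported.

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0"?>
<!DOCTYPE memo [
  <!ELEMENT memo (#PCDATA)>
  <!ATTLIST memo
    author   CDATA #IMPLIED
    priority CDATA "normal"
    version  CDATA #FIXED "1.0">
]>
<memo>Lunch at noon</memo>"""

root = ET.fromstring(doc)

# priority and version are filled in from the DTD; author has no default.
print(root.get("priority"), root.get("version"), root.get("author"))
```
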

Predefined ENTITYs are: &lt; = <, &amp; = &, &gt; = >, &quot; = ", &apos; = '

A DTD-defined ENTITY, glarf, can be used as &glarf;, much like a macro. In fact, ENTITYs can contain markup, so long as it is well-formed. They cannot be used to build declarations inside the DTD; only in the XML document to which the DTD is applied. (Within a DTD, parameter entities written as %name; play that role.)
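
Entity expansion can be observed directly. The following sketch assumes Python's xml.etree.ElementTree, which expands entities declared in the internal DTD subset; the replacement texts and the sig element are invented for the example.

```python
import xml.etree.ElementTree as ET

doc = """<!DOCTYPE note [
  <!ENTITY glarf "an example replacement text">
  <!ENTITY sig "<sig>JS</sig>">
]>
<note>&glarf; with &lt;tags&gt; &amp; more&sig;</note>"""

root = ET.fromstring(doc)

# Text before the first child: &glarf; and the predefined entities expanded.
print(root.text)

# &sig; contained markup, so it shows up as a real child element.
print(root.find("sig").text)
```
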

DTD Strategies

Data-Oriented DTDs: more lists, less mixed content.
Narrative-Oriented DTDs: more mixed content.

Complete list of attribute types

Type          Description
CDATA         character data
NMTOKEN       same as an XML name, except digits and legal punctuation characters can occur at start
NMTOKENS      NMTOKEN list, separated by whitespace
ENUMERATION   list of possible values: ( att-val1 | att-val2 | att-val3 ... )
ENTITY        name of unparsed entity declared elsewhere in the document
ENTITIES      list of ENTITYs separated by whitespace
ID            a unique XML name. Example: in XHTML, the name value in <a name="value"/>
IDREF         an XML name corresponding to an ID
IDREFS        list of IDREFs separated by whitespace
NOTATION      rarely used, but can associate types with elements

XLinks
XLinks are an attribute-based syntax for attaching links to XML documents. XLinks take the form:

<xml-tag xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="simple" xlink:href="..."> ... other tags ... </xml-tag>

Simple XLinks

  • Starting resource is always an XML element (tag)
  • Ending resource can be an XML document, an element in an XML document, a group of elements, or a non-XML document.
  • Note ending resource URI need not be a URL; could just be a reference id.
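
Since XLinks are ordinary namespaced attributes, an application can find them with any namespace-aware parser. The sketch below assumes Python's xml.etree.ElementTree; the novel element, the example.com address, and the helper simple_links are hypothetical.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"

doc = """<novel xmlns:xlink="http://www.w3.org/1999/xlink"
       xlink:type="simple"
       xlink:href="http://www.example.com/oz.xml">
  <title>The Wonderful Wizard of Oz</title>
</novel>"""


def simple_links(root):
    """Yield (element tag, href) for every simple XLink at or below root."""
    for elem in root.iter():
        # ElementTree spells namespaced attribute names as {namespace-uri}local.
        if elem.get(f"{{{XLINK_NS}}}type") == "simple":
            yield elem.tag, elem.get(f"{{{XLINK_NS}}}href")


print(list(simple_links(ET.fromstring(doc))))
```

The starting resource is the novel element itself; the ending resource is whatever the href identifies.
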

Possible xlink:type values and their parameters:

locator
    xlink:href="uri"
    xlink:label="xml-name-sans-colon"
    xlink:title="glarf"
    xlink:role="foo"
arc
    xlink:from="link-source-label"
    xlink:to="link-target-label"
    xlink:title="glarf"
    xlink:arcrole="uri-pts-to-arc-descr"
title
    content of element will have well-formed markup that provides a complete description of the element
resource
    xlink:label="id"
    xlink:title="glarf"
    xlink:role="foo"
simple
    xlink:href="uri"
    xlink:show="show-type", where show-type can be one of:
      • new: open new window, show contents at the URI
      • replace: show ending resource by replacing current window document
      • embed: embed ending resource in the current document at this link location
      • other: none of the above; application dependent
      • none: no behavior
    xlink:actuate="actuate-value", where actuate-value can be one of:
      • onLoad: follow link as soon as application sees it
      • onRequest: follow link when user requests (clicks) it
      • other: unspecified by XLink
      • none: do nothing
extended
    describes a collection of resources and paths between resources: a directed, labeled graph where paths are arcs, vertices are documents, and labels are URIs.
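
The graph view of an extended link can be made concrete with a short sketch, assuming Python's xml.etree.ElementTree. The series, book, and next element names, the labels, and the example.com addresses are all invented; the helper link_graph only handles locator and arc children and ignores resource elements and arcs that omit from or to.

```python
import xml.etree.ElementTree as ET

X = "{http://www.w3.org/1999/xlink}"

doc = """<series xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="extended">
  <book xlink:type="locator" xlink:label="oz1" xlink:href="http://www.example.com/oz1.xml"/>
  <book xlink:type="locator" xlink:label="oz2" xlink:href="http://www.example.com/oz2.xml"/>
  <book xlink:type="locator" xlink:label="oz3" xlink:href="http://www.example.com/oz3.xml"/>
  <next xlink:type="arc" xlink:from="oz1" xlink:to="oz2"/>
  <next xlink:type="arc" xlink:from="oz2" xlink:to="oz3"/>
</series>"""


def link_graph(extended):
    """Return directed edges (from_uri, to_uri) described by an extended link."""
    # Vertices: map each label to the URIs of the locators carrying it.
    by_label = {}
    for child in extended:
        if child.get(X + "type") == "locator":
            by_label.setdefault(child.get(X + "label"), []).append(child.get(X + "href"))
    # Arcs: connect every source URI to every target URI for the named labels.
    edges = []
    for child in extended:
        if child.get(X + "type") == "arc":
            for src in by_label.get(child.get(X + "from"), []):
                for dst in by_label.get(child.get(X + "to"), []):
                    edges.append((src, dst))
    return edges


for edge in link_graph(ET.fromstring(doc)):
    print(edge)
```
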

XLink DTDs

For each tag (glarf, in this example) that uses XLinks, declare each XLink attribute:

<!ELEMENT glarf (foo1, ...)>
<!ATTLIST glarf
xmlns:xlink CDATA #FIXED "http://www.w3.org/1999/xlink"
xlink:type (simple) #FIXED "simple"
xlink:href CDATA #REQUIRED>
<!ELEMENT foo1 (#PCDATA)> ...

...or to simplify the DTD, create this entity:

<!ENTITY % simplelink
"xlink:type (simple) #FIXED 'simple'
xlink:href CDATA #REQUIRED
xmlns:xlink CDATA #FIXED 'http://www.w3.org/1999/xlink'
xlink:role NMTOKEN #IMPLIED
xlink:title CDATA #IMPLIED
xlink:actuate (onRequest | onLoad | other | none) 'onRequest'
xlink:show (new | replace | embed | other | none) 'new'">

...and in the DTD, use it like this:

<!ATTLIST glarf %simplelink;>

XLink Terminology

3rd party links: Links between purely remote resources.
Inbound links: Links from a remote resource (locator label) to a local resource (resource label).
Outbound links: Links from a local resource (resource label) to a remote resource (locator label).
NOTE: simple links are outbound.
Linkbase: an XML document containing an inbound or 3rd party link.

XPath
XPath is a non-XML syntax used to identify particular parts of a document, indicating nodes by position, relative position, type, content, and several other criteria. XSLT uses XPath expressions to match and select particular elements in the input document for copying into the output document. XPath expressions can also evaluate to strings, booleans, or numbers. Location Path: ....
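
A few of these selection criteria can be tried out with Python's xml.etree.ElementTree, which supports only a limited subset of XPath (enough for paths, positions, and attribute tests, but not the full expression language). The book document below is made up for the example.

```python
import xml.etree.ElementTree as ET

doc = """<book>
  <chapter id="c1"><title>Introduction</title></chapter>
  <chapter id="c2"><title>Syntax</title></chapter>
  <appendix><title>Glossary</title></appendix>
</book>"""

root = ET.fromstring(doc)

# By type: every title element anywhere below the root.
all_titles = [t.text for t in root.findall(".//title")]

# By position: the title of the second chapter child.
second = root.find("chapter[2]/title").text

# By attribute content: the chapter whose id attribute is "c2".
by_id = root.find("chapter[@id='c2']")

print(all_titles, second, by_id.get("id"))
```
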
XPointers
XPointers are a non-XML syntax used to locate points in (or ranges across) XML documents. An XPointer is attached to the end of a URI to indicate a particular part of an XML document. It builds on top of the XPath syntax.
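
As a rough illustration of how an XPointer rides on the end of a URI, the sketch below splits a URI into the document address and the pointer using Python's urllib.parse. The example.com URI and the simple string handling of the xpointer(...) wrapper are assumptions for the example; a real XPointer processor does considerably more.

```python
from urllib.parse import urldefrag

uri = "http://www.example.com/book.xml#xpointer(//chapter[2])"

# Everything after the # is the fragment, which carries the XPointer.
document_uri, pointer = urldefrag(uri)

# Strip the xpointer( ... ) wrapper to get the XPath-based expression inside.
expression = None
if pointer.startswith("xpointer(") and pointer.endswith(")"):
    expression = pointer[len("xpointer("):-1]

print(document_uri)
print(pointer)
print(expression)
```
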