XML (eXtensible Markup Language) is a magnet for hype: the successor to HTML for web publishing, electronic data interchange, and e-commerce. In fact, XML is just a notation for trees, little more than a verbose variant of Lisp S-expressions; and a way to define tree grammars, a poor man's BNF. Yet this simple basis has spawned scores of specialized sub-languages: for airlines, banks, and cell phones; for astronomy, biology, and chemistry; for the DOD and the IRS.
This note is a brief guide to web resources that explain XML and its associated core technologies. It also describes some representative applications and lists additional applications and resources.
XML is a descendant of SGML (Standard Generalized Markup Language), and is related to HTML (Hypertext Markup Language), another SGML descendant. XML is more general and uniform than HTML, and simpler than SGML. (Here's an Overview of SGML by the National Library of Canada.)
XML is a simple notation for describing trees. Each internal node in the tree is an element, and leaf nodes are either attributes or text. An XML document that is properly nested (so that it really describes a tree) is called well-formed. In addition, one can give a DTD (Document Type Definition) or Schema, which specifies what nodes might appear in the tree. For each element type, one can list what attributes might appear with that element, and give a regular expression specifying what elements can appear within that element. A document that satisfies the associated DTD or Schema is called valid. Each XML dialect is specified by giving its DTD or Schema. Some XML systems work with any well-formed document, while others also check that the document is valid.
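The difference between well-formed and valid can be illustrated with a minimal Python sketch. It uses the standard library's xml.etree.ElementTree to test well-formedness, and a hypothetical table, CONTENT_MODELS, of regular expressions (one per element type) to stand in for a DTD. The element names (book, title, author, chapter) and the helper functions are assumptions made for illustration. A real validator checks much more, including attributes.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content models, standing in for a DTD. Each regular
# expression is matched against the names of an element's children,
# written as a string in which every name is followed by one space.
CONTENT_MODELS = {
    "book": r"title (author )+(chapter )*",  # one title, 1+ authors, 0+ chapters
    "title": r"",    # no child elements, text only
    "author": r"",
    "chapter": r"",
}


def is_well_formed(text):
    """Return True if the text parses as XML, i.e. is properly nested."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def is_valid(element):
    """Return True if the element and its descendants match CONTENT_MODELS."""
    model = CONTENT_MODELS.get(element.tag)
    if model is None:
        return False  # element type not declared
    children = "".join(child.tag + " " for child in element)
    if re.fullmatch(model, children) is None:
        return False
    return all(is_valid(child) for child in element)


GOOD = "<book><title>XML</title><author>A</author><chapter>One</chapter></book>"
BAD_NESTING = "<book><title>XML</book></title>"  # not a tree
BAD_ORDER = "<book><author>A</author><title>XML</title></book>"  # a tree, wrong shape
```

Here GOOD is both well-formed and valid, BAD_NESTING is not even well-formed, and BAD_ORDER is well-formed but fails the content model for book.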
General introductions to XML include:
Technical introductions to XML include:
There are many XML portals:
XML conferences:
Recommended text:
From NewsScan:
W3C (the World Wide Web Consortium) promulgates both the XML and HTML standards. There is a continuing W3C XML Activity to develop standards related to XML, some aspects of which are described below. The W3C process has a number of stages: typically a proposal goes through several working drafts, then spends a short time as a proposed recommendation before becoming a recommendation.
XML is a notation for defining trees. The relevant W3C document is:
XML Namespaces are a sort of module system; they are supposed to prevent collisions that might occur when different organizations pick the same element names. The relevant W3C document is:
XHTML is the dialect of XML that replaces HTML. It has a more modular structure than HTML, making it easy for browsers or handheld devices to support some features but not others. The relevant W3C document is:
DOM [W3C members] (Document Object Model) specifies how to access and update XML trees from within languages such as JavaScript, Java, and C++. Many (most?) vendors provide some implementation of the DOM. The relevant W3C documents are:
CSS [W3C members] (Cascading Style Sheets) let you control the display of XML or HTML documents, by specifying how to render specific components (e.g., display all headers in red bold Helvetica 18 pt). The relevant W3C documents are:
XSL [W3C members] (Extensible Stylesheet Language) also lets you specify how to display an XML document. It is much more flexible than CSS; for instance, you can specify that the document is to be processed twice, once to generate a table of contents and once to generate the document proper. XSL, unlike CSS, is suitable for generating both web and print versions of a document; but it is more complex than CSS and less widely supported.
XSL has two parts: a language for transforming trees (called XSLT), and a vocabulary of formatting objects (sometimes called FO). The idea is you use XSLT to convert your document to FO, but you might also use it to convert your document to, say, HTML. Right now, the latter is much more common, since there are several implementations of XSLT (from Microsoft and IBM among others) and only one partial implementation of FO. XSLT and XPointer (see below) share a common sublanguage, XPath.
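The idea of transforming one tree into another can be sketched in plain Python. This is not XSLT, and it uses no XSLT processor; it only mimics the kind of document-to-HTML conversion described above, using the standard library's xml.etree.ElementTree. The source vocabulary (article, heading, para) and the TAG_MAP table are hypothetical.

```python
import xml.etree.ElementTree as ET

SOURCE = "<article><heading>XML</heading><para>A notation for trees.</para></article>"

# Hypothetical mapping from source element names to HTML element names.
TAG_MAP = {"article": "body", "heading": "h1", "para": "p"}


def transform(node):
    """Build a new tree with the same shape and text, renaming each element."""
    out = ET.Element(TAG_MAP.get(node.tag, "div"))  # unknown elements become div
    out.text = node.text
    out.tail = node.tail
    for child in node:
        out.append(transform(child))
    return out


html = ET.tostring(transform(ET.fromstring(SOURCE)), encoding="unicode")
# html == "<body><h1>XML</h1><p>A notation for trees.</p></body>"
```

An XSLT stylesheet expresses a similar mapping declaratively, as templates that match nodes in the source tree, rather than as a recursive function.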
The relevant W3C documents are:
Glosses on XSLT:
XLL [W3C members] (Extensible Linking Language) lets you specify links between documents (generalizing the a tag in HTML). It consists of two parts: XLink specifies links between documents, and XPointer specifies how to point to a specific part of a document (e.g., the second paragraph in section three). XPointer and XSLT (see above) share a common sublanguage, XPath.
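As a rough illustration of pointing at "the second paragraph in section three", the sketch below uses the limited XPath subset supported by Python's standard xml.etree.ElementTree. It is not an XPointer implementation, and the document vocabulary (doc, section, para) is assumed for the example.

```python
import xml.etree.ElementTree as ET

DOC = """<doc>
  <section><para>1.1</para></section>
  <section><para>2.1</para><para>2.2</para></section>
  <section><para>3.1</para><para>3.2</para><para>3.3</para></section>
</doc>"""

root = ET.fromstring(DOC)

# Positions in XPath are 1-based: third section, then its second para.
target = root.find("./section[3]/para[2]")

# The same path language can also select many nodes at once.
all_paras = root.findall(".//para")
```

Here target is the para element whose text is "3.2", and all_paras holds the six para elements in document order.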
The relevant W3C documents are:
XML Schema [W3C members] is intended to let you use XML to specify the format of another XML document, generalizing the notion of a DTD, and resembling the schemas used in the database world. The relevant W3C documents are:
XML Query (W3C members) is a query language for XML. The relevant W3C documents are:
SOAP stands for Simple Object Access Protocol. It is a component standard, roughly analogous to COM or CORBA. A program written in one language and running on a client may invoke a method on an object written in a possibly different language and running on a server, so that programs can work across languages and across machines. SOAP includes a standard way to represent, in XML, data structures from languages such as Java and C. SOAP is backed by Microsoft and IBM, among others. IBM's implementation of SOAP for Java is freely available under Apache.
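The method invocation described above travels as an XML message: an Envelope element containing a Body, which in turn contains the call. The following Python sketch builds and reads such a message with the standard library. The envelope namespace is the one defined by SOAP 1.1. The application namespace APP_NS, the method name, and the helper functions are hypothetical, and the HTTP transport is omitted entirely.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "http://example.org/stock"  # hypothetical application namespace


def build_request(method, params):
    """Serialize a method call as a SOAP envelope (a sketch, no headers)."""
    ET.register_namespace("SOAP-ENV", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}{method}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def parse_request(text):
    """Recover the method name and parameters from a SOAP envelope."""
    envelope = ET.fromstring(text)
    body = envelope.find(f"{{{SOAP_ENV}}}Body")
    call = body[0]  # the first child of Body is the call
    method = call.tag.split("}", 1)[1]  # strip the {namespace} part
    return method, {child.tag: child.text for child in call}


message = build_request("GetLastTradePrice", {"symbol": "DIS"})
method, params = parse_request(message)
```

A client would send message as the body of an HTTP POST, and the server would use something like parse_request to decide which method to invoke.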
"SOAP is a lightweight protocol for exchange of information in a decentralized, distributed environment. It is an XML based protocol that consists of three parts: an envelope that defines a framework for describing what is in a message and how to process it, a set of encoding rules for expressing instances of application-defined datatypes, and a convention for representing remote procedure calls and responses. SOAP can potentially be used in combination with a variety of other protocols; however, the only bindings defined in this document describe how to use SOAP in combination with HTTP and HTTP Extension Framework." [source]
WSDL stands for Web Services Description Language. If SOAP is analogous to CORBA or COM components, then WSDL is analogous to IDL, the Interface Definition Language for components.
"WSDL is an XML format for describing network services as a set of endpoints operating on messages containing either document-oriented or procedure-oriented information. The operations and messages are described abstractly, and then bound to a concrete network protocol and message format to define an endpoint. Related concrete endpoints are combined into abstract endpoints (services). WSDL is extensible to allow description of endpoints and their messages regardless of what message formats or network protocols are used to communicate, however, the only bindings described in this document describe how to use WSDL in conjunction with SOAP 1.1, HTTP GET/POST, and MIME." [source]
UDDI stands for Universal Description, Discovery, and Integration. It is analogous to a yellow pages for the web, letting you look up both companies and web services. Web services may be described with WSDL and invoked with SOAP, so it builds upon the above two standards.
"Discover businesses worldwide that offer the exact products and services that you need. Register the products and services of your own business for others to discover. Or both. Technology and business champions are leading the development and deployment of an open, Internet-based Universal Description, Discovery, and Integration (UDDI) specification. UDDI is the building block that will enable businesses to quickly, easily and dynamically find and transact business with one another using their preferred applications." [source]
A namespace mentions a URI; what should that URI point to? The W3C Namespace Recommendation is quite clear that it need not point to anything, but it might be useful for it to point to a readable description of the XML dialect, an XML Schema describing the dialect, a stylesheet, or related executable code.
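That a namespace URI serves only as an identifier can be seen in a short Python sketch using the standard xml.etree.ElementTree. The parser never fetches the URI; it only uses it to tell apart two elements that share the local name title. Both URIs and the element names are hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """<record xmlns:pub="http://example.org/publisher"
        xmlns:job="http://example.org/employment">
  <pub:title>Introduction to XML</pub:title>
  <job:title>Technical Editor</job:title>
</record>"""

root = ET.fromstring(DOC)  # no network access: the URIs are never dereferenced

# Prefixes are local shorthand; lookups are by namespace URI.
NS = {
    "pub": "http://example.org/publisher",
    "job": "http://example.org/employment",
}
book_title = root.find("pub:title", NS).text
job_title = root.find("job:title", NS).text

# ElementTree stores each name as {namespace-URI}local-name.
tags = [child.tag for child in root]
```

The two title elements end up with distinct names, {http://example.org/publisher}title and {http://example.org/employment}title, even though nothing exists at either address.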
RDDL, the Resource Directory Description Language, provides a framework in which a namespace URI can point to any or all of these resources (a description, a schema, a stylesheet, or code). An RDDL page consists of HTML describing the XML dialect, together with rddl:resource elements, which are specified using xlink:role, xlink:arcrole, and xlink:href attributes as defined by XLink. There are standard lists of role natures and arcrole purposes.
ebXML is one of a number of competing initiatives to manage vocabularies for e-business. Others include BizTalk (founded by Microsoft), RosettaNet, and ecoFramework. Currently, ebXML is the leader of the pack, partly because it is supported by UN/CEFACT, the UN body responsible for EDIFACT, the UN standard for EDI (Electronic Data Interchange).
"ebXML is a set of specifications that together enable a modular electronic business framework. The vision of ebXML is to enable a global electronic marketplace where enterprises of any size and in any geographical location can meet and conduct business with each other through the exchange of XML based messages. ebXML is a joint initiative of the United Nations (UN/CEFACT) and OASIS, developed with global participation for global usage." [source]
VoiceXML Forum is an industry organization founded by AT&T, IBM, Lucent and Motorola, and chartered with establishing and promoting the Voice eXtensible Markup Language (VoiceXML), a new standard to make Internet content and information accessible via voice and phone.
Wireless Application Protocol (WAP) Forum is an industry organization including wireless equipment manufacturers, operators, service providers, and content providers. One component of the protocol is Wireless Markup Language (WML), an analogue of HTML aimed at display on devices such as cell phones, pagers, and PDAs.
DSML (Directory Services Markup Language) is a markup language for representing directory services in XML. LDAP (Lightweight Directory Access Protocol) provides a means for accessing directory information; DSML gives XML-based applications a means for accessing the same information. Here is a Data Sheet on DSML. DirXML is a Novell directory product based on XML and DSML.
Universal Plug and Play (UPnP) is an extension of the Plug and Play initiative, which defined standards that allowed PC systems to self-configure when new devices were connected. Universal Plug and Play enables smart appliances to identify each other on a network and work together independently of a server or even a PC. The specifications of Universal Plug and Play are based on industry standards, such as HTTP, XML, DNS and LDAP.
SyncML is an XML-based standard to make it easy for users of handheld computers to exchange calendars, e-mail, to-do lists and other data among different devices and different operating systems. The SyncML alliance was founded by IBM, Nokia, Lotus, Motorola, 3Com, and Starfish Software.
(Thanks to Jerome Simeon. Some entries are adapted from his Lucent internal XML page.)
Many vendors are investing in XML:
Some programming languages provide support for XML:
XML support in functional languages:
Some scripting languages for XML and/or embedded in XML:
There is also some material on languages for XML in my talk The Next 700 Markup Languages.
Documents and Data in XML:
Tools for editing XML:
XML is getting some coverage in the national press: