
XML - A Decade Old and Still Going Strong

Ten Years of Proliferation and Infiltration to the Heart of Information Technology

White Paper by Binary Research International Inc.

Executive Summary

Happy Birthday!  XML - the eXtensible Markup Language - has just turned ten. [1]   Amid a technology landscape strewn with the acronyms of standards past, abandoned and displaced in their infancy, XML has crossed a critical threshold into adolescence.

Its parents - or more prosaically, its guardians, the standards overlords of the World Wide Web Consortium (W3C) - have reason to be proud.  In ten short years XML has established its claim as the archetype for exchange of data across diverse systems, platforms, and applications.

But the importance of XML is not merely the supremacy of its claim among standards - an accomplishment that would hold dubious merit in the absence of industry adoption.  Rather, its importance is its insidious pervasiveness and its infiltration to the core of today's information technology platforms.  Indeed, we have reached a tipping point.  XML stands ready to first enable and then unleash an unprecedented level of information technology integration.  If you haven't yet made its acquaintance, the time is now.

After a backwards glance at XML's conceptual vault beyond the limitations of the Hypertext Markup Language (HTML), this white paper illustrates the application of XML as well as its integration into the Windows Vista operating system, the Microsoft Office productivity suite, and the development tools with which tomorrow's XML-powered applications will be crafted.

XML: Concept and Evolution

Markup

Any third-grader knows markup.  It's those comments, in the margin and elsewhere on the page, by a teacher who reviews a student's homework.  Those comments - maybe words of encouragement here and a spelling or grammar correction there - are, of course, for the child's benefit.

Software applications also need markup if they are to interpret and display (i.e. render) electronic information correctly.  For example, a Web browser renders documents (Web pages) in the markup language of HTML.  HTML documents combine not only content (say, a description of a company's products and services) but also instructions (markup) about how that content should be displayed.  The instructions are in the form of so-called tags - embedded notes or annotations such as paragraph tags (<p>, </p>) to specify the start and end of paragraphs, table tags (<table>, <tr>, and <td>) to define the row and column structure of tables, header tags (<h1>, <h2>, <h3>) to denote levels of header and subhead, and image tags (<img>) to indicate placement of images.[2]

Segment of HTML Code

<h1>Annie's Pie Shop</h1>
<p>The health benefits of fruit pies are largely unrecognized.  Try these best-selling favorites!</p>
<table>
<tr><td>Flavor</td><td>Price</td></tr>
<tr><td>Banana crème</td><td>$7.95</td></tr>
<tr><td>Peach</td><td>$7.50</td></tr>
<tr><td>Dutch crumb apple</td><td>$8.25</td></tr>
</table>
<img src="freshly_baked_pie.jpg">

Most Web developers would agree that Tim Berners-Lee, who proposed the World Wide Web in 1989 and defined the original HTML tags soon afterward, did a pretty good job defining the set of tags to describe a document's structure.  Yet HTML has two major limitations.  It uses a fixed, predefined set of tags, and it provides no way to describe the meaning (distinct from the structure and format) of a document's content.

The fixed, predefined nature of its tag set simplifies HTML while limiting its scope of use.  Designed to render conventional documents, the tag set is ill-suited to many other types of content.  Scientists and engineers, for example, are frustrated by the difficulty of representing mathematical equations in HTML.  Many other constituencies, meanwhile - architects, musicians, choreographers, and more - are unable to adequately describe the symbologies of their professions.  The need is for a markup language that is both more flexible than HTML and more specialized to the user's context.

HTML's second limitation - its focus on structure rather than meaning - is no less significant.  HTML markup says nothing about what the content means.  Whether the HTML document contains a candidate's résumé, a list of suppliers, or a product data sheet, it's all just a jumble of headers, paragraphs, tables, and images to an application that consumes HTML content.  Even a simple task such as sorting HTML-coded résumé documents into alphabetical order by candidate name would be virtually impossible.  Again, the need is for a markup language that is both more flexible than HTML and more specialized to the user's particular context.  The need is for an extensible markup language.

Extensible Markup

Devising a language that is both more flexible and more specialized to the user's context poses a dilemma, for the goals of generality and particularity are mutually exclusive.  The solution, the Extensible Markup Language (XML), is more accurately not a language but rather a metalanguage - a general specification for creating particular markup languages.  Hundreds of such XML-based languages have already been created, each developed for the needs of its particular user constituency.  Examples include:

◊    XBRL, the eXtensible Business Reporting Language, which businesses can use to share financial reporting information with regulatory agencies.

◊    Health Level Seven (HL7) for exchange and retrieval of electronic health records.

◊    MathML, whose tags describe both the visual presentation of a mathematical expression and the expression's meaning.

◊    JSML (Java Speech API Markup Language), a language for annotating text input to speech synthesizers.

The details of each XML-based language - that is, the names and attributes of elements, the element hierarchy, and any constraints on content within each element - are specified in its schema.  Schemas are commonly expressed in an XML Schema Definition (XSD) file, which conforms to the standard XML Schema Definition Language.[3]   For example, the JSML language specifies its particular elements and their attributes in the jsml.xsd file.  Documents in every XML-based language must themselves conform to the core syntax of the XML specification.  In this way, an XML parser will always be able to interpret any well-formed XML document.  For example, a text-to-speech synthesizer can process a parsed JSML document and output speech.
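As a minimal sketch of what well-formedness means in practice, the Python below uses only the standard-library xml.etree.ElementTree parser to accept one fragment and reject another that leaves an element unclosed.  The <order> fragments and the helper name is_well_formed are illustrative assumptions.  The check covers core XML syntax only; it does not validate a document against an XSD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as XML, False otherwise (hypothetical helper)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Every start tag has a matching end tag.
good = "<order><item>Peach pie</item></order>"
# The <item> element is never closed, so the parser must reject it.
bad = "<order><item>Peach pie</order>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

Because the syntax rules are the same for every XML-based language, the same check applies unchanged to a document in any of them.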

The JSML code segment below illustrates the JSML language's utility to a synthesizer that converts JSML-encoded text into speech.  The type="sentence" attribute of the <div> element conveys the contextual meaning of the text that it encapsulates.  The ability to tag text in this way enables developers of speech synthesizer applications to enhance a user's experience by adding, say, a replay-previous-sentence button.  This functionality will then work on any document that conforms to the JSML schema.  Another element in the code segment is <literal>.  When the synthesizer encounters content between the <literal> tags, it can enunciate this content letter-by-letter (jay, es, em, el) rather than interpret it as a word that would be (incorrectly) pronounced as "jesmul" or the like.

Segment of JSML Code

<jsml>
<div type="sentence">This block about <literal>JSML</literal> is constructed as an example.</div>
<break time="1s"/>
<voice gender="female">Did <emphasis>you</emphasis> say you had broken up with your boyfriend?</voice>
</jsml>
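To suggest how an application might consume such markup, here is a hedged Python sketch using the standard-library ElementTree parser.  It is not a speech synthesizer.  It only extracts the pieces a synthesizer would act on, and it assumes the <break> element is written as an empty element so that the fragment is well-formed.

```python
import xml.etree.ElementTree as ET

JSML = """<jsml>
<div type="sentence">This block about <literal>JSML</literal> is constructed as an example.</div>
<break time="1s"/>
<voice gender="female">Did <emphasis>you</emphasis> say you had broken up with your boyfriend?</voice>
</jsml>"""

root = ET.fromstring(JSML)

# Full text of every <div type="sentence">: the unit that a hypothetical
# replay-previous-sentence button would step through.
sentences = ["".join(div.itertext())
             for div in root.findall("div[@type='sentence']")]

# Content of each <literal>, split into characters for letter-by-letter speech.
spelled = [list(lit.text) for lit in root.iter("literal")]

print(sentences)
print(spelled)
```

The same few lines would work on any other document that uses these elements, which is the practical benefit of tagging by meaning.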
 

Application of XML

XML languages describe content (or data) in a standardized manner appropriate to its use.  The foregoing example of a JSML-enabled speech synthesizer illustrates the power of describing data in this way.  But the application of XML is far broader than any one example might suggest, extending into systems integration, data storage and retrieval, and content distribution.

XML in Content Distribution

During the Middle Ages, a predominantly illiterate population of villagers and townsfolk sought information from one source: the town crier.  Half a millennium later, numerous alternative channels now exist for disseminating information - newspapers, radio, Web browsers, handheld wireless devices, MP3 podcasts, and others.

The availability of disparate publishing formats provides a multitude of ways to deliver content.  However, this proliferation has, until recently, come at the price of complexity.  Consider three channels - a conventional PC screen, a BlackBerry hand-held wireless device, and hard copy.  Whereas content for the PC screen might be encoded as XHTML, [4] complete with images, large fonts, and wide scrollable pages, the small form factor and potentially narrower bandwidth of a hand-held device demand a very different presentation of content.  Output to paper, meanwhile, might exploit that medium's higher resolution, allowing, say, footnotes to print in a smaller font than would be necessary on a screen.  Use of color and ink-intensive graphics could be reduced to a minimum to conserve printer supplies.

Extensible Stylesheet Language Transformations (XSLT) is an XML technology that spares publishers the pain of formulating content in multiple languages for multiple distribution channels.  It allows them to prepare the content just once and then easily publish it in multiple formats.  The technology works by presenting an XSLT processor with two input documents - an XML source document and an XSLT stylesheet - from which the processor derives a suitably formatted output document.  Each XSLT stylesheet is a kind of template for the output file.  In this way, one XSLT stylesheet may transform content into XHTML, while a second transforms it into the Wireless Markup Language (WML), and a third into PDF.
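The single-source principle can be sketched in a few lines of Python.  Note that this is not XSLT: Python's standard library has no XSLT processor, so two hand-written functions stand in for two stylesheets.  The <menu> vocabulary and the function names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# One XML source document (hypothetical vocabulary).
SOURCE = """<menu shop="Annie's Pie Shop">
  <pie><flavor>Peach</flavor><price>7.50</price></pie>
  <pie><flavor>Dutch crumb apple</flavor><price>8.25</price></pie>
</menu>"""

def to_xhtml(menu):
    """Render the menu as a table for a desktop browser."""
    table = ET.Element("table")
    for pie in menu.findall("pie"):
        row = ET.SubElement(table, "tr")
        ET.SubElement(row, "td").text = pie.findtext("flavor")
        ET.SubElement(row, "td").text = "$" + pie.findtext("price")
    return ET.tostring(table, encoding="unicode")

def to_plain_text(menu):
    """Render the same menu as compact text for a small screen."""
    lines = [menu.get("shop")]
    for pie in menu.findall("pie"):
        lines.append(f"{pie.findtext('flavor')}: ${pie.findtext('price')}")
    return "\n".join(lines)

menu = ET.fromstring(SOURCE)
print(to_xhtml(menu))
print(to_plain_text(menu))
```

With a real XSLT processor, each function would instead be a stylesheet, and adding a new channel would mean adding a stylesheet rather than touching the content.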

Web syndication is another XML-powered realm of publishing.  It's the process by which a so-called Web feed - some portion of a Web site - is available for distribution to other sites or to individual subscribers.  For example, a high school Web site might syndicate a feed of announcements from the local school board, updates to the school calendar, and news of the football team's most recent travails.  Parents, students, and teachers may choose to subscribe to the feed, receiving the content via a browser-based or desktop-based feed reader.  The two major Web feed standards are the Atom format and the older RSS (Really Simple Syndication) format family.

Example of a Feed in the Atom Syndication Format

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
 
<title>Hello Again from Springfield High School</title>
<link href="http://springfieldhigh.org/feed/" rel="self"/>
<updated>2008-03-31T15:30:02Z</updated>
<author>
   <name>Principal W. Seymour Skinner</name>
</author>
 
<entry>
<title>Cougars March Towards Championship!</title>
<link href="http://springfieldhigh.org/2008/03/31/atom03"/>
<updated>2008-03-31T15:30:02Z</updated>
<summary>A dramatic last-minute touchdown helped the Cougars to a stunning victory over the Vikings and now places us just one win away from a trip to the national championship game in Springfield.</summary>
</entry>
 
</feed>
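A feed reader's first job is to pull titles, links, and dates out of such a document.  The Python below is a minimal sketch of that step using the standard-library ElementTree parser on an abbreviated copy of the feed.  The variable names are illustrative, and a production reader would handle many more elements and error cases.

```python
import xml.etree.ElementTree as ET

# ElementTree addresses namespaced elements as "{namespace-uri}tag".
ATOM = "{http://www.w3.org/2005/Atom}"

FEED = """<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title>Hello Again from Springfield High School</title>
<entry>
<title>Cougars March Towards Championship!</title>
<link href="http://springfieldhigh.org/2008/03/31/atom03"/>
<updated>2008-03-31T15:30:02Z</updated>
</entry>
</feed>"""

root = ET.fromstring(FEED.encode("utf-8"))
feed_title = root.findtext(ATOM + "title")

# One dictionary per <entry>, holding the fields a reader would display.
entries = [
    {
        "title": entry.findtext(ATOM + "title"),
        "link": entry.find(ATOM + "link").get("href"),
        "updated": entry.findtext(ATOM + "updated"),
    }
    for entry in root.findall(ATOM + "entry")
]

print(feed_title)
print(entries)
```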
 

XML in Data Storage and Retrieval

The user-definable nature of an XML tag set and its hierarchical structure (that is, XML's ability to nest, say, <name>, <address>, <work-history>, and <hobbies> elements inside a <resume> element) supports use of XML as a simple, cross-platform database.  Human resource professionals in a law firm that litigates intellectual property cases, for instance, might store job applicants' résumés in a particular XML language whose tags describe the various sections of the résumé.  A search utility could then rapidly identify whether, say, a query for "patents" and "trademarks" matches any content inside each résumé's <work-history> tags.
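The search described above might look like the following Python sketch, which uses only the standard library.  The résumé data, the <resumes> wrapper element, and the function name match_work_history are invented for illustration.

```python
import xml.etree.ElementTree as ET

RESUMES = """<resumes>
  <resume>
    <name>A. Applicant</name>
    <work-history>Drafted patents and registered trademarks for software clients.</work-history>
    <hobbies>Chess</hobbies>
  </resume>
  <resume>
    <name>B. Candidate</name>
    <work-history>Managed payroll systems.</work-history>
    <hobbies>Collecting patents and trademarks memorabilia</hobbies>
  </resume>
</resumes>"""

def match_work_history(root, *terms):
    """Return names whose <work-history> text contains every search term."""
    names = []
    for resume in root.findall("resume"):
        history = (resume.findtext("work-history") or "").lower()
        if all(term.lower() in history for term in terms):
            names.append(resume.findtext("name"))
    return names

root = ET.fromstring(RESUMES)
print(match_work_history(root, "patents", "trademarks"))
```

The second résumé mentions both terms, but only under <hobbies>, so it is not returned.  A plain-text or HTML search could not make that distinction.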

The development of XML query languages such as XPath and XQuery, which builds on XPath, further advances the opportunity to extract and manipulate data from XML documents.  Indeed, XML query languages can be used with any data source whose content can be represented in an XML format, including relational databases [5] and office documents.  In this way, XML promises to bridge the long-standing disconnect between the Web and the colossal quantity of data locked inside databases.
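For a flavor of path-based querying, the Python sketch below uses the limited XPath subset built into the standard-library ElementTree module.  It is not a full XPath implementation and offers no XQuery support.  The supplier data is invented for illustration.

```python
import xml.etree.ElementTree as ET

SUPPLIERS = """<suppliers>
  <supplier region="EU"><name>Acme Fasteners</name><status>active</status></supplier>
  <supplier region="US"><name>Bolt Brothers</name><status>inactive</status></supplier>
  <supplier region="EU"><name>Cog Works</name><status>inactive</status></supplier>
</suppliers>"""

root = ET.fromstring(SUPPLIERS)

# Attribute predicate: every supplier in the EU region.
eu = [s.findtext("name") for s in root.findall("./supplier[@region='EU']")]

# Child-text predicate: every supplier whose <status> is "active".
active = [s.findtext("name") for s in root.findall("./supplier[status='active']")]

# Chained predicates: EU suppliers that are inactive.
eu_inactive = [s.findtext("name")
               for s in root.findall("./supplier[@region='EU'][status='inactive']")]

print(eu, active, eu_inactive)
```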

XML in Systems Integration

XML's potential to integrate systems - including integration between enterprise applications and between applications and consumers - may be more important still.  Of course, systems integration already exists to some extent, mostly through hard coding of interfaces that convert the flat file format of one system into the flat file format of the other.  As more systems interconnect, the maintenance costs and complexity of hard-coded interfaces increase accordingly.  Hard coding is also inflexible, often requiring extensive revision following a change in the logic of any one system.

A new approach to systems integration is Web services - most commonly, XML-based Web services.  In this model, applications on disparate systems exchange XML-based messages through an XML messaging engine, typically using the SOAP [6] protocol.  Enterprises can use XML-based Web services to make their applications accessible to users both inside and outside the company.

Consider the case of an automobile manufacturer.  Managers in the accounts payable department had to field dozens of calls and emails each week from suppliers seeking status updates on invoices.  Then the manufacturer introduced XML-based messaging.  Now suppliers can check the status of an invoice directly.  Status requests are relayed over the Internet to a messaging engine, which maps (converts) them into an XML format that the manufacturer's ERP system can understand.  After obtaining the status from the ERP system, the messaging engine maps the response into an XML format intelligible to the supplier's system - perhaps XHTML or the ebXML electronic business standard.  XML and other distributed messaging technologies such as DCOM and CORBA are now enabling systems integration that had formerly been economically if not technologically unthinkable.
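The request-and-response mapping in this scenario can be sketched as follows.  This is a deliberately simplified Python illustration, not a SOAP implementation: there is no envelope, no transport, and no security.  The element names, invoice numbers, and the ERP_INVOICES dictionary that stands in for the ERP system are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Stand-in for the ERP system's invoice records (invented data).
ERP_INVOICES = {"INV-1001": "paid", "INV-1002": "pending approval"}

def handle_status_request(request_xml):
    """Map an XML status request to an ERP lookup and back to an XML reply."""
    request = ET.fromstring(request_xml)
    invoice_id = request.findtext("invoice-id")
    status = ERP_INVOICES.get(invoice_id, "unknown")

    response = ET.Element("invoice-status-response")
    ET.SubElement(response, "invoice-id").text = invoice_id
    ET.SubElement(response, "status").text = status
    return ET.tostring(response, encoding="unicode")

request_xml = ("<invoice-status-request>"
               "<invoice-id>INV-1002</invoice-id>"
               "</invoice-status-request>")
reply = handle_status_request(request_xml)
print(reply)
```

Because both sides agree only on the message format, either system's internal logic can change without breaking the other, which is the flexibility that hard-coded interfaces lack.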

 

XML: Beauty or the Beast?

XML is not without its detractors.  Criticisms typically center on issues of intuitiveness (or lack thereof), complexity, and verbosity.  The unintuitive nature of XML's elements and attributes, say some critics, makes XML code difficult for humans to read and write.  Defenders of the XML standard, however, either dismiss this claim or point to emerging tools such as Microsoft Expression (see Design Tools and Integrated Development Environments), which increasingly insulate application designers from exposure to XML code.

The verbosity of XML lies in its syntax, which can introduce a sizable overhead, especially when XML is used to describe tabular data.  Other needs such as links between XML documents and XSLT transformations between XML schemas also require extraneous code, according to some critics, with implications for the storage, transmission, and processing of potentially bloated XML files.  In counterpoint, use of an optimal schema - whether a custom schema or an existing XML language - together with judicious use of namespaces may contain files to a size readily manageable by today's IT systems.

The complaint of complexity is more difficult to refute.  Although a particular XML language may be very simple, many are not and the associated technologies present a confusing array of competing and, in part, duplicative standards.  This paper references some (but by no means all) of those technologies, and intelligent choices between them - for example, between XSDs and DTDs, or between XPath and XQuery - can only be made after a thorough grounding in an array of information technologies.  In response to this complaint, XML proponents offer assurance that a market consensus is fast forming around the best technologies.  E pluribus unum.

The debate over XML - which ranges from vitriolic damnation to messianic identification - will eventually subside.  In the meantime, developers and standards agencies may continue to hold quite contrary views on the merits and demerits of XML.  These differences derive more from perspective than fact.  Specifically, the developer's perspective is of direct responsibility to integrate XML technologies.  Industry-driven standards agencies, in contrast, devise and promulgate those technologies in service of enterprise and consumer needs.

 

Under the Hood: XML and Windows Vista

XML isn't just a technology for harmonizing ERP systems and other enterprise applications.  Peek under the hood of Windows Vista, Microsoft's new PC operating system, and you'll find it in abundance.

Microsoft XML Core Services

Microsoft XML Core Services (MSXML) is a set of services that make XML technologies available to applications written in programming languages such as JScript, VBScript, and C++.  Among the core services available in the latest version (6.0) of MSXML:

◊    Access to the Document Object Model (DOM), a library of application programming interfaces (APIs) to XML documents.

◊    The Simple API for XML (SAX), an alternative to DOM-based processing.

◊    XMLHttpRequest and ServerXMLHTTPRequest for implementing service-oriented applications and AJAX-based (Asynchronous JavaScript and XML) interactive Web applications.

◊    The ability to use the XPath query language within DOM documents.

◊    The ability to transform XML documents using XSLT.

◊    Support for the XSD 1.0 specification.

◊    The Schema Object Model (SOM), an additional set of APIs for accessing XML Schema documents programmatically.
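The difference between the DOM and SAX styles listed above can be illustrated with the equivalent modules in Python's standard library.  This sketch does not use MSXML, whose COM interfaces differ.  It only contrasts the two processing models on a made-up document.

```python
import xml.dom.minidom
import xml.sax

DOC = "<pies><pie>Peach</pie><pie>Banana</pie><pie>Apple</pie></pies>"

# DOM: load the whole document into a tree, then navigate it.
dom = xml.dom.minidom.parseString(DOC)
dom_flavors = [node.firstChild.data for node in dom.getElementsByTagName("pie")]

# SAX: react to parser events as they stream past; no tree is built.
class PieHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.flavors = []
        self._in_pie = False

    def startElement(self, name, attrs):
        if name == "pie":
            self._in_pie = True
            self.flavors.append("")

    def characters(self, content):
        # Text may arrive in several chunks, so append rather than assign.
        if self._in_pie:
            self.flavors[-1] += content

    def endElement(self, name):
        if name == "pie":
            self._in_pie = False

handler = PieHandler()
xml.sax.parseString(DOC.encode("utf-8"), handler)
sax_flavors = handler.flavors

print(dom_flavors, sax_flavors)
```

DOM is usually the more convenient choice for small documents that need random access.  SAX suits very large documents because memory use does not grow with document size.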

Answer Files and Configuration Files

In a bid to streamline installation of Vista for enterprise customers who may need to deploy to hundreds or even thousands of PCs, Microsoft has again exploited XML.  For most image-based deployments, an IT professional can customize the Windows settings for an unattended installation, using the Windows System Image Manager tool to create and modify a single answer file (Unattend.xml).  This contrasts with installation of Windows XP, Vista's predecessor, which involved setting up multiple text files such as unattend.txt, winnt.sif, sysprep.inf, winbom.ini, and oobeinfo.ini.

Indeed, the common name-value pair syntax of attributes in all XML languages and XML's hierarchical data structure make XML an ideal format for storing configuration data.  As a result, many applications designed for use with Vista - for example, Microsoft Office 2007, Adobe Acrobat, and more recent versions of Visual Studio - also now include XML files to store user and application settings.
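As an illustration of why attributes and nesting suit configuration data, the Python sketch below reads and updates a small settings document with the standard library.  The <settings> layout is hypothetical and is not the format of Unattend.xml or of any product named above.

```python
import xml.etree.ElementTree as ET

CONFIG = """<settings>
  <window width="1024" height="768"/>
  <recent-files max="4">
    <file path="report.docx"/>
    <file path="budget.xlsx"/>
  </recent-files>
</settings>"""

root = ET.fromstring(CONFIG)

# Attributes carry simple name-value settings.
window = root.find("window")
width = int(window.get("width"))
height = int(window.get("height"))

# Nesting groups related settings; a path expression walks the hierarchy.
recent = [f.get("path") for f in root.findall("recent-files/file")]

# Change one setting and serialize the updated configuration.
window.set("width", "1280")
updated = ET.tostring(root, encoding="unicode")

print(width, height, recent)
print(updated)
```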

Desktop and SideShow Gadgets

Desktop gadgets are small specialized applications or applets for simple tasks.  Examples include clocks, calendars, RSS notifiers, and search tools.  Comprising XML, DHTML, and Microsoft .NET Framework code, gadgets run on the desktop and on the Windows Sidebar in the Vista operating system.

Also new to Vista are SideShow gadgets, which run on auxiliary displays - for example, a secondary screen on the outside surface of a notebook computer lid, or a detachable device connected via Bluetooth or other wireless connectivity.  SideShow gadgets enable access to information and media, even when the main system is in standby mode.

Windows Presentation Foundation and XAML

Preinstalled on Windows Vista, [7] the Windows Presentation Foundation (WPF) is a feature of the latest Microsoft .NET Framework for creating and implementing applications that employ graphical user interfaces.  The WPF set of tools and technologies provides a consistent programming model for building such applications, enabling control, design, and development of all the visual aspects of the program.  WPF-based applications are deployable either on the PC desktop or within a host Web browser. [8] In the latter case, the application uses Silverlight, a light-weight cross-browser plug-in version of the .NET Framework.

XAML (pronounced zah'-mul), the eXtensible Application Markup Language, is central to the Windows Presentation Foundation and its Silverlight counterpart.  It allows application designers to easily create graphical user interfaces by specifying elements such as buttons, 2-D and 3-D objects, rotations, animations, and other effects.  It also allows designers to specify the data binding relationships between the user interface and the underlying application logic. [9]

Traditionally, designers - who typically lack expertise in procedural programming languages such as C# and VB.NET - have been unable to integrate an interface design with application code.  XAML changes this.  It represents a milestone in application design, providing a declarative (descriptive) alternative to procedural languages.  XAML thereby shortens the application development cycle by empowering designers who may have no programming experience.  Moreover, designers can readily create XAML designs using advanced visual tools such as Microsoft Expression and then share code with one another prior to compilation.

XML Paper Specification

Vista introduces the XML Paper Specification (XPS), a combined page description language and spool format for its printing system. [10] In a nutshell, XPS documents are not only platform-independent, but also print better and faster, are more secure, and can be shared and archived more easily than other document formats.

An XPS file is a ZIP archive of document files.  The archive includes an XML markup file for each page, the document text, any embedded fonts, raster images, and 2-D vector graphics, and any digital rights management information.  (Notably, the XML markup language of XPS is a subset of XAML.  This allows it to incorporate vector-graphic elements into documents, which ensures image quality remains high, even when images are magnified.)

Every Vista PC includes an XPS viewer, comparable to Adobe's Acrobat Reader for PDF documents.  However, Microsoft has also leveraged its dominance in office productivity software - an advantage it holds over Adobe - to include XPS in Microsoft Office 2007.  Office users can now save Word, Excel, PowerPoint, Publisher, Access, Visio, and other application files in the XPS format as an alternative to PDF or their native formats.  And the latest Microsoft .NET Framework includes APIs for developers to integrate the XPS file type in third-party applications.

On the Desktop: XML and Microsoft Office

Any doubts over XML's potential to infiltrate the desktop receded with the launch of Office 2007.  This latest version of Microsoft's ubiquitous productivity suite radically advances on its predecessor, Office 2003, which first integrated a limited degree of XML functionality.  Indeed, the newer Office Open XML (OOXML) standard - an open and royalty-free specification adopted for Office 2007 - overcomes the limitations of XML's implementation in Office 2003.  For example, OOXML files are now more compact.  Like XPS files, they exploit ZIP compression to achieve smaller sizes than their monolithic XML counterparts in Office 2003.
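The effect of packaging XML inside a ZIP container can be demonstrated with Python's standard zipfile module.  This is only a sketch of the principle.  The part name content/document.xml and the markup are invented and do not reproduce the actual package layout of OOXML or XPS files.

```python
import io
import zipfile

# A deliberately repetitive XML part standing in for document markup.
xml_part = "<doc>" + "<p>Quarterly results were strong.</p>" * 200 + "</doc>"

# Write the part into an in-memory ZIP archive using DEFLATE compression.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("content/document.xml", xml_part)

# Reopen the archive, inspect the stored sizes, and recover the markup.
with zipfile.ZipFile(buffer) as archive:
    info = archive.getinfo("content/document.xml")
    restored = archive.read("content/document.xml").decode("utf-8")

print(info.file_size, info.compress_size)
```

Repeated tag names are exactly the kind of redundancy that compresses well, which is why a zipped XML package can be far smaller than the same markup stored as one plain file.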

If Office 2003 was Microsoft's first foray into XML on the desktop, Office 2007 takes it mainstream by making OOXML the default Office format. [11] This said, few users will even notice Office 2007's integration of XML, and that is exactly as intended - transparency for the majority of users to enjoy a familiar productivity suite.  Meanwhile, the businesses that license Office 2007 can exploit its XML functionality to build meaningful Office documents.

Conventional Office documents aren't meaningful - at least, not meaningful to anyone but a human who is familiar with the context.  But an Office document stored as XML can associate user-defined tags with sections of content, thereby conferring explicit meaning.  And when documents on PCs across a company are similarly tagged according to a suitable schema, the potential for data access and reuse increases exponentially.

Whereas OOXML is the default schema, users can create and save Office documents in any XML schema.  Sometimes, a user may favor a custom schema to best meet the needs of the enterprise.  Oftentimes, however, an existing schema - perhaps a vertical standard such as the healthcare industry's HL7 or a horizontal standard such as the XBRL financial reporting language - will enable the enterprise to more efficiently share information with customers, suppliers, regulatory agencies, and other external stakeholders.  In this way, Microsoft is seeking to establish Office as the front end, not only for integration of business information on the desktop, but also for XML Web services.

Design Tools and Integrated Development Environments

Integrated Development Environments (IDEs) are at the sharp end of software development.  It is in these environments that new technologies evolve into core technologies.  For that reason, this white paper concludes with the observation that the latest generation of IDEs all integrate XML tools.  For example, Microsoft's Visual Studio 2008 features a XAML-based visual designer, XML Schema designer, and XSLT debugger.  Rival Stylus Studio 2008 from DataDirect Technologies integrates hundreds of XML tools.  And the open-source Eclipse, an extensible development environment, also includes broad support for XML technologies.

Beyond IDEs, the latest user-interface design tools are further driving XML into the heart of application development.  Most notable, to this end, is Microsoft's aforementioned Expression suite.  Expression Web for Web applications, Expression Blend for Web-connected desktop applications, and Expression Media for multimedia experiences are all built upon the XAML-processing Windows Presentation Foundation.

IDEs and design tools aside, developers with an interest in learning more about XML may prefer to explore the fundamentals in a simple XML editor.  To this end, the freeware products XMLPad, Liquid XML Studio, and Microsoft's XML Notepad 2007 [12] are good places to begin your journey.

About Binary Research International

Binary Research International has long been at the forefront of software development and support services for IT departments.  Our proud record of innovation extends almost twenty years to the founding of our predecessor, Binary Research Ltd.  This New Zealand-based pioneer in file-transfer technology introduced Ghost, the original cloning utility, in 1996.  Binary Research Ltd. was subsequently acquired by Symantec Corporation, the global leader in maintaining critical IT infrastructure.

Today Binary Research International provides sales, training, and support for best-in-class products such as Ghost, the Universal Imaging Utility, and other back-up and imaging solutions.  In addition, we offer consulting services to Ghost users, including project planning, network assessment, deployment, and troubleshooting.  We also train network administrators, IT project managers, business analysts, and systems integration consultants in the deployment and management of Windows Vista inside corporate, government, and academic environments.

Our clients include General Motors, Exxon, Booz-Allen, AT&T, British Telecom, U.S. Federal Reserve Bank, Oxford University, Harvard University, NATO, Fujitsu, Coca Cola, DreamWorks, Rockwell Automation, U.S. Department of Justice, Procter & Gamble, Siemens, Xerox, U.S. Library of Congress, Intel, DuPont, Bank of America, and the U.S. Air Force.

Binary Research International is headquartered in Glendale, Wisconsin.  We operate in Europe through our subsidiary, Binary Resource (UK) Ltd.  To learn more about our expertise, contact:

Binary Research International Inc.

5215 N Ironwood Rd, Suite 200
Glendale, WI 53217
United States

USA Toll Free: 1-888-446-7898

USA Phone: 414-961-7077

USA Fax: 414-961-1716

info@BinaryResearch.net

www.BinaryResearch.net

             

Binary Resource (UK) Ltd.

Lombard House
12-17 Upper Bridge Street
Canterbury, Kent CT1 2NF
United Kingdom

UK Toll Free: 0800 404 9282 (UK Only)

UK Toll Free Fax: 0800 404 9286 (UK Only)

International Phone: +33 321.86.76.17

International Fax: +33 321.86.76.68

info@BinaryResource.com

www.BinaryResource.com



 [1] The World Wide Web Consortium's Recommendation Extensible Markup Language 1.0 was published February 10, 1998.

 [2] HTML markup may also include tags that describe the style or format of a document - for example, the color and size of text and the type of border around each image.  However, cascading stylesheets (CSS), a complementary technology, is the W3C's preferred means to specify formatting.

 [3] The XML Schema Definition Language is not the only standard for specifying an XML schema.  Other (non-XML) specification standards exist, most notably Document Type Definitions (DTDs).

 [4] Essentially, the latest version of HTML, the eXtensible Hypertext Markup Language XHTML is an XML language that specifies the familiar HTML elements and properties.

 [5] Whether native XML databases displace their relational counterparts remains unclear, though relational database vendors, recognizing the importance of XML, have subsequently included advanced features to support XML.

 [6] The acronym, which formerly represented Simple Object Access Protocol, has been subsequently abandoned by the W3C and XML Protocol Working Group as misleading.

 [7] WPF is also available for installation on the earlier PC and server operating systems Windows XP SP2 and Windows Server 2003.

 [8] Technically, Web-hosted WPF applications run in an out-of-process executable session, separate from the browser.  However, from a user's perspective, the application runs inside the browser.

 [9] XAML elements map directly to object instances of the Common Language Runtime (CLR), the virtual machine component of the .NET Framework.  XAML attributes, meanwhile, map to the CLR properties and events pertaining to those objects.

 [10] Via a free download from Microsoft, earlier Windows operating systems and alternative platforms such as MacOS, Linux, and Unix can integrate XPS functionality.

 [11] Older versions such as Microsoft Office 2000, Office XP, and Office 2003 store files by default in binary formats such as .doc, .xls, and .ppt.  However, compatibility packs are available for these versions to manipulate OOXML files.

 [12] Look for a Binary Research white paper on XML Notepad 2007.

 
