Versioning made easy
with W3C XML Schema and Pipelines
Henry S. Thompson
Architecture Domain
World Wide Web Consortium
Markup Technology Ltd.
20 April 2004
What is the versioning problem?
- Applications grow and change
- XML document types for applications grow and change
- So we end up with multiple versions of the schema for the document type
- Old code is hard to eradicate
- So we end up with multiple versions of the code implementing the application
Analysis of versioning scenarios
- Two dimensions of the problem:
- Active vs. passive
- Active
- Schema author prepares for versioning, with e.g. wildcards (see the sketch after this list)
- Passive
- No such preparation
- Engaged vs. dis-engaged
- Engaged
- Close coupling between developer and users; regular upgrades of schemas
- Dis-engaged
- Loose coupling; no schema upgrades
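- For concreteness, a minimal sketch of 'active' preparation (names invented for illustration): a content model that ends with a lax wildcard, leaving room for later additions from other namespaces

    <xs:element name="address">
      <xs:complexType>
        <xs:sequence>
          <xs:element ref="name"/>
          <xs:element ref="street"/>
          <xs:element ref="city"/>
          <!-- Active versioning: accept, but only laxly validate,
               anything a later version adds in another namespace -->
          <xs:any namespace="##other" processContents="lax"
                  minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>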
Versioning and Web Services
- David Orchard has articulated a detailed analysis of expected versioning scenarios for Web Services
- The bad news
- We're in the Passive+Dis-engaged quadrant
- The good news
- Changes are likely to be additive (see the sketch below)
- How can we get a solution for W3C XML Schema?
- New software?
- Changes to W3C XML Schema
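- For illustration (invented names), a purely additive change just appends an optional element to the old content model, so every old instance remains valid:

    <!-- v1 content model -->
    <xs:sequence>
      <xs:element ref="name"/>
      <xs:element ref="street"/>
      <xs:element ref="city"/>
    </xs:sequence>

    <!-- v2 content model: additive, so v1 instances are still valid -->
    <xs:sequence>
      <xs:element ref="name"/>
      <xs:element ref="street"/>
      <xs:element ref="city"/>
      <xs:element ref="country" minOccurs="0"/>
    </xs:sequence>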
Exploiting partial validation
- Key aspect of W3C XML Schema is fine-grained distributed validity information
- Every information item has validity information
- Two three-valued properties in the PSVI
- Validity
- valid; notKnown; invalid
- Validation attempted
- full; partial; none
- In the case of additive versioning, we get a PSVI like the sketch below
- It looks like we need to think about a tool that could walk the PSVI and look for valid sub-sequences and re-decorate as it goes
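- Purely illustrative (not any validator's actual output format): validating a v2 instance against the v1 schema might leave per-node PSVI properties like these

    <address>               <!-- [validity]=invalid   [validation attempted]=partial -->
      <name>...</name>      <!-- [validity]=valid     [validation attempted]=full    -->
      <street>...</street>  <!-- [validity]=valid     [validation attempted]=full    -->
      <city>...</city>      <!-- [validity]=valid     [validation attempted]=full    -->
      <country>UK</country> <!-- [validity]=notKnown  [validation attempted]=none    -->
    </address>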
Exploiting partial validation, cont'd
- We could do that
- but it would be wrong :-)
- It would have to recapitulate almost all the semantics of a schema validator
- Lightbulb! Why not use a validator itself?
- If we could just get rid of those notKnown nodes and revalidate, that would be good
- So we need a three-step pipeline
- Validation
- Surgery
- Validation
- Fortunately, Markup Technology have developed a pipeline authoring and execution tool
- So it was easy to check this out
Introducing MT Pipeline
- The lack of a coherent XML processing model to support decomposition of complex XML processing tasks represents a serious bottleneck
- for enterprise use of XML in general
- for Web Services in particular
- All that's needed is support for the basic tool in the architect's armoury: Divide and Conquer
- In other words -- XML Pipelines
- Configurations of basic XML processing steps
- Some steps are relatively heavy
- XSLT-based transformation
- W3C XML Schema-based validation
- Others can be much simpler
- XPath-based extraction
- One-for-one renaming
What is missing?
- A standard for pipeline specification?
- Interop matters here just like everywhere else
- High performance?
- Pipelines need to be fast to be attractive
- We have a candidate standard: the Sun XML Pipeline W3C Note is a good starting point
- Published by W3C in February of 2002
- Edited by Eve Maler and Norm Walsh
- Many co-submitters, including Markup Technology
The Sun Pipeline design
- An XML document type for describing pipelines
- A pipeline is a sequence of steps, with specified input(s), output(s) and parameters
- The processing required to perform a step is named, not defined in detail
- Dependency-driven, in the mode of make and ant
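- A rough sketch of the flavour of that document type (element names follow the Note only approximately; the step types and labels here are invented):

    <pipeline xmlns="http://www.w3.org/2002/02/xml-pipeline">
      <!-- Each process names the kind of step required; labels
           connect one step's output to another step's input -->
      <process id="check" type="validate">
        <input name="document" label="order.xml"/>
        <input name="schema" label="order.xsd"/>
        <output name="result" label="checked"/>
      </process>
      <process id="style" type="xslt">
        <input name="document" label="checked"/>
        <input name="stylesheet" label="order.xsl"/>
        <output name="result" label="order.html"/>
      </process>
    </pipeline>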
Re-interpreting the Sun Pipeline Note
- We have re-interpreted the proposed document type
- Removing some limitations
- Enabling more efficient implementation
- We interpret it as simply specifying a configuration of operations on XML-encoded information
- Without dependency-driven semantics
- Allows intermediate results to be passed between components without serialisation
- So we think of pipelines more like shell scripts
- Mapping externally specified inputs to outputs
- Facilitates deploying pipelines
- For example in servers, where they can then operate on message-derived input to produce message-delivered output
- And we have a highly optimised implementation
Using MT Pipeline for V2S
- We're looking for a way to Validate twice
- Fortunately, MTPL already supports
- W3C XML Schema validation, with full PSVI output in the pipeline
- Surgery, that is, XPath-based elimination of elements and/or attributes from an infoset
- XPath extension functions to access the PSVI
- So we can just build the pipe and try it
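- MTPL's actual syntax isn't reproduced here; the following is a hypothetical version of the pipe in the same Note-like vocabulary, with psvi:validity() standing in for MTPL's real PSVI extension function:

    <pipeline xmlns="http://www.w3.org/2002/02/xml-pipeline">
      <!-- 1: validate against the old schema, keeping the PSVI -->
      <process id="first" type="validate">
        <input name="document" label="msg.xml"/>
        <input name="schema" label="v1.xsd"/>
        <output name="result" label="decorated"/>
      </process>
      <!-- 2: surgery -- delete every element whose validity is notKnown -->
      <process id="surgery" type="delete">
        <input name="document" label="decorated"/>
        <param name="select" select="//*[psvi:validity(.)='notKnown']"/>
        <output name="result" label="trimmed"/>
      </process>
      <!-- 3: revalidate the trimmed infoset -->
      <process id="second" type="validate">
        <input name="document" label="trimmed"/>
        <input name="schema" label="v1.xsd"/>
        <output name="result" label="final"/>
      </process>
    </pipeline>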
Schema design issues
- Will this always work?
- Alas, no. Several pre-conditions:
- Schema validators are not required to keep going after finding an error
- But most (all?) of them actually do
- In order for the REC's error recovery strategy to work, only top-level element declarations can be used (see the sketch at the end of this slide)
- Because once the content model is blown, we don't know which local declarations to use
- [Although some validators use them anyway]
- Will it always make sense?
- Likewise no. Syntactic additions are not always semantically clean
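- The contrast, sketched with invented names: with a top-level declaration a recovering validator can still find a declaration for a stray element by name; a local declaration is reachable only through the blown content model

    <!-- Global style: a stray <city> can still be looked up by name -->
    <xs:element name="city" type="xs:string"/>
    <xs:element name="address">
      <xs:complexType>
        <xs:sequence>
          <xs:element ref="city"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>

    <!-- Local style: <city>'s declaration is visible only inside
         address's content model, so once that model is blown there
         is no way to choose a declaration for a stray <city> -->
    <xs:element name="address">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="city" type="xs:string"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>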
The additive assumption
- David Orchard points to the success of HTML's "must ignore" strategy with respect to unknown markup
- But Michael Sperberg-McQueen points out this is not the same as what's required for David's own examples
- HTML just ignores the tags it doesn't understand
- But it processes the content
- Our story needs to ignore the whole unknown subtree
- And additional elements are not necessarily purely additive in meaning
- Ignoring nad:country can get you in trouble:
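- A sketch of the kind of instance presumably at issue (namespace URI and data invented): a v1 consumer that ignores the added element will treat this as a domestic address

    <nad:address xmlns:nad="http://example.org/nad">
      <nad:name>H. Thompson</nad:name>
      <nad:street>2 Buccleuch Place</nad:street>
      <nad:city>Edinburgh</nad:city>
      <!-- Added in v2: ignoring it changes where the parcel goes -->
      <nad:country>UK</nad:country>
    </nad:address>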
Changing W3C XML Schema
- Lots of people use local element declarations
- They shouldn't be disenfranchised
- The W3C XML Schema WG was already looking at a change behind the scenes
- Re-interpreting local element declarations as just that:
- Declarations scoped to their enclosing type definition
- Conceptually, all content model particles would then be references to declarations by name
- Such a change would make local element declarations available for wildcards
- So this change would allow local declarations to work with V2S
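- Schematically (invented names; a sketch of the proposal, not its final form): the local declaration below would become a declaration scoped to addressType, referenced by name from the content model, and hence visible when a wildcard goes looking for a declaration

    <xs:complexType name="addressType">
      <xs:sequence>
        <!-- Today: a declaration and a particle fused together.
             Proposed: a declaration scoped to addressType, with the
             particle a reference to it by name -->
        <xs:element name="city" type="xs:string"/>
        <!-- A lax wildcard could then find that scoped declaration
             when a stray element turns up here -->
        <xs:any processContents="lax" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>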
Conclusions
- W3C XML Schema's partial validation is powerful
- Pipelines are cool
- Watch for free MT Pipeline beta
- E-mail me (ht@markup.co.uk) if you are interested in participating in an alpha programme