XSD and XML Web Services

I have two signs hanging on my cube at work. The first is a photo with the caption "The Problem With Metadata", which usually catches people's attention and gets a laugh. The other is a page from Tim Bray's On Semantics and Markup, with the following quote highlighted:

To oversimplify, XML is winning and ASN.1 is losing. There are a variety of reasons for this, but one of them is that it seems to be more important to know what something is called than what data type it is.

That statement is my personal heresy against corporate development practices.

Dare's post on RELAX-NG, XSD and XML Web Services leads to a really interesting discussion on Tim Ewald's blog. Tim's reaction is a little odd, since the post he's referring to contains the quote "Many people think that Relax NG is the answer, but I think this view misses the deeper problems caused by its continued reliance on the Infoset as a data model. While some people do actually want to shove markup around, the more prevalent use case by far is simple data." Still, I have a pretty strong preference for RELAX-NG over XSD, and use RELAX-NG whenever I want to design a schema on my own. My personal bias here is that Schematron annotations add a great deal of value to XSDs, and Schematron is something that deserves more attention. But I keep coming back to Tim Bray's quote. If what we want to do is assign names to structure, aren't we back at DTD?

The environment I develop for is strongly tied to using XSD for building message contracts. In fact, the environment is entirely XSD-first; you must define your messages in XSD, compile them through Castor into data objects (I'd almost call them beans, but I don't know if they meet that definition), and then write the code to produce an output message from those objects. But the reality is that we barely use XSD. We use some derivation by restriction, which maps neatly into traditional inheritance. But it turns out that marshalling concrete types into abstract definitions isn't exactly standardized yet (Axis and the .NET Web Services stack do it differently), and poses an interop problem, so we don't take much advantage of the fact that we have a polymorphic type. And as far as assigning types to data, there are boolean, string, and dateTime, and a very few instances of int. I haven't seen a single restriction on simple types, even though such restrictions could give important hints to the party on the other end of the wire; field length limits being the obvious one. I think this is partly laziness, partly because the modeling tool that generated many of these schemas didn't capture that information, and partly because things like length specifications on strings get lost in the XSD-to-object mapping in every tool I know of. You can't really blame XSD for the last two. But I keep coming back to this: DTD would give us everything we have right now, minus type annotations to differentiate boolean, dateTime, and int from string. That seems underwhelming.
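To make the point about simple-type restrictions concrete, here is a minimal sketch of a length limit expressed in XSD and checked with the validation API that ships with the JDK (javax.xml.validation). The element name customerName and the limit of 10 characters are made up for illustration and are not taken from any real contract. A DTD could only declare this element as #PCDATA, so the length hint would have nowhere to live.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class LengthFacetDemo {
    // Hypothetical message contract: customerName is a string of at most 10 characters.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "  <xs:element name='customerName'>"
      + "    <xs:simpleType>"
      + "      <xs:restriction base='xs:string'>"
      + "        <xs:maxLength value='10'/>"
      + "      </xs:restriction>"
      + "    </xs:simpleType>"
      + "  </xs:element>"
      + "</xs:schema>";

    // Compile the schema string; a failure here means the schema itself is broken.
    static Schema compileSchema() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema is malformed", e);
        }
    }

    // Returns true if the document validates, false if the validator rejects it.
    static boolean isValid(String xml) {
        Validator validator = compileSchema().newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // with no ErrorHandler set, validation errors are thrown
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Short value: within the 10-character limit.
        System.out.println(isValid("<customerName>Ada</customerName>"));
        // Long value: exceeds maxLength and is rejected.
        System.out.println(isValid("<customerName>A name well over ten characters</customerName>"));
    }
}
```

The restriction only helps the party on the other end of the wire if they validate against the schema or read it; as noted above, data-binding tools tend to drop the facet when they generate objects.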

This brings up another point: many of these schemas were generated from a modeling tool, not written in an editor. Many of the technologies around XML are designed to hold the wire format at arm's length, and with XSD being as verbose as it is, the desire's even greater. Couple that with the belief that boxes and lines are more comprehensible, and even more meaningful, than text, and the result is that the XSD becomes an artifact and not a goal of the modeling process. On top of that, there's the desire to map this into objects on either end of the wire. It's a contradictory situation, where we've gone to great lengths to hide the fact that there's XML on the wire, but we still can't get around the fact that our data model is specified in XML.

It seems to me that one useful thing we could have, but don't, is a way to annotate a schema to indicate request / response. We use the same schemas on request and response; on request, we might just specify a primary key for some entity, and expect the schema to be fully populated on response. Or if you want to modify an entity, you might use the same schema, but you'd need to populate a few more elements, and you'd receive just the modified elements in return. Maybe that's just a bad message-oriented design, but it would certainly be useful right now.
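One way such an annotation might look, purely as a sketch: XSD allows attributes from a foreign namespace on schema components, so a made-up msg:request attribute could mark which elements a caller has to populate on request. Everything specific here is an assumption for illustration, including the urn:example:message-direction namespace, the attribute name, its values, and the customer schema. No standard or toolkit defines it. The Java below just reads the marker back out of the schema with the JDK's DOM parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DirectionAnnotationDemo {
    static final String XS_NS = "http://www.w3.org/2001/XMLSchema";
    // Hypothetical namespace for the request/response marker; not a real standard.
    static final String MSG_NS = "urn:example:message-direction";

    // Hypothetical schema: only 'id' must be sent on request.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
      + "           xmlns:msg='urn:example:message-direction'>"
      + "  <xs:element name='customer'>"
      + "    <xs:complexType><xs:sequence>"
      + "      <xs:element name='id' type='xs:string' msg:request='required'/>"
      + "      <xs:element name='name' type='xs:string' minOccurs='0' msg:request='omitted'/>"
      + "      <xs:element name='email' type='xs:string' minOccurs='0' msg:request='omitted'/>"
      + "    </xs:sequence></xs:complexType>"
      + "  </xs:element>"
      + "</xs:schema>";

    // Collect the names of element declarations marked msg:request='required'.
    static List<String> requiredOnRequest() {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // needed for the *NS lookups below
            Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XSD)));
            NodeList decls = doc.getElementsByTagNameNS(XS_NS, "element");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < decls.getLength(); i++) {
                Element decl = (Element) decls.item(i);
                // getAttributeNS returns "" when the attribute is absent.
                if ("required".equals(decl.getAttributeNS(MSG_NS, "request"))) {
                    names.add(decl.getAttribute("name"));
                }
            }
            return names;
        } catch (Exception ex) {
            throw new IllegalStateException("could not read schema", ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(requiredOnRequest());
    }
}
```

Because the marker lives in its own namespace, the schema stays legal XSD and ordinary validators ignore it; only code that knows about the convention would act on it. That is also the weakness: both ends of the wire would have to agree on a convention that nothing else enforces.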

-- Gordon Weakliem