Thoughts on Diff and PATCH · 18 February, 10:35 AM

I’ve been following the discussion around PATCH a little, though not in depth. James Snell caught my attention with a great point: if you’re talking about sending diffs for XML, the problem is that so far there’s no standard format, and the formats that exist are too complicated. I’ve written about this before. Most of my experience has been with the Microsoft XML Diff and Patch format, which definitely has both of these problems.

The first issue is absolutely true: there’s no particular reason the format should be XML. It’s just that XML has become the de facto format for representing everything, despite being a horrible syntax for a programming language. I don’t see any specific reason that XML should work better for a diff format.

The second issue is trickier. The core problem is coming up with a language for describing transformations on trees, and as usual, XML attributes and whitespace in content cause significant complications. In capsule form:

  1. Attribute order is not significant, which is the main reason a plain old text diff won’t work with XML. Also, if the document is being interpreted with a DTD or Schema, the presence of an attribute may not be significant, since attributes can have default values.
  2. Whitespace in a node’s text content may or may not be significant, again depending on the schema.
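To make the attribute-order point concrete, here’s a minimal Python sketch using the standard library’s `xml.etree.ElementTree` and `difflib`. The helper names `canonical`, `same_tree`, and `text_diff` are made up for illustration. A line-based diff flags two documents that differ only in attribute order, while comparing trees with sorted attributes does not. Whether whitespace is ignored is left as a flag, because that really depends on the schema, and the sketch does nothing about default attribute values, which would require actually reading the DTD or Schema.

```python
import difflib
import xml.etree.ElementTree as ET


def canonical(elem, strip_whitespace=True):
    """Build a comparable view of an element with attribute order removed."""
    text, tail = elem.text or "", elem.tail or ""
    if strip_whitespace:
        # Only safe when the schema says this whitespace doesn't matter.
        text, tail = text.strip(), tail.strip()
    return (
        elem.tag,
        tuple(sorted(elem.attrib.items())),  # sorting makes attribute order irrelevant
        text,
        tuple(canonical(child, strip_whitespace) for child in elem),
        tail,
    )


def same_tree(a, b, strip_whitespace=True):
    """Compare two XML strings as trees rather than as lines of text."""
    return canonical(ET.fromstring(a), strip_whitespace) == canonical(
        ET.fromstring(b), strip_whitespace
    )


def text_diff(a, b):
    """A plain old line-based diff, for contrast."""
    return list(difflib.unified_diff(a.splitlines(), b.splitlines(), lineterm=""))


doc_a = '<item id="1" lang="en">\n  <title>Diff</title>\n</item>'
doc_b = '<item lang="en" id="1">\n  <title>Diff</title>\n</item>'  # attributes reordered
doc_c = '<item id="1" lang="en"><title> Diff </title></item>'  # whitespace changed

print(len(text_diff(doc_a, doc_b)) > 0)  # True: the text diff reports a change
print(same_tree(doc_a, doc_b))  # True: same tree once attributes are sorted
print(same_tree(doc_a, doc_c), same_tree(doc_a, doc_c, strip_whitespace=False))  # True False
```

The last line shows why whitespace can’t be settled by the diff tool alone: the same two documents are equal or different depending on a choice only the schema can justify.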

I think there are conservative approaches that would solve these problems, but I suspect there are corner cases in these areas that would make defining a general-purpose format difficult. Still, I do believe a good diff format for XML would be a very good thing to have, and it’s better to start simple. Given that, I’m not comfortable starting with XQuery. I don’t know a whole lot about it, but some people whose opinions I respect (on XML, at least) seem to dislike it.
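As one example of what a conservative, start-simple approach might look like, here’s a rough sketch, again in Python with `xml.etree.ElementTree`. The operation names (`set_attr`, `del_attr`, `set_text`, `set_tail`), the child-index paths, and the functions `diff_elements` and `apply_patch` are my own assumptions, not any existing format, and the patch is just a list of tuples rather than XML. It only handles documents whose element structure is unchanged and refuses anything else, which sidesteps the hard tree-matching cases rather than solving them.

```python
import xml.etree.ElementTree as ET

# A patch is a plain list of tuples; the op names and the
# child-index paths are hypothetical, not from any standard.


def diff_elements(old, new, path=()):
    """Return ops turning `old` into `new`, if their element structure matches."""
    if old.tag != new.tag or len(old) != len(new):
        # Conservative: give up instead of guessing how nodes moved.
        raise ValueError(f"structure differs at path {path}")
    ops = []
    # Compare attributes by name, so their order never matters.
    for name in sorted(set(old.attrib) | set(new.attrib)):
        if name not in new.attrib:
            ops.append(("del_attr", path, name))
        elif old.attrib.get(name) != new.attrib[name]:
            ops.append(("set_attr", path, name, new.attrib[name]))
    # Text is compared exactly: whitespace is treated as significant.
    if (old.text or "") != (new.text or ""):
        ops.append(("set_text", path, new.text))
    if (old.tail or "") != (new.tail or ""):
        ops.append(("set_tail", path, new.tail))
    for i, (a, b) in enumerate(zip(old, new)):
        ops.extend(diff_elements(a, b, path + (i,)))
    return ops


def apply_patch(root, ops):
    """Apply ops produced by diff_elements to `root` in place."""
    for op in ops:
        kind, path = op[0], op[1]
        node = root
        for i in path:  # walk child indices down from the root
            node = node[i]
        if kind == "set_attr":
            node.set(op[2], op[3])
        elif kind == "del_attr":
            del node.attrib[op[2]]
        elif kind == "set_text":
            node.text = op[2]
        elif kind == "set_tail":
            node.tail = op[2]
        else:
            raise ValueError(f"unknown op {kind!r}")
    return root


old = ET.fromstring('<item id="1" lang="en"><title>Diff</title></item>')
new = ET.fromstring('<item id="1" status="draft"><title>PATCH</title></item>')
ops = diff_elements(old, new)
apply_patch(old, ops)
print(ET.tostring(old, encoding="unicode"))
```

Because attributes are compared by name rather than position, two documents that differ only in attribute order produce an empty patch.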

— Gordon Weakliem
