Comments: The Openness of XML formats

An open, XML-based file format is good for two reasons:

A. it's open (which is good for #2 above)
B. it's XML

An XML-based file format is good for a variety of well-known reasons:
- it can be validated;
- it is easy to parse (compared to binary and non-XML text formats; XML parsers are common; many people understand XML);
- third parties can extend the file format with a variety of standard techniques;
- it's a better foundation for content reuse, document automation, and other good practices.
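
As a rough illustration of the "easy to parse" point, here is a minimal sketch using only Python's standard library. The memo document and its element names are made up for the example. Note that the standard library only checks well-formedness; validating against an XSD would need a separate schema-aware library.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the element names are invented for illustration.
SAMPLE = """<memo>
  <to>Procurement</to>
  <subject>File formats</subject>
  <body>Open and XML is better than just open.</body>
</memo>"""


def read_memo(text):
    """Parse the memo and return its child elements as a tag -> text dict."""
    root = ET.fromstring(text)
    return {child.tag: child.text for child in root}


def is_well_formed(text):
    """Return True if the text parses as XML at all (not schema validation)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


memo = read_memo(SAMPLE)
print(memo["subject"])
```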

It is clear that being both open and XML is better than just being open.

Posted by Jason Harrop at June 5, 2003 12:19 AM

A point of clarification: assuming the reference below is still current, if you buy the Microsoft Office Professional 2003 Enterprise Edition or Microsoft Office Professional Edition 2003, you get support for "Customer-defined XML schema" (as it is called on that web page, but it just means a plain old XSD file).

See http://www.microsoft.com/presspass/newsroom/office/factsheets/OfficeSKUFS.asp

Posted by Jason Harrop at June 5, 2003 12:32 AM

Agreed. Open *and* XML is better than just open. But I would rather have open and ugly-binary (with real documentation) than XML and no documentation.

Posted by Adam Barr at June 5, 2003 06:50 AM

I agree completely about documented vs. undocumented. However, it might be useful for the success of your project if you would try to clarify at the top level:

1) What kinds of data formats you're talking about. XML is a good choice for comparatively simple, non-realtime document types, like those of office applications. However, for things like playable media data (audio, video, etc.), XML would balloon file sizes absurdly, and transmission rates would plummet to the point of unusability. In other words, for technical reasons there are domains where 'documented but binary' has its place.

2) Whether or not you're addressing the issue of IP-rights encumbrances. In other words, a file format could be both documented and XML-based, but nonetheless copyright- or patent-protected, such that it cannot be legally used without a license agreement, and perhaps even payment of money. I suggest you consider adding 'IPR-unencumbered' to your short list of important criteria. This is probably your assumption, but disambiguating it would be good.
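
To put a rough number on point 1, here is a small sketch comparing a packed binary encoding of 16-bit audio samples with a naive XML encoding of the same values. The sample data and the element names are invented for the example, and a real XML media format could be designed less wastefully than this one.

```python
import struct

# 1000 made-up 16-bit PCM samples (hypothetical data, not real audio).
samples = [(i * 37) % 65536 - 32768 for i in range(1000)]

# Binary: two bytes per sample, little-endian signed 16-bit integers.
binary = struct.pack("<%dh" % len(samples), *samples)

# Naive XML: one element per sample (element names are invented).
xml_text = "<audio>" + "".join("<s>%d</s>" % s for s in samples) + "</audio>"
xml_bytes = xml_text.encode("utf-8")

# Print both sizes in bytes so they can be compared.
print(len(binary), len(xml_bytes))
```

Each sample costs exactly two bytes in the binary form, while the XML form pays for the tags plus the decimal digits of every value.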

Don't mean to nit-pick, just trying to be helpful as I think your basic idea here is a good one.

Posted by Chris Grigg at June 5, 2003 02:50 PM

1) I am not talking about any particular data format--whatever manufacturers want to define. I think XML and open is better than open, but I don't want to codify a preference for any particular format in the bill, because then what does that say about a company with an open non-XML format? I don't want to make anybody move to a new format unless they want to.

2) Patents etc. are a sticky issue. Again, I don't want to take away any rights someone already has or make a format they own worth less. So you can't just say "only formats with no IP encumbrance." However, I will put something in there saying that if there is a patent on a data format, the company has to license it to the government in a Reasonable and Non-Discriminatory (RAND) way.

If I were being more hardcore I would say they had to license it to the government for free, but that would be an easy target for industry opposition.

- adam

Posted by Adam Barr at June 5, 2003 03:15 PM