However, it's not clear that "open" and "XML" are necessarily joined at the hip.
The debate is muddled because "open" data formats can mean different things to different people:
Data stored in XML is described using a schema, which defines what various tags mean. You could think of standard HTML as defining a schema, which every browser supports; with XML, anyone can define a schema.
An XML format would satisfy rule #1 and arguably satisfy #2, in the sense that XML schemas tend to be self-documenting. Microsoft's current binary format for Word satisfies none of those, so it is certainly not open. With ODFI I am trying to get Microsoft (and other companies) to satisfy #2 only--and even with XML formats I want companies to provide actual written documentation, not simply say "here is our schema, that's all the documentation you need".
Having a data format that satisfies #1 could be useful, but is not a requirement. And I am opposed to pushing for a data format, XML or other, that satisfies #3.
Microsoft has announced that Office 2003 is going to support storing data in XML. So that should make Scott McNealy happy, right? Well, not exactly. The article "At Microsoft's Mercy" [4/23/03] by Kendall Grant Clark captures some of the feelings about Microsoft's use of XML, from conspiracy theories that it is all a publicity scam, to those who think XML is over-hyped in any case.
Microsoft has defined one schema for Word, called WordML, but is also allowing users to define their own schemas in certain versions of Office. Will this help data interchange? As the Register puts it [4/25/03], "In the future, you may be faced with two flavors of nonsense. XML Word documents that have been mangled by Microsoft's XML-creation tools, and XML Word documents that have been mangled by users who add their own non-standard entities."
To really allow complete exchange of data between Word and other word processors, Microsoft would need to support not WordML, but a standard XML schema, one that satisfies rule #3 above. Many people seem to think that storing data in XML would automatically satisfy #3, based on the misperception that XML defines one overall standard schema for all data, or that computers would be able to automatically interpret the semantics of any XML schema. Others feel that doing XML "correctly" requires using a standard schema. None of these beliefs is true, as Microsoft has pointed out, and it apparently has no intention of supporting a standard schema.
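To make the point about semantics concrete, here is a minimal sketch in Python. Both tag vocabularies are invented for this illustration (neither is WordML or any real schema), and the helper name paragraph_texts is my own. A standard parser reads both documents without complaint, but nothing in XML itself tells a program that "para" in one vocabulary means the same thing as "p" in the other; a person has to supply that mapping.

```python
import xml.etree.ElementTree as ET

# Two hypothetical vocabularies describing the same one-paragraph document.
# These tag names are made up for illustration only.
DOC_A = "<document><para style='bold'>Hello</para></document>"
DOC_B = "<doc><p><b>Hello</b></p></doc>"


def paragraph_texts(xml_text, paragraph_tag):
    """Return the full text of every element named paragraph_tag."""
    root = ET.fromstring(xml_text)
    return ["".join(el.itertext()) for el in root.iter(paragraph_tag)]


# Both documents are well-formed, so parsing succeeds either way.
found_in_a = paragraph_texts(DOC_A, "para")

# A reader written for vocabulary A finds nothing in document B...
found_in_b_naive = paragraph_texts(DOC_B, "para")

# ...until a human tells it which tag plays the paragraph role there.
found_in_b_mapped = paragraph_texts(DOC_B, "p")
```

The parser did its job in all three calls; the missing piece in the second call is knowledge about the schema, which is exactly what written documentation or a shared standard would provide.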
The article "Why Standards?" [5/18/03] by Jim Waldo points out that standards that codify existing practice are much better than those that attempt to define something from the ground up. The problem with standards bodies is that they are slow and they can get political. If Microsoft wants to include a new feature in Word and therefore in its WordML schema, what should it do if the standards body that is certifying it a) takes too long to approve it or b) refuses to allow it altogether? Keep in mind that one of the main goals of ODFI is to allow information to be retrieved from a data file long after the program that reads it is gone. The key to this is having the format documented, and it doesn't matter if the documentation comes from one company or from a standards body.
I'll also point out that Microsoft is not going to make XML the default way to store data in Office 2003; the old .doc format will still be used unless the user chooses to save as XML. Microsoft has to do this; otherwise, when one user in an organization upgrades to Office 2003 and starts producing XML documents, everyone else will have to upgrade at the same time or be left unable to read them. In fact Microsoft got roasted for causing this type of disruption when it changed its binary format between Word 95 and Word 97. The only way XML can become the default is to allow several versions of Office to ship that can all read XML; then perhaps in Office 2008 XML can become the default way to save files.
That is not to say that Microsoft's support of WordML has no benefits. To begin with, XML is text-based, not binary, so it is less susceptible to corruption, and a minor typo can be fixed with any editor (the same is true of the existing standard RTF). Also, as this post on XML-DEV [4/18/03] by John Cowan points out, most users do not crack open data formats themselves, but they do want third-party utilities that can do so. While it may take a little while for third parties to support a new flavor of WordML that accompanies a new version of Office, it is easier and more reliable for a third party to change its code to support reading a new XML schema than it is for them to reverse-engineer a new binary data format.
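As a rough sketch of why a third-party utility has an easier time with XML, consider the Python below. The document format is hypothetical (it is not WordML), and the "comment" element stands in for whatever a newer version of a schema might add. The reader is written against version 1, yet it still extracts the paragraph text from version 2, because a generic XML parser handles the unfamiliar element and the reader can simply skip it.

```python
import xml.etree.ElementTree as ET

# Hypothetical format: version 2 adds a <comment> element that the
# version 1 reader below has never heard of.
V1 = "<doc><p>First paragraph.</p><p>Second paragraph.</p></doc>"
V2 = (
    "<doc version='2'>"
    "<p>First paragraph.<comment>check this</comment></p>"
    "<p>Second paragraph.</p>"
    "</doc>"
)


def extract_paragraphs(xml_text):
    """Return the leading text of each <p>, ignoring unknown child elements."""
    root = ET.fromstring(xml_text)
    # p.text is only the text before the first child element, so the
    # contents of any element nested inside <p> are left out.
    return [(p.text or "").strip() for p in root.iter("p")]


paragraphs_v1 = extract_paragraphs(V1)
paragraphs_v2 = extract_paragraphs(V2)
```

With a binary format, the equivalent change could shift every offset in the file, and the utility's author would be back to reverse-engineering. This sketch makes the simplifying assumption that new elements can safely be ignored, which will not hold for every schema change.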
Posted by Adam Barr at May 21, 2003 02:20 PM