June 02, 2003

Why Open Data Format Laws Are Better Than Open Source Laws

With a variety of open source bills introduced, both in the United States and elsewhere, there has been a lot of discussion about open source laws. However, open source laws have problems, both structurally and politically, and I think open data format laws would work much better.

(NOTE: I use the term "open source laws," although in fact some of the laws refer to "free software" instead.)

The reasons are as follows:

  1. Open source laws are too easy to argue against

    The three points mentioned most often in favor of open source laws are cost, security, and open data formats. In the lobbying against open source laws, I have never seen any negative comments about open data formats; the focus is on the cost and security arguments.

    When discussing cost, opponents of open source laws can point out (correctly) that the actual cost of the product is only one part of the total cost; Microsoft quotes a Gartner Group survey putting the number at 8%. Presumably they found the study with the lowest number, but the general fact is correct. Plus, the cost issue likely favors open source more on the server, where administration costs may be lower with open source software; on the client, where Windows is bundled with almost any computer anyway and support involves helping end users with unfamiliar software, open source won't come out looking as good.

    Now, you could argue that even the study that Microsoft is pushing shows that the total cost of ownership (TCO) of open source is only 92% of what it is for proprietary software. The problem is that this then leads to a long debate about how open source affects the other costs of software (installation, support, administration, etc.), and no clear winner will emerge.
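    To spell out the arithmetic behind that 92% figure: write T for the proprietary TCO, L = 0.08 T for the license cost, and O = 0.92 T for everything else (these symbols are introduced here for illustration). Assuming, as the 92% claim implicitly does, that open source leaves O unchanged and removes L:

```latex
\[
\frac{T_{\text{open}}}{T_{\text{prop}}}
  = \frac{O}{L + O}
  = \frac{0.92\,T}{0.08\,T + 0.92\,T}
  = 0.92
\]
```

    The advantage disappears as soon as open source raises the other costs by more than about 8.7% (0.08 / 0.92), which is exactly the point the installation, support, and administration debate turns on.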

    Meanwhile, the security issue can easily get embroiled in a FUD battle between the two sides, each claiming that the other has more crashes and remote exploits, each waving studies that support their claims. If you want to convince a legislature to pass a law causing significant, possibly risky changes in government procurement, you can't get stuck in a battle like that. Keep in mind that properly designed secure file formats are not dependent on keeping the file format itself secret, so nobody should be able to argue that open data formats compromise security.
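    This is Kerckhoffs's principle. A minimal sketch, assuming a hypothetical record layout that is fully published (a 4-byte big-endian length, the payload, then a 32-byte HMAC-SHA256 tag): anyone can read the spec and parse a record, but without the secret key nobody can produce a tag that verifies. The layout and key are invented for illustration, and this shows integrity only; a format whose contents had to stay confidential would also need encryption, under the same rule that the key, not the layout, is the secret.

```python
import hashlib
import hmac
import struct

TAG_LEN = 32  # HMAC-SHA256 tags are 32 bytes long
HEADER = struct.Struct(">I")  # hypothetical published field: payload length, uint32 big-endian


def pack_record(key: bytes, payload: bytes) -> bytes:
    """Build a record in the published layout: length | payload | tag."""
    header = HEADER.pack(len(payload))
    tag = hmac.new(key, header + payload, hashlib.sha256).digest()
    return header + payload + tag


def unpack_record(key: bytes, record: bytes) -> bytes:
    """Parse a record, raising ValueError if it is malformed or fails authentication."""
    if len(record) < HEADER.size + TAG_LEN:
        raise ValueError("record too short")
    (length,) = HEADER.unpack_from(record, 0)
    body_end = HEADER.size + length
    payload = record[HEADER.size:body_end]
    tag = record[body_end:body_end + TAG_LEN]
    expected = hmac.new(key, record[:body_end], hashlib.sha256).digest()
    # Constant-time comparison; a truncated or altered record simply fails.
    if len(payload) != length or not hmac.compare_digest(tag, expected):
        raise ValueError("record failed authentication")
    return payload


if __name__ == "__main__":
    key = b"secret key held only by the agency"
    record = pack_record(key, b"license 12345 renewed")
    print(unpack_record(key, record))  # b'license 12345 renewed'
    # Change one payload byte: knowing the layout does not let anyone fix up the tag.
    tampered = record[:4] + b"L" + record[5:]
    try:
        unpack_record(key, tampered)
    except ValueError as err:
        print("rejected:", err)
```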

    When the debate can be framed in terms of cost and security, the issue of open data formats can be conveniently ignored by opponents. Requiring only open data formats would remove the ability of opponents to attack the cost and security arguments, leaving them to come up with arguments explaining why open data formats are bad, which I have not seen so far. Finally, governments have presumably always considered cost as a factor when evaluating software purchases, and these days they no doubt consider security too; a law that focused only on open data formats would open their eyes to something new that they have probably missed in the discussion of open source laws.

  2. Open source laws are either too inflexible, or require too much work

    Many open source laws seem designed to force a government to replace Windows/IIS/Office with Linux/Apache/*Office, but of course they aren't written that way; they discuss open source and proprietary software in general. They can take one of two tacks: either requiring open source or free software with no alternative, or making it difficult to buy proprietary software (for example, by requiring that each purchase be accompanied by a written justification).

    The first approach takes too simplistic a view of the type of software that governments use. Much of it is customized for specific tasks such as processing drivers' licenses, and the market for providers of such software is presumably small. If software vendors release their software as open source, they may find that cash-strapped governments in other states gladly help themselves to it for free, so the vendor may get only one paying contract instead of fifty. Therefore, it's quite possible that governments won't be able to find companies willing to provide them with open source software, and then what alternative do they have?

    The second approach, requiring case-by-case justification of proprietary software purchases, puts pressure on the wrong people. In a market with few providers, it is not unreasonable to assume that all companies will refuse to provide their software as open source. The fact that government employees will have to do extra paperwork to justify the purchase is of no concern to software companies, as long as the same would be true if the purchase were made from their competitors. Thus the pressure on government employees caused by the need for written justification will likely not be transferred to the actual software vendors who would make the decision to go open source.

    In contrast, with open data formats, you are much more likely to find a software vendor who will open its data formats, since doing so does not impact its ability to sell to others and is an easy way to gain a marketing advantage over its competitors. And if all the companies bidding are willing to open their data formats, then so much the better. An open data format law could be written such that, where the software is not generally available but is sold only to governments, the data formats would only need to be made available to the governments that purchased the software.

  3. Open source laws likely require migration to new software

    Although proponents of these laws may dream that Microsoft is going to open up its source to avoid losing government business, in practice that won't happen. Thus, moving to open source will entail a risky and difficult migration, especially for desktop users. It is much more likely that companies will open up their data formats to hold on to government contracts. It's certainly a gamble, since companies may refuse even to open their data formats and thus force a migration anyway, but it's a gamble that is much more likely to succeed.

  4. Open source laws don't help me personally

    It would be nice for governments to save money on software, but unless you are someone who thinks that governments waste a large percentage of taxpayer money on long lunches and excessive highway landscaping, it won't make a huge difference in your life, and it's not clear open source software would be significantly cheaper in the long run. Similarly, it would be great if governments ran software that didn't allow my personal data to be hacked, but there's no guarantee that open source would be more secure than proprietary software.

    The software that the government buys has little effect on the software that I use every day. Whether the government uses Linux or Windows isn't going to change the operating system that I run. In contrast, open data format laws would have a dramatic effect on my life because (hopefully) the data formats that I use would then be made public. If the government uses Office and the Office data format is made public, then suddenly the copy of Office that I use will be using a public data format.

    With open source software the data format can be determined by examining the source code, so if Microsoft opened the source to Office (which is highly unlikely), then I would, eventually, get some documentation on the Office data format. But this is because people would leap at the chance to document Office; as you consider programs of decreasing popularity, it becomes less likely that someone is going to bother examining their source code closely enough to produce complete documentation of their data formats. I would much rather have an authoritative, company-produced, human-readable specification of the data format than depend on the source code and the kindness of strangers.
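    To make the contrast concrete, here is an entirely hypothetical spec for an invented binary format ("DOC1"), written the way a vendor might publish it, followed by a Python reader built from that spec alone, with no access to any vendor source. The format name, magic bytes, and field layout are all made up for this sketch.

```python
import struct
from typing import List, Tuple

# Hypothetical published spec for the invented DOC1 format:
#   offset 0: magic b"DOC1" (4 bytes)
#   offset 4: format version, uint16 little-endian
#   offset 6: paragraph count N, uint16 little-endian
#   then N records, each: uint32 little-endian byte length L, then L bytes of UTF-8 text
MAGIC = b"DOC1"
HEADER = struct.Struct("<4sHH")  # 8 bytes, no padding with "<"
LENGTH = struct.Struct("<I")


def read_document(data: bytes) -> Tuple[int, List[str]]:
    """Return (version, paragraphs), following only the spec above."""
    if len(data) < HEADER.size:
        raise ValueError("file too short")
    magic, version, count = HEADER.unpack_from(data, 0)
    if magic != MAGIC:
        raise ValueError("not a DOC1 file")
    offset = HEADER.size
    paragraphs = []
    for _ in range(count):
        if offset + LENGTH.size > len(data):
            raise ValueError("truncated length field")
        (length,) = LENGTH.unpack_from(data, offset)
        offset += LENGTH.size
        chunk = data[offset:offset + length]
        if len(chunk) != length:
            raise ValueError("truncated paragraph")
        paragraphs.append(chunk.decode("utf-8"))
        offset += length
    return version, paragraphs


def write_document(version: int, paragraphs: List[str]) -> bytes:
    """Writer for the same spec, used here only to build a sample file."""
    out = bytearray(HEADER.pack(MAGIC, version, len(paragraphs)))
    for text in paragraphs:
        encoded = text.encode("utf-8")
        out += LENGTH.pack(len(encoded)) + encoded
    return bytes(out)


if __name__ == "__main__":
    blob = write_document(1, ["Hello", "Open formats"])
    print(read_document(blob))  # (1, ['Hello', 'Open formats'])
```

    The reader needs nothing but the few lines of spec at the top; someone starting from source code would first have to reconstruct those lines, and for an obscure program nobody may bother.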

  5. Open source laws are too difficult politically

    Open source laws are going to be disruptive to someone: Either Microsoft and other vendors are going to have to completely change their business model, or the government is going to have to do a whole lot of migration and/or justification. Thus an open source law is bound to get significant opposition from both camps, as we saw in the debate on Oregon's law.

    Open data format laws, on the other hand, don't necessarily require much disruption. Microsoft has the Office data format fully documented; it just needs to make the documentation public. Since the format has been reverse-engineered anyway, is this such a disruption? The company can continue to charge whatever it wants for software, under any license it wishes, and can keep complete control of the format, including changing it as often as it wants--as long as it documents the changes.

    Microsoft will undoubtedly still lobby against open data format laws, but its arguments will be weakened significantly. Against open source laws, it can deploy its usual counter-arguments about cost and security at a higher level, claiming that its opponents' arguments are simply wrong; it doesn't have to get into reasons why open source laws would be bad for Microsoft itself, which would come across as much whinier and more self-serving. But I don't see how it can make a philosophical argument that open data formats are generally bad, so I would be interested to see what kind of excuses it can come up with as to why they shouldn't be required.

    In an ideal world, perhaps an open source law would work. But in the real world, we should focus on open data format laws as a battle that can be won.

    Posted by Adam Barr at June 2, 2003 09:58 PM

Comments

1 - Say a word processor developer decides to get rid of the concept of files entirely. Does that mean they have to disclose how they store the information? For example, what if they used their own disc format and wrote sectors/tracks directly?

2 - Where does the difference between data and program lie? OK, most of the time it is plain and clear, but what if the text of a word processor document wasn't stored as text, but rather as a program that was interpreted to give the text layout? Also, isn't a tokenised BASIC program data for the BASIC interpreter?

3 - Why should a programmer document the format of all their data? They may not even know; it may be left purely to how the language they used writes out their data structures. Some of the data may not be designed for 'users' to exchange at all, e.g. database indexes.

4 - Forcing people to disclose how their data is stored is just not right. A programmer should have the right not to disclose how their program works.

Julian

Posted by: Julian at June 4, 2003 07:42 PM

Bear in mind, the following reflects how *I* would implement ODF, not necessarily how Adam or others see it...

1. What you mean is "suppose a developer decides not to use the OS's filesystem", and a database would be a better example, as some actually do do that. I would say that the ability to export to a known data format that maintains 100% functionality would be acceptable though (something I'm very ticked to find that Quicken can't even do to itself: the Mac and Windows file formats are different, the programs can't read both, and the export format doesn't include all the information you need to recreate the database).

2. The important fact is being able to get the data out to use in a competitive program. In this case, the text of the Basic program would be the data for the interpreter.

3. In that case, it's documented as a "struct foo { int xyzzy; };" or some such. On the one hand, being able to dump the SQL out, as many databases can do, would suffice for backups and porting, and would be acceptable on one level. On the other hand (picking on Quicken again, because of personal experience), at one point I was forced to basically toss out my Quicken for DOS database and start over with Quicken for Windows, because it got too large (so they said) and corrupted itself. That is why the formats should be disclosed: I was screwed. They wouldn't do anything, and recovering the data would have been a lot of work even if I'd had the file format; without it, it was out of the question.

4. Although there are some clever proprietary data structure techniques that could be exposed, most of "how a program works" is more than that. And the customer has the right to do what they want with their own data --- being locked into a proprietary technology is "just not right" either.

Posted by: Alan at June 4, 2003 08:19 PM

Ok, fine. A programmer shouldn't have to disclose the way he does something. But we don't have to use his program then either. I don't agree that _all_ data formats should be open, but if it is something the government is using, then it should be open. This way, the government can't tacitly force me to choose one company's software over another's.

It seems to me that Julian's examples are somewhat contrived. I suppose we could all take a huge step backward 20 or so years and write directly to the hard disk, rather than using the filesystem. But in that case, I doubt that the data would be shared anyway.

Posted by: tim at June 4, 2003 09:01 PM

I'm all for making data generated by a program based on user input usable by more than just that program, but why must we always turn to lawyers to accomplish what we want?

Posted by: Tim Louden at June 4, 2003 09:05 PM

Think for a second about farmers who overproduce to make more money, driving prices down through economic choices that are best for each of them, but not for the other members of their industry.
To put it simply, actions in our own self-interest are not always in the general interest of the American people; thus laws become necessary to prevent economic misuse.

Posted by: Jonathan Nicol at June 4, 2003 09:29 PM

You're on the right track, but MS will argue that they are moving completely to open standards: XML and a raft of other W3C formats.

Their implementation, however, makes this in practice no different from their previous proprietary formats, because their formats are so integrated with their software that it's difficult for anyone else to build software that uses them.

Dig around on XML.com and you'll find a number of articles talking about this.

This is why the OASIS project to create an open XML document markup language is not supported by MS, but XML in general is.

The cost of software IS important. Despite the 92% figure, MS makes their money from the sale of software, not from support. And they are moving from a model of software sales to software leasing. Don't pay your bill every month and they cut you off and you can't open any of those Word files or Excel spreadsheets on your hard disk. Open formats won't help that situation at all.

Open Source and Free Software (as I'm sure Richard Stallman has already written to you about) is about the freedom to share and adapt tools to meet your needs. Back when I was living in the States I worked for a few years as a logger. When I bought an axe, the first thing I'd do when I got home was modify both the handle and the head. I'd reshape the handle, put a hole in it so I could attach it to a rope when I was using it up in a tree, and then paint it safety orange. I would regrind the head so that it was better suited for use as a wedge than as an axe, etc. Open source ensures that I can do the same thing with software that I could do with my axe.

Large organizations need to tweak their tools to meet their specific needs. Shrink-wrap software is like pounding a square peg into organizations that have a bunch of holes which vary from department to department, from user to user. You want to add a feature to a package which could save a measurable amount of work in your organization? You can either beg MS to include it or adopt open source and add the feature. Then, after it's added, you toss the feature out into the community and others will take that feature, fix the bugs, and add more features.

Second, if I met another logger who admired how I had modified the axe, I could give or sell it to him, and then buy and modify a new one. I bought a used laptop in a shop in Osaka seven years ago and was mortified to find that they had to wipe Windows off the box before they sold it. They couldn't even give me the disks with the MS distribution that was sold with the laptop. Because the laptop used a specially tweaked version of Windows, when I tried to load Windows from a distribution I got from MS, half of the functions on the laptop didn't work. If the laptop company had used an Open Source OS, this would not have happened.

Open Source is also very important for security -- governments outside of the States are worried that MS has included backdoors in its software that would let it, or foreign governments like the United States, compromise their security. Open Source ensures that you can check that software isn't doing anything it's not supposed to do before you use it.

But you know all of this -- the thing is, laws requiring governments to adopt open source are about ensuring that governments can do all of these things. Money and data exchange are only part of the problem. MS' only purpose is to make money, not good, secure, or useful software. Open Source is an approach that is designed to make software better, more secure and more useful whether it makes money or not.

From recent comments from MS, they believe that their data formats are in many ways more important than their applications. This is because they lock users into their products. If you have an open format which allows you to use that file anywhere then you can easily move to another platform. MS wants to make it as difficult as possible to switch.


You have a good idea, but it will be complex to make it work. Open standards will have to ensure interoperability, portability, and full documentation of the format and API. And all of this has to be done in a way that doesn't kill innovation.

Open software laws won't ensure this, so it's a good thing to work for governments to adopt both. Good luck....

Posted by: Brad Collins at June 4, 2003 09:54 PM

What I see as a primary benefit of open data formats would be that it would keep the government from being tied down to a single software provider. If they decided that the royalties they were paying were too much, they could contract another company to write software that used their existing data. I think this basic idea should be the driving force behind any bill. For office programs, the data documentation would be simple: just spell out the file format. The answer for how to treat databases (or those who for some reason wrote to and from a disk directly) is similarly easy. As long as another company could step in and (relatively) easily write a program that reused the data, then you would meet the requirements. If, on the other hand, the new company could not easily follow the data specifications, then the original software provider should be held financially liable for any increases in cost due to not providing enough information about their data format. One wrench in this thought would be if the original company went out of business, but from the length and wordiness of bills I've seen, I imagine they could make adequate provisions.
My second thought would be how to treat the data from operating systems. I don't know enough about operating systems to offer a solution, but I do know enough to know that the data used by operating systems is highly complex. Other operating systems can already read Windows partitions, so is that enough? Personally, I don't think so. But at the same time it would be unrealistic to require cross-platform compatibility for software the government purchases. Furthermore, the situation is made more complex by the fact that software companies (usually) write their code for a specific platform. Since the major point (in my opinion) of the open data initiative would be to keep governments from being tied to a specific company, it would be nonconstructive for their software purchases to tie them down to a certain operating system. In my humble opinion, the OS issue of open data formats is the most sticky, and is the one I am the most interested in seeing addressed in the future.

Posted by: Chris at June 4, 2003 09:57 PM

How do you tell if documentation for a format is adequate? The idea is to have enough information that someone can write code to read and interpret data in the format, but that doesn't make a good definition. The only way to know that it has been met is to have someone write the code, which is a pretty expensive test. Consider a format that has been reverse-engineered -- zero documentation turned out to be adequate, though it took much more work than it should have. At the other extreme, many people couldn't write code to parse the simplest format.

Where does the format end? If you take a proprietary binary format and wrap it in XML (e.g. as a huge CDATA element), it's in XML, which is an open format, but the data isn't in an open format. Similar stunts can be pulled using extensions. If the format contains macros, then you need specs for the macro interpreter. And if the spec tells you how to extract a bunch of numbers, but doesn't say what the numbers mean, is it adequate? In some cases it might be -- but can a judge or a bureaucrat tell?
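A minimal sketch of the XML-wrapping stunt described above, using Python's standard xml.etree.ElementTree. The element names and the eight "vendor" bytes are made up; base64 stands in for the CDATA element because XML cannot carry arbitrary binary bytes directly. The file parses as perfectly valid XML, yet all a reader gets back is the same undocumented blob.

```python
import base64
import xml.etree.ElementTree as ET

# Opaque vendor bytes (invented); nothing in the XML says what they mean.
vendor_blob = bytes([0x00, 0xFF, 0x13, 0x37, 0xDE, 0xAD, 0xBE, 0xEF])

# Hypothetical "open" document: well-formed XML wrapping the blob.
root = ET.Element("document", version="1.0")
payload = ET.SubElement(root, "payload", encoding="base64")
payload.text = base64.b64encode(vendor_blob).decode("ascii")
xml_text = ET.tostring(root, encoding="unicode")

# Any XML parser accepts it...
parsed = ET.fromstring(xml_text)
recovered = base64.b64decode(parsed.find("payload").text)

# ...but the "open" container hands back exactly the same opaque bytes.
print(xml_text)
print(recovered == vendor_blob)  # True
```

The XML layer is fully open, but it says nothing about what those bytes mean, which is why a usable definition has to cover the contents and not just the container.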

How about if a format is the native format for some app and isn't used by anything else? There may not be a spec. The code and the format may have co-evolved haphazardly. And you can't trust a FUDing weasel of a company to do much quality control on their specs. So, even though source is an extraordinarily inconvenient definition of a format, it's the only definition of the format you can be sure is adequate in these cases.

For me, the practical definition of an open format is that there must be a feature-complete implementation that either (A) is open source, or (B) was written using only the spec and other publicly available information. That's awfully close to requiring use of open source.

An aside -- be sure that your definition of a data format is broad enough to capture any means of interchange, not just bits on a disk. This explicitly includes both protocols and data formats used in networking. If only one vendor can implement a protocol, it's awfully difficult to migrate away piecemeal, so you still have vendor lock-in (though you aren't likely to lose data).

Posted by: Dan at June 4, 2003 10:03 PM

I'd like to take a different tack on this...

The data stored in a program belong to us, and the metadata for that data as well (text formatting, or which account a transaction was performed against in some accounting programs: they have a file per account, and don't document account numbers). Shouldn't there be a law stating that no company bidding to handle public data may store data and metadata in an obfuscated format?

In a government's case, it's obviously even more important... As the ownership of the data is not to any one person... And in many cases, the data is so important that impartial validation of the data is required... (Think about saving data about who gets a pardon, and who doesn't, without some form of data redundancy AND proper paper archive storage and you're obviously playing with someone's life and freedom)

Now obviously, such data doesn't get software upgrades all that often... Mostly because of the cost of mistakes... But vendor lock-in is obviously also a factor...

Data storage standards already exist in some cases, but aren't mandatory... Wouldn't setting mandatory open standards be a bit tricky without having standards in all those cases(before people think about XML, you gotta remember that an XML schema is trademarkable, copyrightable, etc...)

Or would saying that any data stored for governmental purposes has to be documented in a format whose full technical description is in the Public Domain be sufficient?

I am not a lawyer, so I'll conclude with a question:
to cover the angle of non-file storage, would a law saying that ALL data and metadata must be exportable "at any time of the lifetime of the product plus ten years" to any other software product the government is licensed to own resist the legal test?
The idea is that the software company does not own any interest whatsoever in the data itself... yet the openness of the data guarantees that the software cannot contain easter eggs, export file bugs, or any other kind of "deniable intended feature" that would prevent the client and data owner, in this case the government, from taking its business elsewhere.

Posted by: perlchild at June 4, 2003 10:05 PM

I have another quick thought from the post Brad put up while I was writing mine. The term "data" can be applied many ways. Most simply, it is a file a program uses. Most complex, it could define what gets passed to and from different parts of a program. To continue my thoughts about OS's, they are designed to pass "data" back and forth between different programs and also from programs to devices. If the "data" in "open data format" is applied like I used it for OS's, then I think it could solve many problems. Basically, require that any software allow additional features to be added by anyone. There are already programs that allow users to write modules for them without even seeing the original source code. Thus, any new software provider wouldn't have to re-invent the wheel if the government wanted new features. This obviously would make the definition of "data" much more complex, but nothing in life worth doing is easy.

Posted by: Chris at June 4, 2003 10:17 PM

I'm not a lawyer or a programmer, just some troll who is trying to see the story.

From what I have read, ODF is a competing standard to proprietary formats.

Didn't TCP/IP win out over other transmission standards because it was open, free, and standardized? Microsoft made a try with its own standard and eventually gave up. Novell NetWare is an also-ran...

When it comes to SQL, we have something close to a standard, getting closer to an ODF format. Oracle, Microsoft, and IBM implement slightly different versions; the compatibility work gives many contractors employment.

CSS and HTML are another case of a varied format because no one has put a foot down. MS and Sun Java muddle the picture even more.

I'm all for the U.S. Federal Government defining the standard in which it will do business. Open up the Data Format to play, and those who care to compete can; all others are free to make their own way. As always.

I suspect that if the U.S. Federal Government adopted a Standard (i.e. Open) Data Format, many private organizations would adopt it as a matter of course.

Am I off the mark?

Posted by: John T. Leipold at June 4, 2003 10:23 PM

Some answers to questions raised:

General: I do *not* consider an XML format to automatically be "open"...there is an article on the site called "The Openness of XML formats" (see under "Recent Entries") where I discuss this.

Julian: I doubt any company is going to change to write raw data to disks. How can you back it up, attach files, copy it over the network, etc, etc?
As for forcing companies to disclose...you are not doing that, just making it a condition of getting government sales.

Alan: Good point about requiring export of data in a known format. Maybe that could be one of the preferred alternatives. Although it requires a "known format" and I don't want this to get into defining any new standards.

Chris: I almost did not put in anything about file formats, but I know that some companies do want to know those details (for things such as restoring wiped disks), and it seemed arbitrary to limit it to only application formats. Plus, one of the main goals is to allow recovery of data, so let's say you have a disk by itself with no OS around to read it--then recovery would require that the file format be documented also.

Dan: The free software bill proposed in Peru did define an "open data format" as one that has source code available (among other things). I agree some way to verify the validity of a spec would be nice, but I did not want to try to write it into a bill. Is there really a company out there writing software that reads/writes a file format and they have no internal documentation on the format? Ack.

John: ODFI is not meant to define any new formats--just get documentation on existing ones.

Thanks for all your comments.

- adam

Posted by: Adam Barr at June 4, 2003 11:06 PM

Isn't this data format stuff somewhat of a red herring in terms of migration? Say the government or whoever wants to switch to different software... I can't speak for every custom developer, but at least for MS Office, you can export things into ASCII text for Word, CSV for Excel... sure, you lose pretty formatting, but I can't imagine it would be enough to prevent you from switching if you really wanted to. 90% of users use 10% of the features of MS Office anyway, so those formats are probably good enough. I, like many of my fellow Linux users, once had reams of documents in WordPerfect, MS Word, and other formats... I've also switched from StarOffice to AbiWord on the Linux side...

I think the real bugger about proprietary formats is when some bonehead DOESN'T export into ascii or whatever and assumes you're using whatever software they are... it's totally annoying to get an email with a .doc attached... drives me crazy.
I honestly don't give a shit what format people put their data in, as long as they're courteous enough to put it into some exchange format I can understand. Though I find it difficult to try and legislate good manners...

Posted by: David Donovan at June 4, 2003 11:57 PM

It seems to me there are a number of major issues involved in generating any type of law. Specifically:

1) any law must not promote or prohibit the use of either "open source" or proprietary solutions.

2) laws and processes for governmental use do not necessarily have to be modelled after private sector use, further complicating the need to focus any law on distinguishing these attributes.

The private sector is more difficult to maintain and will eventually work itself out - who really wants to pay a monthly/yearly fee for a product that has bugs and makes it impossible (not impossible - you merely have to break copyright code protection laws or pay an outrageous fee to be able to support their proprietary format that you have "chosen" not to support) to interoperate with other "non-supported" systems?

So, focusing on the slightly easier governmental application, there are a number of items that can be addressed on a data-type basis. Data can categorically be described as one of the following (not necessarily inclusive of ALL data):

-Public data not requiring protection
-Public data that must be protected
-non-public data not requiring protection
-non-public data that requires protection
-data that requires purpose-driven differentiation (such as leading to the always-popular "paper-trail" for voting, counting, and reporting results yet MUST protect the integrity of the individual's data without leaking personal information).

With the abuses hinted at by various organizations coming down the road at us in the relatively near term, it is the government's job not to tell private industry "how" to work, but to determine "what" the government is willing to accept for its use. Thus, while government should not tell company A that a subscription-based OS is not acceptable, it should choose not to use one.

With the coming availability of various flavors of OS options and their ever-increasing use, the OS market will eventually work its way out of the currently looming possible disaster. The immediate problem is that the impact (i.e. not being able to use my files, rapid inability to effectively use the internet, etc) is not understood/recognized by most non-technical people.

Posted by: Jason at June 5, 2003 05:27 AM

I wonder if there is a closed format many governments use whose closed-ness is preventing work from getting done and causing problems that could be solved by a conversion program? What if the open source community identified such a format, wrote the conversion program for it, and showed governments how to use it?

If we did this and scored a big win before introducing such open format bills, we could keep bringing it up during the debate on the bill as an example of how much money/effort could be saved.

The surest way to convince people of the utility of open formats would be to show them how to solve a serious problem they are helpless to solve any other way.

Posted by: Chris Marshall at June 5, 2003 07:46 AM

Although there may be merit in such a bill, my feeling would be strongly against it, however...

My feeling is that governments should NEVER promote a proprietary format (e.g. PDF files). It is the most irritating thing in the world to me to have to go to another computer to look at some file formats. I say PDF as an example because our government here in Australia always uses PDF files, but they neglect to acknowledge that PDF files only work on computers with PDF viewers.

My main system doesn't have a good PDF viewer, and certainly not one compatible with the latest PDF formats, so I have to resort to my Mac or Windows machine to look at these, which I just despise.

MS Word documents are just as bad; the Government actively promotes the use of MS Word.

But again, that is my opinion ;)

Posted by: Julian Cassin at June 5, 2003 08:07 AM


Adam:

"Julian: I doubt any company is going to change to write raw data to disks. How can you back it up, attach files, copy it over the network, etc, etc?
"

I have several programs that do just that. You are assuming perhaps a Windows or Mac system?

Posted by: Julian Cassin at June 5, 2003 08:11 AM

Dave:

You have hit the nail on the head!

"I think the real bugger about proprietary formats is when some bonehead DOESN'T export into ascii or whatever and assumes you're using whatever software they are... its totally annoying to get an email with a .doc attatched.. drives me crazy."

I can accept that companies can be losers and do this, but I think that it should be law that the Government does NOT do that.

Julian

Posted by: Julian Cassin at June 5, 2003 08:14 AM

What about serialization? Java and C# offer a feature that will automatically save an object without the programmer having to manually write each data member into its own predetermined slot. It's a hell of a timesaver when coding. (Keep in mind that a file that's a serialized object is much easier to reverse-engineer than a traditional data file.)

Posted by: Andy R at June 5, 2003 09:02 AM
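To make the serialization point concrete, here is a minimal Java sketch (Java 8 or later) contrasting built-in serialization with a hand-written layout. The `FormRecord` class and the method names are hypothetical, for illustration only. With `ObjectOutputStream`, the byte layout is defined by Java's Object Serialization Specification rather than by the application, so documenting such a file involves both the application's classes and that specification. With `DataOutputStream`, the application author alone decides, and would have to document, every byte.

```java
import java.io.*;

public class SerializationDemo {
    // Hypothetical record type used only for illustration.
    static class FormRecord implements Serializable {
        private static final long serialVersionUID = 1L;
        String name;
        int count;
        FormRecord(String name, int count) { this.name = name; this.count = count; }
    }

    // Built-in serialization: the layout (stream header, class descriptor,
    // field values) is defined by the Java serialization spec, not by this program.
    static byte[] serialize(FormRecord r) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(r);
        }
        return bytes.toByteArray();
    }

    static FormRecord deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (FormRecord) in.readObject();
        }
    }

    // Hand-written layout chosen (and documentable) by the programmer:
    // a 2-byte length plus modified UTF-8 for the name, then a 4-byte big-endian int.
    static byte[] writeExplicit(FormRecord r) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(r.name);
            out.writeInt(r.count);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        FormRecord r = new FormRecord("budget", 42);
        byte[] ser = serialize(r);
        FormRecord back = deserialize(ser);
        System.out.println(back.name + " " + back.count);
        // Every Java serialization stream begins with the magic number 0xACED.
        System.out.printf("%02X%02X%n", ser[0], ser[1]);
        System.out.println(writeExplicit(r).length);
    }
}
```

Running `main` prints `budget 42`, then `ACED`, then `12` (2 length bytes + 6 bytes for "budget" + a 4-byte int). The serialized form is much larger and carries class metadata, which is part of why Adam notes that its documentation responsibility is split between the application author and the language vendor.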

David: The problem with allowing programs to get by if they export to another format is 1) those formats are not as rich as the native format and 2) people don't use them by default, so restoring data at some point in the future (when the hardware/OS to run an application may not be around anymore) would still be impossible.

Jason: Perhaps I should take out the mention of open source as the #1 preferred alternative, and put something more generic like "software written in such a way that it is likely that third parties will produce documentation on its file formats" (that phrasing needs work). I agree government is a great way to a) encourage companies to change their ways and b) focus attention on issues that private users may be unaware of.

Chris: Well, we don't want proprietary companies to say "see we don't need to doc our formats because those open source people are documenting them." Actually you remind me of a very important point, which is that reverse engineering itself may be or become illegal. If you go look at Bruce Perens' Sincere Choice site (which I discuss a bit elsewhere on the ODFI site) one of his goals is Open Standards (http://www.sincerechoice.org/Principles/Open_Standards.html), in which he states: "We support reverse-engineering for purposes of compatibility, and oppose legislation that would restrict it. Reverse-engineering is the only tool that competitors can use against a vendor who is not receptive to open standards." I agree 100% with this but I did not want to include it in the open data format bill, both to keep it simpler and more focussed, and to take away an argument from the opposition.

Julian: I think PDF would be considered generally "open", since the format is known, although it is a bit tricky since authoring in it requires proprietary tools.

If a program writes raw files, then that would be covered under the "on disk file format" provision in the bill, so they would have to document it.

Andy: Serialization is tricky because the documentation responsibility would be split between two people, the author of the code to document their data structures, then the author of the underlying language (Java, C#, etc) to document their standard algorithms for storing data structures. I will have to think about that.

Posted by: Adam Barr at June 5, 2003 09:25 AM

One *really important* thing you missed: the document standards must be *royalty-free*! M$ isn't the only company which will happily publish an "open standard" and then bury you in paper when you dare to use it without paying them a Million Bill-Bucks. ;)

Posted by: A. N. Mouse at June 5, 2003 05:28 PM

It's an interesting problem. If a standard already exists and is patented and royalties currently charged, you don't want to abscond with someone's intellectual property. But you also don't want companies running out and patenting their heretofore secret data formats the day before they release them, and then charging for them.

Perhaps the bill could require royalty-free licenses for the government, but only for the purpose of reading the data format, not writing it.

- adam

P.S. Is your middle name "Nonny"?

Posted by: Adam Barr at June 5, 2003 11:10 PM

I disagree with you on one point, which in the greater idea is rather small.

It *does* matter which operating system a government uses if that operating system does not permit the citizens of the country to communicate electronically with the government. On this point I must disagree, but I think constructively so.

The issue is proprietary vendors' use of open standards. If the proprietary vendors were to agree to open standards, then your point would be true. However, that is not the case now.

The government requires DoD developers to have processes that allow software to be recreated; it could also enact legislation requiring proprietary vendors to use open standards, so that your statements would hold true.

If all governments did this, then the world would be a better place. :)

Posted by: Taran at June 5, 2003 11:34 PM

Adam:

I think that using reverse engineering to solve a problem on a lot of government employees' minds would build sympathy within governments toward keeping reverse engineering legal, don't you?

Actions speak louder than words in convincing people of a proposal's worth.

If companies tried to say, "We don't need to document our file formats because open source hackers are doing it for us," not only would they be conceding that reverse engineering should remain legal, they would be admitting that open source is better. Let them say that, please!

Chris Marshall

Posted by: Chris Marshall at June 6, 2003 08:42 AM

Taran: Do you have information on what rules the DoD has about allowing software to be recreated? There may be some useful language in there I could use in an open data format bill. In particular I have a part in there now that I don't like, saying that open source software is the first best alternative to having a documented data format. I don't like it because it sounds like the bill is pushing open source. So maybe I can change it to be some language about "allowing the data formats to be recreated"...and the DoD rules may have some good ideas on how to phrase that.

I think the rest of what you write is about two things: open network protocols, and requiring everyone to use standards. I think open network protocols are a great thing, but I don't want to encumber an open data format bill with them. Plus I think a) there are fewer network protocols out there than there are data formats and b) most of them are standards anyway.

Requiring everyone to use standards I do not agree with. I think it is unfair to require companies to wait for standards bodies to support new things. It's different with network protocols; once you have TCP defined, you can send all kinds of new things over TCP. But with a file format, if you want to put new things in, you have to change the file format.

Chris: You may be right, but I don't want to take a position on open vs. closed source in the bill. Reverse engineering is a related but separate issue from open data formats. In fact, I am not sure right now what its legal status is everywhere. I will try to look into that and post information on the site.

Posted by: Adam Barr at June 6, 2003 09:23 AM

Heh. Good thing I checked back. :)

The case is (at least during the late 90s it was) that DoD contractors had to reach Level 3 of the Software Engineering Institute's Capability Maturity Model (CMM) when selling software to the Department of Defense. In a way it was self-defeating, because of the very nature of the CMM (by making it a requirement, the DoD undermined the CMM, in my opinion).

The underlying idea was good: the software manufacturing process should be repeatable, which means that everything has to be documented, and that documentation has to be available. It's good software engineering practice. With FLOSS, it's done automatically because of the nature of the beast.

Basically, if the DoD contractor goes bankrupt, all information to recreate the software sold to the DoD (whole or part of a parcel) would have to be available so that another company could do the same. If the senior engineers involved get hit by a beer truck, it shouldn't be the end of the project.

As far as the standards - my point on that is not that everyone must conform to standards (standards are sometimes created by not following standards!), but rather that the standards that are used should be open. Maybe I worded it wrong. Admittedly, I was tired when I commented.

If the government requires me to file a tax return, I should not be required to use specific software that either costs me money or does not run on my system. If the format is proprietary, one runs the risk of creating a monopoly for someone at the taxpayer's expense.

I hope that clarifies what I was trying to say. :)

Posted by: Taran at June 6, 2003 11:55 AM

Interesting. I assume part of the information a DoD contractor would have to put together would be the file formats used for any data?

I have thought about making it some sort of "escrow" arrangement, where companies would not have to release their formats publicly but would put them somewhere the government could get to them if the company went bankrupt. If the bill were *purely* about preserving data for the future, that would work. But another benefit of open formats is that they make it easier and cheaper for the government to access data right now, because it, or another company, can write software to do so. There is also the idea of the government doing something that has spillover benefits to the public, if the public also gets access to the data formats.

I am wondering how to handle data formats that are currently licensed for a fee. I am worried about disallowing that because the law could be accused of trying to take away existing intellectual property rights. Plus the vendor of the software may not have the authority to grant a royalty-free license to a data format. But I don't want every company that has a proprietary data format to turn it into an open but expensive-to-license format. The Oregon bill talks about data formats that are "free for all to implement with no royalty or fee except for a fee or fees required by the standards organization for certification of compliance"...I need to figure out what to do about this. Suggestions welcome.

Posted by: Adam Barr at June 6, 2003 01:35 PM

Right. I'd love to give some specific examples, but it's probably better to make one up.

Let's say that a DoD contractor has a special data format that they made up to meet the requirements of the DoD given a specific widget. This data format is proprietary, though this information is shared amongst other DoD contractors who help design, build, test and use this widget.

Licensing agreements and NDAs between these companies involved with the widget exist on a 'need to know' basis, but the data format remains the property of the particular company that created it. This changes on a case by case basis.

With data formats used to interface with the public, though, it seems that the public shouldn't have to pay a license fee, since this is a double fee (you pay the government, then you pay the company that owns the format). The danger in this argument is that the licensing company could very well charge the government the total of both, so it costs the same. Shouldn't the government therefore purchase the format and allow its use?

These are two extremes, and there's plenty in the middle. Things like law enforcement should probably not have their data formats public, and yet they should be open to competition for software, lest the government limit itself to less than it should.

The easy way out is to say "It will be looked at on a case by case basis", but that is certainly not infallible. Perhaps the problem is defining a proper process for evaluating each case?

Posted by: Taran at June 6, 2003 02:16 PM

Perhaps one thing is that this discussion relies heavily on licences.

All through my computer career, everything I have written has been totally licence-free, so anyone can do as they wish with it, even sell it under their own name for all I care; that includes my "commercial" software. Admittedly my business does not make its money from software, but it shows how little I try to protect it. The point I am trying to make is that licences are not always needed or required, and often have nothing to do with anything; in my view, whoever thought of the idea should be shot.

Regards, Julian

Posted by: Julian Cassin at June 7, 2003 09:07 AM

One problem is where the license is from a third party (such as a standards body), not the vendor of the software. So the vendor is not the one who can decide to grant a royalty-free license (or not to require a license).

- adam

Posted by: Adam Barr at June 7, 2003 12:32 PM

One answer is for the government's evaluation process for software purchasing to treat the cost of licensing the data format as part of the cost of the project: tell us how much we're going to have to pay you to do the project (this is all we're going to pay you) and how much we'd have to pay to make the data format available to everyone to use/implement/... freely (we're not going to pay you this today, but you have to enter into a binding contract to be in a position to deliver this to us, at any future date of our asking, for this price, plus no more than inflation). When we compare two bids for the contract, it's the total of these two numbers we'll be considering; but we'll only be paying the first.

Not a pretty idea, but one which might work. It'd make it prohibitive for the software provider to bid using a data format which the software provider isn't in a position to make public. Good.

The worry about robbing someone of their IP isn't worth any time; specifically, the point of open standards should be that the government and public should not be encumbered in their ability to use the data; *any* kind of IP control, whether trade secret or patent, should be a black mark against the data format, since it encumbers access to the data.

The case of standards with formal compliance certification is interesting. I suspect the correct solution is for the government to accept the cost of having the software tested for compliance, but for the contract to be void (i.e. the contractor doesn't get paid) if the software doesn't pass within a modest number of iterations of the test-fix-retest cycle, with each iteration losing some slice of the payment the contractor would receive.

The problem of deciding what's an open standard is indeed an intricate one. If a public domain or copyleft "reference implementation" is available, and there are no legal restrictions on someone else producing their own (copyleft, proprietary or public domain) implementation, that should be good enough. Crucially, no matter how widely the standard is published, if it's encumbered by a patent or other restrictions on who may implement it, it's not open enough to let the citizenry write our own software to enable us to communicate with the government about the data.

Posted by: Edward Welbourne at December 12, 2003 12:51 PM
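Edward Welbourne's bid-evaluation rule can be read as a simple calculation: rank bids by the project price plus the quoted price of later releasing the data format for everyone to use, but pay only the project price today. The Java sketch below (Java 16 or later, for records) assumes that reading; the `Bid` record, method names, and figures are hypothetical, chosen only to illustrate the arithmetic, and the inflation-capped future price is not modeled.

```java
import java.util.Comparator;
import java.util.List;

public class BidEvaluation {
    // Hypothetical bid: both prices in the same currency unit.
    record Bid(String vendor, long projectPrice, long formatReleasePrice) {
        // Bids are compared on the total of both numbers...
        long evaluatedCost() { return projectPrice + formatReleasePrice; }
        // ...but only the project price is paid now; the release price is
        // merely a contractual commitment the government may invoke later.
        long amountPaidNow() { return projectPrice; }
    }

    // Picks the bid with the lowest evaluated (total) cost.
    static Bid winner(List<Bid> bids) {
        return bids.stream()
                   .min(Comparator.comparingLong(Bid::evaluatedCost))
                   .orElseThrow();
    }

    public static void main(String[] args) {
        List<Bid> bids = List.of(
            new Bid("Vendor A", 900_000, 2_000_000), // cheaper project, costly-to-open format
            new Bid("Vendor B", 1_100_000, 0));      // format already freely usable
        Bid w = winner(bids);
        System.out.println(w.vendor() + " wins; evaluated at " + w.evaluatedCost()
                           + ", paid now " + w.amountPaidNow());
    }
}
```

With these made-up figures, Vendor A's lower project price loses: its evaluated cost is 2,900,000 against Vendor B's 1,100,000, so the program prints `Vendor B wins; evaluated at 1100000, paid now 1100000`. This is the effect the proposal aims for: a format the vendor cannot afford to make public becomes a prohibitive line item in the bid.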