Trust but verify: Are you sure this document is real?

This post continues the theme of the “leaked” document from a systems supplier in the community that was posted last week. One question that few asked initially about this document is: “Is it real?” In this case, less than 24 hours after the document was “released”, the author confirmed that he had written it and that it had been circulating for some time. Still, it is amazing what a stir can be started by posting a PDF document anonymously on the Wikileaks website, regardless of its provenance.

Last week was the 40th anniversary of the “birth” of the internet, when two computers were first connected using a primitive router and the first message was transmitted between them: “Lo”. They were trying to send the command “Login”, but the systems crashed before the full message was sent. Later that evening, they were able to get the full message through, and with that the internet, in a very nascent form, was born. During a radio interview that week, Dr. Leonard Kleinrock, Professor of Computer Science at UCLA, who was one of the scientists working on those systems that night, spoke about the event. When asked about the adoption of IP Version 6, Dr. Kleinrock gave a fascinating response:

Dr. KLEINROCK: Yes. In fact, in those early days, the culture of the Internet was one of trust, openness, shared ideas. You know, I knew everybody on the Internet in those days and I trusted them all. And everybody behaved well, so we had a very easy, open access. We did not introduce any limitations nor did we introduce what we should have, which was the ability to do strong user authentication and strong file authentication. So I know that if you are communicating with me, it’s you, Ira Flatow, and not someone else. And if you send me a file, I receive the file you intended me to receive.

We should’ve installed that in the architecture in the early days. And the first thing we should’ve done with it is turn it off, because we needed this open, trusted, available, shared environment, which was the culture, the ethics of the early Internet. And then when we approach the late ‘80s and the early ‘90s and spam, and viruses, and pornography and eventually the identity theft and the fraud, and the botnets and the denial of service we see today, as that began to emerge, we should then slowly have turned on that authentication process, which is part of what your other caller referred to is this IPV6 is an attempt to bring on and patch on some of this authentication capability. But it’s very hard now that it’s not built deep into the architecture of the Internet.

The issue of provenance has been a critical gap in the structure of the internet from the very beginning. At the outset, when the number of computers and people connected to the network was small, authentication and validation would have been significant barriers to a working system. If you know and trust everyone in your neighborhood, locking your doors is an unnecessary hassle. In a large city, where you don’t know all of your neighbors, locking your doors is a critical routine that becomes second nature. In our digital environment, the community has gotten so large that locking the doors, that is, using authentication and passwords to ensure people are who they claim to be, is essential to a functioning community.

Unfortunately, as Dr. Kleinrock notes, we are in a situation where we need to patch some of the authentication and provenance holes in our digital lives.  This brings me back to the document that was distributed last week via Wikileaks.

There is an important need, particularly in the legal and scientific communities, for provenance to be assured. With digital documents, which are easily manipulated or created and distributed anonymously, confirming the author and source of a document can be difficult. Fortunately, in this case, the authorship could be and was confirmed easily and quickly. However, in many situations this is not the case, particularly for forged or manipulated documents. Even when denials are issued, there is no way to prove the negative to a doubtful audience.

The tools for creating extremely professional-looking documents are ubiquitous. Indeed, the same software that most publishing companies use to create formal published documents is available to almost anyone with a computer. It would not be difficult to create one’s own “professional” documents and distribute them as real. The internet is full of hoaxes of this sort, and they run the gamut from absurd to humorous to quite damaging.

There have been discussions about the need for better online provenance information for nearly two decades now. Although some work on metadata provenance is gaining broader adoption, including PREMIS, METS and DCMI, significant work on standards remains regarding the authenticity of documents. The US Government and the Government Printing Office have made progress with the GPO Seal of Authenticity and digital signature/public key technology in Acrobat v. 7.0 & 8.0. In January 2009, GPO digitally signed and certified PDF files of all versions of Congressional bills introduced during the 110th and 111th Congresses. Unfortunately, these types of authentication technologies have not been broadly adopted outside the government. The importance of provenance metadata was also reaffirmed in a recent Arizona Supreme Court case.

Although it might not help in every case, knowing the source of a document is crucial in assessing its validity.  Until standards are broadly adopted and relied upon, a word of warning to the wise about content on the Internet: “Trust but verify.”