Making Digital Durable

Posted by Paul Miller at 8:05pm on Monday, 21st November 2005

At the end of the middle pier is a lecture hall that seats about 300 people. In the lobby outside, an audience waits patiently, smiling and chatting, browsing a table that has become an impromptu book shop. Listening in on the conversations about IPOs and new web tools, you soon realise that this is a technologically savvy crowd. There's not a suit amongst them. You get the feeling these guys know their Perl from their Python.

When the audience files into the theatre, the stage is black, with a huge screen in the middle and a lectern placed just to the left. On the lectern sits a laptop, the familiar Apple logo glowing white from its lid.

Stewart Brand of the Long Now Foundation is up on stage. He says we are destroying information quicker than we ever have before because our world is going digital. Data produced even a decade ago is unreadable because it has either degraded or is incompatible with today's machines. So the question that tonight's speaker is going to address is: how do you make digital durable?

Clay Shirky steps up on stage. Wearing jeans, a shirt and a discreet wireless microphone, he begins by talking about a book he wrote in 1993 about online culture that he no longer has an electronic copy of. Somewhere along the line, both he and his publisher lost the electronic files that were used to print the book. Meanwhile, Shirky's posts in Usenet flame wars from that same period are preserved intact to embarrass him, apparently indefinitely. Why did one form of his writing survive and the other not?

Imagine if the US Constitution had been drafted in the 1980s rather than the 1780s, using not paper and quill but an Apple II, says Shirky. How would you go about preserving it? Would you put the whole machine in preservative oil and check that it worked, say, once a century, by drying it off and booting it up to see if the file was still there? Well, that might work, but the problem would come in a couple of hundred years, when someone came to start it up, found the power plug and thought: what the hell is this?

The point is that a huge number of social systems need to remain intact for digital data to be preserved. We'll never know which of these will survive. All we can do is reduce the risk of data being lost by making sure that many copies exist in many different formats.

He talks about the BBC's misadventure in recreating the Domesday Book on videodisc for the BBC Micro. Within just 10 years it was unreadable, while the original Domesday Book, written on good old-fashioned paper, is now nearly a thousand years old and going strong. Shirky describes the way we lose digital information as the "I thought you had the car keys" problem. Everybody assumes that somebody else has a copy of the file, but when machines are upgraded or archives consolidated, things get deleted, never to be found again.

He then moves on to categorisation. He points out how our library classification systems, which have been around for just a couple of hundred years, have become dated very quickly. Look at the Dewey Decimal system for Religion:

Dewey, 200: Religion
210 Natural theology
220 Bible
230 Christian theology
240 Christian moral & devotional theology
250 Christian orders & local church
260 Christian social theology
270 Christian church history
280 Christian sects & denominations
290 Other religions

Or the Library of Congress classification of History:

D: History (general)

DA: Great Britain
DB: Austria
DC: France
DD: Germany
DE: Mediterranean
DF: Greece
DG: Italy
DH: Low Countries
DJ: Netherlands
DK: Former Soviet Union
DL: Scandinavia
DP: Iberian Peninsula
DQ: Switzerland
DR: Balkan Peninsula
DS: Asia
DT: Africa
DU: Oceania
DX: Gypsies

Neither is particularly reflective of the modern world.

Then Shirky takes us on a whistle-stop tour of how Yahoo and Google tried to create new directories of the burgeoning world wide web by using library-like categories, but allowing crosslinks where there were conflicts within the systems. However, both companies eventually opted for pure search as their primary tool. Top-down categorisation just didn't seem to work on the internet.

Now, says Shirky, a new system is emerging that does work. New web tools like Flickr for sharing photos and del.icio.us for sharing internet bookmarks have the ability to tag information. What tagging does is allow users to categorise information according to what it means to them. They don't have to follow the rigid classifications of libraries. When lots of people tag things, interesting patterns and value begin to emerge. "The only group that can categorise everything is everybody," says Shirky.
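
As a rough illustration of how patterns can emerge from many individual tags, the sketch below counts how many distinct users applied each tag to an item. It is only an assumption about how such aggregation might look: the function name `aggregate_tags`, the user names and the sample data are all hypothetical, and nothing here reflects how Flickr or del.icio.us actually work.

```python
from collections import Counter, defaultdict


def aggregate_tags(taggings):
    """Count, per item, how many distinct users applied each tag.

    `taggings` is an iterable of (user, item, tag) triples. Tags are
    compared case-insensitively, and a user repeating the same tag on
    the same item is only counted once.
    """
    seen = set()
    counts = defaultdict(Counter)
    for user, item, tag in taggings:
        key = (user, item, tag.lower())
        if key in seen:
            continue  # same user, same item, same tag: ignore the repeat
        seen.add(key)
        counts[item][tag.lower()] += 1
    return counts


# Hypothetical sample data, invented for this sketch.
SAMPLE = [
    ("ann", "photo-1", "London"),
    ("bob", "photo-1", "london"),
    ("bob", "photo-1", "bridge"),
    ("cat", "photo-1", "london"),
    ("ann", "photo-1", "London"),  # repeat by the same user
]

if __name__ == "__main__":
    for item, tags in aggregate_tags(SAMPLE).items():
        print(item, tags.most_common())
```

Nobody in this example was told which tags to use, yet the counts show a rough consensus about what the photo is of.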

Shirky proposes that watching tagging might be one way of noticing when important files disappear, and of then being able to retrieve them. When something dies in nature, it stinks, so other creatures notice. "How do we make digital information stink when it disappears?" asks Shirky. At the moment it just vanishes without anybody noticing until it's too late.
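
One way to picture the idea, purely as a sketch: if you know how many people have tagged each item and which items can still be reached, you can flag the well-tagged items that have gone missing. The function `missing_but_tagged`, the `min_tags` threshold and the example file names are hypothetical; Shirky did not describe a specific mechanism.

```python
def missing_but_tagged(tag_counts, available, min_tags=2):
    """Return (item, tag_count) pairs for items that people cared about
    enough to tag at least `min_tags` times but that are no longer in
    `available`. Results are ordered most-tagged first, then by name.
    """
    missing = [
        (item, n)
        for item, n in tag_counts.items()
        if n >= min_tags and item not in available
    ]
    return sorted(missing, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    # Hypothetical data: how often each item was tagged, and what still exists.
    tag_counts = {"a.html": 5, "b.html": 1, "c.html": 3, "d.html": 9}
    still_there = {"d.html"}
    for item, n in missing_but_tagged(tag_counts, still_there):
        print(f"{item} has vanished but was tagged {n} times")
```

The threshold is a design choice: it keeps rarely tagged items from raising the alarm, on the assumption that heavy tagging is a fair proxy for importance.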

In his closing remarks he hits upon a really interesting point. He says that digital rights management (DRM) is the enemy of longevity. He has no doubt that we will lose a great deal of the culture that rights holders are putting copy protection on. Open access and open source data, on the other hand, have a much greater chance of surviving because they get round the "I thought you had the car keys" problem. Demos pamphlets might last longer than the New York Times.

The problem, admits Shirky, is that it's such early days for digital data. We don't know what will work. Preservation, after all, is an outcome, not a process. All you can do is reduce the risk of things disappearing, and we're just beginning to learn how to do that.

For more about Clay Shirky see here.
