Will history survive the digital age?

But what happens when letters, diaries, official government records and newspapers no longer exist in this form – when they are not tangible? The primary sources available to historians and genealogists in the future may never have existed on paper – they will have been ‘born digital’. Photographs are an obvious example of this development. When was the last time that you printed out your holiday snaps? It’s much simpler to share them with friends and family, even people you don’t know, using a cloud-based storage service like Flickr or Instagram. Most likely of all, though, is that you’ll simply post your favourite images on Facebook.

Leaving aside questions of who ultimately owns personal data published in this way, social media platforms are not archives. They might store information for as long as it’s useful to them, either commercially or for reputational purposes. But will it be accessible in 10 or 20 years’ time? Will it be available to your children and grandchildren when they are asked to research an aspect of their family history for a school project? Will that once dominant social media company even exist?

The past couple of decades are littered with digital services that outgrew their usefulness or were overwhelmed by a competitor. To take just one example, Friends Reunited was launched in 2000 and quickly became a popular way of getting back in touch with old school pals, sharing memories and photos, and updating them on the progress of your life. It was closed down on 26 February 2016 and, while the company offered its users the opportunity to download some of their own information, a huge amount of data died with it.

Over the past 25 years, the web has enabled more people than ever before to act as content creators, to write about their lives and what’s important to them, and to publish that information for other people to read. This is an unprecedentedly informative and varied source for historical research, but also an exceptionally vulnerable one if we don’t plan how to collect, preserve and provide access to it. This is the responsibility of national and local archives and libraries, and one that they are working very hard to address. But individuals can also play an important role by recognising the value of their digital footprint and taking steps to secure it. These days death can be digital as well as physical, and most of us don’t give a second thought to what happens to our digital stuff when we’re no longer here.

These are not, of course, entirely new problems. The box of letters in the attic was almost certainly not stored away with a view to posterity but because its contents had immediate personal significance. Nobody was thinking about the interest they might hold decades later. But while a 13th-century manuscript might become illegible over the centuries, it’s still accessible. You might need specialist knowledge to read it, but the document itself remains familiar and susceptible to study. It’s a recognisable object, in the same way that a 16th-century printed book or a 19th-century newspaper is recognisable, and we know how to handle it and how to extract information from it. This is not the case for born-digital material, which may be locked away in a private user account or reliant on the survival of a particular technology to be accessed.

I have a collection of floppy disks in my office dating back to the late 1990s, which I haven’t been able to bring myself to throw away. But I also no longer have a computer with a floppy disk drive which would allow me to read them. In less than two decades they have become impossible to use without specialist or, in this case, old technology. And because I’m not an archivist, my labelling of the disks left a lot to be desired. “Backup 1” (undated) is not the most useful description I could have come up with.

Personal digital data is clearly at risk, but the problem for historians is wider than that. Sources of all kinds are being reimagined in digital form and we will have to find new ways of working with them. Research undertaken by the National Archives showed that, in 2016, 12 government departments would begin to transfer born-digital records for the first time; by 2021 this number is predicted to grow to 50. Paper records will continue to be deposited for many years to come, but increasingly the history of government will be a digital one. This means vast quantities of emails, which are unlikely to have been organised and weeded as paper files would once have been.

This is a challenge of scale – billions of digital conversations – but also one of mess and complexity. A single email thread might involve 10 people in a department. Each response to the first message will contain some new information, but also duplicate everything that has already been said. Nested within the main narrative will be multiple email signatures, disclaimers about data ownership and requests to take care of the environment and think before you print. There may be attachments, of various types, or perhaps a warning that potentially suspect content has been scanned and removed.

All of this provides a fascinating insight into the culture of a particular organisation, into interpersonal relationships and office hierarchies, but it also creates a significant problem for storage, access and interpretation.

Another fascinating but challenging aspect of working with email as a historical source is the fact that it’s a random mixture of the personal and the official. Who hasn’t used a business account to organise a birthday lunch or to arrange a visit from a plumber (as well as, of course, circulate the minutes of a meeting)? How much of this should we keep, and is it possible to separate out the different types of correspondence?

Historians have always taken a keen interest in the internal workings of states, institutions and businesses. But we also now have to consider how such bodies present themselves on the web and via social media: their digital public face. All organisations – from the smallest local history society to the largest international corporation – have an online existence. This might consist of a couple of pages of information on a website or it could be a full and detailed record of activity updated over many years. A website or social media account might be one of the main ways in which a company communicates with its customers or a local council consults with the people who voted for its members. This information will be essential for anyone interested in the history of the late 20th and 21st centuries. But how do we preserve it and make it accessible for researchers?

And this is not a challenge that can be put off or added to a pile marked ‘too difficult to deal with for now’, because the information is disappearing every day. There has been some debate about the average lifespan of a web page, and any figure is bound to change as the web itself evolves. But there is consensus about one thing – it is not very long at all. The best estimates vary between 44 and 100 days, which provides the narrowest of windows for archiving. Certain kinds of social media are even more ephemeral, or indeed have ephemerality built into their DNA.

Our understanding of history has always been shaped by what is lost and what survives by chance. It’s always been the historian’s lot to tentatively attempt to build a picture of the past from the fragmentary evidence left to them. But, for all that, it’s hard to escape the feeling that digital data presents a new, and daunting, challenge. We know that a piece of parchment kept dry and away from rats and mice will last for centuries, but we don’t know how successful our efforts at digital preservation will be because we have nothing against which to judge them. This simply hasn’t been done before, and we are still feeling our way.

This sense of malaise hasn’t been lost on the media. Ominous reports of a ‘digital dark age’ have appeared in the press with increasing regularity in recent years. The future of history, so the story goes, is in jeopardy – and nobody is doing anything about it. This makes for compelling copy but, in fact, nothing could be further from the truth. National archives, charitable organisations and international consortia are investing a huge amount of time, effort and money in safeguarding our digital data and preserving our digital histories.

Some of this activity is already visibly shaping how we research the recent past. One of the most high-profile organisations involved in digital preservation is the Internet Archive (IA), a non-profit ‘Internet library’ that operates from a converted Christian Science church in San Francisco, California. The IA collects and makes freely available a range of historical sources, including broadcast television news, digitised books and computer games. But it is best known for its Wayback Machine, which provides access to an archive of more than 279 billion web pages. It is very easy to focus on what we no longer have – the IA only began to ‘harvest’ web pages in late 1996, so the first seven years of the web have not been preserved there – but we should be celebrating the fact that we have managed to keep two decades’ worth of the historical web.

While the IA archives the web as a whole, many libraries and archives are actively involved in preserving digital cultural heritage at a national level. The British Library, for example, has been archiving selected UK websites since 2004 (searchable in the open UK Web Archive) and, following the extension of legal deposit legislation to include non-print material from 2013, it now undertakes an annual harvest of all websites that fall within the .uk domain.

The National Archives has a statutory obligation to collect and preserve the public record, and this includes government’s online presence. The UK Government Web Archive is free to use, fully searchable, and is already an essential resource for anyone studying British political and administrative history since the mid-1990s. It is complemented by Twitter and video archives, which contain some surprising ‘official’ information, including the Twitter accounts of the 2012 Olympic and Paralympic Games mascots, Wenlock and Mandeville. Twitter records belonging to the attorney general and cabinet offices have also been preserved for future historians.

Libraries across Europe may only be preserving their own national web space – the legislative frameworks that govern this activity are often quite restrictive – but they are certainly not working in isolation. Practitioners and researchers come together under umbrella bodies like the International Internet Preservation Consortium to share knowledge and to build new tools and develop new methods. The scale of the task has resulted in an extraordinary degree of openness and collaboration.

Archiving the web, which encompasses everything from fanzines to Hansard’s transcripts of parliamentary debates, is just one kind of digital preservation. But solutions are also being developed for other kinds of data. Some of the most interesting work is happening in the field of personal digital archiving, which presents a rather different kind of challenge. In 2011, the British Library acquired the personal archive of the poet Wendy Cope. It is partially digital – containing 40,000 emails and numerous Word documents – but is also substantially analogue: boxes of papers and account books. It is a hybrid, as the majority of personal archives are likely to be. Preserving and presenting this material as a coherent archival whole is far from straightforward.

Acquisitions of this kind will become more and more common and the variety of their digital content will only increase. The author Will Self’s personal archive, acquired by the British Library in December 2016, includes not just emails (around 100,000) but also his computer hard drive. Few of us are going to find a home for our personal digital data in a national library, but the innovative approaches that are now being developed will come to influence how we deal with such data ourselves, and how we make it available to other people – assuming we make it available at all, of course.

Choice is the key word here. For most of human history, the prospect of leaving behind a personal trace in the archival record has applied only to elite groups in society. If the work of digital preservation is successful – and the early signs are very promising indeed – this is a possibility for many more people than ever before. Debates in Europe about the ‘right to be forgotten’ indicate that not everyone is happy to have a digital afterlife, and certainly not one over which they have no control. But we can at least begin to think about what we might like to leave behind, and some of the technologies to help us plan how to do this are already being developed.

History is certainly not facing a crisis, but its future is an increasingly digital one, and this will require continuing work and planning. The primary sources available to us may look different to the ones we are used to, and they will certainly present new problems for researchers. But they will be enormously rich and diverse. They will contain many different voices, perhaps including yours.

Jane Winters is professor of digital humanities at the School of Advanced Study, University of London.

This article was first published in the March 2017 issue of BBC History Magazine
