“A tidal wave of digital material will overwhelm recordkeeping institutions”: the future of history

Matt Elton: How do the kinds of records created now and in the recent past differ from those left in previous centuries?

Jane Winters: Records and sources now come in a whole range of formats. In some cases, they are very familiar analogue sources that have become digital, whereas in others they are in a completely new format. Newspapers offer a great example of how this has changed: we used to read them only in physical form. Then those physical newspapers were digitised, keeping essentially the same structure.

We now have both physical and digital newspapers, which differ in both form and content. An issue of The Guardian published in print today will not contain the same information as the online version: there will also be multimedia material and lots of online comments and conversations, whereas in the past you would have had only letters to the editor, for instance.

And to take the example of people’s reflective writing, that’s a type of source that has also moved into new formats. A diary is not exactly the same as a blog entry or a post on social media, for instance.

John Wills: One example that offers an interesting way of thinking about some of these issues is the ‘Barbenheimer’ phenomenon – the point in 2023 when the movies Barbie and Oppenheimer were released at the same time. That’s a very recent instance of the kind of cultural moment that’s really interesting to research – but doing so is not easy, because you’re dealing not just with the films themselves but also the digital content produced by the films’ fandoms, the actors’ response to that content, and the wider public responses to those two movies, in a way that we’ve never been able to do before.

A poster by graphic designer Sean Longmore celebrating the simultaneous release of Barbie and Oppenheimer in 2023. Today’s wealth of online content is a boon and a challenge for historians (Image by Alamy)

Jane Winters: An important characteristic of a digital archive that doesn’t have an analogue or print equivalent is the mixture of formats it contains. It’s not just text, images and moving images, but also all the other novel elements and forms of communication – memes, gifs, emojis – and they’re not all always available in an easy-to-search format. The Wayback Machine (web.archive.org) is one of the major online archives, and it contains a huge amount of this kind of material – but in a very unstructured form.

Focusing on the new kinds of sources that emerged in the late 20th and 21st centuries, are there particular examples that reveal new things about society?

John Wills: One of the areas I explore is the history of video games, which we still tend to see as a recent phenomenon despite it dating back at least 50 years. It’s a medium in which the huge volume of material can be both a great opportunity and faintly overwhelming.

There are a few ways in which video games can be useful historical sources. First, a game set in the past can reflect cultural views about that period of history at the time the game was made.

Second, they can also provide ways for the public to get into history: the Red Dead Redemption game series [set on the US frontier in the late 19th and early 20th century], for example, has proved to be an entry point to ideas about and narratives of the American West for many people in the same way that TV or film can.

And finally, you can sometimes go into the online world of a game itself and talk to its players about their memories and experiences of that game. That’s so different from how we would have worked on the history of memory in previous decades! These kinds of unique opportunities are why we’ve started to build video games into our understanding of history over the past few decades.

Jane Winters: Another recent source of data that’s particularly exciting for historians is social media, because the voices and stories of people who would never have previously ended up in archives – unless they encountered the state at some point – are now doing so. It’s easy for anybody with access to the internet to create all kinds of content, and there are chances that at least some of it will end up in a national archive. That brings its own problems, because people aren’t always considering this when posting an angry Instagram comment! But that really rich conversation is something we haven’t had before, and has huge potential. There are substantial social media archives already, and they are going to grow.

This presents a few challenges. One is an issue of preservation and storage: how much do we want to keep, particularly given the environmental impact of digital archiving? We often think of digital records as existing in the ‘cloud’, but that’s in many ways a very unhelpful term, because of course they exist in physical infrastructure.

Another key issue is that this is often not a truly global record: there is inequity in terms of who has access to the internet, and therefore who is able to have their voices recorded in this way. As a concrete example, an exercise led by the International Internet Preservation Consortium collected the websites and other digital media produced in response to the Covid pandemic. It’s a pretty diverse collection, but there’s almost nothing from Africa and very little from China. So it skews in its story about that pandemic – which was, of course, a global phenomenon.

Men read a newspaper during the UK general strike, 1926. Historian Jane Winters highlights print media as a particularly revealing example of how online records have changed across recent decades (Image by Getty Images)

Despite these limitations, is the depth and extent of this inclusion in the historical record unprecedented?

Jane Winters: I think so, and certainly in the sense that people are now creating the material themselves rather than being described by someone else, such as a state body. That ability to tell your own story, and the fact that different kinds of people will routinely end up in archives, is exciting.

John Wills: I agree – particularly about the issues of inclusivity and diversity. Historians have always been making those kinds of choices about how to process that historical record, so in some ways that’s nothing new.

What potential issues face historians using records created with, or stored on, more recent forms of technology?

Jane Winters: A key issue is that digital records are often embedded in systems and structures, and we need to navigate those in order to replay them. Indeed, it’s very striking that the word ‘replay’ is used a lot in relation to web archives – it’s less about accessing material, and more about replaying content. I think this is only going to increase as we move as a culture towards using more and more multimedia platforms, so we need to collect the information that lets us understand the structures and systems used to store this content.

One really useful way of framing this is the idea of the ‘material culture’ of these records. We need to preserve the original form of these digital records so we understand how to work with them and how to reuse them. Memory institutions – places such as the British Library and the National Archives – are making sure they have rooms with equipment and versions of particular computers to replay floppy disks, for example. Things like digital forensics come into play, too, to make sure that you are not altering a file when you open it.

John Wills: Video game consoles, for example, have very short lifespans. Even one that sells well might be publicly available for only 10 years, and will then degrade. An emulator can show what a game was like, and you can watch people playing it on YouTube, but that will give you only one view of that game world – what one player did, rather than a wider sense of what the game’s about.

Institutions have been set up to preserve video games and their history – for example, Sheffield’s National Videogame Museum. But this is a vast history – one console might play a few thousand games – and there are questions about who’s responsible for preserving the technology, and how.

A scene from 2023 video game Assassin’s Creed Nexus VR. Such electronic forms of entertainment can prove to be useful historical sources, argues John Wills (Image by Games Press)

So for these sources to be as useful as possible, we need to preserve and understand the context in which they were made and used?

Jane Winters: Yes – and there’s a concern that context disappears very easily from digital records. ‘Context collapse’ is something that gets talked about in relation to social media – the idea that it doesn’t take long before something is completely decontextualised and reused. It can become hard to trace where a digital record started, particularly if a crucial piece of information has gone – for example, if the original account has been closed, the record might survive only in a very different context from that in which it was created. Because so much of this is happening on commercial platforms, there isn’t the attention to public documentation that you would get in a library or an archive.

Talking about commercial platforms, there’s been a recent phenomenon of TV and film companies deleting material from their archives – or, in some cases, not releasing films at all. Does this point to any specifically 21st-century issues?

John Wills: This is something you could call ‘digital demolition’, because it can be very conscious and deliberate. It’s sometimes the result of restricting access to archive material in order to create a paywall and charge for access. The online archive of MTV News, which is an incredible resource for people trying to understand the entertainment of the 1980s and 90s, was removed from the internet last year, for example.

But it’s important to remember that this isn’t new, and that TV and film records have often been lost. We’ve got very few of the silent movies of the early 20th century, and Charlie Chaplin destroyed one of his own movies [as a tax write-off], for instance. One of the dangers is that we often don’t see TV or film productions as useful or valuable.

A classic example is the tranche of Doctor Who episodes that went missing in the late 1960s and early 70s.

Jane Winters: We also have a tendency to think about online platforms and content-creation companies as archives. They’re not, and they don’t have an archival remit or responsibility – they think of that data in very different ways from historians. In 2019 [social media site] Myspace lost years of photos, video and music because of a faulty server migration, for example.

Social network Myspace hit headlines in 2019 for losing 12 years’ worth of user content. Such platforms should not be seen as archives, says Jane Winters (Image by Alamy)

Another issue is the vast volume of records. How will experts make sense of all of the data being created and stored?

Jane Winters: The National Archives described this volume as a “tidal wave of digital material… which in its scale, diversity of format, and complexity of structure will overwhelm many existing recordkeeping institutions and practices”. And the scale is mind-boggling: The Wayback Machine has more than 946 billion web pages, largely accessed through keyword searching. You can find what you’re looking for, but you may have no sense of how vital or representative it is because there’s so much material.

We will have to develop strategies to find out what’s in some of these digital archives before we can start to dig into them and do the kind of research we want to. We talk about ‘web archives’, but they’re not archives in a traditional sense, and that creates interesting challenges for researchers, both now and in the future.

Can we identify a moment when this avalanche of records began?

Jane Winters: It’s not been the same in different sectors: the British Library was collecting millions of websites as early as 2013, so that’s already a huge collection. But I think the anticipated digital deluge of government records hasn’t really happened yet, though it is coming.

John Wills: It does depend on the area. If you’re working on events in the 1980s or 90s, which saw the advent of 24-hour news coverage, the volume of visual culture can sometimes seem overwhelming. And that information may be supplied in an unreliable way. One of the big new challenges facing historians and other experts is the way that platforms and algorithms work, because they can potentially mislead us as scholars by pushing the most common citation or the view that’s been promoted hardest.

We’re not necessarily getting an open, full picture of the past – we are getting a version shaped by artificial intelligence (AI) and other forms of machine learning. The data has already gone through a level of selection and adaptation – and that’s a real issue for us.

Is AI going to become a growing problem for historians?

Jane Winters: It will be a huge issue. That’s not to say there isn’t already misinformation in archives – we need only think about fraudulent medieval charters, for instance! But the scale is different, as is the entanglement between trustworthy information and misinformation. It’s not the role of archives to say if material can be trusted, but as historians we need enough information to make judgments for ourselves. We have the skills – we spend our time assessing information – but we need to have enough contextual material to judge its validity.

Finally, considering all these challenges, how might historians in the future make sense of the late 20th and early 21st centuries?

John Wills: It’s interesting to think about the image we have of future historians. An episode of recent Star Trek series Discovery features an ‘eternal gallery and archive’: a beautiful repository of items and books from around the universe, all presented uncritically as completely reliable information. It’s an example of the idea that we’ll have access to sources in a utopian way, whereas the reality looks like being one in which, as Jane says, things are complex and entangled, and historians have to negotiate increasingly problematic routes to access material.

Jane Winters: Historians thrive on challenges: having to put together evidence and perhaps take it in a different direction because you found a new piece of information. We’ve only ever been able to tell partial stories; the challenge for us now is to think about how partial those histories might be. That’s why discussions like this are so important – to think about how we can ensure that as much of the material as possible is preserved, where the conversations around that need to happen, which organisations should be involved, and how we can really work together.

So many communities of practice are developing around this, and I have confidence that the work happening in our national libraries, archives and other places will ensure that historians of the future will have enough material to work with.

Jane Winters is professor of digital humanities at the School of Advanced Study at the University of London

John Wills is professor of American media and culture at the University of Kent

This article was first published in the October 2025 issue of BBC History Magazine