The Trouble With Digitizing History

The Netherlands spent seven years and $202 million to digitize huge swaths of AV archives that most people will never see. Was it worth it?

[Photo: Flickr user Jai Mansson]

Driving through the Dutch countryside near the town of Hilversum, I have an overwhelming feeling that the surrounding water will wash out the road, given that my car is almost level with it. So it’s surprising that the Netherlands’ main audiovisual archives at the Sound and Vision Institute reside in a multilevel underground structure here, ostensibly below sea level.


Sound and Vision, together with two other national institutions, finished digitizing the bulk of the Netherlands’ audiovisual archives last year, at a cost of $202 million over seven years. The project ran smoothly and transparently, digitizing 138,932 hours of film and video, 310,566 hours of audio, and 2,418,872 photos.

When Brewster Kahle, digital librarian of the Internet Archive, spoke at a conference here in March to commemorate the project’s end, he called it “world-class” and told the audience, “It is almost exactly what Google has spent on their whole Google Books project, digitizing 20 million books.” (Google says it has digitized over 25 million books but declined to confirm exact cost figures for this story.)

For all of Sound and Vision’s efforts, though, only 2.3% of its digitized archive is publicly available online. Schools and researchers are allowed to access 15% of the archive on Sound and Vision’s website. For the rest, Sound and Vision’s administrators have to ask the copyright holders’ permission to release their clips outside of the building. Frequently, it involves making calls to several people, and sometimes they say no. “Maybe digital formats are exploitable, but how much are old newscasts worth?” says Tom de Smet, Sound and Vision’s head archivist.

Among the world’s memory institutions, Sound and Vision’s seven-year mass digitization project was one of the largest of its kind at its start in 2007, but few can reap its benefits today. Even as critics pressure the United States Library of Congress to digitize its collections, the Library must also solicit complex licensing agreements with corporate rights holders to release the files online.

“Record labels would want to make sure that we weren’t putting up Louis Armstrong,” says Gene DeAnna, acting chief of the recorded sound section at the Library of Congress’s National Audio-Visual Conservation Center. “But maybe B-sides of Bing Crosby and things like that might be possible if they really had no plans to release them.”


And even if memory institutions clear copyright hurdles—after countless hours of digital transformations and metadata documentation—they still need to make the sound and video clips findable. Their own online channels stand at odds with other online media platforms in engaging the public. YouTube will inevitably reach more eyes than a little-publicized Library of Congress sub-webpage. If going digital comes with so many caveats, is digitizing a country’s entire audiovisual history ultimately more trouble than it’s worth?

One Tiny Country’s Digitized Archives

“It doesn’t make sense to digitize everything,” de Smet says in his office at Sound and Vision. “You have to ask yourself, ‘Who are you doing this for?’” Researchers may be interested in a narrow set of media, while the public may prefer to skim the archives.

“Honestly, only a little bit of the funding should go towards digitization and the rest, towards digital preservation,” says de Smet.

De Smet, a 36-year-old Belgian, is part of a new crop of archivists and librarians who aim to capture digital media’s impact on the culture as it’s unfolding. He believes that digitization on its own won’t bring memory institutions into modernity. Rather, giant leaps of innovation will come from refining the methods used for preserving the digital files.

Now, Sound and Vision is negotiating broad licensing agreements with Dutch broadcasting companies, among others. One possible scenario under consideration is that Sound and Vision would be able to release a television show online 25 years after its initial broadcast. That way, neither party would have to dispute each creative work, file by file, saving time and administrative resources.


Sound and Vision’s collection of television shows, newsreels, radio shows, music, and commercials dates back to 1898. Around 90% of the collections come from the Nederlandse Publieke Omroep, the national broadcasting agency. The other 10% comprises media of the institute’s choosing. It even houses all of the Dutch royal family’s audiovisual memories, but those will invariably stay private.

YouTube’s “Media Memory”

Any new technology that better preserves and increases public access to these audio and video materials should aim to fulfill the greater mission of any national audiovisual archive: to be the “media memory” of the country. But what happens when YouTube pretty much serves that purpose today?

“By the time the project got underway, YouTube became immensely popular,” de Smet says. It became clear to him that Sound and Vision’s website was not the best way to present the available clips to the public.

“Collecting everything in one place online, it’s a very linear way of thinking,” de Smet says as he opens up a few tabs in his browser. He shows me Wikipedia, telling me that Sound and Vision is one of the biggest contributors of video to Wikimedia Commons.

To keep on top of the media culture, which increasingly happens digitally and online, de Smet sees an archive’s role in society changing. Instead of centralizing the selection decisions in house, he and his team have struck up direct relationships with the creative community. Indeed, Sound and Vision has eliminated all of its curators and now trusts the community to curate its media memory. “Archives need to be at the start of the creative process,” de Smet says.


Sony Steps Up

Gene DeAnna, from the Library of Congress’s National Audio-Visual Conservation Center, launched the National Jukebox web app four years ago. For one year, throughout 2010, he and his team digitized 10,000 78 rpm music records, recorded between 1901 and 1925. All of those files are available online, thanks to a gratis license that Sony Music Entertainment gave to the Library of Congress.

“Sony has stepped up a lot. Others could do a lot more,” DeAnna tells me in a telephone interview. He hopes to augment the National Jukebox this year with more Sony recordings, but plans to make material from other record companies available through the app are much further off.

“I have no interest in competing with record labels that are interested in reissuing historical recordings. To me, that’s access,” says DeAnna. Still, he sees more value in educating the public through the material, both the recorded media and the liner notes, than in trying to make money from the work of bygone artists.

Only a fraction of the Library’s collections has been digitized. For perspective, the Library of Congress, the largest library in the world, acquires up to 100,000 audio artifacts per year but can only digitize around 15,000 recordings annually at capacity.

Professional Archives Vs. Consumer Platforms

The LOC reserves a lot of online real estate for the media that it can put on the Internet. Sites like American Memory, the Performing Arts Encyclopedia, and the American Folklife Center all reside under the larger Library of Congress property. The newly launched American Archive of Public Broadcasting, a collaboration with public radio broadcasters across the country, has a standalone site. Although the Library of Congress has a YouTube channel, it uses the channel only to disseminate video from its own events and conferences.


No matter which online content platform is used to host creative media, DeAnna stresses how important it is for the library to continue professionally archiving today’s digital recordings.

“We are the de facto national library, so there’s a sense of security there, and there’s a long history of quality, data integrity, and cataloging. More longevity, more commitment to posterity than having content on a website. No matter how well established that website is, they just don’t have capabilities that the Library of Congress does,” says DeAnna.

Amazingly, the Library doesn’t yet have a way to receive born-digital work as digital files. The record industry burns digital material onto recordable CDs and sends them to the copyright office for copyrighting; the Library then rips the digital work from the CD-Rs to archive it on its servers. DeAnna says the Library is working with the copyright office, which is a department within the Library, to implement a digital repository pilot program.

“It’s a challenge with so much being produced to identify things that aren’t copyrighted. Every year we find labels that haven’t deposited, and then we ask the copyright office to write a letter to get the copies for us,” DeAnna says.

A Global Effort

At the end of July, the Associated Press and British Movietone released 550,000 clips of historical footage to YouTube. More countries are undertaking or planning similar digitization projects as VHS tapes, audiocassettes, film, and other aging media carriers become unsalvageable in the next couple of decades. One can only hope that this work will also appear on the Internet.


The Library of Congress’s collections comprise more than 450 languages, representing most of the world’s cultures, so its preservation efforts won’t only affect America’s ability to remember its cultural past. With every innovation in communications (first the radio, then the television, and now the Internet), a country’s media memory has increasingly spilled outside of its borders. Digitization will only speed up this cultural spread.

“There is no one single place that can serve the world’s creative output. The more we can collaborate, nationally and internationally, the more successful we’re going to be,” says DeAnna.

As I leave Sound and Vision’s in-house museum with de Smet, I see a blown-up print of Gene Simmons on a floor-to-ceiling wall montage. I ask de Smet what Simmons and his band Kiss mean to the Dutch media memory.

“When you grow up with it at the top of the charts, who’s to say that it doesn’t constitute a part of our national collective memory?” de Smet says.


About the author

I write about science and technology in the global marketplace, with a bent towards women in STEM. My work has appeared elsewhere in Quartz, Fortune, and Science, among others. I'm based in Amsterdam. Follow me on Twitter @tinamirtha.