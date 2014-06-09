Software developers at NPR are currently tackling a huge problem. Its massive library of archival content is gathering dust and the system organizing it all needs an upgrade. The organization is weighing its options of building the software versus buying an off-the-shelf solution–looking at spending upwards of six figures to address the problem of taxonomy and cataloging the growing content the organization creates.

Because NPR creates large amounts of content in many different formats, the problem is a complex one. Naturally, the solution won’t be simple either.

“We’ve got our radio folks who are producing stories and programs and providing their own set of metadata,” says Jonathan Epstein, director of software development at NPR. “The library, in charge of their cataloging and archival of 40 plus years of our content, is doing their own effort. Then you’ve got bloggers who are tagging [web posts]. This is all sort of happening independently of each other and we really see a problem of wanting to connect the dots here.”

This problem isn’t unique to NPR. The recently leaked New York Times innovation report details how even an organization as well-staffed and forward-thinking as the Times can struggle with structured data. At the Times, the lack of metadata surrounding much of its archival content has led to major headaches.

“In the digital world, tagging is a type of structured data: the information that allows things to be searched and sorted and made useful for analysis and innovation,” says Epstein. “Some of the most successful Internet companies, including Netflix, Facebook and Pandora, have so much structured data, by tagging dozens or even hundreds of different elements of every movie, song and article, that they have turned the science of surfacing the right piece of content at the right time into the core of thriving businesses.”

Epstein says NPR is currently figuring out the best path to take, whether by building something internally or buying a ready-made solution. Ideally, the library would own and manage this yet-to-be-discovered tool, which would connect the metadata across different areas of the business.

“We don’t always know exactly what we want when we start,” says Epstein. “We know some details of what we want to build, but we really have to get our hands dirty with things. This is where research spikes come in and are key to this.”

During the organization’s most recent “serendipity days” (personal time set aside every quarter for projects of interest), two software engineers presented a project that touched on several of the metadata problems NPR is looking to address.