You music lovers out there probably think we’re living in a Golden Age. iTunes, Pandora, Rhapsody: music distribution and discovery couldn’t get any better, right? After all, with the proliferation of music sites and apps, we must be at some sort of saturation point, the telos of digital music technology.
But spend a bit of time talking to Brian Whitman, cofounder of The Echo Nest, and you realize that we’re really in a digital music Stone Age. Sure, we’ve come a long way, but there’s still plenty we can’t do–our recommendation engines are limited, as is our ability to sift information automatically from songs (to tell the sex of a singer just from his or her voice, for instance). The Echo Nest, a five-year-old company devoted to aggregating, indexing, using, and sharing vast troves of music data, just announced a collaboration with Columbia University’s LabROSA (Laboratory for the Recognition and Organization of Speech and Audio) on something called the Million Song Dataset, free to use for non-commercial music researchers.
The Echo Nest crawls the web, Google-style, in search of music and writing about music; it also partners with major labels like Universal and aggregators like 7Digital. It then devours data about the music, on both the “acoustic side” (tempo, key, and so on, which Echo Nest’s system crunches in about 10 seconds per song) and the “cultural side” (what reviewers are saying about the music, for instance). It is ravenous for new musical information: if you tweet about the band you saw last night, “we have that in our databases within the hour,” says Whitman.
What are the uses of data on 30 million songs? Broadly, there are two categories: commercial and academic.
One of the first fun apps that came out using The Echo Nest’s data was inspired by the Saturday Night Live sketch in which Christopher Walken urges Blue Oyster Cult’s percussionist to go nuts on the cowbell.
Didn’t you think Jay-Z’s “Hard Knock Life” was just missing a little … something? Me too.
Make your own at MoreCowbell.dj
“It was a crazy thing I never would have imagined,” says a bemused Whitman. “This was my dissertation work, and people are now making joke apps from it.”
The other use category is academia, and that’s where the free-to-use Million Song Dataset comes in. Researchers in, say, physics, share the same reality, so they can replicate each other’s experiments and advance the science. But researchers in music information retrieval haven’t had the same reality to share, so to speak–they haven’t had a large shared data set. Until now. “This is me giving a gift to my graduate school doppelganger, 10 years younger than me,” says Whitman, a PhD graduate of MIT’s Media Lab.
The world of academic digital music research is one many of us haven’t considered. Whitman identifies a few major research problems. Though the human ear can easily separate the sound of a guitar from the sound of drums from the sound of a voice, computers can’t do that yet, making full transcription of songs a hugely labor-intensive task. A program that can listen to a song and transcribe each instrument’s role would be a major leap forward. Others are trying to devise a program that could identify the year or decade a song was made, just from listening to it (sifting things like production values, whether the song’s in mono or stereo, and so on). There are already programs that are very good at identifying the genre of a song.
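To make one of those cues concrete, here is a toy sketch of the mono-versus-stereo check. It is purely illustrative and is not The Echo Nest’s or LabROSA’s method: the function name `looks_mono`, the tolerance value, and the synthetic test tones are all assumptions made up for this example. The idea is that a mono recording carries the same signal in both channels, so the two lists of samples match almost exactly.

```python
import math


def looks_mono(left, right, tolerance=1e-3):
    """Return True when two channels are effectively identical.

    `left` and `right` are lists of float samples. `tolerance` is an
    arbitrary, hypothetical threshold chosen for this sketch.
    """
    if len(left) != len(right):
        raise ValueError("channels must have the same length")
    if not left:
        return True
    # Largest sample-by-sample difference between the two channels.
    max_diff = max(abs(l - r) for l, r in zip(left, right))
    return max_diff <= tolerance


# Synthetic data standing in for decoded audio (assumption: 8 kHz sample rate).
RATE = 8000
tone = [math.sin(2 * math.pi * 440 * n / RATE) for n in range(800)]
other = [0.5 * math.sin(2 * math.pi * 660 * n / RATE) for n in range(800)]

mono_track = (tone, list(tone))   # same signal in both channels
stereo_track = (tone, other)      # different signal in each channel

print(looks_mono(*mono_track))
print(looks_mono(*stereo_track))
```

On its own, a check like this says little about a song’s age, since plenty of modern tracks are mixed close to mono. A real system would presumably weigh it alongside many other features.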
We can’t know yet what the full fruits of Echo Nest’s datasets will be. “Surprise us,” says The Echo Nest’s site, in a challenge to researchers and developers everywhere. In the meantime, crank up the cowbell.