SoundHound Unveils Hound, A Next-Generation Audio Assistant A Decade In The Making

SoundHound harnessed its song-recognizing technology to create a voice recognition platform that understands complex question criteria.

You know SoundHound as a Shazam competitor. But SoundHound isn’t relying on song recognition for its future. Over the last decade, SoundHound has been tinkering in its lab, working on the next generation of natural language processing to build an audio assistant, Hound, that might make Siri look like an AIM chatbot.


Natural language processing is a fancy term for the computer science of figuring out what your sentences actually mean. When I say, “I ran into Jill and we talked about how Carly is surviving first grade,” your brain doesn’t imagine that Jill and I physically collided, nor that Carly’s life is actually in danger, and you probably assume that Carly is both a first-grade student and related to Jill. These are contextual connections that your brain took years to develop while analyzing conversational nuances. Likewise, SoundHound spent a decade building the raw code library and program architecture to make a truly context-savvy, intuitive voice interface.

The result: Hound can parse what you say even when you add complex criteria, exclusions, and negation to the question. For example, ask one of the voice recognition platforms you currently use to “show restaurants that aren’t Chinese food” and the platform will latch onto keywords and list Chinese restaurants. Hound understands that negation. Like Siri, it also carries over criteria from the previous question, so asking “What is the population of Japan?” and then “What about China?” gives the population of China. Leave out context or pile on criteria, it doesn’t matter: I saw a demonstration of Hound handling the dizzying request “Give me a hotel room that’s more than $300 but less than $400, has Wi-Fi, has air conditioning, picks me up from the airport, and don’t show me rooms that don’t have air conditioning,” and it returned a list of accurate results within a few seconds. Being able to get results from such organic phrasing is amazing. No longer will I have to consciously cut out extraneous words to make sure Siri doesn’t get my request wrong.
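To make the idea of criteria, exclusions, and negation concrete, here is a minimal Python sketch of what a request might look like once it has been understood. It is only an illustration, not SoundHound's implementation: the `Listing` and `Criteria` types, the tag names, and the sample data are all hypothetical, and the hard part (turning speech into these criteria) is assumed to have already happened.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Listing:
    # Hypothetical record for a hotel room or a restaurant.
    name: str
    price: int
    tags: frozenset


@dataclass
class Criteria:
    # Hypothetical structured form of a spoken request.
    min_price: Optional[int] = None  # "more than": exclusive lower bound
    max_price: Optional[int] = None  # "less than": exclusive upper bound
    required: set = field(default_factory=set)
    excluded: set = field(default_factory=set)


def matches(item: Listing, c: Criteria) -> bool:
    """Return True if the listing satisfies every criterion."""
    if c.min_price is not None and item.price <= c.min_price:
        return False
    if c.max_price is not None and item.price >= c.max_price:
        return False
    if not c.required <= item.tags:  # every required tag must be present
        return False
    if c.excluded & item.tags:       # no excluded tag may be present
        return False
    return True


# "show restaurants that aren't Chinese food": an exclusion, not a keyword.
restaurants = [
    Listing("Golden Wok", 20, frozenset({"chinese"})),
    Listing("Luigi's", 25, frozenset({"italian"})),
]
not_chinese = Criteria(excluded={"chinese"})
restaurant_names = [r.name for r in restaurants if matches(r, not_chinese)]

# The hotel request: price bounds plus three required amenities.
hotel_query = Criteria(
    min_price=300,
    max_price=400,
    required={"wifi", "air conditioning", "airport pickup"},
)
# "don't show me rooms that don't have air conditioning" is a double
# negative: it resolves to the same requirement that is already listed.
hotel_query.required.add("air conditioning")

all_amenities = frozenset({"wifi", "air conditioning", "airport pickup"})
hotels = [
    Listing("Harbor Inn", 350, all_amenities),
    Listing("Budget Stay", 120, all_amenities),
    Listing("Grand Plaza", 380, frozenset({"wifi", "airport pickup"})),
    Listing("Sky Suites", 400, all_amenities),
]
hotel_names = [h.name for h in hotels if matches(h, hotel_query)]

print(restaurant_names)
print(hotel_names)
```

A keyword matcher would see the word “Chinese” and return Golden Wok. Treating the phrase as an exclusion returns everything except Golden Wok, which is the behavior described above.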

Traditional voice recognition platforms, such as Siri and Cortana, perform speech recognition first, parsing the audio into consonants and vowels that they translate into words of text. That digitized text is then sent to language analysis software. If the initial speech recognition misidentifies what’s said, it feeds that mistake forward into the language analysis process, says SoundHound CEO and founder Keyvan Mohajer. Hound, on the other hand, conducts both speech recognition and language analysis simultaneously, which speeds up results and increases accuracy.
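The difference between the two approaches can be shown with a toy Python sketch. It is not Hound's algorithm, which SoundHound has not detailed here. The candidate transcripts, their acoustic scores, and the `understanding_score` function are invented for illustration. The point is only that committing to the best-sounding transcript before analyzing it can lock in a mistake, while scoring sound and meaning together can avoid it.

```python
# Hypothetical candidate transcripts for one utterance, each with a
# made-up acoustic score (how well the text matches the audio).
hypotheses = [
    ("what is the population of chai na", 0.55),
    ("what is the population of china", 0.45),
]

KNOWN_COUNTRIES = {"china", "japan"}
QUESTION_PREFIX = ["what", "is", "the", "population", "of"]


def understanding_score(text: str) -> float:
    """Toy language analysis: high score only if the text is a
    population question about a country we know."""
    words = text.lower().split()
    place = " ".join(words[5:])
    if words[:5] == QUESTION_PREFIX and place in KNOWN_COUNTRIES:
        return 1.0
    return 0.1


def two_stage(hyps):
    """Recognize first, analyze later: keep only the transcript with
    the best acoustic score, right or wrong."""
    text, _ = max(hyps, key=lambda h: h[1])
    return text


def combined(hyps):
    """Score recognition and analysis together: pick the transcript
    with the best product of acoustic and understanding scores."""
    text, _ = max(hyps, key=lambda h: h[1] * understanding_score(h[0]))
    return text


print(two_stage(hypotheses))
print(combined(hypotheses))
```

In this toy case the two-stage approach commits to “chai na” because it sounds slightly better, and the language analysis that follows has nothing sensible to work with. The combined approach lets the meaning break the tie in favor of “china.”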

Hound is a voice interface for consumers, but anticipating that developers might want to integrate Hound’s voice interface into their own products, SoundHound built the developer platform Houndify.

SoundHound isn’t terribly worried about possible Hound competition from other startups. After all, SoundHound spent years developing its nuanced tech, so a startup with six months of work in language processing can’t hope to compete, says Mohajer. SoundHound cut its teeth improving and speeding up the music-recognizing tech behind its flagship SoundHound product. In the years since SoundHound launched in beta in 2007, Mohajer has been pushing his team to shave down the processing time, sometimes by as little as 1%, in order to improve the user experience.

That makes Hound as fast as Siri in the tests I saw, though SoundHound admits that not being accessible via a hard button on the iPhone, as Siri is, is a real disadvantage. SoundHound hopes that getting developers to use Houndify and getting Hound’s voice recognition tech into cars or video games will help solidify Hound’s future.


Hound is out today on Android, with an iOS version on the way.