September 18, 2019

Now that the genie’s out of the bottle on deepfakes, the AI technique that generates fake video or audio of a person, Descript is putting it to use for benign purposes.

The podcast production startup has launched a private beta test for a feature called “Overdub,” which can use audio samples of a person’s voice to generate new words or phrases. Descript is looking for podcasters, YouTubers, audiobook creators, and other audio pros to help test the new feature, which is supposed to help save time and money on rerecording.

“The idea here is really to save people a trip back to the recording booth, which is such a pain if you’re doing any kind of recording,” says Andrew Mason, Descript’s CEO. “This just really opens it up for people to be able to make editorial corrections on the fly that generally sound really good and usable.”

Typing in audio

Mason, who cofounded Groupon more than a decade ago, created Descript in 2017 as a spinoff from his previous startup, an audio tour app called Detour. In the process of creating audio tours, Detour built its own tools that would let editors modify audio by editing a speech-to-text transcript. Delete a stray word or jumbled sentence from the transcript, for instance, and it will vanish from the audio recording as well. This turned out to be pretty useful for podcast editing, which is now the main application for Descript’s Windows and Mac software.

Overdub is supposed to address the biggest missing piece in Descript’s “word processor for audio” concept, letting users generate new words in addition to just deleting or shuffling existing ones. In a demo, Mason showed me how he could type into a voice actress’s existing transcript to synthesize new audio that matched her voice. When limited to a single word or a short phrase, it sounded just like the real thing.

“It will not only generate speech, but it’ll do it in a way where it’s trying to do a tonal connect-the-dots between the audio that came before and after,” Mason says.

Behind the Overdub feature is another startup called Lyrebird, which Descript is now acquiring for an undisclosed amount and billing as its AI research team. Until now, Lyrebird let people clone their own voices with a tool on its website. The process took only a few minutes and involved recording a series of random sentences so that Lyrebird could train its AI model. That tool will be shutting down as Lyrebird folds its audio synthesis features into Descript.