For the past few years, Descript has offered an intuitive way to edit podcasts, letting you cut, paste, and delete from a written transcript to modify the corresponding audio.
Now, Descript is applying that same idea to multitrack video editing. Instead of having to fiddle with video timelines in tools like Final Cut Pro or Adobe Premiere, creators on YouTube or elsewhere can edit their videos just by rearranging the text of what they said.
Andrew Mason, Descript’s CEO, hopes the new software will make video editing more accessible, just as it’s already done for podcasts. “We hear from a lot of newcomers that this has allowed them to become a creator when they wouldn’t have otherwise,” he says.
Descript has technically offered a video editor before, but only for basic single-track editing. The new version lets users manage multiple video files in a single project and adds new features, including titles, image overlays, and transition effects. Mason acknowledges that Descript still isn’t a fit for full-blown film editing, but that wasn’t the goal. Instead, the startup is trying to offer something simpler and more convenient.
“We’re not Final Cut Pro,” he says. “We have a more limited set of features, but you have the stuff that most vloggers are going to need, and most business users are going to need.”
Mason is also looking beyond podcasters and vloggers with a new screen recording tool, which lets users select part of their screen to capture while they narrate. He points out that a lot of tech companies use screen capture for things like pitch decks, bug reports, tech support, and internal documentation. Descript could provide companies with an easier way to edit those videos.
“In many ways, the first version of this release is built for startups like ours, where every employee is in some form communicating with video,” he says.
One feature that Descript isn’t yet bringing to the video side is Overdub, which lets users add new words or phrases to their podcasts with synthesized audio based on their speech patterns. That feature is essentially deepfake-proof, since users have to train the speech generator with phrases that Descript generates on the spot. Mason says Descript is experimenting with similar features for video, but getting them to work in a useful way that still prevents misuse is trickier.
“When we look at applications of generative AI, we tend to be very careful before we dive into them,” he says.
As for what’s next, Mason says Descript will be adding some “low-hanging fruit editor functionality” such as clip speed adjustments, color correction, and rotation. It’s also working on new AI features such as matching the ambient room noise of the recording, so creators can add natural-sounding pauses to their audio.
And while Descript is essentially offering two editing tools in one now, its pricing isn’t changing. The startup, which raised $15 million from Andreessen Horowitz and Redpoint last fall, still charges $15 per month for its basic “Creator” service and $30 per month for its “Pro” plan. (The latter adds extra features such as Overdub and the ability to auto-remove filler words such as “like” and “um.”) Newcomers can try Descript with three hours of transcription time and 20 screen recordings.
Mason says that outside the realm of music creation or film editing, rolling video and audio editing into one product just makes sense, especially when they’re built around the same concept of editing the transcript.
“While these tools have existed in separate categories historically—we have DAWs for audio editing and NLEs for video editing—we really view that as an accident of history,” he says. “With narrative media creation, whether it’s audio or video, you should be able to build one tool that’s equally adept at both.”