As a journalist, I’m always looking for a better way to record and go over my interviews without spending hours on transcription. Lately I’ve been leaning toward using Otter.
The free app, which launched on iOS, Android, and the web in February, records audio and converts speech to text on the fly using voice recognition algorithms. It also synchronizes the audio with the text during playback, so you can tap on any word to hear exactly what was being said at the time. Otter’s algorithms don’t produce perfect transcriptions, but they’re accurate enough to help you pick out which passages deserve more time for manual cleanup.
On Wednesday, AISense, the startup behind Otter, launched a premium version of the service. Free use is now capped at 10 hours of recording per month; users can pay $10 per month (or $80 per year, or $3 per month for students) to raise the recording limit to 100 hours and unlock some advanced exporting options, including audio exporting. Otter is also releasing a way to record and transcribe phone calls on Android devices, for both free and paid users, and adding a tool to embed images within transcriptions.
I’m still torn between trusting Otter and sticking with my longstanding method of taking handwritten notes alongside synchronized audio (currently with an iPad, Apple Pencil, and Notability). But with a few improvements, Otter could offer the best of both worlds, and the subscription pricing would be well worth the time saved.
How Otter Works
When you start using Otter, you’re prompted to record a sample of your voice, which is supposed to help the app distinguish between you and other speakers. You can also import contacts so that Otter can share transcripts with other collaborators.
While some speech-to-text services require you to upload prerecorded audio, Otter’s app does the recording itself and begins transcribing right away, so words show up on the screen as they’re being spoken. (The text becomes temporarily unavailable after the recording is over, as Otter indexes the transcript, processes keywords and speaker details, and compresses the audio.) The app can even distinguish between speakers; if you tag a line of dialog with a person’s name and hit a “rematch speakers” button, Otter does a pretty good job of applying it to the rest of the transcript.
As you might expect, Otter’s AI-driven transcriptions are far from flawless. Misinterpretations are common, and the app has a strange tendency to chop one stretch of dialog into multiple lines, sometimes even mid-sentence. Otter does allow you to clean up the dialog yourself, and can slow down the audio to assist with this, but my sense is that if you’re aiming to transcribe an entire conversation with perfect accuracy, Otter won’t be much better than manual transcription software. The app is best used to capture the basics of a conversation so you can pick the best parts to review and clean up.
What I’d like to see most from Otter, then, is a better way to mark up those conversations in real time. During an interview, I’m often writing follow-up questions, summarizing the current topic, or noting important responses. While Otter offers a search bar and tries to tag some keywords automatically, being able to add my own annotations would make sorting through the wall of text even easier.
For now, I’m considering a hybrid approach. Because Otter and Notability can both record audio at the same time on my iPad, I have Otter produce a written transcript while taking notes in a small Notability window on top. (Otter currently doesn’t support Split View multitasking, which would allow the two apps to run side-by-side.) It’d be better, though, to have all my notes and audio in one place.
The Dropbox Of Voice Recording
AISense is a venture-backed startup–it raised a $10 million Series A round in November–so I’m somewhat concerned about its long-term viability as a consumer-facing business. In addition to the stand-alone app, the company has a licensing deal with web conferencing company Zoom for transcribing video calls, and is pursuing some other deals in the enterprise and educational arenas. It’s reasonable to wonder if Otter might eventually pivot to enterprise use, sell itself to an enterprise company, or focus entirely on licensing the tech.
But Sam Liang, AISense’s founder and CEO, and Seamus McAteer, the company’s general manager of revenue and partnerships, both say the consumer side is an important part of the business. They frequently compare their startup, a 15-person operation in Los Altos, California, to unicorns Dropbox and Slack, which make nearly all their money from enterprise use but gather valuable data and feedback by making their products freely available to individuals. Like those companies, Otter doesn’t show advertisements to its free users.
Likewise, AISense expects that the vast majority of Otter users won’t pay anything for the service.
“We believe that we’re creating a new category-defining application, like Dropbox, like Slack,” McAteer says. “And we really are focused, like they are, on getting this disseminated as broadly as possible.”
The firm can still access user recordings, though, and it reserves the right to share user data to respond to “lawful requests” and “court orders.” Users can delete that data themselves, but not if law enforcement has already asked the company to preserve its records. Journalists who are concerned about records requests from overzealous law enforcement agencies should probably just stick to pen, paper, and offline recordings.
“We do encrypt everything and make sure everything is secure and confidential, and the user has ownership of their own data,” Liang says. “If you delete it, we definitely erase it.”
To secure a future for its consumer apps, AISense will work on both its core speech-to-text AI and additional transcription features. Liang says being able to annotate transcripts is on the roadmap (but only after the conversation is over), and the company is looking to add more AI-driven analysis to its transcripts. For instance, the app might tell users what a conversation was about or extract action items. Later this year, Otter will also add transcriptions of phone calls for iPhone users, though iOS’s restrictions on direct call recording will likely require a workaround, such as merging the call with a third party that handles the recording.
Those kinds of features could give Otter an advantage over tech giants like Google, Amazon, and Microsoft. While Otter already undercuts some other auto-transcription startups such as Trint ($15 per hour or $40 per month) and Descript ($10 per month for “early adopters,” but with only 30 minutes of free recording), larger tech firms could drive prices down even further if they made conversational voice transcription a priority. Google currently offers speech-to-text services for developers, but at $0.024 per minute, or $1.44 per hour after the first 60 free minutes, the cost quickly becomes much greater than an Otter subscription.
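To put those rates side by side, here is a rough back-of-the-envelope calculation in Python. It is only a sketch built on the prices quoted above, and it assumes that Google’s 60 free minutes reset each month and that billing is a flat per-minute rate; Google’s actual billing rules (rounding increments, tiers) may differ. The function names are illustrative, not part of any real API.

```python
from decimal import Decimal

# Prices as quoted in this article; real-world pricing may differ.
GOOGLE_RATE_PER_MIN = Decimal("0.024")  # dollars per minute beyond the free allowance
GOOGLE_FREE_MINUTES = 60                # assumed to be a monthly free allowance
OTTER_PREMIUM_MONTHLY = Decimal("10")   # dollars per month, covering up to 100 hours


def google_monthly_cost(minutes: int) -> Decimal:
    """Cost of transcribing `minutes` of audio in one month at the quoted rate."""
    billable = max(0, minutes - GOOGLE_FREE_MINUTES)
    return billable * GOOGLE_RATE_PER_MIN


def break_even_minutes() -> int:
    """Smallest whole number of minutes at which Google costs more than Otter Premium."""
    minutes = 0
    while google_monthly_cost(minutes) <= OTTER_PREMIUM_MONTHLY:
        minutes += 1
    return minutes


if __name__ == "__main__":
    for hours in (1, 10, 100):
        print(f"{hours:>3} hours/month on Google: ${google_monthly_cost(hours * 60):.2f}")
    print(f"Google costs more than $10 after {break_even_minutes()} minutes")
```

Under these assumptions, Google’s service passes Otter’s $10 monthly fee at 477 minutes, or roughly eight hours of audio, and 100 hours would run about $142.56.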
But again, Liang and McAteer are eager to compare themselves to Dropbox and Slack, both of which are holding their own against Big Tech.
“We’re not standing still,” McAteer says. “We expect that they will have an offering, just like they have an offering today for cloud storage, just like they have an offering today for collaboration. It hasn’t stopped Dropbox and Slack from enjoying massive adoption as a pure play.”