Siri, Alexa, and the Google Assistant all use spliced-up human voices to tell you the weather or wake you up in the morning. This is standard text-to-speech technology: voice actors record thousands of sentences, and a computer then chops the recordings into pieces that can be algorithmically rearranged.

But these voices sound like the robots they are. IBM’s latest AI project is a system that can hold lengthy debates with a real person on a wide variety of topics. For it, the company needed a voice that would be persuasive and dynamic, one that would sound more human.

But how do you create an AI voice that’s layered with real emotion?

That was the challenge for Andy Aaron, the IBM researcher who led the search for the right voice for Project Debater, as the system is called. Aaron isn’t a computer scientist or an engineer. He’s a sound designer who worked on dozens of Hollywood films and television shows before a friend convinced him to try creating text-to-speech voices for IBM. Aaron was hooked immediately and has since spent two decades at IBM on all kinds of projects, including creating the voice for Watson.

The voice Aaron envisioned for Debater, however, was in a different category from Watson’s. His typical text-to-speech projects require a voice actor to read a few thousand sentences before the work is handed off to an algorithm; Debater’s voice needed to be far more complex. To understand which components were necessary, Aaron and his team watched dozens of real human debates and analyzed the variety of tones people use to make their arguments: an anecdotal voice, a rebuttal voice, a voice for addressing the audience directly, and more. Then he set out to find a speaker with enough control over their voice to sound remarkably consistent while shifting among these different cadences.

“This is the hardest narration job anybody will ever have,” Aaron says. “It’s really difficult material to read and it’s endless.”

To find his voice, Aaron met with about 20 people, split evenly between men and women, and had them read a deliberately difficult script, full of tongue twisters and foreign names, cold, without seeing it beforehand. The five most promising actors then each recorded 1,000 lines, and Aaron built makeshift computer voices from those recordings. Once he had these rudimentary synthetic versions of each actor, he had each one say another 10 sentences. The actor whose synthesized voice sounded best got the job.