AI Is Coming for Voice Actors. Artists Everywhere Should Take Note

No one knows how automation will upend all the arts. But the current turmoil in the voice-over industry may offer some hints

[Photo illustration: a voice actor's silhouette filled with binary code. The Walrus / iStock]

As a voice actor, I know how passionately people can get attached to cartoons, how visceral the sense of ownership that comes from loving a character can be. Figures I’ve voiced have inspired fan art both wholesome and kinky. They’ve even inspired fan art of me as a person (thankfully, just the wholesome kind, as far as I know). I get emails asking me to provide everything from birthday greetings to personal details. Sometimes the senders offer a fee. If I were savvier, I would be on Cameo—or maybe OnlyFans.

All of this probably means I should be worried about recent trends in artificial intelligence, which is encroaching on voice-over work in a manner similar to how it threatens the labour of visual artists and writers—both financially and ethically. The creep is only just beginning, with dubbing companies training software to replace human actors and tech companies introducing digital audiobook narration. But AI poses a threat to work opportunities across the board by giving producers the tools to recreate their favourite voices on demand, without the performer’s knowledge or consent and without additional compensation. It’s clear that AI will transform the arts sector, and the voice-over industry offers an early, unsettling model for what this future may look like.

In January, the Guardian reported that Apple had “quietly launched a catalogue of books” narrated by AI voices. Apple positions the move as a way of “empowering indie authors and small publishers” during a period of audiobook growth, allowing their work to be taken to the market within a month or two of publication when it might not otherwise get the chance at all. Their offering makes the costly, time-consuming process of converting text to audio—of selecting and contracting an actor, of booking studio space, of hiring a director and engineer, of painstakingly recording every page and line until it’s perfect—more accessible to writers and publishers with fewer resources. Eligible writers get a one-time choice of the type of voice they’d like to narrate their book—the two options are “Soprano” and “Baritone”—and “Apple will select the best voice based on this designation paired with the content.” The guidelines explain that fiction and romance are “ideal genres” for this treatment and add, somewhat prissily, “Erotica is not accepted.”

Listening to the sample voices, I was impressed, at first, by the Soprano option. Soprano sounds like a soothing, competent reader—but, I soon realized, one with a limited emotional range that quickly becomes distracting (the “no erotica” policy started to seem more like an acknowledgment of the system’s limitations than mere puritanism). There’s no doubt in my mind that a living artist would do a better job, which, when it comes to conversations around AI-generated art, feels less and less like a novel conclusion—with any gain in efficiency, you of course give up something vital in the exchange. In this case, it’s the author as well as the audience who lose out.

When I audition for audiobooks, I send a sample recording of a few pages. It is subject to review by both publisher and author, who gets a say in whether they find my voice suitable for telling their story. Unlike Soprano, I’m also a package deal—I can adapt my voice instantly to offer a range of characters, an ability that my AI competition still lacks. The Apple guidelines specify that “the voice selection cannot be changed once your request is submitted.” The process foreshadows an industry adept at producing more content faster and for less, but it’s not necessarily one that produces good art. Flat narration may not bother the listener who takes in their audio at 1.5x speed or those who consider books nothing more than a straightforward information delivery system. But until AI gets good enough to render a wider emotional spectrum and range of character voices—and I worry it will—it might well let down the listener who’s into narrative absorption or emotional depth.

But the risks of AI go beyond worries about a flood of bad art. Rather, there are immediate concerns about its effects on actors’ livelihoods and, most importantly, their intellectual property rights. A month after the audiobook news broke, Motherboard published a piece about a rise in the use of a disconcerting contractual clause that asks performers to consent to the synthetic generation of their voice, at times without additional pay. Actors, understandably, began to panic. “I’m honestly not sure how to combat all the AI companies stealing our voices, but this should be a concern for all people, not just actors,” tweeted Tara Strong, whose extensive career includes voicing Timmy Turner in The Fairly OddParents and Miss Minutes in the Disney+ series Loki.

Other actors noted concerns about a future where their voices proliferate online without their consent. Think about all those emails I got from fans asking me to send a birthday greeting: Why involve the performer at all if you can just feed a few hours of their voice into a generator and make them say whatever you want?

There’s a chance AI could even upend how we think about casting. It’s common to hire celebrities for animation projects, banking on the recognizability of their voice to draw viewers. What if Morgan Freeman has a scheduling conflict or simply wants a new passive income stream and licenses his voice to Pixar for the next decade? The actor in question doesn’t even need to be alive—a few key decisions by a celebrity’s estate and, all of a sudden, contemporary voice actors might find themselves going up against the legendary Mel Blanc for their next gig.

Performers’ unions recognize these risks and are beginning to fight for protections. In March, SAG-AFTRA, the labour union that represents film and TV actors in the US, issued a statement saying that any contract involving AI and digital dubbing requires companies to bargain with the union. In Canada, ACTRA Toronto acknowledged AI’s encroachment on voice work in their 2023/24 operating plan, where they vowed to take an “artist first” approach that involves “strengthened copyright and intellectual property laws for digital performance.” In Italy, voice actors are already on strike. Historically, these kinds of negotiations arise with changes in media and technology. I’m optimistic that unions will implement protections preventing voice actors from being totally replaced—but we’ll have to move quickly to keep pace with developments in tech.

Maybe “replacement” is the wrong framework altogether. What AI software is being trained to do is fundamentally different from what artists do. Even as automated voices get more sophisticated, they’re still trained to mimic rather than create. If they aren’t already, I’m sure directors will soon be prodding algorithms to read a line three different ways (“try the next one sadder”) and the AI will give them exactly what they want. But it won’t give them the thing they don’t know they want—a devastatingly funny ad-lib or a moment of improv between cast members. Voice-over actors, like all artists, will persist as long as creators and audiences value authenticity. Yes, animation is where adult humans play two-dimensional woodland creatures, but the industry has matured to a point where it prizes casting children as child characters and racialized actors to voice racialized characters. Intention is part of what distinguishes art from mere content. It’s easy to replace a process, but it’s harder to dislodge an entire value system, no matter how much overhead it saves.

Tajja Isen
Tajja Isen is a contributing writer for The Walrus.