Voice AI is Scary Good Now. Video Game Actors Hate It.
By Aimee Hart, Input
June 26th, 2021 – Reprint
Game modder nikich340 recently released a PC mod for The Witcher 3 called A Night to Remember. The mod picks up after the events of the Blood and Wine expansion, with protagonist Geralt of Rivia once more taking up the hunt for Orianna. Fans of the game were stoked to have a new, albeit unofficial, addition to The Witcher 3 to play. Voice actors and some other observers, however, were less than thrilled. You see, A Night to Remember features not only new content but also new voice lines. Specifically, the modder used AI trained on voice actor Doug Cockle’s speech to generate brand-new voice lines for Geralt, the character he portrays.
“If this is true, this is just heartbreaking,” tweeted Jay Britton, a voice actor with credits in Divinity: Original Sin 2 and Pathfinder. “Yes, AI might be able to replace things but should it? We literally get to decide. Replacing actors with AI is not only a legal minefield but an utterly soulless choice.”
The Witcher 3 mod controversy came on the heels of video game developer Obsidian releasing a video about its work with Sonantic, which involves using AI voices as placeholders before voice actors are recorded for the game. The video explains that Obsidian uses Sonantic’s AI voices because the developer finds it helpful to listen back to its dialogue to find out what works and what doesn’t.
Naturally, these sorts of advances make video game voice actors nervous. “There’s the kneejerk concern of ‘It’s going to take our jobs,’ and I do believe, in some cases, that will happen,” Natalie Winter, a voice actor in games such as Assassin’s Creed: Valhalla, tells me via DM. With ever-improving technology and affordable internet, there’s been a boom in voice acting that can be done from home, and, as Winter tells me, competition is pretty fierce. “It is sad to think that if AI voices become good enough to be widely used,” she says, “then those opportunities will decrease again.”
The pseudonymous modder behind the A Night to Remember project used software called CyberVoice to create new dialogue and voice lines for Geralt. CyberVoice is the product of the Russia-based Mind Simulation Lab, which is also behind CyberMind, which uses AI to form digital personalities for non-playable characters (NPCs) that gamers interact with. CyberMind provides the information used — for example, Geralt’s knowledge of the many monsters he encounters in The Witcher series — and CyberVoice gives that understanding a voice.
The CEO of Mind Simulation Lab, Leonid Derikyants, explains to me via email why such technology is needed in gaming. “We create digital versions of [actors’] voices so that live NPCs can answer questions from players, outside of story missions, with the same voice,” he says. “And since they form their answers independently and remember new facts, it’s impossible to voice it all in advance. It would be strange if, in that case, they spoke with a different voice. That is why we need the advanced technology of voice parodying.”
Though Mind Simulation Lab has worked with voice actors, Geralt’s voice was created through the use of free audio tracks meshed with another voice. As Derikyants explains, Mind Simulation Lab carries out “sound engineering work” that helps to manually change the voice so that it’s similar to the original. Then the company trains its speech synthesis on the audio. Derikyants says “parodying,” but this is more like parroting — the quality is that good.
Coming Soon From SOVAS
This brings up a whole host of issues. What’s to stop anyone — be it a solo developer or a triple-A game studio — from using the voice of someone to express something, say, racist or homophobic without their consent?
Though Sonantic CEO Zeena Qureshi doesn’t agree with the use of voice AI in the A Night to Remember mod, she says that, at least with her company, offensive speech shouldn’t be a concern. Qureshi says that when Sonantic models the voices of its actors, the company makes sure throughout the whole process that the actors agree with the content their voices will be used for.
Qureshi points to the company’s “disclosure system,” which lets Sonantic run content past actors before they accept anything. “If our actors are not comfortable with something, then it’s a no-go,” she says. “We take misuse very seriously.”
I reached out to a spokesperson for actors’ union SAG-AFTRA about how voice AI could potentially affect an actor in regard to credit. If misuse did indeed occur and the actor left the project, would that seriously affect crediting rights? Their response, via email: This would ultimately be up to the actor, as currently “video game companies do not have a right to continue using the performer’s voice to create something new without their permission.
“If the performer chooses to allow for this, we would negotiate for fair compensation,” the rep continues, adding that the union wants “to ensure there are protections in place that allow visibility into how the AI voice is used, for how long, that the data is protected, and that [the actors] are aligned with projects and companies that they choose.”
Mind Simulation Lab says that it, too, pays special attention to the ethics of AI. According to Derikyants, the parody voice of Geralt is not available for public use. However, the modder of A Night to Remember, nikich340, tells me that they’d asked for specific voice lines from the lab, which Mind Simulation Lab then provided, a turn of events that would seem to undercut Derikyants’ argument.
The Cockle parody audio, according to Mind Simulation Lab, will not be accessible for commercial purposes unless the actor joins the CyberVoice platform. If Cockle (who declined to comment for this piece) did take issue with the audio being used, however, Derikyants informed me that, in reality, there’s nothing to be done, seeing as the audio is not his actual voice “but simply similar.” Derikyants concedes that it is an issue: “Unfortunately, now no one can prohibit synthesizing any voice.”
Thomas Mitchells, a voice actor and voice director currently working on Baldur’s Gate 3, is wary of the new technology. “Companies like Sonantic offer ‘protection,’ but with time people will be able to get their hands on this type of kit,” Mitchells says. He’s right: There are a variety of voice-cloning systems being shared on GitHub at this very moment.
While Mitchells says he’s supportive of indie studios using voice AI for one-liners or call-outs (“Good job!”), there’s something very different about using voice AI for characters in an immersive world. “No AI is perfect,” he says. “You don’t get spontaneity, you don’t get a person’s personal experience, you don’t get that essence of humanity within the lines. You get a by-product of someone tweaking a bunch of knobs and dials in a plug-in to make delivery sound as plausible as possible.”
Mitchells cites the story of Sir Christopher Lee and his correction to director Peter Jackson on the set of Lord of the Rings about what sound a person makes when being stabbed in the back. In this situation, Lee’s experience during World War II brought something to the character that no AI could.
Winter, meanwhile, stresses the importance of the act of breathing in voice acting. “Breath is so key to expressing ourselves, especially through voice,” she says. “If your AI voice doesn’t breathe, it’s never going to carry the emotional weight that a human’s performance can.”
Ultimately, for Mitchells and other voice actors, it’s the diminishment of their craft that feels unforgivable. “Actors love to act,” he says. “That’s why they sacrifice so much to do it as a job. It is creatively fulfilling and when a character ends up with a fanbase behind it, it’s the most rewarding experience.
“Now, imagine becoming a character loved by many but you didn’t do a single thing to contribute towards that role,” Mitchells adds. “Zero creativity from the actor. Zero fulfillment. Zero art.” ♦♦♦