Are These Super-Impressive AI Audiobook Voices Good or Bad?

Depends on how you feel about a robot reading your bedtime story

Key Takeaways

  • DeepZen uses AI (artificial intelligence) to create startlingly realistic audiobooks from text. 
  • The tech uses real human voice actors to provide the building blocks.
  • Amazon and Audible don’t currently accept computer-generated audiobooks.

DeepZen is a company that creates computer voices used in audiobooks, based on the real voices of human actors. The quality is scary—easily good enough to listen to for hours at a time. The gimmick here is the AI (artificial intelligence) component, which can read the text and infer the correct emotional response based on context. It then puts that emotion into the voice.

It's impressive and very convenient. But do we really want a homogenized audiobook experience? And what about those voice actors?

"From the indie publisher's perspective, anything that reduces the cost of audiobook production is very interesting," Rick Carlile, owner of independent publisher Carlile Media, told Lifewire via email.

"But that attraction assumes that the product would be of equal quality to traditional narration. I don't think we're one hundred percent there yet. Don't get me wrong, DeepZen is astonishingly good. It's a tremendous breakthrough, and its creators deserve immense praise and success. But it's not yet perfect."

Audio That's 'Good Enough'

The best way to understand the quality of DeepZen is to listen to the samples. If you didn't know they were computer-generated, you might not realize it, at least not for a while. For the sake of argument, let's assume that DeepZen's AI is perfect and that it never misinterprets the emotional notes it's supposed to be hitting.

Even then, a human can offer more nuanced and often more surprising interpretations. An actor might put an unexpected twist on the words that a computer would never even consider. And in reality, the AI interpretation surely isn't yet as good as that of a professional voice actor. 

"As one who works on movies and most recently in the world of audio narration, while I am impressed with the AI—I know for a fact that there's deep depths of meaning that a machine cannot interpret," professional voice actor Paul Cram told Lifewire via email.

"Will there be a surge of unknown authors using it? I guarantee there will because it's 'good enough.'"

Being good enough, combined with the convenience and cost savings, might be sufficient to drive indie publishers to the service. 

"Audiobooks can cost up to $500 per finished hour of audio (much more for a celebrity voice), and that doesn't include the time cost of management and admin," says Carlile. "Being able to halve that cost by simply uploading a manuscript to a provider like DeepZen is extremely attractive." 

Talking Trouble

It's not yet as easy as firing your voice actors and uploading manuscripts to DeepZen. There is currently one barrier to AI-narrated audiobooks, and it comes from Amazon.

"Currently, ACX, the self-publisher's route to Audible and Amazon audiobook distribution, will not accept audiobooks that a human did not record," says Carlile.

Why? Quality. Here's the relevant FAQ entry from the ACX website:

"Text-to-speech or other automated recordings are not allowed. Audible listeners choose audiobooks for the performance of the material, as well as the story. To meet that expectation, your audiobook must be recorded by a human."

This means that DeepZen-generated audiobooks are out—for now, at least. This is pure speculation, but DeepZen would seem like a pretty good acquisition for Amazon, letting it sell the service and keep it solely for Audible books. And even if that doesn't happen, if the quality of computer-generated audiobooks is as good as this, then there seems little reason not to make an exception to this rule. 

Would you be happy to listen to audiobooks made this way? When it happens, most people won't even suspect. Some might prefer the perfection of computer-generated voices because they'll be free of the vocal tics and habits that can sometimes distract. The technology is also suitable for video games, TV and radio ads, and any other scenario where you'd hire a voice actor. 

DeepZen's tech would also be a great way to automatically create news podcasts from written articles, which could be handy for the commute.

And what about those voice actors? Well, there will be at least one opportunity: They can go and work for DeepZen.
