Why Experts Say AI That Clones Your Voice Could Create Privacy Problems

Computer-generated speech is getting better

  • A new kind of software will be able to clone your voice. 
  • The ability to copy voices could lead to security problems. 
  • Cloning programs could eventually overtake voice recognition software.
A robot speaking into a microphone.

Devrimb / Getty Images

The sound of your speech might soon no longer belong to you, thanks to artificial intelligence (AI).

New Microsoft software will soon be able to clone anyone's voice by just listening to a three-second audio example. Experts say the technology raises a host of security and privacy issues. 

"Artificial intelligence in the hands of adversaries has the potential to amp up social engineering exponentially, which is currently one of the most successful scamming tactics available," Zane Bond, the head of product at the cybersecurity company Keeper Security, told Lifewire in an email interview. "This is very likely a real problem that is going to lead to high-profile breaches in the coming years."
Microsoft researchers say their new application, called VALL-E, can be used for text-to-speech applications. "Since VALL-E could synthesize speech that maintains speaker identity, it may carry potential risks in misuse of the model, such as spoofing voice identification or impersonating a specific speaker," the researchers wrote in their paper.

VALL-E isn't the only software that can copy voices. Mike Parkin, a senior technical engineer at Vulcan Cyber, noted that several commercial text-to-speech systems do an excellent job of synthesizing a human voice. Many such capabilities are built into standard software like Google Docs, which can read documents aloud.

"Their ability to mimic a specific human voice varies, and that's the special feature being emphasized here," he added. "Microsoft's voice synthesis AI can replicate a specific human voice with only a few seconds of sample to work from. That's both very impressive and somewhat disturbing."

Resemble AI is one company that uses artificial intelligence to power its voice cloning technology. The company claims that its software is being used by some of the largest companies in the world to create hyper-realistic voices. 

"The privacy implications of AI that can clone people's voices are significant," Zohaib Ahmed, the CEO of Resemble AI, told Lifewire in an email. "The ability to clone a voice can be used for nefarious purposes, such as impersonation and fraud. It's important for companies developing this technology to consider and address these potential privacy concerns."

VALL-E could be used to imitate people's voices without their knowledge, tech analyst Bob Bilbruck, the CEO at Captjur, told Lifewire via email. He added that the technology might make it impossible to distinguish real recordings from fake ones. 

"It could also lead to security issues as many voice-activated technologies could be manipulated to believe they are talking to one person and in reality, they are not," he added. 

Patrick Harr, the CEO of the cybersecurity company SlashNext, told Lifewire in an email that being able to mimic a person's voice will greatly enhance cybercriminals' ability to launch successful vishing attacks (fraudulent phone calls or voice messages purporting to be from a known contact). 

An AI voice imprint overlaying a button that displays a small robot, with a finger about to push the button.

jittawait.21 / Getty Images

"This technology could be extremely dangerous in the wrong hands. In addition to vishing attacks, it could be used by malicious actors as a follow-up technique to more traditional phishing attempts," he added. "For example, a bad actor sends a victim a scam via text message and then follows up that message by calling the victim directly. The combination of contact methods makes the phishing attempt all the more convincing and adds to the sense of urgency that so often is critical to cybercriminals' success rates."

Keeping Your Voice Your Own

Protecting against voice copying technologies like VALL-E could be a challenge. Harr predicted that within the next few years, everyone would have a unique digital DNA pattern powered by blockchain that can be applied to their voice, the content they write, and their virtual avatar. 

"This would make it much harder for threat actors to leverage AI for voice impersonation of company executives, for example, because those impersonations will lack the 'fingerprint' of the actual executive," he added. 

Voice cloning is likely to become so sophisticated that it will make voice recognition programs useless for security, Bilbruck said. "I believe security will move to tri-authorization, meaning there will be three or more variables required to open a door, say, or access a secure area that previously required only your voice," he added.
