AI May Be Spying on Your Conversations

But help may be on the way

  • A growing number of programs can understand your speech.
  • New technology generates custom audio noise in the background as you talk to confuse the software that could be listening. 
  • The new technique achieves real-time performance by forecasting an attack on the future of the signal or word.

Many programs can understand your speech during phone or video calls, and experts say they may pose a privacy threat. 

A new technology developed by Columbia University researchers, called Neural Voice Camouflage, may offer a defense. It generates custom audio noise in the background as you talk, confusing the artificial intelligence (AI) that listens and transcribes voices. 

"The presence of AI transcription raises issues of trust," Michael Huth, co-founder of Xayn, a privacy-protecting search engine, and head of the Department of Computing at Imperial College London, who was not involved in the research, told Lifewire in an email interview. "Meeting participants may be more careful about which points they raise and how their speech is being transcribed. This can be a good thing as it may improve respectful behavior, but it can also be a bad thing as the conversation may be less open because of reservations about the technology used."

Listening and Learning

The Columbia researchers worked to design an algorithm that could break neural networks in real time. The new approach uses "predictive attacks": signals that can disrupt any word that automatic speech recognition models are trained to transcribe. In addition, when the attack sounds are played over the air, they need to be loud enough to disrupt any rogue "listening-in" microphone, which could be far away.

"A key technical challenge to achieving this was to make it all work fast enough," Carl Vondrick, a professor of computer science at Columbia and one of the authors of a study describing the new approach, said in a news release. "Our algorithm, which manages to block a rogue microphone from correctly hearing your words 80% of the time, is the fastest and the most accurate on our testbed." 

The new technique achieves real-time performance by forecasting an attack on the future of the signal or word. In other words, the disruptive sound is prepared for speech that has not yet been spoken, rather than in reaction to words already said. The team optimized the attack so that its volume is similar to normal background noise, allowing people in a room to converse naturally without being successfully monitored by an automatic speech recognition system.
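
To make the timing idea concrete, here is a minimal Python sketch of the scheduling only. It is not the researchers' model: `forecast_noise` is a hypothetical placeholder that returns quiet random noise, where a real system would presumably run a trained network on the speech heard so far. The chunk size, sample rate, one-chunk delay, and noise level are all assumed values chosen for illustration.

```python
import math
import random

CHUNK = 1_600        # samples per step; 0.1 s at 16 kHz (assumed values)
DELAY_CHUNKS = 1     # how many chunks ahead the noise is prepared (assumed)


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def forecast_noise(past_audio, target_rms, rng):
    """Hypothetical stand-in for a trained predictive model.

    A real system would compute the noise from past_audio; this placeholder
    ignores it and returns random noise scaled to a quiet target level.
    """
    raw = [rng.uniform(-1.0, 1.0) for _ in range(CHUNK)]
    scale = target_rms / rms(raw)
    return [s * scale for s in raw]


def camouflage(chunks, target_rms=0.01, seed=0):
    """Mix each chunk with noise that was prepared before the chunk arrived."""
    rng = random.Random(seed)
    history = []      # all speech heard so far
    scheduled = {}    # chunk index -> noise prepared ahead of time
    mixed = []
    for t, chunk in enumerate(chunks):
        # Play whatever was prepared earlier; silence if nothing is ready yet.
        noise = scheduled.pop(t, [0.0] * len(chunk))
        mixed.append([s + n for s, n in zip(chunk, noise)])
        # Only now does the chunk count as "heard"; use it to prepare noise
        # for a chunk that has not been spoken yet.
        history.extend(chunk)
        scheduled[t + DELAY_CHUNKS] = forecast_noise(history, target_rms, rng)
    return mixed


# Three chunks of a 220 Hz tone stand in for speech.
speech = [
    [0.1 * math.sin(2 * math.pi * 220 * (c * CHUNK + i) / 16_000)
     for i in range(CHUNK)]
    for c in range(3)
]
mixed = camouflage(speech)
```

The point of the sketch is the ordering inside the loop: the noise mixed into a chunk never depends on that chunk, only on earlier audio, so it can be ready the moment the words are spoken. The first chunk goes out unprotected because nothing has been heard yet.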

The scientists said their technique works even when you know nothing about the rogue microphone, such as its location or the software running on it. It camouflages a person's voice over the air, hiding it from these listening systems without inconveniencing the conversation between people in the room.

"So far, our method works for the majority of the English language vocabulary, and we plan to apply the algorithm on more languages, as well as eventually make the whisper sound completely imperceptible," Mia Chiquier, the lead author of the study and a PhD student in Vondrick's lab, said in the news release.

Keeping Your Conversations Private

As if all of that wasn't enough, advertisements could be targeting you based on audio collected from your smartphone or smart home devices, too.

"With devices like [the Amazon Echo] and their counterparts, these devices are not only always in your home, constantly listening to everything you say or do, but they—through years of data collection from their users—have perfected natural language processing (turning spoken word into text/usable data for devices via a combination of microphones, software, and AI)," Erik Haig, an associate at Harbor Research, a strategy consulting and venture development firm, said in an email. 

AI transcription of conversational speech is now a standard feature of commercial software, Huth said. For example, Microsoft Teams has a meeting-recording option with built-in AI transcription that all participants can see in real time. The complete transcript can serve as a record of the meeting. Typically, such transcripts support minute-taking (aka note-taking), with the minutes approved at the next meeting.

"People may be concerned about being spied on when AI transcription is on," Huth added. "This seems very similar to the concern of having a conversation recorded without consent or clandestinely."

But not everyone agrees that smart devices are a threat. Most people don't need to worry about programs listening to their conversations, Brad Hong, a customer success lead at the cybersecurity firm Horizon3, told Lifewire via email. The most significant concern now, he said, is not who is recording you but how they store the data.

"All the stories one hears about a microphone on their computer or mobile devices being activated, Alexa or Google Home listening in, or even government surveillance, it's true that all of these make the layman's stomach churn," Hong added. "But all in all, people are rarely in a situation that actually requires camouflaging of their voices."
