2023-07-06

Google’s updated privacy policy lets it use internet data to train AI.

The use of training data could lead to privacy violations.

Experts say there need to be more regulations about what data can be used to teach AI.

Teaching artificial intelligence (AI)-powered chatbots requires massive amounts of data, and it turns out that information may be coming from you.

Google's updated privacy policy lets it take public information and use it for training AI like the company's Bard technology. Enormous data sets are necessary to allow chatbots to communicate like humans, but some observers say limits must be set.

"The privacy implications of the use of personal data in training AI models are still being figured out," Irina Raicu, an internet ethics expert at Santa Clara University, told Lifewire in an email interview. "There is research, for example, that shows that some prompts may lead to outputs that 'leak' such information rather than transform it into something that can't be traced to particular individuals. There is also the broader issue of people's data being used in ways and for purposes that those people hadn't even considered, let alone consented to."

School for AI

In its privacy policy, Google says it uses information scraped from the web "to improve our services and to develop new products, features, and technologies that benefit our users and the public" and that the search giant may "use publicly available information to help train Google's AI models and build products and features like Google Translate, Bard, and Cloud AI capabilities."

Previously, the policy's section on publicly accessible sources mentioned only Google's "language" services, such as Google Translate, said Dominic Sellitto, a professor at the University at Buffalo School of Management who writes about governance, in an email.

"This update just expands the services they explicitly mention, likely in response to lawsuits currently being filed against other AI firms," he added.

From a privacy perspective, some AI training proponents might argue that internet data was posted publicly, and users should be conscious of the implications of that, Sellitto said.

"However, that doesn't address a few key points, such as information posted without an individual's consent and when a user invokes their regulatory rights to remove their data from platforms," he added. "What happens when the algorithm has been trained on this information from the past and consent is retroactively revoked? To my knowledge, there's no good answer to this question, and it's a huge gap in the regulatory landscape."

AI models learn from the data they are trained on, making the source of this information critical, Ani Chaudhuri, the CEO of the data company Dasera, said in an email. "The new policy suggests that publicly available online is fair game for their AI's learning process," he added. "While this might lead to more efficient and advanced AI models, it raises privacy concerns."

Even though the data is 'public,' users might not fully realize or consent to the idea that their public interactions online could be used to train an AI model, Chaudhuri said. "The challenge here is to strike the right balance between advancing technology and respecting user privacy," he added.

Keeping AI Data Private

As excitement grows about the potential for AI, along with fears about misuse, there's a movement to regulate AI. Observers say that training data needs to be included in any AI regulation.

Some existing laws regulate some aspects of the training and use of generative AI, data privacy lawyer Frida Alim said via email. For example, although not specific to generative AI, the European Union's General Data Protection Regulation has rules about data collection and requires companies to notify individuals when their personal data is collected, even if that personal data was publicly available when collected.

"Any regulatory approach should be flexible enough to accommodate advancements in technology while mitigating the privacy and security risks of generative AI," she added.

Even without legal protection, Raicu said that users could designate more of their communications as private. They can also demand that companies let them opt out of having their data used to train AI models.

"But the onus can't really be put on users …," she added. "Users need to call for more public education by companies developing such models; more clear disclosures; and more regulation of the use of personal data for a variety of purposes, including AI development."