Google’s Revolutionary Shift in AI Training Policies
In a recent policy update, tech giant Google stated that it may collect information from publicly available online sources to train its AI models, including Bard.
Despite the allure of cost savings, many firms are wary of adopting AI tools for a variety of reasons. One of the major concerns is the legal ambiguity around whether these AI models genuinely have the right to use all of the data they are trained on, and whether they might inadvertently produce content that reproduces copyrighted material.
Google’s most recent privacy policy update, which essentially declares the entire internet its domain to scrape for the production of its AI products, appears to have charged right through this debate like a bull in a china shop.
Under the new policy, Google may gather information from a range of open sources, including website content, social media posts, and public documents. This data will be used to train AI models for various tasks, including spam filtering, fraud detection, and language translation.
According to Google, public data is essential for training accurate and effective AI models. The company also says it will take precautions to safeguard user privacy, such as de-identifying data before using it in model training.
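Google has not published the details of its de-identification pipeline, so as a rough illustration only, here is a minimal sketch of one common approach: regex-based redaction of obvious identifiers (emails and phone numbers) before text enters a training set. The patterns and placeholder tokens are assumptions, not Google's actual method.

```python
import re

# Hypothetical, minimal PII redaction pass. Real de-identification
# pipelines are far more sophisticated (named-entity recognition,
# k-anonymity checks, etc.); this only illustrates the basic idea of
# scrubbing obvious identifiers before training.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def deidentify(text: str) -> str:
    """Replace obvious emails and phone numbers with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text

print(deidentify("Contact jane.doe@example.com or +1 415-555-0199."))
# → Contact [EMAIL] or [PHONE].
```

Even with a pass like this, "problems can still arise" — redaction misses context-dependent identifiers (names, addresses, rare combinations of attributes), which is exactly why de-identification claims draw scrutiny.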
How Is Google’s New Policy Shaped?
On the Edge of Legal Discrepancies
The revised privacy policy is a small but significant change to language that has been in place for a while. Google had previously stated that it used open internet resources to develop “language models,” citing Google Translate as an example.
That same section of the privacy policy now refers to “AI models” instead, and Google expressly cites current products such as Bard and its Cloud AI as examples.
Content posted to Google’s free services, such as Blogger and Sites, could reasonably be expected to be used in this manner. However, the wording of the privacy policy seems to suggest that Google believes everything it can access on the open internet is fair game for it to utilize to improve its products.
A prominent precedent is Clearview AI, which built a sizable biometric facial-recognition database by harvesting publicly available images without anyone’s knowledge or consent. The company has not fared well in the ensuing debate: Facebook and other social media companies have restricted its access to their platforms for violating their terms of service and privacy policies; some nations (such as Canada) and states (such as Illinois) have banned it outright; and it has racked up significant fines in the EU for privacy and data-handling infractions.
However, the basic issue in these cases is that when people post content to the internet, they have legitimate expectations about how private businesses will use it, and being incorporated into a private for-profit database is not usually among those reasonable expectations.
With AI models involved, the situation is even more complicated, since a model might regurgitate some of this information in a form that could give rise to legal action.
Personal Concerns and Data Scraping
This policy amendment raises serious data-scraping and privacy concerns. While it is customary for businesses to retain customer information for use in developing new products, Google’s new policy permits the company to use any publicly accessible data to train its AI models.
Google now claims access to all publicly available online data, including personally identifiable information. Although the company says it de-identifies its sources, problems can still arise.
First, it can infringe on people’s right to privacy. When data is scraped without consent, people may not know that it is being collected or how it is being used. This can lead to numerous issues, including identity theft and financial fraud.
Second, data scraping can produce biased AI models. When AI models are trained on data scraped from the internet, the biases already present in that data may be reflected in the models. This can result in AI systems that discriminate against particular social groups.
Data scraping can also strain the services being scraped; the recent Twitter outage is the most prominent example. When data is scraped from websites aggressively, those sites can become sluggish and difficult to use.
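The load problem comes from unthrottled request rates. As a hypothetical sketch (not any particular company's crawler), a “polite” scraper can cap its own rate by enforcing a minimum interval between requests:

```python
import time

# Hypothetical throttle for a well-behaved crawler. Unthrottled bulk
# scraping is what makes target sites sluggish; enforcing a minimum
# delay between requests bounds the load a single crawler imposes.
class Throttle:
    def __init__(self, min_interval: float):
        self.min_interval = min_interval  # minimum seconds between requests
        self._last = 0.0                  # monotonic time of last request

    def wait(self) -> None:
        """Sleep just long enough to honor the minimum interval."""
        elapsed = time.monotonic() - self._last
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last = time.monotonic()

throttle = Throttle(min_interval=0.1)
start = time.monotonic()
for _ in range(3):
    throttle.wait()  # in a real crawler, fetch one page here
print(f"3 throttled requests took {time.monotonic() - start:.2f}s")
```

Scrapers that skip this courtesy (and ignore conventions like robots.txt) are precisely why platforms such as Twitter resorted to aggressive rate limiting.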
Conclusion
Given that it controls an estimated 90% of the search advertising industry, Google in particular faces legal and regulatory risks along antitrust lines. In the beta release of Google’s “Search Generative Experience” tool, AI-generated text (combined with advertising links) occupies the entire first page of search results, with actual links to websites appearing “below the fold,” requiring users to scroll down and click a “show more” button to view them.
The new strategy will undoubtedly help Google produce powerful AI, but it also poses safety risks, and the policy could lead to more data mining and privacy abuses. It is crucial to pay close attention to how Google applies these rules.