In Today’s AI Race, Don’t Gamble with Your Digital Privacy

In Short
  • In today's AI era, we must be mindful of our digital privacy. AI companies are relentlessly gathering public and private data in various ways.
  • Google and OpenAI not only process private conversations to train their models but also employ humans to further annotate and review the data.
  • Transparency is something lacking from popular AI service providers and many fail to disclose how personal data is handled by the company.

There is no doubt that we are living in the AI age, with chatbots and single-use AI hardware being launched left and right. In the coming years, AI is only going to encompass every facet of our lives. AI companies are relentlessly collecting data, both public and personal, to train and improve their models. However, in this process, we are giving away our personal information which may put our privacy at risk. So, I looked into the privacy policies of popular AI chatbots and services and have recommended the best ways that you as a user can protect your privacy.

Google Gemini (Formerly Bard)

gemini homepage

To begin with, Google’s Gemini stores all your activity data by default. It doesn’t seek the user’s express consent before storing the data. Google says all your interactions and activities on Gemini are stored for up to 18 months. In addition, your Gemini chats are processed by human reviewers who read and annotate the conversation to improve Google’s AI model. The Gemini Apps Privacy Hub page reads:

To help with quality and improve our products (such as generative machine-learning models that power Gemini Apps), human reviewers read, annotate, and process your Gemini Apps conversations.

Google further asks users to not share anything confidential or personal that they don’t want the reviewers to see or Google to use. On the Gemini homepage, a dialog appears informing the user about this. Apart from conversations, your location details, IP address, device type, and home/ work address from your Google account are also stored as part of Gemini Apps activity.

Data Retention Policy

That said, Google says that your data is anonymized by disassociating your Google account from conversations to protect your privacy. Google also offers the option to turn off Gemini Apps Activity and lets you delete all your Gemini-related data. However, things get a bit murky here.

Once your conversations have been evaluated or annotated by human reviewers, they do not get deleted even if you delete all your past Gemini data. Google keeps the data for three years. The page reads:

Conversations that have been reviewed or annotated by human reviewers (and related data like your language, device type, location info, or feedback) are not deleted when you delete your Gemini Apps activity because they are kept separately and are not connected to your Google Account. Instead, they are retained for up to three years.

In addition, even when your Gemini Apps Activity is turned off, Google stores your conversation for 72 hours (three days) to “provide the service and process any feedback“.

As for uploaded images, Google says textual information interpreted from an image is stored, and not the image itself. However, goes on to say, “At this time [emphasis added], we don’t use the actual images you upload or their pixels to improve our machine-learning technologies”.

In the future, Google might use uploaded images to improve its model so you should be cautious and refrain from uploading personal photos on Gemini.

If you have enabled the Google Workspace extension in Gemini, then your personal data accessed from apps like Gmail, Google Drive, and Docs, don’t go through human reviewers. These personal data are not used by Google to train its AI model. However, the data is stored until the “time period needed to provide and maintain Gemini Apps services“.

If you use other extensions such as Google Flights, Google Hotels, Google Maps, and YouTube, the associated conversations are reviewed by humans so keep that in mind.

OpenAI ChatGPT

OpenAI’s ChatGPT is by far the most popular AI chatbot used by users. Similar to Gemini, ChatGPT also saves all your conversations by default. But unlike Gemini, it only informs the user not to share sensitive information for the first time after a new user signs up.

There is no static banner on the homepage informing the user that your data could be used for reviewing conversations or to train the model.

Nevertheless, coming to what kind of personal data ChatGPT collects from users, it stores your conversations, images, files, and content from Dall-E for model training and improving performance. Besides that, OpenAI also collects IP addresses, usage data, device information, geolocation data, etc. This applies to both free ChatGPT users and paid ChatGPT Plus users.

OpenAI says content from business plans like ChatGPT Team, ChatGPT Enterprise, and API Platform are not used to train and improve its models.

OpenAI does let you disable chat history and training in ChatGPT from Settings -> Data controls. However, the setting to disable chat history and training does not sync with other browsers and devices where you are using ChatGPT with the same account. Thus, to disable history and training, you need to open the settings and disable it on every device where you use ChatGPT.

Once you have disabled chat history, new chats won’t appear in the sidebar and they won’t be used for model training. However, OpenAI will retain chats for 30 days to monitor for abuse, and in that period, it won’t be used for model training.

As for whether human reviewers are used by OpenAI to view conversations, OpenAI says:

“A limited number of authorized OpenAI personnel, as well as trusted service providers that are subject to confidentiality and security obligations, may access user content only as needed for these reasons: (1) investigating abuse or a security incident; (2) to provide support to you if you reach out to us with questions about your account; (3) to handle legal matters; or (4) to improve model performance (unless you have opted out). Access to content is subject to technical access controls and limited only to authorized personnel on a need-to-know basis. Additionally, we monitor and log all access to user content and authorized personnel must undergo security and privacy training prior to accessing any user content.”

So yes, just like Google, OpenAI also employs human reviewers to view conversations and train/improve their models, by default. OpenAI doesn’t disclose this information on ChatGPT’s homepage which seems like a lack of transparency on OpenAI’s part.

You have the option to opt out and ask OpenAI to stop training on your content while keeping the Chat history feature intact. However, OpenAI doesn’t offer access to this privacy portal under the Settings page. It’s buried deep under OpenAI’s documentation which regular users can’t find very easily. At least, on the transparency point, Google does a better job than OpenAI.

Microsoft Copilot

Of all the services, I found the privacy policy of Microsoft Copilot to be the most convoluted. It doesn’t lay bare the specifics of what personal data is collected and how those data are handled by Microsoft.

On the Microsoft Copilot FAQ page, it says that you can disable personalization aka chat history. However, there is no such setting on the Copilot page. There is an option to clear all your Copilot activity history from the Microsoft account page, but that’s all.

The only good thing about Copilot is that it doesn’t personalize your interaction if it deems the prompt sensitive. And it also doesn’t save the conversation if the information seems to be private.

If you are a Copilot Pro user, Microsoft uses data from Office apps to deliver new AI experiences. If you want to disable it, disable Connected Experience from any one of the Office apps. Head over to Account -> Manage Settings under Account Privacy and turn off Connected Experiences.

Remini, Runway, and More

Remini is one of the most popular AI photo enhancers out there with millions of users. However, its privacy policy is quite dicey and users should be mindful before uploading their personal photos on such apps.

Its data retention policy says that personal data processed is kept for 2 to 10 years by the company, which is quite long. While images, videos, and audio recordings are deleted from its server after 15 days, processed facial data are sensitive in nature and are kept for many years. In addition, all your data can be handed over to third-party vendors or corporations in case of a merger or acquisition.

Similarly, Runway, a popular AI tool that deals with images and videos, retains data for up to three years. Lensa, a popular AI photo editor, also doesn’t delete your data until you delete your Lensa account. You have to email the company to delete your account.

There are many such AI tools and services that store personal data, particularly processed data from images and videos, for long years. If you want to avoid such services, look for AI image tools that can be run locally. There are apps like SuperImage (visit) and Upscayl (visit) that allow you to enhance photos locally.

Data Sharing with Third-parties

As far as data sharing is concerned with third parties, Google doesn’t mention whether human reviewers who process conversations are part of Google’s in-house team or third-party vendors. Generally, the industry norm is to outsource these kinds of work to third-party vendors.

On the other hand, OpenAI says, “We share content with a select group of trusted service providers that help us provide our services. We share the minimum amount of content we need in order to accomplish this purpose and our service providers are subject to strict confidentiality and security obligations.”

OpenAI explicitly mentions that its in-house reviewers along with trusted third-party service providers view and process content, although the data is de-identified. In addition, the company does not sell data to third parties and conversations are not used for marketing purposes.

In this regard, Google also says that conversations are not used to show ads. However, if this changes in the future, Google will clearly communicate the change to users.

Risks of Personal Data in Training Dataset

There are numerous risks associated with personal data making its way into the training dataset. First of all, it violates the privacy of individuals who may not have expressly given consent to train models on their personal information. This can be particularly invasive if the service provider is not communicating the privacy policy to the user transparently.

Apart from that, the most common risk is data breach of confidential data. Last year, Samsung banned its employees from using ChatGPT as the chatbot was leaking sensitive data about the company. Despite the fact that the data is anonymized, there are various prompting techniques to force the AI model to reveal sensitive information.

Finally, data poisoning is also a legitimate risk. Researchers say that attackers may add malicious data into conversations which may skew the model output. It can also add harmful biases which may compromise the security of AI models. Founding team member of OpenAI, Andrej Karpathy has explained data poisoning in extensive detail here.

Is There Any Opt-out Mechanism?

While major service providers like Google and OpenAI provide users a way to opt out of model training, in the process, they also disable chat history. It seems like companies are punishing users for choosing privacy over functionality.

Companies can very well offer the chat history which can help users find important conversations from the past, while not being part of the training dataset.

OpenAI, in fact, lets users opt out of model training, but it doesn’t advertise the feature prominently, and it’s nowhere to be found on ChatGPT’s settings page. You have to head to its privacy portal and ask OpenAI to stop training on your content while keeping your chat history intact.

Google doesn’t offer any such option which is disappointing. Privacy should not come at the cost of losing helpful functionality.

What are the Alternatives?

Coming to alternatives and ways to minimize your data footprint, well, first of all, you have the option to disable chat history. On ChatGPT, you can keep chat history and opt out of model training via its privacy portal page.

Apart from that, if you are serious about your privacy, you can run LLMs (large language models) on your computer locally. Many open-source models out there run on Windows, macOS, and Linux, even on mid-range computers. We have a dedicated in-depth guide on how to run an LLM locally on your computer.

You can also run Google’s tiny Gemma model on your computer locally. And if you want to ingest your own private documents, you can check out PrivateGPT which runs on your computer.

Overall, in today’s AI race where companies are looking to scrape data from every corner of the internet and even generate synthetic data, it’s upon us to safeguard our personal data. I would strongly recommend users not feed or upload personal data on AI services to preserve their privacy. And AI companies should not discount valuable functionalities for choosing privacy. Both can co-exist.

Comments 0
Leave a Reply

Loading comments...