As ChatGPT continues to gain popularity, people are increasingly questioning the risks it poses. One of the biggest concerns involves personal data. Does ChatGPT have access to personal information? If so, how does it use it? And does it store it? We posed these questions to ChatGPT. Here are the key points that emerged from our interaction:
- The dataset ChatGPT was trained on includes content people have shared publicly online, such as social media posts, online reviews, and public databases.
- ChatGPT doesn’t actively collect personal information.
- However, ChatGPT may come across sensitive personal information, such as credit card details and Social Security numbers, which underlines the security risks associated with how it works.
- There are significant ethical and privacy concerns (e.g., consent) about how ChatGPT processes data.
Here is the recording of our chat, which we discuss in detail below.
Is ChatGPT trained on personal information?
In order to understand how ChatGPT engages with personal data, we entered the following question:
What type of “personal information” does ChatGPT have access to?
In its response, it acknowledged that the dataset it was trained on includes “personal information that people have shared online,” and gave the following examples (see Figure 1):
- Social media posts, comments, and messages that people have shared publicly.
- Blog posts and articles that they have written and shared online.
- Public records and databases (e.g., property records).
- Online reviews, ratings, and feedback that they have left on various websites.
ChatGPT stated that it doesn’t actively collect personal information but, importantly, added that:
It’s possible that it might come across sensitive information such as credit card numbers, Social Security numbers or personal identification numbers, and so on. (see Figure 1)
This is, of course, a serious issue with important implications for the security risks associated with ChatGPT.
Ethical and privacy implications
As a final question, we asked the following:
Is it fair to say that all “personal information” that ChatGPT has is within the public domain?
Interestingly, it answered that “most of the personal information that ChatGPT has access to is within the public domain.” It didn’t say “all” of it! However, we didn’t press it further on that point.
There were other useful details in its answer as well:
However, it’s important to note that the information that is publicly available online may not always be accurate, complete, or up-to-date, and in some cases, it may be shared without the knowledge or consent of the individual. (see Figure 2)
Consent is a major issue and lies at the heart of privacy regulations. Few people would be pleased to find out that their personal data was being used and shared by ChatGPT without their knowledge. In that sense, there are open questions about ChatGPT’s compliance with existing regulations.
People are right to be worried about ChatGPT’s access to personal information, however benign its intentions may be. While ChatGPT says it doesn’t actively collect personal data, this offers little reassurance, since it acknowledges being trained on datasets that contain content people have shared online. Such content ranges from social media posts to online reviews. Furthermore, there is a real risk that ChatGPT may come across sensitive data such as credit card details and Social Security numbers.
The future of both ChatGPT and its competitors will depend critically on how OpenAI, Google, Microsoft, regulators, and other key stakeholders tackle these ethical and privacy issues.
We hope you’ve enjoyed reading this article. We’d love to hear from you if you have any comments or questions. You can reach us here.