Sharing sensitive business data with ChatGPT could be risky

“Those queries are stored and will almost certainly be used for developing the LLM service or model at some point. This could mean that the LLM provider (or its partners/contractors) are able to read queries and may incorporate them in some way into future versions,” it added. Another risk, which increases as more organizations produce and use LLMs, is that queries stored online may be hacked, leaked, or accidentally made publicly accessible, the NCSC wrote.

Ultimately, there is genuine cause for concern regarding sensitive business data being inputted into and used by ChatGPT, although the risks are likely less pervasive than some headlines make out.

Likely risks of inputting sensitive data to ChatGPT

LLMs exhibit an emergent behavior called in-context learning. During a session, as the model receives inputs, it can become conditioned to perform tasks based upon the context contained within those inputs. “This is likely the phenomenon people are referring to when they worry about information leakage. However, it is not possible for information from one user’s session to leak to another’s,” Andy Patel, senior researcher at WithSecure, tells CSO. “Another concern is that prompts entered into the ChatGPT interface will be collected and used in future training data.”

Although it’s valid to be concerned that chatbots will ingest and then regurgitate sensitive information, a new model would need to be trained in order to incorporate that data, Patel says. Training LLMs is an expensive and lengthy procedure, and he says he would be surprised if a model were trained on data collected by ChatGPT in the near future. “If a new model is eventually created that includes collected ChatGPT prompts, our fears turn to membership inference attacks. Such attacks have the potential to expose credit card numbers or personal information that were in the training data. However, no membership inference attacks have been demonstrated against the LLMs powering ChatGPT and other similar systems.” That means it’s extremely unlikely that future models would be susceptible to membership inference attacks, though Patel admits it’s possible that the database containing saved prompts could be hacked or leaked.

Third-party linkages to AI could expose data

Issues are most likely to arise from external providers who do not explicitly state their privacy policies, so using them with otherwise secure tools and platforms can put any data that would be private at risk, says Wicus Ross, senior security researcher at Orange Cyberdefense. “SaaS platforms such as Slack and Microsoft Teams have clear data and processing boundaries and a low risk of data being exposed to third parties. However, these clear lines can quickly become blurred if the services are augmented with third-party add-ons or bots that need to interact with users, irrespective of whether they are linked to AI,”  he says. “In the absence of a clear explicit statement where the third-party processor guarantees that the information will not leak, you must assume it is no longer private.”

Aside from sensitive data being shared by regular users, companies should also be aware of prompt injection attacks that could reveal previous instructions provided by developers when tuning the tool or make it ignore previously programmed directives, Neil Thacker, Netskope’s CISO for EMEA, tells CSO. “Recent examples include Twitter pranksters changing the bot’s behavior and issues with Bing Chat, where researchers found a way to make ChatGPT disclose previous instructions likely written by Microsoft that should be hidden.”

link