With the rapid progress of large language models (LLMs), many downstream NLP tasks can be solved well given good prompts. Though model developers and researchers work hard on dialog safety to keep LLMs from generating harmful content, it is still challenging to steer AI-generated content (AIGC) for human good. As powerful LLMs devour existing text data from various domains (e.g., GPT-3 is trained on 45TB of text), it is natural to ask whether private information is included in the training data and what privacy threats these LLMs and their downstream applications can bring. In this paper, we study the privacy threats from OpenAI's model APIs and New Bing enhanced by ChatGPT and show that application-integrated LLMs may cause more severe privacy threats than ever before. To this end, we conduct extensive experiments to support our claims and discuss LLMs' privacy implications.