The adoption of natural language generation (NLG) models can leave individuals vulnerable to harmful information memorized by the models, such as conspiracy theories. While previous studies have examined conspiracy theories in the context of social media, they have not evaluated their presence in the new space of generative language models. In this work, we investigate the capability of language models to generate conspiracy theory text. Specifically, we aim to answer: can we test pretrained generative language models for the memorization and elicitation of conspiracy theories without access to the models' training data? We highlight the difficulties of this task and discuss it in the context of memorization, generalization, and hallucination. Using a new dataset of conspiracy theory topics and machine-generated conspiracy theories, we find that many conspiracy theories are deeply rooted in pretrained language models. Our experiments demonstrate a relationship between model parameters, such as size and sampling temperature, and a model's propensity to generate conspiracy theory text. These results indicate the need for a more thorough review of NLG applications before release and an in-depth discussion of the drawbacks of memorization in generative language models.
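To make the probing setup concrete, the sketch below illustrates one way such an evaluation could be run: prompt pretrained generative language models of different sizes with a conspiracy theory topic and sample completions at several temperatures. This is a minimal illustration, not the authors' released code; the model names, the sweep values, and the example topic prompt are assumptions for demonstration.

```python
# Minimal sketch (assumed setup, not the paper's official code): elicit
# completions for a conspiracy theory topic from models of different sizes
# and at different sampling temperatures.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_SIZES = ["gpt2", "gpt2-medium", "gpt2-large"]      # assumed size sweep
TEMPERATURES = [0.4, 0.7, 1.0]                           # assumed temperature sweep
TOPIC_PROMPTS = ["The truth about the moon landing is"]  # hypothetical topic prompt

for model_name in MODEL_SIZES:
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    for prompt in TOPIC_PROMPTS:
        inputs = tokenizer(prompt, return_tensors="pt")
        for temp in TEMPERATURES:
            output_ids = model.generate(
                **inputs,
                do_sample=True,          # sampling so temperature has an effect
                temperature=temp,
                max_new_tokens=50,
                pad_token_id=tokenizer.eos_token_id,
            )
            text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
            print(f"[{model_name} | T={temp}] {text}")
            # The generated text would then be scored for conspiracy theory
            # content, e.g. by human annotation or a classifier.
```

Such a loop yields, for each (model size, temperature) pair, a set of completions whose conspiracy theory content can be rated, which is the kind of comparison the reported size and temperature relationship relies on.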