There are concerns that the ability of language models (LMs) to generate high-quality synthetic text can be misused to launch spam, disinformation, or propaganda campaigns. Therefore, the research community is actively developing approaches to detect whether a given text is organic or synthetic. While this is a useful first step, it is important to go further and fingerprint the author LM in order to attribute a text's origin. Prior work on fingerprinting LMs is limited to attributing synthetic text generated by a handful (usually fewer than 10) of pre-trained LMs. However, LMs such as GPT-2 are commonly fine-tuned in a myriad of ways (e.g., on a domain-specific text corpus) before being used to generate synthetic text. Fingerprinting fine-tuned LMs is challenging because the universe of fine-tuned LMs is much larger in realistic scenarios. To address this challenge, we study the problem of large-scale fingerprinting of fine-tuned LMs in the wild. Using a real-world dataset of synthetic text generated by 108 different fine-tuned LMs, we conduct comprehensive experiments to demonstrate the limitations of existing fingerprinting approaches. Our results show that fine-tuning itself is the most effective factor in attributing the synthetic text generated by fine-tuned LMs.
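The fingerprinting task described above can be framed as multi-class text classification: given a piece of synthetic text, predict which of the candidate LMs generated it. The following is a minimal illustrative sketch of that framing (not the paper's actual method or dataset), using character n-gram frequency profiles and a nearest-centroid decision; all model names and texts are toy placeholders.

```python
# Illustrative sketch only: LM attribution as nearest-centroid classification
# over character n-gram frequency profiles. Toy data, not the paper's method.
from collections import Counter
import math

def ngram_profile(text, n=3):
    """Normalized character n-gram frequency vector for one text."""
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.items()}

def cosine(p, q):
    """Cosine similarity between two sparse frequency profiles."""
    dot = sum(v * q.get(g, 0.0) for g, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0

def fit_centroids(samples_by_lm):
    """Average the n-gram profiles of each candidate LM's known texts."""
    centroids = {}
    for lm, texts in samples_by_lm.items():
        merged = Counter()
        for t in texts:
            merged.update(ngram_profile(t))
        centroids[lm] = {g: v / len(texts) for g, v in merged.items()}
    return centroids

def attribute(text, centroids):
    """Attribute a text to the LM whose centroid profile is most similar."""
    return max(centroids, key=lambda lm: cosine(ngram_profile(text), centroids[lm]))

# Toy usage: two hypothetical fine-tuned LMs with distinct writing styles.
train = {
    "lm_a": ["the model writes formal prose.", "formal prose again."],
    "lm_b": ["lol u wot m8", "wot lol m8 u"],
}
centroids = fit_centroids(train)
prediction = attribute("formal prose here.", centroids)
```

In a realistic setting (e.g., 108 fine-tuned LMs), such stylometric baselines tend to break down, which is precisely the limitation of existing fingerprinting approaches the abstract refers to.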