Two interlocking research questions of growing interest and importance in privacy research are Authorship Attribution (AA) and Authorship Obfuscation (AO). Given an artifact, such as a text t in question, an AA solution aims to accurately attribute t to its true author out of many candidate authors, while an AO solution aims to modify t to hide its true authorship. Traditionally, the notion of authorship and its accompanying privacy concerns have applied only to human authors. In recent years, however, explosive advances in Neural Text Generation (NTG) techniques in NLP have made it possible to synthesize human-quality, open-ended texts (so-called "neural texts"), so one must now consider authorship by humans, machines, or a combination of both. Because neural texts can pose serious threats when used maliciously, it has become critical to understand the limitations of traditional AA/AO solutions and to develop novel AA/AO solutions for neural texts. In this survey, therefore, we comprehensively review recent literature on the attribution and obfuscation of neural text authorship from a Data Mining perspective, and share our view on its limitations and promising research directions.