This dissertation proposes a framework for user-centered security in Natural Language Processing (NLP), and demonstrates how it can improve the accessibility of related research. Accordingly, it focuses on two security domains within NLP of great public interest. First, that of author profiling, which can be employed to compromise online privacy through invasive inferences. Without access to, and detailed insight into, these models' predictions, there is no reasonable heuristic by which Internet users might defend themselves against such inferences. Second, that of cyberbullying detection, which by default presupposes a centralized implementation; i.e., content moderation across social platforms. As access to appropriate data is restricted, and the nature of the task evolves rapidly (through both lexical variation and cultural shifts), the effectiveness of its classifiers is greatly diminished and thereby often misrepresented. Under the proposed framework, we predominantly investigate the use of adversarial attacks on language; i.e., changing a given input (generating adversarial samples) such that a given model does not function as intended. These attacks form a common thread between our user-centered security problems: they are highly relevant to privacy-preserving obfuscation methods against author profiling, and adversarial samples might also prove useful for assessing the influence of lexical variation and augmentation on cyberbullying detection.
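To make the notion of an adversarial attack on language concrete, the sketch below greedily perturbs the characters of a short message until a toy lexicon-based "toxicity" classifier no longer flags it. The classifier, the substitution table, the threshold, and the greedy search are hypothetical stand-ins chosen purely for illustration; they do not correspond to the models, data, or attack methods studied in this dissertation.

```python
# Minimal sketch of a word-level adversarial attack against a toy classifier.
# Everything here (classifier, lexicon, substitutions) is a hypothetical stand-in.
from typing import Callable, Dict, List

def toy_classifier(text: str) -> float:
    """Return a 'toxicity' score: the fraction of tokens found in a small lexicon."""
    lexicon = {"idiot", "stupid", "loser"}
    tokens = text.lower().split()
    return sum(t in lexicon for t in tokens) / max(len(tokens), 1)

# Character-level substitutions an attacker (or an obfuscation tool) might apply.
SUBSTITUTIONS: Dict[str, str] = {"i": "1", "s": "$", "o": "0", "e": "3"}

def perturb(token: str) -> str:
    """Replace characters in a token according to the substitution table."""
    return "".join(SUBSTITUTIONS.get(c, c) for c in token)

def greedy_attack(text: str, model: Callable[[str], float], threshold: float = 0.2) -> str:
    """Greedily perturb tokens, keeping only changes that lower the model's score,
    until the score drops below the threshold (i.e., the input evades detection)."""
    tokens: List[str] = text.split()
    for i, tok in enumerate(tokens):
        score = model(" ".join(tokens))
        if score < threshold:
            break
        candidate = tokens[:i] + [perturb(tok)] + tokens[i + 1:]
        if model(" ".join(candidate)) < score:
            tokens = candidate
    return " ".join(tokens)

original = "you are a stupid loser"
adversarial = greedy_attack(original, toy_classifier)
print(toy_classifier(original), original)        # 0.4 -> flagged by the toy model
print(toy_classifier(adversarial), adversarial)  # 0.0 -> evades the toy model
```

Note that the same adversarial sample serves both perspectives described above: read as an attack, it shows how lexical variation can degrade a cyberbullying classifier; read as a defense, the identical perturbation acts as a privacy-preserving obfuscation against an unwanted inference model.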