Neural text detectors aim to identify the characteristics that distinguish neural (machine-generated) text from human-written text. To challenge such detectors, adversarial attacks can alter the statistical characteristics of the generated text, making detection increasingly difficult. Inspired by advances in mutation analysis in software development and testing, we propose character- and word-based mutation operators for generating adversarial samples to attack state-of-the-art neural text detectors. This falls under white-box adversarial attacks: the attacker has access to the original text and creates mutated instances from it. The ultimate goal is to confuse machine learning models and classifiers and decrease their prediction accuracy.
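To make the idea of character- and word-based mutation operators concrete, the following is a minimal sketch in Python. It is an illustration under stated assumptions, not the operators used in the paper: the homoglyph table, the misspelling table, and the function names `mutate_characters`, `mutate_words`, and `generate_mutants` are all hypothetical, chosen only to show how mutants can be derived from an original text.

```python
import random

# Hypothetical homoglyph table (assumption, not from the paper):
# Latin letter -> visually similar Cyrillic letter.
HOMOGLYPHS = {
    "a": "\u0430",
    "e": "\u0435",
    "o": "\u043e",
    "c": "\u0441",
    "p": "\u0440",
}

# Hypothetical word-level substitutions (assumption): common misspellings.
WORD_SUBSTITUTIONS = {"the": "teh", "because": "becuase", "their": "thier"}


def mutate_characters(text, rate, rng):
    """Character-level operator: replace each eligible character
    with its homoglyph with probability `rate`."""
    out = []
    for ch in text:
        if ch in HOMOGLYPHS and rng.random() < rate:
            out.append(HOMOGLYPHS[ch])
        else:
            out.append(ch)
    return "".join(out)


def mutate_words(text, rate, rng):
    """Word-level operator: replace each eligible word (split on single
    spaces) with its substitute with probability `rate`."""
    words = text.split(" ")
    mutated = [
        WORD_SUBSTITUTIONS[w] if w in WORD_SUBSTITUTIONS and rng.random() < rate else w
        for w in words
    ]
    return " ".join(mutated)


def generate_mutants(text, n, rate=0.3, seed=0):
    """Create `n` mutants of `text`, alternating between the two operators.
    A seeded generator keeps the output reproducible."""
    rng = random.Random(seed)
    mutants = []
    for i in range(n):
        operator = mutate_characters if i % 2 == 0 else mutate_words
        mutants.append(operator(text, rate, rng))
    return mutants
```

Each mutant would then be passed to the detector under attack, and the drop in its prediction accuracy relative to the unmutated text measures how effective the operator is. The mutation rate is a free parameter here; a lower rate keeps the text closer to the original at the cost of a weaker perturbation.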