With news and information as easy to access as they are today, it is more important than ever to ensure that people are not misled by what they read. Recently, the rise of neural fake news (AI-generated fake news) and its demonstrated effectiveness at fooling humans have prompted the development of models to detect it. One such model is Grover, which can both detect neural fake news in order to prevent its spread and generate it in order to demonstrate how such a model could be misused to fool human readers. In this work, we explore Grover's fake news detection capabilities by performing targeted attacks through perturbations of input news articles. In doing so, we test Grover's resilience to these adversarial attacks and expose some potential vulnerabilities that should be addressed in future iterations to ensure it can detect all types of fake news accurately.
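To give a concrete sense of what a perturbation-based attack on an input article can look like, the sketch below implements one simple, generic example: swapping ASCII letters for visually identical Unicode homoglyphs so the text reads the same to a human while the underlying tokens change. This is an illustrative assumption for exposition only; the function name, the homoglyph table, and the attack itself are hypothetical and are not the specific perturbations evaluated against Grover in this work.

```python
import random

# Hypothetical example: Latin letters mapped to Cyrillic look-alikes.
# Not the attack used in the paper; a minimal sketch of the general idea.
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e", "c": "\u0441"}

def perturb_article(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Replace a fraction of substitutable characters with homoglyphs.

    The perturbed text is visually unchanged for a human reader, but a
    detector that tokenizes raw characters sees different input.
    """
    rng = random.Random(seed)  # fixed seed keeps the perturbation reproducible
    chars = list(text)
    candidates = [i for i, ch in enumerate(chars) if ch in HOMOGLYPHS]
    if not candidates:
        return text  # nothing substitutable in this article
    k = max(1, int(len(candidates) * rate))
    for i in rng.sample(candidates, k):
        chars[i] = HOMOGLYPHS[chars[i]]
    return "".join(chars)

original = "The senator announced a new economic policy today."
perturbed = perturb_article(original)
print(original == perturbed)  # False: looks the same, but the bytes differ
```

An attack like this probes whether a detector's verdict is stable under changes that are imperceptible to readers; a robust detector should classify the original and perturbed articles the same way.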