Speech deepfakes are artificial voices generated by machine learning models. Previous literature has highlighted deepfakes as one of the biggest security threats arising from progress in AI due to their potential for misuse. However, studies investigating human detection capabilities are limited. We presented genuine and deepfake audio to $n = 529$ individuals and asked them to identify the deepfakes. We ran our experiments in English and Mandarin to understand whether language affects detection performance and decision-making rationale. We find that detection capability is unreliable: listeners correctly spotted the deepfakes only 73% of the time, and there was no difference in detectability between the two languages. Increasing listener awareness by providing examples of speech deepfakes improved results only slightly. The difficulty of detecting speech deepfakes confirms their potential for misuse and signals that defenses against this threat are needed.