The proliferation of highly realistic singing voice deepfakes presents a significant challenge to protecting artist likeness and content authenticity. Automatic singer identification in vocal deepfakes is a promising avenue for artists and rights holders to defend against unauthorized use of their voice, but remains an open research problem. Based on the premise that the most harmful deepfakes are those of the highest quality, we introduce a two-stage pipeline to identify a singer's vocal likeness. It first employs a discriminator model to filter out low-quality forgeries that fail to accurately reproduce vocal likeness. A subsequent model, trained exclusively on authentic recordings, identifies the singer in the remaining high-quality deepfakes and authentic audio. Experiments show that this system consistently outperforms existing baselines on both authentic and synthetic content.
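The two-stage pipeline described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' actual system: `discriminator_score`, `embed`, and `identify_singer` are hypothetical stand-ins (a toy energy-based quality proxy and a nearest-centroid match on a spectral embedding) for the trained discriminator and the singer-identification model trained exclusively on authentic recordings.

```python
import numpy as np

def discriminator_score(clip: np.ndarray) -> float:
    """Stand-in for the stage-1 quality discriminator.

    Toy proxy: normalized RMS energy in [0, 1]. A real system would use a
    trained model that scores how well a forgery reproduces vocal likeness.
    """
    rms = float(np.sqrt(np.mean(clip ** 2)))
    return min(rms, 1.0)

def embed(clip: np.ndarray) -> np.ndarray:
    """Toy vocal embedding: L2-normalized low-frequency magnitude spectrum."""
    spec = np.abs(np.fft.rfft(clip))[:64]
    return spec / (np.linalg.norm(spec) + 1e-9)

def identify_singer(clip: np.ndarray, centroids: dict) -> str:
    """Stand-in for the stage-2 identifier (trained only on authentic audio):
    nearest-centroid match against per-singer reference embeddings."""
    e = embed(clip)
    return min(centroids, key=lambda s: np.linalg.norm(e - centroids[s]))

def two_stage_identify(clip: np.ndarray, centroids: dict,
                       quality_threshold: float = 0.1):
    """Stage 1: reject low-quality forgeries that fail to reproduce vocal
    likeness. Stage 2: identify the singer in the remaining audio."""
    if discriminator_score(clip) < quality_threshold:
        return None  # filtered out as a low-quality forgery
    return identify_singer(clip, centroids)
```

For example, with reference centroids built from authentic clips of two singers, a clip that passes the quality gate is assigned to the nearest singer, while a near-silent (low-score) clip is rejected at stage 1 and never reaches the identifier.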