The recent integration of generative neural strategies and audio processing techniques has fostered the spread of speech synthesis and voice transformation algorithms. This capability can be harmful in many legal and information processes (news, biometric authentication, audio evidence in courts, etc.). The development of efficient detection algorithms is therefore crucial, and it is challenging because forgery techniques are heterogeneous. This work investigates the discriminative role of the silent parts of a recording in synthetic speech detection and shows how first-digit statistics extracted from MFCC coefficients enable efficient and robust detection. Because it does not rely on a large neural detection architecture, the proposed procedure is computationally lightweight. It is also effective against many different synthesis algorithms, obtaining an accuracy above 90\% in most of the classes of the ASVSpoof dataset.
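
To make the notion of first-digit statistics concrete, the following is a minimal sketch, not the procedure used in this work. It computes the empirical distribution of the leading significant digit of an array of coefficients. The function names `first_digit_histogram` and `benford_reference` are hypothetical, the input is random placeholder data rather than real MFCCs, and the comparison with Benford's law is an assumption: it is a common reference for first-digit features, but the abstract does not state which reference distribution or classifier is used.

```python
import numpy as np


def first_digit_histogram(values):
    """Empirical distribution of the first significant digit (1..9) of |values|."""
    x = np.abs(np.asarray(values, dtype=float).ravel())
    x = x[np.isfinite(x) & (x > 0)]  # zeros have no first significant digit
    if x.size == 0:
        return np.zeros(9)
    # Scale each value into [1, 10) and truncate to obtain the leading digit.
    exponent = np.floor(np.log10(x))
    digits = np.floor(x / 10.0 ** exponent).astype(int)
    # Repair rare floating-point edge cases where the scaled value
    # lands just outside [1, 10).
    digits = np.where(digits > 9, 1, digits)
    digits = np.where(digits < 1, 9, digits)
    counts = np.bincount(digits, minlength=10)[1:10]
    return counts / counts.sum()


def benford_reference():
    """Benford's law: p(d) = log10(1 + 1/d) for d = 1..9 (assumed reference)."""
    d = np.arange(1, 10)
    return np.log10(1.0 + 1.0 / d)


# Placeholder data standing in for a (n_coefficients, n_frames) MFCC matrix.
rng = np.random.default_rng(0)
fake_mfcc = rng.lognormal(mean=0.0, sigma=2.0, size=(20, 500))

hist = first_digit_histogram(fake_mfcc)
deviation = np.abs(hist - benford_reference()).sum()  # simple L1 distance
print("first-digit histogram:", np.round(hist, 3))
print("L1 distance from Benford reference:", round(float(deviation), 3))
```

In a detector along these lines, the nine histogram values (or their deviation from a reference distribution) would serve as a compact feature vector for a lightweight classifier, which is consistent with the claim that no large neural architecture is needed.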