Digital audio tampering detection can be used to verify the authenticity of digital audio. However, most current methods either compare the continuity of the electric network frequency (ENF) in the audio visually against a standard ENF database, or extract features and classify them with machine learning. ENF databases are usually difficult to obtain, visual methods have weak feature representation, and machine learning methods lose information during feature extraction, all of which result in low detection accuracy. This paper proposes a method that fuses shallow and deep features to make full use of ENF information: by exploiting the complementary nature of features at different levels, it describes more accurately the inconsistencies that tampering operations introduce into the original digital audio. The method achieves 97.03% accuracy on three classic databases: Carioca 1, Carioca 2, and New Spanish. In addition, it achieves an accuracy of 88.31% on the newly constructed database GAUDI-DI. Experimental results show that the proposed method outperforms state-of-the-art methods.