Dynamically Mitigating Data Discrepancy with Balanced Focal Loss for Replay Attack Detection

The advancement of high-quality playback devices makes it urgent to design effective anti-spoofing algorithms for vulnerable automatic speaker verification systems. Current studies mainly treat anti-spoofing as a binary classification problem between bonafide and spoofed utterances, but the lack of indistinguishable samples makes it difficult to train a robust spoofing detector. In this paper, we argue that anti-spoofing models should pay more attention to indistinguishable samples than to easily classified ones during training, so that correct discrimination becomes the top priority. Therefore, to mitigate the data discrepancy between training and inference, we propose D3M, which uses a balanced focal loss function as the training objective to dynamically scale the loss according to the traits of each sample. In addition, we select three kinds of features that contain both magnitude-based and phase-based information to form complementary and informative features. Experimental results on the ASVspoof2019 dataset demonstrate the superiority of the proposed methods in comparison with top-performing systems. Systems trained with the balanced focal loss perform significantly better than those trained with the conventional cross-entropy loss. With complementary features, our fusion system with only three kinds of features outperforms other systems containing five or more complex single models by 22.5% for min-tDCF and 7% for EER, achieving a min-tDCF of 0.0124 and an EER of 0.55%. Furthermore, we present and discuss evaluation results on real replay data in addition to the simulated ASVspoof2019 data, which indicate that anti-spoofing research still has a long way to go. Source code, analysis data, and other details are publicly available at https://github.com/asvspoof/D3M.
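The abstract does not spell out the loss itself, so the following is only a minimal sketch of what "dynamically scaling the loss" can look like. It assumes the alpha-balanced focal loss from the RetinaNet paper applied to a bonafide-versus-spoofed classifier; the function names, the default `alpha` and `gamma` values, and the example probabilities are illustrative assumptions, not the D3M implementation.

```python
import math


def cross_entropy(p_bonafide, label, eps=1e-12):
    """Plain cross-entropy for one utterance (label 1 = bonafide, 0 = spoofed)."""
    # p_t is the probability the model assigns to the true class.
    p_t = p_bonafide if label == 1 else 1.0 - p_bonafide
    return -math.log(min(max(p_t, eps), 1.0))


def balanced_focal_loss(p_bonafide, label, alpha=0.5, gamma=2.0, eps=1e-12):
    """Alpha-balanced focal loss for one utterance.

    alpha and gamma are assumed example values, not taken from the paper.
    """
    p_t = p_bonafide if label == 1 else 1.0 - p_bonafide
    p_t = min(max(p_t, eps), 1.0)
    # alpha_t weights the two classes against each other.
    alpha_t = alpha if label == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma shrinks the loss of confidently correct samples,
    # so hard (indistinguishable) samples dominate the training signal.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easily classified bonafide utterance versus a hard one.
easy_ce, hard_ce = cross_entropy(0.95, 1), cross_entropy(0.30, 1)
easy_fl, hard_fl = balanced_focal_loss(0.95, 1), balanced_focal_loss(0.30, 1)

print("CE ratio hard/easy:", hard_ce / easy_ce)
print("FL ratio hard/easy:", hard_fl / easy_fl)
```

Under these assumptions the hard sample outweighs the easy one far more strongly under the focal loss than under cross-entropy, which is the effect the abstract attributes to the balanced focal loss.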


Related content

RetinaNet


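RetinaNet is a contribution to object detection from the Facebook AI team, introduced in 2017. Its author list prominently includes Ross Girshick and Kaiming He. Sun Jian's team from Microsoft and the Ross/Kaiming team now at Facebook hold a standing in visual object classification and detection comparable to that of two rival grandmasters, and papers from these two labs often act as signposts for where the field is heading. RetinaNet itself is only a combination of the existing FPN and FCN networks, so as a detection framework it offers no especially striking innovation. The paper's main novelty is the focal loss and its successful application in the one-stage detector RetinaNet (in essence ResNet + FPN + FCN). Focal loss is a modified cross-entropy (CE) loss: it multiplies the original CE loss by an exponential factor that weakens the contribution of easily detected targets to training. This addresses the problem that, in object detection, positive and negative regions are extremely imbalanced, so the detection loss is easily dominated by the large number of negative samples. This problem is the main cause of the accuracy gap between one-stage frameworks (such as SSD and the YOLO series) and two-stage frameworks (such as Faster R-CNN and R-FCN). Before focal loss was proposed, detection networks dealt with it through methods such as bootstrapping and hard example mining. The authors' experiments show that focal loss works in a one-stage detector, which ultimately matches or exceeds two-stage detectors while running faster.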
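The relationship between the two losses can be written out as follows. The notation is an assumption borrowed from the standard statement of focal loss and is not defined in the text above: $p$ is the predicted probability of the positive class, $y \in \{0, 1\}$ is the label, and $\gamma \ge 0$ is the focusing parameter.

```latex
p_t =
\begin{cases}
p, & y = 1 \\
1 - p, & y = 0
\end{cases}
\qquad
\mathrm{CE}(p_t) = -\log(p_t)
\qquad
\mathrm{FL}(p_t) = -(1 - p_t)^{\gamma} \log(p_t)
```

With $\gamma = 0$ the focal loss reduces to cross-entropy. With, for example, $\gamma = 2$, a well-classified sample with $p_t = 0.9$ has its loss scaled by $(1 - 0.9)^2 = 0.01$, while a hard sample with $p_t = 0.1$ is scaled by only $0.81$, so easy negatives no longer dominate the total loss.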
