Existing contrastive learning methods for anomalous sound detection refine the audio representation of each audio sample by contrasting augmented views of that sample (e.g., with time or frequency masking). However, because such augmentations do not reflect the physical properties of machine sound, the learnt representations may be biased by the augmented data, limiting detection performance. This paper instead uses contrastive learning to refine audio representations for each machine ID, rather than for each audio sample. The proposed two-stage method first uses contrastive learning that incorporates machine ID to pretrain the audio representation model, and then fine-tunes the learnt model with a self-supervised ID classifier, strengthening the relation between audio features from the same ID. Experiments show that our method outperforms state-of-the-art methods based on contrastive learning or self-supervised classification in overall anomaly detection performance and stability on the DCASE 2020 Challenge Task 2 dataset.
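The ID-conditioned contrastive pretraining described above can be illustrated with a minimal sketch. The code below implements a SupCon-style loss in which embeddings sharing a machine ID are treated as positives and all other embeddings in the batch as negatives; the function name, temperature value, and exact loss form are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def id_contrastive_loss(embeddings, machine_ids, tau=0.1):
    """Supervised-contrastive-style loss grouped by machine ID (a sketch,
    not the paper's exact loss). Embeddings with the same machine ID are
    pulled together; all other batch items serve as negatives."""
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    n = z.shape[0]
    sim = z @ z.T / tau                       # temperature-scaled cosine similarity
    np.fill_diagonal(sim, -np.inf)            # exclude self-pairs
    # row-wise log-softmax over all non-self pairs
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    ids = np.asarray(machine_ids)
    same_id = (ids[:, None] == ids[None, :]) & ~np.eye(n, dtype=bool)
    # average negative log-probability of the positives for each anchor
    losses = [-log_prob[i, same_id[i]].mean() for i in range(n) if same_id[i].any()]
    return float(np.mean(losses))
```

With embeddings already clustered by machine ID the loss is near zero, while embeddings whose positives point in unrelated directions incur a much larger loss, which is the gradient signal that pulls same-ID features together during pretraining.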