Unsupervised anomalous sound detection aims to detect unknown abnormal machine sounds using only normal sounds for training. However, state-of-the-art approaches are not always stable and can perform dramatically differently even across machines of the same type, making them impractical for general applications. This paper proposes a self-supervised method based on spectral-temporal fusion to model the features of normal sounds, which improves the stability and consistency of anomalous sound detection across individual machines, even those of the same type. Experiments on the DCASE 2020 Challenge Task 2 dataset show that the proposed method achieves 81.39%, 83.48%, 98.22% and 98.83% in terms of the minimum AUC (the worst-case detection performance among individual machines) on four types of real machines (fan, pump, slider and valve), respectively, giving improvements of 31.79%, 17.78%, 10.42% and 21.13% over the state-of-the-art method, Glow_Aff. Moreover, the proposed method improves the average AUC (the mean performance across individual machines) for all machine types in the dataset.
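To make the two reported metrics concrete, the following is a minimal sketch of how a per-machine AUC, its average, and its minimum could be computed. It is not the paper's evaluation code: the function names `auc` and `summarize`, the machine IDs, and the toy scores are all hypothetical, and it assumes each test clip has an anomaly score and a label (1 = anomalous, 0 = normal).

```python
from collections import defaultdict
from statistics import mean


def auc(scores, labels):
    """AUC as the fraction of (anomalous, normal) pairs in which the
    anomalous clip has the higher score; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both normal and anomalous clips")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def summarize(records):
    """records: iterable of (machine_id, anomaly_score, label).

    Returns the per-machine AUCs, their mean (average AUC), and their
    minimum (worst-case AUC among individual machines)."""
    by_id = defaultdict(lambda: ([], []))
    for machine_id, score, label in records:
        by_id[machine_id][0].append(score)
        by_id[machine_id][1].append(label)
    per_id = {m: auc(s, y) for m, (s, y) in by_id.items()}
    return per_id, mean(per_id.values()), min(per_id.values())


# Toy, made-up scores for two hypothetical machines of the same type.
toy_records = [
    ("id_00", 0.1, 0), ("id_00", 0.2, 0), ("id_00", 0.8, 1), ("id_00", 0.9, 1),
    ("id_02", 0.3, 0), ("id_02", 0.6, 0), ("id_02", 0.5, 1), ("id_02", 0.7, 1),
]
per_id, avg_auc, min_auc = summarize(toy_records)
print(per_id, avg_auc, min_auc)
```

In this toy case one machine is separated perfectly while the other is not, so the minimum AUC falls below the average; that gap between individual machines is what the minimum AUC is meant to expose.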