Unsupervised anomalous sound detection aims to detect unknown abnormal machine sounds using a model trained only on normal sounds. However, state-of-the-art approaches are not always stable: their performance can differ dramatically even between individual machines of the same type, which makes them impractical for general applications. This paper proposes a self-supervised method based on spectral-temporal fusion to model the features of normal sound, which improves the stability and consistency of anomalous sound detection across individual machines, even those of the same type. Experiments on the DCASE 2020 Challenge Task 2 dataset show that the proposed method achieves 81.39\%, 83.48\%, 98.22\% and 98.83\% in terms of the minimum AUC (the worst-case detection performance among individual machines) on four types of real machines (fan, pump, slider and valve), respectively, an improvement of 31.79\%, 17.78\%, 10.42\% and 21.13\% over the state-of-the-art method, Glow\_Aff. Moreover, the proposed method improves the AUC (the average performance over individual machines) for all machine types in the dataset. The source code is available at https://github.com/liuyoude/STgram_MFN
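
The two metrics quoted above (the average AUC and the minimum AUC over individual machines of one type) can be illustrated with a minimal sketch. The function names `auc` and `summarize`, the machine IDs, and the scores below are hypothetical toy values chosen for illustration; they are not taken from the authors' repository or from the reported results.

```python
from collections import defaultdict


def auc(labels, scores):
    """AUC as the probability that a random anomalous clip (label 1)
    receives a higher anomaly score than a random normal clip (label 0).
    Ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one normal and one anomalous clip")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize(records):
    """records: iterable of (machine_id, label, anomaly_score).
    Returns (per-machine AUC dict, average AUC, minimum AUC)."""
    by_id = defaultdict(lambda: ([], []))
    for machine_id, label, score in records:
        by_id[machine_id][0].append(label)
        by_id[machine_id][1].append(score)
    per_id = {m: auc(labels, scores) for m, (labels, scores) in by_id.items()}
    values = list(per_id.values())
    return per_id, sum(values) / len(values), min(values)


# Toy scores for two hypothetical machines of the same type.
records = [
    ("id_00", 0, 0.1), ("id_00", 0, 0.2), ("id_00", 1, 0.8), ("id_00", 1, 0.9),
    ("id_02", 0, 0.3), ("id_02", 0, 0.6), ("id_02", 1, 0.5), ("id_02", 1, 0.7),
]
per_id, mean_auc, min_auc = summarize(records)
print(per_id, mean_auc, min_auc)
```

In this toy case one machine is separated perfectly while the other is not, so the average hides a weaker individual result that the minimum exposes; that gap is the kind of instability the minimum AUC is meant to capture.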