Sound event detection is a core module for acoustic environmental analysis. Semi-supervised learning allows the dataset to be scaled up substantially without increasing the annotation budget, and has recently attracted considerable research attention. In this work, we study two advanced semi-supervised learning techniques for sound event detection. First, data augmentation is important for the success of recent deep learning systems; this work studies the audio-signal random augmentation method, which provides an augmentation strategy that can handle a large number of different audio transformations. Second, consistency regularization is widely adopted in recent state-of-the-art semi-supervised learning methods; it exploits the unlabelled data by constraining the predictions for different transformations of a sample to be identical to the prediction for the original sample. This work finds that, for semi-supervised sound event detection, consistency regularization is an effective strategy, and that the best performance is achieved when it is combined with the MeanTeacher model.
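To make the consistency constraint concrete, the following is a minimal sketch rather than the implementation used in this work. It assumes a toy linear multi-label classifier, additive Gaussian noise as a stand-in for an audio transformation, a mean-squared-error consistency term, and an exponential-moving-average teacher in the MeanTeacher style. All function names, the loss choice, and the hyperparameters (`noise_std`, `decay`) are illustrative assumptions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, features):
    """Per-class event probabilities from a toy linear model.

    Sound event detection is multi-label, so each class gets its own sigmoid.
    """
    return [sigmoid(sum(w * x for w, x in zip(row, features))) for row in weights]


def augment(features, rng, noise_std=0.1):
    """Hypothetical stand-in for one audio transformation: additive Gaussian noise."""
    return [x + rng.gauss(0.0, noise_std) for x in features]


def consistency_loss(student_w, teacher_w, features, rng, noise_std=0.1):
    """MSE between the student's prediction on a transformed sample and the
    teacher's prediction on the original sample. No label is needed, so this
    term can be computed on unlabelled clips. (MSE is an assumption here.)
    """
    target = predict(teacher_w, features)
    pred = predict(student_w, augment(features, rng, noise_std))
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def ema_update(teacher_w, student_w, decay=0.99):
    """MeanTeacher-style update: teacher weights track an exponential moving
    average of the student weights."""
    return [
        [decay * t + (1.0 - decay) * s for t, s in zip(t_row, s_row)]
        for t_row, s_row in zip(teacher_w, student_w)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    student = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]  # 2 event classes, 3 features
    teacher = [row[:] for row in student]
    unlabelled_clip = [0.2, 0.7, -0.1]
    print("consistency loss:", consistency_loss(student, teacher, unlabelled_clip, rng))
    teacher = ema_update(teacher, student)
```

In a full training loop this term would be added, with some weight, to the supervised loss computed on the labelled clips. The weighting schedule is not specified here and is left out of the sketch.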