This paper shows that a semi-supervised learning method called peer collaborative learning (PCL) can be applied to the polyphonic sound event detection (PSED) task, one of the tasks in the Detection and Classification of Acoustic Scenes and Events (DCASE) challenge. Many deep learning models have been studied to determine which sound events occur in a given audio clip, where in the clip they occur, and for how long. The PCL used in this paper combines ensemble-based knowledge distillation into sub-networks with student-teacher model-based knowledge distillation, which makes it possible to train a robust PSED model from a small amount of strongly labeled data, weakly labeled data, and a large amount of unlabeled data. We evaluated the proposed PCL model on the DCASE 2019 Task 4 datasets and achieved an F1-score improvement of about 10% over the baseline model.
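The combination of the two distillation signals on unlabeled clips can be illustrated with a minimal sketch. The abstract does not specify the loss functions, so everything below is an assumption made for illustration: the ensemble is taken as the plain average of the peers' frame-level predictions, both distillation terms use mean squared error, and each teacher is an exponential moving average (EMA) of its student's weights, as in mean-teacher style training. The function names are hypothetical, and the supervised losses on strongly and weakly labeled data are omitted.

```python
from typing import Dict, List

# Frame-level predictions: probs[frame][event_class], each value in [0, 1].
Probs = List[List[float]]


def ensemble_probs(peer_probs: List[Probs]) -> Probs:
    """Average the peers' predictions (assumed ensembling rule)."""
    n_peers = len(peer_probs)
    n_frames = len(peer_probs[0])
    n_classes = len(peer_probs[0][0])
    return [
        [sum(p[t][c] for p in peer_probs) / n_peers for c in range(n_classes)]
        for t in range(n_frames)
    ]


def mse(a: Probs, b: Probs) -> float:
    """Mean squared error, used here as the assumed distillation loss."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def pcl_unlabeled_loss(peer_probs: List[Probs], teacher_probs: List[Probs]) -> float:
    """Per-peer average of the two distillation terms on an unlabeled clip."""
    ens = ensemble_probs(peer_probs)
    total = 0.0
    for student, teacher in zip(peer_probs, teacher_probs):
        total += mse(student, ens)      # ensemble-based distillation into each peer
        total += mse(student, teacher)  # student-teacher distillation
    return total / len(peer_probs)


def ema_update(teacher: Dict[str, float], student: Dict[str, float],
               alpha: float = 0.999) -> Dict[str, float]:
    """Move the teacher weights toward the student (assumed EMA teacher)."""
    return {k: alpha * teacher[k] + (1.0 - alpha) * student[k] for k in teacher}


if __name__ == "__main__":
    # Two peers, one frame, two event classes (toy values).
    peers = [[[0.2, 0.8]], [[0.4, 0.6]]]
    teachers = [[[0.2, 0.8]], [[0.4, 0.6]]]
    print(pcl_unlabeled_loss(peers, teachers))
    print(ema_update({"w": 0.0}, {"w": 2.0}, alpha=0.5))
```

In this sketch the unlabeled loss falls to zero only when every peer agrees with both the ensemble and its own teacher, which is the agreement that the two distillation terms are meant to encourage.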