Semi-supervised learning and domain adaptation techniques have drawn increasing attention in the field of domestic sound event detection, thanks to the availability of large amounts of unlabeled data and the relative ease of generating synthetic strongly-labeled data. In previous work, several semi-supervised learning strategies were designed to boost the performance of a mean-teacher model. These strategies include shift consistency training (SCT), interpolation consistency training (ICT), and pseudo-labeling. However, adversarial domain adaptation (ADA) did not seem to improve the event detection accuracy further when we attempted to compensate for the domain gap between synthetic and real data. In this research, we found empirically that ICT tends to pull apart the distributions of synthetic and real data in t-SNE plots. Therefore, ICT is abandoned, while SCT is instead applied to train both the student and the teacher models. With these modifications, the system integrates successfully with an ADA network and achieves an F1 score of 47.2% on the DCASE 2020 task 4 dataset, which is 2.1% higher than what was reported in the previous work.
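The consistency terms named above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the toy frame-wise detector `predict`, the circular shift via `np.roll`, the mean-squared-error consistency measure, and all function and parameter names are assumptions made for illustration only.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def predict(weights, features):
    """Toy frame-wise detector (assumed): features (T, F) -> probabilities (T,)."""
    return sigmoid(features @ weights)


def ema_update(teacher_w, student_w, alpha=0.99):
    """Mean-teacher step: teacher weights track an exponential moving
    average of the student weights."""
    return alpha * teacher_w + (1.0 - alpha) * student_w


def shift_consistency_loss(student_w, teacher_w, features, shift):
    """SCT-style term (assumed form): the student's output on a time-shifted
    input should match the teacher's output shifted by the same amount."""
    shifted_input = np.roll(features, shift, axis=0)  # circular shift is an assumption
    student_out = predict(student_w, shifted_input)
    teacher_out = np.roll(predict(teacher_w, features), shift, axis=0)
    return float(np.mean((student_out - teacher_out) ** 2))


def interpolation_consistency_loss(student_w, teacher_w, feats_a, feats_b, lam):
    """ICT-style term (assumed form): the student's output on a mixed input
    should match the same mix of the teacher's outputs."""
    mixed_input = lam * feats_a + (1.0 - lam) * feats_b
    student_out = predict(student_w, mixed_input)
    teacher_out = (lam * predict(teacher_w, feats_a)
                   + (1.0 - lam) * predict(teacher_w, feats_b))
    return float(np.mean((student_out - teacher_out) ** 2))
```

In this sketch the toy detector scores each frame independently, so it is exactly shift-equivariant and the SCT term vanishes when student and teacher share weights. The ICT term generally does not vanish for two different clips, because mixing the inputs and mixing the outputs are not the same operation under a nonlinear model.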