The S{\o}rensen--Dice coefficient has recently gained popularity as a loss function (known as the Dice loss) because of its robustness in tasks where negative samples greatly outnumber positive samples, such as semantic segmentation, natural language processing, and sound event detection. Conventional training of polyphonic sound event detection systems with binary cross-entropy loss often yields suboptimal detection performance because training tends to be overwhelmed by updates from negative samples. In this paper, we investigate the effect of the Dice loss, intra- and inter-modal transfer learning, data augmentation, and recording formats on the performance of polyphonic sound event detection systems with multichannel inputs. Our analysis shows that systems trained with the Dice loss consistently outperform those trained with cross-entropy loss across different training settings and recording formats in terms of F1 score and error rate. We achieve further performance gains through transfer learning and an appropriate combination of data augmentation techniques.
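To make the class-imbalance argument concrete, the following is a minimal sketch of a soft Dice loss next to binary cross-entropy on a toy frame-level target. The exact formulation used in the paper is not given in this abstract, so the function names (`dice_loss`, `bce_loss`), the smoothing constant `eps`, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def dice_loss(y_pred, y_true, eps=1e-7):
    """Soft Dice loss: 1 - 2*sum(p*y) / (sum(p) + sum(y)).

    Assumed formulation for illustration; eps only guards against
    division by zero when both inputs are all zeros.
    """
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    intersection = np.sum(y_pred * y_true)
    denom = np.sum(y_pred) + np.sum(y_true)
    return float(1.0 - (2.0 * intersection + eps) / (denom + eps))


def bce_loss(y_pred, y_true, eps=1e-7):
    """Mean binary cross-entropy over all frames."""
    p = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)
    y = np.asarray(y_true, dtype=float)
    return float(np.mean(-(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))))


# Toy target (hypothetical): 100 frames, only 2 of them contain the event.
y_true = np.zeros(100)
y_true[:2] = 1.0

# A degenerate model that predicts "inactive" almost everywhere.
all_negative = np.full(100, 0.01)

print("BCE :", bce_loss(all_negative, y_true))
print("Dice:", dice_loss(all_negative, y_true))
```

In this toy setting the degenerate all-negative prediction receives a small cross-entropy loss, because the 98 correctly rejected frames dominate the mean, while its Dice loss stays close to 1, because the Dice coefficient only rewards overlap with the positive frames.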