Many datasets and approaches in ambient sound analysis use weakly labeled data. Weak labels are employed because annotating every data sample with a strong label is too expensive. Yet their impact on performance, compared with strong labels, remains unclear. Indeed, weak labels must often be dealt with at the same time as other challenges, namely multiple labels per sample, unbalanced classes, and/or overlapping events, which makes their specific effect hard to isolate. In this paper, we formulate a supervised learning problem that involves weak labels. We create a dataset that isolates the difference between strong and weak labels from these other challenges. We investigate the impact of weak labels when training an embedding or an end-to-end classifier. Different experimental scenarios are discussed to provide insights into which applications are most sensitive to weakly labeled data.
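The abstract does not define the two label types. As a minimal sketch, assume the convention common in sound event detection: a strong label gives an event class with its onset and offset times, while a weak label only states that the class occurs somewhere in the clip. The function names, class names, and the 1-second frame hop below are hypothetical and chosen for illustration; they are not taken from the paper.

```python
from typing import Dict, List, Set, Tuple

# Assumed representation of a strong label: (event_class, onset_s, offset_s).
StrongLabel = Tuple[str, float, float]


def to_weak_labels(strong: List[StrongLabel]) -> Set[str]:
    """Collapse timed events into clip-level tags, discarding all timing."""
    return {event for event, _onset, _offset in strong}


def active_frames(strong: List[StrongLabel], clip_len: float,
                  hop: float) -> Dict[str, List[int]]:
    """List, per class, the frame indices that overlap a labeled event."""
    n_frames = int(round(clip_len / hop))
    frames: Dict[str, List[int]] = {}
    for event, onset, offset in strong:
        for i in range(n_frames):
            start, end = i * hop, (i + 1) * hop
            # Frame i covers [start, end); it is active if it overlaps the event.
            if onset < end and offset > start:
                frames.setdefault(event, []).append(i)
    return frames


# Hypothetical 10-second clip with two overlapping events.
strong = [("dog_bark", 2.0, 3.5), ("car_horn", 3.0, 4.0)]
weak = to_weak_labels(strong)
frames = active_frames(strong, clip_len=10.0, hop=1.0)
```

Under these assumptions, the weak labels keep only the two class names, whereas the strong labels additionally locate each event in specific frames. That timing information is what a model trained on weak labels has to do without.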