Obtaining large corpora of strongly labelled sound events is expensive and difficult in engineering applications. Much research therefore addresses how to detect both the types and the timestamps of sound events from weak labels that specify only the types. This task can be treated as a multiple instance learning (MIL) problem, and the key to it is the design of a pooling function. In this paper, we propose an adaptive power pooling function that automatically adapts to various sound sources. On two public datasets, the proposed power pooling function outperforms the state-of-the-art linear softmax pooling on both coarse-grained and fine-grained metrics. Notably, it improves the event-based F1 score (which evaluates the detection of event onsets and offsets) by a relative 11.4% and 10.2% on the two datasets. While this paper focuses on sound event detection, the proposed method can be applied to MIL tasks in other domains.
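To make the role of the pooling function concrete, the following is a minimal sketch rather than the paper's implementation. It aggregates frame-level probabilities into a clip-level probability, which is the MIL step described above. Linear softmax pooling weights each frame by its own probability. The abstract does not give the power pooling formula, so the form used here (weights equal to the frame probability raised to an exponent `n`) is an assumption, as are the function names and the treatment of `n` as a plain argument rather than a parameter that adapts during training.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero when all frame probabilities are 0


def linear_softmax_pool(frame_probs, axis=0):
    """Linear softmax pooling: each frame is weighted by its own probability."""
    p = np.asarray(frame_probs, dtype=float)
    return (p * p).sum(axis=axis) / (p.sum(axis=axis) + EPS)


def power_pool(frame_probs, n, axis=0):
    """Assumed power pooling form: each frame is weighted by its probability ** n.

    Under this assumption, n = 0 gives mean pooling, n = 1 gives linear
    softmax pooling, and larger n moves the result toward max pooling.
    `n` may be a scalar or one value per class (broadcast over frames).
    """
    p = np.asarray(frame_probs, dtype=float)
    weights = p ** n
    return (weights * p).sum(axis=axis) / (weights.sum(axis=axis) + EPS)


# Hypothetical frame-level probabilities: 3 frames, 2 event classes.
frame_probs = np.array([[0.1, 0.9],
                        [0.2, 0.8],
                        [0.9, 0.1]])

# A different exponent per class, standing in for per-class adaptation.
clip_probs = power_pool(frame_probs, n=np.array([0.5, 2.0]))
print(clip_probs)
```

In this sketch a single exponent interpolates between mean-like and max-like behaviour, which is one plausible way a pooling function could adapt to sound sources of different durations.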