Sound event detection aims to infer events from the surrounding environmental sounds. Because examples of rare sound events are scarce, detecting them is challenging for well-trained detectors that have learned too much prior knowledge. Meanwhile, few-shot learning methods promise good generalization when facing a new task with limited data. Recent approaches have achieved promising results in this field. However, these approaches treat each support example independently, ignoring information from the other examples in the task. As a result, most previous methods are constrained to generate the same feature embedding for all test-time tasks, which does not adapt to the input data. In this work, we propose a novel task-adaptive module that is easy to plug into any metric-based few-shot learning framework. The module identifies the task-relevant feature dimensions. Incorporating our module improves performance considerably over baseline methods on two datasets, especially for the transductive propagation network: +6.8% 5-way 1-shot accuracy on ESC-50 and +5.9% on noiseESC-50. We also investigate our approach in the domain-mismatch setting, where it achieves better results than previous methods.
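To make the idea of task-relevant feature dimensions concrete, the following is a minimal sketch, not the module proposed in this work. It assumes a prototype-based metric classifier and a hypothetical weighting rule, `task_weights`, that scores each embedding dimension by how much the class prototypes of the current task differ along it. All function names, and the weighting rule itself, are illustrative assumptions.

```python
import numpy as np


def prototypes(support, labels, n_way):
    """Mean embedding per class; `support` has shape (n_support, dim)."""
    return np.stack([support[labels == c].mean(axis=0) for c in range(n_way)])


def task_weights(support, labels, n_way, eps=1e-8):
    """Hypothetical task-adaptive weights (an assumption, not the paper's module).

    Each dimension is weighted by the variance of the class prototypes along
    it, so the weights depend on the whole support set of the task. The
    weights are rescaled to sum to `dim`, i.e. their mean is about 1.
    """
    between = prototypes(support, labels, n_way).var(axis=0)
    return between / (between.sum() + eps) * support.shape[1]


def classify(query, support, labels, n_way, adaptive=True):
    """Assign each query to the nearest prototype under a weighted squared distance."""
    protos = prototypes(support, labels, n_way)
    if adaptive:
        w = task_weights(support, labels, n_way)
    else:
        w = np.ones(support.shape[1])  # same metric for every task
    diff = query[:, None, :] - protos[None, :, :]  # (n_query, n_way, dim)
    dist = (w * diff ** 2).sum(axis=-1)            # (n_query, n_way)
    return dist.argmin(axis=1)


if __name__ == "__main__":
    # Toy 2-way 1-shot task with 2-dimensional embeddings.
    support = np.array([[0.0, 0.0], [4.0, 1.0]])
    labels = np.array([0, 1])
    query = np.array([[1.5, 4.0]])
    print(classify(query, support, labels, 2, adaptive=False))
    print(classify(query, support, labels, 2, adaptive=True))
```

In this toy task the two prototypes differ mostly along the first dimension, so the adaptive weights emphasize it and the query can receive a different label than under the fixed, unweighted metric. A learned module would replace the hand-written variance rule.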