Detect what you want: Target Sound Detection

Human beings can pick out a target sound type from a multi-source mixture through selective auditory attention, yet this capability has rarely been explored in machine hearing. This paper addresses the target sound detection (TSD) task, which aims to detect a target sound in a mixture audio clip given a reference audio of that target sound. We present a novel target sound detection network (TSDNet) consisting of two main parts: a conditional network, which generates a sound-discriminative conditional embedding vector representing the target sound, and a detection network, which takes both the mixture audio and the conditional embedding vector as inputs and produces the detection result for the target sound. The two networks can be jointly optimized with a multi-task learning approach to further improve performance. In addition, we study both strongly-supervised and weakly-supervised strategies for training TSDNet and propose a data augmentation method that mixes two samples. To facilitate this research, we build a target sound detection dataset (i.e., URBAN-TSD) based on the URBAN-SED and UrbanSound8K datasets. Experimental results indicate that our method achieves segment-based F-scores of 76.3% and 56.8% on the strongly-labelled and weakly-labelled data, respectively.
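The segment-based F score mentioned above can be illustrated with a small sketch. This is not the authors' evaluation code: the function names are hypothetical, the 1-second segment length is an assumption (the abstract does not state it), and returning 0.0 when neither reference nor prediction contains any active segment is just one possible convention. The idea is to cut the clip into fixed-length segments, mark a segment active if any target-sound event overlaps it, and compute F = 2TP / (2TP + FP + FN) over segments.

```python
import math


def events_to_segments(events, clip_len, seg_len=1.0):
    """Mark each fixed-length segment as active if any (onset, offset) event overlaps it."""
    n_seg = math.ceil(clip_len / seg_len)
    active = [False] * n_seg
    for onset, offset in events:
        for i in range(n_seg):
            seg_start, seg_end = i * seg_len, (i + 1) * seg_len
            if onset < seg_end and offset > seg_start:
                active[i] = True
    return active


def segment_f_score(ref_events, est_events, clip_len, seg_len=1.0):
    """Segment-based F score for one target sound in one clip (times in seconds)."""
    ref = events_to_segments(ref_events, clip_len, seg_len)
    est = events_to_segments(est_events, clip_len, seg_len)
    tp = sum(r and e for r, e in zip(ref, est))      # active in both
    fp = sum(e and not r for r, e in zip(ref, est))  # predicted only
    fn = sum(r and not e for r, e in zip(ref, est))  # reference only
    denom = 2 * tp + fp + fn
    # Assumed convention: score 0.0 when there is nothing to detect and nothing detected.
    return 2 * tp / denom if denom else 0.0


# Hypothetical example: reference event at 1.0-4.0 s, predicted event at 2.0-5.5 s.
score = segment_f_score([(1.0, 4.0)], [(2.0, 5.5)], clip_len=10.0)
```

In this example the reference covers segments 1 to 3 and the prediction covers segments 2 to 5, giving 2 true positives, 2 false positives, and 1 false negative, so the score is 4/7. A dataset-level score would normally accumulate TP, FP, and FN over all clips before computing F.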

