Sound event localization and detection (SELD) is the combined task of identifying a sound event and its direction. Deep neural networks (DNNs) are used to associate the event and its direction with the sound signals observed by a microphone array. Although Ambisonic microphones are popular in the SELD literature, their predetermined geometry may limit the range of applications. Some applications (including those that perform SELD for pedestrians while they walk) require a wearable microphone array whose geometry can be designed to suit the task. In this paper, to support the development of such wearable SELD systems, we propose a dataset named the Wearable SELD dataset. It consists of data recorded by 24 microphones placed on a head and torso simulator (HATS) fitted with accessories mimicking wearable devices (glasses, earphones, and headphones). We also provide experimental results of SELD using the proposed dataset and SELDNet to investigate the effect of microphone configuration.