Sound recognition is an important and popular function of smart devices. The location of a sound is basic information associated with its acoustic source. Beyond sound recognition, whether acoustic sources can be localized largely determines the capability and quality of a smart device's interactive functions. In this work, we study the problem of concurrently localizing multiple acoustic sources with a smart device (e.g., a smart speaker like Amazon Alexa). Existing approaches either can localize only a single source or require deploying a distributed network of microphone arrays. Our proposal, called Symphony, is the first approach to tackle this problem with a single microphone array. The insight behind Symphony is that the geometric layout of the microphones on the array determines the unique relationship among signals from the same source along the same arriving path, while the source's location determines the directions of arrival (DoAs) of signals along different arriving paths. Symphony therefore includes a geometry-based filtering module to distinguish signals from different sources along different paths and a coherence-based module to identify signals from the same source. We implement Symphony with different types of commercial off-the-shelf microphone arrays and evaluate its performance under different settings. The results show that Symphony has a median localization error of 0.694 m, which is 68% lower than that of the state-of-the-art approach.
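To make the geometric insight concrete, the following minimal sketch shows how an array's layout fixes the relative arrival delays of a signal along one path, under a far-field (plane-wave) assumption. It is an illustration only, not Symphony's filtering module: the four-microphone circular layout, the 5 cm radius, the speed of sound, and the function names `circular_array` and `relative_delays` are all assumptions introduced here.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def circular_array(num_mics, radius):
    """Return (x, y) positions of microphones evenly spaced on a circle (hypothetical layout)."""
    return [
        (radius * math.cos(2 * math.pi * k / num_mics),
         radius * math.sin(2 * math.pi * k / num_mics))
        for k in range(num_mics)
    ]


def relative_delays(mics, doa_deg):
    """Far-field arrival delays (seconds) of each microphone relative to microphone 0.

    doa_deg is the direction the sound arrives from, measured
    counter-clockwise from the +x axis.
    """
    theta = math.radians(doa_deg)
    ux, uy = math.cos(theta), math.sin(theta)  # unit vector pointing toward the source
    # A microphone lying farther along the source direction hears the wavefront earlier.
    arrival = [-(x * ux + y * uy) / SPEED_OF_SOUND for x, y in mics]
    return [t - arrival[0] for t in arrival]


mics = circular_array(num_mics=4, radius=0.05)
delays_0 = relative_delays(mics, 0.0)    # path arriving from the +x direction
delays_90 = relative_delays(mics, 90.0)  # path arriving from the +y direction

for name, delays in (("DoA 0 deg", delays_0), ("DoA 90 deg", delays_90)):
    print(name, [f"{d * 1e6:.1f} us" for d in delays])
```

For a fixed layout, each DoA yields a distinct delay pattern across the microphones, so signals that share a pattern plausibly belong to the same arriving path. Reflections from the same source arrive with different DoAs, which is why a separate step is needed to group paths by source.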