Recording and annotating real sound events for a sound event localization and detection (SELD) task is time-consuming, so data augmentation techniques are often favored when the amount of data is limited. However, how to augment the spatial information in a dataset, including unlabeled directional interference events, remains an open research question. Furthermore, directional interference events make it difficult to accurately extract spatial characteristics from target sound events. To address this problem, we propose an impulse response simulation framework (IRS) that augments spatial characteristics using simulated room impulse responses (RIRs). RIRs corresponding to a microphone array assumed to be placed in various rooms are accurately simulated, and the source signals of the target sound events are extracted from a mixture. The simulated RIRs are then convolved with the extracted source signals to obtain an augmented multi-channel training dataset. Evaluation results obtained using the TAU-NIGENS Spatial Sound Events 2021 dataset show that the IRS contributes to improving the overall SELD performance. Additionally, we conducted an ablation study to examine the contribution of, and need for, each component within the IRS.
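The convolution step described above can be illustrated with a minimal sketch. It is not the IRS implementation: `toy_rir` is a hypothetical stand-in that builds a direct-path impulse followed by exponentially decaying noise instead of performing a real room simulation, the source signal is random noise standing in for an extracted target event, and the sampling rate, channel count, and RT60 value are all assumed for illustration.

```python
import numpy as np


def toy_rir(num_mics, length, rt60, fs, rng):
    """Hypothetical stand-in for a simulated RIR, one row per microphone.

    Each row is a unit direct-path impulse followed by noise whose
    amplitude decays by 60 dB after `rt60` seconds.
    """
    t = np.arange(length) / fs
    envelope = 10.0 ** (-3.0 * t / rt60)  # 60 dB amplitude decay at t = rt60
    rir = 0.1 * rng.standard_normal((num_mics, length)) * envelope
    rir[:, 0] = 1.0  # direct path
    return rir


def spatialize(source, rir):
    """Convolve a mono source with each microphone's RIR."""
    return np.stack([np.convolve(source, h) for h in rir])


fs = 24000  # assumed sampling rate in Hz
rng = np.random.default_rng(0)

# Placeholder for a source signal extracted from a mixture (0.1 s of noise).
source = rng.standard_normal(fs // 10)

# Assumed 4-microphone array and a 50 ms toy RIR.
rir = toy_rir(num_mics=4, length=fs // 20, rt60=0.3, fs=fs, rng=rng)

augmented = spatialize(source, rir)
print(augmented.shape)  # (channels, samples)
```

In a fuller pipeline, `toy_rir` would be replaced by RIRs from a room simulator for varied room geometries and source positions, and the localization labels of each augmented clip would be set to match the simulated source direction.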