This report describes our systems submitted to DCASE2021 Challenge Task 3: sound event localization and detection (SELD) with directional interference. Our previous system, based on the activity-coupled Cartesian direction of arrival (ACCDOA) representation, enables us to solve a SELD task with a single target. Combined with an efficient network architecture called RD3Net and data augmentation techniques, this ACCDOA-based system outperformed state-of-the-art SELD systems in terms of localization and location-dependent detection. Building on the ACCDOA-based system, we form model ensembles by averaging the outputs of several systems trained under different conditions, such as input features, training folds, and model architectures. We also use an event independent network v2 (EINV2)-based system to increase the diversity of the model ensembles. To help the models generalize, we further propose impulse response simulation (IRS), which generates simulated multi-channel signals by convolving simulated room impulse responses (RIRs) with source signals extracted from the original dataset. Our systems significantly improved over the baseline system on the development dataset.
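As a rough illustration of the ensemble step described above, the following is a minimal sketch, not the submitted implementation. It assumes each system emits an ACCDOA output of shape (frames, classes, 3), where the norm of each Cartesian vector indicates activity and its direction gives the DOA. The function names, the 0.5 activity threshold, and the toy values are assumptions made for this example only.

```python
import numpy as np


def ensemble_accdoa(outputs):
    """Average ACCDOA outputs, each of shape (frames, classes, 3), across systems."""
    return np.mean(np.stack(outputs, axis=0), axis=0)


def decode_accdoa(accdoa, threshold=0.5):
    """Turn ACCDOA vectors into activity flags and DOA angles in degrees.

    The threshold value is an assumption for this sketch.
    """
    norms = np.linalg.norm(accdoa, axis=-1)
    active = norms > threshold
    # Normalize to unit vectors, guarding against division by zero.
    unit = accdoa / np.maximum(norms[..., None], 1e-8)
    azimuth = np.degrees(np.arctan2(unit[..., 1], unit[..., 0]))
    elevation = np.degrees(np.arcsin(np.clip(unit[..., 2], -1.0, 1.0)))
    return active, azimuth, elevation


# Toy outputs from two hypothetical systems: 1 frame, 2 classes.
system_a = np.array([[[0.9, 0.0, 0.0], [0.1, 0.0, 0.0]]])
system_b = np.array([[[0.7, 0.0, 0.0], [0.0, 0.2, 0.0]]])

averaged = ensemble_accdoa([system_a, system_b])
active, azimuth, elevation = decode_accdoa(averaged)
```

In this toy case the first class averages to a vector of norm 0.8 along the x-axis and is marked active, while the second class stays below the threshold. Averaging in the Cartesian domain lets systems that disagree on activity or direction pull the ensemble vector toward a shorter norm, which is one plausible reason output averaging suits this representation.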