The L3DAS21 Challenge aims to encourage and foster collaborative research on machine learning for 3D audio signal processing, with a particular focus on 3D speech enhancement (SE) and 3D sound event localization and detection (SELD). Alongside the challenge, we release the L3DAS21 dataset, a 65-hour 3D audio corpus, accompanied by a Python API that facilitates data usage and the results submission stage. Machine learning approaches to 3D audio tasks are usually based on single-perspective Ambisonics recordings or on arrays of single-capsule microphones. We instead propose a novel multichannel audio configuration based on multiple-source and multiple-perspective Ambisonics recordings, performed with an array of two first-order Ambisonics microphones. To the best of our knowledge, this is the first time a dual-mic Ambisonics configuration has been used for these tasks. We provide baseline models and results for both tasks, obtained with state-of-the-art architectures: FaSNet for SE and SELDNet for SELD. This report aims to provide all the information needed to participate in the L3DAS21 Challenge, illustrating the details of the L3DAS21 dataset, the challenge tasks and the baseline models.
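To make the dual-mic configuration concrete: a first-order Ambisonics (B-format) recording carries 4 channels (W, X, Y, Z), so an array of two such microphones yields an 8-channel signal per example. The sketch below illustrates this channel layout with NumPy; the function name, toy sample rate, and random signals are illustrative assumptions, not part of the official L3DAS21 API.

```python
import numpy as np

def stack_dual_foa(mic_a: np.ndarray, mic_b: np.ndarray) -> np.ndarray:
    """Stack two time-aligned 4-channel B-format signals into one
    (8, n_samples) array, as in a dual first-order Ambisonics array.
    Illustrative helper, not the official L3DAS21 API."""
    assert mic_a.shape[0] == 4 and mic_b.shape[0] == 4, "expected B-format (4 channels: W, X, Y, Z)"
    assert mic_a.shape[1] == mic_b.shape[1], "recordings must be time-aligned"
    return np.concatenate([mic_a, mic_b], axis=0)

# Toy one-second signals at an assumed 16 kHz sample rate.
sr = 16000
mic_a = np.random.randn(4, sr).astype(np.float32)
mic_b = np.random.randn(4, sr).astype(np.float32)
x = stack_dual_foa(mic_a, mic_b)
print(x.shape)  # (8, 16000)
```

A model consuming this configuration simply treats the stacked array as an 8-channel input instead of the usual 4 channels of a single-perspective Ambisonics recording.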