Sound event localization and detection (SELD) involves identifying both the direction of arrival (DOA) and the class of each sound event. SELD methods with a class-wise output format have the model predict the activity of every sound event class together with its corresponding location. Class-wise methods can output activity-coupled Cartesian DOA (ACCDOA) vectors, which make it possible to solve a SELD task with a single target using a single network. However, detecting the same event class from multiple locations remains a challenge. To overcome this problem while keeping the advantages of the class-wise format, we extended ACCDOA to multi-ACCDOA and proposed auxiliary duplicating permutation invariant training (ADPIT). The multi-ACCDOA format (a class- and track-wise output format) enables the model to handle overlapping events from the same class. The class-wise ADPIT scheme enables each track of the multi-ACCDOA format to learn with the same target as the single-ACCDOA format. In evaluations on the DCASE 2021 Task 3 dataset, the model trained with the multi-ACCDOA format and class-wise ADPIT detected overlapping events from the same class while maintaining its performance in the other cases. The proposed method also performed comparably to state-of-the-art SELD methods while using fewer parameters.
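The two ideas named above can be illustrated with a minimal NumPy sketch for one class in one frame. It is an illustration under stated assumptions, not the authors' implementation: the function names `decode_accdoa` and `adpit_loss`, the activity threshold of 0.5, and the use of mean squared error are assumptions made here. The sketch reads an ACCDOA vector as "norm = activity, direction = DOA", and approximates class-wise ADPIT by duplicating targets so that every track is assigned some active target, then taking the minimum loss over all such assignments.

```python
import itertools

import numpy as np


def decode_accdoa(vec, threshold=0.5):
    """Split one ACCDOA vector into (active, unit DOA vector).

    The threshold value is an assumption chosen for illustration.
    """
    norm = float(np.linalg.norm(vec))
    active = norm > threshold
    # Only normalize when active, which also avoids dividing by zero.
    doa = vec / norm if active else np.zeros(3)
    return active, doa


def adpit_loss(pred, targets):
    """Simplified ADPIT-style loss for one class in one frame.

    pred:    (n_tracks, 3) predicted ACCDOA vectors, one per track.
    targets: (n_targets, 3) ACCDOA vectors of the active events,
             with 0 <= n_targets <= n_tracks.
    """
    n_tracks = pred.shape[0]
    n_targets = len(targets)
    if n_targets > n_tracks:
        raise ValueError("more targets than tracks")
    if n_targets == 0:
        # No active event: every track is trained toward the zero vector.
        return float(np.mean(pred ** 2))

    best = np.inf
    # Enumerate track-to-target assignments; duplicates fill spare tracks.
    for assign in itertools.product(range(n_targets), repeat=n_tracks):
        if len(set(assign)) < n_targets:
            continue  # every target must be covered by at least one track
        loss = np.mean((pred - targets[list(assign)]) ** 2)
        best = min(best, loss)
    return float(best)
```

With one active event, the only valid assignment copies that event to every track, so each track sees the same target it would see in the single-ACCDOA format. With two events at different locations and three tracks, any assignment that covers both events is admissible, and the lowest-loss one is used.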