This study defines a new evaluation metric for audio tagging that overcomes a limitation of the conventional mean average precision (mAP) metric, which treats different kinds of sound as independent classes and ignores the relations between them. In addition, because sound labeling is ambiguous, the labels in the training and evaluation sets are not guaranteed to be accurate or exhaustive, which makes robust evaluation with mAP difficult. The proposed metric, ontology-aware mean average precision (OmAP), addresses these weaknesses by using the AudioSet ontology during evaluation. Specifically, we reweight the false positive events in the model prediction according to their ontology graph distance to the target classes. OmAP also gives more insight into model performance by evaluating at different coarse-grained levels of the ontology graph. We conduct human evaluations and demonstrate that OmAP is more consistent with human perception than mAP. To further verify the importance of the ontology information, we also propose a novel loss function (OBCE) that reweights the binary cross entropy (BCE) loss based on the ontology distance. Our experiments show that OBCE improves both mAP and OmAP on the AudioSet tagging task.
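The reweighting idea can be illustrated with a minimal sketch. It is not the exact OmAP or OBCE definition, which the abstract does not give. The toy ontology `PARENT`, the linear weight `min(d, max_dist) / max_dist`, and the helper names `fp_weight`, `weighted_average_precision`, and `weighted_bce` are all assumptions made for illustration; only the general principle (false positives are weighted by their ontology graph distance to the target classes) comes from the text above.

```python
import math
from collections import deque

# Hypothetical toy ontology (child -> parent); NOT the real AudioSet graph.
PARENT = {
    "Animal": "Sounds",
    "Music": "Sounds",
    "Dog": "Animal",
    "Cat": "Animal",
    "Guitar": "Music",
}


def build_adjacency(parent):
    """Turn child -> parent links into an undirected adjacency map."""
    adj = {}
    for child, par in parent.items():
        adj.setdefault(child, set()).add(par)
        adj.setdefault(par, set()).add(child)
    return adj


def graph_distance(adj, src, dst):
    """Number of edges on the shortest path between two classes (BFS)."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in adj[node]:
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError(f"no path between {src!r} and {dst!r}")


def fp_weight(adj, cls, targets, max_dist):
    """Assumed weighting: a false positive close to a target costs less."""
    d = min(graph_distance(adj, cls, t) for t in targets)
    return min(d, max_dist) / max_dist


def weighted_average_precision(scores, labels, weights):
    """Per-class AP in which each negative counts as `weights[i]` false positives."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp, fp, precisions = 0, 0.0, []
    for i in order:
        if labels[i]:
            tp += 1
            precisions.append(tp / (tp + fp))
        else:
            fp += weights[i]
    return sum(precisions) / len(precisions) if precisions else 0.0


def weighted_bce(probs, labels, weights):
    """BCE in which each negative term is scaled by its ontology weight (assumed form)."""
    total = 0.0
    for p, y, w in zip(probs, labels, weights):
        total -= math.log(p) if y else w * math.log(1.0 - p)
    return total / len(probs)


# Evaluate the class "Dog" over four clips with made-up targets and scores.
ADJ = build_adjacency(PARENT)
clip_targets = [{"Dog"}, {"Cat"}, {"Guitar"}, {"Dog"}]
dog_scores = [0.9, 0.8, 0.7, 0.6]
dog_labels = [1 if "Dog" in t else 0 for t in clip_targets]
dog_weights = [fp_weight(ADJ, "Dog", t, max_dist=4) for t in clip_targets]

plain_ap = weighted_average_precision(dog_scores, dog_labels, [1.0] * 4)
weighted_ap = weighted_average_precision(dog_scores, dog_labels, dog_weights)
```

In this toy setting, predicting "Dog" on a "Cat" clip (graph distance 2) counts as half a false positive, while predicting it on a "Guitar" clip (distance 4) counts as a full one, so the weighted score (about 0.786) is higher than the plain average precision (0.75). Averaging such per-class scores over all classes would give an mAP-style summary.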