Although music is typically multi-label, many studies have addressed hierarchical music tagging under simplified settings such as single-label data. Moreover, a framework for describing the various joint training methods in the multi-label setting is lacking. To address these issues, we introduce the hierarchical multi-label music instrument classification task. The task provides a realistic setting in which real music containing multiple instruments is assumed. We summarize and explore various hierarchical methods that jointly train a DNN, in the context of fusing deep learning with conventional techniques. For effective joint training in the multi-label setting, we propose two methods to model the connection between fine- and coarse-level tags: one uses rule-based grouped max-pooling, and the other uses an attention mechanism learned in a data-driven manner. Our evaluation shows that the proposed methods have advantages over the method without joint training. In addition, the decision procedure within the proposed methods can be interpreted by visualizing attention maps or by referring to the fixed rules.
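
The two ways of connecting fine- and coarse-level tags can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the instrument taxonomy, the function names `grouped_max_pool` and `attention_pool`, and the use of a fixed score matrix in place of learned attention parameters are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical taxonomy (assumption): fine-level instrument tags and the
# indices of the fine tags that belong to each coarse-level family.
FINE_TAGS = ["violin", "cello", "trumpet", "trombone", "snare", "kick"]
COARSE_GROUPS = {
    "strings": [0, 1],
    "brass": [2, 3],
    "percussion": [4, 5],
}


def grouped_max_pool(fine_probs, groups):
    """Rule-based link: a coarse tag is as active as its most active fine tag.

    fine_probs has shape (..., n_fine); the result has shape (..., n_coarse).
    """
    fine_probs = np.asarray(fine_probs, dtype=float)
    pooled = [fine_probs[..., idx].max(axis=-1) for idx in groups.values()]
    return np.stack(pooled, axis=-1)


def attention_pool(fine_probs, scores):
    """Data-driven link: each coarse tag is a softmax-weighted sum of fine tags.

    scores has shape (n_coarse, n_fine). In a trained model these would be
    learned; here they are simply passed in (assumption).
    Returns the coarse activations and the attention map.
    """
    fine_probs = np.asarray(fine_probs, dtype=float)
    scores = np.asarray(scores, dtype=float)
    # Numerically stable softmax over the fine-tag axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return fine_probs @ weights.T, weights


fine = np.array([0.9, 0.2, 0.1, 0.05, 0.7, 0.3])
coarse_rule = grouped_max_pool(fine, COARSE_GROUPS)
coarse_attn, attn_map = attention_pool(fine, np.zeros((3, 6)))
```

In this sketch the fixed groups in `COARSE_GROUPS` play the role of the rules that can be referred to directly, while `attn_map` is the kind of matrix one would visualize to inspect which fine-level tags drive each coarse-level tag.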