Instrument playing technique (IPT) is a key element of musical presentation. Most existing work on IPT detection concerns only monophonic music signals, and little has been done to detect IPTs in polyphonic instrumental solo pieces with overlapping or mixed IPTs. In this paper, we formulate IPT detection as a frame-level multi-label classification problem and apply it to the Guzheng, a Chinese plucked string instrument. We create a new dataset, Guzheng_Tech99, containing Guzheng recordings together with onset, offset, pitch, and IPT annotations for each note. Because IPTs vary widely in length, we propose a new method that combines a multi-scale network with self-attention. The multi-scale network extracts features at different scales, and a self-attention mechanism applied to the feature maps at the coarsest scale further enhances long-range feature extraction. Our approach outperforms existing works by a large margin, indicating its effectiveness in IPT detection.
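To make the frame-level multi-label formulation concrete, the following minimal sketch turns note-level annotations into a per-frame multi-hot target in which overlapping IPTs activate several labels at once. The annotation tuple layout, the function name `notes_to_frame_labels`, the frame rate, and the two IPT class names are assumptions chosen for illustration; they are not taken from Guzheng_Tech99 or from the paper's implementation.

```python
from typing import List, Tuple

# Hypothetical annotation format: (onset in seconds, offset in seconds, IPT name).
Note = Tuple[float, float, str]


def notes_to_frame_labels(
    notes: List[Note],
    ipt_classes: List[str],
    n_frames: int,
    frame_rate: float,
) -> List[List[int]]:
    """Build an n_frames x n_classes multi-hot matrix from note annotations."""
    class_index = {name: i for i, name in enumerate(ipt_classes)}
    labels = [[0] * len(ipt_classes) for _ in range(n_frames)]
    for onset, offset, ipt in notes:
        # Convert times to frame indices and clip them to the valid range.
        start = max(0, int(round(onset * frame_rate)))
        end = min(n_frames, int(round(offset * frame_rate)))
        for t in range(start, end):
            # Several classes may be active in the same frame (multi-label).
            labels[t][class_index[ipt]] = 1
    return labels


# Illustrative class names and notes (assumed, not from the dataset).
classes = ["vibrato", "tremolo"]
notes = [(0.0, 0.5, "vibrato"), (0.3, 0.8, "tremolo")]
labels = notes_to_frame_labels(notes, classes, n_frames=10, frame_rate=10.0)
```

In this toy example the two notes overlap between 0.3 s and 0.5 s, so the frames in that interval carry both labels. A frame-level classifier trained on such targets would typically use an independent sigmoid output per class rather than a single softmax.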