Phonation mode is an essential characteristic of singing style and an important expressive device in performance. It is commonly classified into four categories: neutral, breathy, pressed, and flow. Previous studies relied on voice quality features and hand-crafted feature engineering for classification. While deep learning has made significant progress in other areas of music information retrieval (MIR), few attempts have applied it to phonation mode classification. In this study, a Residual Attention based network is proposed for automatic classification of phonation modes. The network consists of a convolutional branch that performs feature processing and a soft mask branch that enables the network to focus on specific regions. In comparison experiments, models built on the proposed network outperform previous work on three of the four datasets; the highest classification accuracy is 94.58%, 2.29 percentage points higher than the baseline.
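The combination of a trunk (feature-processing) branch and a soft mask branch described above follows the general residual attention pattern, in which the mask modulates the trunk output while a residual term preserves the original features. A minimal NumPy sketch of that combination rule is shown below; `trunk_fn` and `mask_fn` are hypothetical stand-ins for the paper's convolutional branches, and the specific `(1 + mask) * trunk` form is an assumption based on standard residual attention formulations, not a confirmed detail of this work.

```python
import numpy as np

def residual_attention(features, trunk_fn, mask_fn):
    """Sketch of a residual attention block: the soft mask (values in
    (0, 1) after a sigmoid) reweights the trunk output, and the
    (1 + mask) form keeps the trunk features so the attention cannot
    suppress them entirely."""
    trunk = trunk_fn(features)                        # feature-processing branch
    mask = 1.0 / (1.0 + np.exp(-mask_fn(features)))   # sigmoid soft mask
    return (1.0 + mask) * trunk                       # residual attention output

# Toy usage: identity trunk and an all-zero mask logit give mask = 0.5
# everywhere, so the block scales the input by 1.5.
x = np.array([[1.0, 2.0], [3.0, 4.0]])
out = residual_attention(x, lambda f: f, lambda f: np.zeros_like(f))
```

In a real network both branches would be stacks of convolutional layers; the sketch only isolates how their outputs are combined.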