Generating synthesised singing voice with models trained on speech data has many advantages owing to the models' flexibility and controllability. However, because speech training data lack information about the temporal relationship between segments and beats, the synthesised singing may sound off-beat at times. Information on the temporal relationship between speech segments and musical beats is therefore crucial. The current study investigated segment-beat synchronisation in singing data, with hypotheses formed on the basis of the linguistic theories of the P-centre and the sonority hierarchy. A Mandarin corpus and an English corpus of professional singing data were manually annotated and analysed. The results showed that the presence of musical beats depended more on segment duration than on sonority. However, the sonority hierarchy and the P-centre theory were strongly related to the location of beats. Mandarin and English exhibited cross-linguistic variation despite sharing common patterns.
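To make the kind of analysis described here concrete, below is a minimal sketch, not the authors' actual pipeline, of how segment-beat synchronisation might be quantified from manual annotations: for each annotated segment it finds the beats that fall inside the segment's interval and computes the beat's position relative to the segment onset, the sort of measure a P-centre analysis would examine. All data values, field names, and the helper function are hypothetical.

```python
from bisect import bisect_left

# Hypothetical annotations: each segment has a label, onset, and offset in
# seconds; `beats` is a sorted list of beat times. Values are illustrative only.
segments = [
    {"label": "m", "onset": 0.00, "offset": 0.08},  # onset consonant
    {"label": "a", "onset": 0.08, "offset": 0.45},  # vowel nucleus
    {"label": "n", "onset": 0.45, "offset": 0.55},  # coda consonant
]
beats = [0.10, 0.50]

def beats_in_segment(seg, beats):
    """Return the beat times that fall inside a segment's interval."""
    lo = bisect_left(beats, seg["onset"])
    hi = bisect_left(beats, seg["offset"])
    return beats[lo:hi]

for seg in segments:
    dur = seg["offset"] - seg["onset"]
    for b in beats_in_segment(seg, beats):
        # Relative beat location within the segment: 0 = segment onset,
        # 1 = segment offset.
        rel = (b - seg["onset"]) / dur
        print(f"{seg['label']}: beat at {b:.2f}s, "
              f"relative position {rel:.2f}, duration {dur:.2f}s")
```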