We present a database of parallel recordings of speech and singing, collected and released by the Human Language Technology (HLT) laboratory at the National University of Singapore (NUS), called the NUS-HLT Speak-Sing (NHSS) database. We release this database to the public to support research activities that include, but are not limited to, comparative studies of acoustic attributes of speech and singing signals, cooperative synthesis of speech and singing voices, and speech-to-singing conversion. The database consists of recordings of sung vocals of English pop songs, the spoken counterparts of the song lyrics read by the singers in their natural reading manner, and manually prepared utterance-level and word-level annotations. The audio recordings in the NHSS database correspond to 100 songs sung and spoken by 10 singers, resulting in a total of 7 hours of audio data. There are 5 male and 5 female singers, each singing and reading the lyrics of 10 songs. In this paper, we discuss the design methodology of the database, analyze the similarities and dissimilarities in the characteristics of speech and singing voices, and provide some strategies to address the relationships between these characteristics for converting one to the other. We develop benchmark systems for speech-to-singing alignment, spectral mapping, and conversion using the NHSS database.