Shared challenges provide a venue for comparing systems trained on common data under a standardized evaluation, and they become an invaluable resource for researchers when the data and evaluation results are publicly released. The Blizzard Challenge and the Voice Conversion Challenge are two such challenges, for text-to-speech synthesis and for speaker conversion, respectively, and their publicly available system samples and listening test results constitute a historical record of state-of-the-art synthesis methods over the years. In this paper, we revisit these past challenges and conduct a large-scale listening test that combines samples from many of them. Our aims are to analyze and compare listener opinions of a large number of systems together, to determine whether and how opinions change over time, and to collect a large-scale dataset of diverse synthetic samples and their ratings for further research. Within each challenge, we found strong system-level correlations between the original results and our new listening test. We also observed that the choice of speaker is important for synthesis quality.