To support machine learning of cross-language prosodic mappings and other approaches to improving speech-to-speech translation, we present a protocol for collecting closely matched pairs of utterances across languages, a description of the resulting data collection, and some observations and musings. This report is intended for 1) people using the corpus, 2) people extending the corpus, and 3) people designing similar collections of bilingual dialog data.