Entrainment is the phenomenon by which an interlocutor adapts their speaking style to align with that of their conversational partner. It has been observed along several dimensions, including acoustic, prosodic, lexical, and syntactic. In this work, we explore and exploit entrainment to improve spoken dialogue systems for voice assistants. We first examine whether entrainment exists in human-to-human dialogues with respect to acoustic features and then extend the analysis to emotion features. The results show strong evidence of entrainment in both acoustic and emotion features. Based on these findings, we implement two entrainment policies and assess whether integrating the entrainment principle into a Text-to-Speech (TTS) system improves synthesis performance and user experience. We find that integrating the entrainment principle into a TTS system improves performance when acoustic features are considered, while no obvious improvement is observed when emotion features are considered.