This study investigated how a conversational agent's synthetic voice, trained on spontaneous speech, affects human interactants. Specifically, we hypothesized that humans exhibit more social responses when interacting with a conversational agent whose synthetic voice is built on spontaneous speech. Speech synthesizers are typically built on a speech corpus in which voice professionals read a set of written sentences. The resulting synthesized speech is clear, as if a newscaster were reading the news or a voice actor were playing an anime character. However, it differs considerably from the spontaneous speech of everyday conversation. Recent advances in speech synthesis have made it possible to build a speech synthesizer on a spontaneous speech corpus and to obtain near-conversational synthesized speech of reasonable quality. Using this technology, we examined whether humans produce more social responses to a spontaneously speaking conversational agent. We conducted a large-scale conversation experiment with a conversational agent whose utterances were synthesized with a model trained on either spontaneous speech or read speech. The results showed that subjects who interacted with the agent whose utterances were synthesized from spontaneous speech tended to show shorter response times and a larger number of backchannels. The questionnaire results showed that these subjects also tended to rate their conversation with the agent as closer to a human conversation. These results suggest that speech synthesis built on spontaneous speech is essential to realizing a conversational agent as a social actor.