When people try to influence others to do something, they subconsciously adjust their speech to include appropriate emotional information. For a robot to influence people in the same way, it should be able to imitate the range of human emotions when speaking. To achieve this, we propose a speech synthesis method that imitates the emotional states in human speech. Compared with previous methods, ours requires less manual effort to adjust the emotion of the synthesized speech. Our synthesizer receives an emotion vector that characterizes the emotion of the synthesized speech; this vector is obtained automatically from human utterances by a speech emotion recognizer. We evaluated our method in a scenario where a robot tries to regulate an elderly person's circadian rhythm by speaking to the person with appropriate emotional states. As the target speech to imitate, we collected utterances from professional caregivers speaking to elderly people at different times of the day. We then conducted a subjective evaluation in which elderly participants listened to speech samples generated by our method. The results showed that listening to the samples made the participants feel more active in the early morning and calmer in the middle of the night. This suggests that the robot may be able to adjust the participants' circadian rhythm and can potentially exert influence similarly to a person.