In this paper, we present Duplex Conversation, a multi-turn, multimodal spoken dialogue system that enables telephone-based agents to interact with customers in a human-like way. We borrow the concept of full-duplex from telecommunication to characterize what a human-like interactive experience should be, and we show how to achieve smooth turn-taking through three subtasks: user state detection, backchannel selection, and barge-in detection. In addition, we propose semi-supervised learning with multimodal data augmentation, which leverages unlabeled data to improve model generalization. Experimental results on the three subtasks show that the proposed method achieves consistent improvements over the baselines. We deploy Duplex Conversation in Alibaba's intelligent customer service and share lessons learned in production. Online A/B experiments show that the proposed system reduces response latency by 50%.