Agent assistance during human-human customer support calls requires triggering workflows based on the caller's intent (the reason for the call). Timely prediction is essential for a good user experience: the goal is for a system to detect the caller's intent at the point where the agent would have been able to detect it (the Intent Boundary, IB). Some approaches predict the output offline, i.e., once the full spoken input (e.g., the whole conversational turn) has been processed by the ASR system. This introduces undesirable latency whenever the intent could have been detected earlier in the turn. Recent work on voice assistants has used incremental, real-time, word-by-word predictions to detect intent before the end of a command. Human-directed and machine-directed speech, however, have very different characteristics. In this work, we propose applying a method developed for voice assistants to the problem of online, real-time caller intent detection in human-human spoken interactions. We use a dual architecture in which two LSTMs are jointly trained: one predicts the Intent Boundary and the other predicts the intent class at the IB. We conduct our experiments on a private dataset comprising transcripts of human-human telephone conversations from the telecom customer support domain. We report results analyzing both the accuracy of our system and the impact of different architectures on the trade-off between overall accuracy and prediction latency.
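As an illustration of the online setting described above, the following minimal sketch shows a word-by-word decision loop: one component scores whether the Intent Boundary has been reached, and a second component assigns the intent class at that point. It is not the authors' implementation. The names `detect_intent_online`, `toy_boundary_score`, and `toy_classify_intent`, the 0.5 threshold, and the keyword rules are all assumptions made for illustration; the toy scorers stand in for the two jointly trained LSTMs.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical stand-ins for the two jointly trained LSTMs. Each receives the
# words seen so far; a real recurrent model would carry hidden state forward
# instead of re-reading the prefix at every step.
BoundaryScorer = Callable[[Sequence[str]], float]
IntentClassifier = Callable[[Sequence[str]], str]


def detect_intent_online(
    words: Sequence[str],
    boundary_score: BoundaryScorer,
    classify_intent: IntentClassifier,
    threshold: float = 0.5,  # assumed decision threshold, not from the paper
) -> Tuple[Optional[str], int]:
    """Consume words one at a time; stop at the first predicted Intent Boundary.

    Returns (intent, words_consumed). If no boundary is predicted, the whole
    turn is classified, which matches the offline behaviour.
    """
    prefix: List[str] = []
    for word in words:
        prefix.append(word)
        if boundary_score(prefix) >= threshold:
            return classify_intent(prefix), len(prefix)
    if not prefix:
        return None, 0
    return classify_intent(prefix), len(prefix)


# Toy keyword-based scorers, purely illustrative.
def toy_boundary_score(prefix: Sequence[str]) -> float:
    return 1.0 if prefix[-1] in {"bill", "router"} else 0.0


def toy_classify_intent(prefix: Sequence[str]) -> str:
    if "bill" in prefix:
        return "billing"
    if "router" in prefix:
        return "technical_support"
    return "other"


turn = "hi i have a question about my bill from last month".split()
intent, consumed = detect_intent_online(turn, toy_boundary_score, toy_classify_intent)
words_saved = len(turn) - consumed  # words of latency avoided vs. offline
print(intent, consumed, words_saved)
```

In this toy turn the loop stops at the word "bill", so the prediction is available before the last three words are spoken. An offline system would wait for the full turn. Lowering the threshold makes predictions earlier but riskier, which is the accuracy/latency trade-off the abstract refers to.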