Our task is to build a model that, given a speaker ID, the chat history, and a query utterance, predicts the response utterance in a conversation. The model is personalized for each speaker. Such a model can be a useful component for building speech bots that talk in a human-like manner in live conversation. Further, we successfully use clustering of dense-vector encodings to retrieve relevant historical dialogue context, a useful strategy for overcoming the input limitations of neural models when predictions require longer-term references from the dialogue history. In this paper, we implement a state-of-the-art model for the Switchboard corpus using pre-training and fine-tuning techniques, built on the transformer architecture with multi-headed attention blocks. We also show how efficient vector clustering algorithms can be used for real-time utterance prediction; because they require no training, they also work on offline and encrypted message histories.
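To make the retrieval idea concrete, the following is a minimal sketch of training-free nearest-neighbour retrieval over utterance vectors using cosine similarity. It is an illustration under stated assumptions, not the implementation described in this paper: the `HistoryIndex` class, the `tokenize` helper, and the example utterances are hypothetical, the normalised bag-of-words encoder is a toy stand-in for a real dense encoder, and the clustering step is omitted for brevity.

```python
import math
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase, split on whitespace, and strip basic punctuation."""
    tokens = (t.strip(".,!?").lower() for t in text.split())
    return [t for t in tokens if t]


class HistoryIndex:
    """Hypothetical in-memory index over a speaker's past utterances."""

    def __init__(self, history: List[str]) -> None:
        self.history = history
        # Assumption: the vocabulary is built from the history only.
        # Nothing is trained; this is just a token-to-dimension lookup.
        vocab = sorted({tok for utt in history for tok in tokenize(utt)})
        self.vocab: Dict[str, int] = {tok: i for i, tok in enumerate(vocab)}
        self.vectors = [self.encode(utt) for utt in history]

    def encode(self, text: str) -> List[float]:
        """Toy encoder: L2-normalised token counts (stand-in for a dense encoder)."""
        vec = [0.0] * len(self.vocab)
        for tok in tokenize(text):
            idx = self.vocab.get(tok)
            if idx is not None:  # tokens unseen in the history are ignored
                vec[idx] += 1.0
        norm = math.sqrt(sum(v * v for v in vec))
        return [v / norm for v in vec] if norm > 0 else vec

    def retrieve(self, query: str, k: int = 2) -> List[Tuple[float, str]]:
        """Return the k history utterances most similar to the query."""
        q = self.encode(query)
        # Vectors are unit length, so the dot product is the cosine similarity.
        scored = [
            (sum(a * b for a, b in zip(q, v)), utt)
            for v, utt in zip(self.vectors, self.history)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


history = [
    "I moved to Austin last spring for work.",
    "My dog hates thunderstorms.",
    "We usually go hiking on weekends.",
    "The new job in Austin is going well.",
]
index = HistoryIndex(history)
top = index.retrieve("How is Austin treating you?", k=2)
for score, utterance in top:
    print(f"{score:.2f}  {utterance}")
```

In a setup like this, the retrieved utterances would be prepended to the model input in place of the full history. A clustering step, as mentioned above, could group the stored vectors so that only the nearest cluster is searched at query time.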