Recent progress in deep learning has continuously improved the accuracy of dialogue response selection. In particular, sophisticated neural network architectures are leveraged to capture the rich interactions between dialogue context and response candidates. While remarkably effective, these models also bring a steep increase in computational cost. Consequently, such models can only be used as a re-rank module in practice. In this study, we present a solution that directly selects proper responses from a large corpus, or even a nonparallel corpus consisting only of unpaired sentences, using a dense retrieval model. To push the limits of dense retrieval, we design an interaction layer on top of the dense retrieval model and apply a set of tailored learning strategies. Our model outperforms strong baselines in the conventional re-rank evaluation setting, which is remarkable given its efficiency. To verify the effectiveness of our approach in realistic scenarios, we also conduct a full-rank evaluation, where the goal is to select proper responses from a full candidate pool that may contain millions of candidates, and we evaluate the results fairly through human annotation. Our proposed model notably outperforms pipeline baselines that integrate a fast recall module with an expressive re-rank module. Human evaluation results further show that enlarging the candidate pool with nonparallel corpora improves response quality.
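At a high level, dense retrieval scores every candidate response by the similarity between a context embedding and a response embedding, which is what makes search over millions of candidates tractable. The sketch below is purely illustrative: the toy vectors stand in for the outputs of learned neural encoders, and the function names are not from the paper.

```python
# Minimal sketch of dense-retrieval response selection.
# Illustrative only: in practice the vectors come from trained neural
# encoders, and top-k search uses an approximate-nearest-neighbor index.

def dot(u, v):
    """Inner-product similarity between two embedding vectors."""
    return sum(a * b for a, b in zip(u, v))

def select_top_k(context_vec, candidate_vecs, k=2):
    """Return indices of the k candidates most similar to the context."""
    ranked = sorted(range(len(candidate_vecs)),
                    key=lambda i: dot(context_vec, candidate_vecs[i]),
                    reverse=True)
    return ranked[:k]

# Toy embeddings standing in for encoder outputs.
context = [0.9, 0.1, 0.0]
candidates = [
    [0.8, 0.2, 0.1],   # on-topic response
    [0.0, 0.1, 0.9],   # off-topic response
    [0.7, 0.3, 0.0],   # another on-topic response
]
print(select_top_k(context, candidates))  # → [0, 2]
```

Because candidates are scored independently of the context encoder's forward pass, their embeddings can be precomputed and indexed offline; the expensive context-response interaction modeling that the abstract contrasts with is what an added interaction layer reintroduces in a lightweight form.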