We focus on multi-turn response selection in a retrieval-based dialog system. In this paper, we utilize the powerful pre-trained language model Bidirectional Encoder Representations from Transformers (BERT) for a multi-turn dialog system and propose a highly effective post-training method on a domain-specific corpus. Although BERT is easily adopted for various NLP tasks and outperforms previous baselines on each task, it still has limitations when a task corpus is heavily focused on a specific domain. Post-training on a domain-specific corpus (e.g., Ubuntu Corpus) helps the model learn contextualized representations and words that do not appear in a general corpus (e.g., English Wikipedia). Experimental results show that our approach achieves a new state of the art on two response selection benchmarks (i.e., Ubuntu Corpus V1, Advising Corpus), improving R@1 by 5.9% and 6%, respectively.
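To make the post-training step concrete, below is a minimal sketch of continuing BERT's pre-training on a domain corpus before fine-tuning on response selection. It assumes the Hugging Face `transformers` library (not necessarily the authors' implementation) and, for brevity, uses only the masked-language-model objective; the file path `ubuntu_corpus.txt` is a hypothetical placeholder for the domain corpus.

```python
# Sketch: domain-adaptive post-training of BERT with the MLM objective,
# assuming the Hugging Face `transformers` library.
from transformers import (
    BertTokenizerFast,
    BertForMaskedLM,
    DataCollatorForLanguageModeling,
    LineByLineTextDataset,
    Trainer,
    TrainingArguments,
)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# One utterance (or concatenated dialog context) per line in the domain corpus.
dataset = LineByLineTextDataset(
    tokenizer=tokenizer,
    file_path="ubuntu_corpus.txt",  # hypothetical path to the domain corpus
    block_size=128,
)

# Randomly mask 15% of tokens: the standard BERT MLM objective.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-post-trained", num_train_epochs=1),
    data_collator=collator,
    train_dataset=dataset,
)
trainer.train()  # post-training; the saved checkpoint is then fine-tuned
```

The key design point is that post-training starts from the general-domain BERT checkpoint rather than from scratch, so the model retains general language knowledge while adapting its representations to domain-specific vocabulary (e.g., Ubuntu shell commands) before the response selection fine-tuning stage.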