We propose a method that leverages unlabeled data to learn a matching model for response selection in retrieval-based chatbots. The method uses a sequence-to-sequence (Seq2Seq) model as a weak annotator to judge the matching degree of unlabeled pairs, and then trains the matching model on the unlabeled data together with these weak signals. Experimental results on two public data sets show that matching models improve significantly when trained with the proposed method.
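The weak-annotation idea can be illustrated with a minimal sketch. This is one plausible reading of the abstract, not the paper's actual objective or architecture. Everything named here is hypothetical: `weak_annotator_score` stands in for a trained Seq2Seq model (token overlap is used only so the sketch runs without one), the three-feature linear scorer stands in for a neural matching model, and the softmax normalization and cross-entropy loss are assumptions.

```python
import math


def _softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def _overlap(context, response):
    """Return the shared-token count and both token-set sizes."""
    c = set(context.lower().split())
    r = set(response.lower().split())
    return len(c & r), len(c), len(r)


def weak_annotator_score(context, response):
    """Hypothetical stand-in for a Seq2Seq score such as log p(response | context).

    ASSUMPTION: token overlap replaces the trained generator in this sketch.
    """
    shared, _, n_resp = _overlap(context, response)
    return shared / max(n_resp, 1)


def weak_labels(context, candidates):
    """Turn annotator scores for one context into a soft target distribution."""
    return _softmax([weak_annotator_score(context, r) for r in candidates])


def features(context, response):
    """Toy features for the matching model: bias plus two overlap ratios."""
    shared, n_ctx, n_resp = _overlap(context, response)
    return [1.0, shared / max(n_ctx, 1), shared / max(n_resp, 1)]


def match_probs(weights, context, candidates):
    """Matching model: linear scores, normalized over the candidate set."""
    logits = [
        sum(w * x for w, x in zip(weights, features(context, r)))
        for r in candidates
    ]
    return _softmax(logits)


def train(data, epochs=200, lr=0.5):
    """Fit the matching model to the weak labels with cross-entropy.

    `data` is a list of (context, candidates) pairs with no human labels.
    """
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for context, candidates in data:
            target = weak_labels(context, candidates)
            pred = match_probs(weights, context, candidates)
            feats = [features(context, r) for r in candidates]
            # Gradient of softmax cross-entropy: sum_i (pred_i - target_i) * x_i
            for j in range(len(weights)):
                grad = sum((p - t) * f[j] for p, t, f in zip(pred, target, feats))
                weights[j] -= lr * grad
    return weights
```

Calling `train` on a list of unlabeled `(context, candidates)` pairs returns the learned weights, and `match_probs` then ranks candidate responses for a context. In a real system the annotator and the matching model would use different signals; here both rely on token overlap purely to keep the example self-contained.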