Response selection plays a vital role in building retrieval-based conversation systems. Although response selection is naturally a learning-to-rank problem, most prior work takes a point-wise view and trains binary classifiers for the task: each response candidate is labeled either relevant (one) or irrelevant (zero). On the one hand, this formulation can be sub-optimal because it ignores the diversity of response quality. On the other hand, annotating grayscale data for learning-to-rank can be prohibitively expensive and challenging. In this work, we show that grayscale data can be constructed automatically, without human effort. Our method employs off-the-shelf response retrieval models and response generation models as automatic grayscale data generators. With the constructed grayscale data, we propose multi-level ranking objectives for training, which can (1) teach a matching model to capture more fine-grained differences in context-response relevance and (2) reduce the train-test discrepancy in terms of distractor strength. Our method is simple, effective, and universal. Experiments on three benchmark datasets and four state-of-the-art matching models show that the proposed approach brings significant and consistent performance improvements.
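To make the idea of a multi-level ranking objective concrete, the following is a minimal sketch, not the formulation used in the paper. The abstract does not specify the loss, so the sketch assumes a pairwise hinge loss with a single shared margin over three relevance levels: the ground-truth response, grayscale responses (retrieved or generated), and random negatives. The function name `multi_level_hinge_loss`, the `margin` parameter, and the example scores are all hypothetical.

```python
from itertools import combinations


def multi_level_hinge_loss(scores_by_level, margin=0.1):
    """Average pairwise hinge loss across relevance levels.

    scores_by_level: list of lists of matching scores, ordered from the
    most relevant level to the least relevant one (an assumption of this
    sketch), e.g. [ground_truth, grayscale, random].
    """
    total, pairs = 0.0, 0
    # Every pair of levels (hi, lo) with hi ranked above lo.
    for hi, lo in combinations(range(len(scores_by_level)), 2):
        for s_hi in scores_by_level[hi]:
            for s_lo in scores_by_level[lo]:
                # Penalize when the higher-level score does not beat
                # the lower-level score by at least `margin`.
                total += max(0.0, margin - (s_hi - s_lo))
                pairs += 1
    return total / pairs if pairs else 0.0


# Hypothetical scores: ground truth > grayscale > random.
well_ordered = [[0.9], [0.6, 0.5], [0.1]]
mis_ordered = [[0.2], [0.5], [0.9]]
loss_good = multi_level_hinge_loss(well_ordered)
loss_bad = multi_level_hinge_loss(mis_ordered)
```

Under these assumptions, a binary classifier would see only the first and last levels, whereas the loss above also asks the model to place grayscale candidates between them, which is the kind of finer-grained relevance distinction the abstract describes.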