In this work, we propose a novel and easy-to-apply data augmentation strategy, Bilateral Generation (BiG), paired with a contrastive training objective, for improving the ranking of question-answer pairs using existing labeled data. Specifically, we synthesize pseudo-positive QA pairs, in contrast to the original negative QA pairs, with two pre-trained generation models — one for question generation and one for answer generation — each fine-tuned on the limited positive QA pairs in the original dataset. With the augmented dataset, we design a contrastive training objective for learning to rank question-answer pairs. Experimental results on three benchmark datasets, TREC-QA, WikiQA, and ANTIQUE, show that our method significantly improves the performance of ranking models by making full use of existing labeled data, and that it can be easily applied to different ranking models.
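The abstract does not give the exact form of the contrastive objective, so the following is only a minimal sketch of one common choice, an InfoNCE-style loss, assuming each positive answer (original or pseudo-positive from BiG) is contrasted against the negative answers retrieved for the same question, with relevance scores coming from an arbitrary ranking model:

```python
import math

def contrastive_ranking_loss(pos_scores, neg_scores):
    """InfoNCE-style contrastive loss (illustrative, not the paper's exact form).

    pos_scores: relevance scores for positive QA pairs, including
                pseudo-positives synthesized by the generation models.
    neg_scores: relevance scores for the negative QA pairs of the
                same question.
    Each positive is contrasted against all negatives; minimizing the
    loss pushes positive scores above negative ones.
    """
    loss = 0.0
    for s_pos in pos_scores:
        denom = math.exp(s_pos) + sum(math.exp(s) for s in neg_scores)
        loss += -math.log(math.exp(s_pos) / denom)
    return loss / len(pos_scores)
```

As a sanity check, a model that scores positives well above negatives incurs a lower loss than one that scores them nearly equally, which is the behavior the contrastive objective is meant to induce.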