Question generation has recently shown impressive results in customizing question answering (QA) systems to new domains. These approaches circumvent the need for manually annotated training data from the new domain and, instead, generate synthetic question-answer pairs that are used for training. However, existing methods for question generation rely on large amounts of synthetically generated data and costly computational resources, which renders these techniques widely inaccessible when the text corpus is of limited size. This is problematic, as many niche domains rely on small text corpora, which naturally restricts the amount of synthetic data that can be generated. In this paper, we propose a novel framework for domain adaptation called contrastive domain adaptation for QA (CAQA). Specifically, CAQA combines techniques from question generation and domain-invariant learning to answer out-of-domain questions in settings with limited text corpora. Here, we train a QA system on both source data and generated data from the target domain, with a contrastive adaptation loss incorporated in the training objective. With this combination, our model achieves considerable improvements over state-of-the-art baselines.
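The abstract does not spell out the form of the contrastive adaptation loss, so the following is only a minimal sketch of how such a term might be added to a QA training objective. It assumes a discrepancy measured with a Gaussian-kernel maximum mean discrepancy (MMD), class labels attached to feature vectors, and a weighting factor; the function names (`mmd`, `contrastive_adaptation_term`, `total_loss`), the kernel choice, and the `weight` parameter are all hypothetical and are not taken from the paper.

```python
import numpy as np

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel matrix between rows of x (n, d) and rows of y (m, d)."""
    sq_dists = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def mmd(x, y, bandwidth=1.0):
    """Biased estimate of the squared MMD between two feature sets."""
    return (gaussian_kernel(x, x, bandwidth).mean()
            + gaussian_kernel(y, y, bandwidth).mean()
            - 2.0 * gaussian_kernel(x, y, bandwidth).mean())

def contrastive_adaptation_term(src_feats, src_labels, tgt_feats, tgt_labels):
    """Assumed contrastive term: same-class discrepancy across domains
    minus different-class discrepancy. Minimizing it pulls matching classes
    together across source and target and pushes differing classes apart."""
    classes = np.unique(np.concatenate([src_labels, tgt_labels]))
    intra, inter = [], []
    for a in classes:
        for b in classes:
            xs = src_feats[src_labels == a]
            xt = tgt_feats[tgt_labels == b]
            if len(xs) == 0 or len(xt) == 0:
                continue  # class missing in one domain: skip the pair
            (intra if a == b else inter).append(mmd(xs, xt))
    intra_mean = np.mean(intra) if intra else 0.0
    inter_mean = np.mean(inter) if inter else 0.0
    return float(intra_mean - inter_mean)

def total_loss(qa_loss, adaptation_term, weight=0.1):
    """Hypothetical combined objective: QA loss plus weighted adaptation term."""
    return qa_loss + weight * adaptation_term
```

In a real training loop the features would come from the QA model's encoder for source examples and for generated target examples, and the term would be computed with a differentiable framework rather than NumPy; the sketch only illustrates the arithmetic of combining the two losses.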