Question answering (QA) models have recently shown impressive results on questions from specialized domains. Yet, a common challenge is to adapt QA models to an unseen target domain. In this paper, we propose a novel self-supervised framework called QADA for QA domain adaptation. QADA introduces a novel pipeline for augmenting training QA samples. Different from existing methods, we enrich the samples via hidden-space augmentation. For questions, we introduce multi-hop synonyms and sample augmented token embeddings from Dirichlet distributions. For contexts, we develop an augmentation method that learns to drop context spans via a custom attentive sampling strategy. Additionally, contrastive learning is integrated into the proposed self-supervised adaptation framework QADA. Unlike existing approaches, we generate pseudo labels and propose to train the model via a novel attention-based contrastive adaptation method. The attention weights are used to build informative features for discrepancy estimation, which helps the QA model separate answers and generalize across source and target domains. To the best of our knowledge, our work is the first to leverage hidden-space augmentation and attention-based contrastive adaptation for self-supervised domain adaptation in QA. Our evaluation shows that QADA achieves considerable improvements on multiple target datasets over state-of-the-art baselines in QA domain adaptation.
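The question augmentation described above can be illustrated with a minimal sketch: for each token, its embedding is mixed with the embeddings of its (multi-hop) synonyms using convex weights drawn from a Dirichlet distribution. The function name, toy embeddings, and the single concentration parameter `alpha` below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def dirichlet_augment(token_emb, synonym_embs, alpha=1.0):
    """Sketch of hidden-space question augmentation (assumed form):
    mix a token embedding with its synonym embeddings using convex
    weights sampled from a symmetric Dirichlet(alpha) distribution."""
    candidates = np.vstack([token_emb] + synonym_embs)   # (k+1, d)
    weights = rng.dirichlet(alpha * np.ones(len(candidates)))
    return weights @ candidates                          # convex mix, shape (d,)

# Toy 4-dimensional embeddings: one token plus two multi-hop synonyms.
token = np.array([1.0, 0.0, 0.0, 0.0])
synonyms = [np.array([0.0, 1.0, 0.0, 0.0]),
            np.array([0.0, 0.0, 1.0, 0.0])]
augmented = dirichlet_augment(token, synonyms)
```

Because the sampled weights sum to one, the augmented embedding stays inside the convex hull of the original token and its synonyms, so the perturbation remains semantically close to the source token.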