Most previous unsupervised domain adaptation (UDA) methods for question answering (QA) require access to source domain data while fine-tuning the model for the target domain. Source domain data, however, may contain sensitive information and access to it may be restricted. In this study, we investigate a more challenging setting, source-free UDA, in which we have only the pretrained source model and target domain data, without access to the source domain data. We propose a novel self-training approach for QA models that integrates a unique mask module for domain adaptation. The mask is auto-adjusted to extract key domain knowledge while the model is trained on the source domain. To retain previously learned domain knowledge, certain mask weights are frozen during adaptation, while the remaining weights are adjusted to mitigate domain shift using pseudo-labeled samples generated in the target domain. Our empirical results on four benchmark datasets suggest that our approach significantly improves the performance of pretrained QA models on the target domain, and even outperforms models that have access to the source data during adaptation.
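The adaptation loop described above can be illustrated with a minimal toy sketch. This is a hypothetical illustration, not the paper's implementation: the linear model, the perceptron-style update, and the 0.5 freezing threshold are all assumptions made for brevity. It shows the two key ideas: high-importance mask entries learned on the source domain stay frozen, and the remaining entries are updated from pseudo-labels generated on unlabeled target data.

```python
# Toy sketch (hypothetical) of source-free self-training with a
# partially frozen mask over model weights.
import numpy as np

rng = np.random.default_rng(0)

# Pretend these weights and the importance mask were learned on the source domain.
weights = rng.normal(size=8)
mask = rng.random(8)          # per-weight importance scores in [0, 1)
frozen = mask > 0.5           # high-importance entries are frozen during adaptation
mask0 = mask.copy()           # snapshot to verify frozen entries never change

def pseudo_label(x, w, m):
    """Toy model: sign of the masked linear score serves as the pseudo-label."""
    return np.sign(x @ (w * m))

# Unlabeled target-domain data (no source data is used anywhere below).
X = rng.normal(size=(32, 8))

for _ in range(10):
    y_pseudo = pseudo_label(X, weights, mask)
    # Perceptron-style update applied only to the unfrozen mask entries.
    grad = (X * y_pseudo[:, None]).mean(axis=0)
    mask[~frozen] += 0.1 * grad[~frozen] * weights[~frozen]
    mask = np.clip(mask, 0.0, 1.0)   # keep mask values in a valid range
```

After the loop, `mask[frozen]` is identical to its source-trained values, while the unfrozen entries have drifted to fit the target-domain pseudo-labels.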