Pretrained language models have shown success across many areas of natural language processing, including reading comprehension. However, when applying machine learning methods to new domains, labeled data may not always be available. To address this, we use supervised pretraining on source-domain data to reduce sample complexity on domain-specific downstream tasks. We evaluate zero-shot performance on domain-specific reading comprehension tasks by combining task transfer with domain adaptation to fine-tune a pretrained model with no labeled data from the target task. Our approach outperforms Domain-Adaptive Pretraining on downstream domain-specific reading comprehension tasks in 3 out of 4 domains.