Task-adaptive pre-training (TAPT) alleviates the lack of labelled data and provides a performance lift by adapting unlabelled data to the downstream task. Unfortunately, existing adaptations mainly involve deterministic rules that cannot generalize well. Here, we propose Clozer, a sequence-tagging based cloze answer extraction method used in TAPT that is extendable for adaptation to any cloze-style machine reading comprehension (MRC) downstream task. We experiment on multiple-choice cloze-style MRC tasks, and show that Clozer performs significantly better than the oracle and the state-of-the-art at escalating the effectiveness of TAPT in lifting model performance, and demonstrate that Clozer is able to recognize the gold answers independently of any heuristics.