Current NLP models are predominantly trained through a pretrain-then-finetune pipeline: models are first pretrained on a large text corpus with a masked language modeling (MLM) objective, then finetuned on the downstream task. Prior work has shown that inserting an intermediate pre-training phase, with heuristic MLM objectives that resemble the downstream task, can significantly improve final performance. However, it remains unclear (1) in what cases such intermediate pre-training is helpful, (2) whether hand-crafted heuristic objectives are optimal for a given task, and (3) whether an MLM policy designed for one task generalizes beyond that task. In this paper, we perform a large-scale empirical study of the effect of various MLM policies in intermediate pre-training. Crucially, we introduce methods to automate the discovery of optimal MLM policies by learning a masking model through either direct supervision or meta-learning on the downstream task. We investigate the effects of using heuristic, directly supervised, and meta-learned MLM policies for intermediate pre-training on eight tasks across three categories (closed-book QA, knowledge-intensive language tasks, and abstractive summarization). Most notably, we show that learned masking policies outperform the heuristic of masking named entities on TriviaQA, and that masking policies learned on one task can positively transfer to other tasks in certain cases.
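To make the notion of an MLM masking policy concrete, the following is a minimal sketch (not the paper's implementation) of how a policy could plug into intermediate pre-training: a scorer assigns each token a maskability score, and the highest-scoring fraction is replaced by [MASK] before computing the MLM loss. `TokenScorer`, the 15% mask rate, and the `-100` ignore-label convention are illustrative assumptions.

```python
# A hedged sketch of policy-driven masking for intermediate pre-training.
import torch
import torch.nn as nn

class TokenScorer(nn.Module):
    """Learned masking policy: assigns a maskability score to each token."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.proj = nn.Linear(hidden_size, 1)

    def forward(self, token_embeddings: torch.Tensor) -> torch.Tensor:
        # token_embeddings: (batch, seq_len, hidden) -> scores: (batch, seq_len)
        return self.proj(token_embeddings).squeeze(-1)

def apply_masking_policy(input_ids: torch.Tensor, scores: torch.Tensor,
                         mask_token_id: int, mask_rate: float = 0.15):
    """Mask the top `mask_rate` fraction of tokens by policy score."""
    _, seq_len = input_ids.shape
    k = max(1, int(mask_rate * seq_len))
    topk = scores.topk(k, dim=-1).indices                # (batch, k) positions to mask
    labels = torch.full_like(input_ids, -100)            # -100 is ignored by CE loss
    labels.scatter_(1, topk, input_ids.gather(1, topk))  # predict original tokens
    masked = input_ids.clone()
    masked.scatter_(1, topk, mask_token_id)              # replace with [MASK]
    return masked, labels
```

Under this interface, a heuristic policy (e.g., scoring named-entity tokens highest) and a learned policy (training `TokenScorer` via direct supervision or meta-learning on the downstream task) differ only in where the scores come from.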