Multiple pre-training objectives compensate for the limited understanding capability of single-objective language modeling, serving the ultimate purpose of pre-trained language models (PrLMs): generalizing well across a wide range of scenarios. However, learning multiple training objectives in a single model is challenging because their relative importance is unknown and they may conflict with one another. Empirical studies have shown that the current ad-hoc, manually set objective sampling makes the learned language representation barely converge to the desired optimum. Thus, we propose \textit{MOMETAS}, a novel adaptive sampler based on meta-learning, which learns the latent sampling pattern over arbitrary pre-training objectives. The design is lightweight and adds negligible training overhead. To validate our approach, we adopt five objectives and conduct continual pre-training with BERT-base and BERT-large models, where MOMETAS demonstrates a universal performance gain over other rule-based sampling strategies on 14 natural language processing tasks.