The major paradigm for applying a pre-trained language model to downstream tasks is to fine-tune it on labeled task data, which often suffers from instability and poor performance when labeled examples are scarce. One way to alleviate this problem is to post-train on unlabeled task data before fine-tuning, adapting the pre-trained model to the target domain with contrastive learning that considers either token-level or sequence-level similarity. Inspired by the success of sequence masking, we argue that both token-level and sequence-level similarities can be captured with a pair of masked sequences. We therefore propose complementary random masking (CRM), which generates a pair of masked sequences from an input sequence for sequence-level contrastive learning, and build on it contrastive masked language modeling (CMLM), a post-training objective that integrates token-level and sequence-level contrastive learning. Empirical results show that CMLM surpasses several recent post-training methods in few-shot settings without the need for data augmentation.
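A minimal sketch of what complementary random masking could look like, assuming "complementary" means the two views mask disjoint position sets sampled from the same input; the function name, mask ratio, and disjointness assumption are illustrative, not the paper's exact procedure.

```python
import random

def complementary_random_masking(tokens, mask_token="[MASK]", mask_ratio=0.15, seed=None):
    """Hypothetical CRM sketch: sample two disjoint position sets and mask
    each set in one of two views of the same input sequence."""
    rng = random.Random(seed)
    positions = list(range(len(tokens)))
    rng.shuffle(positions)
    n_mask = max(1, int(len(tokens) * mask_ratio))
    # Assumption: the two views never mask the same token, so together they
    # provide token-level (MLM-style) targets while forming a positive pair
    # for sequence-level contrastive learning.
    set_a = set(positions[:n_mask])
    set_b = set(positions[n_mask:2 * n_mask])
    view_a = [mask_token if i in set_a else t for i, t in enumerate(tokens)]
    view_b = [mask_token if i in set_b else t for i, t in enumerate(tokens)]
    return view_a, view_b

# Usage: the two masked views of one sentence act as a positive pair for
# sequence-level contrastive learning during post-training.
tokens = "the model adapts to the target domain".split()
view_a, view_b = complementary_random_masking(tokens, seed=0)
print(view_a)
print(view_b)
```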