Masked language modeling is widely used for pretraining large language models for natural language understanding (NLU). However, random masking is suboptimal, as it allocates an equal masking rate to all tokens. In this paper, we propose InforMask, a new unsupervised masking strategy for training masked language models. InforMask exploits Pointwise Mutual Information (PMI) to select the most informative tokens to mask. We further propose two optimizations for InforMask to improve its efficiency. With a one-off preprocessing step, InforMask outperforms random masking and previously proposed masking strategies on the factual recall benchmark LAMA and the question answering benchmarks SQuAD v1 and v2.
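To make the core idea concrete, the following is a minimal sketch of PMI-based informative-token masking. It is not the paper's exact algorithm: the whitespace tokenizer, the co-occurrence window size, the scoring heuristic (sum of PMI with in-window neighbours), and the 15% masking budget are all illustrative assumptions.

```python
import math
from collections import Counter

# Sketch only: scores each token by its summed PMI with nearby tokens and
# masks the highest-scoring ones, under the assumptions stated above.

WINDOW = 5          # co-occurrence window (assumed)
MASK_RATE = 0.15    # fraction of tokens to mask (assumed)

def build_counts(corpus):
    """Collect unigram and within-window pair counts from tokenized sentences."""
    unigram, pair = Counter(), Counter()
    for sent in corpus:
        unigram.update(sent)
        for i, w in enumerate(sent):
            for v in sent[i + 1 : i + 1 + WINDOW]:
                pair[frozenset((w, v))] += 1
    return unigram, pair

def pmi(w, v, unigram, pair, total):
    """Pointwise mutual information of two tokens under the corpus counts."""
    joint = pair.get(frozenset((w, v)), 0)
    if joint == 0:
        return 0.0
    return math.log((joint / total) / ((unigram[w] / total) * (unigram[v] / total)))

def informative_mask(sent, unigram, pair, total):
    """Mask the tokens whose summed PMI with in-window neighbours is highest."""
    scores = []
    for i, w in enumerate(sent):
        neighbours = sent[max(0, i - WINDOW):i] + sent[i + 1:i + 1 + WINDOW]
        scores.append(sum(pmi(w, v, unigram, pair, total) for v in neighbours))
    budget = max(1, int(len(sent) * MASK_RATE))
    to_mask = set(sorted(range(len(sent)), key=lambda i: -scores[i])[:budget])
    return [w if i not in to_mask else "[MASK]" for i, w in enumerate(sent)]

corpus = [s.split() for s in [
    "the capital of france is paris",
    "paris is known for the eiffel tower",
    "france borders spain and italy",
]]
unigram, pair = build_counts(corpus)
total = sum(unigram.values())
print(informative_mask(corpus[0], unigram, pair, total))
```

In this toy example, content-bearing tokens such as "paris" or "france" tend to receive higher PMI scores than frequent function words like "the" or "is", so they are masked preferentially, which is the intuition behind selecting informative tokens rather than masking uniformly at random.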