Pre-training a language model and then fine-tuning it for downstream tasks has demonstrated state-of-the-art results for various NLP tasks. Pre-training is usually independent of the downstream task, and previous works have shown that this pre-training alone might not be sufficient to capture the task-specific nuances. We propose a way to tailor a pre-trained BERT model to the downstream task via task-specific masking before the standard supervised fine-tuning. For this, a word list specific to the task is first collected. For example, if the task is sentiment classification, we collect a small sample of words representing both positive and negative sentiments. Next, a word's importance for the task, called the word's task score, is measured using the word list. Each word is then assigned a masking probability based on its task score, and we experiment with different masking functions that map a word's task score to its masking probability. The BERT model is further trained on the MLM objective, where masking is done using the above strategy. Following this, standard supervised fine-tuning is performed for different downstream tasks. Results on these tasks show that the selective masking strategy outperforms random masking, indicating its effectiveness.
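To make the pipeline concrete, the sketch below illustrates the selective masking step in Python under simplifying assumptions: the task word list, the task-score function (here a toy 0/1 membership score), and the linear masking function are all placeholders, since the abstract does not specify their exact forms; only the overall scheme (task score -> masking probability -> MLM masking) follows the description above.

```python
import random

# Hypothetical seed word list for a sentiment task (positive and negative words).
TASK_WORDS = {"good", "great", "excellent", "bad", "terrible", "awful"}

def task_score(word, task_words=TASK_WORDS):
    """Toy task score: 1.0 for words in the task word list, 0.0 otherwise.
    The paper derives importance from the word list; the exact scoring
    function is not given in the abstract, so this is a placeholder."""
    return 1.0 if word.lower() in task_words else 0.0

def masking_probability(score, base_rate=0.15, max_rate=0.5):
    """One possible masking function: interpolate linearly between the
    standard 15% MLM rate and a higher cap as the task score increases."""
    return base_rate + (max_rate - base_rate) * score

def selectively_mask(tokens, mask_token="[MASK]", seed=None):
    """Replace each token with [MASK] with probability given by its task score,
    producing inputs for continued MLM training before fine-tuning."""
    rng = random.Random(seed)
    return [mask_token if rng.random() < masking_probability(task_score(tok)) else tok
            for tok in tokens]

if __name__ == "__main__":
    sentence = "the movie was great but the ending felt terrible".split()
    print(selectively_mask(sentence, seed=0))
```

In this sketch, task-relevant words (e.g., "great", "terrible") are masked more often than neutral words, so the continued MLM training focuses the model on reconstructing task-specific vocabulary before supervised fine-tuning.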