Pretrained Masked Language Models (MLMs) have revolutionised NLP in recent years. However, previous work has indicated that off-the-shelf MLMs are not effective as universal lexical or sentence encoders unless they are further fine-tuned on annotated task data for NLI, sentence similarity, or paraphrasing. In this work, we demonstrate that MLMs can be turned into effective universal lexical and sentence encoders without any additional data and without any supervision. We propose an extremely simple, fast and effective contrastive learning technique, termed Mirror-BERT, which converts MLMs (e.g., BERT and RoBERTa) into such encoders in less than a minute without any additional external knowledge. Mirror-BERT uses fully identical or slightly modified string pairs as positive (i.e., synonymous) fine-tuning examples, and aims to maximise their similarity during identity fine-tuning. With Mirror-BERT, we report large gains over off-the-shelf MLMs in both lexical-level and sentence-level tasks, across different domains and different languages. Notably, in the standard sentence semantic similarity (STS) tasks, our self-supervised Mirror-BERT model even matches the performance of the task-tuned Sentence-BERT models from prior work. Finally, we delve deeper into the inner workings of MLMs, and offer some evidence on why this simple approach can yield effective universal lexical and sentence encoders.
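To make the idea of identity fine-tuning concrete, the following is a minimal sketch and not the authors' implementation. It assumes an InfoNCE-style contrastive objective with in-batch negatives, which the abstract does not specify, and it works on plain NumPy arrays standing in for encoder outputs. The names `make_mirror_pairs` and `info_nce_loss`, the `[MASK]` placeholder, the `span` parameter and the temperature value are all illustrative assumptions.

```python
import numpy as np


def make_mirror_pairs(strings, span=0, seed=0):
    """Pair each string with itself as a positive example.

    If span > 0, the second copy has a random character span of that
    length replaced by a placeholder (a hypothetical "slight
    modification"; the exact augmentation is an assumption here).
    """
    rng = np.random.default_rng(seed)
    pairs = []
    for s in strings:
        t = s
        if span > 0 and len(s) > span:
            start = int(rng.integers(0, len(s) - span + 1))
            t = s[:start] + "[MASK]" + s[start + span:]
        pairs.append((s, t))
    return pairs


def info_nce_loss(a, b, temperature=0.05):
    """InfoNCE-style loss: row i of `a` should match row i of `b`.

    All other rows of `b` in the batch act as negatives. `a` and `b`
    are (batch, dim) arrays of embeddings from some encoder.
    """
    # Cosine similarity via L2-normalised dot products.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The positive for row i sits on the diagonal.
    return float(-np.mean(np.diag(log_probs)))
```

In an actual setup, both strings of each pair would be encoded by the MLM being fine-tuned, and the loss would be minimised with a gradient-based optimiser; here the embeddings are simply passed in as arrays so the objective itself can be inspected in isolation.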