Recent pre-trained language models (PLMs) have achieved great success on many natural language processing tasks by learning linguistic features and contextualized sentence representations. Since the attributes captured in the stacked layers of PLMs are not clearly identified, straightforward approaches such as taking the last-layer embedding are commonly used to derive sentence representations from PLMs. This paper introduces an attention-based pooling strategy that preserves the layer-wise signals captured in each layer and learns digested linguistic features for downstream tasks. A contrastive learning objective adapts the layer-wise attention pooling to both unsupervised and supervised settings, which regularizes the anisotropic space of pre-trained embeddings and makes it more uniform. We evaluate our model on standard semantic textual similarity (STS) and semantic search tasks. Our method improves the performance of contrastively learned BERT_base and its variants.
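As a concrete illustration, below is a minimal PyTorch sketch of layer-wise attention pooling, assuming a single learnable query attends over the per-layer [CLS] vectors returned by an encoder run with output_hidden_states=True; the module name LayerwiseAttentionPooling and this particular scoring scheme are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn


class LayerwiseAttentionPooling(nn.Module):
    """Pool hidden states from every encoder layer into one sentence embedding.

    Hypothetical sketch: a learnable query scores the per-layer [CLS] vectors,
    and the softmax-weighted sum over layers is the sentence representation.
    """

    def __init__(self, hidden_size: int):
        super().__init__()
        self.query = nn.Parameter(torch.randn(hidden_size))
        self.scale = hidden_size ** 0.5

    def forward(self, all_hidden_states):
        # all_hidden_states: tuple of (num_layers + 1) tensors, each [batch, seq, hidden],
        # e.g. the `hidden_states` output of a HuggingFace model with output_hidden_states=True
        layer_cls = torch.stack([h[:, 0] for h in all_hidden_states], dim=1)  # [batch, L+1, hidden]
        scores = layer_cls @ self.query / self.scale                          # [batch, L+1]
        weights = torch.softmax(scores, dim=-1)                               # attention over layers
        return (weights.unsqueeze(-1) * layer_cls).sum(dim=1)                 # [batch, hidden]
```

Under this assumed setup, the pooled embeddings of positive pairs (e.g. dropout-augmented views in the unsupervised case, or entailment pairs in the supervised case) would be trained with an InfoNCE-style contrastive loss, matching the objectives described above.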