Contextualized representations from a pre-trained language model are central to achieving high performance on downstream NLP tasks. The pre-trained BERT and A Lite BERT (ALBERT) models can be fine-tuned to give state-of-the-art results in sentence-pair regression tasks such as semantic textual similarity (STS) and natural language inference (NLI). Although BERT-based models yield the [CLS] token vector as a reasonable sentence embedding, the search for an optimal sentence embedding scheme remains an active research area in computational linguistics. This paper explores sentence embedding models for BERT and ALBERT. In particular, we take a modified BERT network with siamese and triplet network structures, called Sentence-BERT (SBERT), and replace BERT with ALBERT to create Sentence-ALBERT (SALBERT). We also experiment with an outer CNN sentence-embedding network for SBERT and SALBERT. We evaluate the performance of all sentence-embedding models considered on the STS and NLI datasets. The empirical results indicate that our CNN architecture improves ALBERT models substantially more than BERT models on the STS benchmark. Despite having significantly fewer model parameters, ALBERT sentence embeddings are highly competitive with BERT in downstream NLP evaluations.
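The sketch below illustrates, at a high level, the siamese sentence-embedding setup described above: two sentences are encoded by a shared transformer (BERT or ALBERT via Hugging Face `transformers`), token representations are optionally passed through a 1-D CNN layer before pooling (the "outer CNN" idea), and the pooled sentence vectors are compared with cosine similarity. The model name, CNN hyperparameters, and pooling choice here are illustrative assumptions, not the exact configuration used in the paper.

```python
# Minimal sketch of a siamese sentence encoder with an optional CNN pooling stage.
# Assumptions: ALBERT base checkpoint, mean pooling, one Conv1d layer; these are
# stand-ins for illustration, not the paper's exact architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel


class SiameseSentenceEncoder(nn.Module):
    def __init__(self, model_name="albert-base-v2", use_cnn=True):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)  # shared (siamese) weights
        hidden = self.encoder.config.hidden_size
        self.use_cnn = use_cnn
        if use_cnn:
            # Illustrative 1-D convolution over the token dimension before pooling.
            self.cnn = nn.Conv1d(hidden, hidden, kernel_size=3, padding=1)

    def embed(self, input_ids, attention_mask):
        # Contextualized token representations: (batch, seq_len, hidden)
        token_emb = self.encoder(input_ids=input_ids,
                                 attention_mask=attention_mask).last_hidden_state
        if self.use_cnn:
            # Conv1d expects (batch, channels, seq_len)
            token_emb = F.relu(self.cnn(token_emb.transpose(1, 2))).transpose(1, 2)
        # Mask-aware mean pooling over tokens -> one vector per sentence
        mask = attention_mask.unsqueeze(-1).float()
        return (token_emb * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

    def forward(self, batch_a, batch_b):
        # Both sentences go through the same encoder; similarity is cosine.
        u = self.embed(batch_a["input_ids"], batch_a["attention_mask"])
        v = self.embed(batch_b["input_ids"], batch_b["attention_mask"])
        return F.cosine_similarity(u, v)


if __name__ == "__main__":
    tok = AutoTokenizer.from_pretrained("albert-base-v2")
    model = SiameseSentenceEncoder()
    a = tok(["A man is playing a guitar."], return_tensors="pt", padding=True)
    b = tok(["Someone plays an instrument."], return_tensors="pt", padding=True)
    with torch.no_grad():
        print(model(a, b))  # cosine similarity between the two sentence embeddings
```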