SML: a new Semantic Embedding Alignment Transformer for efficient cross-lingual Natural Language Inference

From arXiv. This research is funded by the project CIVIC: Intelligent characterisation of the veracity of the information related to COVID-19, granted by the BBVA Foundation Grants for Scientific Research Teams SARS-CoV-2 and COVID-19.

The ability of Transformers to perform a variety of tasks with precision, such as question answering, Natural Language Inference (NLI) or summarisation, has made them one of the best current paradigms for this kind of task. NLI is one of the best scenarios in which to test these architectures, because of the knowledge required to understand complex sentences and to establish a relation between a hypothesis and a premise. Nevertheless, these models struggle to generalise to other domains and have difficulty with multilingual scenarios. The leading approach in the literature to address these issues involves designing and training extremely large architectures, which leads to unpredictable behaviours and raises barriers that impede broad access and fine-tuning. In this paper, we propose a new architecture, the siamese multilingual transformer (SML), to efficiently align multilingual embeddings for Natural Language Inference. SML leverages siamese pre-trained multilingual transformers with frozen weights, where the two input sentences attend to each other and are later combined through a matrix alignment method. The experimental results reported in this paper show that SML drastically reduces the number of trainable parameters while still achieving state-of-the-art performance.
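
The general idea in the abstract (a shared frozen encoder, sentences that attend to each other, and a small trainable part on top) can be illustrated with a toy sketch. This is not the authors' implementation. The frozen encoder is replaced by a fixed random lookup table, the "matrix alignment" step is assumed here to be a softmax-normalised token similarity matrix, and every name and size (`frozen_encode`, `alignment_matrix`, `DIM`, `HEAD`) is hypothetical.

```python
import math
import random

random.seed(0)
DIM = 8        # toy embedding width (assumption, not from the paper)
N_CLASSES = 3  # entailment / neutral / contradiction

# Stand-in for a frozen pre-trained multilingual encoder: one fixed lookup
# table shared by both branches (the "siamese" part). It is never updated.
_VOCAB = {}

def frozen_encode(tokens):
    vectors = []
    for tok in tokens:
        if tok not in _VOCAB:
            _VOCAB[tok] = [random.uniform(-1.0, 1.0) for _ in range(DIM)]
        vectors.append(_VOCAB[tok])
    return vectors

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def alignment_matrix(queries, keys):
    """Row i holds how strongly query token i attends to each key token."""
    scale = math.sqrt(DIM)
    return [softmax([dot(q, k) / scale for k in keys]) for q in queries]

def cross_attend(queries, keys):
    """Re-express each query token as a weighted mix of the other sentence."""
    weights = alignment_matrix(queries, keys)
    return [
        [sum(w * k[d] for w, k in zip(row, keys)) for d in range(DIM)]
        for row in weights
    ]

def mean_pool(vectors):
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(DIM)]

# The only trainable parameters in this sketch: a linear classification head.
HEAD = [[random.uniform(-0.5, 0.5) for _ in range(2 * DIM)]
        for _ in range(N_CLASSES)]

def predict(premise, hypothesis):
    p = frozen_encode(premise)
    h = frozen_encode(hypothesis)
    # Each sentence attends to the other, then both views are concatenated.
    features = mean_pool(cross_attend(p, h)) + mean_pool(cross_attend(h, p))
    return softmax([dot(row, features) for row in HEAD])

premise = "a man is sleeping".split()
hypothesis = "un hombre duerme".split()
align = alignment_matrix(frozen_encode(premise), frozen_encode(hypothesis))
probs = predict(premise, hypothesis)
trainable_params = N_CLASSES * 2 * DIM
```

In this toy setting only `HEAD` would be updated during training, so the trainable parameter count stays independent of the encoder size. With a real pre-trained multilingual transformer in place of `frozen_encode`, the same pattern would keep the large encoder fixed.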
