Combining several embeddings typically improves performance in downstream tasks, since different embeddings encode different information. It has been shown that even models using embeddings from transformers still benefit from the inclusion of standard word embeddings. However, combining embeddings of different types and dimensions is challenging. As an alternative to attention-based meta-embeddings, we propose feature-based adversarial meta-embeddings (FAME) with an attention function that is guided by features reflecting word-specific properties, such as shape and frequency, and show that this is beneficial for handling subword-based embeddings. In addition, FAME uses adversarial training to optimize the mappings of differently sized embeddings to the same space. We demonstrate that FAME works effectively across languages and domains for sequence labeling and sentence classification, in particular in low-resource settings. FAME sets a new state of the art for POS tagging in 27 languages, for various NER settings, and for question classification in different domains.
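To make the idea concrete, here is a minimal sketch of a feature-guided attention layer over meta-embeddings, written in PyTorch. This is not the authors' implementation; all names (`FameSketch`, `feat_dim`, etc.) are hypothetical. Each embedding type is projected into a shared space, the attention score for each type is computed from the mapped embedding concatenated with word-level features (e.g. shape and frequency indicators), and the output is the attention-weighted sum. The adversarial component is only indicated by the `discriminator` module; its training loss is not computed here.

```python
import torch
import torch.nn as nn

class FameSketch(nn.Module):
    """Feature-guided attention over embeddings of different dimensions."""

    def __init__(self, embed_dims, common_dim, feat_dim):
        super().__init__()
        # One linear map per embedding type, projecting each embedding
        # from its own dimensionality into a shared common_dim space.
        self.maps = nn.ModuleList(
            [nn.Linear(d, common_dim) for d in embed_dims]
        )
        # Attention scorer: sees a mapped embedding together with
        # word-level features and yields one scalar per embedding type.
        self.score = nn.Linear(common_dim + feat_dim, 1)
        # Discriminator for the adversarial part: tries to predict which
        # embedding type a mapped vector came from; the maps are trained
        # to fool it so all types land in one shared space.
        self.discriminator = nn.Linear(common_dim, len(embed_dims))

    def forward(self, embeddings, feats):
        # embeddings: list of tensors, each (batch, seq, d_i)
        # feats: (batch, seq, feat_dim) word-specific feature vectors
        mapped = torch.stack(
            [m(e) for m, e in zip(self.maps, embeddings)], dim=2
        )  # (batch, seq, n_types, common_dim)
        n_types = mapped.size(2)
        feats_rep = feats.unsqueeze(2).expand(-1, -1, n_types, -1)
        scores = self.score(torch.cat([mapped, feats_rep], dim=-1))
        alpha = torch.softmax(scores, dim=2)   # feature-guided weights
        meta = (alpha * mapped).sum(dim=2)     # (batch, seq, common_dim)
        return meta, mapped  # `mapped` would feed the discriminator loss

# Example: combine a 300-d word embedding with a 768-d transformer
# embedding for 2 sentences of length 5, with 4 word features.
fame = FameSketch(embed_dims=[300, 768], common_dim=256, feat_dim=4)
meta, _ = fame(
    [torch.randn(2, 5, 300), torch.randn(2, 5, 768)],
    torch.randn(2, 5, 4),
)
print(meta.shape)  # torch.Size([2, 5, 256])
```

A full training setup would additionally optimize the mappings against the discriminator, e.g. by alternating discriminator and mapping updates or via a gradient-reversal layer, so that embeddings of different dimensions are driven into the same space before the attention weights compare them.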