We present Emu, a system that semantically enhances multilingual sentence embeddings. Our framework fine-tunes pre-trained multilingual sentence embeddings using two main components: a semantic classifier and a language discriminator. The semantic classifier improves the semantic similarity of related sentences, whereas the language discriminator enhances the multilinguality of the embeddings via multilingual adversarial training. Our experimental results based on several language pairs show that our specialized embeddings outperform the state-of-the-art multilingual sentence embedding model on the task of cross-lingual intent classification using only monolingual labeled data.
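The interplay of the two components can be illustrated with a minimal sketch of the training objectives. This is not the authors' implementation: the abstract does not specify the loss functions, so the softmax cross-entropy losses, the weighting parameter `lam`, and the function names `encoder_objective` and `discriminator_objective` are all assumptions chosen for illustration. The sketch shows only the general shape of adversarial training: the discriminator tries to identify the language of an embedding, while the encoder is rewarded for good intent predictions and for making the discriminator's job harder.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Negative log-likelihood of the true class under softmax(logits)."""
    return -math.log(softmax(logits)[label])


def discriminator_objective(lang_logits, lang_label):
    """Loss minimized by the (hypothetical) language discriminator:
    it tries to predict which language an embedding came from."""
    return cross_entropy(lang_logits, lang_label)


def encoder_objective(intent_logits, intent_label,
                      lang_logits, lang_label, lam=0.1):
    """Loss minimized by the embedding model (assumed form, not from the paper).

    The semantic term rewards correct intent predictions; the adversarial
    term is subtracted, so the encoder benefits when the discriminator
    fails to recognize the language. `lam` is a hypothetical trade-off weight.
    """
    sem_loss = cross_entropy(intent_logits, intent_label)
    lang_loss = cross_entropy(lang_logits, lang_label)
    return sem_loss - lam * lang_loss
```

Under these assumptions, an embedding whose language the discriminator cannot tell apart (uniform language logits) yields a lower encoder loss than one whose language is easily identified, which is the pressure that pushes embeddings of different languages toward a shared space.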