Distributed word representations are widely used across natural language processing (NLP), and word vectors pre-trained on very large text corpora have achieved high performance on many NLP tasks. This paper introduces several high-quality word vectors for French: two are trained on a large crawled French dataset, and the others are trained on an existing French corpus. We evaluate the quality of our word vectors and of existing French word vectors on the French word analogy task. We also evaluate them on several downstream NLP tasks, where the pre-trained word vectors yield a substantial performance improvement over both the existing vectors and randomly initialized ones. Finally, we built a demo web application for testing and visualizing the resulting word embeddings. The French word embeddings are publicly available, along with the fine-tuning code for the NLU tasks and the demo code.
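To make the word analogy evaluation concrete, the following is a minimal sketch of how such a benchmark is commonly scored. It assumes the widely used vector-offset (3CosAdd) rule: for a question "a is to b as c is to ?", the predicted word is the one whose vector has the highest cosine similarity to b - a + c, with the three question words excluded. The abstract does not state which scoring rule was used, so this choice is an assumption. The vocabulary, the 3-dimensional toy vectors, and the function names are hypothetical and chosen only for illustration; they are not the released embeddings.

```python
import numpy as np


def normalize(matrix):
    """Scale every row of the matrix to unit Euclidean length."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    return matrix / norms


def solve_analogy(vocab, vectors, a, b, c):
    """Answer 'a is to b as c is to ?' with the vector-offset (3CosAdd) rule.

    Returns the vocabulary word whose unit vector is closest, by cosine
    similarity, to (b - a + c). The question words are never returned.
    """
    index = {word: i for i, word in enumerate(vocab)}
    unit = normalize(vectors)
    target = unit[index[b]] - unit[index[a]] + unit[index[c]]
    target = target / np.linalg.norm(target)
    scores = unit @ target  # cosine similarity, since all rows are unit length
    for word in (a, b, c):
        scores[index[word]] = -np.inf  # exclude the question words
    return vocab[int(np.argmax(scores))]


def analogy_accuracy(vocab, vectors, questions):
    """Fraction of (a, b, c, expected) questions answered correctly."""
    correct = sum(
        solve_analogy(vocab, vectors, a, b, c) == expected
        for a, b, c, expected in questions
    )
    return correct / len(questions)


# Hypothetical toy data: axes are roughly (royalty, gender, unrelated).
TOY_VOCAB = ["roi", "reine", "homme", "femme", "pomme"]
TOY_VECTORS = np.array([
    [1.0, 1.0, 0.0],   # roi
    [1.0, -1.0, 0.0],  # reine
    [0.2, 1.0, 0.0],   # homme
    [0.2, -1.0, 0.0],  # femme
    [0.0, 0.0, 1.0],   # pomme (distractor)
])
TOY_QUESTIONS = [
    ("homme", "femme", "roi", "reine"),
    ("roi", "reine", "homme", "femme"),
]

print(solve_analogy(TOY_VOCAB, TOY_VECTORS, "homme", "femme", "roi"))
print(analogy_accuracy(TOY_VOCAB, TOY_VECTORS, TOY_QUESTIONS))
```

With real embeddings, the toy arrays would be replaced by the loaded vocabulary and vector matrix, and the question list by the analogy benchmark file. Accuracy is usually reported per question category as well as overall.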