There are two main approaches to cross-lingual transfer: multilingual pre-training, which implicitly aligns the hidden representations of different languages, and translate-test, which explicitly translates different languages into an intermediate language such as English. Translate-test offers better interpretability than multilingual pre-training. However, it performs worse than multilingual pre-training (Conneau and Lample, 2019; Conneau et al., 2020) and cannot handle word-level tasks, because translation rearranges word order. We therefore propose a Machine-created Universal Language (MUL) as a new intermediate language. MUL consists of a set of discrete symbols that form a universal vocabulary and an NL-MUL translator that translates multiple natural languages into MUL. MUL unifies concepts shared across languages into the same universal word, which improves cross-lingual transfer. It also preserves language-specific words and word order, so the model can be easily applied to word-level tasks. Our experiments show that translating into MUL outperforms multilingual pre-training, and our analyses show that MUL has good interpretability.
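The order-preserving property can be illustrated with a toy sketch. This is not the NL-MUL translator itself: a hand-written lookup table stands in for the translator, and the symbol names (`U_CAT`, etc.), the table entries, and the function `to_mul` are all hypothetical. It only shows why a one-to-one, in-order mapping keeps word-level labels aligned with the input.

```python
# Toy stand-in for an NL-MUL translator: a hand-written lookup table.
# All entries and symbol names below are hypothetical, for illustration only.
UNIVERSAL_VOCAB = {
    "the": "U_DEF", "el": "U_DEF",
    "cat": "U_CAT", "gato": "U_CAT",
    "drinks": "U_DRINK", "bebe": "U_DRINK",
    "milk": "U_MILK", "leche": "U_MILK",
}


def to_mul(tokens):
    """Map each token to a universal symbol, one-to-one and in order.

    Tokens with no shared concept in the table are kept unchanged,
    mimicking the preservation of language-specific words.
    """
    return [UNIVERSAL_VOCAB.get(tok.lower(), tok) for tok in tokens]


english = to_mul(["The", "cat", "drinks", "milk"])
spanish = to_mul(["El", "gato", "bebe", "leche"])

# Both sentences map to the same symbol sequence, and position i of the
# output still corresponds to word i of the input.
print(english)
print(spanish)
```

Because the output has exactly one symbol per input word, labels for word-level tasks such as tagging can be carried over position by position, which a free translation into English would not allow.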