Cross-lingual language tasks typically require a substantial amount of annotated data or parallel translation data. We explore whether language representations that capture relationships among languages can be learned and subsequently leveraged in cross-lingual tasks without the use of parallel data. We generate dense embeddings for 29 languages using a denoising autoencoder, and evaluate the embeddings using the World Atlas of Language Structures (WALS) and two extrinsic tasks in a zero-shot setting: cross-lingual dependency parsing and cross-lingual natural language inference.