Cross-lingual transfer is a leading technique for parsing low-resource languages in the absence of explicit supervision. Simple `direct transfer' of a learned model based on a multilingual input encoding has provided a strong benchmark. This paper presents a method for unsupervised cross-lingual transfer that improves over direct transfer systems by using their output as implicit supervision for self-training on unlabelled text in the target language. The method assumes minimal resources and provides maximal flexibility by (a) accepting any pre-trained arc-factored dependency parser; (b) assuming no access to source language data; (c) supporting both projective and non-projective parsing; and (d) supporting multi-source transfer. With English as the source language, we show significant improvements over state-of-the-art transfer models on both distant and nearby languages, despite our conceptually simpler approach. We provide analyses of the choice of source languages for multi-source transfer, and of the advantage of non-projective parsing. Our code is available online.
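To make the self-training idea concrete, the sketch below shows one plausible form of the loop described above: a direct-transfer parser labels unlabelled target-language text, and its predictions are then used as implicit supervision for re-training. This is an illustrative sketch only; the `parser` interface (`parse`, `fit`), function names, and the fixed number of rounds are hypothetical and not taken from the paper's actual implementation.

```python
def self_train(parser, target_sentences, n_rounds=3):
    """Illustrative self-training loop for cross-lingual transfer (hypothetical API).

    `parser` is assumed to be any pre-trained arc-factored dependency parser
    exposing `parse(sentence) -> tree` and `fit(treebank) -> parser`.
    """
    for _ in range(n_rounds):
        # 1. Parse unlabelled target-language text with the current model
        #    (initially the direct-transfer model trained on the source language).
        pseudo_treebank = [(sent, parser.parse(sent)) for sent in target_sentences]
        # 2. Treat the predicted trees as implicit supervision and re-train.
        parser = parser.fit(pseudo_treebank)
    return parser
```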