Typologically diverse languages offer systems of lexical and grammatical aspect that allow speakers to focus on facets of event structure in ways that comport with the specific communicative setting and discourse constraints they face. In this paper, we look specifically at image captions in Arabic, Chinese, Farsi, German, Russian, and Turkish, and describe a computational model for predicting lexical aspect. Despite the heterogeneity of these languages, and the distinctive linguistic resources invoked across their caption corpora, speakers of these languages show surprising similarities in how they frame image content. We leverage this observation for zero-shot cross-lingual learning and show that lexical aspect can be predicted for a given language even when no annotated data for that language has been observed at all.