We study politeness phenomena in nine typologically diverse languages. Politeness is an important facet of communication and is sometimes argued to be culture-specific, yet existing computational linguistic studies are limited to English. We create TyDiP, a dataset containing three-way politeness annotations for 500 examples in each language, totaling 4.5K examples. We evaluate how well multilingual models can identify politeness levels: they show fairly robust zero-shot transfer ability, yet fall significantly short of estimated human accuracy. We further study mapping the English politeness strategy lexicon into the nine languages via automatic translation and lexicon induction, analyzing whether each strategy's impact stays consistent across languages. Lastly, we empirically study the complicated relationship between formality and politeness through transfer experiments. We hope our dataset will support various research questions and applications, from evaluating multilingual models to constructing polite multilingual agents.