Natural Language Inference (NLI) and Semantic Textual Similarity (STS) are widely used benchmark tasks for the compositional evaluation of pre-trained language models. Despite growing interest in linguistic universals, most NLI/STS studies have focused almost exclusively on English. In particular, no NLI/STS datasets are available in Japanese, a language that is typologically different from English and can shed light on the currently controversial behavior of language models regarding sensitivity to word order and case particles. Against this background, we introduce JSICK, a Japanese NLI/STS dataset manually translated from the English dataset SICK. We also present a stress-test dataset for compositional inference, created by transforming the syntactic structures of sentences in JSICK, to investigate whether language models are sensitive to word order and case particles. We conduct baseline experiments with different pre-trained language models and compare the performance of multilingual models when applied to Japanese and other languages. The results of the stress-test experiments suggest that current pre-trained language models are insensitive to word order and case marking.