We present IndoNLI, the first human-elicited NLI dataset for Indonesian. We adapt the data collection protocol for MNLI and collect nearly 18K sentence pairs annotated by crowd workers and experts. The expert-annotated data is used exclusively as a test set. It is designed to provide a challenging test bed for Indonesian NLI by explicitly incorporating various linguistic phenomena such as numerical reasoning, structural changes, idioms, and temporal and spatial reasoning. Experimental results show that XLM-R outperforms other pre-trained models on our data. The best performance on the expert-annotated data is still far below human performance (13.4% accuracy gap), suggesting that this test set is especially challenging. Furthermore, our analysis shows that our expert-annotated data is more diverse and contains fewer annotation artifacts than the crowd-annotated data. We hope this dataset can help accelerate progress in Indonesian NLP research.