We present three Natural Language Inference (NLI) challenge sets that evaluate NLI models on their understanding of temporal expressions. More specifically, we probe these models for three temporal properties: (a) the order between points in time, (b) the duration between two points in time, and (c) the relation between the magnitudes of times specified in different units. We find that although large language models fine-tuned on MNLI have some basic perception of the order between points in time, by and large these models do not have a thorough understanding of the relations between temporal expressions.
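To make the three probed properties concrete, the following is a minimal sketch of how premise-hypothesis pairs for each property might be generated from templates. The sentence templates, the `NLIExample` class, the function names, and the two-way label scheme are all assumptions made for illustration; the abstract does not describe how the actual challenge sets were constructed.

```python
from dataclasses import dataclass

# Seconds per unit; a hypothetical lookup table used only by this sketch.
UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


@dataclass
class NLIExample:
    premise: str
    hypothesis: str
    label: str  # "entailment" or "contradiction" (assumed two-way labels)


def order_example(year_a: int, year_b: int) -> NLIExample:
    """(a) Order: does an event in year_a fall before year_b?"""
    label = "entailment" if year_a < year_b else "contradiction"
    return NLIExample(
        f"The bridge was built in {year_a}.",
        f"The bridge was built before {year_b}.",
        label,
    )


def duration_example(start: int, end: int, claimed_years: int) -> NLIExample:
    """(b) Duration: does the span between two years match the claim?"""
    label = "entailment" if end - start == claimed_years else "contradiction"
    return NLIExample(
        f"She lived in Paris from {start} to {end}.",
        f"She lived in Paris for {claimed_years} years.",
        label,
    )


def magnitude_example(n_a: int, unit_a: str, n_b: int, unit_b: str) -> NLIExample:
    """(c) Magnitude: compare two durations given in different units."""
    longer = n_a * UNIT_SECONDS[unit_a] > n_b * UNIT_SECONDS[unit_b]
    label = "entailment" if longer else "contradiction"
    return NLIExample(
        f"The meeting lasted {n_a} {unit_a}s.",
        f"The meeting lasted more than {n_b} {unit_b}s.",
        label,
    )


if __name__ == "__main__":
    for ex in (
        order_example(1990, 2000),
        duration_example(2001, 2005, 4),
        magnitude_example(2, "hour", 90, "minute"),
    ):
        print(ex)
```

Because the gold label in each case is computed from the underlying numbers rather than from the wording, a model that relies on surface cues instead of temporal reasoning would be expected to struggle on pairs of this kind.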