The goal of temporal relation extraction is to infer the temporal relation between two events in a document. Supervised models dominate this task. In this work, we investigate ChatGPT's ability to perform zero-shot temporal relation extraction. We design three prompting techniques that break down the task and use them to evaluate ChatGPT. Our experiments show that ChatGPT's performance lags well behind that of supervised methods and depends heavily on prompt design. We further demonstrate that ChatGPT infers more of the small relation classes correctly than supervised methods do. We also discuss ChatGPT's current shortcomings on temporal relation extraction: it cannot maintain consistency during temporal inference, and it fails at long-dependency temporal inference.