This paper investigates models of event implications, specifically how well models predict entity state changes by targeting their understanding of physical attributes. Nominally, Large Language Models (LLMs) have been exposed to procedural knowledge about how objects interact, yet our benchmarking shows they fail to reason about the world. Conversely, we also demonstrate that existing approaches often misrepresent the surprising abilities of LLMs via improper task encodings, and that proper model prompting can dramatically improve upon reported baseline results across multiple tasks. In particular, our results indicate that our prompting technique is especially useful for unseen attributes (out-of-domain) or when only limited data is available.