To what extent do language models (LMs) build "mental models" of a scene when answering situated questions (e.g., questions about a specific ethical dilemma)? While cognitive science has shown that mental models play a fundamental role in human problem-solving, it is unclear whether the high question-answering performance of existing LMs is backed by similar model building, and if not, whether that can explain their well-known catastrophic failures. We observe that Macaw, an existing T5-based LM, when probed provides somewhat useful but inadequate mental models for situational questions (estimated accuracy=43%, usefulness=21%, consistency=42%). We propose DREAM, a model that takes a situational question as input and produces a mental model elaborating the situation, without any additional task-specific training data for mental models. It inherits its social commonsense through distant supervision from existing NLP resources. Our analysis shows that DREAM produces significantly better mental models (estimated accuracy=67%, usefulness=37%, consistency=71%) than Macaw. Finally, mental models generated by DREAM can be used as additional context for situational QA tasks. This additional context improves the answer accuracy of a zero-shot Macaw model by between +1% and +4% (absolute) on three different datasets.