Humans use commonsense reasoning (CSR) implicitly to produce natural and coherent responses in conversations. Aiming to close the gap between current response generation (RG) models and human communication abilities, we want to understand why RG models respond as they do by probing RG models' understanding of the commonsense reasoning that elicits proper responses. We formalize the problem by framing commonsense as a latent variable in the RG task and using explanations for responses as the textual form of commonsense. We collect 6k annotated explanations justifying responses from four dialogue datasets, ask humans to verify them, and propose two probing settings to evaluate RG models' CSR capabilities. Probing results show that models fail to capture the logical relations between commonsense explanations and responses, and that neither fine-tuning on in-domain data nor increasing model size leads to an understanding of CSR for RG. We hope our study motivates more research into making RG models emulate the human reasoning process in pursuit of smooth human-AI communication.