Communication is a cooperative effort that requires reaching mutual understanding among the participants. Humans implicitly use commonsense reasoning to produce natural and logically coherent responses. As a step towards fluid human-AI communication, we study whether response generation (RG) models can emulate the human reasoning process and use common sense to help produce better-quality responses. We aim to tackle two research questions: how can conversational common sense be formalized, and how can we examine RG models' capability to use it? We first propose a task, CEDAR (Causal common sEnse in DiAlogue Response generation), that concretizes common sense as textual explanations for what might lead to a response, and evaluates RG models' behavior by comparing the modeling loss given a valid explanation with that given an invalid one. We then introduce a process that automatically generates such explanations and asks humans to verify them. Finally, we design two probing settings that use the verified explanations to target two reasoning capabilities of RG models. We find that RG models have a hard time determining the logical validity of explanations but can easily identify their grammatical naturalness.
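The loss-comparison probe described above can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's code: the toy `lm_loss` function stands in for a real RG model's loss (which would come from a neural language model), and the dialogue, response, and explanation strings are invented examples. The probe logic itself — scoring the same response under a valid vs. an invalid explanation and comparing the resulting losses — follows the setup the abstract describes.

```python
# Hypothetical sketch of the loss-comparison probe: score a response under a
# valid vs. an invalid causal explanation and check which yields lower loss.
# lm_loss is a toy stand-in for a real RG model's loss function.
import math
from collections import Counter

def lm_loss(context: str, response: str) -> float:
    """Average negative log-probability of response tokens under an
    add-one-smoothed unigram model estimated from the context.
    A real probe would use a neural RG model's conditional loss instead."""
    counts = Counter(context.lower().split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 smoothing mass for unseen tokens
    tokens = response.lower().split()
    nll = 0.0
    for tok in tokens:
        p = (counts[tok] + 1) / (total + vocab)
        nll -= math.log(p)
    return nll / len(tokens)

dialogue = "i lost my umbrella"
response = "take my umbrella"
valid_expl = "b should take out an umbrella because it may rain"
invalid_expl = "offering a sandwich helps because hunger"

loss_valid = lm_loss(dialogue + " " + valid_expl, response)
loss_invalid = lm_loss(dialogue + " " + invalid_expl, response)

# An RG model that uses common sense should assign lower loss to the
# response when conditioned on the valid explanation.
print(loss_valid < loss_invalid)  # prints True
```

In the real setting, the comparison would be run over many dialogue/explanation pairs, and the fraction of pairs where the valid explanation wins measures how sensitive the model is to logical validity.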