Recent commonsense-reasoning tasks are typically discriminative in nature: a model answers a multiple-choice question about a given context. Discriminative tasks are limiting because they fail to adequately evaluate a model's ability to reason and to explain its predictions with underlying commonsense knowledge. They also allow models to use reasoning shortcuts rather than be "right for the right reasons". In this work, we present ExplaGraphs, a new generative and structured commonsense-reasoning task (and an associated dataset) of explanation graph generation for stance prediction. Specifically, given a belief and an argument, a model has to predict whether the argument supports or counters the belief and also generate a commonsense-augmented graph that serves as a non-trivial, complete, and unambiguous explanation for the predicted stance. We collect explanation graphs through a novel Create-Verify-And-Refine graph collection framework that improves graph quality (up to 90%) via multiple rounds of verification and refinement. A significant 79% of our graphs contain external commonsense nodes with diverse structures and reasoning depths. Next, we propose a multi-level evaluation framework, consisting of automatic metrics and human evaluation, that checks the structural and semantic correctness of the generated graphs and their degree of match with ground-truth graphs. Finally, we present several structured, commonsense-augmented, and text generation models as strong starting points for this explanation graph generation task, and observe a large gap relative to human performance, thereby encouraging future work on this new, challenging task. ExplaGraphs will be publicly available at https://explagraphs.github.io.
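To make the task definition more concrete, the following is a minimal Python sketch of one task instance (belief, argument, stance, explanation graph) and of a purely structural check on a predicted graph. Several details are assumptions made for illustration only: the `(head; relation; tail)` triple serialization, the relation names in the toy example, the exact `support`/`counter` label strings, and the structural constraints (at least three edges, weakly connected, acyclic). These are not taken from the dataset's documented format or from the paper's evaluation criteria. All names (`ExplanationInstance`, `parse_graph`, `is_structurally_valid`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# Hypothetical edge representation: (head concept, relation, tail concept).
Triple = Tuple[str, str, str]

# Assumed label strings for the two stances.
VALID_STANCES = {"support", "counter"}


@dataclass
class ExplanationInstance:
    """One (belief, argument) pair with its stance and explanation graph."""
    belief: str
    argument: str
    stance: str
    edges: List[Triple] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.stance not in VALID_STANCES:
            raise ValueError(f"unknown stance: {self.stance!r}")


def parse_graph(text: str) -> List[Triple]:
    """Parse an assumed '(a; rel; b)(b; rel; c)' serialization into triples."""
    triples: List[Triple] = []
    for chunk in text.strip().strip("()").split(")("):
        parts = [p.strip() for p in chunk.split(";")]
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"malformed triple: {chunk!r}")
        triples.append((parts[0], parts[1], parts[2]))
    return triples


def is_structurally_valid(edges: List[Triple], min_edges: int = 3) -> bool:
    """Assumed structural constraints: enough edges, weakly connected, acyclic."""
    if len(edges) < min_edges:
        return False
    nodes: Set[str] = {h for h, _, _ in edges} | {t for _, _, t in edges}
    undirected: Dict[str, Set[str]] = {n: set() for n in nodes}
    out: Dict[str, List[str]] = {n: [] for n in nodes}
    indeg: Dict[str, int] = {n: 0 for n in nodes}
    for h, _, t in edges:
        undirected[h].add(t)
        undirected[t].add(h)
        out[h].append(t)
        indeg[t] += 1

    # Weak connectivity: depth-first search ignoring edge direction.
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for m in undirected[stack.pop()]:
            if m not in seen:
                seen.add(m)
                stack.append(m)
    if seen != nodes:
        return False

    # Acyclicity: Kahn's algorithm visits every node only if there is no cycle.
    queue = [n for n in nodes if indeg[n] == 0]
    visited = 0
    while queue:
        n = queue.pop()
        visited += 1
        for m in out[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    return visited == len(nodes)
```

A toy usage example (the belief, argument, and graph are invented for illustration, not drawn from the dataset):

```python
edges = parse_graph(
    "(factory farming; causes; animal suffering)"
    "(animal suffering; is a; cruelty)(cruelty; is a; harm)"
)
instance = ExplanationInstance(
    belief="Factory farming should be banned.",
    argument="Factory farming causes animal suffering.",
    stance="support",
    edges=edges,
)
print(is_structurally_valid(instance.edges))  # True: a connected chain with no cycle
```

Such a check covers only structural correctness; judging whether the edges are semantically correct, or how closely they match a ground-truth graph, requires separate metrics or human evaluation.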