Transformers have shown near-human performance on a variety of tasks, but they are not without their limitations. We discuss the issue of conflated results when transformers are instructed to perform multiple tasks simultaneously. In particular, we focus on the domain of commonsense reasoning within story prose, which we call contextual commonsense inference (CCI). We examine the GLUCOSE (Mostafazadeh et al. 2020) dataset and its task of predicting implicit commonsense inferences between story sentences. Because the GLUCOSE task simultaneously generates sentences and predicts the CCI relation, its results conflate the two abilities: is the model really measuring CCI, or is its ability to generate grammatical text carrying the results? In this paper, we introduce the task of contextual commonsense inference in sentence selection ($\rm{C {\small IS}}^2$), a simplified task that avoids conflation by eliminating language generation altogether. Our findings emphasize the necessity of future work to disentangle language generation from the desired NLP tasks at hand.