We examine LMs' competence in directional predicate entailment via supervised fine-tuning with prompts. Our analysis shows that, contrary to their apparent success on standard NLI, LMs show limited ability to learn such directional inference; moreover, existing datasets fail to test directionality and/or are riddled with artefacts that can be learnt as a proxy for entailment, yielding over-optimistic results. In response, we present BoOQA (Boolean Open QA), a robust multi-lingual evaluation benchmark for directional predicate entailment that is extrinsic to existing training sets. On BoOQA, we establish baselines and show evidence that existing LM-prompting models are incompetent directional entailment learners, in contrast to entailment graphs, which are however limited by sparsity.