Recent developments in pre-trained neural language modeling have led to leaps in accuracy on commonsense question-answering benchmarks. However, there is increasing concern that models overfit to specific tasks without learning to utilize external knowledge or perform general semantic reasoning. In contrast, zero-shot evaluations have shown promise as a more robust measure of a model's general reasoning abilities. In this paper, we propose a novel neuro-symbolic framework for zero-shot question answering across commonsense tasks. Guided by a set of hypotheses, the framework studies how to transform various pre-existing knowledge resources into a form that is most effective for pre-training models. We vary the set of language models, training regimes, knowledge sources, and data generation strategies, and measure their impact across tasks. Extending prior work, we devise and compare four constrained distractor-sampling strategies. We provide empirical results across five commonsense question-answering tasks with data generated from five external knowledge resources. We show that, while an individual knowledge graph is better suited to specific tasks, a global knowledge graph brings consistent gains across different tasks. In addition, both preserving the structure of the task and generating fair and informative questions help language models learn more effectively.