In this paper we motivate the causal mechanisms behind sample-selection-induced collider bias (selection collider bias) that can cause Large Language Models (LLMs) to learn unconditional dependence between entities that are unconditionally independent in the real world. We show that selection collider bias can become amplified in underspecified learning tasks and, although it is difficult to overcome, we describe a method that exploits the resulting spurious correlations to determine when a model may be uncertain about its prediction. We demonstrate an uncertainty metric that matches human uncertainty on tasks with gender pronoun underspecification, using an extended version of the Winogender Schemas evaluation set, and we provide an online demo where users can apply our uncertainty metric to their own texts and models.
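As a concrete illustration of how such an uncertainty metric might be probed, the sketch below queries a masked language model for its pronoun predictions on an underspecified, Winogender-style sentence and scores uncertainty as the normalized entropy over the candidate pronouns. This is a minimal sketch, assuming Hugging Face's fill-mask pipeline with bert-base-uncased; the entropy-based score is an illustrative stand-in, not the paper's exact metric.

```python
# Minimal sketch: probing pronoun uncertainty with a masked LM.
# The normalized-entropy score below is illustrative, not the paper's metric.
import math
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def pronoun_uncertainty(sentence: str, pronouns=("he", "she")) -> float:
    """Return a 0-1 uncertainty score for the [MASK]ed pronoun slot.

    0 means the model is confident in one pronoun; 1 means its
    probability mass is split evenly across the candidates, which is
    the behavior we would expect on underspecified sentences.
    """
    results = fill_mask(sentence, targets=list(pronouns))
    scores = [r["score"] for r in results]
    total = sum(scores)
    probs = [s / total for s in scores]  # renormalize over the pronoun set
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return entropy / math.log(len(probs))  # normalized entropy in [0, 1]

# Underspecified sentence: no cue favors either pronoun, so a human would
# be uncertain; a well-calibrated model's score should be near 1.
print(pronoun_uncertainty("The doctor told the patient that [MASK] would call later."))
```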