The term "spurious correlations" has been used in NLP to informally denote any undesirable feature-label correlation. However, a correlation can be undesirable because (i) the feature is irrelevant to the label (e.g., punctuation in a review), or (ii) the feature's effect on the label depends on the context (e.g., negation words in a review), a situation that is ubiquitous in language tasks. In case (i), we want the model to be invariant to the feature, which is neither necessary nor sufficient for prediction. In case (ii), however, even an ideal model (e.g., a human) must rely on the feature, since it is necessary (but not sufficient) for prediction. A more fine-grained treatment of spurious features is therefore needed to specify the desired model behavior. We formalize this distinction using a causal model and probabilities of necessity and sufficiency, which delineate the causal relations between a feature and a label. We then show that this distinction helps explain the results of existing debiasing methods on different spurious features, and demystifies surprising results such as the encoding of spurious features in model representations after debiasing.
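As a point of reference, the probabilities of necessity and sufficiency are commonly defined, following Pearl, for a binary feature X and a binary label Y as sketched below. These are the standard counterfactual definitions, stated here as an assumption; the formalization used in this work may differ in detail, for example by conditioning on the surrounding context.

```latex
% X: binary feature with values x (present) and x' (absent)
% Y: binary label with values y and y'
% Y_{x}: the value Y would have taken had X been set to x (a counterfactual)
\begin{align}
\mathrm{PN} &= P\left(Y_{x'} = y' \mid X = x,\; Y = y\right) \\
\mathrm{PS} &= P\left(Y_{x} = y \mid X = x',\; Y = y'\right)
\end{align}
```

Read informally, PN asks how likely the label would have changed had the feature been removed from an example that has both the feature and the label, while PS asks how likely adding the feature would produce the label in an example that has neither.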