Neural networks leverage both causal and correlation-based relationships in data to learn models that optimize a given performance criterion, such as classification accuracy. As a result, the learned models may not reflect the true causal relationships between inputs and outputs. When domain priors on causal relationships are available at training time, it is essential that a neural network model maintains these relationships as causal, even as it learns to optimize the performance criterion. We propose a causal regularization method that incorporates such causal domain priors into the network and supports both direct and total causal effects. We show that this approach generalizes to various specifications of causal priors, including enforcing monotonicity of the causal effect of a given input feature and removing the causal influence of a feature for fairness. Our experiments on eleven benchmark datasets show the usefulness of this approach in regularizing a learned neural network model to maintain desired causal effects. On most datasets, domain-prior-consistent models can be obtained without compromising accuracy.
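To make the idea of a causal regularizer concrete, the sketch below illustrates one plausible form of such a penalty: enforcing a monotonicity prior on a single input feature by penalizing negative partial derivatives of the model output with respect to that feature. This is an illustrative assumption, not the paper's exact formulation; the function name `monotonicity_penalty`, the gradient-penalty form, and the weighting scheme are all hypothetical.

```python
import torch
import torch.nn as nn

def monotonicity_penalty(model: nn.Module, x: torch.Tensor, feature_idx: int) -> torch.Tensor:
    """Illustrative regularizer: penalize violations of a monotone-increasing
    causal prior on one input feature via negative input gradients."""
    x = x.clone().requires_grad_(True)
    y = model(x).sum()
    # create_graph=True so the penalty itself can be backpropagated during training.
    grads = torch.autograd.grad(y, x, create_graph=True)[0]
    # Hinge on negative partial derivatives w.r.t. the chosen feature.
    return torch.relu(-grads[:, feature_idx]).mean()

# Hypothetical usage: combine with the task loss under a regularization weight.
# loss = task_loss + lambda_reg * monotonicity_penalty(model, inputs, feature_idx=3)
```

A fairness-style prior (removing a feature's influence) could analogously penalize the magnitude of the gradient with respect to that feature rather than only its negative part.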