A trained neural network can be interpreted as a structural causal model (SCM) that captures how changes to input variables affect the model's output. However, if the training data contains both causal and correlational relationships, a model that optimizes prediction accuracy may not learn the true causal relationships between input and output variables. Expert users, on the other hand, often know from domain knowledge how certain input variables causally affect the output. We therefore propose a regularization method that aligns the learned causal effects of a neural network with domain priors, covering both direct and total causal effects. We show that this approach generalizes to different kinds of domain priors, such as a monotonic causal effect of an input variable on the output, or a zero causal effect of a variable on the output for fairness purposes. Our experiments on twelve benchmark datasets show that the method regularizes a neural network to maintain the desired causal effects without compromising accuracy. Importantly, we also show that a model trained this way is robust and achieves improved accuracy on noisy inputs.
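To make the idea of a prior-matching regularizer concrete, the sketch below adds a penalty to the task loss whenever a model's sensitivity to one input disagrees with a stated prior. It is an illustration under assumptions and not the formulation used in this work: the effect of an input is approximated by a finite-difference slope of the output (a simple proxy for a direct effect, with the other inputs held fixed), and the names `effect_slope`, `prior_penalty`, `regularized_loss`, `toy_model`, and the weight `lam` are all hypothetical.

```python
from typing import Callable, List, Sequence

# Hypothetical model type: maps a feature vector to a scalar output.
Model = Callable[[Sequence[float]], float]


def effect_slope(model: Model, x: Sequence[float], feature: int,
                 eps: float = 1e-4) -> float:
    """Central finite-difference slope of the output w.r.t. one input.

    Assumption: this slope stands in for the direct effect of the input,
    with every other input held fixed.
    """
    hi, lo = list(x), list(x)
    hi[feature] += eps
    lo[feature] -= eps
    return (model(hi) - model(lo)) / (2 * eps)


def prior_penalty(model: Model, batch: List[Sequence[float]], feature: int,
                  prior: str, eps: float = 1e-4) -> float:
    """Mean violation of a domain prior over a batch of inputs."""
    slopes = [effect_slope(model, x, feature, eps) for x in batch]
    if prior == "zero":          # e.g. a protected attribute (fairness)
        terms = [s * s for s in slopes]
    elif prior == "increasing":  # penalize only negative slopes
        terms = [max(0.0, -s) for s in slopes]
    elif prior == "decreasing":  # penalize only positive slopes
        terms = [max(0.0, s) for s in slopes]
    else:
        raise ValueError(f"unknown prior: {prior!r}")
    return sum(terms) / len(terms)


def regularized_loss(task_loss: float, penalties: Sequence[float],
                     lam: float = 1.0) -> float:
    """Task loss plus a weighted sum of prior-violation penalties."""
    return task_loss + lam * sum(penalties)


# Toy stand-in for a trained network: increasing in x[0], decreasing in x[1].
def toy_model(x: Sequence[float]) -> float:
    return 2.0 * x[0] - 0.5 * x[1]


batch = [[0.0, 1.0], [1.0, -1.0], [2.0, 0.5]]
inc_penalty = prior_penalty(toy_model, batch, feature=0, prior="increasing")
zero_penalty = prior_penalty(toy_model, batch, feature=1, prior="zero")
total = regularized_loss(task_loss=0.1, penalties=[inc_penalty, zero_penalty])
```

Here the monotonicity prior on the first input is already satisfied, so it contributes no penalty, while the zero-effect prior on the second input is violated and raises the total loss. In an actual training loop the slope would normally come from automatic differentiation so that the penalty itself is differentiable.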