Prediction models often fail if train and test data do not stem from the same distribution. Out-of-distribution (OOD) generalization to unseen, perturbed test data is a desirable but difficult-to-achieve property for prediction models and, in general, requires strong assumptions on the data generating process (DGP). From a causally inspired perspective on OOD generalization, the test data arise from a specific class of interventions on exogenous random variables of the DGP, called anchors. Anchor regression models, introduced by Rothenhaeusler et al. (2021), protect against distributional shifts in the test data by employing causal regularization. So far, however, anchor regression has only been used with a squared-error loss, which is inapplicable to common responses such as censored continuous or ordinal data. Here, we propose a distributional version of anchor regression which generalizes the method to potentially censored responses whose sample space is at least ordered. To this end, we combine a flexible class of parametric transformation models for distributional regression with an appropriate causal regularizer under a more general notion of residuals. In an exemplary application and several simulation scenarios, we demonstrate the extent to which OOD generalization is possible.
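For orientation, the squared-error anchor regression that the abstract takes as its starting point can be sketched as follows. This is a minimal illustration of the linear estimator of Rothenhaeusler et al. (2021), not the distributional method proposed here; the function name `anchor_regression`, the argument names, and the assumption that all variables are centered (no intercept) are illustrative choices. The estimator minimizes ||(I - P_A)(Y - Xb)||^2 + gamma * ||P_A(Y - Xb)||^2, where P_A projects onto the column space of the anchors A, and it reduces to ordinary least squares on data transformed by W = I - (1 - sqrt(gamma)) P_A.

```python
import numpy as np


def anchor_regression(X, Y, A, gamma):
    """Linear anchor regression with squared-error loss (illustrative sketch).

    Minimizes ||(I - P_A)(Y - X b)||^2 + gamma * ||P_A (Y - X b)||^2,
    where P_A is the projection onto the column space of the anchors A.
    Assumes X, Y and A are centered, so no intercept is fitted.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    A = np.asarray(A, dtype=float)

    def project(M):
        # P_A @ M via least squares, avoiding the explicit n x n projection.
        coef, *_ = np.linalg.lstsq(A, M, rcond=None)
        return A @ coef

    # W = I - (1 - sqrt(gamma)) P_A turns the penalized problem into plain OLS.
    shrink = 1.0 - np.sqrt(gamma)
    X_t = X - shrink * project(X)
    Y_t = Y - shrink * project(Y)
    beta, *_ = np.linalg.lstsq(X_t, Y_t, rcond=None)
    return beta
```

With `gamma = 1` the penalty is neutral and the estimate coincides with ordinary least squares; larger values of `gamma` penalize the part of the residuals explained by the anchors more strongly, which is the causal regularization referred to above.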