Prediction models often fail if train and test data do not stem from the same distribution. Out-of-distribution (OOD) generalization to unseen, perturbed test data is a desirable but difficult-to-achieve property for prediction models and in general requires strong assumptions on the data generating process (DGP). In a causally inspired perspective on OOD generalization, the test data arise from a specific class of interventions on exogenous random variables of the DGP, called anchors. Anchor regression models, introduced by Rothenhäusler et al. (2018), protect against distributional shifts in the test data by employing causal regularization. However, so far anchor regression has only been used with a squared-error loss, which is inapplicable to common responses such as censored continuous or ordinal data. Here, we propose a distributional version of anchor regression which generalizes the method to potentially censored responses with at least an ordered sample space. To this end, we combine a flexible class of parametric transformation models for distributional regression with an appropriate causal regularizer under a more general notion of residuals. In an exemplary application and several simulation scenarios we demonstrate the extent to which OOD generalization is possible.
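For orientation, the squared-error anchor regression that this work generalizes can be sketched in a few lines. The sketch below is not the distributional method proposed here. It only illustrates the linear estimator of Rothenhäusler et al., which minimizes the residual sum of squares orthogonal to the anchors plus gamma times the residual sum of squares explained by the anchors. The function names, the simulated data, and the choice of gamma are illustrative assumptions, and all variables are assumed to be centered so that no intercept is needed.

```python
import numpy as np

def anchor_objective(b, X, y, A, gamma):
    """Anchor loss: ||(I - P_A) r||^2 + gamma * ||P_A r||^2 with r = y - X b."""
    r = y - X @ b
    Pr = A @ (np.linalg.pinv(A) @ r)  # projection of residuals onto span(A)
    return np.sum((r - Pr) ** 2) + gamma * np.sum(Pr ** 2)

def anchor_regression(X, y, A, gamma):
    """Linear anchor regression via a data transformation followed by OLS.

    With W = (I - P_A) + sqrt(gamma) * P_A, ordinary least squares on
    (W X, W y) minimizes the anchor loss above, because (I - P_A) and P_A
    project onto orthogonal subspaces.
    """
    P = A @ np.linalg.pinv(A)  # projection matrix onto span(A)
    W = np.eye(len(y)) - (1.0 - np.sqrt(gamma)) * P
    coef, *_ = np.linalg.lstsq(W @ X, W @ y, rcond=None)
    return coef

# Hypothetical toy data: anchor A shifts X, hidden H confounds X and y.
rng = np.random.default_rng(0)
n = 500
A = rng.normal(size=(n, 1))
H = rng.normal(size=n)
X = (A[:, 0] + H + rng.normal(size=n)).reshape(-1, 1)
y = X[:, 0] + 2.0 * H + rng.normal(size=n)

b_ols = anchor_regression(X, y, A, gamma=1.0)      # gamma = 1 is plain OLS
b_anchor = anchor_regression(X, y, A, gamma=10.0)  # stronger regularization
```

Setting gamma to 1 makes the transformation the identity and recovers ordinary least squares, while larger values penalize residuals that remain correlated with the anchors more heavily. The distributional version described in the abstract replaces the squared-error loss and the least-squares residuals with a transformation-model likelihood and a more general notion of residuals, which this sketch does not attempt to reproduce.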