Spurious correlations allow flexible models to predict well during training but poorly on related test populations. Recent work has shown that models satisfying particular independencies involving correlation-inducing \textit{nuisance} variables have guarantees on their test performance. Enforcing such independencies requires nuisances to be observed during training. However, nuisances, such as demographics or image background labels, are often missing, and enforcing independence on just the observed data does not imply independence on the entire population. Here we derive \acrshort{mmd} estimators for invariance objectives under missing nuisances. On simulations and clinical data, optimizing through these estimators achieves test performance similar to that of estimators that use the full data.
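
For readers unfamiliar with the quantity being estimated, the following is a minimal sketch of the textbook biased (V-statistic) estimator of the squared maximum mean discrepancy (MMD) between two fully observed samples. It is not the missing-nuisance estimator derived in this work; the Gaussian kernel, the bandwidth of 1.0, the function names, and the synthetic Gaussian samples are all assumptions chosen for illustration.

```python
import math
import random


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors (assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd2_biased(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between samples xs and ys."""
    def mean_kernel(a, b):
        # Average kernel value over all pairs (u, v) with u in a and v in b.
        total = sum(rbf_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Synthetic, fully observed samples (illustrative only).
random.seed(0)
same_a = [[random.gauss(0.0, 1.0)] for _ in range(200)]
same_b = [[random.gauss(0.0, 1.0)] for _ in range(200)]
shifted = [[random.gauss(2.0, 1.0)] for _ in range(200)]

mmd_same = mmd2_biased(same_a, same_b)      # both samples from the same distribution
mmd_shifted = mmd2_biased(same_a, shifted)  # second sample has a shifted mean
print(mmd_same, mmd_shifted)
```

In an invariance objective, a quantity of this kind would typically be computed between model representations grouped by nuisance value and added as a penalty. When nuisance labels are missing for part of the data, evaluating it on the labeled rows alone reflects only the observed subpopulation, which is the gap the estimators in this work are meant to address.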