Many datasets are collected from multiple environments (e.g., different labs or perturbations), and it is often advantageous to learn models and relations that are invariant across environments. Invariance can improve robustness to unknown confounders and improve generalization to new domains. We develop a novel framework -- KL regression -- to reliably estimate regression coefficients in a challenging multi-environment setting, where latent confounders affect the data from each environment. KL regression is based on a new objective of simultaneously minimizing the KL-divergence between a parametric model and the observed data from each environment. We prove that KL regression recovers the true invariant factors under a flexible confounding setup. Moreover, it is computationally efficient, as we derive an analytic solution for its global optimum. In systematic experiments, we validate the improved performance of KL regression compared to commonly used approaches.
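To make the objective concrete, here is a minimal sketch, under strong simplifying assumptions not taken from the paper: if the model is linear-Gaussian with fixed noise variance and there are no latent confounders, minimizing the sum of per-environment KL divergences reduces to matching pooled second moments, i.e. a pooled normal-equations solve. The function name `kl_style_regression` and the whole setup are illustrative only; the paper's actual estimator additionally handles latent confounding.

```python
import numpy as np

def kl_style_regression(envs):
    """Illustrative sketch (not the paper's estimator): minimize the summed
    per-environment KL divergence between a linear-Gaussian model
    y = x @ beta + noise (fixed noise variance) and each environment's
    empirical distribution. For Gaussians this reduces to moment matching,
    i.e. solving pooled normal equations over all environments."""
    # Pool the per-environment second moments E[x x^T] and E[x y].
    Sxx = sum(X.T @ X / len(X) for X, _ in envs)
    Sxy = sum(X.T @ y / len(X) for X, y in envs)
    return np.linalg.solve(Sxx, Sxy)

# Simulate two environments that share the same invariant coefficients
# but differ in covariate scale (a toy stand-in for "different labs").
rng = np.random.default_rng(0)
beta_true = np.array([1.0, -2.0])
envs = []
for scale in (1.0, 3.0):
    X = rng.normal(size=(500, 2)) * scale
    y = X @ beta_true + rng.normal(size=500) * 0.1
    envs.append((X, y))

beta_hat = kl_style_regression(envs)
```

In this confounder-free toy setting the pooled solution recovers the shared coefficients; the point of the paper is that a naive pooled fit like this one breaks down once latent confounders differ across environments, which is the case KL regression is designed to handle.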