Invariant learning methods seek a predictor that is invariant across multiple environments and have become popular in out-of-distribution (OOD) generalization. However, when environments do not naturally exist in the data, practitioners must specify them manually. Environment partitioning, i.e., splitting the whole training dataset into environments algorithmically, strongly influences the performance of invariant learning yet has remained largely unexplored. A good environment partitioning method can extend invariant learning to more general settings and improve its performance. We propose to split the dataset into environments by finding subsets with low feature correlation. We present both theoretical interpretations and algorithmic details. Through experiments on synthetic and real data, we show that our Decorr method achieves outstanding performance, whereas some other partitioning methods can lead to poor, even below-ERM, results under the same IRM training scheme.
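To make the partitioning idea concrete, the following is a minimal sketch of one way to split a dataset into subsets with low within-subset feature correlation. The scoring function (sum of absolute off-diagonal correlations) and the greedy random-reassignment search are illustrative assumptions, not the paper's actual Decorr algorithm.

```python
import numpy as np

def decorrelation_score(X):
    """Sum of absolute off-diagonal entries of the feature correlation
    matrix; lower means the features in X are less correlated."""
    C = np.corrcoef(X, rowvar=False)
    C = np.nan_to_num(C)  # guard against constant features in a subset
    return np.abs(C - np.diag(np.diag(C))).sum()

def partition_low_correlation(X, n_envs=2, n_iters=500, seed=0):
    """Illustrative greedy search (an assumption, not the paper's method):
    start from a random assignment of points to environments and accept a
    single-point reassignment whenever it lowers the total within-subset
    correlation score."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(n_envs, size=len(X))

    def total(lbl):
        return sum(decorrelation_score(X[lbl == e]) for e in range(n_envs))

    best = total(labels)
    for _ in range(n_iters):
        i = rng.integers(len(X))
        new_env = rng.integers(n_envs)
        if new_env == labels[i]:
            continue
        cand = labels.copy()
        cand[i] = new_env
        # keep every environment non-trivial so corrcoef is well defined
        if np.bincount(cand, minlength=n_envs).min() < 3:
            continue
        s = total(cand)
        if s < best:
            labels, best = cand, s
    return labels, best
```

The resulting `labels` array could then serve as the environment index fed to an invariant-learning objective such as IRM; only improving moves are accepted, so the final score is never worse than the initial random split.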