Recent interest in dataset shift has produced many methods for finding invariant distributions for prediction in new, unseen environments. However, these methods consider different types of shifts and have been developed under disparate frameworks, making it difficult to theoretically analyze how solutions differ with respect to stability and accuracy. Taking a causal graphical view, we use a flexible graphical representation to express various types of dataset shifts. We show that all invariant distributions correspond to a causal hierarchy of graphical operators that disable the edges in the graph responsible for the shifts. The hierarchy provides a common theoretical underpinning for understanding when and how stability to shifts can be achieved, and in what ways stable distributions can differ. We use it to establish conditions for minimax optimal performance across environments, and derive new algorithms that find optimal stable distributions. Using this new perspective, we empirically demonstrate that there is a tradeoff between minimax and average performance.