We give a method for proactively identifying small, plausible shifts in distribution which lead to large differences in model performance. These shifts are defined via parametric changes in the causal mechanisms of observed variables, where constraints on parameters yield a "robustness set" of plausible distributions and a corresponding worst-case loss over the set. While the loss under an individual parametric shift can be estimated via reweighting techniques such as importance sampling, the resulting worst-case optimization problem is non-convex, and the estimate may suffer from large variance. For small shifts, however, we can construct a local second-order approximation to the loss under shift and cast the problem of finding a worst-case shift as a particular non-convex quadratic optimization problem, for which efficient algorithms are available. We demonstrate that this second-order approximation can be estimated directly for shifts in conditional exponential family models, and we bound the approximation error. We apply our approach to a computer vision task (classifying gender from images), revealing sensitivity to shifts in non-causal attributes.
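The pipeline summarized above (reweight to estimate the loss under a shift, form a local second-order approximation, then maximize it over a bounded set of shifts) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the source: a single unconditional variable W ~ N(0, 1), a mean shift of size delta written as an exponential tilt with sufficient statistic T(W) = W, a toy loss (w - 1)^2, and the hypothetical function names. For such a tilt, the first and second derivatives of the shifted loss at delta = 0 are E[loss * (T - E[T])] and E[loss * ((T - E[T])^2 - Var[T])], which is what `taylor_coefficients` estimates. In one dimension the constrained quadratic can be solved by checking the endpoints and the stationary point; a multi-dimensional shift would need a trust-region-style solver instead.

```python
import math
import random


def sample_data(n, seed=0):
    """Draw W ~ N(0, 1) and a toy per-sample loss (both are assumptions)."""
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(n)]
    loss = [(wi - 1.0) ** 2 for wi in w]  # hypothetical loss, not from the source
    return w, loss


def importance_sampling_loss(w, loss, delta):
    """Reweighted estimate of the mean loss when W ~ N(delta, 1).

    The density ratio N(delta, 1) / N(0, 1) is exp(delta * w - delta**2 / 2).
    """
    weights = [math.exp(delta * wi - 0.5 * delta ** 2) for wi in w]
    return sum(l * r for l, r in zip(loss, weights)) / len(w)


def taylor_coefficients(w, loss):
    """Estimate the loss at delta = 0 plus its first and second derivatives."""
    n = len(w)
    mean_w = sum(w) / n
    var_w = sum((wi - mean_w) ** 2 for wi in w) / n
    base = sum(loss) / n
    grad = sum(l * (wi - mean_w) for l, wi in zip(loss, w)) / n
    hess = sum(l * ((wi - mean_w) ** 2 - var_w) for l, wi in zip(loss, w)) / n
    return base, grad, hess


def worst_case_shift(grad, hess, lam):
    """Maximize grad * d + 0.5 * hess * d**2 over |d| <= lam (1-D case)."""
    candidates = [-lam, lam]
    if hess < 0 and abs(grad / hess) <= lam:
        candidates.append(-grad / hess)  # interior maximum of a concave quadratic
    return max(candidates, key=lambda d: grad * d + 0.5 * hess * d * d)


if __name__ == "__main__":
    w, loss = sample_data(20000)
    base, grad, hess = taylor_coefficients(w, loss)
    delta = worst_case_shift(grad, hess, lam=0.5)
    approx = base + grad * delta + 0.5 * hess * delta ** 2
    print("worst-case shift:", delta)
    print("second-order approximation of the loss:", approx)
    print("importance-sampling estimate at that shift:",
          importance_sampling_loss(w, loss, delta))
```

In this toy setting the expected loss under shift is exactly quadratic in delta, so the two printed estimates should roughly agree. With a positive curvature estimate the maximum sits on the boundary of the constraint set, which is the sense in which the quadratic problem is non-convex.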