Spatial Transformer Networks (STNs) estimate image transformations that can improve downstream tasks by 'zooming in' on relevant regions in an image. However, STNs are hard to train and sensitive to mis-predictions of transformations. To circumvent these limitations, we propose a probabilistic extension that estimates a stochastic transformation rather than a deterministic one. Marginalizing transformations allows us to consider each image at multiple poses, which makes the localization task easier and the training more robust. As an additional benefit, the stochastic transformations act as a localized, learned data augmentation that improves the downstream tasks. We show across standard imaging benchmarks and on a challenging real-world dataset that these two properties lead to improved classification performance, robustness and model calibration. We further demonstrate that the approach generalizes to non-visual domains by improving model performance on time-series data.
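The core idea of marginalizing over stochastic transformations can be sketched as follows. This is a minimal, hypothetical NumPy illustration, not the paper's implementation: it assumes a localization network has already produced a mean `mu` and log-standard-deviation `log_sigma` over affine parameters, samples several transformations via the reparameterization trick, warps the image with each, and averages the classifier's outputs (a Monte Carlo estimate of the marginal prediction).

```python
import numpy as np

rng = np.random.default_rng(0)

def affine_warp(img, theta):
    """Warp a 2D image with a 2x3 affine matrix (nearest-neighbour sampling,
    zero padding). Coordinates are normalised to [-1, 1], as in STNs."""
    h, w = img.shape
    out = np.zeros_like(img)
    ys, xs = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w),
                         indexing="ij")
    # Homogeneous target-grid coordinates, shape (3, h*w).
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    src = theta @ coords  # source coordinates, shape (2, h*w)
    sx = np.round((src[0] + 1) / 2 * (w - 1)).astype(int)
    sy = np.round((src[1] + 1) / 2 * (h - 1)).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out.reshape(-1)[np.flatnonzero(valid)] = img[sy[valid], sx[valid]]
    return out

def sample_theta(mu, log_sigma, rng):
    """Reparameterised sample of affine parameters: theta = mu + sigma * eps."""
    return mu + np.exp(log_sigma) * rng.standard_normal(mu.shape)

def marginalised_prediction(img, mu, log_sigma, classifier, n_samples=8,
                            rng=rng):
    """Monte Carlo estimate of E_theta[classifier(warp(img, theta))]."""
    outs = [classifier(affine_warp(img, sample_theta(mu, log_sigma, rng)))
            for _ in range(n_samples)]
    return np.mean(outs, axis=0)

# Toy usage: an 8x8 "image" with a bright square, an identity-mean
# transformation with small noise, and a stand-in classifier head.
img = np.zeros((8, 8))
img[2:6, 2:6] = 1.0
mu = np.array([[1.0, 0.0, 0.0],
               [0.0, 1.0, 0.0]])      # mean transformation: identity
log_sigma = np.full((2, 3), -3.0)     # small transformation noise
clf = lambda x: x.mean()              # placeholder for a real classifier
pred = marginalised_prediction(img, mu, log_sigma, clf)
```

At training time the same averaging acts as a learned, localized data augmentation: each forward pass sees the image at a slightly different pose drawn from the predicted distribution.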