Deep learning has led to remarkable advances in computer vision. Even so, today's best models are brittle when presented with variations that differ even slightly from those seen during training. Minor shifts in the pose, color, or illumination of an object can lead to catastrophic misclassifications. State-of-the-art models struggle to understand how a given set of variations can affect different objects. We propose a framework for instilling a notion of how objects vary in more realistic settings. Our approach applies the formalism of Lie groups to capture continuous transformations, improving models' robustness to distribution shifts. We apply our framework on top of state-of-the-art self-supervised learning (SSL) models, finding that explicitly modeling transformations with Lie groups leads to substantial performance gains of greater than 10% for MAE, both on known instances seen in typical poses but presented in new poses, and on unknown instances in any pose. We also apply our approach to ImageNet, finding that the Lie operator improves performance by almost 4%. These results demonstrate the promise of learning transformations to improve model robustness.
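To give a concrete sense of the Lie-group formalism invoked above (this is an illustrative sketch, not the paper's implementation): a Lie group describes a continuous family of transformations generated by exponentiating an element of its Lie algebra. For 2D rotations SO(2), a single generator matrix, scaled by a continuous parameter, produces the full one-parameter family of pose changes:

```python
import numpy as np
from scipy.linalg import expm

# Infinitesimal generator of 2D rotation (an element of the Lie algebra so(2)).
G = np.array([[0.0, -1.0],
              [1.0,  0.0]])

def lie_operator(theta: float) -> np.ndarray:
    """Continuous transformation exp(theta * G): a rotation by angle theta."""
    return expm(theta * G)

# Small theta -> small pose change; the identity is recovered at theta = 0.
assert np.allclose(lie_operator(0.0), np.eye(2))

# Composing two operators stays inside the group (closure):
# exp(a*G) @ exp(b*G) == exp((a+b)*G) for a one-parameter group.
assert np.allclose(lie_operator(0.1) @ lie_operator(0.2), lie_operator(0.3))
```

The same exponential-map structure extends to other continuous variations (scaling, translation, illumination changes modeled as group actions), which is what makes the formalism a natural fit for learning how objects vary.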