In this work we investigate how to achieve equivariance to input transformations in deep networks purely from data, without being given a model of those transformations. Convolutional Neural Networks (CNNs), for example, are equivariant to image translation, a transformation that can be easily modelled (by shifting the pixels vertically or horizontally). Other transformations, such as out-of-plane rotations, do not admit a simple analytic model. We propose an auto-encoder architecture whose embedding obeys an arbitrary set of equivariance relations simultaneously, such as translation, rotation, colour changes, and many others. This means that it can take an input image and produce versions transformed by a given amount that were not observed before (e.g. a different point of view of the same object, or a colour variation). Despite extending to many (even non-geometric) transformations, our model reduces exactly to a CNN in the special case of translation equivariance. Equivariances are important for the interpretability and robustness of deep networks, and we demonstrate successful re-rendering of transformed versions of input images on several synthetic and real datasets, as well as results on object pose estimation.
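To make the equivariance relation concrete, below is a minimal PyTorch sketch of the constraint the abstract describes, specialized to image translation (the case in which the model reduces to a CNN): the decoder fed a transformed latent should re-render the transformed input, i.e. D(ρ_g(E(x))) ≈ g(x). The module and function names, the circular-shift latent operator, and the MSE objective are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EquivariantAE(nn.Module):
    def __init__(self):
        super().__init__()
        # Fully convolutional encoder/decoder so the latent keeps a spatial
        # layout on which the latent transformation rho_g can act.
        self.enc = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.dec = nn.Sequential(
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, x, dx=0, dy=0):
        z = self.enc(x)
        # Apply the transformation in latent space: for translation this is
        # a spatial shift of the feature map (circular here for simplicity).
        z = torch.roll(z, shifts=(dy, dx), dims=(2, 3))
        return self.dec(z)

def equivariance_loss(model, x, dx, dy):
    # Enforce D(rho_g(E(x))) ~= g(x): transforming the latent and decoding
    # should match transforming the input image directly.
    target = torch.roll(x, shifts=(dy, dx), dims=(2, 3))
    return F.mse_loss(model(x, dx, dy), target)

x = torch.rand(8, 3, 32, 32)
model = EquivariantAE()
loss = equivariance_loss(model, x, dx=3, dy=-2)
loss.backward()
```

For transformations without a simple analytic model (e.g. out-of-plane rotation), the fixed `torch.roll` operator would be replaced by a learned latent map conditioned on the transformation amount, trained from pairs of transformed observations.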