Representation learning approaches typically rely on images of objects captured from a single perspective and transformed with affine transformations. Moreover, self-supervised learning, a successful paradigm of representation learning, relies on instance discrimination and self-augmentations, which cannot always bridge the gap between observations of the same object viewed from different perspectives. Viewing an object from multiple perspectives aids holistic understanding, which is particularly important when data annotations are limited. In this paper, we present an approach that combines self-supervised learning with a multi-perspective matching technique and demonstrate its effectiveness in learning higher-quality representations from data captured by a robotic vacuum with an embedded camera. We show that the availability of multiple views of the same object, combined with a variety of self-supervised pretraining algorithms, leads to improved object classification performance without extra labels.
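To make the multi-perspective matching idea concrete, the following is a minimal sketch (not the authors' implementation) of how two camera views of the same object could be treated as a positive pair in an InfoNCE-style contrastive pretraining objective; the function name `multiview_info_nce`, the `encoder` backbone, and the temperature value are illustrative assumptions.

```python
# Minimal sketch of a multi-view contrastive objective: embeddings of two
# camera views of the same object are a positive pair; all other objects
# in the batch serve as negatives (InfoNCE / NT-Xent style).
import torch
import torch.nn.functional as F

def multiview_info_nce(z_a: torch.Tensor, z_b: torch.Tensor,
                       temperature: float = 0.1) -> torch.Tensor:
    """z_a, z_b: [batch, dim] embeddings of view A and view B of the same objects."""
    z_a = F.normalize(z_a, dim=1)
    z_b = F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / temperature  # [batch, batch] cosine similarities
    targets = torch.arange(z_a.size(0), device=z_a.device)  # diagonal = positive pairs
    # Symmetrized cross-entropy: view A predicts its matching view B and vice versa.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Usage (hypothetical): encode two perspectives of the same objects with a
# shared backbone, then minimise the loss during pretraining.
# z_a, z_b = encoder(view_a_images), encoder(view_b_images)
# loss = multiview_info_nce(z_a, z_b)
```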