Biological vision systems are unparalleled in their ability to learn visual representations without supervision. In machine learning, contrastive learning (CL) has led to major advances in forming object representations in an unsupervised fashion. These systems learn representations that are invariant to augmentation operations over images, such as cropping or flipping. In contrast, biological vision systems exploit the temporal structure of visual experience. This gives them access to augmentations not commonly used in CL, such as viewing the same object from multiple viewpoints or against different backgrounds. Here, we systematically investigate and compare the potential benefits of such time-based augmentations for learning object categories. Our results show that time-based augmentations achieve large performance gains over state-of-the-art image augmentations. Specifically, our analyses reveal that: 1) 3-D object rotations drastically improve the learning of object categories; 2) viewing objects against changing backgrounds is vital for learning to discard background-related information. Overall, we conclude that time-based augmentations can greatly improve contrastive learning, narrowing the gap between artificial and biological vision systems.
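To make the contrast between image augmentations and time-based augmentations concrete, here is a sketch of the one place where they differ in a contrastive pipeline: how positive pairs are formed. It is a minimal, hypothetical illustration, not the authors' implementation. The names `time_based_pairs` and `info_nce` are introduced here for illustration. Frames are assumed to be precomputed embedding vectors; in practice an encoder network would produce them and be trained on this loss. The loss is a simplified InfoNCE in which each anchor's negatives are the positives drawn from the other objects in the batch.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit L2 norm (all-zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def info_nce(anchors: Sequence[Sequence[float]],
             positives: Sequence[Sequence[float]],
             temperature: float = 0.1) -> float:
    """Simplified InfoNCE: positives[i] is the match for anchors[i];
    every other positives[j] in the batch acts as a negative."""
    if not anchors or len(anchors) != len(positives):
        raise ValueError("need a non-empty batch of matching anchors/positives")
    a = [normalize(x) for x in anchors]
    p = [normalize(x) for x in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity to every positive in the batch, scaled by temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        m = max(logits)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with target index i
    return total / len(a)


def time_based_pairs(sequences: Sequence[Sequence[Sequence[float]]],
                     max_offset: int = 1,
                     seed: int = 0) -> Tuple[List[Vector], List[Vector]]:
    """Hypothetical helper: each sequence holds successive views of ONE object
    (e.g. as it rotates or moves across backgrounds). Draw one pair
    (frame t, frame t + k) per sequence, with 1 <= k <= max_offset."""
    rng = random.Random(seed)
    anchors, positives = [], []
    for frames in sequences:
        if len(frames) < 2:
            raise ValueError("each sequence needs at least two frames")
        k = rng.randint(1, min(max_offset, len(frames) - 1))
        t = rng.randint(0, len(frames) - 1 - k)  # guarantees t + k is in range
        anchors.append(list(frames[t]))
        positives.append(list(frames[t + k]))
    return anchors, positives


if __name__ == "__main__":
    # Toy "embeddings": two objects, each seen in three consecutive views.
    cube = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2]]
    ball = [[0.0, 1.0], [0.1, 0.9], [0.2, 0.8]]
    anchors, positives = time_based_pairs([cube, ball], max_offset=2)
    print(round(info_nce(anchors, positives), 4))
```

With image augmentations, both members of a pair would be two random crops or flips of the same frame. With time-based pairs, the positive is a different view of the same object (for example after a 3-D rotation or against a new background), so minimizing the loss pushes the encoder toward representations that are invariant to exactly those changes.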