Reason from Context with Self-supervised Learning

A tiny object in the sky cannot be an elephant. Context reasoning is critical in visual recognition, where current inputs need to be interpreted in light of previous experience and knowledge. To date, research into contextual reasoning in visual recognition has largely relied on supervised learning methods. Whether contextual knowledge can be captured with self-supervised learning regimes remains under-explored. Here, we established a methodology for context-aware self-supervised learning. We proposed a novel Self-supervised Learning Method for Context Reasoning (SeCo), whose only inputs are unlabeled images of natural scenes containing multiple objects. Similar to the distinction between fovea and periphery in human vision, SeCo processes self-proposed target object regions and their contexts separately, and then employs a learnable external memory for retrieving and updating context-relevant target information. To evaluate the contextual associations learned by computational models, we introduced two evaluation protocols, lift-the-flap and object priming, which address the "what" and "where" problems of context reasoning. In both tasks, SeCo outperformed all state-of-the-art (SOTA) self-supervised learning methods by a significant margin. Our network analysis revealed that the external memory in SeCo learns to store prior contextual knowledge, facilitating target identity inference in the lift-the-flap task. Moreover, we conducted psychophysics experiments and introduced the Human benchmark in Object Priming (HOP) dataset. Our quantitative and qualitative results demonstrate that SeCo approximates human-level performance and exhibits human-like behavior. All our source code and data are publicly available here.
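
The abstract describes SeCo's memory only at a high level: context is used to query a learnable external memory, which returns target information and is updated over time. The sketch below is a minimal, hypothetical illustration of that read/write idea in plain Python, not the authors' implementation. `ExternalMemory`, the dot-product softmax addressing, the moving-average `write` rule, the `lr` step size, and the `lift_the_flap` readout are all assumptions; where the abstract's memory is learnable, a hand-written update rule stands in for training here.

```python
import math
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class ExternalMemory:
    """Hypothetical key-value memory: keys are addressed by a context
    feature, values store target information (an assumed design)."""

    def __init__(self, keys: List[Vector], values: List[Vector], lr: float = 0.1):
        self.keys = [list(k) for k in keys]
        self.values = [list(v) for v in values]
        self.lr = lr  # hypothetical step size of the update rule

    def attention(self, context: Vector) -> List[float]:
        # Soft addressing: dot-product similarity of the context to each key.
        return softmax([dot(context, k) for k in self.keys])

    def read(self, context: Vector) -> Vector:
        # Retrieve a target estimate as the attention-weighted sum of values.
        weights = self.attention(context)
        dim = len(self.values[0])
        return [sum(w * v[i] for w, v in zip(weights, self.values)) for i in range(dim)]

    def write(self, context: Vector, target: Vector) -> None:
        # Assumed update: pull each value slot toward the observed target,
        # in proportion to how strongly the context addressed that slot.
        weights = self.attention(context)
        for w, v in zip(weights, self.values):
            for i in range(len(v)):
                v[i] += self.lr * w * (target[i] - v[i])


def lift_the_flap(memory: ExternalMemory, context: Vector,
                  prototypes: Dict[str, Vector]) -> str:
    # Hypothetical readout: name the hidden target from context alone by
    # picking the class prototype that best matches the retrieved estimate.
    estimate = memory.read(context)
    return max(prototypes, key=lambda name: dot(estimate, prototypes[name]))


# Toy usage with hand-made 2-D features (illustrative values only).
memory = ExternalMemory(keys=[[1.0, 0.0], [0.0, 1.0]], values=[[0.0, 0.0], [0.0, 0.0]])
prototypes = {"bird": [1.0, 0.0], "elephant": [0.0, 1.0]}
sky, savanna = [5.0, 0.0], [0.0, 5.0]
for _ in range(20):
    memory.write(sky, prototypes["bird"])
    memory.write(savanna, prototypes["elephant"])
print(lift_the_flap(memory, sky, prototypes))  # bird
```

With a "sky" context the memory retrieves a bird-like estimate rather than an elephant-like one, echoing the opening example. The abstract does not specify SeCo's feature dimensions, memory size, or training objective, so those choices here are purely illustrative.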
