Self-supervised representation learning is a fundamental problem in computer vision with many useful applications (e.g., image search, instance-level recognition, copy detection). In this paper we present a new contrastive self-supervised representation learning algorithm in the context of Copy Detection in the 2021 Image Similarity Challenge hosted by Facebook AI Research. Previous work in contrastive self-supervised learning has identified the importance of optimizing representations while ``pushing'' against a large number of negative examples. Representative prior solutions either use large batches enabled by modern distributed training systems, or maintain queues or memory banks holding recently evaluated representations while relaxing some consistency properties. We approach this problem from a new angle: we directly learn a query model and a key model jointly and push representations against a very large number (e.g., 1 million) of negative representations in each SGD step. We achieve this by freezing the backbone on one side and by alternating between a Q-optimization step and a K-optimization step. During the competition timeframe, our approach achieved a micro average precision ($\mu$AP) of 0.3401 on the Phase 1 leaderboard, significantly improving over the baseline $\mu$AP of 0.1556. On the final Phase 2 leaderboard, our model scored 0.1919, while the baseline scored 0.0526. Continued training yielded further improvement. We conducted an empirical study comparing the proposed approach with a SimCLR-style strategy in which negative examples are drawn from the batch only. We found that our method ($\mu$AP of 0.3403) significantly outperforms this SimCLR-style baseline ($\mu$AP of 0.2001).
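The alternating Q/K scheme described above can be illustrated with a minimal PyTorch-style sketch. This is not the authors' code: the function names, the InfoNCE-style loss, and the bank-rebuilding schedule are illustrative assumptions. The trainable encoder's queries are contrasted against the frozen encoder's keys for the same images plus a large precomputed bank of negative representations, and the roles of the two encoders swap between phases.

import torch
import torch.nn.functional as F

def contrastive_step(trainable, frozen, view_q, view_k, neg_bank, tau=0.07):
    # One SGD step: pull the trainable encoder's query toward the frozen
    # encoder's key for the same image, push it away from every vector in
    # the large negative bank (which can hold e.g. ~1 million entries).
    q = F.normalize(trainable(view_q), dim=1)          # (B, D), gradients flow
    with torch.no_grad():
        k = F.normalize(frozen(view_k), dim=1)          # (B, D), positives
    pos = (q * k).sum(dim=1, keepdim=True)               # (B, 1)
    neg = q @ neg_bank.t()                                # (B, N) against the bank
    logits = torch.cat([pos, neg], dim=1) / tau           # positive sits at index 0
    labels = torch.zeros(logits.size(0), dtype=torch.long, device=q.device)
    return F.cross_entropy(logits, labels)

@torch.no_grad()
def build_bank(frozen, image_loader, device):
    # Precompute the negative bank with the currently frozen encoder.
    frozen.eval()
    feats = [F.normalize(frozen(images.to(device)), dim=1) for images in image_loader]
    return torch.cat(feats)                               # (N, D)

def train_phase(trainable, frozen, pair_loader, neg_bank, lr=1e-3, device="cuda"):
    # pair_loader yields (view_q, view_k): two augmentations of the same images.
    for p in trainable.parameters():
        p.requires_grad_(True)
    for p in frozen.parameters():
        p.requires_grad_(False)
    trainable.train(); frozen.eval()
    opt = torch.optim.SGD(trainable.parameters(), lr=lr, momentum=0.9)
    for view_q, view_k in pair_loader:
        loss = contrastive_step(trainable, frozen,
                                view_q.to(device), view_k.to(device), neg_bank)
        opt.zero_grad(); loss.backward(); opt.step()

# One round of alternation under these assumptions: a Q-phase trains the query
# model against a bank built by the frozen key model, then the roles swap.
#   bank = build_bank(key_model, image_loader, device)
#   train_phase(query_model, key_model, pair_loader, bank)
#   bank = build_bank(query_model, image_loader, device)
#   train_phase(key_model, query_model, pair_loader, bank)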