Humans have a remarkably large capacity to store detailed visual information in long-term memory even after a single exposure, as demonstrated by classic experiments in psychology. For example, Standing (1973) showed that people could recognize with high accuracy thousands of pictures they had seen only once, a few days before the recognition test. In deep learning, the primary way new information is incorporated into a model is through gradient descent in the model's parameter space. This paper asks, in a rigorous, head-to-head, quantitative comparison, whether deep learning via gradient descent can match the efficiency with which human visual long-term memory incorporates new information. We answer this in the negative: even in the best case, models learning via gradient descent appear to require approximately 10 exposures to the same visual materials to reach the recognition memory performance humans achieve after only a single exposure. Prior knowledge induced via pretraining and larger model sizes improve performance, but these improvements are barely visible after a single exposure and only become apparent after several exposures, suggesting that simply scaling up the pretraining data or model size may not be enough for models to reach human-level memory efficiency.
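To make the exposure-vs-recognition comparison concrete, the following is a minimal sketch of the kind of protocol described above, not the paper's exact setup. The backbone (a pretrained ResNet-18), the learning objective (instance discrimination, one class per studied picture), the familiarity score (maximum logit), and the 2AFC old/foil test are all illustrative assumptions; one "exposure" is taken to mean one gradient-descent pass over the studied images.

```python
# Hypothetical sketch: measure recognition memory of a deep model as a function
# of the number of gradient-descent exposures to a set of studied pictures.
import torch
import torch.nn as nn
from torchvision.models import resnet18


def make_model(n_items):
    # Pretrained weights stand in for "prior knowledge induced via pretraining".
    model = resnet18(weights="IMAGENET1K_V1")
    # Instance discrimination: one output unit per studied picture (assumption).
    model.fc = nn.Linear(model.fc.in_features, n_items)
    return model


def study_phase(model, study_images, n_exposures, lr=1e-4):
    """Fine-tune the model on the study set; one exposure = one pass over it."""
    labels = torch.arange(len(study_images))          # each picture is its own class
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(n_exposures):
        opt.zero_grad()
        loss = loss_fn(model(study_images), labels)
        loss.backward()
        opt.step()


def familiarity(model, images):
    """Familiarity score (assumed): the model's maximum logit per image."""
    model.eval()
    with torch.no_grad():
        return model(images).max(dim=1).values


def two_afc_accuracy(model, old_images, foil_images):
    """2AFC recognition test: the studied image should look more familiar than its foil."""
    return (familiarity(model, old_images) > familiarity(model, foil_images)).float().mean().item()


# Toy usage with random tensors standing in for real pictures.
n_items = 32
study = torch.randn(n_items, 3, 64, 64)
foils = torch.randn(n_items, 3, 64, 64)

for n_exposures in (1, 10):
    model = make_model(n_items)                       # fresh model for each condition
    study_phase(model, study, n_exposures)
    acc = two_afc_accuracy(model, study, foils)
    print(f"{n_exposures} exposure(s): 2AFC recognition accuracy = {acc:.2f}")
```

The human-model comparison in the abstract then amounts to asking how large `n_exposures` must be before the model's recognition accuracy reaches the level humans attain with `n_exposures = 1`.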