Humans have a remarkably large capacity to store detailed visual information in long-term memory, even after a single exposure, as demonstrated by classic experiments in psychology. For example, Standing (1973) showed that people could recognize with high accuracy thousands of pictures they had seen only once, a few days before the recognition test. In deep learning, the primary mode of incorporating new information into a model is gradient descent in the model's parameter space. This paper asks, in a rigorous, head-to-head, quantitative comparison, whether deep learning via gradient descent can match the efficiency with which human visual long-term memory incorporates new information. We answer in the negative: even in the best case, models learning via gradient descent require approximately 10 exposures to the same visual material to reach the recognition memory performance humans achieve after only a single exposure. Prior knowledge induced via pretraining and larger model sizes improve performance, but these improvements are barely visible after a single exposure (it takes a few exposures for them to become apparent), suggesting that simply scaling up the pretraining data or model size may not be a feasible strategy for reaching human-level memory efficiency.
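To make the comparison concrete, the sketch below illustrates one way to measure recognition memory as a function of gradient-descent exposures: each exposure is one gradient step on a set of studied images, and recognition is read out by comparing a familiarity score (here, per-item reconstruction error of a small autoencoder, an assumption for illustration, not the paper's protocol) between studied and novel items. All names (`studied`, `novel`, `per_item_error`) and the random-tensor "images" are hypothetical placeholders.

```python
# Minimal sketch (assumed setup, not the paper's exact protocol): recognition
# memory accuracy as a function of the number of gradient-descent exposures.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for 64 "studied" and 64 "novel" images (3x32x32).
studied = torch.randn(64, 3, 32, 32)
novel = torch.randn(64, 3, 32, 32)

# A small autoencoder whose reconstruction error serves as a familiarity
# signal: items seen during study should reconstruct better than novel ones.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 256), nn.ReLU(),
    nn.Linear(256, 3 * 32 * 32),
)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
mse = nn.MSELoss(reduction="none")

def per_item_error(x):
    # Mean reconstruction error per image, used as an "old/new" score.
    with torch.no_grad():
        return mse(model(x), x.flatten(1)).mean(dim=1)

for exposure in range(1, 11):  # one gradient step == one "exposure"
    opt.zero_grad()
    loss = mse(model(studied), studied.flatten(1)).mean()
    loss.backward()
    opt.step()

    # 2AFC-style recognition accuracy: fraction of (studied, novel) pairs
    # where the studied item looks more familiar (lower error).
    err_old, err_new = per_item_error(studied), per_item_error(novel)
    acc = (err_old[:, None] < err_new[None, :]).float().mean().item()
    print(f"exposure {exposure:2d}: pairwise recognition accuracy = {acc:.3f}")
```

Plotting accuracy against the exposure count in this kind of setup gives the learning-efficiency curve against which the single-exposure human benchmark is compared.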