Artificial intelligence (AI) systems struggle to generalize beyond their training data and to abstract general properties from the specifics of the training examples. We propose a model that reproduces the apparent human ability to develop a number sense through unsupervised everyday experience. The ability to understand and manipulate numbers and quantities emerges during childhood, but the mechanism through which humans acquire and develop this ability is still poorly understood. In particular, it is not known whether a number sense can be acquired without supervision from a teacher. We explore this question through a model, assuming that the learner is able to pick and place small objects and will spontaneously engage in undirected manipulation. We assume that the learner's visual system monitors the changing arrangements of objects in the scene and learns to predict the effect of each action by comparing perception with the efferent signal of the motor system. We model perception using standard deep networks for feature extraction and classification. We find that, from learning the unrelated task of action prediction, an unexpected image representation emerges that exhibits regularities foreshadowing the perception and representation of numbers. These include distinct categories for the first few natural numbers, a strict ordering of the numbers, and a one-dimensional signal that correlates with numerical quantity. As a result, our model acquires the ability to estimate numerosity and to subitize. Remarkably, subitization and numerosity estimation extrapolate to scenes containing many objects, far beyond the three objects used during training. We conclude that important aspects of a facility with numbers and quantities may be learned without teacher supervision.
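To make the self-supervised setup concrete, the following toy data generator sketches how action-prediction training pairs could be produced. It is a minimal illustration under stated assumptions, not the authors' implementation: the action names (`put`, `take`, `shake`), the representation of a scene as a list of 2D points, and the helper names `random_position` and `sample_transition` are all hypothetical. The only detail taken from the text above is that training scenes contain at most three objects and that the label is the action itself, never a count.

```python
import random

# Hypothetical action vocabulary (an assumption, not taken from the abstract).
ACTIONS = ("put", "take", "shake")


def random_position(rng):
    """Return a random (x, y) location in the unit square."""
    return (rng.random(), rng.random())


def sample_transition(rng, max_objects=3):
    """Return (before, after, action) for one random manipulation.

    Scenes are lists of object positions. The label is the action name,
    taken from the simulated motor signal; no object count is ever used
    as a training target.
    """
    before = [random_position(rng) for _ in range(rng.randint(0, max_objects))]

    # Only allow actions that keep the object count within [0, max_objects].
    allowed = ["shake"]
    if len(before) < max_objects:
        allowed.append("put")
    if len(before) > 0:
        allowed.append("take")
    action = rng.choice(allowed)

    if action == "put":
        after = before + [random_position(rng)]
    elif action == "take":
        after = list(before)
        after.pop(rng.randrange(len(after)))
    else:  # shake: same number of objects, new positions
        after = [random_position(rng) for _ in before]
    return before, after, action


rng = random.Random(0)
dataset = [sample_transition(rng) for _ in range(1000)]
```

In a fuller version of this sketch, each `before` and `after` scene would be rendered as an image and a classifier would be trained to predict `action` from the image pair. Any number-like structure in the learned features would then be a by-product of that task rather than something supplied by the labels.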