We provide a theoretical framework to study a phenomenon that we call one-shot generalization. This phenomenon refers to the ability of an algorithm to perform transfer learning within a single task, meaning that it correctly classifies a test point that has a single exemplar in the training set. We propose a simple data model and use it to study this phenomenon in two ways. First, we prove a non-asymptotic baseline: kernel methods based on nearest-neighbor classification cannot perform one-shot generalization, independently of the choice of kernel and the size of the training set. Second, we show empirically that the most direct neural network architecture for our data model performs one-shot generalization almost perfectly. This stark difference leads us to believe that the one-shot generalization mechanism is partially responsible for the empirical success of neural networks.
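To make the setting concrete, the following is a minimal sketch (in Python with NumPy, not taken from the paper) of the one-shot evaluation protocol described above: a training set in which one class has only a single exemplar, and a kernel nearest-neighbor rule evaluated on fresh points from that class. The Gaussian-blob data, the bandwidth, and all names here are illustrative assumptions; they are not the paper's data model or experiments, and the sketch only illustrates the protocol, not the negative result for kernel methods.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data (NOT the paper's data model): each class is an
# isotropic Gaussian blob in R^d; class 2 is the "one-shot" class with a
# single training exemplar.
d = 20
means = {0: np.zeros(d), 1: np.ones(d), 2: -np.ones(d)}
X_train = np.vstack([
    rng.normal(means[0], 1.0, size=(100, d)),  # 100 exemplars of class 0
    rng.normal(means[1], 1.0, size=(100, d)),  # 100 exemplars of class 1
    rng.normal(means[2], 1.0, size=(1, d)),    # a single exemplar of class 2
])
y_train = np.array([0] * 100 + [1] * 100 + [2])

# Fresh test points drawn from the one-shot class.
X_test = rng.normal(means[2], 1.0, size=(50, d))

def gaussian_kernel(a, b, bandwidth=5.0):
    """RBF kernel k(a, b) = exp(-||a - b||^2 / (2 * bandwidth^2))."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2 * bandwidth ** 2))

# Kernel nearest-neighbor rule: each test point gets the label of the
# training point with the largest kernel similarity.
K = gaussian_kernel(X_test, X_train)
y_pred = y_train[K.argmax(axis=1)]
print(f"accuracy on the one-shot class: {(y_pred == 2).mean():.2f}")
```

One-shot generalization, in this protocol, means high accuracy on the held-out points of class 2 even though the classifier has seen that class exactly once during training.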