Robust generalization to new concepts has long been a distinctive feature of human intelligence. Recent progress in deep generative models, however, has yielded neural architectures capable of synthesizing novel instances of unknown visual concepts from a single training example. Yet a precise comparison between these models and humans is not currently possible, because existing performance metrics for generative models (e.g., FID, IS, likelihood) are not appropriate for the one-shot generation scenario. Here, we propose a new framework to evaluate one-shot generative models along two axes: sample recognizability vs. diversity (i.e., intra-class variability). Using this framework, we perform a systematic evaluation of representative one-shot generative models on the Omniglot dataset of handwritten characters. We first show that GAN-like and VAE-like models fall at opposite ends of the diversity-recognizability space. Extensive analyses of the effect of key model parameters further reveal that spatial attention and context integration contribute linearly to the diversity-recognizability trade-off. In contrast, disentanglement transports the model along a parabolic curve that can be exploited to maximize recognizability. Using the diversity-recognizability framework, we identify models and parameters that closely approximate human data.
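To make the two evaluation axes concrete, the sketch below shows one plausible way to score a set of one-shot generated samples: recognizability as the fraction of samples a classifier assigns to the exemplar's class, and diversity as the mean pairwise distance between sample embeddings. This is a minimal illustration, not the paper's implementation; the `critic` and `encoder` objects and their interfaces are hypothetical stand-ins for a pretrained one-shot classifier and feature extractor.

```python
# Minimal sketch of the diversity-recognizability evaluation.
# Assumptions (not from the paper): `critic.predict` returns class labels
# for a batch of images, and `encoder.embed` returns an (n, d) feature matrix.
import numpy as np

def recognizability(samples, exemplar_class, critic):
    """Fraction of generated samples classified as the exemplar's class."""
    preds = critic.predict(samples)  # predicted class label per sample
    return float(np.mean(preds == exemplar_class))

def diversity(samples, encoder):
    """Mean pairwise Euclidean distance between sample embeddings,
    used here as a proxy for intra-class variability."""
    feats = encoder.embed(samples)   # (n, d) feature matrix
    n = len(feats)
    dists = [np.linalg.norm(feats[i] - feats[j])
             for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(dists))

# Each model then maps to a point (diversity, recognizability),
# so models can be compared within a single two-dimensional space.
```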