Human vision involves parsing and representing objects and scenes using structured representations based on part-whole hierarchies. Computer vision and machine learning researchers have recently sought to emulate this capability using capsule networks, reference frames, and active predictive coding, but a generative model formulation has been lacking. We introduce Recursive Neural Programs (RNPs), to our knowledge the first neural generative model to address the part-whole hierarchy learning problem. RNPs model images as hierarchical trees of probabilistic sensory-motor programs that recursively reuse learned sensory-motor primitives to model an image within different reference frames, forming recursive image grammars. We express RNPs as structured variational autoencoders (sVAEs) for inference and sampling, and demonstrate parts-based parsing, sampling, and one-shot transfer learning on the MNIST, Omniglot, and Fashion-MNIST datasets, illustrating the model's expressive power. Our results show that RNPs provide an intuitive and explainable way of composing objects and scenes, allowing rich compositionality and intuitive interpretations of objects in terms of part-whole hierarchies.