This paper does not describe a working system. Instead, it presents a single idea about representation which allows advances made by several different groups to be combined into an imaginary system called GLOM. The advances include transformers, neural fields, contrastive representation learning, distillation and capsules. GLOM answers the question: How can a neural network with a fixed architecture parse an image into a part-whole hierarchy which has a different structure for each image? The idea is simply to use islands of identical vectors to represent the nodes in the parse tree. If GLOM can be made to work, it should significantly improve the interpretability of the representations produced by transformer-like systems when applied to vision or language.
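To make the phrase "islands of identical vectors" concrete, here is a minimal toy sketch. It is not taken from the paper. It assumes that each image location holds one embedding vector per level, and that a node of the parse tree corresponds to a connected region of locations whose vectors agree. The function name `find_islands`, the tolerance `tol`, and the hand-built grids are all hypothetical and serve only as illustration.

```python
import numpy as np

def find_islands(emb, tol=1e-6):
    """Label 4-connected regions of an (H, W, D) grid whose vectors agree within tol.

    Hypothetical helper for illustration; returns (labels, number_of_islands).
    """
    h, w, _ = emb.shape
    labels = -np.ones((h, w), dtype=int)  # -1 marks "not yet assigned"
    next_label = 0
    for r in range(h):
        for c in range(w):
            if labels[r, c] != -1:
                continue
            # Start a new island and flood-fill outwards from this location.
            labels[r, c] = next_label
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and labels[ny, nx] == -1
                            and np.allclose(emb[ny, nx], emb[y, x], atol=tol)):
                        labels[ny, nx] = next_label
                        stack.append((ny, nx))
            next_label += 1
    return labels, next_label

# Assumed toy data: a 2x4 grid of locations with 3-dimensional embeddings.
# Lower ("part") level: the left half and the right half carry different vectors.
part = np.zeros((2, 4, 3))
part[:, :2] = [1.0, 0.0, 0.0]
part[:, 2:] = [0.0, 1.0, 0.0]

# Higher ("whole") level: every location carries the same vector.
whole = np.tile(np.array([0.0, 0.0, 1.0]), (2, 4, 1))

part_labels, n_parts = find_islands(part)
whole_labels, n_wholes = find_islands(whole)
print(n_parts, n_wholes)
```

In this toy input, the lower level splits into two islands while the higher level forms one island covering both, so the two parts share a single parent. The architecture stays fixed, and only the pattern of agreement between vectors differs from one input to the next.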