Deep learning has advanced from fully connected architectures to structured models organized into components, e.g., the transformer composed of positional elements, modular architectures divided into slots, and graph neural nets made up of nodes. In structured models, an interesting question is how to conduct dynamic and possibly sparse communication among the separate components. Here, we explore the hypothesis that restricting the transmitted information among components to discrete representations is a beneficial bottleneck. The motivating intuition is human language, in which communication occurs through discrete symbols. Even though individuals have different understandings of what a "cat" is based on their specific experiences, the shared discrete token makes it possible for communication among individuals to be unimpeded by individual differences in internal representation. To discretize the values of concepts dynamically communicated among specialist components, we extend the quantization mechanism from the Vector-Quantized Variational Autoencoder to multi-headed discretization with shared codebooks and use it for discrete-valued neural communication (DVNC). Our experiments show that DVNC substantially improves systematic generalization in a variety of architectures -- transformers, modular architectures, and graph neural networks. We also show that DVNC is robust to the choice of hyperparameters, making the method very useful in practice. Moreover, we establish a theoretical justification of our discretization process, proving that it has the ability to increase noise robustness and reduce the underlying dimensionality of the model.
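The multi-headed discretization with a shared codebook can be illustrated with a minimal forward-pass sketch. It assumes that a message vector is split into equal-length segments (one per head) and that each segment is replaced by its nearest entry, in squared Euclidean distance, from a single codebook shared across heads. The names `nearest_code`, `discretize`, and `num_heads`, as well as the toy codebook values, are hypothetical and chosen only for illustration. Training details (for example, how gradients pass through the non-differentiable lookup) are not covered here.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest_code(segment: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codebook entry closest to `segment` (squared L2)."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda j: sq_dist(segment, codebook[j]))


def discretize(
    h: Vector, codebook: Sequence[Vector], num_heads: int
) -> Tuple[List[float], List[int]]:
    """Quantize `h` head by head using one codebook shared by all heads.

    Assumption: len(h) is divisible by num_heads, and every codebook
    entry has length len(h) // num_heads.
    """
    if len(h) % num_heads != 0:
        raise ValueError("vector length must be divisible by num_heads")
    seg_len = len(h) // num_heads
    if any(len(code) != seg_len for code in codebook):
        raise ValueError("codebook entries must match the segment length")

    indices: List[int] = []
    quantized: List[float] = []
    for g in range(num_heads):
        segment = h[g * seg_len:(g + 1) * seg_len]
        j = nearest_code(segment, codebook)  # discrete symbol for this head
        indices.append(j)
        quantized.extend(codebook[j])        # concatenate the chosen codes
    return quantized, indices


# Toy example: 3 shared codes of dimension 2, a 4-dimensional message, 2 heads.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
message = [0.9, 1.2, -0.8, 0.7]
quantized, indices = discretize(message, codebook, num_heads=2)
print(indices)    # [1, 2]
print(quantized)  # [1.0, 1.0, -1.0, 1.0]
```

In this sketch the receiving component sees only the concatenated codes, so with a codebook of size L and G heads a message can take at most L^G distinct values, which is one way to picture the bottleneck described above.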