Deep learning has advanced from fully connected architectures to structured models organized into components, e.g., the transformer composed of positional elements, modular architectures divided into slots, and graph neural networks made up of nodes. In structured models, an interesting question is how to conduct dynamic and possibly sparse communication among the separate components. Here, we explore the hypothesis that restricting the transmitted information among components to discrete representations is a beneficial bottleneck. The motivating intuition is human language, in which communication occurs through discrete symbols. Even though individuals have different understandings of what a "cat" is based on their specific experiences, the shared discrete token makes it possible for communication among individuals to be unimpeded by individual differences in internal representation. To discretize the values of concepts dynamically communicated among specialist components, we extend the quantization mechanism from the Vector-Quantized Variational Autoencoder to multi-headed discretization with shared codebooks and use it for discrete-valued neural communication (DVNC). Our experiments show that DVNC substantially improves systematic generalization in a variety of architectures: transformers, modular architectures, and graph neural networks. We also show that DVNC is robust to the choice of hyperparameters, which makes the method practical to apply. Moreover, we provide a theoretical justification for our discretization process, proving that it can increase noise robustness and reduce the underlying dimensionality of the model.
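The multi-headed discretization with a shared codebook described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `discretize`, the toy codebook, and the choice of squared Euclidean distance are assumptions made for illustration. It covers only the forward quantization step, in which a communicated vector is split into equal segments (heads) and each segment is snapped to its nearest entry in one codebook shared by all heads. The training-time machinery of VQ-VAE-style quantization (codebook learning and gradient handling) is omitted.

```python
import numpy as np

def discretize(h, codebook, num_heads):
    """Quantize h head by head against a single shared codebook.

    h:         vector of length m, with m divisible by num_heads
    codebook:  array of shape (L, m // num_heads), shared by all heads
    Returns the quantized vector and the chosen code index per head.
    """
    # Split the vector into num_heads segments of equal size.
    segments = h.reshape(num_heads, -1)                      # (G, m/G)
    # Squared Euclidean distance from every segment to every code.
    diffs = segments[:, None, :] - codebook[None, :, :]      # (G, L, m/G)
    dists = (diffs ** 2).sum(axis=-1)                        # (G, L)
    # Each head independently picks its nearest code.
    indices = dists.argmin(axis=1)                           # (G,)
    quantized = codebook[indices].reshape(h.shape)
    return quantized, indices

# Hypothetical toy values: 3 codes of dimension 2, message of dimension 6.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]])
h = np.array([0.9, 1.2, -0.1, 0.1, -0.8, 0.7])
q, idx = discretize(h, codebook, num_heads=3)
print(idx)  # code index chosen by each head
print(q)    # discretized message passed to the receiving component
```

With L codes and G heads, the message can take at most L^G distinct values while the codebook stores only L vectors, which is one way to read the claim that discretization constrains what components can transmit to each other.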