Recent work has seen the development of general-purpose neural architectures that can be trained to perform tasks across diverse data modalities. General-purpose models typically make few assumptions about the underlying data structure and are known to perform well in the large-data regime. At the same time, there has been growing interest in modular neural architectures that represent the data using sparsely interacting modules. These models can be more robust out-of-distribution, computationally efficient, and capable of sample-efficient adaptation to new data. However, they tend to make domain-specific assumptions about the data, and present challenges in how module behavior (i.e., parameterization) and connectivity (i.e., their layout) can be jointly learned. In this work, we introduce a general-purpose yet modular neural architecture called Neural Attentive Circuits (NACs) that jointly learns the parameterization and sparse connectivity of neural modules without using domain knowledge. NACs are best understood as the combination of two systems that are jointly trained end-to-end: one that determines the module configuration and the other that executes it on an input. We demonstrate qualitatively that NACs learn diverse and meaningful module configurations on the NLVR2 dataset without additional supervision. Quantitatively, we show that by incorporating modularity in this way, NACs improve upon a strong non-modular baseline in terms of low-shot adaptation on the CIFAR and CUB datasets by about 10%, and OOD robustness on Tiny ImageNet-R by about 2.5%. Further, we find that NACs can achieve an 8x speedup at inference time while losing less than 3% performance. Finally, we find NACs to yield competitive results on diverse data modalities spanning point-cloud classification, symbolic processing, and text classification from ASCII bytes, thereby confirming their general-purpose nature.
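The two-system view described above can be illustrated with a minimal sketch. This is not the authors' implementation: the class names (Configurator, Executor), the hypernetwork-style weight generation, and the hard top-k connectivity are illustrative assumptions. The sketch only shows how one network can produce both module parameterization and a sparse layout while a second network executes the configured modules on an input, with gradients flowing through both systems end-to-end.

```python
# Conceptual sketch of a two-system modular model (hypothetical names and shapes,
# not the NAC implementation). A Configurator maps learned per-module codes to
# module weights and a sparse connectivity mask; an Executor runs the modules.
import torch
import torch.nn as nn


class Configurator(nn.Module):
    """Produces per-module weights and a sparse module-to-module layout from codes."""

    def __init__(self, n_modules: int, code_dim: int, hidden_dim: int, top_k: int):
        super().__init__()
        self.codes = nn.Parameter(torch.randn(n_modules, code_dim))
        # Hypernetwork-style head: one small linear layer's weights per module.
        self.to_weights = nn.Linear(code_dim, hidden_dim * hidden_dim)
        self.top_k = top_k
        self.hidden_dim = hidden_dim

    def forward(self):
        n = self.codes.shape[0]
        weights = self.to_weights(self.codes).view(n, self.hidden_dim, self.hidden_dim)
        # Connectivity from code similarity; a hard top-k per row gives a sparse
        # layout (a simplification -- the mask itself is not differentiable here).
        scores = self.codes @ self.codes.t()
        topk = scores.topk(self.top_k, dim=-1).indices
        mask = torch.zeros_like(scores).scatter_(-1, topk, 1.0)
        return weights, mask


class Executor(nn.Module):
    """Applies the configured modules to an input: each module reads from its
    connected modules (masked mixing) and then applies its own generated weights."""

    def __init__(self, in_dim: int, hidden_dim: int, n_classes: int):
        super().__init__()
        self.read_in = nn.Linear(in_dim, hidden_dim)
        self.read_out = nn.Linear(hidden_dim, n_classes)

    def forward(self, x, weights, mask):
        # x: (batch, n_modules, in_dim) -- one input slot per module for simplicity.
        h = torch.relu(self.read_in(x))                        # (B, M, H)
        mixed = (mask / mask.sum(-1, keepdim=True)) @ h        # sparse message passing
        h = torch.relu(torch.einsum("bmh,mhk->bmk", mixed, weights))
        return self.read_out(h.mean(dim=1))                    # pooled prediction


# Joint end-to-end training step: one optimizer updates both systems together.
config = Configurator(n_modules=8, code_dim=32, hidden_dim=64, top_k=3)
execu = Executor(in_dim=16, hidden_dim=64, n_classes=10)
opt = torch.optim.Adam(list(config.parameters()) + list(execu.parameters()), lr=1e-3)

x, y = torch.randn(4, 8, 16), torch.randint(0, 10, (4,))
weights, mask = config()
loss = nn.functional.cross_entropy(execu(x, weights, mask), y)
loss.backward()
opt.step()
```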