Generative Autoregressive Neural Networks (ARNNs) have recently demonstrated exceptional results in image and language generation tasks, contributing to the growing popularity of generative models in both scientific and commercial applications. This work presents a physical interpretation of ARNNs by reformulating the Boltzmann distribution of binary pairwise interacting systems into autoregressive form. The resulting ARNN architecture has first-layer weights and biases corresponding to the Hamiltonian's couplings and external fields, and it features widely used structures, such as residual connections and a recurrent architecture, with clear physical meanings. However, the number of parameters in the hidden layers grows exponentially with system size, making direct application infeasible. Nevertheless, the explicit formulation of the architecture allows statistical physics techniques to be used to derive new ARNNs for specific systems. As examples, new effective ARNN architectures are derived from two well-known mean-field systems, the Curie-Weiss and Sherrington-Kirkpatrick models, and shown to approximate the Boltzmann distributions of the corresponding physical models better than other commonly used ARNN architectures. The connection established between the physics of the system and the ARNN architecture provides a way to derive new neural network architectures for different interacting systems and to interpret existing ones from a physical perspective.
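The central identity behind this reformulation is that any Boltzmann distribution factorizes exactly into a product of conditionals, p(s) = ∏ᵢ p(sᵢ | s₁, …, sᵢ₋₁), which is precisely the form an ARNN parameterizes. The following is a minimal sketch of that factorization for a small binary pairwise system; the couplings J, fields h, and inverse temperature β are illustrative values, not taken from this work:

```python
# Hedged sketch: verify that the Boltzmann distribution of a small binary
# pairwise system factorizes exactly into autoregressive conditionals
# p(s) = prod_i p(s_i | s_1, ..., s_{i-1}).
# J, h, and beta are arbitrary example values, not from the paper.
import itertools
import math

N = 4
beta = 1.0
# Symmetric couplings J_ij and external fields h_i (illustrative values)
J = [[0.0, 0.5, -0.3, 0.2],
     [0.5, 0.0, 0.4, -0.1],
     [-0.3, 0.4, 0.0, 0.6],
     [0.2, -0.1, 0.6, 0.0]]
h = [0.1, -0.2, 0.3, 0.0]

def energy(s):
    # H(s) = -sum_{i<j} J_ij s_i s_j - sum_i h_i s_i, with s_i in {-1, +1}
    e = -sum(h[i] * s[i] for i in range(N))
    e -= sum(J[i][j] * s[i] * s[j]
             for i in range(N) for j in range(i + 1, N))
    return e

configs = list(itertools.product([-1, 1], repeat=N))
weights = {s: math.exp(-beta * energy(s)) for s in configs}
Z = sum(weights.values())
p = {s: w / Z for s, w in weights.items()}  # exact Boltzmann distribution

def conditional(i, prefix, s_i):
    # p(s_i | s_1..s_{i-1}): sum Boltzmann weights over the remaining
    # spins s_{i+1}..s_N, for numerator (fixed s_i) and denominator
    num = sum(weights[s] for s in configs
              if s[:i] == prefix and s[i] == s_i)
    den = sum(weights[s] for s in configs if s[:i] == prefix)
    return num / den

# The product of conditionals reproduces the Boltzmann probability exactly
for s in configs:
    prod = 1.0
    for i in range(N):
        prod *= conditional(i, s[:i], s[i])
    assert abs(prod - p[s]) < 1e-12
```

An ARNN replaces the exact (exponentially costly) marginalization inside `conditional` with a learned network; the paper's point is that for pairwise Hamiltonians this network admits an explicit form whose first-layer parameters are the couplings and fields themselves.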