It is well known that canonical recurrent neural networks (RNNs) face limitations in learning long-term dependencies, a limitation addressed by the memory structures of long short-term memory (LSTM) networks. Neural Turing machines (NTMs) are novel RNNs that implement the notion of a programmable computer with a neural network controller, and they can learn simple algorithmic tasks. Matrix neural networks feature a matrix representation that inherently preserves the spatial structure of data, in contrast to canonical neural networks that use vector-based representations. The matrix representation of neural networks also has the potential to provide better memory capacity. \textcolor{black}{In this paper, we define and study a probabilistic notion of memory capacity based on Fisher information for matrix-based RNNs. We derive bounds on the memory capacity of such networks under various hypotheses and compare them with their vector counterparts. In particular, we show that the memory capacity of such networks is bounded by $N^2$ for an $N\times N$ state matrix, which generalizes the bound known for vector networks. We also show and analyze the increase in memory capacity obtained when an external state memory is added, as in NTMs. This motivates us to construct NTMs with RNN controllers and a matrix-based representation of the external memory, leading us to introduce Matrix NTMs. We evaluate the performance of this class of memory networks on algorithmic learning tasks such as copying and recall, compare it with Matrix RNNs, and find that the addition of external memory improves the performance of Matrix NTMs.}
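For concreteness, the following is a minimal sketch of a Fisher-information-based notion of memory capacity, in the style of the standard Fisher memory curve for a linear recurrent network driven by a scalar input sequence; the dynamics, noise model, and symbols ($W$, $v$, $z_t$, $\varepsilon$) are illustrative assumptions and may differ from the precise definitions used in the body of the paper.

% Hedged sketch: a Fisher-memory-style capacity for a linear recurrent
% network with state $x_t \in \mathbb{R}^N$; all symbols here are
% illustrative assumptions, not the paper's exact construction.
\begin{align}
  x_t &= W x_{t-1} + v\, s_t + z_t, \qquad z_t \sim \mathcal{N}(0, \varepsilon I),\\
  J_{k,l}(\mathbf{s}) &= \left\langle -\frac{\partial^2 \log p(x_t \mid \mathbf{s})}{\partial s_{t-k}\,\partial s_{t-l}} \right\rangle_{p(x_t \mid \mathbf{s})},\\
  J(k) &= J_{k,k}, \qquad J_{\mathrm{tot}} = \sum_{k \ge 0} J(k) \;\le\; N .
\end{align}
% Applying the analogous construction to a matrix-valued state
% $X_t \in \mathbb{R}^{N \times N}$ would yield the $N^2$ bound stated
% in the abstract.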