Simple recurrent neural networks (RNNs) and their more advanced cousins, such as LSTMs, have been very successful in sequence modeling. Their theoretical understanding, however, is lacking and has not kept pace with the progress made for feedforward networks, where a reasonably complete understanding has emerged in the special case of highly overparametrized one-hidden-layer networks. In this paper, we make progress towards remedying this situation by proving that RNNs can learn functions of sequences. In contrast to previous work, which could only handle functions of sequences that are sums of functions of the individual tokens, we allow general functions. Conceptually and technically, we introduce new ideas that enable us to extract information from the hidden state of the RNN in our proofs, addressing a crucial weakness in previous work. We illustrate our results on some regular language recognition problems.