Recurrent neural networks are widely used in speech and language processing. Due to their dependence on past inputs, standard algorithms for training these models, such as back-propagation through time (BPTT), cannot be efficiently parallelised. Furthermore, applying these models to structures more complex than sequences requires inference-time approximations, which introduce inconsistency between inference and training. This paper shows that recurrent neural networks can be reformulated as fixed-points of non-linear equation systems. These fixed-points can be computed exactly using an iterative algorithm that requires as many iterations as the length of any given sequence. Each iteration of this algorithm adds one additional Markovian-like order of dependencies, such that upon termination all dependencies modelled by the recurrent neural network have been incorporated. Although exact fixed-points inherit the same parallelisation and inconsistency issues, this paper shows that approximate fixed-points can be computed in parallel and used consistently in training and inference, including tasks such as lattice rescoring. Experimental validation is performed on two tasks, Penn Treebank and WikiText-2, and shows that approximate fixed-points yield prediction performance competitive with recurrent neural networks trained using the BPTT algorithm.
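To make the fixed-point view concrete, below is a minimal sketch (not the paper's implementation) for a simple Elman-style RNN. The hidden states H = (h_1, ..., h_T) satisfy the non-linear system h_t = tanh(W h_{t-1} + U x_t + b), i.e. H = F(H). Iterating H <- F(H) from H = 0 recovers the sequential hidden states exactly after T sweeps, because each sweep propagates dependencies one additional step back, while stopping after K < T sweeps gives an approximate fixed-point whose per-sweep updates are fully parallel across time steps. The parameter names (W, U, b), the tanh non-linearity, and all sizes are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def fixed_point_rnn(X, W, U, b, num_sweeps):
    """X: (T, input_dim) inputs; returns (T, hidden_dim) hidden states after num_sweeps."""
    T = X.shape[0]
    hidden_dim = W.shape[0]
    H = np.zeros((T, hidden_dim))
    for _ in range(num_sweeps):
        # Every time step is updated from the previous sweep's states, so all T
        # updates within a sweep are independent and could run in parallel.
        H_prev = np.vstack([np.zeros((1, hidden_dim)), H[:-1]])
        H = np.tanh(H_prev @ W.T + X @ U.T + b)
    return H

def sequential_rnn(X, W, U, b):
    """Standard sequential recurrence, for comparison with the exact fixed-point."""
    T = X.shape[0]
    h = np.zeros(W.shape[0])
    H = np.zeros((T, W.shape[0]))
    for t in range(T):
        h = np.tanh(W @ h + U @ X[t] + b)
        H[t] = h
    return H

rng = np.random.default_rng(0)
T, input_dim, hidden_dim = 12, 4, 8
X = rng.normal(size=(T, input_dim))
W = rng.normal(scale=0.3, size=(hidden_dim, hidden_dim))
U = rng.normal(scale=0.3, size=(hidden_dim, input_dim))
b = np.zeros(hidden_dim)

exact = fixed_point_rnn(X, W, U, b, num_sweeps=T)   # exact after T sweeps
approx = fixed_point_rnn(X, W, U, b, num_sweeps=3)  # truncated-dependency approximation
print(np.max(np.abs(exact - sequential_rnn(X, W, U, b))))   # ~0: exact agreement
print(np.max(np.abs(approx - sequential_rnn(X, W, U, b))))  # non-zero approximation gap
```

In this sketch, the number of sweeps plays the role of the Markovian-like order of dependencies described above: sweep k makes h_t correct for all t <= k, so T sweeps reproduce the sequential recurrence exactly.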