We present an algorithm for extracting a subclass of the context-free grammars (CFGs) from a trained recurrent neural network (RNN). We develop a new framework, pattern rule sets (PRSs), which describe sequences of deterministic finite automata (DFAs) that approximate a non-regular language. We present an algorithm for recovering the PRS behind a sequence of such automata, and apply it to the sequences of automata extracted from trained RNNs using the L* algorithm. We then show how the PRS may be converted into a CFG, enabling a familiar and useful presentation of the learned language. Extracting the learned language of an RNN is important for understanding the RNN and for verifying its correctness. Furthermore, the extracted CFG can augment the RNN in classifying correct sentences, because the RNN's predictive accuracy decreases as the recursion depth of its input sequences and the distance between matching delimiters increase.
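To make "a sequence of DFAs that approximate a non-regular language" concrete, the sketch below uses the language of balanced parentheses (Dyck-1), a standard non-regular language. It is an illustration only, not the paper's PRS construction or its L*-based extraction. The hypothetical function `bounded_dyck_dfa(n)` builds a DFA accepting balanced words whose nesting depth is at most `n`. Each such DFA is exact up to depth `n`, so the sequence `n = 1, 2, 3, ...` approximates the language ever more closely. The grammar `S -> ( S ) S | ε`, recognized here by the hypothetical `cfg_accepts`, captures the whole language at every depth.

```python
from typing import Dict, Set, Tuple

REJECT = -1  # sink state for illegal moves

DFA = Tuple[Dict[Tuple[int, str], int], int, Set[int]]


def bounded_dyck_dfa(n: int) -> DFA:
    """DFA for balanced strings over '()' with nesting depth <= n.

    State k means "k parentheses are currently open". Opening past depth n,
    or closing at depth 0, moves to REJECT.
    """
    delta: Dict[Tuple[int, str], int] = {}
    for k in range(n + 1):
        delta[(k, "(")] = k + 1 if k < n else REJECT
        delta[(k, ")")] = k - 1 if k > 0 else REJECT
    return delta, 0, {0}  # start in state 0; accept only when nothing is open


def dfa_accepts(dfa: DFA, word: str) -> bool:
    delta, state, accepting = dfa
    for ch in word:
        state = delta.get((state, ch), REJECT)  # unknown symbols also reject
        if state == REJECT:
            return False
    return state in accepting


def cfg_accepts(word: str) -> bool:
    """Recursive-descent recognizer for the grammar S -> '(' S ')' S | epsilon."""

    def parse_s(i: int):
        # The grammar is LL(1): choose S -> '(' S ')' S iff the next symbol is '('.
        if i < len(word) and word[i] == "(":
            j = parse_s(i + 1)
            if j is None or j >= len(word) or word[j] != ")":
                return None  # no matching ')' for the '(' at position i
            return parse_s(j + 1)
        return i  # S -> epsilon consumes nothing

    return parse_s(0) == len(word)


if __name__ == "__main__":
    deep = "(" * 5 + ")" * 5  # nesting depth 5
    for n in range(1, 7):
        print(n, dfa_accepts(bounded_dyck_dfa(n), deep), cfg_accepts(deep))
```

Running the demo shows the bounded DFAs rejecting the depth-5 word until `n` reaches 5, while the grammar accepts it throughout. No single DFA in the sequence is correct for all depths; only the grammar is. This is the gap the abstract's final sentence points to when it notes that the RNN's accuracy falls as recursion depth grows.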