Recurrent Neural Networks (RNNs) have achieved tremendous success in processing sequential data. However, it is quite challenging to interpret and verify the behavior of RNNs directly. To this end, many efforts have been made to extract finite automata from RNNs. Existing approaches such as exact learning are effective in extracting finite-state models that characterize the state dynamics of RNNs on formal languages, but they scale poorly to natural languages. Compositional approaches that do scale to natural languages fall short in extraction precision. In this paper, we identify the transition sparsity problem, which heavily impacts extraction precision. To address this problem, we propose a transition rule extraction approach that is scalable to natural language processing models and effective in improving extraction precision. Specifically, we propose an empirical method to complement the missing rules in the transition diagram. In addition, we adjust the transition matrices to enhance the context-awareness of the extracted weighted finite automaton (WFA). Finally, we propose two data augmentation tactics to track more dynamic behaviors of the target RNN. Experiments on two popular natural language datasets show that our method extracts WFAs from RNNs for natural language processing with better precision than existing approaches.