This paper develops a general framework for learning interpretable data representations with Long Short-Term Memory (LSTM) recurrent neural networks over hierarchical graph structures. Instead of learning LSTM models over pre-fixed structures, we propose to additionally learn the intermediate, interpretable multi-level graph structures from data, progressively and stochastically, during LSTM network optimization. We call this model the structure-evolving LSTM. Starting from an initial element-level graph representation in which each node is a small data element, the structure-evolving LSTM gradually evolves multi-level graph representations by stochastically merging graph nodes with high compatibility along the stacked LSTM layers. In each LSTM layer, we estimate the compatibility of two connected nodes from their corresponding LSTM gate outputs and use it to generate a merging probability. Candidate graph structures are then generated by grouping nodes into cliques according to their merging probabilities. We produce the new graph structure with a Metropolis-Hastings algorithm, whose stochastic sampling with an acceptance probability reduces the risk of getting stuck in local optima. Once a graph structure is accepted, a higher-level graph is constructed by taking the partitioned cliques as its nodes. As the structure evolves, the representation becomes more abstract at higher levels, where redundant information is filtered out, allowing more efficient propagation of long-range data dependencies. We evaluate the structure-evolving LSTM on semantic object parsing and demonstrate its advantage over state-of-the-art LSTM models on standard benchmarks.
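The merge-and-accept step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the merging probabilities are passed in directly rather than computed from LSTM gate outputs, the acceptance ratio is simplified to a ratio of scores (the abstract does not give the exact ratio), and the names `propose_partition`, `mh_step`, and `score` are hypothetical.

```python
import random


def find(parent, x):
    # Union-find lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def propose_partition(num_nodes, edge_probs, rng):
    """Sample a candidate grouping of nodes.

    Each edge (u, v) is kept with its merging probability; the connected
    components of the kept edges become the nodes of the next-level graph.
    Returns a list mapping each node to a group id.
    """
    parent = list(range(num_nodes))
    for (u, v), p in edge_probs.items():
        if rng.random() < p:
            parent[find(parent, u)] = find(parent, v)
    roots = [find(parent, i) for i in range(num_nodes)]
    # Relabel roots as consecutive group ids in order of first appearance.
    ids = {}
    return [ids.setdefault(r, len(ids)) for r in roots]


def mh_step(current, num_nodes, edge_probs, score, rng):
    """One Metropolis-Hastings-style step.

    ASSUMPTION: acceptance uses min(1, score(candidate) / score(current)),
    where `score` is a hypothetical positive quality function. The paper's
    actual acceptance probability may differ.
    """
    candidate = propose_partition(num_nodes, edge_probs, rng)
    accept_prob = min(1.0, score(candidate) / score(current))
    return candidate if rng.random() < accept_prob else current


# Deterministic usage: probability 1.0 always merges, 0.0 never merges.
edges = {(0, 1): 1.0, (2, 3): 0.0}
print(propose_partition(4, edges, random.Random(0)))  # → [0, 0, 1, 2]
```

With intermediate probabilities, repeated calls to `mh_step` yield a chain of partitions. A rejected candidate leaves the current structure unchanged, which is what lets the sampler move away from a poor grouping instead of committing to it greedily.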