Machine learning on trees has mostly focused on trees as input to algorithms. Much less research has investigated trees as output, which has many applications, such as molecule optimization for drug discovery or hint generation for intelligent tutoring systems. In this work, we propose a novel autoencoder approach, called recursive tree grammar autoencoder (RTG-AE), which encodes trees via a bottom-up parser and decodes trees via a tree grammar, both learned via recursive neural networks that minimize the variational autoencoder loss. The resulting encoder and decoder can then be utilized in downstream tasks, such as optimization and time series prediction. RTG-AEs are the first model to combine variational autoencoders, grammatical knowledge, and recursive processing. Our key message is that this unique combination of all three elements outperforms models that combine only two of the three. In particular, we perform an ablation study showing that our proposed method improves autoencoding error, training time, and optimization score on synthetic as well as real datasets compared to four baselines. We further prove that RTG-AEs parse and generate trees in linear time and are expressive enough to handle all regular tree grammars.
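To make the architecture concrete, the following is a minimal sketch of the RTG-AE idea in PyTorch, assuming a toy regular tree grammar given as a map from node labels to arities. All names here (RTGAE, Tree, the per-rule linear layers) are hypothetical illustrations, not the authors' reference implementation; the training loss is also omitted.

    # Minimal RTG-AE-style sketch: bottom-up recursive encoding, top-down
    # grammar-driven decoding, with a VAE bottleneck in between.
    # Assumption: the grammar is a dict {label: arity} with string labels.
    import torch
    import torch.nn as nn

    class Tree:
        def __init__(self, label, children=None):
            self.label = label              # grammar symbol at this node
            self.children = children or []  # ordered child subtrees

    class RTGAE(nn.Module):
        def __init__(self, rules, dim=16):
            super().__init__()
            self.rules = rules
            self.rule_list = list(rules)
            # one encoder network per grammar rule: child codes -> parent code
            self.enc = nn.ModuleDict({
                r: nn.Linear(max(k, 1) * dim, dim) for r, k in rules.items()
            })
            self.mu = nn.Linear(dim, dim)
            self.logvar = nn.Linear(dim, dim)
            # decoder: code -> rule scores, and code -> per-child codes
            self.cls = nn.Linear(dim, len(rules))
            self.dec = nn.ModuleDict({
                r: nn.Linear(dim, max(k, 1) * dim) for r, k in rules.items()
            })
            self.dim = dim

        def encode(self, tree):
            # bottom-up parse: encode children first, then apply the
            # rule-specific network of this node's label
            k = self.rules[tree.label]
            if k == 0:
                h = torch.zeros(1, self.dim)
            else:
                h = torch.cat([self.encode(c) for c in tree.children], dim=1)
            return torch.tanh(self.enc[tree.label](h))

        def decode(self, code, depth=0, max_depth=10):
            # top-down generation: pick the highest-scoring grammar rule,
            # then expand one code per child (depth cap guards an
            # untrained classifier against infinite recursion)
            label = self.rule_list[int(self.cls(code).argmax())]
            k = self.rules[label]
            children = []
            if k > 0 and depth < max_depth:
                child_codes = self.dec[label](code).chunk(k, dim=1)
                children = [self.decode(c, depth + 1, max_depth)
                            for c in child_codes]
            return Tree(label, children)

        def forward(self, tree):
            h = self.encode(tree)
            mu, logvar = self.mu(h), self.logvar(h)
            # VAE reparameterization trick
            z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
            return self.decode(z), mu, logvar

    # toy usage with a hypothetical arithmetic grammar:
    grammar = {"plus": 2, "neg": 1, "x": 0, "y": 0}  # label -> arity
    model = RTGAE(grammar)
    tree = Tree("plus", [Tree("x"), Tree("neg", [Tree("y")])])
    recon, mu, logvar = model(tree)

Since each node is visited exactly once during encoding and decoding, this sketch also illustrates the linear-time property stated above. A full implementation would presumably replace the linear layers with richer recursive cells and train with the usual VAE objective, i.e., a reconstruction term over the predicted rule sequence plus the KL divergence of the latent code.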