Machine learning on tree data has mostly focused on trees as input. Much less research has investigated trees as output, as in molecule optimization for drug discovery or hint generation for intelligent tutoring systems. In this work, we propose a novel autoencoder approach, called recursive tree grammar autoencoder (RTG-AE), which encodes trees via a bottom-up parser and decodes trees via a tree grammar, both controlled by neural networks trained to minimize the variational autoencoder loss. The resulting encoding and decoding functions can then be employed in downstream tasks, such as optimization and time series prediction. RTG-AE combines variational autoencoders, grammatical knowledge, and recursive processing. Our key message is that combining all three components improves performance compared to combining only two of them. In particular, we show experimentally that our proposed method improves the autoencoding error, training time, and optimization score on four benchmark datasets compared to baselines from the literature.
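The encode/decode round trip described above can be sketched as follows. This is a minimal toy illustration, not the authors' implementation: the class and function names (`Tree`, `RTGAutoencoder`) are hypothetical, and the neural networks and variational loss are replaced by simple symbolic codes, only to show the bottom-up encoding and grammar-driven top-down decoding interface.

```python
# Hypothetical sketch of a tree autoencoder interface; names are
# illustrative assumptions, not the RTG-AE authors' API. Neural
# networks and the VAE loss are omitted in favor of symbolic codes.
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Tree:
    label: str
    children: List["Tree"] = field(default_factory=list)


class RTGAutoencoder:
    """Toy round trip: encode bottom-up, decode via grammar-like rules."""

    def encode(self, tree: Tree) -> Tuple:
        # Bottom-up: encode the children first, then combine their
        # codes with the parent label (a parser-like traversal).
        return (tree.label, tuple(self.encode(c) for c in tree.children))

    def decode(self, code: Tuple) -> Tree:
        # Top-down: expand the rule implied by the code, recursing
        # into the child codes (a grammar-like generation).
        label, child_codes = code
        return Tree(label, [self.decode(c) for c in child_codes])


ae = RTGAutoencoder()
t = Tree("+", [Tree("x"), Tree("*", [Tree("y"), Tree("2")])])
assert ae.decode(ae.encode(t)) == t  # lossless round trip in this toy version
```

In the actual approach, the encoder would map each subtree to a continuous latent vector and the decoder would select grammar rules from that vector, so the round trip is only approximate and is trained by minimizing the variational autoencoder loss.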