Almost 30 years ago, Zhang and Shasha (1989) published a seminal paper describing an efficient dynamic programming algorithm computing the tree edit distance, that is, the minimum number of node deletions, insertions, and replacements that are necessary to transform one tree into another. Since then, the tree edit distance has been widely applied, for example in biology and intelligent tutoring systems. However, the original paper of Zhang and Shasha can be challenging to read for newcomers and it does not describe how to efficiently infer the optimal edit script. In this contribution, we provide a comprehensive tutorial to the tree edit distance algorithm of Zhang and Shasha. We further prove metric properties of the tree edit distance, and describe efficient algorithms to infer the cheapest edit script, as well as a summary of all cheapest edit scripts between two trees.
翻译:近30年前,张和沙沙(1989年)发表了一份重要文件,描述了一种高效的动态编程算法,计算了树编辑距离,即将一棵树转化为另一棵树所必需的最低限度节点删除、插入和替换数。自那时以来,树编辑距离被广泛应用,例如在生物学和智能辅导系统。然而,张和沙沙(1989年)的原始文件对于新来者来说可能具有挑战性,它并没有描述如何有效地推算最佳编辑脚本。在这个文件中,我们为张和沙沙(Zhang)的树编辑距离算法提供了全面的辅导。我们进一步证明了树编辑距离的量化特性,并描述了用于推断最廉价编辑脚本的有效算法,以及两棵树之间所有最廉价编辑脚本的概要。