Tree kernels have been proposed for use in many areas, such as the automatic learning of natural language applications. In this paper, we propose a new linear-time algorithm for SubTree kernel computation based on the concept of weighted tree automata. First, we introduce a new class of weighted tree automata, called Root-Weighted Tree Automata, and their associated formal tree series. Then we define, from this class, the SubTree automata, which are compact computational models for finite tree languages. This allows us to design a theoretically guaranteed linear-time algorithm for computing the SubTree kernel based on weighted tree automata intersection. The key idea behind the proposed algorithm is to replace the DAG reduction and node sorting steps used in previous approaches with the computation of state equivalence classes, which the weighted tree automata approach makes possible. Our approach has three major advantages: it is output-sensitive, it is insensitive to the tree type (ordered versus unordered trees), and it is well suited to incremental tree-kernel-based learning methods. Finally, we conduct a variety of comparative experiments on a wide range of synthetic tree language datasets designed for an in-depth analysis of the algorithm. The results show that the proposed algorithm outperforms state-of-the-art methods.
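To make the quantity being computed concrete, the following is a minimal sketch of an unweighted SubTree kernel, which counts the pairs of nodes (one from each tree) whose complete subtrees are identical. It is not the Root-Weighted Tree Automata construction proposed in the paper; it only illustrates the general idea of grouping subtrees into equivalence classes with a shared hash table. The tree encoding `(label, [children])` and the names `subtree_class_counts` and `subtree_kernel` are assumptions made for this illustration, and trees are treated as ordered.

```python
from collections import Counter


def subtree_class_counts(tree, table):
    """Count the complete subtrees of `tree` per equivalence class.

    `tree` is a hypothetical encoding: (label, [child, child, ...]).
    `table` maps (label, child class ids) to a class id and is shared
    between trees so that identical subtrees receive the same id.
    """
    counts = Counter()

    def visit(node):
        label, children = node
        # Two subtrees are equivalent iff their root labels match and
        # their children belong, in order, to the same classes.
        key = (label, tuple(visit(child) for child in children))
        class_id = table.setdefault(key, len(table))
        counts[class_id] += 1
        return class_id

    visit(tree)
    return counts


def subtree_kernel(t1, t2):
    """Unweighted SubTree kernel: number of matching subtree pairs."""
    table = {}
    c1 = subtree_class_counts(t1, table)
    c2 = subtree_class_counts(t2, table)
    # A class occurring n times in t1 and m times in t2 contributes n * m.
    return sum(n * c2[class_id] for class_id, n in c1.items())


t1 = ("a", [("b", []), ("c", [])])
t2 = ("a", [("b", []), ("b", [])])
print(subtree_kernel(t1, t2))
```

In this sketch the only subtree shared by `t1` and `t2` is the leaf `b`, which occurs once in `t1` and twice in `t2`. The recursive traversal is kept for brevity; very deep trees would need an iterative version, and unordered trees could be handled by sorting the child class ids before building the key.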