The graph traversal edit distance (GTED) is an elegant distance measure defined as the minimum edit distance between strings reconstructed from Eulerian trails in two edge-labeled graphs. GTED can be used to infer evolutionary relationships between species by comparing de Bruijn graphs directly without the computationally costly and error-prone process of genome assembly. Ebrahimpour Boroojeny et al.~(2018) suggest two ILP formulations for GTED and claim that GTED is polynomially solvable because the linear programming relaxation of one of the ILP always yields optimal integer solutions. The result that GTED is polynomially solvable is contradictory to the complexity results of existing string-to-graph matching problems. We resolve this conflict in complexity results by proving that GTED is NP-complete and showing that the ILPs proposed by Ebrahimpour Boroojeny et al. do not solve GTED but instead solve for a lower bound of GTED and are not solvable in polynomial time. In addition, we provide the first two, correct ILP formulations of GTED and evaluate their empirical efficiency. These results provide solid algorithmic foundations for comparing genome graphs and point to the direction of approximation heuristics. The source code to reproduce experimental results is available at https://github.com/Kingsford-Group/gtednewilp/.
翻译:暂无翻译