Best match graphs (BMGs) are a class of colored digraphs that arise naturally in mathematical phylogenetics. They can be approximated with the help of similarity measures between gene sequences, although such estimates contain errors. The corresponding graph editing problem can be used as a means of error correction. Since the arc set modification problems for BMGs are NP-complete, efficient heuristics are needed if BMGs are to be used for the practical analysis of biological sequence data. Because BMGs are characterized by the consistency of a certain set of rooted triples, we consider heuristics that operate on triple sets. As an alternative, we show that there is a close connection to a set partitioning problem. This connection leads to a class of top-down recursive algorithms that are similar to Aho's supertree algorithm and give rise to BMG editing algorithms that are consistent in the sense that they leave BMGs invariant. Extensive benchmarking shows that community detection algorithms for the partitioning steps perform best for BMG editing.
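For orientation, the following is a minimal sketch of the classical top-down recursion of Aho et al. for testing the consistency of a set of rooted triples, which the algorithms above are said to resemble. It is not the BMG editing method itself. The function name `aho_build`, the encoding of a triple `ab|c` as the tuple `(a, b, c)`, and the nested-tuple tree output are assumptions made for illustration only.

```python
def aho_build(leaves, triples):
    """Return a nested-tuple tree displaying every triple, or None if the
    triple set is inconsistent. A triple (a, b, c) encodes ab|c, i.e. a and b
    are more closely related to each other than either is to c."""
    leaves = set(leaves)
    if len(leaves) == 1:
        return next(iter(leaves))

    # Union-find over the current leaf set (hypothetical helper).
    parent = {x: x for x in leaves}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Keep only triples whose three leaves all lie in the current leaf set.
    restricted = [t for t in triples if set(t) <= leaves]

    # Auxiliary (Aho) graph: join a and b for every triple ab|c.
    for a, b, _ in restricted:
        parent[find(a)] = find(b)

    # Connected components of the auxiliary graph.
    components = {}
    for x in leaves:
        components.setdefault(find(x), set()).add(x)

    # A connected auxiliary graph on more than one leaf means inconsistency.
    if len(components) == 1:
        return None

    # Recurse on each component; sort for a deterministic child order.
    children = []
    for comp in sorted(components.values(), key=min):
        subtree = aho_build(comp, restricted)
        if subtree is None:
            return None
        children.append(subtree)
    return tuple(children)


# Usage: a consistent and an inconsistent triple set on three leaves.
tree = aho_build({"a", "b", "c"}, [("a", "b", "c")])
conflict = aho_build({"a", "b", "c"}, [("a", "b", "c"), ("a", "c", "b")])
```

The sketch simply reports failure when the auxiliary graph is connected. That failure branch is the point at which a partitioning step, as described in the abstract, would presumably take over in an editing heuristic; how exactly the partition is chosen is not specified here.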