Reconstructing the evolutionary history of a set of species is a central task in computational biology. In real data, it is often the case that some information is missing: the Incomplete Directed Perfect Phylogeny (IDPP) problem asks, given a collection of species described by a set of binary characters with some unknown states, to complete the missing states in such a way that the result can be explained with a perfect directed phylogeny. Pe'er et al. proposed a solution that takes $\tilde{O}(nm)$ time for $n$ species and $m$ characters. Their algorithm relies on pre-existing dynamic connectivity data structures: a computational study recently conducted by Fern{\'a}ndez-Baca and Liu showed that, in this context, complex data structures perform worse than simpler ones with worse asymptotic bounds. This gives us the motivation to look into the particular properties of the dynamic connectivity problem in this setting, so as to avoid the use of sophisticated data structures as a blackbox. Not only are we successful in doing so, and give a much simpler $\tilde{O}(nm)$-time algorithm for the IDPP problem; our insights into the specific structure of the problem lead to an asymptotically faster algorithm, that runs in optimal $O(nm)$ time.
翻译:重新构建一组物种的进化历史是计算生物学的一项核心任务。 在实际数据中,常常会出现缺少某些信息的情况:不完全的定向完美极极极性(IDPP)问题要求,鉴于一系列由一组二进制字符描述的物种和一些未知的状态,要完成缺失的状态,这样就可以以完美的定向植物素养来解释结果。 Pe'er et al. 提议了一个需要$\tilde{O}(nm) 时间的解决方案,用于美元物种和美元字符。它们的算法依赖于先前存在的动态连接数据结构:最近由Fern_'a}Dez-Baca和Liu进行的一项计算研究显示,在此背景下,复杂的数据结构的运行状况比更简单,且有更差的线条。这使我们有动机来研究这个环境中动态连接问题的具体属性,以避免使用复杂的数据结构作为黑箱。我们不仅成功地这样做,而且给我们的快速的IMIMIQ($P) 问题带来一个更简单的时间分析。