Evolutionary relationships between species are usually inferred through phylogenetic analysis, which produces phylogenetic trees computed from allelic profiles; these profiles are built by sequencing specific regions of the genome and abstracting them to categorical indexes. With the growing movement of people and merchandise, epidemics have become increasingly important, and combining information from country-specific datasets can now reveal previously unknown spreading patterns. The phylogenetic analysis workflow consists mainly of four consecutive steps: distance calculation, distance correction, the inference algorithm, and local optimization. Many phylogenetic tools exist, but most implement only some of these steps and serve a single purpose, such as one type of algorithm. They are also often hard to use and integrate, since each has its own API. This project resulted in a library that implements all four steps of the workflow and is highly performant, extensible, reusable, and portable, while providing common APIs and documentation. It also differs from other tools in that it can stop and resume the workflow whenever the user desires, and it was built to be continuously extended rather than to serve a single purpose. Time benchmarks conducted on the library show that its implementations of the algorithms conform to their theoretical time complexity. The memory benchmarks show that the implementations of the NJ algorithms follow linear memory complexity, while the implementations of the MST and GCP algorithms follow quadratic memory complexity.
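To make the workflow more concrete, the following minimal sketch illustrates two of the four steps on toy data: distance calculation (Hamming distance between allelic profiles) and inference (a minimum spanning tree built with Prim's algorithm). It is not the library's API. The function names `hamming`, `distance_matrix`, and `prim_mst`, the choice of Hamming distance, and the example profiles are all assumptions made for illustration, and the distance correction and local optimization steps are omitted.

```python
from itertools import combinations


def hamming(a, b):
    """Count the loci at which two allelic profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(profiles):
    """Distance calculation: symmetric matrix of pairwise Hamming distances."""
    n = len(profiles)
    d = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = hamming(profiles[i], profiles[j])
    return d


def prim_mst(d):
    """Inference: minimum spanning tree over a distance matrix (Prim's algorithm).

    Returns a list of (u, v, weight) edges, starting from node 0.
    """
    n = len(d)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Pick the cheapest edge that connects the tree to a node outside it.
        w, u, v = min(
            (d[u][v], u, v)
            for u in in_tree
            for v in range(n)
            if v not in in_tree
        )
        edges.append((u, v, w))
        in_tree.add(v)
    return edges


# Hypothetical allelic profiles: one categorical index per locus.
profiles = [
    [1, 1, 1, 1],
    [1, 1, 1, 2],
    [1, 1, 3, 2],
    [4, 1, 1, 1],
]

matrix = distance_matrix(profiles)
tree = prim_mst(matrix)
print(tree)
```

In this sketch the full distance matrix is held in memory, so storage grows quadratically with the number of profiles.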