A Safety Framework for Flow Decomposition Problems via Integer Linear Programming

Many important problems in Bioinformatics (e.g., assembly or multi-assembly) admit multiple solutions, while the final objective is to report only one. A common approach to deal with this uncertainty is finding safe partial solutions (e.g., contigs) which are common to all solutions. Previous research on safety has focused on polynomial-time solvable problems, whereas many successful and natural models are NP-hard to solve, leaving a lack of "safety tools" for such problems. We propose the first method for computing all safe solutions for an NP-hard problem, minimum flow decomposition. We obtain our results by developing a "safety test" for paths based on a general Integer Linear Programming (ILP) formulation. Moreover, we provide implementations with practical optimizations aimed at reducing the total ILP time, the most efficient of these being based on a recursive group-testing procedure. Results: Experimental results on the transcriptome datasets of Shao and Kingsford (TCBB, 2017) show that all safe paths for minimum flow decompositions correctly recover up to 90% of the full RNA transcripts, which is at least 25% more than previously known safe paths, such as (Caceres et al., TCBB 2021), (Zheng et al., RECOMB 2021), (Khan et al., RECOMB 2022, ESA 2022). Moreover, despite the NP-hardness of the problem, we can report all safe paths for 99.8% of the over 27,000 non-trivial graphs of this dataset in only 1.5 hours. Our results suggest that, on perfect data, there is less ambiguity than previously thought in the notoriously hard RNA assembly problem. Availability: https://github.com/algbio/mfd-safety
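
The recursive group-testing idea mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `find_safe_paths`, `all_safe`, and `is_subpath` are hypothetical, and the ILP safety test is replaced by a stand-in oracle that simply enumerates the two minimum flow decompositions of a small toy graph (unit-flow edges s→a, s→b, a→m, b→m, m→c, m→d, c→t, d→t). The sketch assumes only that some oracle can answer, for a group of candidate paths, whether every path in the group is safe.

```python
from typing import Callable, List, Sequence, Tuple

Path = Tuple[str, ...]

def is_subpath(p: Path, q: Path) -> bool:
    """True if p occurs as a contiguous subpath of q."""
    n = len(p)
    return any(q[i:i + n] == p for i in range(len(q) - n + 1))

# Toy instance (assumption): the two minimum flow decompositions of the
# unit-flow graph described in the lead-in, listed explicitly.
DECOMPOSITIONS: List[List[Path]] = [
    [("s", "a", "m", "c", "t"), ("s", "b", "m", "d", "t")],
    [("s", "a", "m", "d", "t"), ("s", "b", "m", "c", "t")],
]

oracle_calls = 0

def all_safe(group: Sequence[Path]) -> bool:
    """Stand-in for one ILP call: is every path in the group a subpath of
    some solution path in every minimum decomposition?"""
    global oracle_calls
    oracle_calls += 1
    return all(
        any(is_subpath(p, q) for q in solution)
        for p in group
        for solution in DECOMPOSITIONS
    )

def find_safe_paths(candidates: Sequence[Path],
                    oracle: Callable[[Sequence[Path]], bool]) -> List[Path]:
    """Recursive group testing: test a whole group with one oracle call and
    bisect only when the group contains at least one unsafe path."""
    if not candidates:
        return []
    if oracle(candidates):
        return list(candidates)      # one call certifies the whole group
    if len(candidates) == 1:
        return []                    # a single path that failed the test
    mid = len(candidates) // 2
    return (find_safe_paths(candidates[:mid], oracle)
            + find_safe_paths(candidates[mid:], oracle))

CANDIDATES: List[Path] = [
    ("s", "a", "m"), ("a", "m", "c"), ("m", "c", "t"),
    ("s", "b", "m"), ("b", "m", "d"), ("m", "d", "t"),
]

safe = find_safe_paths(CANDIDATES, all_safe)
print(safe)
```

In this toy instance the paths through the shared node m that fix both an incoming and an outgoing branch are unsafe, because the two decompositions pair the branches differently. Group testing pays off when most candidates are safe, since a single oracle call can then certify a large group at once.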
