Efficiently selecting candidates that meet given criteria from an extremely large search space is a major concern in many real-world applications. Beyond the sheer size of the search space, rigorously evaluating a sample on a reliable experimental or computational platform is often prohibitively expensive, which makes the screening problem even more challenging. In such cases, a high-throughput screening (HTS) pipeline that uses efficient earlier stages to pre-sift the samples likely to be potential candidates can save a significant amount of resources. However, despite many successful applications, to the best of our knowledge no one has studied optimal pipeline design or optimal pipeline operation. In this study, we propose two optimization frameworks, applicable to most (if not all) screening campaigns involving experimental and/or computational evaluations, for optimally determining the screening thresholds of an HTS pipeline. We validate the proposed frameworks on both analytic and practical scenarios. As a practical example, we consider the optimal computational campaign for long non-coding RNA (lncRNA) classification, for which we built a high-throughput virtual screening (HTVS) pipeline. The simulation results demonstrate that the proposed frameworks significantly reduce the effective selection cost per potential candidate and make HTS pipelines less sensitive to their structural variations. In addition to the validation, we provide insights into constructing a better HTS pipeline based on the simulation results.
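To make the notions of screening thresholds and effective selection cost per candidate concrete, the following is a minimal sketch of a toy two-stage pipeline. It is an illustration under stated assumptions, not the optimization frameworks proposed in this study: the sample generator, the stage costs (1 and 100), the noise level, and the brute-force grid search over the first-stage threshold are all hypothetical choices made for this example.

```python
import random


def run_pipeline(samples, stages, thresholds):
    """Pass samples through the stages in order.

    Each stage is a (score_fn, unit_cost) pair; a sample survives a stage
    when its score is at least that stage's threshold.
    Returns (survivors, total_cost).
    """
    total_cost = 0.0
    survivors = list(samples)
    for (score_fn, unit_cost), threshold in zip(stages, thresholds):
        total_cost += unit_cost * len(survivors)  # every survivor is evaluated
        survivors = [s for s in survivors if score_fn(s) >= threshold]
    return survivors, total_cost


def cost_per_candidate(samples, stages, thresholds):
    """Total evaluation cost divided by the number of selected candidates."""
    survivors, total_cost = run_pipeline(samples, stages, thresholds)
    return total_cost / len(survivors) if survivors else float("inf")


def make_toy_samples(n, noise=0.3, seed=0):
    """Hypothetical data: (true_value, noisy_surrogate_score) pairs."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        true_value = rng.random()
        samples.append((true_value, true_value + rng.gauss(0.0, noise)))
    return samples


# Assumed two-stage pipeline: a cheap, noisy surrogate followed by an
# expensive reference evaluation. The costs are illustrative only.
TOY_STAGES = [
    (lambda s: s[1], 1.0),    # stage 1: cheap surrogate score
    (lambda s: s[0], 100.0),  # stage 2: expensive high-fidelity score
]


def grid_search_first_threshold(samples, final_threshold, grid):
    """Brute-force search (not the paper's method) for the first-stage
    threshold that minimizes the cost per selected candidate."""
    return min(
        grid,
        key=lambda t: cost_per_candidate(samples, TOY_STAGES, [t, final_threshold]),
    )


if __name__ == "__main__":
    samples = make_toy_samples(10_000)
    grid = [i / 20 for i in range(-10, 31)]
    best = grid_search_first_threshold(samples, 0.9, grid)
    print("first-stage threshold:", best)
    print("cost per candidate:",
          cost_per_candidate(samples, TOY_STAGES, [best, 0.9]))
    print("without pre-screening:",
          cost_per_candidate(samples, TOY_STAGES, [float("-inf"), 0.9]))
```

In this toy setting, raising the first-stage threshold lowers the cost of the expensive stage but also discards some true candidates, which is the trade-off that a threshold optimization has to balance.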