Pattern detection and string matching are fundamental problems in computer science, and the rapid expansion of bioinformatics and computational biology has made them a core topic for both disciplines. The SARS-CoV-2 pandemic has made such problems more demanding: constant mutation leads to hundreds or thousands of new genome variants being discovered every week, creating an urgent need for fast and accurate analyses. Computational tools for genomic analyses, such as sequence alignment, are therefore essential, although in most cases the resources and computational power they require are enormous. The presented Multiple Genome Analytics Framework combines data structures and algorithms, specifically built for text mining and pattern detection, that can help to efficiently address several computational biology and bioinformatics problems concurrently with minimal resources. A single execution of advanced algorithms, with space and time complexity O(n log n), is enough to acquire knowledge of all repeated patterns that exist in multiple genome sequences, and this information can be used by other meta-algorithms for further meta-analyses. The potential of the proposed framework is demonstrated through the analysis of more than 300,000 SARS-CoV-2 genome sequences and the detection of all repeated patterns of length up to 60 nucleotides in these sequences. These results have been used to answer questions such as the identification of common patterns among all variants, sequence alignment, palindrome and tandem repeat detection, comparison of genomes from different organisms, and polymerase chain reaction primer detection, among others.
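To make concrete what "all repeated patterns up to a given length in multiple sequences" means, the following is a minimal Python sketch on toy data. It is not the framework's algorithm: it uses naive substring enumeration with a dictionary, which costs roughly O(n · max_len) time and space rather than the O(n log n) stated above. The function names `repeated_patterns` and `common_to_all`, and the toy sequences, are assumptions introduced only for illustration.

```python
from collections import defaultdict


def repeated_patterns(sequences, max_len):
    """Map every substring of length 1..max_len that occurs at least twice
    (anywhere across the input sequences) to its (sequence index, offset) list."""
    positions = defaultdict(list)
    for seq_id, seq in enumerate(sequences):
        for start in range(len(seq)):
            # Longest pattern that still fits inside the sequence from `start`.
            longest = min(max_len, len(seq) - start)
            for length in range(1, longest + 1):
                positions[seq[start:start + length]].append((seq_id, start))
    # Keep only patterns seen more than once, i.e. the repeated ones.
    return {p: occ for p, occ in positions.items() if len(occ) >= 2}


def common_to_all(patterns, n_sequences):
    """Return the repeated patterns that appear in every input sequence."""
    return sorted(
        p for p, occ in patterns.items()
        if len({seq_id for seq_id, _ in occ}) == n_sequences
    )


# Hypothetical toy "genomes"; real inputs would be full genome sequences.
toy = ["ACGTAC", "TACGTT", "GGTACG"]
patterns = repeated_patterns(toy, max_len=4)
shared = common_to_all(patterns, len(toy))
```

Once such a table of patterns and positions exists, follow-up questions (for example, which patterns are shared by every sequence) become simple lookups over it, which is the role the abstract assigns to meta-algorithms.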