Some genes can change their relative locations in a genome. Thus for different individuals of the same species, the orders of genes might be different. Such jumping genes are called transposons. A practical problem is to determine transposons in given gene sequences. Through an intuitive rule, we transform the biological problem of determining transposons into a rigorous mathematical problem of determining the longest common subsequence. Depending on whether the gene sequence is linear (each sequence has a fixed head and tail) or circular (we can choose any gene as the head, and the previous one is the tail), and whether genes have multiple copies, we classify the problem of determining transposons into four scenarios: (1) linear sequences without duplicated genes; (2) circular sequences without duplicated genes; (3) linear sequences with duplicated genes; (4) circular sequences with duplicated genes. With the help of graph theory, we design fast algorithms for different scenarios. We also derive some results that might be of theoretical interests in combinatorics.
翻译:一些基因可以在基因组中改变其相对位置。 因此,对于同一物种的不同个体来说,基因的顺序可能不同。 这种跳跃基因被称为转基因人。 一个实际的问题是如何确定特定基因序列中的转基因人。 我们通过直觉规则,将确定转基因人的生物学问题转化成一个严格的数学问题,确定最长的共同子序列。取决于基因序列是线性(每个序列都有固定头部和尾部)还是圆形(我们可以选择任何基因作为头部,而前一个序列是尾部),以及基因是否有多个副本,我们把确定转基因的问题分为四种假设:(1) 没有复制基因的线性序列;(2) 没有复制基因的循环序列;(3) 带复制基因的线性序列;(3) 有复制基因的圆性序列;(4) 有复制基因的圆性序列。在图表理论的帮助下,我们为不同情景设计快速算法。我们还得出了一些可能具有理论意义的结果。