We design alignment-free techniques for comparing a sequence or word, called a target, against a set of words, called a reference. A target-specific factor of a target $T$ against a reference $R$ is a factor $w$ of a word in $T$ which is not a factor of a word of $R$ and such that any proper factor of $w$ is a factor of a word of $R$. We first address the computation of the set of target-specific factors of a target $T$ against a reference $R$, where $T$ and $R$ are finite sets of sequences. The result is the construction of an automaton accepting the set of all considered target-specific factors. The construction algorithm runs in linear time according to the size of $T\cup R$. The second result consists of the design of an algorithm to compute all the occurrences in a single sequence $T$ of its target-specific factors against a reference $R$. The algorithm runs in real-time on the target sequence, independently of the number of occurrences of target-specific factors.
翻译:我们设计了一种基于无需比对的技术,来比较一个称为目标(target)的序列或字符串与一组称为参考(reference)的词语。目标序列$T$对于参考序列$R$的特异性因子是指对于一个单词中的因子$w$,它既不是$T$中单词的因子,也不是$R$中单词的因子,且满足$w$的任何真因子都是$R$中某个单词的因子。我们首先解决了计算一个目标$T$相对于参考$R$的目标特异性因子集合的问题,其中$T$和$R$都是有限的序列集合。其结果是构造出一个可接收所有考虑的目标特异性因子的自动机。构造算法根据$T\cup R$的大小线性运行。第二个结果是设计一个算法,用于计算单一序列$T$中其目标特异性因子相对于$R$的出现情况。该算法在目标序列中实时运行,不受目标特异性因子出现次数的影响。