Sequence comparison is a basic task that captures similarities and differences between two or more sequences of symbols, with countless applications, for example in computational biology. An alignment is one way to compare sequences, where a given scoring function determines the degree of similarity between them. Many scoring functions are obtained from scoring matrices. However, not all scoring matrices induce scoring functions that are distances, since the induced scoring function is not necessarily a metric. In this work we establish necessary and sufficient conditions for scoring matrices to induce each of the properties of a metric in weighted edit distances. For normalized edit distances, we likewise characterize each class of scoring matrices that induces them. Furthermore, we define an extended edit distance, which takes into account a set of edit operations that transforms one sequence into another regardless of whether a usual corresponding alignment exists to represent them, and we describe a criterion for finding a sequence of edit operations of minimum weight. Similarly, we determine the class of scoring matrices that induces extended edit distances for each of the properties of a metric.
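
To make the notion of a scoring matrix inducing a weighted edit distance concrete, the following is a minimal sketch of the standard alignment dynamic program. It is an illustration under stated assumptions, not the construction used in this work: the dictionary representation of the scoring matrix, the gap symbol `GAP`, and the names `weighted_edit_distance` and `unit_scores` are all hypothetical. The example shows that an asymmetric scoring matrix can induce a scoring function that fails the symmetry property of a metric.

```python
from typing import Dict, Tuple

GAP = "-"  # hypothetical symbol marking an insertion or a deletion

# Assumed representation: score[(a, b)] is the cost of editing a into b,
# score[(a, GAP)] the cost of deleting a, score[(GAP, b)] of inserting b.
Score = Dict[Tuple[str, str], float]


def weighted_edit_distance(s: str, t: str, score: Score) -> float:
    """Minimum total cost of an alignment of s and t under `score`."""
    n, m = len(s), len(t)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    # Aligning a prefix with the empty sequence uses only deletions/insertions.
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + score[(s[i - 1], GAP)]
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + score[(GAP, t[j - 1])]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j - 1] + score[(s[i - 1], t[j - 1])],  # substitute or match
                d[i - 1][j] + score[(s[i - 1], GAP)],           # delete s[i-1]
                d[i][j - 1] + score[(GAP, t[j - 1])],           # insert t[j-1]
            )
    return d[n][m]


def unit_scores(alphabet: str) -> Score:
    """Unit-cost matrix: matches cost 0, every other operation costs 1."""
    score: Score = {}
    for a in alphabet:
        score[(a, GAP)] = 1.0
        score[(GAP, a)] = 1.0
        for b in alphabet:
            score[(a, b)] = 0.0 if a == b else 1.0
    return score


unit = unit_scores("ab")
print(weighted_edit_distance("ab", "ba", unit))  # 2.0

# Making one substitution more expensive breaks symmetry of the matrix,
# and here also of the induced scoring function.
skew = dict(unit)
skew[("b", "a")] = 3.0
print(weighted_edit_distance("a", "b", skew))  # 1.0
print(weighted_edit_distance("b", "a", skew))  # 2.0
```

In the skewed case, editing `b` into `a` costs 2.0 rather than 3.0 because a deletion followed by an insertion is cheaper than the direct substitution, so the induced value need not equal the matrix entry.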