In this paper, we study for the first time the Diverse Longest Common Subsequences (LCSs) problem under Hamming distance. Given a set of a constant number of input strings, the problem asks to decide if there exists some subset $\mathcal X$ of $K$ longest common subsequences whose diversity is no less than a specified threshold $\Delta$, where we consider two types of diversities of a set $\mathcal X$ of strings of equal length: the Sum diversity and the Min diversity defined as the sum and the minimum of the pairwise Hamming distance between any two strings in $\mathcal X$, respectively. We analyze the computational complexity of the respective problems with Sum- and Min-diversity measures, called the Max-Sum and Max-Min Diverse LCSs, respectively, considering both approximation algorithms and parameterized complexity. Our results are summarized as follows. When $K$ is bounded, both problems are polynomial time solvable. In contrast, when $K$ is unbounded, both problems become NP-hard, while Max-Sum Diverse LCSs problem admits a PTAS. Furthermore, we analyze the parameterized complexity of both problems with combinations of parameters $K$ and $r$, where $r$ is the length of the candidate strings to be selected. Importantly, all positive results above are proven in a more general setting, where an input is an edge-labeled directed acyclic graph (DAG) that succinctly represents a set of strings of the same length. Negative results are proven in the setting where an input is explicitly given as a set of strings. The latter results are equipped with an encoding such a set as the longest common subsequences of a specific input string set.
翻译:暂无翻译